Git pull: what happens when and why?

From genomewiki

What happens when you git pull?

git pull does a fetch and then a merge. git fetch updates your repo by bringing in new git objects and updating origin/master; it does not change your working directory or local branches at all. git merge then merges the shared-repo origin/master into your local master branch, possibly with conflicts. git can fast-forward your master branch when you have no local commits that origin/master lacks; uncommitted changes do not get in the way as long as they are in files nobody else has changed, or if you stash them first.
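
As a rough illustration, the two halves of git pull can be run by hand. This is a minimal sketch in a throwaway sandbox; the directory names (seed, shared.git, alice, bob), the file a.txt and the Demo identity are made up for the example.

```shell
set -eu
sandbox=$(mktemp -d); cd "$sandbox"            # throwaway directory
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1; unset XDG_CONFIG_HOME   # ignore personal git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a shared repo with one commit, then two clones of it (hypothetical names).
git -c init.defaultBranch=master init -q seed
echo one > seed/a.txt
git -C seed add a.txt
git -C seed commit -q -m "initial"
git clone -q --bare seed shared.git
git clone -q shared.git alice
git clone -q shared.git bob

# bob pushes a new commit to the shared repo.
echo two >> bob/a.txt
git -C bob commit -q -am "bob: add a line"
git -C bob push -q origin master

cd alice
before=$(git rev-parse master)
git fetch -q origin                      # first half of git pull
after_fetch=$(git rev-parse master)      # local master has not moved
remote_tip=$(git rev-parse origin/master)
file_after_fetch=$(cat a.txt)            # working directory has not changed either
git merge -q --ff-only origin/master     # second half of git pull
after_merge=$(git rev-parse master)      # master now matches origin/master
```

Only origin/master moves during the fetch; the local master and the working directory change at the merge step.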

Cases:

  1. All of your changes are in files other people have not changed.
    • Result: you can usually just pull with no concern or interference.
    • No stash, no commit, no prompt for a merge commit message. Simple fast-forward.
  2. Some of your changes are in files others changed, but not at the same places. [NO merge-conflict]
    • (A separation of a few lines between your changes and others' changes is usually enough.)
    • Result: you must stash or commit.
    • If you stash, then it fast-forwards -- NOT asking for a merge commit message.
    • If you commit, it will ask for a merge commit message even though there are no conflicts.
    • (This is the case which changed a few years ago when Linus Torvalds decided that users should be nagged for an explanation of why they are merging other branches into their current development branch. We added the environment variable GIT_MERGE_AUTOEDIT=no to try to bring back the original behavior of not nagging.)
  3. Some of your changes are at the same places in files that others changed. [merge-conflict]
    • Result: You must stash or commit. You will have merge-conflicts.
    • If you stash, then it fast-forwards without asking for a merge commit message.
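    • When you git stash pop, it will have conflicts in files that must be resolved with your editor. No commit is required. Because the pop ran into conflicts, git keeps the stash entry; run git stash drop once you have resolved them.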
    • If you commit, it will have conflicts in files that must be resolved with your editor.
    • You must follow it up with git add for each resolved file, and then finish with a final merge commit with a message.
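
Case 2 with the stash route can be tried out as follows. This is a minimal sketch in a throwaway sandbox; the directory names, notes.txt and the Demo identity are made up for the example, and --ff-only is added only so the sketch fails loudly if the pull is not a fast-forward.

```shell
set -eu
sandbox=$(mktemp -d); cd "$sandbox"            # throwaway directory
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1; unset XDG_CONFIG_HOME   # ignore personal git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Shared repo with a 20-line file, plus two clones (hypothetical names).
git -c init.defaultBranch=master init -q seed
seq 1 20 > seed/notes.txt
git -C seed add notes.txt
git -C seed commit -q -m "initial"
git clone -q --bare seed shared.git
git clone -q shared.git alice
git clone -q shared.git bob

# bob edits the first line and pushes.
{ echo "bob was here"; seq 2 20; } > bob/notes.txt
git -C bob commit -q -am "bob: edit first line"
git -C bob push -q origin master

# alice has an uncommitted edit to the last line of the same file.
cd alice
{ seq 1 19; echo "alice was here"; } > notes.txt

# Pulling with the uncommitted edit in place is refused.
if git pull -q --ff-only >/dev/null 2>&1; then dirty_pull=ok; else dirty_pull=refused; fi

git stash >/dev/null        # set the edit aside
git pull -q --ff-only       # fast-forward, no merge commit message
git stash pop >/dev/null    # edit comes back on top of bob's change

commits=$(git rev-list --count master)            # initial + bob's commit
merges=$(git rev-list --merges --count master)    # no merge commits
```

After the pop, notes.txt carries bob's first line and alice's last line, and the history contains no merge commit.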

So, git stash works great when you have a small number of changes and will hopefully be committing them before long anyway. Do your git stash pop immediately after the git pull, while the changes are fresh in your mind, so that you can fix any conflicts and will not forget that they are on the stash and start working on other things.

If something is ready, go ahead and commit and push it.

If something is going to take a while, put it on a development branch. git makes this fairly easy.

The place where git stash can go wrong is when you have more than one stash building up. Typically you forgot that you already had changes on your stash stack, and now you have even more uncommitted changes to set aside. You might get away with stashing multiple times, but it is bad practice and you run the risk of losing or messing up your changes. If you start to have multiple stashes AND multiple development branches, it can be complicated and confusing.

So a little git stash followed immediately by pull and pop is just fine. But don't try to get elaborate with your git stashing. Needing multiple stashes is a clear sign that you should be using local dev branches.
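
If a stash has been sitting around, one possible way out is to check git stash list and move the stash onto a local dev branch with git stash branch. This is a minimal sketch in a throwaway repository; the names repo, a.txt and wipIdea are hypothetical.

```shell
set -eu
sandbox=$(mktemp -d); cd "$sandbox"            # throwaway directory
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1; unset XDG_CONFIG_HOME   # ignore personal git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q repo
cd repo
echo one > a.txt
git add a.txt
git commit -q -m "initial"

# Some half-finished work gets stashed and then forgotten.
echo "half-finished idea" >> a.txt
git stash >/dev/null
stashes_before=$(git stash list | wc -l | tr -d ' ')   # how many stashes are piled up

# Turn the newest stash into a branch (wipIdea is a made-up name).
# git stash branch creates the branch, applies the stash, and drops it on success.
git stash branch wipIdea >/dev/null 2>&1
branch_now=$(git rev-parse --abbrev-ref HEAD)
stashes_after=$(git stash list | wc -l | tr -d ' ')

# The work is now ordinary uncommitted changes on the branch; commit it there.
git commit -q -am "wip: half-finished idea"
git checkout -q master
master_status=$(git status --porcelain)    # master is clean again
```

The work then lives in a named branch rather than in an anonymous stash entry.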

git pull --rebase

Overall summary: There is nothing wrong with git merges. But git allows rebasing.

A way that avoids creating merge commits is to run

git pull --rebase

instead of just "git pull". This can still have conflicts which must be resolved. But it creates no merge commits.

After fetching, git pull --rebase does the following:

  1. temporarily "rewinds" your branch, i.e. sets aside your unpushed commits so that your branch is back at the point where it diverged from the shared repository (normally its state after your last git pull)
  2. fast-forwards your branch (master) to the current upstream HEAD (origin/master) from the shared repository
  3. re-applies (cherry-picks) your unpushed commits, one at a time, onto the end of your local branch (master)

When git re-applies your commits on the updated tree, there may be merge conflicts that you then resolve as usual -- but the conflicts arise one commit at a time (one of your commits) instead of for all of your commits at once. This might be easier for some cases and harder for others.

One small advantage is that it avoids making merge commits in the git history. And you do not see other people's work in your git status during merge conflicts.
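
The difference in the resulting history can be seen side by side. This is a minimal sketch in a throwaway sandbox; the clone names (alice, bob, carol), file names and Demo identity are made up. It passes --no-rebase explicitly for the merge case because recent git versions refuse a plain git pull on diverged branches unless a strategy has been configured.

```shell
set -eu
sandbox=$(mktemp -d); cd "$sandbox"            # throwaway directory
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1; unset XDG_CONFIG_HOME   # ignore personal git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Shared repo with one commit, plus three clones (hypothetical names).
git -c init.defaultBranch=master init -q seed
echo base > seed/base.txt
git -C seed add base.txt
git -C seed commit -q -m "initial"
git clone -q --bare seed shared.git
for who in alice bob carol; do git clone -q shared.git "$who"; done

# bob pushes first.
echo bob > bob/bob.txt
git -C bob add bob.txt
git -C bob commit -q -m "bob: add bob.txt"
git -C bob push -q origin master

# alice and carol each have one unpushed local commit.
for who in alice carol; do
  echo "$who" > "$who/$who.txt"
  git -C "$who" add "$who.txt"
  git -C "$who" commit -q -m "$who: add $who.txt"
done

# alice merges; GIT_MERGE_AUTOEDIT=no skips the merge-message editor.
GIT_MERGE_AUTOEDIT=no git -C alice pull -q --no-rebase
# carol re-applies her commit on top of the new origin/master instead.
git -C carol pull -q --rebase

alice_merges=$(git -C alice rev-list --merges --count master)   # one merge commit
carol_merges=$(git -C carol rev-list --merges --count master)   # none
carol_parent=$(git -C carol rev-parse master~1)       # carol's commit now sits
carol_upstream=$(git -C carol rev-parse origin/master) # directly on origin/master
```

alice ends up with a merge commit on top of her own commit; carol ends up with a straight line in which her commit follows bob's.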

One minor disadvantage is that it creates churn in your local repository by having to cherry-pick all your commits from where they were to their new location, which means creating new commits and objects in the local repository. Eventually garbage collection will remove the old dead objects.

Another minor disadvantage is that because it has to cherry-pick the commits again on every rebase, it increases the activity in the system and the probability that something might go wrong during it.

Another minor disadvantage is that it distorts the history a little, implying that the changes were actually made later than they really were. It fakes a linear history that did not actually happen.

If somebody else gets their work pushed before you do, you will have to git pull --rebase again (extra churn).

git stash is still useful for setting aside changes-in-progress for a short time.

Git is powerful and can do re-basing, which re-writes the local history. One should be very careful though not to change any history which has already been pushed to the shared repository. git pull --rebase is relatively safe and will not try to rebase too far back in the shared history. Other rebasing commands are potentially dangerous and should be used with caution.
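
One way to see which commits are still local only, and therefore the ones a rebase may safely rewrite, is to compare master against origin/master. This is a minimal sketch in a throwaway sandbox; the names seed, shared.git, alice and base.txt are made up for the example.

```shell
set -eu
sandbox=$(mktemp -d); cd "$sandbox"            # throwaway directory
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1; unset XDG_CONFIG_HOME   # ignore personal git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q seed
echo base > seed/base.txt
git -C seed add base.txt
git -C seed commit -q -m "initial"
git clone -q --bare seed shared.git
git clone -q shared.git alice

# alice makes two local commits that have not been pushed yet.
cd alice
for n in 1 2; do
  echo "change $n" >> base.txt
  git commit -q -am "alice: change $n"
done

# Commits reachable from master but not from origin/master are unpushed.
git log --oneline origin/master..master
unpushed_before=$(git rev-list --count origin/master..master)

git push -q origin master
unpushed_after=$(git rev-list --count origin/master..master)   # nothing left that is local only
```

Once a commit no longer shows up in that range it has been shared, and rewriting it is the dangerous case described above.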

git merge --squash

For a somewhat large local development branch which one has worked on for weeks or months, with dozens of commits, it may be worthwhile to squash all that work into one big commit at the time that you are merging the branch back into the master branch.

The fairly nice advantage for code reviewers is that git-reports will just show one single big commit with all the changes in it.

A disadvantage is that it distorts the history of what really happened. The detailed thought and justification that went into the various individual commit messages of the local dev branch is lost.

If I made a development branch like this:

git checkout master   
git pull  # up-to-date
git branch newProject  # create a branch "newProject" that starts off where master currently is.
git checkout newProject # update your working dirs to match (does not do much at this moment).

Do work

vi somefiles
git add somefiles
git commit -m 'some nice thoughts about the changes'

Do more work

vi somefiles
git add somefiles
git commit -m 'some more nice thoughts about more changes'

Repeat this many times.

If this goes on for a really long time, you may worry that you are so out of date that you really need to pull in others' work to make sure nothing will break. Also, if you are nearly done with the newProject, you should do this right near the end so that you can make sure nothing breaks.

git fetch   # updates origin/master  

We can either switch back to local master and pull to update it, or just merge origin/master directly into the development branch:

git merge origin/master  # may have to deal with merge-conflicts and re-testing.

When you are all done, do this to merge it back into local master:

git checkout master
git merge newProject   # now all the changes from newProject have been merged into the local master. minor conflicts possible.

To squash-merge the entire project, do this instead:

git checkout master
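git merge --squash newProject  # stages all the changes from newProject but does not commit. minor conflicts possible.
git commit -m 'all of the newProject work in one commit'  # makes the one big commit.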

You can eventually just delete the newProject branch since you are done with it. It still contains a record of the detailed individual commits that you actually made. It is a good idea to at least rename it so that you can easily tell at a glance that it has already been merged into master and that you are basically done with it. (After a squash merge git does not consider the branch merged, so git branch -d will refuse to delete it; git branch -D forces the deletion.)

git branch -m newProject newProjectSquashed
git push
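
The squash workflow can be tried end to end without touching a real project. This is a minimal sketch in a throwaway repository; the file names, commit messages and Demo identity are made up for the example.

```shell
set -eu
sandbox=$(mktemp -d); cd "$sandbox"            # throwaway directory
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1; unset XDG_CONFIG_HOME   # ignore personal git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q repo
cd repo
echo base > base.txt
git add base.txt
git commit -q -m "initial"

# Two commits on the development branch.
git checkout -q -b newProject
for n in 1 2; do
  echo "step $n" >> project.txt
  git add project.txt
  git commit -q -m "newProject: step $n"
done

# Squash-merge into master: the changes are staged, not committed.
git checkout -q master
git merge --squash newProject >/dev/null 2>&1
staged=$(git diff --cached --name-only)
git commit -q -m "newProject: squashed into one commit"

master_commits=$(git rev-list --count master)       # initial + one squashed commit
branch_commits=$(git rev-list --count newProject)   # initial + two detailed commits
git branch -m newProject newProjectSquashed         # keep the detailed history, clearly labelled
```

master gains a single ordinary commit with the same content as the branch tip, while the renamed branch still holds the individual commits.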