Working with Git

From genomewiki

Git structure

Definitions

central repository (or shared repository or origin/master) – The main repository we all share. This is where we push files for everyone to use. Contains a history of everyone’s commits that have been pushed.

local repository (or your repository or master) – Your main repository. This is where you push files from. Contains a history of all your commits for each branch.

branch – A version of files that are stored in your repository. You can have many branches in your local repository, each with its own history of commits. Our main branch is called master. For more information see Working_with_branches_in_Git

staging area (or staged changes or cached or commit list or index) – The place where git keeps track of things you have added but not yet committed.

working directory (or sandbox) – Your actual files in your ~/kent/ directory.

Terms that can refer to multiple things

history – Both your local repository and the central repository have a history. This history contains all the commits you have made as well as any merges, including the merge commits created by ‘git pull’ (this is why you see

  Merge branch 'master' of …

when you do a git log (see below)).

HEAD – names the current branch you are on, typically master. It is often convenient to use HEAD instead of having to name the specific branch you are on.

master – the main branch of either the central repository or the local repository. Can be used to point to the last commit ID of your master branch.

Commit IDs and Referring to ancestors

One of the useful things about git is that git keeps track of the current states of commits, files and trees by creating a unique hash ID for each. In fact, you can use just the first 7 or so characters as an abbreviated hash ID in any of the git commands (e.g. you can use 27f3e63 as an abbreviation for 27f3e639263d5bb0c018d5ec7cf6633c2ccd7e07). As long as it's unique, git will expand it for you. Of course, an abbreviation that was unique at one time is not guaranteed to be unique in the future.

As noted above, HEAD refers to the branch you are on. The term "master" refers to the last commit on the master branch, so while you are on master you can use HEAD and master interchangeably in any of the git commands. Likewise, anyplace you see HEAD or "master" in the git commands, you can substitute a commit ID.
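The following is a minimal sketch of abbreviated IDs and of HEAD versus master. It assumes a POSIX shell and a reasonably recent git. It runs in a throwaway repository created with mktemp rather than in ~/kent, and the file name notes.txt is hypothetical.

```shell
#!/bin/sh
set -e
# Throwaway sandbox repository (hypothetical; not your ~/kent checkout)
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "First commit"

full=$(git rev-parse HEAD)              # full commit ID
short=$(git rev-parse --short=7 HEAD)   # abbreviated ID (at least 7 characters)
echo "full:  $full"
echo "short: $short"

git rev-parse "$short"   # git expands the unique abbreviation back to the full ID
git rev-parse master     # while on master, this is the same commit as HEAD
```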

Another benefit of Git is that it keeps track of ancestors of every commit. This means that if you want to specify the parent of a certain commit (i.e. the previous commit) you can use:

 HEAD^1 or 27f3e63^1

where "^1" specifies the parent or "the previous commit". Many people shorten "^1" to be just "^". If you are interested in the grandparent of HEAD you would do:

 HEAD^1^1

which is the same as

 HEAD^^

which is also the same as

 HEAD~2

Of course, this can be continued as far back as you would like to go. A merge commit, however, has two parents, and ^1 (like ~1) always follows the first one. To refer to the second parent, use the following notation:

 HEAD^2

This information will become more useful later on in this page.
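The ancestor notation can be checked with git rev-parse in a small sandbox. This sketch assumes a POSIX shell and uses a temporary repository with hypothetical file and branch names (history.txt, side.txt, side). It builds three commits and then a merge, so that HEAD^1 and HEAD^2 point to different commits.

```shell
#!/bin/sh
set -e
# Throwaway sandbox repository (hypothetical names throughout)
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

for n in 1 2 3; do
  echo "line $n" >> history.txt
  git add history.txt
  git commit -q -m "Commit $n"
done

# The grandparent of HEAD, written three equivalent ways
git rev-parse HEAD^1^1
git rev-parse HEAD^^
git rev-parse HEAD~2

# Create a merge commit so that HEAD has two parents
git checkout -q -b side HEAD~1
echo "side work" > side.txt
git add side.txt
git commit -q -m "Side commit"
git checkout -q master
git merge -q --no-ff --no-edit -m "Merge branch 'side'" side

git rev-parse HEAD^1   # first parent: the previous master commit
git rev-parse HEAD^2   # second parent: the tip of branch 'side'
```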

The commands

Making changes

'git add fileName'

Use this command to stage the current contents of a file, i.e. to tell git that you want to "save" this version of it in your next commit. You can run git add on a file many times as you change it, and each run replaces the staged copy with the current one. This is a great way to save a file that you are making lots of edits to before making a final commit. Adding a file does not change your history.

'git rm fileName'

Removes a file from your working directory and staging area. After a 'git rm fileName' you still need to run 'git commit' to remove the file from your local repository. Use 'git rm -r directoryName' to remove a directory and everything in it recursively.

'git commit -m "some message" fileName(s)'

Commits the specified file(s) to your local history. If you do not specify a file, it will commit everything in your staging area (i.e. anything to which you have done a git add or git rm, etc). If a file is newly created (i.e. not already in your local repository), you must do a git add before you do a git commit.
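Here is a short sketch of the add, rm and commit cycle described above. It assumes a POSIX shell and runs in a temporary sandbox repository. The file names report.txt and scratch.txt are hypothetical.

```shell
#!/bin/sh
set -e
# Throwaway sandbox repository (hypothetical file names)
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "draft" > report.txt
git add report.txt                    # stage the new file
echo "more edits" >> report.txt
git add report.txt                    # stage it again after more edits
git commit -q -m "Add report" report.txt

echo "temporary" > scratch.txt
git add scratch.txt
git commit -q -m "Add scratch file"

git rm -q scratch.txt                 # removes it from working directory and staging area
git commit -q -m "Remove scratch file"

git log --oneline                     # three commits, newest first
git ls-files                          # only report.txt is still tracked
```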

Fixing un-wanted changes

'git checkout HEAD fileName'

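Discards your changes to that one file, restoring both its working directory and staging area copies to the version in HEAD. To do the same for every tracked file, use 'git checkout HEAD -- .' (running 'git checkout HEAD' with no file name does not restore anything).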
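This is a minimal sketch of discarding unwanted edits. It assumes a POSIX shell and a temporary sandbox repository with hypothetical files config.txt and other.txt. Note that untracked files are left alone.

```shell
#!/bin/sh
set -e
# Throwaway sandbox repository (hypothetical file names)
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "original" > config.txt
git add config.txt
git commit -q -m "Add config"

echo "unwanted edit" > config.txt
git checkout HEAD -- config.txt   # restore just this file from HEAD
cat config.txt                    # prints: original

echo "edit 1" > config.txt
echo "new" > other.txt            # untracked, so checkout will not touch it
git checkout HEAD -- .            # restore every tracked file
```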

'git commit --amend -m "some message" fileName'

Replaces the most recent commit in your history with a new commit that also contains whatever you have staged (and the named file, if given). Can also be used just to redo the commit message.

It is important to not modify commits that have already been pushed to the central repository.

Once a commit is in the shared repository, you have to do some very complicated maneuvers to modify the commit, which are not covered here.
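The sketch below shows --amend fixing a typo in a message and folding in a forgotten change on a commit that has not been pushed. It assumes a POSIX shell and a temporary sandbox repository, and the file name page.txt is hypothetical.

```shell
#!/bin/sh
set -e
# Throwaway sandbox repository (hypothetical file name)
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "v1" > page.txt
git add page.txt
git commit -q -m "Add pgae"          # typo in the message

echo "forgotten line" >> page.txt
git add page.txt
# Replace the last (unpushed) commit: corrected message plus the newly staged change
git commit -q --amend -m "Add page"

git log --oneline                    # still a single commit
```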

'git reset --[option] HEAD'

Your main friend for backing out changes. There are three useful options for this command: --soft, --mixed (default) and --hard. Here is what each option does (an X in a column means the option resets that area to the commit you name, here HEAD, the latest commit in your local repository):

             HEAD of local      staging    working
             repository         area       directory
--hard           X                 X           X
--mixed          X                 X
--soft           X

Note that the '--soft' option in this case doesn't actually do anything since you are resetting the HEAD of your local repository back to itself. The '--soft' option is only useful if you are trying to back out the very last commit you made that has not been pushed to the central repository. To back your very last commit out use:

  git reset --soft HEAD^1

This leaves your staging area and working directory unchanged. You can also run 'git reset --[option] HEAD^' with --mixed or --hard. Keep in mind that --mixed removes the commit from your local repository and unstages its changes, while --hard also discards those changes from your working directory (see the table above). You can go back more than one commit using the ancestor notation described at the beginning of this page.

Be careful not to use this command on commits that have already been pushed to the central repository.
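This sketch backs out the last unpushed commit with --soft, confirms that its change is still staged, and then uses --hard to discard it entirely. It assumes a POSIX shell and a temporary sandbox repository, and data.txt is a hypothetical file.

```shell
#!/bin/sh
set -e
# Throwaway sandbox repository (hypothetical file name)
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "one" > data.txt
git add data.txt
git commit -q -m "First"
echo "two" >> data.txt
git add data.txt
git commit -q -m "Second (not yet pushed)"

git reset -q --soft HEAD^1            # back out the last commit
staged=$(git diff --cached --name-only)
echo "still staged: $staged"          # data.txt

# --mixed (the default) would also unstage it; --hard also discards the edit
git reset -q --hard HEAD
cat data.txt                          # prints: one
```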

Fixing un-wanted large file that was accidentally committed

An engineer recently checked in a file that was too large by accident. Hooks on the shared repository check every git push for files over our size limit, so try to be careful not to check in large files.

If it happens, the git push fails because the file is too large. In this case the user's changes had all been committed, so nothing in the working directory could be lost. They had done a git pull earlier, which created a merge commit. Then they tried git rm to remove the file, but that only removes it from the working directory and staging area, not from the commits that already contain it. Luckily, we know the history is not shared, because the push failed.

We located the commit with the bad large file by listing the recent commits:

git log -3 --stat

We noted the hashId of the commit where the big file was committed. There were no other commits after it that needed preserving. This reset it back:

git reset --hard <hashId noted earlier>

This undid the commit itself, as if it never happened, but left its files staged:

git reset --soft HEAD^

This removes the bad large file from the staging:

git reset HEAD badLargeFile

This commits it:

git commit -m 'Use your original commit message here'

After that, git pull and then git push succeed.
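The core of the recovery (undo the commit, unstage the big file, re-commit) can be rehearsed in a sandbox. This sketch assumes a POSIX shell and a temporary repository. It uses a 1 MiB file made with dd as a stand-in for a file that would exceed the hook's limit, since the real limit is not given here. It skips the reset --hard step because there is no extra merge commit to discard.

```shell
#!/bin/sh
set -e
# Throwaway sandbox repository (hypothetical file names)
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "base" > code.c
git add code.c
git commit -q -m "Base"

# Simulate the mistake: a real change committed together with a large file
echo "fix" >> code.c
dd if=/dev/zero of=badLargeFile bs=1024 count=1024 2>/dev/null
git add code.c badLargeFile
git commit -q -m "Fix code"

git log -3 --stat                  # find the commit that added badLargeFile

git reset -q --soft HEAD^          # undo the commit, keep everything staged
git reset -q HEAD badLargeFile     # unstage only the large file
git commit -q -m "Fix code"        # re-commit without it
rm badLargeFile                    # or move it somewhere outside the repository
```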

Getting information about commits and your staging area

'git show [--stat] commitID[:path/fileName]'

One of the more useful commands. Because git identifies every file, tree and commit by a hash of its contents, you can use git show to see exactly what was committed. 'git show commitID' shows the commit message and the full diff, while 'git show commitID:path/fileName' prints that file's contents as of that commit. The '--stat' option suppresses the actual changes and instead lists the files that were changed, with a count of changed lines.
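This is a brief sketch of the forms of git show described above. It assumes a POSIX shell and a temporary sandbox repository with a hypothetical file list.txt.

```shell
#!/bin/sh
set -e
# Throwaway sandbox repository (hypothetical file name)
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

printf 'alpha\nbeta\n' > list.txt
git add list.txt
git commit -q -m "Add list"
echo "gamma" >> list.txt
git commit -q -am "Extend list"

git show --stat HEAD       # commit message plus the list of changed files
git show HEAD              # the full patch of the last commit
git show HEAD^:list.txt    # list.txt as it was one commit earlier
```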

'git diff'

A multipurpose command that lets you see the diffs between many different areas. Here are some useful options:

'git diff' - see changes in your working directory that you have not yet staged (added/removed).

'git diff --cached' - see changes that are added/removed but not committed.

'git diff HEAD' - see changes between your working directory and HEAD.

'git diff --stat' - see just list of files that have changed.

'git diff fileName' - see changes that have happened to a specific file or directory.

'git diff commitID commitID' - see differences between two commits. With a single commitID ('git diff commitID'), git compares that commit with your working directory. In the 'commitID..commitID' form, an omitted side defaults to HEAD.
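To see which area each form of git diff compares, this sketch stages a change to one file and leaves a change to another unstaged. It assumes a POSIX shell and a temporary sandbox repository with hypothetical files a.txt and b.txt.

```shell
#!/bin/sh
set -e
# Throwaway sandbox repository (hypothetical file names)
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "first" > a.txt
echo "first" > b.txt
git add a.txt b.txt
git commit -q -m "Start"

echo "staged change" >> a.txt
git add a.txt                     # a.txt: change is staged
echo "unstaged change" >> b.txt   # b.txt: modified but not staged

git diff --stat                   # working directory vs staging area: b.txt
git diff --cached --stat          # staging area vs HEAD: a.txt
git diff HEAD --stat              # working directory vs HEAD: a.txt and b.txt
```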

'git log [path/fileName]'

Shows the history (commits/merges) of your entire local repository or of just one file. The most recent commits are at the top, followed by older ones. You can limit the output with: -<n> (lists only the last n commits), --after=<date>, --before=<date>, --author=<pattern>, and --no-merges (don't list merges, including those created by pulls). The --stat option lists which files were changed in each commit.

When you do a merge you alter the history of every file, even though there might not be any actual changes to that file. When you do a simple "git log [path/fileName]", git turns on history simplification by default, which hides merges that did not change the file. Adding the --full-history option turns off this pruning, so git follows every parent of each merge and shows commits to the file that the default view would hide.
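This sketch demonstrates a few of the git log limiting options listed above. It assumes a POSIX shell and a temporary sandbox repository whose files file1.txt through file3.txt are hypothetical.

```shell
#!/bin/sh
set -e
# Throwaway sandbox repository (hypothetical file names)
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

for n in 1 2 3; do
  echo "entry $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "Add file$n"
done

git log -2 --stat                  # the two most recent commits, newest first
git log --oneline --no-merges      # one line per commit, merges hidden
git log --oneline -- file1.txt     # only commits that touched file1.txt
```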

'git blame fileName'

Details who last edited each line of a file. Useful options: -L <start>,<end> or -L <start>,+<count> limit the output to those lines.
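This sketch uses both -L forms. It assumes a POSIX shell and a temporary sandbox repository. The file poem.txt and the author names are hypothetical, and the second commit is made under a different name so that the blame output differs between lines.

```shell
#!/bin/sh
set -e
# Throwaway sandbox repository (hypothetical file and author names)
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

printf 'line one\nline two\nline three\nline four\n' > poem.txt
git add poem.txt
git commit -q -m "Add poem"

git config user.name "Second Author"
sed 's/line three/line THREE/' poem.txt > poem.tmp && mv poem.tmp poem.txt
git commit -q -am "Shout line three"

git blame -L 2,3 poem.txt      # lines 2 through 3
git blame -L 3,+2 poem.txt     # 2 lines starting at line 3
```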

'git status'

Gives a report on the status of tracked and untracked files. It also shows how many commits you are ahead of (or behind) the central repository, as of your last 'git fetch' or 'git pull'.

Keeping up with the central repository

'git pull'

Pulls in commits from the central repository made by others. 'git pull' (and 'git merge') will refuse to merge if uncommitted changes in your working directory or staging area would be overwritten by the incoming changes, and will stop with conflicts if your own commits clash with them. Go to Resolving_merge_conflicts_in_Git for more information.

'git stash'

Stashes away any changes in your staging area and working directory. Very useful if you are working on something and want to pull in the most recent changes. You can use it to resolve the situation above like so:

git pull
...
file foobar not up to date, cannot merge.
git stash
git pull
git stash pop

Note that you have to do a 'git stash pop' to get your half-baked changes back into your working directory and index. You may need to resolve conflicts if you pulled in someone else's changes to the same file you were working on. You should not use 'git stash' to store changes long-term; instead use a separate branch.
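Here is a minimal sketch of the stash and pop round trip. It assumes a POSIX shell and a temporary sandbox repository with a hypothetical file app.txt. Because there is no central repository here, the comment marks where 'git pull' would go.

```shell
#!/bin/sh
set -e
# Throwaway sandbox repository (hypothetical file name)
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "stable" > app.txt
git add app.txt
git commit -q -m "Stable version"

echo "half-baked idea" >> app.txt   # uncommitted work in progress
git stash -q                        # set it aside; the working directory is clean again
git status --short                  # prints nothing
# ... this is where you would run 'git pull' ...
git stash pop -q                    # bring the work in progress back
git stash list                      # the stash list is empty again
```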

Here are some other helpful options to use with git stash:

  • git stash list: lists the stashes you have saved
  • git stash pop [<stash>]: applies the most recent stash (or the one specified, e.g. stash@{1}) and removes it from the list. If applying it causes conflicts, the stash is kept in the list and must be removed manually.
  • git stash apply [<stash>]: applies the most recent stash (or the one specified) but does not remove it from the list.
  • git stash drop [<stash>]: removes the most recent stash (or the one specified) from the list.
  • git stash clear: removes all stashes stored in the list

'git fetch'

Brings the most recent changes from the central repository into your local repository, without merging those changes with your local repository. It is useful if you want to see the changes others have been making and how they will affect your local repository before doing a 'git merge'.

'git merge'

Used to merge the changes brought in by 'git fetch' (for example, origin/master) into your current branch. Note that you may run into the same merge problems as with 'git pull' (see above). 'git merge' is also used to merge local branches in your repository.

'git push'

Pushes your commits out to the central repository. Git rejects the push if the central repository has commits that you have not pulled in yet.
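The fetch, status, merge and push cycle can be simulated without the real shared repository. In this sketch a local bare repository stands in for the central repository, and two hypothetical users, Alice and Bob, each have a clone. It assumes a POSIX shell and a reasonably recent git, and all paths are temporary.

```shell
#!/bin/sh
set -e
top=$(mktemp -d)
cd "$top"
# A bare repository standing in for the central repository (hypothetical setup)
git -c init.defaultBranch=master init -q --bare central.git

# Alice makes the first commit and pushes it
git -c init.defaultBranch=master init -q alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
git config commit.gpgsign false
echo "v1" > README
git add README
git commit -q -m "Initial commit"
git remote add origin "$top/central.git"
git push -q -u origin master
cd ..

# Bob clones the central repository
git clone -q central.git bob
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"
git -C bob config commit.gpgsign false

# Alice pushes a second commit
cd alice
echo "v2" >> README
git commit -q -am "Alice's update"
git push -q
cd ../bob

git fetch -q                             # updates origin/master; Bob's files are untouched
behind=$(git status -sb | head -n 1)
echo "$behind"                           # ## master...origin/master [behind 1]
git log --oneline HEAD..origin/master    # commits the merge will bring in
git merge -q origin/master               # fast-forward, since Bob has no local commits

echo "notes" > NOTES
git add NOTES
git commit -q -m "Bob's notes"
git push -q                              # accepted: Bob is up to date with central
```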

Git Options

When turning on git's color.* options, you may need the -R option in your LESS environment variable. For example:

export LESS="--no-init --quit-if-one-screen --QUIT-AT-EOF -R"

This will properly colorize the lines in the git pager. To turn on git color options for everything:

git config color.ui true

Or, individually:

git config color.branch auto
git config color.diff auto
git config color.interactive auto
git config color.status auto
