Working with Git

From genomewiki

Git structure

Definitions

central repository (or shared repository or origin/master) – The main repository we all share. This is where we push files for everyone to use. Contains a history of everyone’s commits that have been pushed.

local repository (or your repository or master) – Your main repository. This is where you push files from. Contains a history of all your commits for each branch.

branch – A version of the files that is stored in your repository. You can have many branches in your local repository, each with its own history of commits. Our main branch is called master.

staging area (or staged changes or cached or commit list or index) – The place where git keeps track of things you have added but not yet committed.

working directory (or sandbox) – Your actual files in your ~/kent/ directory.

Terms that can refer to multiple things

history – both your local repository and the central repository have a history. This history contains all the commits you have done as well as any merges, including a ‘git pull’ (this is why you see

  Merge branch 'master' of …

when you do a git log (see below)).

HEAD – names the current branch you are on, typically master, and in commands stands for the last commit on that branch. It is often convenient to use HEAD instead of having to name the specific branch you are on.

master – the main branch of either the central repository or the local repository. Can be used to point to the last commit ID of your master branch.

Commit IDs and Referring to ancestors

One of the useful things about git is that it identifies every commit, file and tree by a unique hash ID. You can use just the first 7 or so characters as an abbreviated hash ID in any of the git commands (e.g. you can use 27f3e63 as an abbreviation for 27f3e639263d5bb0c018d5ec7cf6633c2ccd7e07). As long as it's unique, git will expand it for you. Of course, an abbreviation that was unique at one time is not guaranteed to be unique in the future.

As noted above, HEAD refers to the branch you are on. The term "master" can be used to refer to the last commit ID on the master branch. This means that, as long as you are on the master branch, you can use HEAD and the last commit ID of the master branch interchangeably in any of the git commands. Likewise, anyplace you see HEAD or "master" in the git commands, you can substitute a commit ID.

Another benefit of Git is that it keeps track of ancestors of every commit. This means that if you want to specify the parent of a certain commit (i.e. the previous commit) you can use:

 HEAD^1 or 27f3e63^1

where "^1" specifies the parent or "the previous commit". Many people shorten "^1" to be just "^". If you are interested in the grandparent of HEAD you would do:

 HEAD^1^1

which is the same as

 HEAD^^

which is also the same as

 HEAD~2

Of course, this can be continued for as far back as you would like to go. However, a problem occurs when you get to a merge, because there are two parents. To refer to the second parent you use the following notation:

 HEAD^2

This information will become more useful later on in this page.
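
As a rough illustration of the notation above, the following sketch builds a throwaway repository and checks that the different spellings resolve to the same commit. The scratch directory, the file name notes.txt, the user name and the commit messages are all made up for the example; 'git rev-parse' is used only to print the commit ID that each name resolves to.

```shell
set -e
repo=$(mktemp -d)                 # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Three commits in a row: first <- second <- third (HEAD)
for n in first second third; do
    echo "$n" > notes.txt
    git add notes.txt
    git commit -q -m "$n commit"
done

parent=$(git rev-parse "HEAD^")              # same as HEAD^1 and HEAD~1
grandparent_caret=$(git rev-parse "HEAD^^")  # parent of the parent
grandparent_tilde=$(git rev-parse "HEAD~2")  # two generations back
short_id=$(git rev-parse --short HEAD)       # abbreviated hash ID
full_from_short=$(git rev-parse "$short_id") # git expands it again

echo "HEAD^^ = $grandparent_caret"
echo "HEAD~2 = $grandparent_tilde"
git log -1 --format=%s "HEAD~2"              # subject of the grandparent
```

The quotes around names such as "HEAD^" are only a precaution, since some shells treat ^ specially.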

The commands

Making changes

'git add fileName'

Use this command to tell git that you are interested in "saving" this file in your local repository. You can do multiple git adds to a file as you change it. This is a great way to save a file that you are making lots of edits to before making a final commit. Adding a file does not change your history.

'git rm fileName'

Removes a file from your working directory and staging area. After you do a 'git rm fileName' you still need to do a 'git commit fileName' in order to remove it from your repository. Use the -r option to recursively remove a directory and the files in it.

'git commit -m "some message" fileName(s)'

Commits the specified file(s) to your local history. If you do not specify a file, it will commit everything in your staging area (i.e. anything to which you have done a git add or git rm, etc). If a file is newly created (i.e. not already in your local repository), you must do a git add before you do a git commit.
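
A minimal sketch of how these three commands fit together, run in a throwaway repository. The directory, the file name notes.txt, the user name and the commit messages are invented for the example.

```shell
set -e
repo=$(mktemp -d)                  # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "rough draft" > notes.txt
git add notes.txt                  # stage the new file
echo "a much better draft" > notes.txt
git add notes.txt                  # stage again; still nothing in the history
git commit -q -m "Add notes"       # commits everything in the staging area

echo "final text, ready to share" > notes.txt
git commit -q -m "Finish notes" notes.txt   # tracked file: no git add needed

git rm -q notes.txt                # gone from working directory and staging area
git commit -q -m "Remove notes"    # now gone from the repository as well

commit_count=$(git rev-list --count HEAD)
tracked_files=$(git ls-files)
echo "commits: $commit_count, tracked files: '$tracked_files'"
```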

Fixing un-wanted changes

'git checkout HEAD fileName'

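Restores that one file to the version in HEAD, discarding your uncommitted changes to it in both the working directory and the staging area. To do this for a whole directory, give a directory name (for example '.') instead of a file name; 'git checkout HEAD' with no path does not discard any changes.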
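
For instance, the following sketch throws away an unwanted edit to a single file. The scratch repository, file name and file contents are made up for the example.

```shell
set -e
repo=$(mktemp -d)                  # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "committed text" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "an unwanted edit to throw away" > notes.txt
git checkout HEAD notes.txt        # bring back the committed version
restored=$(cat notes.txt)
pending=$(git status --porcelain)  # empty when nothing is modified
echo "notes.txt now contains: $restored"
```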

'git commit --amend -m "some message" fileName'

Replaces the most recent commit in your history with a new commit, which becomes the newest commit in the history. Can also be used just to re-do the commit message.

It is important to not modify commits that have already been pushed to the central repository.

Once a commit is in the shared repository, you have to do some very complicated maneuvers to modify the commit, which are not covered here.
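
A small sketch of amending a commit that has not been pushed, here to fix a typo in the commit message. The repository, file and messages are invented for the example. Note that the commit ID changes, which is why amending a commit that others already have causes trouble.

```shell
set -e
repo=$(mktemp -d)                  # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "some text" > notes.txt
git add notes.txt
git commit -q -m "Add notse"       # oops, typo in the message
before=$(git rev-parse HEAD)

git commit -q --amend -m "Add notes"   # replace that commit
after=$(git rev-parse HEAD)

commit_count=$(git rev-list --count HEAD)
subject=$(git log -1 --format=%s)
echo "commits: $commit_count, message: $subject"
```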

'git reset --[option] HEAD'

Your main friend for backing out changes. There are three useful options for this command: --soft, --mixed (default) and --hard. Here is what each of the options does (if there is an X in the column that means that it resets that particular thing back to the commit you name, here the HEAD of your local repository):

              HEAD of            staging          working
          local repository        area           directory
--hard           X                  X                X
--mixed          X                  X
--soft           X

Note that the '--soft' option in this case doesn't actually do anything since you are resetting the HEAD of your local repository back to itself. The '--soft' option is only useful if you are trying to back out the very last commit you made that has not been pushed to the central repository. To back your very last commit out use:

  git reset --soft HEAD^1

This will leave your staging area and working directory unchanged. Like 'git commit --amend', this is a command that you should be careful not to use on commits that have already been pushed to the central repository.

You can also do a 'git reset --[option] HEAD^' with --mixed and --hard if you would like; just keep in mind that it will delete that commit either from your local repository and the staging area (--mixed) or from your local repository, the staging area and your working directory (--hard), as shown in the table above. You can also go back more than one commit using the ancestor notation mentioned at the beginning of this page.
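
The three options can be compared side by side in a throwaway repository. In this sketch the repository, the file notes.txt and its contents are invented; 'git diff --cached --name-only' and 'git diff --name-only' are used only to list which files differ in the staging area and in the working directory.

```shell
set -e
repo=$(mktemp -d)                  # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first version" > notes.txt
git add notes.txt
git commit -q -m "first"
echo "second version, a bit longer" > notes.txt
git add notes.txt
git commit -q -m "second"

git reset -q --soft "HEAD^"        # drop the commit; staging area and file untouched
subject_after_soft=$(git log -1 --format=%s)
staged_after_soft=$(git diff --cached --name-only)

git reset -q --mixed HEAD          # also clear the staging area; file untouched
staged_after_mixed=$(git diff --cached --name-only)
unstaged_after_mixed=$(git diff --name-only)

git reset -q --hard HEAD           # also overwrite the working directory
content_after_hard=$(cat notes.txt)
echo "after --hard, notes.txt contains: $content_after_hard"
```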

Getting information about commits and your staging area

'git show [--stat] commitID[:path/fileName]'

One of the more useful commands. Since git identifies files, trees and commits by a hash ID computed from their contents, you can use git show to see exactly what was committed. You can do this for an entire commit, or you can specify just one file of a commit. The '--stat' option will suppress the actual changes and instead list the files that were changed in the commit.
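
As an illustration, the sketch below retrieves an old version of a file and prints a --stat summary. The repository, file name and contents are made up; the '--format=%s' option is added here only to shorten the commit header to its subject line.

```shell
set -e
repo=$(mktemp -d)                  # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "version one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
first=$(git rev-parse --short HEAD)     # abbreviated ID of the first commit

echo "version two, with more text" > notes.txt
git add notes.txt
git commit -q -m "Expand notes"

old_content=$(git show "$first:notes.txt")        # the file as of that commit
stat_summary=$(git show --stat --format=%s HEAD)  # files changed, no diff text
echo "$old_content"
echo "$stat_summary"
```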

'git diff'

A multipurpose command that lets you see the diffs between many different areas. Here are some useful options:

'git diff' - see changes in your working directory that you haven't added/removed.

'git diff --cached' - see changes that are added/removed but not committed.

'git diff HEAD' - see changes between your working directory and HEAD.

'git diff --stat' - see just list of files that have changed.

'git diff fileName' - see changes that have happened to a specific file or directory.

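'git diff commitID commitID' - see differences between two commits. 'git diff commitID' with a single commit compares your working directory with that commit. In the range form 'git diff commitID..commitID', a commitID omitted on one side has the same effect as using HEAD instead.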
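
The following sketch walks one file through the unstaged, staged and committed states to show which form of 'git diff' reports what. The repository and file contents are invented; '--name-only' and '--no-color' are used only to keep the output easy to inspect.

```shell
set -e
repo=$(mktemp -d)                  # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "line one" > notes.txt
git add notes.txt
git commit -q -m "Add line one"

echo "line two" >> notes.txt                      # edit, not yet added
unstaged=$(git diff --name-only)
staged_before_add=$(git diff --cached --name-only)

git add notes.txt                                 # edit is now staged
unstaged_after_add=$(git diff --name-only)
staged=$(git diff --cached --name-only)

echo "line three" >> notes.txt                    # second edit, not added
added_unstaged=$(git diff --no-color | grep -c '^+line')
added_vs_head=$(git diff --no-color HEAD | grep -c '^+line')

git commit -q -m "Add line two"                   # commits the staged edit only
between=$(git diff --name-only "HEAD^" HEAD)
echo "lines added vs HEAD before the commit: $added_vs_head"
```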

'git log [path/fileName]'

Shows the history (commits/merges) of your entire local repository or just for one file. The most recent commits will be at the top with older commits following. You can limit the output with: -# (lists only the # most recent commits, e.g. -5), --after=<date>, --before=<date>, --author=<pattern>, and --no-merges (don't list merges or pulls). The --stat option will list what files were changed in each commit.
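
A short sketch of a few of these options in a throwaway repository. The repository, file, messages and the second author ("Other Dev") are invented for the example; '--format=%s' is added only to print one subject line per commit.

```shell
set -e
repo=$(mktemp -d)                  # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first" >> notes.txt
git add notes.txt
git commit -q -m "first"
echo "second" >> notes.txt
git add notes.txt
git commit -q -m "second" --author="Other Dev <other@example.com>"
echo "third" >> notes.txt
git add notes.txt
git commit -q -m "third"

newest=$(git log -1 --format=%s)                      # top of the log
by_other=$(git log --author="Other Dev" --format=%s)  # filter by author
last_two=$(git log -2 --format=%s | wc -l)            # only two commits
git log --no-merges --stat -1 notes.txt               # files changed in the newest commit
```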

'git blame fileName'

Details who last edited each line of a file. Useful option: -L <start>,<end> or -L <start>,+<offset> - prints the blame for only those lines.
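
For example, the sketch below limits the blame output to a line range. The repository, file contents and the second author ("Other Dev") are invented for the example.

```shell
set -e
repo=$(mktemp -d)                  # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'alpha\nbeta\ngamma\n' > notes.txt
git add notes.txt
git commit -q -m "Add three lines"

printf 'alpha\nBETA rewritten\ngamma\n' > notes.txt
git add notes.txt
git commit -q -m "Rewrite beta" --author="Other Dev <other@example.com>"

line_two=$(git blame -L 2,2 notes.txt)            # line 2 only
two_lines=$(git blame -L 2,+2 notes.txt | wc -l)  # 2 lines starting at line 2
echo "$line_two"
```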

'git status'

Gives a report on the status of tracked and untracked files. Will also show how far ahead of the central repository you are since your last 'git fetch' or 'git pull'.

Keeping up with the central repository

'git pull'

Pulls in commits from the central repository made by others. 'git pull' (and 'git merge') will fail if:

a) you have ANY staged files. --> To get around this either commit your changes or un-stage your changes (see 'git reset').

b) you have local uncommitted changes in your working directory that overlap with files that git pull/git merge may need to update. --> To get around this use 'git stash' (see below).

'git stash'

Stashes away any changes in your staging area and working directory. Very useful if you are working on something and want to pull in the most recent changes. You can use it to resolve situation "b" above like so:

git pull
...
file foobar not up to date, cannot merge.
git stash
git pull
git stash pop

Note that you have to do a 'git stash pop' to get your half-baked changes back into your working directory and index. You may need to resolve conflicts if you pulled in someone else's changes to the same file you were working on. You should not use 'git stash' to store changes long-term; instead use a separate branch.
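
The stash sequence above can be tried safely with a local stand-in for the central repository. Everything in this sketch is hypothetical: the directories seed, central.git, mine and theirs, the contents of foobar, and the identity set through the GIT_AUTHOR_* and GIT_COMMITTER_* environment variables. The two edits touch different lines, so 'git stash pop' needs no conflict resolution here.

```shell
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)

# Stand-in for the central repository, seeded with one commit on master.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
printf 'one\ntwo\nthree\nfour\nfive\n' > foobar
git add foobar
git commit -q -m "Initial foobar"
git clone -q --bare "$work/seed" "$work/central.git"
git clone -q "$work/central.git" "$work/mine"
git clone -q "$work/central.git" "$work/theirs"

# A colleague changes the last line of foobar and pushes.
cd "$work/theirs"
printf 'one\ntwo\nthree\nfour\nFIVE (colleague edit)\n' > foobar
git commit -q -m "Colleague edit" foobar
git push -q origin master

# Meanwhile I have an uncommitted edit to the first line of the same file.
cd "$work/mine"
printf 'ONE (my half-baked edit)\ntwo\nthree\nfour\nfive\n' > foobar
if git pull -q 2>/dev/null; then first_pull=ok; else first_pull=refused; fi

git stash -q          # set the edit aside; foobar matches HEAD again
git pull -q           # now the colleague's commit comes in
git stash pop -q      # re-apply my edit on top of it
echo "first pull: $first_pull"
cat foobar
```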

Resolving conflicts: It is very similar to CVS. In order to resolve a conflict from doing a git pull or git merge, you will have to edit the file in your working directory, then 'git add' and 'git commit' it. You can use 'git diff' to see the changes that need to be resolved. Git will not let you commit until you have resolved your conflicts.

'git fetch'

Downloads the most recent changes from the central repository into your local repository (where they appear as origin/master), without merging those changes into your own branches or touching your working directory. It is useful if you want to see the changes others have been making and how they will affect your local repository before doing a 'git merge'.

'git merge'

Used to merge your local repository with the central repository after doing a 'git fetch'. Note that you may run into the same problems as 'git pull' if you have staged changes or git thinks there will be a conflict with your working directory (see above). 'git merge' is also used to merge local branches in your repository.

'git push'

Pushes your changes out to the central repository. Will abort if you haven't pulled in everyone's most recent changes.
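
A sketch of the fetch, merge and push cycle, again using a local stand-in for the central repository. The directories seed, central.git, mine and theirs, the file names and the commit messages are all hypothetical. The '--no-edit' option is added so that the merge does not stop to open an editor for the merge message.

```shell
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)

# Stand-in for the central repository, seeded with one commit on master.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "shared text" > foobar
git add foobar
git commit -q -m "Initial foobar"
git clone -q --bare "$work/seed" "$work/central.git"
git clone -q "$work/central.git" "$work/mine"
git clone -q "$work/central.git" "$work/theirs"

# A colleague pushes a commit.
cd "$work/theirs"
echo "colleague text" > other.txt
git add other.txt
git commit -q -m "Colleague edit"
git push -q origin master

# I commit locally without having pulled, so my push is refused.
cd "$work/mine"
echo "my text" > mine.txt
git add mine.txt
git commit -q -m "My edit"
if git push -q origin master 2>/dev/null; then first_push=accepted; else first_push=rejected; fi

git fetch -q                                         # download only, no merge
incoming=$(git log --format=%s HEAD..origin/master)  # what a merge would bring in
git merge -q --no-edit origin/master                 # creates a merge commit
second_parent=$(git log -1 --format=%s "HEAD^2")     # the commit that was merged in
git push -q origin master                            # succeeds now
central_tip=$(git --git-dir="$work/central.git" rev-parse master)
echo "first push: $first_push; incoming: $incoming; second parent: $second_parent"
```

Because the merge commit has two parents, "HEAD^2" names the colleague's commit, which is the second-parent notation described near the top of this page.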

Git Options

When turning on the git color options, you may need the -R option in your LESS environment variable. For example:

export LESS="--no-init --quit-if-one-screen --QUIT-AT-EOF -R"

This will properly colorize the lines in the git pager. To turn on git color options for everything:

git config color.ui true

Or, individually:

git config color.branch auto
git config color.diff auto
git config color.interactive auto
git config color.status auto

Useful resources

Git Community Book

Git User Manual

Git Man Pages

Git Cheat Sheet