File too large checked in: Difference between revisions
Revision as of 00:28, 25 April 2021
FILE TOO LARGE CHECKED IN and HOW TO FIX IT
When I do git push I see this error:
Exceeds file size limit 2200000.
WHY BIG FILES ARE NOT ALLOWED
The kent repo has a limit (currently 2.2 MB) on the size of files being checked in. The restriction is implemented as a hook in the central shared repo that developers push to. We already did not want large files to be checked in, and during the transition from CVS to git, many huge test files were removed. Also, GitHub has size restrictions which have to be honored. Without this size restriction, people would find the kent repo excessively bloated and hard to use. This is a repository of source code text, which is small.
WHY PEOPLE CHECK IN BIG FILES
Because developers are encouraged to make a standard tests subdirectory for their kent utilities, testing files get checked in, and unless care is exercised, it is very easy for programmers who deal with giant genomics files to accidentally check them in. Also, sometimes people want to check in PDF documents and some reasonably sized JPG or PNG images. Please use JPG for camera images, since it gives better compression and a smaller size. PNG uses lossless compression, which is bigger, and is good for diagrams and other non-photographic things with a small number of colors. And sometimes people just make a mistake, or forget about the limit.
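A quick look at the working tree before git add can catch most of these accidents. The following is only a minimal sketch and not part of the kent tools: the function name list_big_files is hypothetical, and the 2200000-byte limit is assumed from the error message quoted above.

```shell
# Assumed limit in bytes, copied from the push error message on this page.
LIMIT_BYTES=2200000

# list_big_files [DIR]: print every regular file under DIR (default: current
# directory) that is larger than LIMIT_BYTES, skipping the .git directory.
list_big_files() {
    find "${1:-.}" -name .git -prune -o -type f -size +"${LIMIT_BYTES}"c -print
}
```

Running list_big_files in the directory you are about to add should print nothing if every file is within the assumed limit.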
WHY DO I FIND OUT ABOUT IT SO LATE?
When you clone a repo, hooks are not cloned, so there is no easy way to give them to all users. There are some incomplete and limited ways to add a hook that would detect the problem during git commit. We are looking into ways to improve this so you could get an earlier warning about a file being too large.
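One of the limited local approaches is a pre-commit hook installed by hand in your own clone. The following is a sketch under stated assumptions, not the hook used by the kent repo: the function names are hypothetical, the limit is assumed to match the server-side value, and paths with unusual characters (which git may print quoted) are not handled.

```shell
#!/bin/sh
# Hypothetical local check; the real server-side hook may compare differently.
LIMIT_BYTES=2200000

# over_limit: read "SIZE<TAB>PATH" lines, print those with SIZE > LIMIT_BYTES.
over_limit() {
    awk -F '\t' -v limit="$LIMIT_BYTES" '$1 + 0 > limit + 0 { print }'
}

# staged_sizes: print "SIZE<TAB>PATH" for each file added or modified in the index.
staged_sizes() {
    git diff --cached --name-only --diff-filter=AM |
    while IFS= read -r path; do
        printf '%s\t%s\n' "$(git cat-file -s ":$path")" "$path"
    done
}

# check_staged: return non-zero if any staged file is over the limit.
check_staged() {
    big=$(staged_sizes | over_limit)
    [ -z "$big" ] && return 0
    printf 'Staged files over %s bytes:\n%s\n' "$LIMIT_BYTES" "$big" >&2
    return 1
}
```

To try it, save the text as .git/hooks/pre-commit, add a final line that calls check_staged, and make the file executable. A non-zero exit from that file makes git commit stop before the commit is created.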
WHY IS IT SO HARD TO FIX?
Since git is a powerful source code control system, you might hope that it would easily handle this situation. However, because git builds immutable trees, which are a good thing for so many purposes, removing something or changing it requires changing the git history of the branch. We must avoid pushing large files to the shared repo main branch. Once it goes there, hundreds of users all over the world will pick it up automatically, and there is no way to go around fixing up all of those copies to remove large files from their history.
However, git can indeed fix the history of a branch in your local git tree which has not been pushed. And that is what we are going to do here.
FIXING YOUR LOCAL BRANCH WITH LARGE FILE CHECKED IN
To fix your branch, you will have to rewrite its history with some form of git rebase; otherwise it can never be fixed.
A common case is where a user notices the mistakenly added large file and uses git rm to remove it, or uses git add to replace it with a smaller version of the file, such as a test file, JPG image, or PDF, and then runs git commit. The large file then no longer exists on the tip of their branch. However, it does still exist in the history.
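To see whether a large file is still hiding in the history, you can list the sizes of all objects in your unpushed commits. The following is a sketch, assuming origin/master is the upstream branch and 2200000 bytes is the limit; the function names big_blobs and unpushed_big_blobs are hypothetical.

```shell
# Assumed limit in bytes, copied from the push error message on this page.
LIMIT_BYTES=2200000

# big_blobs: read "TYPE SIZE PATH" lines and print "SIZE PATH" for every
# blob whose size is over LIMIT_BYTES.
big_blobs() {
    awk -v limit="$LIMIT_BYTES" \
        '$1 == "blob" && $2 + 0 > limit + 0 { sub(/^[^ ]+ /, ""); print }'
}

# unpushed_big_blobs [UPSTREAM]: scan the objects reachable from HEAD but
# not from UPSTREAM (default: origin/master) for oversized blobs.
unpushed_big_blobs() {
    git rev-list --objects "${1:-origin/master}..HEAD" |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    big_blobs
}
```

If unpushed_big_blobs prints anything, the push is likely to be rejected even though the file is gone from the tip of the branch.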
As usual with all of this stuff, if you have uncommitted changes, commit them or use stash to clean up your repo for action.
git add # this is often a good choice.
git commit
or
git stash # only if needed
SQUASH?
If you were going to squash your development branch anyway, you can just merge --squash it into the master branch. A squash merge records only the net difference between the branches, so a large file that no longer exists on the tip of your branch is left out.
git checkout master
As usual, you may have to resolve conflicts during any merge.
git merge --squash myDevBranch
git commit # merge --squash stages the changes but does not commit them
If your changes were on master, and not on a dev branch, turn your master branch into a dev branch, and then create a new master branch, and squash that onto it. Only do this if it makes sense.
git fetch # update origin/master
git branch -m master tempMaster
git branch master origin/master
Look at .git/config to fix master branch tracking if needed.
git checkout master
git merge --squash tempMaster
git commit # merge --squash stages the changes but does not commit them
git push
# after
The benefit is that it is simple and you are done.
The disadvantage is that you lose your commit history: all those changes become a single commit on the master branch. This is just right for many users.
GIT CHERRY-PICK?
NOT RECOMMENDED
If you only have a handful of commits, and you know which ones they are, you can try this method. It is tedious. You would have to use git log to find which specific commits need to be saved. You might have to turn the master branch into a dev or temp branch as above, create a new master, and then pick specific commits from the temp branch onto master. You may still need to do a git rebase -i if you cannot make the large file go away simply by skipping a no-longer-needed commit or two.
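The selection step can be sketched as below. This is only an illustration, not a recommended procedure: drop_commits is a hypothetical helper, tempMaster is the renamed branch from the squash section, and the hashes and file path in the comments are placeholders you would have to find yourself.

```shell
# drop_commits SKIP...: read commit hashes on stdin (one per line) and print
# every hash that is not among the arguments, keeping the input order.
drop_commits() {
    while IFS= read -r hash; do
        keep=1
        for skip in "$@"; do
            [ "$hash" = "$skip" ] && keep=0
        done
        [ "$keep" -eq 1 ] && printf '%s\n' "$hash"
    done
    return 0
}

# Hypothetical usage on the new master branch (full hashes, oldest first):
#   git log --format=%H origin/master..tempMaster -- path/to/large.file
#   git rev-list --reverse origin/master..tempMaster |
#       drop_commits <hash-to-skip> | xargs git cherry-pick
```

The first command in the comment lists the commits that touched the large file; those are the candidates to skip. If a later commit depends on a skipped one, git cherry-pick will stop with a conflict that has to be resolved by hand.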
GIT REBASE