Sometimes a teammate commits unwanted files to a Git repository, and we later delete them from the repo. The files still live in the Git history, though, so every clone of the repository fetches them, which costs time, bandwidth and disk space.
Let’s look at a way to clean deleted files out of the repository’s history.
“Make sure you take a backup copy of your local repository in case anything goes wrong.”
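One way to take that backup, as a rough sketch: a mirror clone copies every branch, tag and other ref into a separate bare repository. The function name backup_repo and the paths in the usage line are placeholders, not part of the original instructions. A mirror clone does not capture uncommitted work, so a plain copy of the whole directory is the alternative if that matters to you.

```shell
# backup_repo is a hypothetical helper name; both arguments are placeholder paths.
# Makes a bare mirror clone of SRC at DEST, carrying over all refs.
# --no-hardlinks copies the object files instead of hard-linking them,
# so the backup is fully independent of the original.
backup_repo() {
    src=$1
    dest=$2
    git clone --quiet --mirror --no-hardlinks "$src" "$dest"
}

# Usage (paths are examples only):
#   backup_repo ~/work/myrepo ~/backups/myrepo.git
```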
Use the git filter-branch command to remove a file from every commit:
git filter-branch --prune-empty -d /dev/shm/scratch \
  --index-filter "git rm --cached -f --ignore-unmatch filename" \
  --tag-name-filter cat -- --all
git filter-branch options used:
--prune-empty removes commits that become empty (i.e., do not change the tree) as a result of the filter operation. In the typical case, this option produces a cleaner history.
-d names a temporary directory that does not yet exist to use for building the filtered history. If you are running on a modern Linux distribution, specifying a tree in /dev/shm will result in faster execution.
--index-filter is the main event and runs against the index at each step in the history. Say the unwanted file is oops.iso (substitute it for filename in the command). You want to remove oops.iso wherever it is found, but it isn’t present in all commits. The command git rm --cached -f --ignore-unmatch oops.iso deletes the file when it is present and does not fail otherwise.
--tag-name-filter describes how to rewrite tag names. A filter of cat is the identity operation. Your repository may not have any tags, but I included this option for full generality.
-- specifies the end of options to git filter-branch.
--all following -- is shorthand for all refs. Your repository may have only one ref (master), but I included this option for full generality.
You can also remove a whole directory:
git filter-branch --prune-empty -d /dev/shm/scratch \
  --index-filter "git rm --cached -rf --ignore-unmatch dirname" \
  --tag-name-filter cat -- --all
You can then check the log to verify that commits which included the file have been rewritten and that commits containing only that file have been removed.
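As a sketch of one such check: git log limited to a path lists the commits that touch it, so empty output means the path is gone from the rewritten history. The helper name commits_touching is made up for illustration. It deliberately uses --branches --tags rather than --all, on the assumption that the backup refs filter-branch keeps under refs/original/ have not been deleted yet; --all would include those refs and still show the old commits.

```shell
# commits_touching is a hypothetical helper name.
# Prints one line per commit, reachable from any branch or tag, that touches PATH.
# Empty output means the path no longer appears in the rewritten history.
commits_touching() {
    git log --oneline --branches --tags -- "$1"
}

# Usage: expect no output after the rewrite
#   commits_touching filename
```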
Shrink the repository
We used git filter-branch to get rid of files from commits. You might expect the resulting repository to be smaller than the original, but it takes a few more steps to actually shrink it, because Git tries hard not to lose your objects until you tell it to.
- Remove the original refs backed up by git-filter-branch (do this for all branches):
git update-ref -d refs/original/refs/heads/master
- Expire all reflogs with:
git reflog expire --expire=now --all
- Garbage collect all unreferenced objects with:
git gc --prune=now
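The three steps above can be collected into one small function, sketched here under a few assumptions: shrink_repo is a made-up name, it must be run from inside the rewritten repository, and instead of naming each branch it loops over whatever refs exist under refs/original/. The git count-objects -vH calls are only there to show the size before and after.

```shell
# shrink_repo is a hypothetical helper name; run it from inside the rewritten repository.
shrink_repo() {
    echo "before:"
    git count-objects -vH

    # 1. Delete every backup ref that filter-branch saved under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done

    # 2. Expire all reflog entries so they no longer keep old commits alive.
    git reflog expire --expire=now --all

    # 3. Repack and drop every object that is now unreachable.
    git gc --quiet --prune=now

    echo "after:"
    git count-objects -vH
}
```

Comparing the size-pack figure in the two reports gives a rough idea of how much space the rewrite reclaimed.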
You are ready to push now.
Push the rewritten history to the remote repository. Because existing commits were rewritten, the push has to be forced, so make sure you have the rights to do so.
git push -f
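With default settings, a bare git push -f updates only the current branch. Since the filter-branch commands above rewrite every branch and tag, a sketch like the following pushes all of them. The name force_push_all is made up, and origin is an assumed remote name.

```shell
# force_push_all is a hypothetical helper; the remote defaults to "origin" (an assumption).
force_push_all() {
    remote=${1:-origin}
    # Overwrite every branch on the remote with the rewritten local branches,
    # then overwrite the tags, which were rewritten too (--tag-name-filter cat).
    git push --force --all "$remote" &&
    git push --force --tags "$remote"
}

# Usage:
#   force_push_all            # pushes to origin
#   force_push_all upstream   # pushes to a remote named upstream
```

Anyone who cloned the repository earlier still holds the old history, so they would need a fresh clone rather than a merge of their old branches.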