Remove deleted files from git repository history

Remove deleted files from git repository | Silent Infotech

Sometimes a teammate commits unwanted files to the git repository, and we later delete them from the repo. The files remain in the git history, however, so every clone of the repository still fetches them, which costs time, bandwidth and disk space.
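To see which deleted paths are still carried in the history, a sketch like the following may help. The function name list_deleted_paths is made up for illustration; the git log options it uses are standard.

```shell
# List every path that was deleted at some point in history, across all refs.
# Run it from inside the repository.
list_deleted_paths() {
    # --diff-filter=D keeps only deletions; the empty --pretty format
    # suppresses commit headers, so only path names (and blank lines) remain.
    git log --all --diff-filter=D --name-only --pretty=format: |
        sed '/^$/d' |
        sort -u
}
```

Calling list_deleted_paths prints one path per line, which gives a list of candidates for removal.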

Let’s look at a way to clean deleted files out of the git repository.

“Make sure you take a backup copy of your local repository in case anything goes wrong.”
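One simple way to take that backup is to copy the whole directory, including .git. This is only a sketch: backup_repo is a hypothetical helper name, and the .backup suffix is an arbitrary choice.

```shell
# backup_repo DIR: copy DIR (working tree plus .git) to DIR.backup.
backup_repo() {
    src=${1%/}                 # drop a trailing slash, if any
    cp -a "$src" "$src.backup" # -a preserves permissions, timestamps and links
}
```

For example, backup_repo myrepo would create myrepo.backup next to the original, where myrepo stands in for your repository directory.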

git filter-branch

Use the git filter-branch command to remove a file from all commits:

git filter-branch --prune-empty -d /dev/shm/scratch \
 --index-filter "git rm --cached -f --ignore-unmatch filename" \
 --tag-name-filter cat -- --all

git filter-branch options used:

  • --prune-empty removes commits that become empty (i.e., do not change the tree) as a result of the filter operation. In the typical case, this option produces a cleaner history.
  • -d names a temporary directory, which must not already exist, to use for building the filtered history. On a modern Linux distribution, pointing it at a path under /dev/shm (which is held in memory) results in faster execution.
  • --index-filter is the main event and runs against the index at each step in the history. You want to remove filename wherever it is found, but it isn’t present in all commits. The command git rm --cached -f --ignore-unmatch filename deletes the file when it is present and does not fail otherwise.
  • --tag-name-filter describes how to rewrite tag names. A filter of cat is the identity operation. Your repository may not have any tags, but the option is included for full generality.
  • -- specifies the end of options to git filter-branch
  • --all following -- is shorthand for all refs. Your repository may have only one ref (master), but the option is included for full generality.

You can also remove a whole directory:

git filter-branch --prune-empty -d /dev/shm/scratch \
 --index-filter "git rm --cached -rf --ignore-unmatch dirname" \
 --tag-name-filter cat -- --all
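The two commands above can be folded into one small helper. This is a sketch under a few assumptions: purge_path is a hypothetical name, the path contains no single quotes, and the scratch directory is created with mktemp rather than under /dev/shm so that it also works where /dev/shm is unavailable. FILTER_BRANCH_SQUELCH_WARNING is the environment variable that Git 2.24 and later read to skip the advisory warning and its pause.

```shell
# purge_path PATH: remove PATH (file or directory) from every commit on every ref.
# Assumes PATH contains no single quotes and that the current directory is the
# top level of the repository.
purge_path() {
    # -d needs a directory that does not exist yet, so create a parent
    # with mktemp and name a child inside it.
    scratch_parent=$(mktemp -d)
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty \
        -d "$scratch_parent/scratch" \
        --index-filter "git rm --cached -rf --ignore-unmatch -- '$1'" \
        --tag-name-filter cat -- --all
    status=$?
    rm -rf "$scratch_parent"
    return $status
}
```

Using -rf in the index filter lets the same helper handle both a single file and a directory, for example purge_path filename or purge_path dirname.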

You can verify that commits which included the file have been modified, and that commits which contained only that file have been removed from the log. Check using gitk or git log.
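The same check can be scripted. In this sketch, path_in_history is a hypothetical helper; it looks only at branches and tags because git filter-branch keeps backups of the old history under refs/original/, and git log --all would still find the file there.

```shell
# path_in_history PATH: succeed if any commit reachable from a branch or tag
# still touches PATH, fail otherwise.
path_in_history() {
    # --branches and --tags are used instead of --all so that the
    # refs/original/ backups written by git filter-branch are not counted.
    [ -n "$(git log --oneline --branches --tags -- "$1")" ]
}
```

A typical call would be path_in_history filename && echo "filename is still in history".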

Shrink the repository

We used git-filter-branch to get rid of files from commits. People expect the resulting repository to be smaller than the original, but you need a few more steps to actually make it smaller because Git tries hard not to lose your objects until you tell it to.

  • Remove the original refs backed up by git-filter-branch (do this for all branches):
    git update-ref -d refs/original/refs/heads/master
  • Expire all reflogs with:
    git reflog expire --expire=now --all
  • Garbage collect all unreferenced objects with:
    git gc --prune=now
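The three steps above can be combined so that every backup ref is removed, not only the one for master. This is a sketch; shrink_repo is a hypothetical name, while the git subcommands are the ones listed above plus git for-each-ref to enumerate the backups.

```shell
# shrink_repo: drop filter-branch backups, expire reflogs, and garbage collect.
shrink_repo() {
    # Delete every ref that git filter-branch saved under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done
    # Forget old positions of HEAD and branches so nothing keeps the objects alive.
    git reflog expire --expire=now --all
    # Remove all objects that are no longer reachable.
    git gc --prune=now
}
```

Running git count-objects -vH before and after shrink_repo shows how much disk space the object store uses, which makes the effect visible.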

You are ready to push now.

git push

Push the rewritten history to the remote git repository. Because the commits have been rewritten, the push must be forced, so make sure you have sufficient rights on the remote to do so.

git push -f
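Since the filter was run with --all and --tag-name-filter, other branches and tags may have been rewritten as well. A sketch for pushing all of them follows; force_push_all is a hypothetical helper, and the remote name passed to it (for example origin) is an assumption about your setup.

```shell
# force_push_all REMOTE: force-push every local branch, then every tag, to REMOTE.
force_push_all() {
    git push --force --all "$1" &&
    git push --force --tags "$1"
}
```

For example, force_push_all origin. Anyone who cloned the repository before the rewrite will need a fresh clone or a reset onto the new history.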
