Preface
When using Git, we may accidentally commit sensitive information (keys, personal data) or useless files to a remote repository. Deleting the sensitive information from the file and committing again is not enough: the current files no longer contain it, but it can still be seen in the commit history.
When we need to remove this sensitive information from the commit history without losing the history of the whole repository, we can use the official git-filter-branch tool, but it is cumbersome and slow.
The BFG Repo-Cleaner tool is recommended here. It is written in Scala and built specifically for removing data from Git commit history. It is an alternative to git-filter-branch, and its official introduction says it is 10 to 720 times faster than git-filter-branch.
Using BFG Repo-Cleaner
The workflow from the official website:
git clone --mirror git://example.com/some-big-repo.git
java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git
cd some-big-repo.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push
Details below.
1. Installation
- It requires a local Java environment. Installing Java is not covered in this article.
- Download bfg.jar. The download link provided here is for version 1.14.0; you can also download it from the official website yourself.
2. Basic usage
my-repo.git is the local copy of your repository, cloned with --mirror. The commands below are run from the directory that contains bfg.jar and my-repo.git.
The my-repo.git.bfg-report directory records the data that was modified.
- The following command removes all files larger than 100M from the commit history.
java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git
- Delete specified files
java -jar bfg.jar --delete-files id_dsa my-repo.git # delete the file named id_dsa
java -jar bfg.jar --delete-files 'id_{dsa,rsa}' my-repo.git # delete files named `id_dsa` or `id_rsa`; quoted so the shell does not expand the braces
- Delete a directory
java -jar bfg.jar --delete-folders pwd my-repo.git # delete the pwd directory
- Remove sensitive information
java -jar bfg.jar --replace-text replace_pwd.txt my-repo.git
replace_pwd.txt defines the text to be removed, one rule per line. The full syntax is left for you to study; some examples:
PASSWORD1 # by default, PASSWORD1 is replaced with ***REMOVED***
PASSWORD2==>examplePass # replace PASSWORD2 with examplePass
PASSWORD3==> # replace PASSWORD3 with an empty string
regex:password=\w+==>password= # regex replacement that drops the actual value: password=xxx becomes password=
regex:\r(\n)==>$1 # replace Windows line endings with Unix line endings
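To get a feel for what these rules do before rewriting any history, the substitutions can be approximated on sample text. This is only a minimal sketch: it assumes a sed that supports -E, the sample strings are made up, and sed merely imitates the rule semantics (BFG itself uses Java regular expressions).

```shell
# Approximate three of the rules above with sed on made-up sample lines.
# sed -E is only a rough stand-in for BFG's own matching.

# PASSWORD2==>examplePass  (literal replacement)
literal=$(printf 'token=PASSWORD2\n' | sed 's/PASSWORD2/examplePass/g')

# PASSWORD3==>  (replace with the empty string)
emptied=$(printf 'token=PASSWORD3\n' | sed 's/PASSWORD3//g')

# regex:password=\w+==>password=  (\w written as [A-Za-z0-9_] for portability)
masked=$(printf 'password=hunter2\n' | sed -E 's/password=[A-Za-z0-9_]+/password=/g')

printf '%s\n%s\n%s\n' "$literal" "$emptied" "$masked"
```

Trying a rule this way on a copy of the affected line is a cheap check that the pattern matches what you expect before running bfg on the mirror.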
3. Example
bfg script
# First clone the repository with `--mirror`; if you are curious about what `git clone --mirror` does, you can look it up.
git clone --mirror xxx/my-repo.git
# Replace Password1:xxx with an empty string
java -jar bfg.jar --replace-text "replace_pwd.txt" --no-blob-protection my-repo.git
# Delete the pwd.txt file
java -jar bfg.jar --delete-files "pwd.txt" --no-blob-protection my-repo.git
cd my-repo.git
# Clean up the stale data
git reflog expire --expire=now --all && git gc --prune=now --aggressive
# Push to the remote
git push
Contents of replace_pwd.txt:
regex:Password1:[\s\S]+==>
To test this, I simulated accidentally pushing sensitive information to a remote repository.
- The add commit adds the files test.txt and pwd.txt.
- The delete commit deletes pwd.txt and removes the value of Password1 from test.txt.
- After running the bfg script, check the two commits again: all sensitive information has been removed.
- add: the assignment of Password1 is gone from the commit, and the change to pwd.txt can no longer be seen.
- delete: the commit no longer contains any changes.
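Besides inspecting the two commits by hand, the history can be searched for the leaked value directly. The sketch below rebuilds a comparable situation in a throwaway repository (the directory, the identity and the value "secret" are all made up for illustration) and uses git log -S, which lists commits whose diff changes the number of occurrences of a string.

```shell
# Build a throwaway repository that mirrors the test above (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "demo"

# "add" commit: the secret goes in.
printf 'Password1:secret\n' > test.txt
git add test.txt
git commit -q -m "add"

# "delete" commit: the value is removed from the current file.
printf 'Password1:\n' > test.txt
git commit -q -am "delete"

# Count the commits on any ref whose diff adds or removes the string "secret".
# The file no longer contains it, but the history still does.
hits=$(git log --all --oneline -S'secret' | wc -l | tr -d ' ')
echo "commits touching the secret: $hits"
```

Run inside my-repo.git after the bfg script, the same git log command with the real value should list nothing if the cleanup worked.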
4. Problems encountered during use
- You need to clone the repository with --mirror first. All operations are performed on the cloned repository.
- bfg will not delete files that are still present in the latest commit of the git repository, even if --delete-files is used. The correct approach is to delete the file manually, commit and push that to the remote repository, and let bfg delete only the related data in the commit history.
- At the end, git push always reported insufficient permissions (on a repository I created myself), so I added the --no-blob-protection parameter. (Strictly, that flag only lets bfg rewrite content in the latest commit, which it protects by default; a rejected push is more likely caused by branch or tag protection on the server.) The permission controls here are still not entirely clear to me. If the push is still rejected, you can first lift the protection of the branch / tag, and restore it after the update is complete.
The example here is a branch that does not allow force pushes: with the switch in the picture turned off, pushing again reports an error.
- The --replace-text parameter needs a file of replacement rules, one rule per line. The replacement applies to the data of the whole repository, and the whole repository is scanned.
- What should be done if there are dev and main branches, both of which contain commit history that needs to be stripped of sensitive information? Because the clone command uses the --mirror flag, this push updates all references on the remote server. That is, the commit history of the dev and main branches, and of all tags, is cleaned in one go.
- Commits themselves are not deleted; only the data changes recorded in them are removed.
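To see why a single push from the mirror covers every branch and tag, it helps to list the references a mirror clone actually holds. The following is a minimal sketch on a throwaway repository; the paths, the identity and the tag name v1.0 are made up for illustration.

```shell
# Create a small source repository with two branches and one tag (hypothetical names).
src=$(mktemp -d)
cd "$src"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.email "demo@example.com"
git config user.name "demo"
printf 'v1\n' > test.txt
git add test.txt
git commit -q -m "add"
git branch dev
git tag v1.0

# A mirror clone copies every ref, not just the default branch.
mirror="$(mktemp -d)/my-repo.git"
git clone -q --mirror "$src" "$mirror"

# These are the refs that a later `git push` from the mirror would update.
refs=$(git -C "$mirror" for-each-ref --format='%(refname)')
echo "$refs"
```

Running git for-each-ref in your real my-repo.git before pushing gives the list of branches and tags that will be overwritten, which is also the list whose server-side protection may need to be lifted temporarily.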
Summary
The main purpose of bfg: when you want to keep the commit history but need to remove from it information you do not want others to see (keys, personal data, etc.), or to remove large files.
Once the environment is set up, it is fairly convenient to use and runs very fast. It is recommended when you run into similar needs.
Finally, I hope nobody ever has to use this tool. Try to build up the relevant knowledge in advance (how to use git, how to keep sensitive information out of commits, etc.) to avoid unnecessary trouble.