Summary: this article takes about 20 minutes to read.
This article is reprinted from the bit WebApp architecture group. It explains some of the files under the mysterious .git directory, how Git actually stores the data behind your content, and how Git branches work.
How Git stores data:
1. The objects folder inside the .git folder stores Git's objects as files. Each object is named by a 40-character hash value,
and there are three kinds:
blob object (file object)
tree object
commit object
2. From the tree object that a commit points to, you can find all the content of that commit.
The git log command lists commit objects, and from a commit object you can get the tree object it records.
Git branches:
A branch is a mutable pointer to the latest commit on that branch.
1. The refs folder inside the .git folder stores the files related to Git branches:
the heads folder stores local branches, and the remotes folder stores remote branches.
2. The HEAD file in the .git folder points to the currently checked-out branch or commit object.
Main text:
When we switch to a different branch, or check out a specific historical commit, how does Git bring back the historical version of a file? In other words:
- Where does Git store the data?
- How are different versions of a file stored?
- How are those versions associated with a specific commit?
Before we start, let's agree on one term to avoid ambiguity. Every commit has a unique 40-character string, which is a hash that Git computes with the SHA-1 algorithm from the content of the commit. This 40-character id is variously called a checksum, a hash value, or a SHA-1. Since SHA-1 is a hashing algorithm, this article consistently calls the string a hash value.
How Git stores data
The answer to how Git stores data lies mainly in the objects directory inside the .git folder. Rather than explain up front what the objects directory is, let's first look at what is in it. Open any Git repository and find the objects folder inside its .git directory (which may be hidden by default). It should look something like this:
It contains many folders with two-character names, and each of those folders holds files whose names are 38-character strings. If you are observant, you may guess that these resemble the hash value that identifies a commit. Use the git cat-file command (for example, git cat-file -p followed by the two-character folder name joined with the 38-character file name) to view the content of one of these files. The result looks like this:
The output here is the content of the CHANGELOG.md file in my project. More precisely, it is the entire content of CHANGELOG.md as of one particular commit. The objects folder is where Git stores its data.
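To make this concrete, here is a minimal sketch of how one of these files could be decoded by hand. It assumes a "loose" object (one zlib-compressed file per object, as described above); objects that Git has packed into pack files are not covered. The function name parse_loose_object and the sample bytes are illustrative only, and the sample is built in memory rather than read from a real repository.

```python
import zlib
from typing import Tuple


def parse_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Decode the raw bytes of one file under .git/objects into (type, content).

    A loose object is zlib-compressed. After decompression it looks like
    b"<type> <size>\\x00<content>".
    """
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match content length")
    return obj_type, body


# Stand-in for bytes read from a real object file (hypothetical content).
sample = zlib.compress(b"blob 12\x00# Changelog\n")

obj_type, content = parse_loose_object(sample)
print(obj_type)          # the object type stored in the header
print(content.decode())  # the file content of that version
```

For a real repository you would pass in the bytes read from a file such as .git/objects/ab/cdef... opened in binary mode.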
On each commit, Git finds the files that have changed, computes a 40-character hash from each file's content with the SHA-1 algorithm, and stores the content in the objects folder under that name. The first two characters of the hash value name the subdirectory, and the remaining 38 characters are the file name. In other words, every time a file changes, a corresponding snapshot holding the full content of that version is saved. To restore a file to a historical version, Git only needs the hash value of that version of the file.
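The naming rule can be sketched in a few lines of Python. One detail the paragraph above glosses over: Git does not hash the raw file content alone, it first prepends a short header of the form "blob <size>" followed by a NUL byte. The helper names blob_hash and object_path are made up for this example.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Return the 40-character hash Git assigns to a file with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def object_path(hash_value: str) -> str:
    """Split a hash into the 2-character folder and 38-character file name."""
    return ".git/objects/" + hash_value[:2] + "/" + hash_value[2:]


h = blob_hash(b"")     # an empty file
print(h)               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(object_path(h))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the name depends only on the content, two files with identical content share one object, and any change to a file produces a new object with a different name.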
But how does Git know which file-version hash belongs to a given commit? In other words, how does Git associate the different versions of files with commits?
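In fact, the objects folder mainly holds three kinds of information: besides the file content described above, there is file path information and commit information. They exist as blob objects (file objects), tree objects, and commit objects respectively. Every object is a file under the objects directory. Just as with file content, Git computes a hash of the object's content with the SHA-1 algorithm and uses the resulting 40-character string to name the file so that it can be found later. Next, let's look at what tree objects and commit objects contain.
When you commit, Git saves your current directory structure. The corresponding tree object looks like this:
It contains each file's name and the hash value of its blob object, as well as each subfolder's name and the hash value of the corresponding tree object. In other words, once you have the hash value of the tree object for a commit's root directory, you can find the name and blob hash of every file in that commit, and therefore the content of each file's historical version.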
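As a rough illustration of that structure, the sketch below builds and parses the binary content of a tree object. Each entry is stored as a mode, a space, the name, a NUL byte, and then the 20 raw bytes of the hash. The function names and the sample entries are hypothetical, and build_tree assumes the caller already passes entries in the order Git expects (it does no sorting).

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (mode, name, 40-character hex hash)


def build_tree(entries: List[Entry]) -> bytes:
    """Serialize entries into tree object content (entries must be pre-sorted)."""
    body = b""
    for mode, name, hex_hash in entries:
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_hash)
    return body


def parse_tree(body: bytes) -> List[Entry]:
    """Walk the tree content and recover (mode, name, hash) for each entry."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)     # end of the mode
        nul = body.index(b"\x00", space)  # end of the name
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        hex_hash = body[nul + 1:nul + 21].hex()  # 20 raw bytes -> 40 hex chars
        entries.append((mode, name, hex_hash))
        pos = nul + 21
    return entries


def tree_hash(body: bytes) -> str:
    """Hash tree content the same way blobs are hashed, with a 'tree' header."""
    header = b"tree " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Illustrative entries: one file (mode 100644) and one subfolder (mode 40000).
sample_entries = [
    ("100644", "CHANGELOG.md", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"),
    ("40000", "src", "4b825dc642cb6eb9a060e54bf8d69288fbee4904"),
]
for mode, name, hex_hash in parse_tree(build_tree(sample_entries)):
    print(mode, hex_hash, name)
```

Following a subfolder entry means loading the tree object with that hash and parsing it the same way, which is how a single root tree hash leads to every file in the commit.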
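So how do I find this tree object?
Through the object we know best, the commit object. The git log command shows the hash value of each commit, and the structure of a commit object looks like this:
tree points to the tree object of this commit. parent is the commit object of the previous commit. The rest are the author, the committer, the timestamps, and the commit message.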
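The fields just described can be pulled out of a commit object with a small parser. This is a simplified sketch: it assumes the plain layout of header lines, a blank line, and then the message, and it does not handle multi-line headers such as signatures. The function name parse_commit, the hashes, and the author details in the sample are all made up.

```python
from typing import Dict, List, Tuple


def parse_commit(text: str) -> Tuple[Dict[str, List[str]], str]:
    """Split commit object text into its header fields and its message.

    Values are lists because a merge commit has more than one parent line.
    """
    header_text, _, message = text.partition("\n\n")
    headers: Dict[str, List[str]] = {}
    for line in header_text.split("\n"):
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)
    return headers, message


# Hypothetical commit content with placeholder hashes.
sample_commit = (
    "tree 89abcdef0123456789abcdef0123456789abcdef\n"
    "parent 0123456789abcdef0123456789abcdef01234567\n"
    "author A U Thor <author@example.com> 1700000000 +0800\n"
    "committer A U Thor <author@example.com> 1700000000 +0800\n"
    "\n"
    "Update CHANGELOG.md\n"
)

headers, message = parse_commit(sample_commit)
print(headers["tree"][0])  # hash of the root tree object for this commit
print(headers["parent"])   # hashes of the previous commit(s)
print(message)
```

From here the chain is commit hash, then tree hash, then blob hash, then file content.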
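Now I can find out the content of any file as of any commit. This also gives us a rough picture of how Git stores historical versions of files and how it links those versions to commits.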
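Git branches
We now know that with the hash value of a commit object we can get the snapshot for that commit. But what if we want the file content of a particular branch?
A branch is just a mutable pointer to the latest commit object on that branch.
The refs folder inside .git stores the hash value of the latest commit object for every branch. The directory structure looks like this: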
cat master
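As you can see, the file holds the hash value of the latest commit object. If you make a new commit now, the content of this file changes to the hash value of the new commit object.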
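The lookup that cat master performs, and the way the HEAD file leads to the current branch, can be sketched as follows. This assumes each branch is stored as its own file under refs/heads; Git can also keep refs in a single packed-refs file, which this sketch ignores. The function names, the throwaway directory, and the commit hash are hypothetical stand-ins for a real .git folder.

```python
import os
import tempfile


def read_branch(git_dir: str, branch: str) -> str:
    """Return the commit hash stored in refs/heads/<branch>."""
    with open(os.path.join(git_dir, "refs", "heads", branch)) as f:
        return f.read().strip()


def resolve_head(git_dir: str) -> str:
    """Follow HEAD to a commit hash.

    HEAD holds either "ref: refs/heads/<branch>" or, when a commit is
    checked out directly, the commit hash itself.
    """
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]
        with open(os.path.join(git_dir, *ref.split("/"))) as f:
            return f.read().strip()
    return head


def make_fake_git_dir(commit_hash: str) -> str:
    """Create a throwaway directory that mimics the layout described above."""
    git_dir = tempfile.mkdtemp()
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write(commit_hash + "\n")
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")
    return git_dir


demo_dir = make_fake_git_dir("0123456789abcdef0123456789abcdef01234567")
print(read_branch(demo_dir, "master"))
print(resolve_head(demo_dir))
```

Creating a commit on the branch amounts to writing the new commit's hash into that one small file, which is why branches are so cheap in Git.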
Finally, one diagram sums up the relationship between branches and commit objects.