This article will help you fully master Git skills and knowledge!

So what kind of system is Git, exactly? The following is important: once you understand Git's ideas and basic working principles, using it becomes much easier. While learning Git, try to set aside what you already know about other version control systems such as CVS, Subversion or Perforce. Although Git's commands look similar to those of other version control systems, Git stores and thinks about information very differently, and understanding these differences will help you avoid confusion when using it.


How Git initializes a repository

What exactly happens when you run git init?

Run the following commands in two terminal panes. The left pane creates the repository; the right pane re-lists the directory every second and highlights changes, so we can watch the repository that Git creates for version management appear.

# Execute on the left
$ mkdir git-demo
$ cd git-demo && git init
$ rm -rf .git/hooks/*.sample

# Execute on the right
$ watch -n 1 -d find .


Here is the structure of the generated .git directory:

➜ tree .git
.git
├── HEAD
├── config
├── description
├── hooks
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

.git/config - the local configuration file of the current code repository

  • Git uses a local configuration file for the current repository (.git/config) and a global configuration file (~/.gitconfig); settings in the local file take precedence.

  • Running the following commands (without the --global option) records the user settings in the configuration file of the local repository:

$ git config user.name "demo"
$ git config user.email "[email protected]"

➜ cat .git/config
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
    ignorecase = true
    precomposeunicode = true

[user]
    name = demo
    email = [email protected]

.git/objects - where the repository stores its content as objects of three types:

  • blob type

  • commit type

  • tree type

# Both directories are empty
➜ ll .git/objects
total 0
drwxr-xr-x  2 escape  staff    64B Nov 23 20:39 info
drwxr-xr-x  2 escape  staff    64B Nov 23 20:39 pack

➜ ll .git/objects/info
➜ ll .git/objects/pack

.git/info - exclude patterns and other information for the current repository

➜ cat ./.git/info/exclude
# git ls-files --others --exclude-from=.git/info/exclude
# Lines that start with '#' are comments.
# For a project mostly in C, the following would be a good set of
# exclude patterns (uncomment them if you want to use them):
# *.[oa]
# *~

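.git/hooks - hook scripts for the current repository. By default git init creates the following sample hooks (we deleted them with the rm command earlier, which is why the hooks directory was empty in the tree output):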

./.git/hooks/commit-msg.sample
./.git/hooks/pre-rebase.sample
./.git/hooks/pre-commit.sample
./.git/hooks/applypatch-msg.sample
./.git/hooks/fsmonitor-watchman.sample
./.git/hooks/pre-receive.sample
./.git/hooks/prepare-commit-msg.sample
./.git/hooks/post-update.sample
./.git/hooks/pre-merge-commit.sample
./.git/hooks/pre-applypatch.sample
./.git/hooks/pre-push.sample
./.git/hooks/update.sample

.git/HEAD - points to the branch currently checked out in the repository

➜ cat .git/HEAD
ref: refs/heads/master

.git/refs - the repository's references: branch heads (refs/heads) and tags (refs/tags)

# Both directories are empty
➜ ll .git/refs
total 0
drwxr-xr-x  2 escape  staff    64B Nov 23 20:39 heads
drwxr-xr-x  2 escape  staff    64B Nov 23 20:39 tags

➜ ll .git/refs/heads
➜ ll .git/refs/tags

.git/description - the description of the current repository

➜ cat .git/description
Unnamed repository; edit this file 'description' to name the repository.

What happens after git add

What exactly happens when you run git add?

After running the commands below, a new file appears on the right, but nothing inside the .git directory changes. That is because a new change exists only in the workspace at first, and the workspace is not managed by the .git directory.

Even so, when we run git status, Git recognizes that a new file has been added to the workspace. How does it do that? See the section on the workspace and staging area below for details.

When we run git add to let Git manage the file, a directory and two files appear on the right: the 8d directory, the index file, and the 0e41.. object file inside 8d.

# Execute on the left
$ echo "hello git" > hello.txt
$ git status
$ git add hello.txt

# Execute on the right
$ watch -n 1 -d find .


Let's focus on the generated 8d directory and the file inside it. The names come from the SHA-1 hash that Git computes over the object: SHA-1 turns the content into a fixed-length string of hexadecimal characters, the first two of which become the directory name and the remaining 38 the file name.

# Check the file type of objects
$ git cat-file -t 8d0e41
blob

# Check the file contents of objects
$ git cat-file -p 8d0e41
hello git

# Check the file size of objects
$ git cat-file -s 8d0e41
10

# Put together: <type> <size>\0<content>
blob 10\0hello git

So when git add copies a file from the workspace to the staging area, Git generates a blob object that stores the file's type, size and content, but not its file name.

To verify this, we can write the same content to another file, stage it, and watch the .git directory. No new directory or file appears under objects on the right. This shows that a blob object stores only the file's content: if two files have identical content, Git needs to store only one object.

Why doesn't the object store the file name? Because Git computes the hash only from the object's type, size and content; the file name is not part of the input, so it does not matter what the file is called. Which raises the question: where is the file name stored? See the section on the workspace and staging area below for details.

# Execute on the left
$ echo "hello git" > tmp.txt
$ git add tmp.txt

# Execute on the right
$ watch -n 1 -d find .
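The behaviour above can be modelled with a toy content-addressed store. This Python sketch is only an illustration: ToyObjectStore and blob_id are hypothetical names, not Git APIs. Objects are keyed solely by a hash of their content (using Git's blob header format, covered in the next section), while a separate dictionary plays the role of the index by mapping file names to object IDs.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Git hashes a "blob <size>\0" header followed by the raw content.
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory stand-in for .git/objects plus .git/index."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}  # object id -> content (no file name)
        self.index: dict[str, str] = {}      # file name -> object id

    def add(self, name: str, content: bytes) -> str:
        oid = blob_id(content)
        self.objects.setdefault(oid, content)  # identical content is stored only once
        self.index[name] = oid                 # the name lives only in the index
        return oid


store = ToyObjectStore()
store.add("hello.txt", b"hello git\n")
store.add("tmp.txt", b"hello git\n")
print(len(store.objects), "object(s) for", len(store.index), "file names")
```

Writing the same content under a second name adds an index entry but no new object, which mirrors what we observed in the objects directory.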


Understanding blob objects and SHA1

How Git's blob objects relate to SHA-1, and how the hash is calculated.

A hash algorithm maps input of any length to output of a fixed length; different algorithms produce outputs of different lengths.

Common hash algorithms:

  • MD5 - 128 bits - insecure - file checksums

  • SHA-1 - 160 bits (40 hex characters) - insecure - Git object storage

  • SHA-256 - 256 bits - secure - Docker images

  • SHA-512 - 512 bits - secure
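To make the fixed-length property concrete, here is a small check with Python's standard hashlib module (digest_summary is our own helper name). Whatever the input length, each algorithm's output has the same size, and SHA-1's 160 bits print as 40 hexadecimal characters.

```python
import hashlib

# Inputs of very different lengths: empty, the demo file, and one megabyte.
INPUTS = [b"", b"hello git\n", b"x" * 1_000_000]


def digest_summary(name: str) -> tuple[int, set[int]]:
    """Return the digest size in bits and the set of hex lengths seen for INPUTS."""
    bits = hashlib.new(name).digest_size * 8
    hex_lengths = {len(hashlib.new(name, data).hexdigest()) for data in INPUTS}
    return bits, hex_lengths


for algorithm in ("md5", "sha1", "sha256", "sha512"):
    print(algorithm, *digest_summary(algorithm))
```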

However, if we compute the SHA-1 of the file content above with a standard tool, the result differs from the name we see in the .git directory. Why?

➜ echo "hello git" | shasum
d6a96ae3b442218a91512b9e1c57b9578b487a0b  -

This is because Git does not hash the raw content. It hashes a header followed by the content: <type> <size>\0<content>. "hello git" has only nine characters, yet the size is 10, because echo appends a trailing newline. Using the information from git cat-file, we can assemble exactly the bytes that Git hashes.

➜ ls -lh hello.txt
-rw-r--r-- 1 escape staff 10B Nov 23 21:12 hello.txt

# zsh's echo turns \0 into a NUL byte; in bash, use printf 'blob 10\0hello git\n' instead
➜ echo "blob 10\0hello git" | shasum
8d0e41234f24b6da002d962a26c2495ea16a425f -

# Assembled
blob 10\0hello git
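Git's rule can be written in a few lines of Python with the standard hashlib module. hash_object and loose_object_path are our own helper names rather than Git APIs; for files on disk, git hash-object computes the same ID. The path helper mirrors the 8d/0e41.. layout seen above.

```python
import hashlib


def hash_object(content: bytes, obj_type: str = "blob") -> str:
    """Object id as Git computes it: SHA-1 over '<type> <size>\\0' plus the content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(oid: str) -> str:
    """First two hex characters name the directory; the other 38 name the file."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"


oid = hash_object(b"hello git\n")  # echo appended the newline, hence size 10
print(oid)
print(loose_object_path(oid))
print(hashlib.sha1(b"hello git\n").hexdigest())  # plain SHA-1 of the content, for contrast
```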


If we look at the object file with cat, it appears to be garbled. That is because Git compresses the object (header plus content) with zlib before storing it. Oddly, the compressed object is larger than the original content!

The reason is that compressed data carries some extra information of its own, such as a header and a checksum, and our content is so small that this overhead outweighs any savings. For larger files, the compressed object is usually much smaller than the original.

➜ cat .git/objects/8d/0e41234f24b6da002d962a26c2495ea16a425f
xKOR04`HWH,6A%

➜ ls -lh .git/objects/8d/0e41234f24b6da002d962a26c2495ea16a425f
-r--r--r--  1 escape  staff    26B Nov 23 21:36 .git/objects/8d/0e41234f24b6da002d962a26c2495ea16a425f

➜ file .git/objects/8d/0e41234f24b6da002d962a26c2495ea16a425f
.git/objects/8d/0e41234f24b6da002d962a26c2495ea16a425f: VAX COFF executable not stripped - version 16694

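We can also read the binary object file with a few lines of Python and decompress it ourselves:

import zlib

# Path of the loose object shown above, relative to the repository root
path = '.git/objects/8d/0e41234f24b6da002d962a26c2495ea16a425f'
with open(path, 'rb') as f:
    contents = f.read()
print(zlib.decompress(contents))  # b'blob 10\x00hello git\n'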
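Building on the decompression snippet, this sketch reproduces the size effect described earlier with Python's zlib: for a tiny object the compressed bytes outweigh the content, while for a large, repetitive one they are far smaller. loose_object_bytes is a hypothetical helper, and it uses zlib's default compression level; Git's own level for loose objects is configurable (core.looseCompression), so exact byte counts may differ from the 26 bytes seen above.

```python
import zlib


def loose_object_bytes(content: bytes) -> bytes:
    # Git compresses the "blob <size>\0" header together with the content.
    header = b"blob %d\0" % len(content)
    return zlib.compress(header + content)


small = b"hello git\n"
large = b"hello git\n" * 10_000

for label, data in (("small", small), ("large", large)):
    packed = loose_object_bytes(data)
    print(f"{label}: {len(data)} bytes of content -> {len(packed)} bytes stored")

# Decompressing gives back exactly the header and the content.
assert zlib.decompress(loose_object_bytes(small)) == b"blob 10\0hello git\n"
```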


The workspace and the staging area

Let's look at the workspace and the staging area, and how files are synchronized between them.

In the previous sections we left two questions open: when we run git status, how does Git know that a file is untracked, and where is the file name information stored?

The answer starts with the workspace and the index. Depending on the state of the content, Git divides its "spaces" into three areas: the workspace (working directory), the staging area (also called the index), and the repository.


To go a level deeper: git add generates blob objects, which store a file's type, size and content but not its file name. The file name is recorded in the index file (.git/index), which git add also creates.

If we view the index file directly with cat, most of it is unreadable binary data, although the file names are visible. To inspect the index properly, use the commands Git provides:

# Execute on the left
$ echo "file1" > file1.txt
$ git add file1.txt
$ cat .git/index

$ git ls-files # List the files in the current staging area
$ git ls-files -s # List details (mode, object hash, stage number, path) of the staged files

# Execute on the right
$ watch -n 1 -d tree .git
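To see where the file names live, here is a minimal Python sketch of a parser for the binary index, based on Git's documented index format: a 12-byte header (the signature DIRC, a version number and an entry count), then one entry per file with cached file metadata, the mode, the 20-byte SHA-1 of the blob, a flags field whose low 12 bits hold the path length, and the path itself, NUL-padded to a multiple of 8 bytes. The function name parse_index_v2 is hypothetical, and the sketch assumes a version-2 index (the usual default) with paths shorter than 4095 bytes; it ignores extensions and the trailing checksum.

```python
import struct

# Ten 32-bit fields (ctime and mtime as seconds + nanoseconds, dev, ino, mode,
# uid, gid, size), the 20-byte SHA-1 and 16-bit flags: 62 bytes before the path.
ENTRY_FIXED = struct.Struct(">10I20sH")


def parse_index_v2(data: bytes) -> list[tuple[str, str, str]]:
    """Return (mode, object id, path) for each entry of a version-2 index."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC" or version != 2:
        raise ValueError("this sketch only handles version-2 index files")
    entries = []
    pos = 12
    for _ in range(count):
        fields = ENTRY_FIXED.unpack_from(data, pos)
        mode, sha1, flags = fields[6], fields[10], fields[11]
        name_len = flags & 0x0FFF  # low 12 bits hold the path length
        start = pos + ENTRY_FIXED.size
        path = data[start:start + name_len].decode()
        entries.append((format(mode, "o"), sha1.hex(), path))
        # Each entry is NUL-padded (1 to 8 bytes) to a multiple of 8 bytes.
        pos += (ENTRY_FIXED.size + name_len + 8) & ~7
    return entries
```

On a real repository you could try parse_index_v2(open('.git/index', 'rb').read()); each tuple should correspond to a line of git ls-files -s, minus the stage number.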


When we add a file, it moves from the workspace to the staging area. Other operations can make the workspace and the staging area differ, and git status reports those differences.

After the following operations, the workspace and the staging area are out of sync, which we can confirm with the commands below. Once we add the new file to the staging area with git add, the two are consistent again.

# Execute on the left
$ git status
$ echo "file2" > file2.txt
$ git ls-files -s
$ git status
$ git add file2.txt
$ git ls-files -s
$ git status

# Execute on the right
$ watch -n 1 -d tree .git


If we now modify a file, the workspace and the staging area clearly no longer match, and git status reports the file as modified. How does Git know? It looks up the file name in the index, finds the blob object recorded for it, and compares that with the file's content in the workspace. (In practice Git first checks file metadata cached in the index, such as size and modification time, and only re-reads content when that metadata has changed.)

# Execute on the left
$ git ls-files -s
$ echo "file.txt" > file1.txt
$ git status

# Execute on the right
$ watch -n 1 -d tree .git
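The comparison git status performs between the staging area and the workspace can be sketched in Python, with plain dictionaries standing in for the index (path to blob ID) and for the files on disk. The function names are hypothetical, and the sketch always re-hashes file content, whereas real Git first checks the metadata cached in the index and only re-reads files whose metadata has changed.

```python
import hashlib


def blob_id(content: bytes) -> str:
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def compare_worktree(index: dict[str, str], worktree: dict[str, bytes]) -> dict[str, list[str]]:
    """Classify paths roughly as git status does for workspace vs. staging area."""
    result: dict[str, list[str]] = {"modified": [], "deleted": [], "untracked": []}
    for path, staged_id in sorted(index.items()):
        if path not in worktree:
            result["deleted"].append(path)
        elif blob_id(worktree[path]) != staged_id:  # re-hash and compare
            result["modified"].append(path)
    result["untracked"] = sorted(p for p in worktree if p not in index)
    return result


# Staged versions, then the workspace after `echo "file.txt" > file1.txt`
# plus a hypothetical new file.
index = {"file1.txt": blob_id(b"file1\n"), "file2.txt": blob_id(b"file2\n")}
worktree = {"file1.txt": b"file.txt\n", "file2.txt": b"file2\n", "new.txt": b"new\n"}
print(compare_worktree(index, worktree))
```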


If we now run git add to stage the modified content, the blob object referenced by the file's index entry changes. The objects directory now holds three objects, two of which belong to file1.txt (its old and new content), even though there are only two files. Inspecting the two blob objects with git cat-file shows that their contents differ.

# Execute on the left
$ git ls-files -s
$ git add file1.txt
$ git ls-files -s

# Execute on the right
$ watch -n 1 -d tree .git


How git commit works

What exactly happens when you run git commit?

A commit in a Git repository is a snapshot of all the files in your project, as if you had copied the entire directory, but far more elegant than copy and paste. Git keeps commits lightweight: it does not blindly copy the whole directory on every commit. Files that have not changed keep the same blob objects, so the new snapshot simply refers to the existing ones, and only new or changed content produces new objects. Git also keeps the history of commits, which is why most commits have a parent commit above them.

When we use git add to move changes from the workspace to the staging area, the staging area records a state of the current files: which directories and files exist, and their sizes and contents. But ultimately we need to record that state in the (local) repository, and the command for that is git commit.


So what exactly happens when we run git commit? After committing, two new objects appear in the .git directory, and new files appear under the logs and refs directories. The following commands show the type and content of the new objects.

# Execute on the left
$ git commit -m "1st commit"

$ git cat-file -t 6e4a700 # View the type of the commit object
$ git cat-file -p 6e4a700 # View the content of the commit object

$ git cat-file -t 64d6ef5 # View the type of the tree object
$ git cat-file -p 64d6ef5 # View the contents of the tree object

# Execute on the right
$ watch -n 1 -d tree .git


So running git commit creates a commit object and a tree object. The commit object contains a reference to the tree object plus information about the commit (author, committer, message), and the tree object records the state of the files in this version (each file name with its blob object), so the commit captures a complete snapshot of the project.
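To see what these two objects contain, here is a Python sketch that serializes a tree and commit objects following Git's object format and hashes them with the same "<type> <size>\0" header as blobs. A tree entry is "<mode> <name>", a NUL byte and the raw 20-byte object ID; a commit is text with tree, optional parent, author and committer lines, a blank line and the message. The identity and timestamp are made up, so the hashes will not match 6e4a700 from the demo, and the helper names are our own.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """entries: (mode, name, object id); '100644' for files, '40000' for subtrees."""
    def sort_key(entry: tuple[str, str, str]) -> str:
        mode, name, _ = entry
        return name + "/" if mode == "40000" else name  # Git sorts subtrees as "name/"
    return b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=sort_key)
    )


def commit_body(tree: str, parents: list[str], person: str, message: str) -> bytes:
    """person is 'Name <email> <unix-seconds> <utc-offset>', used as author and committer."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {person}", f"committer {person}", "", message]
    return ("\n".join(lines) + "\n").encode()


PERSON = "demo <demo@example.com> 1700000000 +0800"  # hypothetical identity and time

blob = hash_object("blob", b"file1\n")
tree = hash_object("tree", tree_body([("100644", "file1.txt", blob)]))
first = hash_object("commit", commit_body(tree, [], PERSON, "1st commit"))
second = hash_object("commit", commit_body(tree, [first], PERSON, "2nd commit"))
print("tree:", tree)
print("commits:", first, "<-", second)
```

The parent line in the second commit is what links commits into a history.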


Besides the objects directory, the commit also changed the logs and refs directories. The file .git/refs/heads/master now contains the hash of commit 6e4a70, meaning that the latest commit on the master branch is 6e4a70.

HEAD, the HEAD file in the .git directory, is also a pointer: it always points to the branch we are currently working on, here master, and through master it reaches commit 6e4a70. When we switch branches, the content of this file changes accordingly.

# Execute on the left
$ cat .git/refs/heads/master
$ cat .git/HEAD

# Execute on the right
$ watch -n 1 -d tree .git


A deeper look at git commit

What happens when we commit again, or commit a new directory?

When we modify file2.txt again, stage it and commit, the new commit object contains a parent line that refers to the previous commit. The commands below trace this chain.

# Execute on the left
$ echo "file2.txt" > file2.txt
$ git status
$ git add file2.txt
$ git ls-files -s
$ git cat-file -p 0ac9638
$ git commit -m "2nd commit"
$ git cat-file -p bab53ff
$ git cat-file -p 2f07720

# Execute on the right
$ watch -n 1 -d tree .git


Git does not track empty directories, and staging a directory does not create an object for the directory itself: only the files inside it get blob objects. Looking at the index, we see that each file name includes its relative path (for example floder1/file3.txt).

When we commit, three new objects appear. No blob is created at commit time (the blob for file3.txt already exists from git add), so they are one commit object and two tree objects: the root tree, which now contains an entry for the floder1 directory, and the tree for floder1, which points to the blob of file3.txt.

This nesting is what a version means in Git: the commit object points to the root tree of the directory tree for that version, and each tree points to blob objects (files) and further tree objects (subdirectories), nesting as deeply as needed to describe the complete version.

# Execute on the left
$ mkdir floder1
$ echo "file3" > floder1/file3.txt
$ git add floder1
$ git ls-files -s
$ git commit -m "3rd commit"
$ git cat-file -p 1711e01
$ git cat-file -p 9ab67f8

# Execute on the right
$ watch -n 1 -d tree .git
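The index stores flat paths such as floder1/file3.txt, but a commit needs nested trees. This Python sketch shows one way to build them, loosely in the spirit of git write-tree: it groups paths by their first directory, writes each subtree first, then the parent tree. write_tree is a hypothetical helper, and it assumes every file is a regular non-executable file (mode 100644) with an ASCII name. With one file at the top level and one inside floder1, it produces two trees, matching the two tree objects observed above.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def write_tree(index: dict[str, str], written: list[str]) -> str:
    """Turn flat index paths such as 'floder1/file3.txt' into nested tree objects.

    Each new tree id is appended to `written` (subtrees before their parent).
    """
    entries: list[tuple[str, str, str]] = []
    subdirs: dict[str, dict[str, str]] = {}
    for path, oid in index.items():
        first, sep, rest = path.partition("/")
        if sep:
            subdirs.setdefault(first, {})[rest] = oid  # handled by the subdirectory
        else:
            entries.append(("100644", first, oid))
    for name, sub_index in subdirs.items():
        entries.append(("40000", name, write_tree(sub_index, written)))
    entries.sort(key=lambda e: e[1] + "/" if e[0] == "40000" else e[1])
    body = b"".join(f"{m} {n}".encode() + b"\0" + bytes.fromhex(o) for m, n, o in entries)
    oid = hash_object("tree", body)
    written.append(oid)
    return oid


index = {
    "file1.txt": hash_object("blob", b"file1\n"),
    "floder1/file3.txt": hash_object("blob", b"file3\n"),
}
written: list[str] = []
root = write_tree(index, written)
print(len(written), "trees; root =", root)
```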


File life cycle status

To summarize: the states a file can be in within Git, and how it moves between them.

We now have a basic understanding of how Git tracks files and synchronizes their state between the workspace, the staging area and the repository. So what states can a file be in during Git operations, and how does it switch between them? Let's summarize them here.
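Because the original diagrams are not reproduced here, the following Python sketch encodes the file life cycle as a small state machine. The states and actions follow the well-known life-cycle diagram in the Pro Git book; the comments mapping actions to commands are our own, and the model is simplified (for example, a staged file that is edited again is both staged and modified, which this sketch does not represent).

```python
# States and actions follow the file life-cycle diagram in the Pro Git book.
TRANSITIONS = {
    ("untracked", "add the file"): "staged",         # git add <new file>
    ("unmodified", "edit the file"): "modified",     # change it in the workspace
    ("modified", "stage the file"): "staged",        # git add <changed file>
    ("unmodified", "remove the file"): "untracked",  # e.g. git rm --cached, then commit
    ("staged", "commit"): "unmodified",              # git commit
}


def next_state(state: str, action: str) -> str:
    """Return the state after `action`, or raise if it does not apply."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} does not apply to a {state} file") from None


state = "untracked"
for action in ("add the file", "commit", "edit the file", "stage the file", "commit"):
    state = next_state(state, action)
    print(f"{action:<15} -> {state}")
```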


The meaning of Branch and HEAD

What exactly happens when you run git branch?

What exactly is a branch, and what does switching branches mean? According to Git's official documentation, a branch is simply a named (for example master or dev) pointer to a commit object.

When we initialize a repository, Git creates a default branch called master (newer Git versions let you change the default name with the init.defaultBranch setting, and hosting services such as GitHub now use main by default). The master branch points to the latest commit. Why name branches at all? Simply to make them easier to use and remember; a branch name works much like an alias for a commit.


With this foundation, let's consider how branches are implemented. Two problems need to be solved: first, storing the commit that each branch points to; second, knowing which branch is current when we switch branches.

Git solves this with a special file, HEAD. HEAD is a pointer to the current branch, and through that branch it reaches the branch's latest commit. Together with the branch files under refs/heads, which store each branch's commit, it solves both problems above.

When we switch from master to dev, HEAD changes immediately to point to dev. It is a beautifully simple design.

# Execute on the left
$ cat .git/HEAD
$ cat .git/refs/heads/master
$ git cat-file -t 1711e01

# Execute on the right
$ glo   # glo is a shell alias for git log
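Following the HEAD pointer is easy to sketch in Python. resolve_head is a hypothetical helper that reads HEAD and either follows a "ref: " line to the branch file or, when HEAD is detached, returns the commit ID stored in HEAD itself. It runs against a throwaway directory with a made-up commit ID, and it ignores refs stored in .git/packed-refs as well as a branch that has no commits yet.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_head(git_dir: Path) -> Tuple[Optional[str], str]:
    """Return (branch ref, commit id); the ref is None when HEAD is detached."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]                          # e.g. refs/heads/master
        return ref, (git_dir / ref).read_text().strip()  # the branch file holds the id
    return None, head  # detached HEAD: the file holds a commit id directly


COMMIT = "a" * 40  # hypothetical commit id

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)  # a throwaway directory standing in for .git
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "master").write_text(COMMIT + "\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    attached = resolve_head(git_dir)
    (git_dir / "HEAD").write_text(COMMIT + "\n")  # the state after checking out a commit
    detached = resolve_head(git_dir)

print(attached)
print(detached)
```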


The logic behind branch operations

What happens to the refs when we create and switch branches?

Here we can see that after switching branches, the content of HEAD has changed.

# Execute on the left
$ git branch
$ git branch dev
$ ll .git/refs/heads
$ cat .git/refs/heads/master
$ cat .git/refs/heads/dev
$ cat .git/HEAD
$ git checkout dev
$ cat .git/HEAD

# Execute on the right
$ glo   # shell alias for git log
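At the level of the refs directory, creating and switching branches amounts to writing two small text files. This Python sketch (create_branch and point_head_at are hypothetical helpers) shows only that part, in a throwaway directory with a made-up commit ID; the real git branch and git checkout also validate names, update the index and workspace, and record reflog entries.

```python
import tempfile
from pathlib import Path


def create_branch(git_dir: Path, name: str, commit: str) -> None:
    """A branch is just a file under refs/heads that holds a commit id."""
    ref_file = git_dir / "refs" / "heads" / name
    ref_file.parent.mkdir(parents=True, exist_ok=True)
    ref_file.write_text(commit + "\n")


def point_head_at(git_dir: Path, name: str) -> None:
    """Switching branches rewrites HEAD as a symbolic reference to the branch."""
    (git_dir / "HEAD").write_text(f"ref: refs/heads/{name}\n")


COMMIT = "b" * 40  # hypothetical id of the commit master points to

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)  # throwaway stand-in for .git
    create_branch(git_dir, "master", COMMIT)
    point_head_at(git_dir, "master")
    create_branch(git_dir, "dev", COMMIT)  # like `git branch dev`: same commit
    point_head_at(git_dir, "dev")          # like the ref update of `git checkout dev`
    head_text = (git_dir / "HEAD").read_text()
    dev_text = (git_dir / "refs" / "heads" / "dev").read_text()
    master_text = (git_dir / "refs" / "heads" / "master").read_text()

print(head_text.strip(), dev_text == master_text)
```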


Note that deleting a branch does not delete the objects that only that branch referred to. Such unreferenced objects are what we usually call garbage objects; running git add repeatedly on changing content also leaves garbage objects behind. How these garbage objects are cleaned up and reclaimed will be covered later.

# Execute on the left
$ echo "dev" > dev.txt
$ git add dev.txt
$ git commit -m "1st commit from dev branch"
$ git checkout master
$ git branch -d dev    # refused: dev has not been merged
$ git branch -D dev    # force the deletion
$ git cat-file -t 861832c
$ git cat-file -p 861832c
$ git cat-file -p 680f6e9
$ git cat-file -p 38f8e88

# Execute on the right
$ glo   # shell alias for git log
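Whether an object counts as garbage comes down to reachability from the refs. This Python sketch walks parent links over a hypothetical commit graph that loosely mirrors the demo, before and after the dev branch is deleted. It is simplified: real Git also walks trees and blobs, and treats commits recorded in the reflog as reachable until those entries expire, which is one reason a deleted branch's commit is not removed immediately.

```python
def reachable(refs: dict[str, str], parents: dict[str, list[str]]) -> set[str]:
    """Commits reachable from any ref by following parent links."""
    seen: set[str] = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


# Hypothetical short ids: c1..c3 are on master, d1 was committed on dev.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "d1": ["c3"]}
before = {"refs/heads/master": "c3", "refs/heads/dev": "d1"}
after = {"refs/heads/master": "c3"}  # after `git branch -D dev`

print(set(parents) - reachable(before, parents))  # set()
print(set(parents) - reachable(after, parents))   # {'d1'}
```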


Checkout and commit operations

Let’s talk about the checkout and commit operations!

Besides switching branches, git checkout can also switch to a specific commit, in which case the HEAD file points directly to that commit object. When HEAD points to a commit rather than to a branch, Git calls this state a detached HEAD.

Whether HEAD points to a branch name or to a commit, the result is essentially the same, because a branch name is itself just a pointer to a commit.

# Execute on the left
$ git checkout 6e4a700
$ git log

# Execute on the right
$ glo   # shell alias for git log


After switching to a specific commit, if we want to keep committing on top of it, we should first create a new branch there. Git's detached-HEAD message suggests git switch -c <new-branch-name> for this; git checkout -b does the same thing:

$ git checkout -b tmp
$ git log

Although this works, it is rarely needed. Remember the dev branch from the previous section? We created it, made a new commit on it, and then deleted it without merging it into master. If we look at git log now, that commit is no longer listed.

But is it really gone? Remember that deleting a branch in Git only deletes the pointer to a commit; the commit itself is not deleted, so the commit from the dev branch is still there.

How do we find that commit again, so that we can keep working on it or recover earlier file data?

The first method:

  • [Laborious; not recommended]

  • Look through the objects directory one object at a time, then switch to the right one.

The second method:

  • [Recommended]

  • Use git reflog, the command Git provides for exactly this purpose.

  • It records where HEAD and the branch tips have pointed, i.e. the history of our previous operations.

# Execute on the left
$ git reflog
$ git checkout 9fb7a14
$ git checkout -b dev

# Execute on the right
$ glo   # shell alias for git log
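The reflog itself is a plain text file, .git/logs/HEAD, with one line per update of HEAD: the old and new commit IDs, the identity and time of the change, a tab, and a message such as "commit: ..." or "checkout: moving from dev to master". git reflog shows these entries newest first. This Python sketch parses such lines; the sample entries, IDs and identity are hypothetical, and parse_reflog is our own helper.

```python
from typing import List, NamedTuple


class ReflogEntry(NamedTuple):
    old: str      # commit HEAD pointed to before the change
    new: str      # commit HEAD pointed to after the change
    message: str  # e.g. "commit: ..." or "checkout: moving from dev to master"


def parse_reflog(text: str) -> List[ReflogEntry]:
    """Parse '<old> <new> <name> <email> <time> <tz>\\t<message>' lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        meta, _, message = line.partition("\t")
        old, new = meta.split(" ", 2)[:2]
        entries.append(ReflogEntry(old, new, message))
    return entries


# Hypothetical ids and identity, oldest entry first as in .git/logs/HEAD.
ZERO, A, B = "0" * 40, "a" * 40, "b" * 40
WHO = "demo <demo@example.com> 1700000000 +0800"
sample = (
    f"{ZERO} {A} {WHO}\tcommit (initial): 1st commit\n"
    f"{A} {A} {WHO}\tcheckout: moving from master to dev\n"
    f"{A} {B} {WHO}\tcommit: 1st commit from dev branch\n"
    f"{B} {A} {WHO}\tcheckout: moving from dev to master\n"
)

entries = parse_reflog(sample)
lost = [e.new for e in entries if e.message.startswith("commit:")]
print(lost)  # the dev commit, which we can check out again
```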


How git diff works

When we run git diff, what exactly does Git compare?

In this section we keep using the repository from the previous section. After modifying a file's content, let's look at what git diff outputs and where that output comes from.

$ echo "hello" > file1.txt
$ git diff   # workspace vs. staging area
$ git cat-file -p 42d9955
$ git cat-file -p ce01362

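# The following commands work the same way, but compare different pairs:
$ git diff --cached   # staging area vs. the last commit (HEAD)
$ git diff HEAD       # workspace vs. the last commit (HEAD)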
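In each case Git looks up the two versions of the file (such as the blobs 42d9955 and ce01362 above) and diffs their contents. This Python sketch imitates that final step with the standard difflib module, assuming the staged content is the file.txt line written earlier; unified_blob_diff is a hypothetical helper, and Git uses its own diff algorithm (Myers by default), so hunks for larger changes may differ.

```python
import difflib


def unified_blob_diff(path: str, old: bytes, new: bytes) -> str:
    """Render a unified diff between two versions of a file's content."""
    lines = difflib.unified_diff(
        old.decode().splitlines(keepends=True),
        new.decode().splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


# Staged blob content (old) vs. the modified file in the workspace (new).
print(unified_blob_diff("file1.txt", b"file.txt\n", b"hello\n"), end="")
```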


How to add a remote repository in Git

How do we associate our local repository with a repository on a remote server?

Initialize the repository

$ git init
$ git add README.md
$ git commit -m "first commit"

Associate it with the remote repository

When we associate the local repository with the remote server repository using the following command, our local .git directory changes too. Viewing the .git/config file shows that a [remote] section has appeared.

# Associate the remote repository
$ git remote add origin git@github.com:escapelife/git-demo.git

➜ cat .git/config
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
    ignorecase = true
    precomposeunicode = true

[remote "origin"]
    url = git@github.com:escapelife/git-demo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
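The fetch line in the [remote "origin"] section is a refspec: branches under refs/heads/ on the remote are stored locally under refs/remotes/origin/, and the leading + allows those local copies to be updated even when the update is not a fast-forward. This Python sketch applies such a refspec to a ref name; apply_fetch_refspec is hypothetical and handles only refspecs with at most one * on each side.

```python
from typing import Optional, Tuple


def apply_fetch_refspec(refspec: str, remote_ref: str) -> Optional[Tuple[str, bool]]:
    """Map a remote ref through a refspec with at most one '*'.

    Returns (local ref, force) or None when the ref does not match the source side.
    """
    force = refspec.startswith("+")  # '+' allows non-fast-forward updates
    src, dst = refspec.lstrip("+").split(":", 1)
    if "*" not in src:
        return (dst, force) if remote_ref == src else None
    prefix, _, suffix = src.partition("*")
    if len(remote_ref) < len(prefix) + len(suffix):
        return None
    if not (remote_ref.startswith(prefix) and remote_ref.endswith(suffix)):
        return None
    matched = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
    return dst.replace("*", matched, 1), force


SPEC = "+refs/heads/*:refs/remotes/origin/*"  # from the [remote "origin"] section
print(apply_fetch_refspec(SPEC, "refs/heads/master"))
print(apply_fetch_refspec(SPEC, "refs/tags/v1.0"))
```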

Push local branch

When we run the following command, the local master branch is pushed to the master branch of the remote origin repository. Afterwards, logging in to GitHub shows the pushed files and directories.

While pushing, Git lists the number of objects to send, compresses them, and transfers them to the remote GitHub repository, creating a master branch there. The -u option also sets origin/master as the upstream of the local master branch.

# Push local branch
$ git push -u origin master

After pushing, some new files and directories appear under the local .git directory. As shown below, four directories and two files are added, all of them recording information about the remote repository. Viewing the new master file shows that it also contains a commit hash, the same one our local master branch points to. It records the last known state of the remote branch, so that Git can compare it with the local branch.

➜ tree .git
.git
├── logs
│   ├── HEAD
│   └── refs
│       ├── heads
│       │   ├── dev
│       │   ├── master
│       │   └── tmp
│       └── remotes          # new directory
│           └── origin       # new directory
│               └── master   # new file
└── refs
    ├── heads
    │   ├── dev
    │   ├── master
    │   └── tmp
    ├── remotes              # new directory
    │   └── origin           # new directory
    │       └── master       # new file
    └── tags
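Because refs/remotes/origin/master only changes when we fetch, pull or push, Git can compare it with the local master to report how far the two have diverged (the "ahead of 'origin/master' by 1 commit" line in git status, or git rev-list --left-right --count master...origin/master). This Python sketch computes the same two counts over a hypothetical commit graph; the helper names are our own.

```python
def ancestors(commit: str, parents: dict[str, list[str]]) -> set[str]:
    """The commit plus everything reachable through its parent links."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def ahead_behind(local: str, remote: str, parents: dict[str, list[str]]) -> tuple[int, int]:
    """Count commits only on the local side and only on the remote-tracking side."""
    mine, theirs = ancestors(local, parents), ancestors(remote, parents)
    return len(mine - theirs), len(theirs - mine)


# Hypothetical history: c3 was pushed, c4 was committed locally afterwards.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"]}
print(ahead_behind("c3", "c3", parents))  # right after the push
print(ahead_behind("c4", "c3", parents))  # one local commit not yet pushed
```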

How the remote repository stores code

Let's use GitLab to see how a remote repository server stores our code.

After we finish writing code and push it to the remote server, we find that the server stores it in the same structure as our local repository (usually as a bare repository, i.e. the contents of a .git directory without a workspace). On reflection, it would be strange if it were otherwise.

Git is a distributed version control system with no central node: every node holds a full copy, so every node uses the same storage structure. If content is lost on one node, it can be recovered from another. A Git server is simply one more node, one that everyone can reach at any time.


Source: blog.csdn.net/liuxing__jacker/article/details/132024301