Git version control: a brief description of the principles

foreword

This is my second blog post about the principles of Git. I took a lot of detours the first time, and many gaps in my understanding have since surfaced, so I decided to reorganize the material. Much of that earlier confusion came from reading low-quality sources, so I strongly recommend treating Chinese-language materials other than Pro Git as reference only.

 

distributed

There is not much to explain about version control itself. The commonly used SVN is a centralized version control system, while Git is a Distributed Version Control System (DVCS). In a DVCS, clients do not just check out the latest snapshot of the files; they fully mirror the repository. If any server used for collaboration fails, it can later be restored from any of the mirrored local repositories, because every clone is effectively a complete backup of the repository.

Furthermore, many of these systems can work with several remote repositories at once, so you can collaborate with different groups of people on the same project. You can set up different workflows as needed, such as hierarchical models, which were not possible in earlier centralized systems.

 

Version file management

Git handles versioned data differently from other version control systems. Most other systems store a list of changes to each file, like a notebook that records every edit. Git instead stores data as a "stream of snapshots". This is more like putting the original file in a drawer and making changes on a fresh copy.

Storing data as changes to a base version of each file:

Storing data as a stream of snapshots:

Only files that were modified get a new snapshot in a given version. Every commit still records the state of the whole project. For unchanged files, however, Git does not store the file again; it keeps a link to the identical snapshot it has already stored. (In the snapshot figure above, the files outlined with dotted lines are such links; no new snapshot was generated for them.)
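The following minimal Python sketch makes the "stream of snapshots" idea concrete. It assumes Git's default SHA-1 object format. A blob's ID is the SHA-1 of a short header (`blob <size>` plus a NUL byte) followed by the file content. Identical content therefore always gets the same ID and is stored only once. The tree is simplified to a Python dict rather than Git's real tree object format, and the file names are made up.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object ID Git gives a file's content (a blob), default SHA-1 format."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


store = {}  # toy object database: blob ID -> content


def snapshot(files):
    """Record a full snapshot in which every path maps to a blob ID.

    Content already in the store is not stored again, so an unchanged
    file costs only a pointer.
    """
    tree = {}  # simplified stand-in for a Git tree object
    for path, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)
        tree[path] = oid
    return tree


# Two versions of a hypothetical project; only README changes.
v1 = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
v2 = {"README": b"hello, git\n", "main.c": v1["main.c"]}

t1 = snapshot(v1)
t2 = snapshot(v2)
print(len(store))                    # 3: README twice, main.c once
print(t1["main.c"] == t2["main.c"])  # True: both snapshots point to one blob
```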

 

local execution

Because Git keeps a complete repository on each user's machine, most operations that need the server in a centralized system can be done locally and run extremely fast. These include committing, comparing versions, branching and merging.

However, collaboration between developers still requires exchanging code over the network through a remote repository.

 

file states (modified, staged, committed)

Files managed by Git can be in three states: committed, modified and staged. Committed means the data is safely stored in your local database. Modified means the file has changed but has not yet been committed to the database. Staged means you have marked the current version of a modified file to go into your next commit snapshot.

This leads to the three main areas of a Git project: the Git repository, the working directory and the staging area.

The Git repository directory is where Git stores the metadata and object database for your project. It is the most important part of Git, and it is what gets copied when you clone a repository from another computer.

The working directory is a single checkout of one version of the project. Its files are pulled out of the compressed database in the Git repository and placed on disk for you to use or modify.

The staging area is a file, usually inside the Git repository directory, that records what will go into your next commit. Its technical name in Git is the "index", but it is generally called the staging area.

The basic Git workflow is as follows:

  1. Modify the file in the working directory.

  2. Stage the files you want in the next commit, which adds snapshots of them to the staging area.

  3. Commit, which takes the files as they are in the staging area and stores that snapshot permanently in the Git repository directory.

If a particular version of a file is in the Git directory, it is committed. If it has been modified and added to the staging area, it is staged. If it has changed since it was checked out but has not been staged, it is modified.
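The three states can be modelled with three dictionaries for the working directory, the staging area and the last commit (HEAD). This is a simplified sketch, not how Git actually stores the index, and the file names and version strings are hypothetical. A real file can also be staged and then modified again, in which case `git status` lists it in both sections. The sketch reports only "modified" for that case.

```python
def file_state(path, workdir, index, head):
    """Classify a tracked file as committed, staged or modified (simplified)."""
    if workdir.get(path) != index.get(path):
        return "modified"   # changed in the working directory, not yet staged
    if index.get(path) != head.get(path):
        return "staged"     # staged change waiting for the next commit
    return "committed"      # identical to the last committed snapshot


head = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
index = dict(head)
workdir = dict(head)

workdir["b.txt"] = "v2"             # 1. modify b.txt in the working directory
workdir["c.txt"] = "v2"             #    ...and c.txt
index["c.txt"] = workdir["c.txt"]   # 2. stage c.txt (like `git add c.txt`)

for p in sorted(head):
    print(p, file_state(p, workdir, index, head))
# a.txt committed
# b.txt modified
# c.txt staged

head = dict(index)                  # 3. commit: the staged snapshot becomes HEAD
```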

 

tags

Like most version control systems (VCS), Git can tag specific points in history as important. People typically use this feature to mark release points (v1.0 and so on).

Git uses two main types of tags: lightweight and annotated.

A lightweight tag is very much like a branch that never moves: it is just a pointer to a specific commit. (This differs from SVN, where a tag is an ordinary directory copy and nothing prevents further commits to it.)

Annotated tags, by contrast, are stored as full objects in the Git database. They are checksummed, and they contain the tagger's name, email address, date and time, plus a tagging message. They can also be signed and verified with GNU Privacy Guard (GPG). Creating annotated tags is generally recommended so that you keep all of this information. Lightweight tags are also available if you only need a temporary tag or do not want to keep that information.
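The difference between the two tag types shows up in what the ref points to. This sketch assumes the default SHA-1 object format, and the commit ID and tagger are hypothetical. A lightweight tag is a ref that holds the commit ID directly. An annotated tag is a separate `tag` object whose body records the tagged object, its type, the tag name, the tagger and the message, and the ref points at that object.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Git object ID: SHA-1 over '<type> <size>' + NUL + body (default format)."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


commit_id = "0123456789abcdef0123456789abcdef01234567"  # hypothetical commit ID

# Lightweight tag: only a ref whose value is the commit ID itself.
refs = {"refs/tags/v1.0-lw": commit_id}

# Annotated tag: a real object with tagger, date and message.
tag_body = (
    f"object {commit_id}\n"
    "type commit\n"
    "tag v1.0\n"
    "tagger Jane Doe <jane@example.com> 1700000000 +0000\n"  # hypothetical tagger
    "\n"
    "Release 1.0\n"
).encode()
tag_id = object_id("tag", tag_body)
refs["refs/tags/v1.0"] = tag_id   # the ref points at the tag object

print(refs["refs/tags/v1.0-lw"] == commit_id)  # True: straight to the commit
print(refs["refs/tags/v1.0"] == commit_id)     # False: to the tag object
```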

 

commit

When you commit, Git stores a commit object. Given how Git stores data, you might expect the commit object to contain a pointer to the snapshot of the staged content. It does, but it also contains the author's name and email address, the message you typed and pointers to its parent commit(s). The first commit has no parent, and a normal commit has one parent. A commit that results from merging two or more branches has multiple parents.

For a single commit, the commit points to a directory tree (tree) of its snapshot, and the tree points to the individual file snapshots (blobs). Starting from a commit, we can therefore find every file snapshot it contains.

Across the history, each commit object contains a pointer to the commit that came before it (its parent).

The diagram below shows a commit that is associated with both its own snapshot and its parent commit.

Chained together in this way, commits form Git's commit tree.

Each repository has a single commit tree, and branches and tags are nothing more than pointers to nodes in it.
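A commit object is plain text. It contains a `tree` line, zero or more `parent` lines, author and committer lines, a blank line and the message. The sketch below builds such bodies and hashes them the way Git does, assuming the default SHA-1 format. The author identity and timestamp are hypothetical. The tree ID used is the well-known ID of the empty tree, so the example needs no real files.

```python
import hashlib


def object_id(obj_type, body):
    """Git object ID: SHA-1 over '<type> <size>' + NUL + body."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def commit_object(tree, parents, message,
                  ident="Jane Doe <jane@example.com> 1700000000 +0000"):
    """Build a commit body (ident is hypothetical) and return (ID, body)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]   # 0, 1 or several parents
    lines += [f"author {ident}", f"committer {ident}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    return object_id("commit", body), body


tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"   # the empty tree
root, _ = commit_object(tree, [], "initial commit")       # first commit: no parent
c2, _ = commit_object(tree, [root], "second commit")      # normal commit: one parent
side, _ = commit_object(tree, [root], "work on a branch")
merge, body = commit_object(tree, [c2, side], "merge")    # merge commit: two parents

print(body.decode())
```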

 

branch

Almost every version control system supports branching in some form. Branching means you diverge from the main line of development and keep working without disturbing it. In many systems (such as SVN) this is a somewhat expensive process that often requires copying the entire source directory, which can take a long time for large projects.

From the commit section above, we know that Git has only one commit tree. A Git branch is essentially just a movable pointer to a commit object. Git's default branch name is master. After several commits, you have a master branch pointing to the last commit you made, and it moves forward automatically with every commit.

How does Git create a new branch? Simply by creating a new pointer that you can move around.

So how does Git know which branch you are currently on? That is also simple: it keeps a special pointer called HEAD. This HEAD is quite different from the HEAD concept in other version control systems such as Subversion or CVS. In Git, HEAD is a pointer to the local branch you are currently on.

Switching branches means pointing HEAD at another local branch and updating the working directory to match that branch's snapshot.
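Branches and HEAD can be modelled as nothing more than names that point at commits. In this toy model the class and method names are hypothetical, and `checkout` only moves HEAD without updating any files. Creating a branch copies one pointer, and committing moves the current branch forward. Committing on two branches from the same starting commit produces a forked history.

```python
import itertools


class Repo:
    """Toy model: commits form a graph; branches and HEAD are just names."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.parents = {}       # commit ID -> tuple of parent IDs
        self.branches = {}      # branch name -> commit ID (a movable pointer)
        self.head = "master"    # HEAD holds the name of the current branch

    def commit(self):
        parent = self.branches.get(self.head)
        cid = f"c{next(self._ids)}"
        self.parents[cid] = (parent,) if parent else ()
        self.branches[self.head] = cid   # the current branch moves forward
        return cid

    def branch(self, name):
        # A new branch is a new pointer to the current commit; nothing is copied.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        # Switching re-points HEAD (real Git also updates the working directory).
        self.head = name


repo = Repo()
repo.commit()
repo.commit()             # master -> c2
repo.branch("testing")    # testing -> c2 as well
repo.checkout("testing")
repo.commit()             # testing -> c3, master still at c2
repo.checkout("master")
repo.commit()             # master -> c4: history has forked at c2
print(repo.branches)      # {'master': 'c4', 'testing': 'c3'}
```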

Suppose we create different branches from the same commit and then make new commits on each of them. The project history has now forked, and several commits share the same parent. To bring these forks back into the original branch, we have to merge or rebase.

remote branch

Remote references are references (pointers) in your remote repositories, including branches, tags and so on.
Remote-tracking branches are references to the state of remote branches. They are local references that you cannot move yourself; Git moves them for you whenever you communicate with the remote over the network. Think of them as bookmarks that record where the branches in your remote repositories were the last time you connected to them.
They take the form (remote)/(branch). For example, to see what the master branch on the remote origin looked like the last time you communicated with it, check the origin/master branch.
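A remote-tracking ref behaves like a bookmark that changes only when Git talks to the remote. The following is a hypothetical in-memory model in which `fetch` is the only operation that updates `origin/master`. The commit names are made up.

```python
# Hypothetical in-memory model of one local repository and its "origin" remote.
remote_repo = {"master": "c3"}                          # branches on the server
local_refs = {"master": "c3", "origin/master": "c3"}    # refs in your clone


def fetch(remote_name, remote_branches, refs):
    """Network operations (here: fetch) are what move remote-tracking refs."""
    for branch, commit in remote_branches.items():
        refs[f"{remote_name}/{branch}"] = commit


remote_repo["master"] = "c5"            # someone else pushes to the server
print(local_refs["origin/master"])      # c3: bookmark from the last contact
fetch("origin", remote_repo, local_refs)
print(local_refs["origin/master"])      # c5: updated by the fetch
print(local_refs["master"])             # c3: your local branch is untouched
```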

 

tracking branch

Checking out a local branch from a remote-tracking branch automatically creates what is called a "tracking branch". The branch it tracks is called its "upstream branch". A tracking branch is a local branch with a direct relationship to a remote branch. If you are on a tracking branch and type git pull or git push, Git automatically knows which server to fetch from and which branch to merge into.

When you clone a repository, Git usually creates a master branch that tracks origin/master automatically. You can set up other tracking branches if you wish, such as branches that track branches on other remotes, or a master branch that does not track anything.

Tracking information is stored in the repository's local configuration file (.git/config). Listed with git config --list, it looks like this:

branch.master.remote=origin
branch.master.merge=refs/heads/master
branch.dev_5.2.remote=origin
branch.dev_5.2.merge=refs/heads/dev_5.2
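Git derives a branch's upstream from these two keys. The following sketch parses lines in the format shown above and reports each branch's upstream as a remote-tracking name. It assumes the default fetch refspec, which maps `refs/heads/*` on the remote to `refs/remotes/<remote>/*` locally. The function name is hypothetical. Branch names such as `dev_5.2` can contain dots, so the code strips the fixed prefix and suffix instead of splitting on every dot.

```python
def upstreams(config_lines):
    """Map each local branch to its upstream, e.g. 'master' -> 'origin/master'."""
    remote, merge = {}, {}
    for line in config_lines:
        key, _, value = line.strip().partition("=")
        if not key.startswith("branch."):
            continue
        # "dev_5.2.merge" -> name "dev_5.2", field "merge"
        name, _, field = key[len("branch."):].rpartition(".")
        if field == "remote":
            remote[name] = value
        elif field == "merge":
            merge[name] = value
    result = {}
    for name, rem in remote.items():
        ref = merge.get(name)
        if ref is None:
            continue
        if ref.startswith("refs/heads/"):
            ref = ref[len("refs/heads/"):]
        result[name] = f"{rem}/{ref}"   # assumes the default fetch refspec
    return result


config = """\
branch.master.remote=origin
branch.master.merge=refs/heads/master
branch.dev_5.2.remote=origin
branch.dev_5.2.merge=refs/heads/dev_5.2
""".splitlines()

print(upstreams(config))  # {'master': 'origin/master', 'dev_5.2': 'origin/dev_5.2'}
```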

 

merge

When we want to bring together branches whose histories have forked, we use a merge.

Git performs a simple three-way merge using the two snapshots pointed to by the branch tips and the common ancestor of the two branches.

Instead of just moving the branch pointer forward, Git creates a new snapshot from the result of this three-way merge and automatically creates a new commit that points to it. This is called a merge commit, and it is special because it has more than one parent.

Git determines the best common ancestor to use as the merge base by itself. Older systems such as CVS or Subversion before 1.5 worked differently: the developer doing the merge had to work out the best merge base. This makes merging much easier in Git than in those systems.
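Choosing the merge base can be sketched as a graph search. First collect the ancestors of both branch tips and intersect them. Then keep only the common ancestors that are not themselves ancestors of another common ancestor, which is Git's "best common ancestor". This is a naive Python illustration on a hypothetical commit graph, not Git's optimized algorithm.

```python
def ancestors(commit, parents):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(a, b, parents):
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {
        c for c in common
        if not any(d != c and c in ancestors(d, parents) for d in common)
    }


# History forks at c2: master continues with c3, iss53 with c4 and c5.
parents = {"c1": (), "c2": ("c1",), "c3": ("c2",), "c4": ("c2",), "c5": ("c4",)}
print(merge_bases("c3", "c5", parents))  # {'c2'}
```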

We can therefore think of a merge as combining the snapshots from the forked commits into one new commit that has multiple parents.
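Once the merge base is known, the three-way rule is simple. If only one side changed something relative to the base, take that side. If both sides changed it the same way, take either. If they changed it differently, it is a conflict. This sketch applies the rule per file, with `None` meaning the file is absent. Real Git also applies the rule to regions of lines inside a file, so it cleanly merges many edits that this file-level sketch would flag as conflicts.

```python
def three_way_merge(base, ours, theirs):
    """File-level three-way merge of {path: content} snapshots.

    Returns (merged snapshot, list of conflicting paths).
    """
    merged, conflicts = {}, []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:          # both sides agree (or neither changed it)
            result = o
        elif o == b:        # only "theirs" changed it
            result = t
        elif t == b:        # only "ours" changed it
            result = o
        else:               # both changed it differently
            conflicts.append(path)
            result = o      # placeholder; the user must resolve it
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"a": "1", "b": "1", "c": "1"}
ours = {"a": "2", "b": "1", "c": "x"}
theirs = {"a": "1", "b": "3", "c": "y"}
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'a': '2', 'b': '3', 'c': 'x'}
print(conflicts)  # ['c']
```

With no conflicts, the merged snapshot would become the tree of a new commit whose parents are both branch tips.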

 

fast-forward merge

When you merge two branches and one branch tip can be reached by following the other branch's history, Git simply moves the pointer forward. There is no divergent work to reconcile in this case, so no merge commit is needed. This is called a "fast-forward".

For example, if master only needs to move forward to reach iss53, merging iss53 into master is a fast-forward.
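A fast-forward is possible exactly when the current branch tip is an ancestor of the commit being merged. The following minimal sketch works on a hypothetical commit graph. The branch names follow the text, and `hotfix` is made up.

```python
def ancestors(commit, parents):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge(branches, target, source, parents):
    """Merge source into target, fast-forwarding when possible."""
    t, s = branches[target], branches[source]
    if s in ancestors(t, parents):
        return "already up to date"
    if t in ancestors(s, parents):
        branches[target] = s          # just move the pointer forward
        return "fast-forward"
    return "three-way merge needed"


parents = {"c1": (), "c2": ("c1",), "c3": ("c2",), "c4": ("c2",)}
branches = {"master": "c2", "iss53": "c3", "hotfix": "c4"}
print(merge(branches, "master", "iss53", parents))   # fast-forward
print(branches["master"])                            # c3
print(merge(branches, "master", "hotfix", parents))  # three-way merge needed
```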

 

merge conflict

Sometimes a merge does not go smoothly. If you changed the same part of the same file differently in the two branches, Git cannot merge them cleanly, and you get a merge conflict.

In this case Git has done the merge but has not automatically created a merge commit. It pauses and waits for you to resolve the conflicts. After editing the conflicted files, you mark each one as resolved by staging it with git add. If you use git mergetool instead, Git asks whether the merge was successful when you exit the tool. If you answer yes, it stages the file for you to mark the conflict as resolved.

If you are happy with the result and have checked that all previously conflicted files are staged, type git commit to create the merge commit.

The main differences from a conflict-free merge are:

1. Conflicts need to be resolved manually and marked as resolved.

2. You have to create the merge commit yourself.
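When a conflict occurs, Git writes both versions into the working file between conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`). Resolving the conflict means editing the file until the markers are gone, then staging it. In the sketch below, the helper names and the default branch label `iss53` are hypothetical. The check only looks for marker lines; in Git itself, it is staging the file that marks it as resolved.

```python
MARKERS = ("<<<<<<< ", "=======", ">>>>>>> ")


def conflict_hunk(ours, theirs, our_label="HEAD", their_label="iss53"):
    """Render one conflicting region the way Git marks it in the working file."""
    return "".join([
        f"<<<<<<< {our_label}\n", *ours,
        "=======\n", *theirs,
        f">>>>>>> {their_label}\n",
    ])


def is_resolved(text):
    """True when no conflict-marker lines are left in the file text."""
    return not any(line.startswith(MARKERS) for line in text.splitlines())


text = conflict_hunk(["contact: support@example.com\n"],
                     ["please contact us at support@example.com\n"])
print(text)
print(is_resolved(text))  # False: edit the file before staging it
```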

 

rebase

In Git, there are two main ways to integrate changes from one branch into another: merge and rebase.

You can take the patch of the changes introduced on one branch and reapply it on top of another branch. In Git, this is called rebasing. With rebase, you take all the changes committed on one branch and replay them on a different one.

Rebase first finds the common ancestor of the two branches: the current branch and the branch you are rebasing onto. It then takes the diff introduced by each commit on the current branch since that ancestor and saves those diffs to temporary files. Next, it resets the current branch to the commit you are rebasing onto. Finally, it applies each saved change in turn.
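The replay steps above can be sketched in Python on a toy commit store in which every commit records its parent and a full file snapshot. The sketch assumes linear histories, no conflicts and no file deletions, and the commit names are hypothetical. Replayed commits are named with a prime (`c3'`) to show that they are new objects with new IDs, even though they carry the same changes.

```python
def diff(commits, cid):
    """Changes a commit introduced relative to its parent: {path: new content}."""
    parent = commits[cid]["parent"]
    before = commits[parent]["files"] if parent else {}
    after = commits[cid]["files"]
    return {p: c for p, c in after.items() if before.get(p) != c}


def history(commits, cid):
    """Commit IDs from cid back to the root (linear history assumed)."""
    out = []
    while cid:
        out.append(cid)
        cid = commits[cid]["parent"]
    return out


def rebase(commits, branch_tip, onto_tip):
    """Replay the commits reachable from branch_tip but not from onto_tip."""
    onto_history = set(history(commits, onto_tip))
    # Commits after the common ancestor, newest first.
    mine = [c for c in history(commits, branch_tip) if c not in onto_history]
    tip = onto_tip
    for old in reversed(mine):                            # oldest first
        files = {**commits[tip]["files"], **diff(commits, old)}
        new = old + "'"                                   # new commit, new ID
        commits[new] = {"parent": tip, "files": files}
        tip = new
    return tip


commits = {
    "c1": {"parent": None, "files": {"a": "1"}},
    "c2": {"parent": "c1", "files": {"a": "1", "b": "1"}},   # master
    "c3": {"parent": "c1", "files": {"a": "2"}},             # experiment
    "c4": {"parent": "c3", "files": {"a": "2", "c": "1"}},   # experiment tip
}
new_tip = rebase(commits, "c4", "c2")
print(new_tip)                     # c4'
print(commits[new_tip]["files"])   # {'a': '2', 'b': '1', 'c': '1'}
print(commits["c3'"]["parent"])    # c2
```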

Rebasing is wonderful, but it is not perfect, and there is one rule to follow when using it:

Do not rebase commits that exist outside your repository.

If you follow this golden rule, you will be fine. If you do not, people will hate you, and your friends and family will scorn you.

(In other words, do not rebase commits that have already been pushed to a remote branch, or you will cause trouble for others.)

 

interactive rebase

Interactive rebase is most often used to squash several commits into one.

When a feature is finished, we usually squash its messy commits to keep the commit tree clean.

Interactive rebase also lets us choose which commits to keep and rewrite their commit messages.
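`git rebase -i` opens a todo list of lines such as `pick <commit> <subject>`, and you change the command word on each line. `squash` and `fixup` meld a commit into the one before it; `squash` keeps its message, while `fixup` discards it. `drop` removes the commit, and `reword` keeps it but lets you rewrite its message. The sketch below interprets such a list over a toy table of commit messages and changed files. The `rewords` dictionary is a hypothetical stand-in for the editor session Git opens.

```python
def run_todo(todo, commits, rewords=None):
    """Interpret a simplified interactive-rebase todo list.

    commits: commit ID -> (message, set of changed files).
    Returns the resulting commits as (message, files) pairs.
    """
    rewords = rewords or {}
    out = []  # resulting commits as [message, set of files]
    for line in todo:
        action, cid = line.split()[:2]
        msg, files = commits[cid]
        if action in ("pick", "reword"):
            if action == "reword":
                msg = rewords[cid]          # stands in for editing the message
            out.append([msg, set(files)])
        elif action in ("squash", "fixup"):
            if not out:
                raise ValueError(f"cannot {action} without a previous commit")
            out[-1][1] |= files             # changes melded into previous commit
            if action == "squash":
                out[-1][0] += "\n\n" + msg  # squash keeps both messages
        elif action == "drop":
            continue
        else:
            raise ValueError(f"unsupported action: {action}")
    return [(m, f) for m, f in out]


commits = {
    "c1": ("Add parser", {"parser.py"}),
    "c2": ("fix typo", {"parser.py"}),
    "c3": ("Add tests", {"test_parser.py"}),
    "c4": ("debug print", {"parser.py"}),
    "c5": ("wip docs", {"README"}),
}
todo = [
    "pick c1 Add parser",
    "fixup c2 fix typo",
    "squash c3 Add tests",
    "drop c4 debug print",
    "reword c5 wip docs",
]
result = run_todo(todo, commits, rewords={"c5": "Document the parser"})
print(len(result))  # 2
```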

 

Rebase vs. Merge

By now you have seen rebase and merge in practice, and you may be wondering which one is better. Before answering, let's step back and discuss what commit history actually means.

One view is that a repository's commit history is a record of what actually happened. It is a historical document, valuable in its own right, and should not be tampered with. From this angle, changing the commit history is almost blasphemous, because you are lying about what actually happened. So what if the merges left a messy history? That is how it happened, and the repository should preserve it for posterity.

The opposing view is that commit history is the story of how your project was made. Nobody publishes the first draft of a book, and a software maintenance manual needs repeated revision before it is easy to use. People who hold this view use tools like rebase and filter-branch to tell the story in whatever way best serves future readers.

Now, back to the question: is merging or rebasing better? Hopefully you can see that there is no simple answer. Git is a powerful tool that lets you do many things with your history, but every team and every project has different needs. Now that you know how both work, you can choose what fits your situation.

The general principle is to rebase only local changes that have not yet been pushed or shared with others, in order to clean up your history. Never rebase commits that have been pushed elsewhere. That way you get the benefits of both approaches.

 

Applying commits (cherry-pick)

cherry-pick lets us pick one or more existing commits and apply the change each one introduces, recording a new commit for each.

In other words, we can take the changes from a commit and apply them to another branch.
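Cherry-picking applies the change a commit introduced relative to its parent, not the commit's whole snapshot. In this toy sketch the branch contents are hypothetical, and `toy_id` is a stand-in hash rather than Git's object format. The picked commit gets a new ID because its parent and resulting snapshot differ. The change it carries, however, is identical.

```python
import hashlib


def toy_id(*parts):
    """Toy content hash standing in for a Git commit ID or patch ID."""
    return hashlib.sha1(repr(parts).encode()).hexdigest()


def change_of(parent_files, files):
    """The change a commit introduces: paths whose content differs from its parent."""
    return {p: c for p, c in files.items() if parent_files.get(p) != c}


def cherry_pick(picked_parent_files, picked_files, target_id, target_files, message):
    change = change_of(picked_parent_files, picked_files)  # diff vs. its parent
    new_files = {**target_files, **change}                  # apply on target tip
    return toy_id(target_id, sorted(new_files.items()), message), new_files


prod_id, prod = "p1", {"app.py": "v1", "config": "prod"}
dev_id, dev = "d1", {"app.py": "v1", "config": "dev", "feature.py": "wip"}

fixed = {**prod, "app.py": "v1-fixed"}            # bug-fix commit based on prod
fix_id = toy_id(prod_id, sorted(fixed.items()), "fix crash")

picked_id, dev_fixed = cherry_pick(prod, fixed, dev_id, dev, "fix crash")
print(dev_fixed)           # {'app.py': 'v1-fixed', 'config': 'dev', 'feature.py': 'wip'}
print(picked_id == fix_id)  # False: a new commit with a new ID
same_patch = (toy_id(sorted(change_of(prod, fixed).items())) ==
              toy_id(sorted(change_of(dev, dev_fixed).items())))
print(same_patch)          # True: the same change on both branches
```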

This is very useful when dealing with production bugs. Suppose a production bug turns up while the development branch is still under development, so we create a bug-fix branch. The fix has to go into the development branch for testing and also into the production branch to solve the problem. If the bug branch is based on the development branch, merging it into production would also bring along unfinished development work. A plain branch merge therefore does not solve this cleanly.

cherry-pick fits this situation. Once the fix is done on the bug branch, we can cherry-pick the bug-fixing commits onto the production and development branches separately. Cherry-picked commits get new commit IDs, but they introduce the same change. When the branches are merged at release time, the identical changes on both sides do not conflict. A rebase also recognizes such commits and skips them when their changes are already present upstream.

 

stash

Often, after you have been working on part of a project for a while, things are in a messy state and you want to switch to another branch to do something else. The problem is that you do not want to commit half-done work just to be able to return to it later. The answer to this problem is the git stash command.

Stashing takes the dirty state of your working directory and saves it on a stack of unfinished changes that you can reapply at any time. The dirty state means your modified tracked files and your staged changes.
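A stash is a stack of saved states. This toy model tracks only modified tracked files, and the class name is hypothetical. It assumes HEAD has not moved between stashing and popping, whereas real Git re-applies the changes onto whatever HEAD is current. The staged state is restored only when requested, which mirrors the `--index` option of `git stash pop`.

```python
class Workspace:
    """Toy model of `git stash` / `git stash pop` for tracked files."""

    def __init__(self, head):
        self.head = dict(head)      # last commit
        self.index = dict(head)     # staging area
        self.workdir = dict(head)   # working directory
        self.stash = []             # stack of saved (index, workdir) states

    def stash_push(self):
        self.stash.append((dict(self.index), dict(self.workdir)))
        self.index = dict(self.head)    # back to a clean state
        self.workdir = dict(self.head)

    def stash_pop(self, restore_index=False):
        index, workdir = self.stash.pop()   # most recent entry first
        self.workdir = workdir
        if restore_index:                   # like `git stash pop --index`
            self.index = index


ws = Workspace({"a.txt": "v1", "b.txt": "v1"})
ws.workdir["a.txt"] = "half-done"      # unstaged edit
ws.workdir["b.txt"] = "staged-edit"
ws.index["b.txt"] = "staged-edit"      # staged edit
ws.stash_push()
print(ws.workdir == ws.head)           # True: clean, safe to switch branches
ws.stash_pop(restore_index=True)
print(ws.index["b.txt"])               # staged-edit
```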


Origin http://43.154.161.224:23101/article/api/json?id=324465037&siteId=291194637