Is Git a must-have for programmers?


- Asynchronous editor

There's a remarkable success story behind Git. In April 2005, Linus Torvalds set out to implement Git himself because he was dissatisfied with every open source version control system available at the time.

Today, a Google search for "git version control" returns millions of results. Git has become the standard for new open source projects, and many large open source projects have migrated or plan to migrate to it. Is Git a must-have for programmers? The answer is yes.

Why use Git?

Standing on the shoulders of giants, we would like to thank Linus Torvalds, Junio C. Hamano, and the many committers of the Git project for bringing this wonderful tool to the developer community.

  • Working with branches: in a project where several developers work in parallel, there will be many different lines of development. Git's strength is that it provides a complete set of tools for bringing those lines back together: we can merge, rebase, and cherry-pick.

  • Flexible workflows: Git is very flexible. A single developer can use it, agile teams can find a way of working that suits them, and even a large international project with many developers in different locations can build a good workflow on it.

  • Easy contribution and collaboration: most open source projects rely on voluntary contributions from developers, so it is important to make contributing as simple as possible. This is usually hard to do with a centralized version control system, because we cannot give everyone write access to the repository. With Git, anyone can clone an independent working repository and then make changes to it.

  • High performance: Git stays fast even on long-running projects with many files. For example, using Git to switch the Linux kernel source from the current version to one from six years earlier takes less than a minute on a MacBook Air. Considering that the two versions are separated by more than 200,000 commits and 40,000 changed files, that is impressive.

  • Robustness against failures and attacks: because the project history is spread across multiple distributed repositories, serious data loss is unlikely. Combined with the cleverly simple data structures in the repository, this ensures that the data can still be interpreted correctly far into the future. Git also uses cryptographic checksums throughout, which makes it difficult for attackers to tamper with a repository.

  • Offline and multi-site development: the distributed architecture makes it easy to develop offline or while traveling. In multi-site development, we need neither a central server nor a permanent network connection.

  • Strong open source community: besides the detailed official documentation, the community offers countless manuals, forums, and wikis, as well as a rich ecosystem of tools, hosting platforms, publications, services, and plugins for development environments. The whole community is thriving.

  • Extensibility: Git provides many useful commands, including commands that give us direct access to the repository. This makes Git very flexible and allows standalone applications to offer more functionality than Git provides by default.

Why use workflows?

Git is very flexible. It can be used in many different settings, from a single sysadmin who occasionally needs to version a few shell scripts to the hundreds of developers working on the Linux kernel. This flexibility comes at a price, however. Before you start working with Git, you have to make a number of decisions, such as the following.

  • Git is a distributed system. But are you really going to work only locally, or would you rather have a central repository?

  • Git supports both push and pull for transferring data, but do we need to use both? If you had to choose, which one would you choose, and why not the other?

  • Branching and merging are two powerful features of Git. But how many branches should we open? One per software feature? One per release? Or should there be only a single branch?

To get started, let's summarize what a workflow is and what it does.

  • A workflow describes the day-to-day operating procedures of a project.

  • A workflow gives concrete steps.

  • A workflow shows the necessary commands and options.

  • Workflows suit the close teamwork that is typical of modern software projects.

These workflows may not be the only correct solutions to a given problem, but they are a good starting point from which we can develop efficient workflows for our own projects.

We focus on agile development teams in commercial projects because we believe that many professional developers (the authors included) currently work in such an environment. Large projects with special requirements are not covered here, because they often have very elaborate workflows that most developers are unlikely to be interested in. Open source development is not covered either, although such projects can also build interesting workflows with Git.

Distributed version control: what's so special about it?

Before discussing the concept of distributed version control in detail, let's quickly review the traditional centralized version control architecture.

Figure 1.1 shows the typical layout of a centralized version control system such as CVS or Subversion. Each developer has a working directory (i.e. a workspace) on his or her own computer that contains all project files. After making changes locally, the developer periodically commits them to a central server. When a developer performs an update, the changes made by other developers are fetched from that server. The current and historical versions of the files (the repository) are stored on this central server, so parallel branches and named (tagged) versions are all managed centrally.

Figure 1.1 Centralized version control

In a distributed version control system (see Figure 1.2), however, there is no separation between the developer environment and the server environment. Each developer has both a workspace for the files currently being worked on and a local repository (called a clone) that stores all versions, branches, and tags of the project. Each change a developer makes becomes a new commit, which is first recorded only in that developer's local repository. Other developers do not see the new version right away. Changes are transferred from one repository to another with the push and pull commands. From a technical point of view, all repositories therefore have equal status in the distributed architecture. In theory, we could transfer all changes directly from one development machine to another without relying on a server. In practice, server repositories still play an important role in Git, for example as one of the following special-purpose repositories.

Figure 1.2 Distributed version control

  • Project repository (blessed repository): this repository stores the officially created and released versions.

  • Shared repository: this repository is used to exchange files between the members of a development team. In small projects, the project repository itself can fill this role. In multi-site development, we may need several such dedicated repositories.

  • Workflow repository: a workflow repository is usually filled only with changes that have reached a particular status in the workflow, such as changes that have passed review.

  • Fork repository: this repository is used to separate some work from the main development line (for example, work that takes a long time and does not fit into a normal release cycle), or to isolate experimental work that may never be included in the mainline.

Next, let's take a look at the advantages of distributed systems over centralized ones.

  • High performance: almost all operations run locally without network access.

  • Efficient way of working: developers can switch quickly between tasks by using multiple local branches.

  • Offline functionality: developers can commit, create branches, tag versions, and more without a server connection, and upload the results to the server later.

  • Flexible development process: we can set up dedicated repositories for other teams and departments in the company, for example to make communication with testers easier. Publishing changes is then simple, because it is just a push to a specific repository.

  • Backup: since every developer holds a copy of the repository with the complete history, the risk of losing data through a server failure is minimal.

  • Maintainability: we can try out a difficult refactoring on a copy of the repository before transferring the successful result to the original repository.

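The repository: the foundation of distributed work

A repository is essentially an efficient data store consisting of the following parts.

  • Files (i.e. blobs): these hold text or binary data. The data is stored independently of the filename.

  • Directories (i.e. trees): a directory stores the contents associated with filenames, and it can also contain other directories.

  • Versions (i.e. commits): each version defines a recoverable state of the corresponding directory. Whenever we create a new version, its author, time, comment, and preceding version are all recorded.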

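For all of this data, a hexadecimal hash value is computed (for example, a value like 1632acb65b01c6b621d6e1105205773931bb1a41). This hash value serves as the reference to the object concerned and as the key for retrieving the data later (see Figure 1.3).

Figure 1.3 Object storage in the repository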


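In other words, the hash value of a commit object is effectively its "version number". If we hold the hash of a commit, we can use it to check whether that version exists in a given repository. If it does, we can restore it into the corresponding directory of the current workspace. If it does not, we can import (pull) all the objects referenced by that commit from another repository.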
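To make the idea of a content-derived key concrete, here is a minimal sketch of how a blob's hash can be computed. It assumes Git's classic SHA-1 object format, in which the content is prefixed with a short header before hashing (newer Git versions can also use SHA-256). The function name `git_blob_hash` is made up for this illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID of a blob, assuming Git's SHA-1 object format.

    The hash covers a header ("blob <size>" followed by a NUL byte) plus the
    raw content. The filename is not part of the input.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The same bytes always produce the same 40-character hexadecimal key.
empty_id = git_blob_hash(b"")
readme_id = git_blob_hash(b"README\n")
```

Because only the content goes into the hash, two files with identical bytes share one key regardless of what they are called.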

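Next, let's look at the advantages of using hash values and this repository structure.

  • High performance: accessing data by its hash value is very fast.

  • Less redundancy, saving storage space: identical file content only needs to be stored once.

  • Distributed version numbers: because the hash values are computed from the files, the author, and the date, versions can also be created "offline" without worrying about version conflicts later on.

  • Efficient synchronization between repositories: when we transfer a commit from one repository to another, only the objects that do not yet exist in the target repository need to be sent. Thanks to the hash values, we can determine very quickly whether an object already exists.

  • Data integrity: because a hash value is computed from the content of the data, we can ask Git at any time to check whether a hash still matches its data, and so detect accidental changes or malicious manipulation.

  • Automatic rename detection: renamed files can be detected automatically, because the hash value computed from the file's content does not change. This is also why Git has no dedicated rename command; the move command is all that is needed.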
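Several of the advantages listed above follow directly from keying objects by the hash of their content. The toy sketch below illustrates deduplication, synchronization of missing objects only, and integrity checking. It is not how Git is implemented internally; `ObjectStore`, `put`, `verify`, and `push` are hypothetical names chosen for this example.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: every object is keyed by the hash of its content."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self.objects[key] = data  # identical content always lands on the same key
        return key

    def verify(self, key: str) -> bool:
        """Recompute the hash to detect accidental or malicious changes."""
        return hashlib.sha1(self.objects[key]).hexdigest() == key


def push(source: ObjectStore, target: ObjectStore) -> int:
    """Copy only the objects the target lacks and return how many were sent."""
    missing = source.objects.keys() - target.objects.keys()
    for key in missing:
        target.objects[key] = source.objects[key]
    return len(missing)


alice, bob = ObjectStore(), ObjectStore()
first = alice.put(b"print('hello')\n")
second = alice.put(b"print('hello')\n")  # same content, e.g. a renamed file
alice.put(b"README\n")
bob.put(b"README\n")                      # bob already has this object
sent = push(alice, bob)                   # only the missing object is transferred
```

Here `first` and `second` are the same key, so the content is stored once, and `push` sends a single object because the other one already exists in the target.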

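Creating and merging branches is simple

In most version control systems, creating and merging branches is regarded as an advanced operation because of its special nature. But Git was originally created for the Linux kernel developers scattered around the world, and merging the results of their many efforts has always been one of its biggest challenges. One of Git's design goals was therefore to make branching and merging as simple and safe as possible.

Figure 1.4 below shows how developers create branches in order to develop in parallel. Each dot in the figure represents a version (i.e. a commit) of the project. Because Git can only version the project as a whole, each dot also represents all the files that belong to that version.

Figure 1.4 Branches created by developers working in parallel

As shown above, the two developers start from the same version. Each then makes changes and commits them. At this point, from the perspective of their respective repositories, the project has two different versions; in other words, they have created two branches. If one developer then wants to import the other's changes, he or she can use Git to merge the versions. If the merge succeeds, Git creates a merge commit that contains the changes made by both developers. Once the other developer fetches this commit as well, both developers' projects are back at the same version.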
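The scenario just described can be pictured as a small graph in which every commit points to its parent commits. The sketch below models only that structure, not file contents or conflict handling; the `Commit` class and the commit names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A version of the project that remembers the version(s) it was built on."""
    name: str
    parents: tuple = ()


def ancestors(commit: Commit) -> set:
    """Return the names of the commit and of everything reachable through its parents."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current.name not in seen:
            seen.add(current.name)
            stack.extend(current.parents)
    return seen


base = Commit("base")                      # the shared starting version
alice = Commit("alice-change", (base,))    # first developer's commit
bob = Commit("bob-change", (base,))        # second developer's commit: two branches now
merge = Commit("merge", (alice, bob))      # a merge commit has two parents

common = ancestors(alice) & ancestors(bob)  # history the two branches share
```

The merge commit reaches both developers' commits through its two parents, which is why fetching it brings both repositories back to the same version.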

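In the example above, the branching was unplanned; it happened simply because two developers were working on the same software in parallel. In Git we can of course also start a branch deliberately, that is, create a branch explicitly (see Figure 1.5). Explicit branches are mainly used to coordinate the parallel development of separate features.

Figure 1.5 Explicit branches for different tasks

When a repository performs pull and push operations, we can specify exactly which branches they apply to. Besides simple branching and merging, we can also perform the following actions on branches.

  • Transplanting branches: we can transfer the commits of a branch directly to another repository.

  • Transferring only specific changes: we can copy one or several commits from one branch directly to another branch. This is known as cherry-picking.

  • Cleaning up history: we can rework, reorder, and delete commits in a branch's history, which helps build better historical documentation for the project. This is called interactive rebasing.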
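Cherry-picking can be thought of as taking the change that one commit introduced and replaying it on top of another branch. The sketch below models snapshots as plain dictionaries that map paths to contents and ignores conflicts entirely, whereas real Git performs a merge and may stop for conflict resolution. All function names here are hypothetical.

```python
def diff(old: dict, new: dict) -> dict:
    """Map each changed path to its new content (None marks a deleted file)."""
    changes = {path: content for path, content in new.items()
               if old.get(path) != content}
    changes.update({path: None for path in old if path not in new})
    return changes


def apply_changes(snapshot: dict, changes: dict) -> dict:
    """Return a new snapshot with the changes applied; the input is left untouched."""
    result = dict(snapshot)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def cherry_pick(parent: dict, commit: dict, target: dict) -> dict:
    """Replay the change between parent and commit on top of the target snapshot."""
    return apply_changes(target, diff(parent, commit))


feature_parent = {"app.py": "v1"}
feature_commit = {"app.py": "v1", "fix.py": "patched"}   # this commit adds fix.py
main_head = {"app.py": "v1", "docs.md": "intro"}

picked = cherry_pick(feature_parent, feature_commit, main_head)
```

Only the change from the chosen commit (the new `fix.py`) arrives on the target; everything else on the target branch stays as it was.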

Also, if you are a busy project manager who is still undecided about whether to adopt Git,

Which books should you read?

《Git高手之路》 (Mastering Git)

By Jakub Narębski



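This book is aimed at all Git users. It gives a thorough, detailed account of practical Git techniques, helping readers unlock Git's full potential and manage project versions more effectively. Working through it can help readers use Git better and improve their software development efficiency.

The author, Jakub Narębski, has been involved in Git development since Git's earliest days. He is one of the main contributors to the gitweb subsystem (Git's original web interface) and is its unofficial maintainer.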


《Git学习指南》 (Git Learning Guide)

By René Preißel and Bjørn Stachmann (Germany)



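Git is the most popular version control system today. This book does not dwell on theory or try to cover everything; it is a practical guide to learning Git. It first introduces the basics of Git, then focuses on agile development and presents workflows that show the commands and options needed to solve real-world problems.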


《Git版本控制管理(第2版)》 (Version Control with Git, 2nd Edition)

By Jon Loeliger and Matthew McCullough (USA)


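A one-of-a-kind Git book: a thorough dissection of how to use Git, with coverage of GitHub as well.

This book gets readers up and running with Git quickly, using it to track, branch, merge, and manage code changes. Through a series of step-by-step tutorials, it takes readers from Git basics to advanced techniques and offers friendly yet rigorous advice to help them get familiar with Git's many features.

This edition has been fully updated from the previous one. It includes tips for manipulating trees, covers the reflog and the stash in depth, and gives a complete introduction to GitHub repositories. Once you have mastered the flexibility of Git, you can manage code development in an almost unlimited variety of ways, and this book shows you how.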

