[How to improve development efficiency] Explore Google's code management

Date: July 2, 2016

Both Google and Facebook have only one code repository each, and the code of the whole company is placed in that repository.

I have long wondered why they do this, and what the benefit is of putting projects written in different languages in one repository.

The latest issue of Communications of the ACM (Volume 59, Issue 7) has a paper, "Why Google Stores Billions of Lines of Code in a Single Repository", written by members of Google's infrastructure team. It can be regarded as the official, detailed answer to this question. I learned a lot from reading it, and the following is an excerpt.



1. Overview
Google first used CVS for code management and switched to Perforce in 1999. The setup was a single Perforce host plus various caching machines.

At that time, the code of the whole company was already in one repository, and this arrangement has been kept ever since. As the scale grew, Perforce could no longer meet the demand, so Google began to use its own version control system, Piper.

Piper is built on Google's own distributed database system (originally Bigtable, now Spanner) and is distributed across 10 data centers around the world, so that Google employees everywhere get good access speeds.

At present, this repository contains 1 billion files and 35 million commits, is 86TB in size, and has tens of thousands of users. It serves 500,000 requests per second on workdays and 800,000 at peak, mostly from automated build and test systems.

More than 90% of Google's code is stored in Piper. Projects that are open source and require external collaboration are kept in Git, mainly the Android project and the Chrome project. A characteristic of Git is that the entire history is copied to the user's local machine, so it is not well suited to very large projects, which must be split into smaller repositories. Android, for example, consists of more than 800 independent repositories.

2. The design of Piper
2.1 Structure
The entire repository is organized as a tree. Each team has its own directory, and the directory path serves as the namespace of the code. Each directory has an owner, who is responsible for approving file changes in that directory.
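
The ownership rule above can be sketched in a few lines of Python. This is only an illustration, not Piper's actual mechanism: the `OWNERS` table, the user names, and the rule that a file falls under the nearest enclosing directory with an owner entry are all assumptions made for the example.

```python
from pathlib import PurePosixPath

# Hypothetical owner table: directory path -> set of owners (not real Piper data).
OWNERS = {
    "search": {"alice"},
    "search/indexing": {"bob"},
    "maps": {"carol"},
}


def owners_for(path: str) -> set[str]:
    """Return the owners of the nearest enclosing directory that has an entry.

    Assumption: ownership is inherited from the closest parent directory.
    """
    for parent in PurePosixPath(path).parents:
        owners = OWNERS.get(str(parent))
        if owners:
            return owners
    return set()


def can_approve(user: str, changed_files: list[str]) -> bool:
    """A user can approve a change only if they own every file it touches."""
    return all(user in owners_for(f) for f in changed_files)
```

For example, `owners_for("search/indexing/crawler.py")` resolves to the owners of `search/indexing`, while a file directly under `search/ranking` falls back to the owners of `search`.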

2.2 Permission Control
Piper supports file-level permission control. 99% of the code is visible to all users; only a few important configuration files and confidential business-critical files have restricted access.

If confidential information is accidentally placed in Piper, the files can be purged quickly. In addition, all reads and writes are logged, so administrators can find out who has read a file.
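
A minimal sketch of these three ideas (per-file access lists, an audit log of reads, and purging) might look like the following. The `Repo` class, its field names, and the log format are hypothetical and chosen only for illustration; the paper does not describe Piper's internal API.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy repository with file-level read restrictions and an audit log."""

    files: dict[str, str] = field(default_factory=dict)
    # path -> users allowed to read it; paths not listed are visible to everyone
    restricted: dict[str, set[str]] = field(default_factory=dict)
    # (user, action, path) entries, appended on every access
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def read(self, user: str, path: str) -> str:
        allowed = self.restricted.get(path)
        if allowed is not None and user not in allowed:
            self.audit_log.append((user, "read-denied", path))
            raise PermissionError(f"{user} may not read {path}")
        content = self.files[path]  # raises KeyError if the file is missing
        self.audit_log.append((user, "read", path))
        return content

    def readers_of(self, path: str) -> set[str]:
        """Answer the administrator's question: who has read this file?"""
        return {u for u, action, p in self.audit_log
                if action == "read" and p == path}

    def purge(self, admin: str, path: str) -> None:
        """Remove a file that should never have been committed."""
        self.files.pop(path, None)
        self.audit_log.append((admin, "purge", path))
```

Because every read is logged before the content is returned, `readers_of` can still report who saw a file after it has been purged.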

2.3 Workflow
The workflow of Piper is shown in the figure below.



The developer first creates a local copy of the files, which is called a "workspace". After development is complete, a snapshot of the workspace is shared with other developers for code review. Only after passing the review can the code be merged into the central repository.
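
The review gate in this workflow can be modeled roughly as follows. The `Change` class and the `commit` function are hypothetical names used for this sketch; they only show that a snapshot cannot reach the central repository before it has been approved.

```python
class Change:
    """A snapshot of a workspace, waiting for code review."""

    def __init__(self, author: str, workspace_files: dict[str, str]):
        self.author = author
        # Copy the files so later edits in the workspace do not alter the snapshot.
        self.files = dict(workspace_files)
        self.approved_by = None

    def approve(self, reviewer: str) -> None:
        self.approved_by = reviewer


def commit(repo: dict[str, str], change: Change) -> None:
    """Merge a reviewed change into the central repository."""
    if change.approved_by is None:
        raise RuntimeError("change has not passed code review")
    repo.update(change.files)
```

Calling `commit` on an unapproved `Change` raises an error, and the repository stays as it was.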

2.4 Client
Most developers access Piper through a client called CitC.

Developers can browse and synchronize files in Piper through CitC, but they edit them in their own workspace, which stores only the changed files (generally no more than 10 files per workspace). CitC is backed by cloud storage, and each workspace is a directory in the cloud. These files are not merged from CitC into Piper until they pass code review.
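
The idea that a workspace stores only the changed files can be illustrated with an overlay: reads fall through to the shared repository unless the file has been modified in the workspace. This is a simplified sketch under that assumption, and the `Workspace` class is a hypothetical name, not CitC's real interface.

```python
class Workspace:
    """Overlay on top of a shared repository; only modified files are stored."""

    def __init__(self, repo: dict[str, str]):
        self._repo = repo  # shared view of the central repository, never written to
        self._changed: dict[str, str] = {}  # the only data the workspace owns

    def read(self, path: str) -> str:
        # Prefer the workspace's own version, otherwise fall through to the repo.
        if path in self._changed:
            return self._changed[path]
        return self._repo[path]

    def write(self, path: str, content: str) -> None:
        self._changed[path] = content

    def changed_files(self) -> list[str]:
        return sorted(self._changed)
```

With this layout a developer can browse a billion files while the workspace itself holds only the handful that were edited.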

Working without CitC is also allowed: all code is kept locally and is eventually submitted to Piper using a Git client. However, because CitC provides more features, its current usage rate is 80%.

2.5 Trunk development
Google adopts "trunk-based development". Code is generally committed to the head of the trunk. This ensures that all users see the same, latest version of the code.

"Trunk-based development" avoids the hassle of merging branches. Google generally does not develop on branches; branches are used only for releases. Most of the time, a release branch is a snapshot of the trunk at some point in time. Later bug fixes and feature enhancements are committed to the trunk and cherry-picked to the release branch if necessary. Development branches that run parallel to the trunk for a long time are very rare at Google.

Since it does not develop on branches, Google generally introduces new features behind switches in the code. This avoids creating another branch, makes it easy to toggle features through configuration, and makes it easy to switch back to the old behavior if the new feature fails. The old code is removed completely only after the new feature is stable. Google has routing algorithms similar to A/B testing for evaluating the performance of code, and these are easy to implement thanks to the configuration switches.
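
A configuration switch of this kind might look like the sketch below. It is not Google's implementation: the `FLAGS` table, the flag name `new_ranking`, and the hash-based percentage rollout are assumptions chosen to show how one switch can serve both as an on/off toggle and as a simple A/B-style router.

```python
import hashlib

# Hypothetical configuration: flag name -> percentage of users on the new code.
FLAGS = {"new_ranking": 20}


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, flags: dict[str, int] = FLAGS) -> bool:
    """A flag is on for a user if the user's bucket is below the rollout percentage."""
    return bucket(flag, user_id) < flags.get(flag, 0)


def rank(results: list[int], user_id: str, flags: dict[str, int] = FLAGS) -> list[int]:
    if is_enabled("new_ranking", user_id, flags):
        return sorted(results, reverse=True)  # new code path, behind the switch
    return sorted(results)  # old code path, kept until the new one is stable
```

Setting the percentage to 0 switches everyone back to the old code without a new release, and setting it to 100 turns the new feature on for all users. Once the new path is stable, the flag and the old branch of the `if` can be deleted.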
