TiDB Source Code Reading Series (1): Preface

At TiDB DevCon2018, we announced the TiDB source code reading and sharing activity, and promised to publish a series of articles and videos to help everyone understand the TiDB source code. Many of you have been asking when it would begin, but we have been tied up with the development of new versions. During the Spring Festival holiday, I finally found time to start writing this series.

**Why are we doing this?** As the TiDB project has grown and its code has become more complex, we found that it was getting harder and harder for new colleagues to start modifying the code. So we came up with the idea of internal training: by recording videos and writing tutorials, we could help new colleagues get up to speed faster. After a few sessions, we found that it worked well. Not only did newcomers benefit, but long-time team members also learned about modules they had not been familiar with before. We realized that the open source community faces the same problem and could benefit from the same work, so we decided to scale this activity up and open it to everyone. That is how this series came about.

As an open source project, TiDB has received extensive attention from the community throughout its development. Many people are trying out TiDB or running it in production, and they have given us many good suggestions and much feedback that help us make the project better. What holds for project development also holds for research on database technology: we very much hope to exchange ideas with database researchers and enthusiasts. We have organized nearly 100 technical meetups and talks in the past two years, and in those conversations we found that the level of domestic database expertise is very high and that these exchanges always spark new ideas. Through this activity, we hope to have more in-depth exchanges with you, and to let TiDB "meet you openly" by reading its source code together.

Foreword

The best way to learn a kind of system is to read some classic literature and study an open source project, and databases are no exception. In the single-node database field there are many good open source projects, of which MySQL and PostgreSQL are the two best known, and many people have read the code of both. When we first started building a database, we also read a lot of MySQL and PostgreSQL code and benefited greatly from it. For distributed databases, however, there are not many good open source projects. Some well-known systems are not open source, such as F1/Spanner. Others are poorly maintained or have gone from open source to closed source, such as FoundationDB, which was closed after being acquired by Apple (fortunately, I cloned a copy of the code early on :), see here). We have also organized some talks, both internal and external, on reading the code of open source systems, but they were not systematic.

TiDB is currently receiving a lot of attention, especially from technology enthusiasts who hope to participate in the project. Because the whole system is complex, many people find it hard to understand the project well. Through this series of articles, we hope to describe the technical principles and implementation details of TiDB from top to bottom and from the basics to the details, and to help you master the project.

Background knowledge

This series of articles will focus on TiDB itself. Readers need to have some basic knowledge, including but not limited to:

  • The Go language: you do not need to be proficient, but you should at least be able to read the code and know how goroutines, channels, and the sync package are used

  • Database basics: understand which functions and components make up a single-node database

  • Basic knowledge of SQL: basic DDL and DML statements, and the basics of transactions

  • Basic knowledge of backend services: how to start a background process, how RPC works, and some general knowledge of networking and operating systems

In short, readers need basic database knowledge and the ability to read Go programs. I believe this is not a problem for most readers.
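As a rough gauge of the Go level assumed above, the following minimal sketch combines a goroutine per input, a channel for results, and a sync.WaitGroup. It is not taken from the TiDB code base; the function name sumSquares is made up for illustration. If this reads comfortably, the Go prerequisite is probably met.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares is a hypothetical helper (not from TiDB). It starts one
// goroutine per input, collects the squared values over a channel,
// and returns their sum.
func sumSquares(nums []int) int {
	results := make(chan int, len(nums)) // buffered, so senders never block
	var wg sync.WaitGroup

	for _, n := range nums {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			results <- v * v
		}(n)
	}

	wg.Wait()      // every goroutine has sent its result
	close(results) // closing lets the range loop below terminate

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints 30
}
```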

In addition to the general knowledge above, I also hope that readers will read the three articles I wrote earlier (on storage, computation, and scheduling) to understand some of the basic principles of TiDB.

What can readers gain

What can you gain from this series of articles? First, by understanding the basic principles of TiDB, you will understand the basic principles of a relational database. Second, by reading the TiDB code, you will see how a database is actually implemented and how the principles described in textbooks are put into practice. Third, knowing how a database's implementation affects its behavior helps you understand why the database behaves as it does; this carries over to other databases, and I believe it will help readers use them well too. Fourth, you can see how a large distributed system is designed, built, and optimized. Finally, once you understand the TiDB code, you can refer to it or reuse it whenever later work calls for it. Some companies already use TiDB modules, such as the Parser, in their own products.

Overview

First, let us clarify a term. Generally, when we say TiDB we mean the entire distributed database, which consists of three components: tidb-server, pd-server, and tikv-server. The whole project is fairly complex and involves two programming languages (Go and Rust), but to understand the database-related parts you only need to read the code of tidb-server. Even the computation logic that runs on tikv-server can be found in the tidb-server code: in the tidb-server code directory there is a component called mock-tikv, which uses local storage to simulate the behavior of tikv-server. There you can find much of the same logic as in tikv-server, especially the logic of the Coprocessor module, because the logic on tikv-server was ported from mock-tikv. This series of articles therefore mainly covers the code of tidb-server. Unless otherwise specified, TiDB in these articles refers to tidb-server.

Following the components of a database and the usual flow of SQL processing, this series of articles will explain the Protocol layer and important modules such as the Parser, Preprocess, Optimizer, Executor, and Storage Engine. The series is divided into two parts. The first half consists of the following four articles:

  • The first article introduces the overall architecture: which modules TiDB has, what each one does, where to start reading, which parts can be skipped, and which need to be read carefully.

  • The second article follows the SQL processing flow: where the entry point is, which operations are performed, and where a SQL statement comes in, where it is processed, and where the result is returned.

  • The third article starts from the code itself and introduces how to go about understanding the code of a module.

  • The fourth article walks through an example of how to make TiDB support a new syntax.

I hope that after reading this part, you will have a basic grounding in TiDB and understand the general flow, so that when you run into a problem or want to add a new feature to TiDB, you will know where to start.
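To make the module sequence named above concrete, here is a toy sketch of a statement passing through parse, preprocess, optimize, and execute stages. It is only an illustration under heavy assumptions: every type and function name (stmt, plan, store, runSQL, and so on) is hypothetical, only "SELECT * FROM <table>" is handled, and none of it reflects the actual interfaces in the TiDB code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// All names below are hypothetical stand-ins, not TiDB types.
type stmt struct{ table string }    // result of parsing
type plan struct{ op, table string } // result of optimization
type store map[string][]string      // toy storage: table name -> rows

// parse accepts only "SELECT * FROM <table>".
func parse(sql string) (stmt, error) {
	f := strings.Fields(sql)
	if len(f) != 4 || !strings.EqualFold(f[0], "SELECT") ||
		f[1] != "*" || !strings.EqualFold(f[2], "FROM") {
		return stmt{}, errors.New("parse: only SELECT * FROM <table> is supported")
	}
	return stmt{table: f[3]}, nil
}

// preprocess validates the statement, here only that the table exists.
func preprocess(s stmt, st store) error {
	if _, ok := st[s.table]; !ok {
		return fmt.Errorf("preprocess: unknown table %q", s.table)
	}
	return nil
}

// optimize picks a plan; this toy always chooses a full table scan.
func optimize(s stmt) plan { return plan{op: "TableScan", table: s.table} }

// execute runs the plan against the toy storage.
func execute(p plan, st store) []string { return st[p.table] }

// runSQL chains the stages in the order the text lists them.
func runSQL(sql string, st store) ([]string, error) {
	s, err := parse(sql)
	if err != nil {
		return nil, err
	}
	if err := preprocess(s, st); err != nil {
		return nil, err
	}
	return execute(optimize(s), st), nil
}

func main() {
	st := store{"t": {"row1", "row2"}}
	rows, err := runSQL("SELECT * FROM t", st)
	fmt.Println(rows, err)
}
```

Each stage takes the output of the previous one, so an error raised early (a syntax error, an unknown table) stops the statement before any plan is built or executed.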

The second half will go deeper and cover each important module of TiDB, including the detailed implementation of the optimizer, how logical and physical optimization are done, the implementation of important physical operators, and so on. I hope that after reading it you will have a deep understanding of TiDB and be able to read all of its code. This part will contain many more articles than the first half; the exact number has not yet been decided.

This series of articles will also serve as PingCAP's internal training material, and we hope the community will benefit from it as well. All articles will be published on PingCAP's WeChat public account (WeChat: pingcap2015), Zhihu column, and PingCAP's official blog. You are welcome to follow us through these channels.

Beyond the article

In addition to this series of articles, we also plan to make our internal training videos public. So far, the internal source code walkthrough has been held four times. The format is that a colleague spends a week studying a module he or she is not familiar with, and then spends an hour explaining it to other colleagues. The purpose is for everyone to get to know all the modules. This training will continue, each session will be recorded, and we plan to edit and organize the videos before releasing them. In the near future, we will invite some community contributors to preview them, make adjustments based on their feedback, and then open the videos to the entire community.

Schedule

We have only just started writing this series, and so far there is only a rough plan. We will do our best to release each article on schedule. The articles in the first half will be released before mid-March, and the articles in the second half will be released gradually after that.

As for the videos, the timing depends on the progress of editing and of the preview testing, and we will announce it in advance.

Some expectations

We have no experience in writing a series of tutorials. As the articles are released, we hope to receive feedback from readers that helps us keep improving this work, so that together we can do it well. Throughout the activity, we will pay close attention to feedback and make adjustments at any time.

We also hope that like-minded people will take part in the development of TiDB, through the open source community or even in person :).

Finally, the purpose of this series of articles is to help readers better understand the TiDB source code, not to replace the process of reading it. I hope readers will consult these articles while reading the source code, rather than reading only the articles and never the code. Remember: "What is learned on paper always feels shallow; to truly know a thing, you must submit a PR."

Author: Shen Li
