1024 Programmer's Day | I build self-developed databases at Tencent, and I speak for technology

As the Internet has grown, the word "programmer" has drawn more and more attention, and the labels attached to it have multiplied: World changer? Professional debugger? Plaid-shirt spokesperson? In fact, there is a group of people, Tencent's database engineers among them, who define themselves as "digital craftsmen": they use code to build products, solve problems, and support the development of domestic databases. For us, thanks to their efforts, the digital world is no longer a mass of intangible data but something as deep and steady as an iceberg floating on the sea.

To mark "1024" Programmer's Day, we put together this special feature and invited six Tencent database engineers to talk about how they see code and technology:

01

"I am Lei Hailin, head of Tencent Cloud database technology. I joined Tencent after graduating from university in 2007. I have been responsible for developing various underlying billing and payment modules, including the distributed cache system "Hold", as well as the research and development of Tencent's financial-grade, secure and controllable distributed database. I am at Tencent, helping to advance domestic databases."

"I am Lai Zheng, a senior engineer at Tencent Cloud Database. I joined Tencent in 2018. I previously worked on the official MySQL database team, and I am now responsible for Tencent Cloud database kernel development. I am at Tencent, contributing to the development of domestic databases."

"I am Chen Furong, an expert database engineer at Tencent IEG. I joined Tencent in 2011 and have worked on the TenDB Cluster and Tendis projects. I am now on the Tencent IEG CROS team, responsible for developing cloud storage for Tencent Games. I am at Tencent, contributing to the development of domestic databases."

"I am Chen Zaini, a senior engineer at Tencent Cloud Database. I joined Tencent in 2019 and work on projects such as multi-active data, Oracle compatibility, and read/write splitting. I am at Tencent, contributing to the development of domestic databases."

"I am Zhang Fengxiao, a database engineer at Tencent Cloud. I joined Tencent in 2019 and work on the design and development of the multi-source synchronization and data verification modules. I am at Tencent, helping to advance domestic databases."

"I am Chen Songwei, a senior engineer at Tencent Cloud Database. I joined Tencent in 2018 and work on cloud database kernel development. The features I have built include enterprise-grade column encryption, a data recovery tool, asynchronous auditing, and data preheating. I am at Tencent, helping to advance domestic databases."

02

1. Why did you choose to work on low-level database development?

Lei Hailin: Personally, I like working with computers and solving problems through code. Generally speaking, the lower the level of the system software, the greater the technical challenge. In the database field there have always been unlimited possibilities in performance optimization, high availability, scalability, and data consistency. There is room for all kinds of technical experiments and innovative exploration, which can in turn drive breakthroughs across the wider technology ecosystem.

Lai Zheng: As foundational system software, the database sits at the core of many application systems. It draws on a wide range of fields, including operating systems, transaction processing, and concurrency, and can fairly be called a jewel of the software world and a crystallization of human ingenuity. Working on R&D in this field gives you a sense of mission, and even a small achievement brings great satisfaction. In an era of exploding data volumes especially, data storage and management has become an increasingly key technology in computing, and it is a privilege to be part of such a fast-moving tide.

Chen Furong: I would say I was fairly lucky. I studied and researched databases in graduate school, and my first job was building a domestic database alongside my advisor, which gave me some experience. Later I joined the DBA team at Tencent Games. So, counting from my student days, I have been doing database-related development for more than ten years. I feel very fortunate to have been able to keep doing what I enjoy.

Chen Zaini: When I first did back-end development on a business system, the data was stored in a database, and I discovered that many large chunks of business logic could be replaced by a single SQL statement. That sparked a strong interest in databases and drew me into the field. Once inside, I found that database internals really are complex and especially challenging to work on, but finishing something brings a correspondingly greater sense of accomplishment, and that is what keeps me going.

Zhang Fengxiao: I first got into technology by writing Java web applications, and I spent more than a year on small projects at school. Early on, just using frameworks felt a little dull and not deep enough. Later, through the community, I came across many technologies I had never touched before and became interested in how some of the basic components are implemented underneath. During my internship at Tencent I investigated some new database technologies and grew more interested still. In my interview I told the interviewer that I wanted to work on databases, and to my surprise that is exactly what I ended up doing.

Chen Songwei: The database is one of the three major categories of system software. It involves many modules and is a deep field well worth exploring, so I feel honored to work on low-level database R&D. In our team in particular there are many experts who have spent years on database kernels. They share whatever they know without holding back, which has helped me grow quickly and made me even more convinced that choosing database kernel R&D was the right decision.

2. As a programmer, what are the three most fulfilling projects or things you have ever done?

Lei Hailin: I think the most fulfilling things are the technical pursuit of polishing a component or product, solving a hard bug, and every leap forward in performance:

a) For example, wrapping zkapi so that it is more convenient for everyone to use and hides details that are hard to handle, or implementing a largely lock-free memory pool component that resolved occasional glitches;

b) For example, spending more than a week constructing billions of requests to track down a hard-to-reproduce data consistency bug;

c) Being responsible for the research and development of Tencent's domestic distributed database, which supports the demand for distributed databases across industries.

Lai Zheng: a) Implementing transparent encryption in the InnoDB storage engine;

b) Implementing an R-tree-based spatial index in the InnoDB storage engine;

c) Optimizing hotspot updates, which greatly improved system performance in flash-sale scenarios.

Chen Furong: As a programmer, the most fulfilling thing is building core features or optimizations that actually ship and make a difference to the business. Here are three examples:

The first is the ability to add columns online in the database. This was the first fairly large feature I built after joining Tencent, and it required optimizing InnoDB's underlying storage format. It was technically challenging at the time, but once it was done, the benefit of reduced service downtime was especially obvious. When the first demo was working and the first business went live on it (I believe it was the game Dou Zhan Shen), there was a real sense of accomplishment.

The second is the development of TenDB Cluster, IEG's distributed database solution. It solved the problem that the original database could not scale horizontally, and it arrived just as mobile games took off. Around 2015, when the first business switched from a standalone database to TenDB Cluster, a colleague and I stayed up until two or three in the morning, and in the end the switch went smoothly. It was late, but I was still very excited.

The third is the commercialization of TendisX hot/cold hybrid storage, which is now available to external customers through Tencent Cloud.

Chen Zaini: a) Developing the multi-center, multi-active database module, which ensures high availability for enterprise databases and plays an important role in keeping customers' business systems running efficiently and stably 7×24 without interruption;

b) Developing Oracle-compatible features, which helped the Oracle-compatible version of the database product launch smoothly and greatly strengthened Tencent Cloud Database's position in helping industries move to domestic technology;

c) Completing the open-sourcing of Tencent's self-developed distributed HTAP database.

Zhang Fengxiao: a) Implementing the data verification module for heterogeneous data migration and synchronization, which solves the problem of verifying consistency during data migration;

b) Further improving heterogeneous multi-source database synchronization, which made the product easier to use;

c) Most important of all: helping a girl with her computer science homework in college, who eventually became my girlfriend.

Chen Songwei: a) Creating an original data recovery tool, the only one of its kind in the industry, which can recover user data from damaged table files to keep data safe;

b) Designing and implementing asynchronous auditing, which reduced the performance overhead of auditing to 3%, far ahead of the industry;

c) Creating an original feature that preheats data on the standby before a primary/standby switchover.

3. What do you think are the most important non-technical qualities for programmers?


Lei Hailin: The drive to get to the root of things. For example, if a program shows a rare anomaly, there must be a problem at the code level. We have to do everything we can to find and fix it, and we cannot ignore it just because it happens only occasionally.

Lai Zheng: Stay curious.

 

Chen Furong: First, a spirit of inquiry. If a technical problem is part of your main work, or lies on the critical path of something you need to solve, you must understand it completely. With low-level technology, the more underlying problems you study, the more you find that their solutions are actually similar. Gradually you build up your own methodology, so never let a problem or a bug slide.

Second, take responsibility for the final result, rather than being satisfied once the feature is coded. This is particularly important. The goal of a task is not only that the feature runs, but that it really solves the business's problems and pain points. It is not enough for developers to see themselves merely as people who write code.

Third, it helps to be a little obsessive about clean code, which will lead you to write code with better style.

Chen Zaini: Rigor, just as everything in the computer world is either 0 or 1; and a serious, responsible attitude that makes you someone others can trust.

Zhang Fengxiao: I think it is a love for, and pursuit of, the things that interest you. Technology is like anything else: you have to be interested in it and keep pursuing it to do it well.

Chen Songwei: I think you need an "empty cup" mentality.

4. What special abilities do great programmers have?

Lei Hailin: a) Learning ability. Any individual is small, so you must keep learning with an open mind, reading books, articles, and papers, and mastering more fundamental principles;

b) A love of reading good code from the open source community, and improving your own programming by learning from other people's code;

c) Confidence in yourself: not compromising when you run into problems, holding yourself to high standards, and enjoying the technical challenges that come up in your work.

Chen Furong: The first is learning ability. New technologies keep emerging in this field, and without a good ability to learn it is easy to feel out of your depth. Of course, if you have a solid grounding in basic computer science theory, you can learn one thing and apply it to others by analogy.

The second is the ability to handle pressure. Production bugs are inevitable, and when an online failure happens the pressure is bound to be high. But the most important thing at that moment is to restore the business first, so you have to withstand the pressure, keep a clear head, and find the most efficient solution.

The third is the ability to manage your own mindset. Great programmers come across as full of energy. Beyond loving their work, that also takes adjusting your mindset and finding suitable ways to relieve stress, exercise included.

Chen Zaini: a) Geek spirit: staying curious about unfamiliar technology and continuing to learn;

b) The ability to see through the surface of a problem to its root cause;

c) An interesting soul: code comments that feel like a fresh morning breeze, such as this one that gets a module going:

/* Do the modulemagic dance */
PG_MODULE_MAGIC;

Lai Zheng: Careful thinking and strong logic.

Zhang Fengxiao: Careful, meticulous, and comprehensive thinking, plus the habit of digging into a problem until the root cause is found.

Chen Songwei: Strong logic, strong creativity, rigorous thinking and good communication skills.

5. What are your views or suggestions on the future development trend of databases?

Lei Hailin: Databases will certainly keep moving toward distributed architectures. Overall, they will continue to develop toward elastic scaling, cross-region distributed scheduling, availability of six nines (99.9999%) or better, HTAP integration, intelligent SQL diagnosis and optimization, and extreme performance, and in the end return to the essence of a database: once a business has the domain name of a database, the database is a black box that serves SQL reads and writes with extreme performance, and there is no need to worry about details such as capacity, SQL tuning, or disaster recovery.

Lai Zheng: Databases will move toward the cloud, and cloud-native databases will become mainstream. Elastic scaling, TP+AP, and massive data volumes are where cloud-native databases will show their full advantage.

Chen Furong: 1) Distributed. Future databases will mainly use distributed architectures. Whether shared-nothing or shared-disk, they solve the database capacity problem better and make it easy to scale out and in elastically;

2) Software-hardware co-design. Future databases will be designed together with the hardware to get the most out of it and meet the needs of enterprise users, such as faster response times, higher security, larger capacity, and lower cost;

3) Intelligent. Future databases will combine their own runtime state with AI capabilities to make database management smarter, including fault diagnosis, fault prediction, automatic scaling, and better execution plans.

Chen Zaini: Future databases will certainly be reworked around new hardware. Traditional databases, for example, were designed on the assumption of unreliable, slow storage (WAL, REDO, UNDO, DO, CHECKPOINT) and around the CPU speeds of the day (32-bit transaction IDs, 64-bit transaction IDs), but many of these ideas will be overturned as hardware advances rapidly. Examples include cloud-native databases (close to a design that assumes reliable storage), cloud-native in-memory databases (all data is held in memory, so the main problems to solve are in the network: RDMA, DPDK, SPDK, etc.), and quantum databases. These are the future of databases, built on new hardware and new theory.

 

Zhang Fengxiao: On the one hand, the industry has already done a great deal of fairly thorough research on database architecture and storage structures, while changes in storage media and other hardware may bring major changes to storage architecture design. So it is worth paying more attention to the changes that new storage products bring.

On the other hand, I agree with the view that performance is not the only muscle that matters. Stability, productization, and the operations system are the most prominent challenges facing domestic databases. For a database to develop well, it is essential to build an excellent ecosystem and to do productization well.

Chen Songwei: The future of databases is cloud native. In the industrial Internet of the future, a database's elastic scaling, self-diagnosis, rapid operations and maintenance, and personalized service capabilities will all be very important.

 

6. Can you recommend a short code snippet that you especially admire?

Lei Hailin: The list.h component of the Linux kernel is simple in its implementation and highly general-purpose:

/*
 * Insert a new entry between two known consecutive entries.
 *
 * This is only for internal list manipulation where we know
 * the prev/next entries already!
 */
#ifndef CONFIG_DEBUG_LIST
static inline void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next)
{
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}
#else
extern void __list_add(struct list_head *new,
                       struct list_head *prev,
                       struct list_head *next);
#endif

/**
 * list_add - add a new entry
 * @new: new entry to be added
 * @head: list head to add it after
 *
 * Insert a new entry after the specified head.
 * This is good for implementing stacks.
 */
static inline void list_add(struct list_head *new, struct list_head *head)
{
    __list_add(new, head, head->next);
}
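
As a rough illustration of that versatility, the sketch below re-creates the idea in plain userspace C rather than using the kernel header itself. The list node is embedded inside whatever struct needs to be linked, so one insertion routine serves every payload type. The names list_init, list_push, struct job, job_of, and top_after_two_pushes are hypothetical and chosen only for this example; the four pointer assignments are the part that mirrors __list_add.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty circular list points at itself in both directions. */
void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Same four assignments as __list_add, inserting right after head. */
void list_push(struct list_head *entry, struct list_head *head)
{
    struct list_head *next = head->next;

    next->prev = entry;
    entry->next = next;
    entry->prev = head;
    head->next = entry;
}

/* Hypothetical payload: the list node lives inside the struct. */
struct job {
    int id;
    struct list_head link;
};

/* Recover the enclosing job from its embedded node (the container_of idea). */
struct job *job_of(struct list_head *node)
{
    return (struct job *)((char *)node - offsetof(struct job, link));
}

/* Pushing job 1 and then job 2 leaves job 2 on top: stack behaviour. */
int top_after_two_pushes(void)
{
    struct list_head stack;
    struct job a = { .id = 1 }, b = { .id = 2 };

    list_init(&stack);
    list_push(&a.link, &stack);
    list_push(&b.link, &stack);
    assert(stack.prev == &a.link);   /* the oldest entry sits at the tail */
    return job_of(stack.next)->id;
}
```

Because the list code only ever touches struct list_head, the same two functions would work unchanged for any other struct that embeds a node, which is presumably the generality being praised here.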

Lai Zheng: The byte-order conversion functions in the storage engine. The code is simple, clear, and very elegant.
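
The interview does not quote that code, so the following is only a sketch of what byte-order conversion helpers of this kind commonly look like, not the engine's actual source. It stores a 32-bit integer most significant byte first, so the stored bytes are the same on little-endian and big-endian hosts. The names store_be32, load_be32, and be32_round_trip are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Write n into buf[0..3], most significant byte first. */
void store_be32(unsigned char *buf, uint32_t n)
{
    buf[0] = (unsigned char)(n >> 24);
    buf[1] = (unsigned char)(n >> 16);
    buf[2] = (unsigned char)(n >> 8);
    buf[3] = (unsigned char)n;
}

/* Read four bytes written by store_be32 back into a host integer. */
uint32_t load_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}

/* Round trip: the value survives whatever the host byte order is. */
uint32_t be32_round_trip(uint32_t n)
{
    unsigned char buf[4];

    store_be32(buf, n);
    assert(buf[0] == (unsigned char)(n >> 24));
    return load_be32(buf);
}
```

Using shifts instead of casting the buffer to an integer pointer keeps the helpers independent of both host endianness and alignment, which is one reason such functions can stay so short.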

 

Chen Zaini:

/*
 * TransactionIdPrecedes --- is id1 logically < id2?
 */
bool
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
    /*
     * If either ID is a permanent XID then we can just do unsigned
     * comparison.  If both are normal, do a modulo-2^32 comparison.
     */
    int32       diff;

    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return (id1 < id2);

    diff = (int32) (id1 - id2);
    return (diff < 0);
}

At first glance this code looks unremarkable, but seen together with PG's transaction ID wraparound, in which the IDs form a ring, the algorithm is very cleverly designed and is exactly what the ring requires. As long as two IDs are less than half a ring apart, it uses two's-complement arithmetic to compare unsigned transaction IDs correctly.
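
A small worked example may make the half-ring rule concrete. The sketch below is standalone C, not PostgreSQL source: xid_precedes is a hypothetical stand-in for TransactionIdPrecedes, and FIRST_NORMAL_XID (set to 3 here) is an assumed boundary below which IDs are treated as permanent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIRST_NORMAL_XID 3u   /* assumption: smaller IDs are "permanent" */

/* Returns true when id1 is logically older than id2 on the 32-bit ring. */
bool xid_precedes(uint32_t id1, uint32_t id2)
{
    /* Permanent IDs are compared as plain unsigned numbers. */
    if (id1 < FIRST_NORMAL_XID || id2 < FIRST_NORMAL_XID)
        return id1 < id2;

    /*
     * Unsigned subtraction wraps modulo 2^32. Reinterpreting the result
     * as signed makes it negative when id1 is behind id2 by less than
     * half the ring.
     */
    int32_t diff = (int32_t)(id1 - id2);
    return diff < 0;
}

/* Hypothetical examples, including one pair that straddles the wrap. */
void xid_precedes_examples(void)
{
    assert(xid_precedes(100u, 200u));
    assert(!xid_precedes(200u, 100u));

    /* 4294967290 is 6 short of wrapping; 10 comes 16 steps after it. */
    assert(xid_precedes(4294967290u, 10u));
    assert(!xid_precedes(10u, 4294967290u));
}
```

In the wrapped case a plain unsigned comparison would say that 10 is older than 4294967290, whereas the signed difference is -16 and gives the intended answer. The result is only meaningful while the two IDs are less than 2^31 apart, and the cast relies on the two's-complement conversion that mainstream compilers perform.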

 

Zhang Fengxiao: The simplest database in the world can be implemented with just two bash functions. This is taken from the book "Designing Data-Intensive Applications", which I highly recommend reading.

#!/bin/bash

db_set () {
    echo "$1,$2" >> database
}

db_get () {
    grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}

Chen Songwei: I would like to recommend a page validation function implemented in a storage engine: btr_validate_level. It checks every possible condition one by one, including the state of the page within its extent, validation of the page itself, the pointers between adjacent pages, the pointers between parent and child nodes, and the pointer ring formed by parent and child nodes and their neighbors. The code is very rigorous, and it taught me a great deal.

Chen Furong: Short code snippets do not mean very much. However good a snippet is, it cannot deliver a service on its own. Compared with code fragments, the design and implementation of a complete system matters more. For C, I recommend Redis; for C++, LevelDB or RocksDB. Reading the code of mature open source software carefully does a great deal for your own growth.

Origin blog.csdn.net/Tencent_TEG/article/details/109252233