MongoDB kernel source code implementation, performance tuning, and best operation and maintenance practices series: lessons from reading millions of lines of MongoDB kernel source code

About the author

Former technical expert at Didi Chuxing, currently in charge of the MongoDB document database at OPPO, responsible for the R&D and operation and maintenance of OPPO's MongoDB deployment, which handles tens of millions of peak TPS and a data volume of 10 trillion. The author has long focused on research and development in distributed caching, high-performance servers, databases, middleware, and related areas, and will continue to share articles on "MongoDB kernel source code design, performance optimization, best operation and maintenance practices". GitHub account: https://github.com/y123456yz

Preamble

The MongoDB kernel source code consists of the third-party libraries under third_party and the MongoDB service layer source code. Different modules of the service layer depend on different third_party libraries, which form the foundation of the service layer implementation (for example, the underlying network I/O is built on the asio-master library, and the underlying storage relies on the wiredtiger storage engine library). The third-party libraries in turn depend on other libraries (for example, wiredtiger depends on the snappy compression library, and asio-master depends on boost).

Although the MongoDB kernel has millions of lines of source code and is a very large project, the service layer code is clearly layered. The directory structure and the naming of classes, functions, and files make the purpose of each piece clear at a glance, which reflects the professionalism of the 10gen team.

Note: In this article, all MongoDB kernel code other than the third_party libraries is referred to collectively as the MongoDB service layer code.

This article uses the transport module of the MongoDB service layer as an example to show how to read through the whole MongoDB codebase quickly. Before reading the code, we recommend following the guidelines below.

1. Get familiar with the basic functions and usage of MongoDB

First, get familiar with the basic functions of MongoDB: understand what it is used for and where it is used, so that you can appreciate its real value. It also helps to build a MongoDB cluster in advance and experiment with it, which pushes you to understand the common basic functions inside MongoDB. Don't rush. If you don't know what MongoDB does, or have never tried its operation and maintenance procedures, jumping straight into the code is a poor idea: reading code without a purpose makes it hard to analyze the codebase as a whole, and the reading itself will be painful.

2. Download and compile the source code

Once you are familiar with the basic functions of MongoDB and have some experience building a cluster, download the source code from GitHub and compile it into binaries. The build instructions are in docs/building.md in the source tree. The steps are as follows:

1.  Download the source code of the version you want from the corresponding releases

2.  Enter the source directory and install the required dependency tools as described in docs/building.md

3.  Run buildscripts/scons.py to compile the binaries you need, or compile directly with scons mongod mongos.

4.  After a successful build, the generated executables are stored in the ./build/opt/mongo/ directory

While compiling and running the code, I ran into the following two problems:

1.  The compiled binary takes up a lot of space, as shown in the following figure:

As the figure shows, after the binary is processed with the strip tool, its size matches that of the binary in the official package.

2. On some older operating system versions, the binary fails to run because the corresponding libstdc++ library cannot be found, as shown in the following figure:

As the figure shows, when the compiled binary is copied to a production machine, it cannot run because the libstdc++ library cannot be found. The cause is that the libstdc++ version the code was compiled against is newer than the one installed on the target operating system, which makes them incompatible.

Solution: add -static-libstdc++ to the build script so that libstdc++ is linked statically instead of dynamically.

3. Learn how to use the log module and try adding log statements for debugging

Early on, we are not yet familiar with the overall implementation and do not know the call flow of each interface, so adding log statements is a good way to debug. The MongoDB log module is fairly complete: the log shows clearly which functional module printed each line, and the module supports multiple log levels.

1. Setting the log level

The log level is set with the verbose startup parameter, for example: mongod -f ./mongo.conf -vvvv

The more v's you pass, the more verbose the logging and the more logs are printed. Of the LOG(N) levels, a single -v outputs only LOG(1) logs; -vv writes both LOG(1) and LOG(2) logs.
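As a rough illustration of the rule above, the following Python sketch models how the number of v's gates LOG(N) output. It is a toy model, not MongoDB's C++ implementation: the helper names verbosity_from_flag and should_emit are hypothetical, and it assumes that LOG(N) is written when N does not exceed the number of v's and that log() is always written.

```python
def verbosity_from_flag(flag):
    """Return the verbosity implied by a flag such as '-vv' (hypothetical helper)."""
    if not flag.startswith("-") or set(flag[1:]) != {"v"}:
        raise ValueError(f"not a verbosity flag: {flag!r}")
    return len(flag) - 1


def should_emit(level, verbosity):
    """Toy gating rule: log() is treated as level 0, LOG(N) as level N."""
    return level <= verbosity


if __name__ == "__main__":
    for flag in ("-v", "-vv", "-vvvv"):
        v = verbosity_from_flag(flag)
        emitted = [f"LOG({n})" for n in range(1, 6) if should_emit(n, v)]
        print(flag, "->", ["log()"] + emitted)
```

Under this model, starting mongod with -vvvv would write log() and LOG(1) through LOG(4), while LOG(5) would still be suppressed.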

2. How to use the log module to record logs in a .cpp file

If you need to use the log module to print logs in a new .cpp file, perform the following steps:

i) Add the macro definition #define MONGO_LOG_DEFAULT_COMPONENT ::mongo::logger::LogComponent::kExecutor

ii) Use LOG(N) or log() to record the content you want to output. N in LOG(N) is the log level; everything logged with log() is always written to the file.

For example, LogComponent::kExecutor identifies logs from the executor module; see the log module file log_component.cpp for the implementation. The corresponding log file content is as follows:

4. Learn to debug MongoDB code with gdb

gdb is an excellent debugging tool on Linux. It supports setting breakpoints, single-stepping, printing variables, and inspecting function call stacks, and it can attach to a specific thread for thread-level debugging. Because mongod is multi-threaded, we need to determine the number of the thread to debug before starting gdb. The thread numbers in the mongod process and their corresponding thread names can be viewed as follows:
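One possible way to obtain such a listing on Linux is to read the process's entries under /proc. The Python sketch below is only an illustration and not necessarily the tool used in the original article; list_threads is a hypothetical helper, and the proc_root parameter exists only so the function can be pointed at a different directory.

```python
import os
from pathlib import Path


def list_threads(pid, proc_root="/proc"):
    """Return {thread_id: thread_name} for process `pid`, read from procfs."""
    threads = {}
    for task in (Path(proc_root) / str(pid) / "task").iterdir():
        try:
            # task/<tid>/comm holds the name of one thread.
            threads[int(task.name)] = (task / "comm").read_text().strip()
        except FileNotFoundError:
            # The thread exited between the directory scan and the read.
            continue
    return threads


if __name__ == "__main__":
    # Demonstrated on this script's own process; in practice pass the mongod PID.
    if Path("/proc/self/task").is_dir():
        for tid, name in sorted(list_threads(os.getpid()).items()):
            print(tid, name)
```

The IDs returned are kernel thread IDs, which gdb reports as LWP numbers in its thread list, so they can be matched against the thread you want to debug.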

Note: When debugging how a mongod worker thread handles requests, do not select the adaptive dynamic thread pool mode. Under low traffic the worker threads are not saturated, so a thread may be destroyed, which interrupts the debugging session. In synchronous thread mode, each connection has its own thread; as long as we keep the connection open, the thread stays alive, and using this mode does not affect our understanding of the MongoDB service layer logic. When debugging in synchronous mode, you can use the mongo shell to connect to the mongod port to simulate a connection, so the debugging process is fairly controllable.

While debugging a worker thread, I found that gdb could not find the symbol table of the mongod process, so the various gdb functions could not be used, as shown in the following figure:

gdb cannot attach to the specified thread for debugging here because the symbol table of the binary cannot be loaded, which happens when the -g option is not passed at compile time. MongoDB is built with scons using the SConstruct script. To enable gdb debugging, specify the gdbserver option when compiling with scons: scons --gdbserver=GDBSERVER -j 2.

After compiling a new binary, you can debug it with gdb. As shown in the figure below, it is easy to find the call stack leading to a function and to single-step or print variable information:

5. Get familiar with the code directory structure and split modules into submodules

Before reading the code, one more important step is to get familiar with the code directories and file naming. The directory structure and file naming of the MongoDB service layer follow very strict conventions. Take the transport network transmission module as an example; the directory and file structure of the transport module is as follows:

From the file listing above, the source files in the directory can be roughly divided into the following parts:

  1. message_compressor_*: network transmission data compression submodule
  2. service_entry_point*: service entry point submodule
  3. service_executor*: service running submodule, that is, the threading model submodule
  4. service_state_machine*: service state machine processing submodule
  5. session*: session information submodule
  6. ticket*: data distribution submodule
  7. transport_layer*: socket processing and transport layer mode management submodule

With this split, the large transport module is divided into 7 submodules. Each submodule is responsible for its own function, and the submodules connect to one another to implement the overall network transmission processing flow. The following chapters briefly describe the functions of these submodules.
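The prefix-based split above can be sketched as a small Python script that groups file names by submodule. This is only an illustration of the reading technique: classify and group_files are hypothetical helpers, and the entries in sample_files are example names chosen to match the prefixes, not a complete listing of the directory.

```python
from collections import defaultdict

# Prefix -> submodule description, in the order listed above.
SUBMODULES = [
    ("message_compressor", "network data compression"),
    ("service_entry_point", "service entry point"),
    ("service_executor", "service executor (threading model)"),
    ("service_state_machine", "service state machine"),
    ("session", "session information"),
    ("ticket", "data distribution"),
    ("transport_layer", "socket handling and transport layer management"),
]


def classify(filename):
    """Return the submodule a file belongs to, based on its name prefix."""
    for prefix, description in SUBMODULES:
        if filename.startswith(prefix):
            return description
    return "other"


def group_files(filenames):
    """Group file names by submodule, keeping names sorted within each group."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        groups[classify(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Illustrative sample only; in practice pass the real directory listing.
    sample_files = [
        "message_compressor_snappy.cpp",
        "service_entry_point_impl.cpp",
        "service_executor_adaptive.cpp",
        "service_executor_synchronous.cpp",
        "service_state_machine.cpp",
        "session.h",
        "ticket.h",
        "transport_layer_asio.cpp",
    ]
    for submodule, files in group_files(sample_files).items():
        print(f"{submodule}: {', '.join(files)}")
```

Running the same grouping over the output of os.listdir() for the transport directory gives a quick map of which files to read for each submodule.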

6. Start reading the code from the main entry point

After the first 5 steps, we are familiar with compiling and debugging MongoDB, with the code files of each submodule of the transport module, and with the general function of each submodule. At this point we can start reading the code. The code entry points of mongos and mongod are mongoSMain() and mongoDbMain() respectively. Starting from these two entry points, you can work through the overall implementation of the MongoDB service layer step by step.

Note: Don't dig into implementation details during the first pass through the code. First get a general understanding of which submodules implement the functions of each module; the details can wait.

7. Summary

This chapter gives some suggestions for reading the millions of lines of MongoDB kernel code. The whole process can be summarized as follows:

  1. Understand the role and working principles of MongoDB in advance.
  2. Build a cluster yourself and learn the common operation and maintenance tasks for a MongoDB cluster in advance. This deepens your understanding of MongoDB's features and makes later code reading more efficient.
  3. Download the source code and compile the binaries yourself, learn to use the log module, and start debugging step by step by adding log statements.
  4. Learn to use gdb to debug running threads. This speeds up learning how the code processes requests, especially for complex logic, and greatly improves the efficiency of walking through the code.
  5. Before reading the code in earnest, understand the directory structure of each module, split each large module into small submodules, and browse the implementation of each submodule first.
  6. Don't dig into code details early on. Once you know the general function of each module, work into the details step by step to understand the internal implementation.
  7. Read the code step by step starting from the main() entry point, combining log output with gdb debugging.
  8. Skip unfamiliar modules in the overall flow, and read only the implementation of the modules you want to understand this time.

 
