Reading deep learning code efficiently: how to understand code quickly and well

I hesitated for a long time over whether reading code deserved a post of its own. After all, everyone reads code, and I have not read or written enough code myself to tell others how to do it. I decided to be bold and write a little anyway: partly to set a standard for my own code reading in the future, and partly to write down some methods I have not yet fully put into practice, for your reference. If anything here is wrong, please point it out.

For people working in deep learning, the two daily essentials are reading papers and reading code. For how to read papers, please refer to the previous article; everyone has their own views on how to read and which techniques to use, so I won't say much here. Reading code is somewhat like reading papers or books, since all of them are reading. But it is also very different from reading a paper. Code implements the models and algorithms described in a paper, and following it requires your full attention the whole time. And because we are reading the code of a deep learning project rather than something as enormous as the Linux kernel, the nature of the task is somewhat different.

Deep learning project code has to be read in different ways depending on its size, which ranges from a test demo of a few hundred lines to large open source projects with hundreds of thousands of lines or more. The following figure compares the code of the Mask R-CNN project with the PyTorch source code:

 

Mask R-CNN is a classic instance segmentation framework, and the size of its code is quite manageable: about 3k lines, which can be read in a few days after finishing the paper. The PyTorch source code is far less friendly to most readers. It runs to about 750k lines, and the underlying C++ code accounts for as much as half of the project. By the standards of deep learning research, that is a giant. Code of this size is like PRML in the hands of a beginner: you put a lot of energy into it and often have little to show for it. The reading method for these two kinds of project is therefore bound to differ.
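To get a rough size estimate like the one above for a project you have downloaded, a short script is enough. The following is a minimal sketch using only the Python standard library; the function name `count_lines_by_extension` is made up for illustration, and it counts raw lines (including blanks and comments), so its numbers will not match a dedicated line-counting tool exactly.

```python
import os
from collections import Counter


def count_lines_by_extension(root):
    """Return a Counter mapping file extension -> total number of lines under root."""
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata, which is not project source.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            ext = os.path.splitext(name)[1] or "(none)"
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    totals[ext] += sum(1 for _ in f)
            except OSError:
                continue  # unreadable file: skip it
    return totals
```

Calling `count_lines_by_extension("path/to/project").most_common(10)` lists the ten extensions with the most lines, which quickly shows whether a project is mostly Python or mostly C++.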

Because the purpose, the scenario, and the object of code reading all differ, I will discuss how to read the code of a deep learning project from three angles.

Some common ways to read code

Let's start with some general methods. They are not limited to deep learning projects; they apply to any project in any language. In daily practice there are really only two ways to read code. One is to download the code, open it in an IDE, and read it quietly there.

The other is to read directly on GitHub's web interface. GitHub does not show a directory tree on the left the way an editor does, so every time you open a file you have to back out before opening another one, which makes for a poor reading experience. This is easy to fix: Chrome extensions such as Octotree, which can be found and installed from the Chrome extension store, add the missing tree.

After installation, we get an IDE-like reading experience directly on the web:

An IDE-like directory bar now appears on the left side of the page, which makes it much easier to browse and read the project code. That covers tooling; now for some basic reading rules. With an IDE or Octotree at hand, the first step is to look carefully at the code directory to get an overall picture of how the project is structured and where things live. In deep learning projects the top-level modules are usually fairly standard: the code for building and training the model sits under the models directory, the model's configuration files under the config directory, and the data used by the project under the data directory. The directory structure of a semantic segmentation project is as follows:

Once you have read enough projects, the structure of deep learning code becomes familiar, and your reading speed improves accordingly.
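When the code is already on disk, the same overview of the directory layout can be produced in a few lines. This is a minimal sketch, not a replacement for an IDE or Octotree; the function name `render_tree` and the depth limit of two levels are choices made here for illustration.

```python
from pathlib import Path


def render_tree(root, max_depth=2, prefix="", _depth=0):
    """Return a list of text lines showing the directory layout under root."""
    lines = []
    if _depth >= max_depth:
        return lines
    # Directories first, then files, each group sorted by name; hidden entries skipped.
    entries = sorted(
        (p for p in Path(root).iterdir() if not p.name.startswith(".")),
        key=lambda p: (p.is_file(), p.name),
    )
    for entry in entries:
        lines.append(f"{prefix}{entry.name}{'/' if entry.is_dir() else ''}")
        if entry.is_dir():
            lines.extend(render_tree(entry, max_depth, prefix + "    ", _depth + 1))
    return lines
```

Printing `"\n".join(render_tree("path/to/project"))` gives a compact outline in which directories such as models, config, and data stand out at a glance.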

The second general method is to find the README quickly. The README in the root directory usually explains how to use and deploy the code, and it contains the key information you need to understand the project quickly. The following picture shows the README of DenseNet:

Large projects may have a README in each subdirectory. These deserve careful reading, because the authors put the key information there. In any case, reading the README first is a necessary step and a general method for understanding a project.

The third general method concerns how to read in detail: pick a main line to follow. This applies to deep learning project code in general. For a deep learning project, the things we most want to understand are the data, the model, and how the model is trained. If you just want to see the test results of an open source project quickly, read the README to learn how to run it. If the project proposes a new model architecture, such as BERT, and you want to know its details, go straight to the .py files with "model" in their names under the models directory and start reading. If you want to see how the project is trained (which training tricks it uses, how parameters are initialized, how large the batch size is, how the learning rate is adjusted during training, and so on), go straight to the .py files with "train" in their names. The three training files of faster-rcnn are shown in the figure below.

Whichever main line you choose, model or train, reading it will inevitably pull in side branches such as data and configuration. Keep building your understanding of those branches as you follow the main line, and over time you will digest the whole project.
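Finding the entry points for a main line can be automated in a rough way. The sketch below lists the .py files whose names contain a keyword such as "model" or "train"; the function name `find_mainline_files` and the default keywords are assumptions for illustration, and a real project may name its files differently.

```python
from pathlib import Path


def find_mainline_files(root, keywords=("model", "train")):
    """Map each keyword to the sorted .py files under root whose name contains it."""
    root = Path(root)
    hits = {kw: [] for kw in keywords}
    for path in root.rglob("*.py"):
        for kw in keywords:
            if kw in path.name.lower():
                # Store paths relative to the project root, with forward slashes.
                hits[kw].append(path.relative_to(root).as_posix())
    for paths in hits.values():
        paths.sort()
    return hits
```

The result is only a starting list: from there you still read the files and follow their imports into the data and configuration branches.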

The above are some general methods for reading deep learning project code. Now let's look in detail at code reading in two scenarios. People work most efficiently when they have a clear purpose, and reading code is no exception.

The first scenario is running into a problem during research or a project that you don't know how to solve, and for which a Google search turns up no suitable method. In that case it is worth searching GitHub. For example, you may want to know how to weight the loss function when the data is extremely imbalanced, or how to find the best classification threshold for model predictions in a multi-label problem. Problems like these come up regularly in real projects. If you can find a solution to a similar scenario on GitHub, it can be a great relief.
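As an illustration of the first problem, one common way to weight a loss under class imbalance is to give each class a weight inversely proportional to its frequency. The sketch below uses the heuristic n_samples / (n_classes * class_count), the same formula scikit-learn uses for its "balanced" class weights, and applies it to a cross-entropy loss in plain Python. It is one possible approach, not the method of any particular project; the function names are made up for illustration.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of sample weights.

    probs[i] maps class -> predicted probability for sample i.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)
```

With nine samples of class 0 and one of class 1, class 1 receives a weight nine times larger than class 0, so a mistake on the rare class costs as much as mistakes on all the common-class samples combined.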

For example, a Keras-based CNN multi-label classification project uses matthews_corrcoef to choose the best prediction threshold for each label. What matthews_corrcoef actually is belongs to the things you learn and absorb while solving the problem. In short, reading a project's code with a purpose often means reading only one block, or even a few key lines. The amount is small, and it can solve your problem.
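To make the idea concrete, here is a minimal sketch of threshold selection by Matthews correlation coefficient (MCC). It is not the code of the project mentioned above: it computes MCC by hand from the confusion matrix instead of calling scikit-learn's `matthews_corrcoef`, returns 0.0 when the coefficient is undefined (a convention chosen here), and searches a fixed grid of thresholds that is also an assumption.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels (0.0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_thresholds(y_true, y_score, grid=None):
    """Pick, per label, the threshold on the grid that maximizes MCC.

    y_true[i][j] is the 0/1 target and y_score[i][j] the predicted
    probability of label j for sample i.
    """
    grid = grid or [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    n_labels = len(y_true[0])
    result = []
    for j in range(n_labels):
        truth = [row[j] for row in y_true]
        scores = [row[j] for row in y_score]
        # max() keeps the first (lowest) threshold among ties.
        best = max(grid, key=lambda t: mcc(truth, [int(s >= t) for s in scores]))
        result.append(best)
    return result
```

The thresholds should be chosen on a validation set, not on the test set, and then applied label by label at prediction time in place of a single global cutoff such as 0.5.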

The second scenario is reading code for self-improvement. This kind of deliberate practice is how someone with plenty of disposable time for study and research, strong self-discipline, and strong learning ability makes a leap in skill. Although I occasionally go to GitHub to read some code, I am far from that level. Take the 750k lines of PyTorch source mentioned earlier: with that much code, the reading strategy has to be divide and conquer. Break the project down, set reading plans and goals, and with strong execution it can still be done. This is not something ordinary people do, but I believe those who manage to improve in deep learning are not ordinary people.

What should I do if the downloaded code fails to run?

Downloaded code rarely runs on the first try; most of the time it fails because of the local environment or package versions. One approach is to check, item by item, whether your installed versions match the environment specified by the source code. Another is to ask the author directly or to search the project's existing issues: for well-known projects, earlier users have usually already reported the pitfalls they hit along with the fixes. Both approaches are rather tedious.
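The item-by-item version check can be partly automated. The sketch below compares exact pins of the form name==version, as commonly found in a requirements.txt file, against the packages installed in the current environment, using `importlib.metadata` from the standard library (Python 3.8+). The function name `check_requirements` is made up for illustration, and lines with other specifiers (such as >=) or environment markers are outside the scope of this sketch.

```python
from importlib.metadata import PackageNotFoundError, version


def check_requirements(lines):
    """Compare pinned 'name==version' requirements against installed packages.

    Returns a list of (name, required, installed) for every mismatch;
    installed is None when the package is not installed at all.
    """
    problems = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # this sketch only handles exact pins
        name, required = (part.strip() for part in line.split("==", 1))
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != required:
            problems.append((name, required, installed))
    return problems
```

A typical use is `check_requirements(open("requirements.txt").read().splitlines())`; an empty list means every pinned package is installed at the required version.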

The recent popularity of ChatGPT and GPT-4 has made fixing bugs easier. In the familiar example below, GPT-4 not only gives a solution but also explains why the error occurs.

GPT-4 is a large autoregressive language model that learns from a large amount of training data to understand and generate human language and even programming languages. Its ability to fix bugs comes mainly from training on a large number of code bases and related documents. If you are interested, you can try it at https://gpt4test.com, which is accessible from within China without a VPN and is stable. For more complex bugs, you can ask GPT with the following prompt:

The following code (paste code) has a bug (paste error message). Please fix it while staying as close to the original code as possible.

GPT can not only find the cause of the bug but also provide the modified code, which can often be copied and used directly. I believe that once you try it, you will gradually come to rely on GPT when programming. It really is that good!

Reading textbooks, reading papers, and reading code as discussed in this article are all ways of improving your ability to learn and acquiring knowledge. For those of us working in deep learning, there is an endless supply of papers on arXiv and code on GitHub. The key is to keep learning and be a lifelong learner.

To help you read papers and keep up with cutting-edge developments, we have compiled a resource: the industry's first AI full-stack manual, which runs to 3,000 pages and covers the development of large language model technology, the latest trends and applications of AIGC technology, deep learning technology, and other areas of AI.

Follow the WeChat public account "Xi Xiaoyao Technology Talk" and reply "789" to download the materials.

Origin blog.csdn.net/xixiaoyaoww/article/details/131604291