[Open Source and Projects in Practice] 78 | Open Source in Practice (Part 1): Learning from Unix Open Source Development to Handle Large and Complex Projects

The difficulty of software development comes down to two things. The first is technical difficulty: the amount of code is not necessarily large, but the problem itself is hard and requires deep technical solutions or algorithms that only a few specialists can handle, such as autonomous driving, image recognition, or high-performance message queues. The second is complexity: the technology is not especially hard, but the project is huge, the business logic is complicated, the codebase is large, and many people participate in the development, as with logistics or financial systems. The first kind involves specialized domain knowledge and has little to do with the design and coding topics this column covers, so we focus on the second: how to deal with the complexity of software development.

Anyone can write a simple "hello world" program, and a few thousand lines of code can be maintained by almost anyone. But once the code grows to tens of thousands, hundreds of thousands, or even millions of lines, the complexity of the software increases exponentially. At that point we not only require the program to run correctly, we also require the code to be understandable and maintainable. In fact, complexity is not only reflected in the code itself but also in collaborative development: how to organize a large team for orderly collaboration is itself a very hard problem.

How to deal with complex software development? The Unix open source project is an example worth studying.

Unix was born in 1969 and has been evolving ever since; its code now runs to several million lines. That such a huge project could be developed so well, and maintain sufficient code quality over such a long time, offers many successful experiences worth learning from. So next, using the development of the Unix open source project as a starting point, we will spend three lessons on the following three topics, discussing in detail the methodology for dealing with complex software development. I hope these experiences will be useful to you, so that when you face complex projects in the future you can handle them in an orderly way.

  • From the perspective of design principles and ideas, how to deal with the development of large and complex projects?
  • From the perspective of R&D management and development skills, how to deal with the development of large and complex projects?
  • Focusing on Code Review, how to maintain the code quality of the project through Code Review?

Without further ado, let's officially start today's study!

Encapsulation and abstraction

In Unix and Linux systems there is a classic saying: "everything is a file". It means that many things in Unix and Linux are abstracted into the concept of a "file": sockets, drivers, hard disks, system information, and so on. They all use file-system paths as a unified namespace and are accessed through the same standard read and write functions.

For example, to view CPU information on a Linux system, we simply open /proc/cpuinfo like any other file, using an editor such as Vim or Gedit, or the cat command. Similarly, we can read /proc/uptime to see how long the system has been running, /proc/version to see the kernel version, and so on.
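A minimal sketch makes the point concrete (the helper name `read_proc` is our own, and the example assumes a Linux machine with /proc mounted): system information is read with exactly the same file API as any regular text file.

```python
# Sketch, assuming Linux: /proc pseudo-files are opened and read with the
# same calls used for ordinary text files -- "everything is a file".
def read_proc(path):
    """Read a /proc pseudo-file like an ordinary file (helper name is ours)."""
    with open(path) as f:
        return f.read()

if __name__ == "__main__":
    print(read_proc("/proc/version").strip())  # kernel version string
    print(read_proc("/proc/uptime").strip())   # seconds since boot, idle time
```

Nothing about `read_proc` is /proc-specific; the same function would read any text file, which is exactly the uniformity the saying describes.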
In fact, "everything is a file" embodies the design ideas of encapsulation and abstraction.

The access details of different types of devices are encapsulated and abstracted into a unified file-access interface, and higher-level code accesses the underlying devices through that interface. The benefit is that the complexity of device access is isolated: a unified access method simplifies the writing of upper-level code and makes that code easier to reuse.

In addition, abstraction and encapsulation can effectively contain the spread of complexity: complexity is encapsulated in local code, the variability of the implementation is isolated, and a simple, unified interface is exposed for other modules to use. Because other modules program against the abstract interface rather than a concrete implementation, their code is more stable.
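This idea can be sketched in a few lines (all class and function names here are invented for illustration): two very different "devices" hide their access details behind one `read()` interface, and upper-level code depends only on that interface.

```python
import io

class MemoryDevice:
    """A 'device' backed by an in-memory buffer."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)
    def read(self, n):
        return self._buf.read(n)

class ZeroDevice:
    """A 'device' in the spirit of /dev/zero: an endless stream of zero bytes."""
    def read(self, n):
        return b"\x00" * n

def dump(device, n):
    # Upper-level code sees only the abstract read() interface; each
    # device's implementation details are encapsulated behind it.
    return device.read(n)
```

Adding a new device type requires no change to `dump()`, which is precisely what "programming against an abstraction" buys us.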

Layering and Modularization

We also mentioned earlier that modularity is a common means of building complex systems.
For a system as complex as Unix, no single person can hold all the details in their head. The key reason such a complex system can be developed and maintained at all is that it is divided into independent modules, such as process scheduling, interprocess communication, memory management, the virtual file system, and the network interface. Modules communicate through interfaces, and the coupling between them is small. Each small team focuses on a single, highly cohesive module, and in the end the modules are assembled like building blocks into a hugely complex system.

In addition, the reason large-scale systems such as Unix and Linux allow hundreds or thousands of people to collaborate in an orderly way is also good modularity. Different teams are responsible for different modules, so even without knowing all the details, managers can coordinate the modules and make the entire system work effectively.

In fact, in addition to modularization, layering is also a method we often use to architect complex systems.
We often say that any problem in computing can be solved by adding a layer of indirection, which itself reflects the importance of layering. The Unix system, for example, is developed in layers: roughly the kernel, the system-call layer, and the application layer. Each layer encapsulates its implementation details and exposes abstract interfaces to the layer above, and any layer can be reimplemented without affecting the code of the other layers.

When developing complex systems, we must be good at applying layering: push code that is easy to reuse and has little to do with specific business logic down to the lower layers, and keep code that changes often and is strongly tied to the business in the upper layers.
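A toy sketch of this rule (all names invented for illustration): a generic, reusable storage layer sits below, and business-specific code sits above and only calls the lower layer's interface, so either layer can be reimplemented independently.

```python
# --- lower layer: generic and reusable, knows nothing about the business ---
class KeyValueStore:
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

# --- upper layer: business-specific and change-prone, built on the layer below ---
def save_user_profile(store, user_id, profile):
    store.put(f"user:{user_id}", profile)

def load_user_profile(store, user_id):
    return store.get(f"user:{user_id}")
```

If the storage layer were later rewritten over files or a database with the same `put`/`get` interface, the business layer above would not change at all.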

Interface-based communication

We just talked about layering and modularization, so how do different layers and modules communicate? Generally, they call each other through interfaces. When designing the interface a module or layer exposes, we must learn to hide the implementation: the interface should be abstract, from its naming to its definition, and reveal as few implementation details as possible.

For example, the underlying implementation of the open() function provided by Unix is very complicated, involving permission control, concurrency control, and physical storage, yet it is very simple to use. And because open() is defined in terms of an abstraction rather than a concrete implementation, we can change its underlying implementation without changing any of the upper-level code that depends on it.
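In Python, `os.open()`/`os.read()` are thin wrappers over these Unix system calls, so we can sketch the point directly (the helper name `first_bytes` and the file path in the usage note are our own): the caller's code is the same no matter what complexity hides below the interface.

```python
import os

def first_bytes(path, n):
    """Read the first n bytes of a file through the raw open()/read() calls."""
    fd = os.open(path, os.O_RDONLY)  # simple interface, complex implementation
    try:
        return os.read(fd, n)
    finally:
        os.close(fd)  # always release the descriptor
```

The same two calls work on a regular file, a device node, or a /proc pseudo-file; permission checks, buffering, and physical storage all stay hidden behind the interface.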

High cohesion, loose coupling

High cohesion and loose coupling is a fairly general design idea. Code that is cohesive and loosely coupled lets us stay within a small set of modules or classes when reading or modifying code, without having to understand too much code in other modules or classes. Our attention stays focused, which reduces the difficulty of reading and modifying code. And because the dependencies are simple and the coupling is low, a change in one place does not ripple through the whole system: changes stay concentrated, and the risk of introducing bugs drops sharply.

In fact, many of the methods just mentioned, such as encapsulation, abstraction, layering, modularization, and interface-based communication, are effective ways to achieve high cohesion and loose coupling. Conversely, high cohesion and loose coupling imply that the abstraction and encapsulation are in place, the code structure is clear, the layering and modularization are reasonable, and the dependencies are simple, so the overall quality of the code will not be too bad. Even if a particular class or module is poorly designed, its scope of influence is limited: we can focus on that module or class and do a small, targeted refactoring. Compared with restructuring the whole codebase, such a small refactoring with a concentrated scope of change is far less difficult.

Design for extensibility

The more complicated the project, the more time should be spent on up-front design. Think ahead about which features may need to be extended in the future, and reserve extension points in advance, so that when requirements change, new functionality can be added easily without changing the overall structure of the code.

To make code extensible, it needs to satisfy the open-closed principle. This matters especially for open source projects like Unix, where many people participate in development and anyone can submit code to the repository. Code that satisfies the open-closed principle adds new functionality through extension rather than modification, minimizing and concentrating code changes, preventing new code from breaking old code, and reducing the risk of introducing bugs.

Beyond the open-closed principle, many methods we have already mentioned also make code extensible, such as encapsulation, abstraction, and interface-based programming. Identify the variable and invariable parts of the code, encapsulate the variable parts, isolate the changes, and provide a stable abstract interface for the upper layers. When a concrete implementation changes, we only need to extend a new implementation of the same abstract interface and swap it in; the upstream code hardly needs to be modified.
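A hedged sketch of this pattern (all names are invented): callers program against the abstract notion of a notifier, so a new channel is added as a new class, with no modification to existing caller code.

```python
class EmailNotifier:
    def send(self, message):
        return f"email: {message}"

# Added later as a pure extension -- no existing class or caller is edited.
class SmsNotifier:
    def send(self, message):
        return f"sms: {message}"

def broadcast(notifiers, message):
    # Upper-level code depends only on the abstract send() interface,
    # so it stays closed to modification yet open to extension.
    return [n.send(message) for n in notifiers]
```

`broadcast()` is the "immutable" part; the notifier classes are the "variable" part that grows over time.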

KISS as the first principle

Simplicity, clarity, and readability are the first principles to follow in any large-scale software development. As long as the readability is good, even if the extensibility is poor, at worst it costs some extra time to change a few lines of code. But if the readability is bad, and you cannot even understand the code, then no amount of extra time reliably fixes it: if you only vaguely understand the logic of the existing code and modify it by trial and error, the chance of introducing bugs is very high.

Whether working alone or in a team, when participating in large-scale projects, try to avoid over-design and premature optimization. When extensibility and readability conflict, or when the trade-off between the two is ambiguous, follow the KISS principle and prefer readability.

The principle of least surprise

The book "The Art of Unix Programming" mentions a classic Unix design principle: the principle of least surprise. In practice this principle amounts to "follow the development conventions": when designing or coding, abide by uniform conventions and avoid counter-intuitive designs. We also touched on this earlier in the part on coding standards.
When everyone follows a unified coding standard, all the code reads as if it were written by one person, which greatly reduces friction when reading it. In large-scale software development, many people participate; if everyone writes according to their own habits, the project's code style becomes a patchwork, with one class in one style and another class in another. Readers have to keep switching to adapt to different styles, and readability suffers. So for the development of large projects, we must pay special attention to following unified development conventions.

Key takeaways

Well, that's all for today's content. Let's summarize and review what you need to focus on.

Today, we mainly looked at how to deal with complex software development from the perspective of design principles and ideas. I summarized 7 points that I think are the most important. We have discussed each of them in detail above; if any part is still unclear, you can go back and read it again. The 7 points are:

  • Encapsulation and abstraction
  • Layering and Modularization
  • Interface-based communication
  • High cohesion, loose coupling
  • Design for extensibility
  • KISS as the first principle
  • The principle of least surprise

Of course, these 7 points are not independent of one another. Some reinforce each other: for example, "high cohesion, loose coupling" is supported by abstraction and encapsulation, layering and modularization, and interface-based communication. Others conflict with each other, such as the KISS principle and design for extensibility, and these require us to weigh the trade-offs case by case.

Class discussion

From the perspective of design principles and ideas, which principles or ideas do you think work best in large-scale software development and deal most effectively with code complexity?


Origin blog.csdn.net/qq_32907491/article/details/131365501