What exactly is the Linux kernel, and how can you study it efficiently? (With a mind map of the learning route)

This article explains what the Linux kernel is and uses several diagrams to show its role and functions, so that readers can quickly get a working picture of it.

With more than 13 million lines of code, the Linux kernel is one of the largest open source projects in the world, but what is the kernel and what is it used for?


What is the kernel


The kernel is the lowest level of easily replaceable software that interfaces with the computer hardware. It is responsible for connecting all applications running in "user mode" to the physical hardware and allows processes called servers to use inter-process communication (IPC) to obtain information from each other.

Are there different types of kernel?

Yes, that's right.

3.1 Microkernel

A microkernel manages only what it must: the CPU, memory, and IPC. Almost everything else in the computer can be treated as an accessory and handled in user mode. Microkernels have the advantage of portability: as long as the operating system still accesses the hardware in the same way, you do not have to worry if you change the video card or even the operating system. Microkernels also take up very little memory and installation space, and they tend to be more secure, because most services run in user mode, which does not have the high privileges of supervisor mode.

3.1.1 Pros

  • Portability
  • Small installation footprint
  • Small memory footprint
  • Security

3.1.2 Cons

  • Hardware is more abstracted, because it is reached through drivers
  • Hardware may respond more slowly because the drivers run in user mode
  • Processes must wait in a queue to get information
  • Processes cannot access other processes without waiting

3.2 Monolithic kernel

Monolithic kernels are the opposite of microkernels: they cover not only the CPU, memory, and IPC, but also device drivers, file system management, and system server calls. A monolithic kernel tends to be better at accessing hardware and multitasking, because a program that needs information from memory or from another running process has a more direct line to it and does not have to wait in a queue to get things done. This can cause problems, however: the more that runs in supervisor mode, the more there is that can bring the system down if something misbehaves.

3.2.1 Pros

  • More direct access to hardware for programs
  • Easier communication between processes
  • If your device is supported, it should work without additional installation
  • Processes react faster because there is no queue for processor time

3.2.2 Cons

  • Large installation footprint
  • Large memory footprint
  • Less secure, because everything runs in supervisor mode
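
To make the "direct line" versus "waiting in a queue" contrast a little more concrete, here is a toy sketch in Python. It is only an analogy, not how any real kernel is written: the class names (DiskDriver, MonolithicKernel, Microkernel) and the idea of a block read are invented for illustration. In the monolithic model the driver is called directly; in the microkernel model a request is queued as a message and served later by a separate driver server.

```python
from collections import deque


class DiskDriver:
    """Hypothetical driver used by both toy kernels."""

    def read_block(self, number: int) -> str:
        return f"block-{number}"


class MonolithicKernel:
    """The driver lives inside the kernel, so a request is a direct call."""

    def __init__(self) -> None:
        self.driver = DiskDriver()

    def read(self, number: int) -> str:
        return self.driver.read_block(number)


class Microkernel:
    """The kernel only passes messages; the driver is a separate server."""

    def __init__(self) -> None:
        self.inbox = deque()          # requests waiting for the driver server
        self.server = DiskDriver()    # stands in for a user-mode process

    def send(self, number: int) -> None:
        # The caller's request is queued; nothing is read yet.
        self.inbox.append(number)

    def run_server_once(self) -> str:
        # The server handles the oldest pending request (first in, first out).
        number = self.inbox.popleft()
        return self.server.read_block(number)


if __name__ == "__main__":
    print(MonolithicKernel().read(7))
    micro = Microkernel()
    micro.send(1)
    micro.send(2)
    print(micro.run_server_once(), micro.run_server_once())
```

The queue is where the extra latency of the microkernel model comes from, and the direct call is why a misbehaving driver in the monolithic model can take the whole kernel down with it.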

Hybrid kernel

A hybrid kernel can pick and choose what runs in user mode and what runs in supervisor mode. Often, device drivers and file system I/O run in user mode, while IPC and server calls stay in supervisor mode. This gives the best of both worlds, but it usually requires more work from the hardware manufacturer, because all of the driver responsibility falls on them. It can also have some of the latency problems inherent to microkernels.

4.1 Pros

  • Developers can choose what runs in user mode and what runs in supervisor mode
  • Smaller installation space than monolithic kernel
  • More flexible than other models

4.2 Cons

  • Can suffer the same process lag as a microkernel
  • Device drivers need to be managed by the user (usually)

Where is the Linux kernel file?

The kernel file in Ubuntu is stored in the /boot folder and is called vmlinuz-version. The name vmlinuz comes from the Unix world: in the early days of Unix, around the end of the 1960s, the kernel was simply called "unix", so when Linux was first developed in the 1990s its kernel was called "linux".

When virtual memory was developed to make multitasking easier, "vm" was put at the front of the file name to show that the kernel supports virtual memory. For a while the Linux kernel was called vmlinux, but the kernel grew too large to fit in the available boot memory, so the kernel image was compressed and the x at the end was changed to a z to show that it was compressed with zlib. The same compression is not always used; it is often replaced with LZMA or bzip2, and some kernels are simply called zImage.

The version number has the format A.B.C.D, where A.B will probably be 2.6, C is your version, and D indicates your patches or fixes.
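
As a small illustration of that numbering, the following Python sketch splits a version string into its parts. The names KernelVersion, parse_kernel_version and version_from_filename are made up for this example, and the assumption that a distribution suffix such as "-21-generic" may follow the numbers comes from typical Ubuntu file names, not from the text above. The fourth number is treated as optional because many kernels do not carry one.

```python
import re
from typing import NamedTuple, Optional


class KernelVersion(NamedTuple):
    major: int            # A
    minor: int            # B
    release: int          # C
    patch: Optional[int]  # D, missing on many kernels
    extra: str            # anything after the numbers, e.g. "-21-generic"


_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?(.*)$")


def parse_kernel_version(text: str) -> KernelVersion:
    """Split a string such as '2.6.32.9' into its numbered parts."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable kernel version: {text!r}")
    a, b, c, d, extra = match.groups()
    patch = int(d) if d is not None else None
    return KernelVersion(int(a), int(b), int(c), patch, extra)


def version_from_filename(name: str) -> KernelVersion:
    """Take the version from a /boot style name such as 'vmlinuz-2.6.32.9'."""
    _prefix, separator, version = name.partition("-")
    if not separator:
        raise ValueError(f"no version suffix in file name: {name!r}")
    return parse_kernel_version(version)


if __name__ == "__main__":
    print(parse_kernel_version("2.6.32.9"))
    print(version_from_filename("vmlinuz-2.6.32-21-generic"))
```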

 


There are other very important files in the /boot folder, called initrd.img-version, System.map-version, and config-version. The initrd file is a small initial RAM disk that is loaded together with the kernel and provides the drivers and tools needed to mount the real root file system. The System.map file is the kernel's symbol table, which maps symbol names to addresses and is mainly used when debugging. The config file records which options and modules were selected when the kernel image was compiled.
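
To make the config file less abstract, here is a minimal Python sketch that sorts its entries into "built in", "module" and "disabled". It assumes the usual line format of such files (CONFIG_NAME=y, CONFIG_NAME=m, and "# CONFIG_NAME is not set"); the sample text and the function name classify_options are invented for illustration and do not come from any particular kernel.

```python
from typing import Dict, List

# Hypothetical excerpt in the style of a /boot/config-version file.
SAMPLE_CONFIG = """\
# Automatically generated file; DO NOT EDIT.
CONFIG_EXT4_FS=y
CONFIG_BTRFS_FS=m
# CONFIG_REISERFS_FS is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
"""

_NOT_SET = " is not set"


def classify_options(config_text: str) -> Dict[str, List[str]]:
    """Group option names by how they were configured."""
    groups: Dict[str, List[str]] = {
        "built_in": [],   # =y, compiled into the kernel image
        "module": [],     # =m, built as a loadable module
        "disabled": [],   # "# CONFIG_X is not set"
        "other": [],      # strings, numbers and similar values
    }
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith("# CONFIG_") and line.endswith(_NOT_SET):
            # Drop the leading "# " and the trailing " is not set".
            groups["disabled"].append(line[2:-len(_NOT_SET)])
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            if value == "y":
                groups["built_in"].append(name)
            elif value == "m":
                groups["module"].append(name)
            else:
                groups["other"].append(name)
    return groups


if __name__ == "__main__":
    for group, names in classify_options(SAMPLE_CONFIG).items():
        print(group, names)
```

To try it on a real system you could pass in the contents of your own config file instead of the sample text.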

Linux kernel architecture


Because the Linux kernel is monolithic, it has the largest footprint and the most complexity of the kernel types. This design caused quite a lot of controversy in the early days of Linux, and Linux still carries some of the design flaws inherent to monolithic kernels.

To get around these shortcomings, one thing the Linux kernel developers did was make kernel modules that can be loaded and unloaded at runtime, which means you can add or remove kernel features on the fly. This goes beyond adding hardware support to the kernel: modules can also run server processes, such as low-level virtualization, and in some cases the entire kernel can even be replaced without restarting the computer.


Imagine if you could upgrade to a Windows service pack without restarting...

Kernel module


What if Windows came with every available driver already installed, and you only had to turn on the ones you needed? That is essentially what kernel modules do for Linux. Kernel modules, also known as loadable kernel modules (LKMs), are essential for keeping the kernel working with all of your hardware without consuming all available memory.

Modules usually add functionality such as device support, file systems, and system calls to the base kernel. An LKM has the file extension .ko and is normally stored in the /lib/modules directory. Because of this modular nature, you can easily customize your kernel: set modules to be included or left out with the menuconfig command (run as make menuconfig when configuring a kernel build) or by editing the /boot/config file, or load and unload modules on the fly with the modprobe command.
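
If you want to see which modules are currently loaded, the kernel lists them in /proc/modules (this is what lsmod reads). The Python sketch below parses text in that format. It assumes the common field order of name, size, reference count, list of modules using this one, and state; the sample lines and the names LoadedModule and parse_proc_modules are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sample in the style of /proc/modules.
SAMPLE_PROC_MODULES = """\
snd_hda_codec 167936 2 snd_hda_codec_realtek,snd_hda_intel, Live 0x0000000000000000
snd_hda_intel 53248 0 - Live 0x0000000000000000
"""


@dataclass
class LoadedModule:
    name: str
    size: int            # size in bytes
    ref_count: int       # how many users currently hold the module
    used_by: List[str]   # other modules that depend on this one
    state: str           # for example "Live"


def parse_proc_modules(text: str) -> List[LoadedModule]:
    """Turn /proc/modules style text into a list of LoadedModule records."""
    modules: List[LoadedModule] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or unexpected lines
        name, size, refs, deps, state = fields[:5]
        # The dependency field is "-" when empty, else "a,b," with a trailing comma.
        used_by = [] if deps == "-" else [d for d in deps.split(",") if d]
        # Treat a non-numeric reference count as 0 rather than failing.
        ref_count = int(refs) if refs.isdigit() else 0
        modules.append(LoadedModule(name, int(size), ref_count, used_by, state))
    return modules


if __name__ == "__main__":
    for module in parse_proc_modules(SAMPLE_PROC_MODULES):
        print(module)
```

On a Linux machine you could feed it the real file contents, for example the result of reading /proc/modules, instead of the sample string.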

Third-party and closed source modules are available in some distributions, such as Ubuntu, but they may not be installed by default because their source code is not available. The developers of the software (e.g. NVIDIA, ATI, and others) do not provide the source code; instead they build their own modules and compile the required .ko files for distribution. Although these modules are free as in beer, they are not free as in speech, so some distributions leave them out because the maintainers feel that providing non-free software "taints" the kernel.

The kernel is not magic, but it is essential for any computer to function properly. The Linux kernel differs from OS X and Windows in that it includes kernel-level drivers and makes many things work "out of the box". Hopefully you now have a better understanding of how software and hardware work together and which files are needed to boot your computer.

Summary of Linux kernel learning experience

Opening

Everyone has their own way of learning the kernel, and different people will see it differently. The following is what I summarized during my own study. It has been efficient for me, so I am sharing it with you.

This is a mind map of the learning route that I have compiled:

 


 

My learning method

At the beginning, I think the main problem is whether you know, rather than whether you understand. A subsystem is implemented with a certain strategy and method, and the first thing you need from your study is simply to know that such a thing exists. Understanding the strategy or method comes after that.

In my experience, when you first start learning the kernel, the thing to do is build a general framework of the kernel in your mind and understand the design concepts and construction ideas of each subsystem. Seen from a macro perspective, these concepts and ideas give you a clear outline, like the trunk of a big tree with the branches and leaves removed, visible at a glance. This will of course involve specific implementation methods and functions, but the functions you meet at this stage sit at the higher levels of the kernel implementation; they are the main (essential) functions. For these, know which design idea they serve, what they implement, and what goal they achieve. Being familiar with them at that level is enough. The auxiliary functions called by the main functions are the branches and leaves, and you do not need to go into them too early. At this point the relationship between the subsystem framework and the code is initially established. The association is actually very simple: when you see the name of a function, you remember which subsystem it belongs to and what it implements.

The book to read at this stage is LKD3 (Linux Kernel Development, 3rd edition). It is an overview that describes each subsystem mainly in terms of concepts, design, and broad implementation approach, and it rarely explains the code of specific functions. (ULK3, Understanding the Linux Kernel, 3rd edition, is by comparison mainly an in-depth analysis of the code of specific functions. You can read it too, but reading it too early is painful and dull, because it is basically one function implementation after another.) LKD3 has very little code-level explanation, though not none, which is just right: it meets our current needs and keeps us from diving into the actual code too early. The book also points out, at some important points, what to watch for when writing code, which can be taken as guidance. The main subsystems are: memory management, process management and scheduling, system calls, interrupts and exceptions, kernel synchronization, time and timer management, the virtual file system, the block I/O layer, and devices and modules. (The order here is essentially the order of the LKD3 table of contents.)

When I was studying, I read the three books in parallel. I first read LKD3 on one particular subsystem, mainly to understand the design principles and ideas. I would also come across introductions to some main functions, but mostly in terms of what they accomplish based on the ideas and principles introduced earlier; the book does not analyze the function implementations in depth. Then I read about the same subsystem in ULK3 and PLKA (Professional Linux Kernel Architecture), without carefully analyzing the code of the low-level functions: just skim it, and skip what you do not understand. Sometimes you get stuck at a point in one book and cannot quite follow it, while another book describes the same problem from a different angle, and you never know which one will suddenly make it click, like a flash of enlightenment. This happened to me often.

This does not mean that the implementations of function bodies are ignored entirely while learning. If you want to understand the code thoroughly, no one will stop you. I went deeper gradually over repeated readings. For example, opening a file in VFS requires resolving the path, and there are many details to consider (../, ./ and the like), but the code itself is easy to follow. As another example, in CFS scheduling the time slice given to a process is calculated from the scheduling latency, the number of processes in the run queue, and the process's nice value (using dynamic priority). There is no reason not to read this code: it is both very important and very interesting.
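
The CFS calculation mentioned above can be sketched in a few lines of Python. This is a simplified model written for illustration, not the kernel's code: the real scheduler works in nanoseconds and looks weights up in a table, whereas here the weight is approximated as 1024 / 1.25^nice. The default values for the latency and the minimum granularity, and the function names, are assumptions chosen for the example.

```python
from typing import List

NICE_0_WEIGHT = 1024  # weight of a task with nice value 0


def nice_to_weight(nice: int) -> float:
    """Approximate load weight: each nice step changes the weight by about 1.25x."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be between -20 and 19")
    return NICE_0_WEIGHT / (1.25 ** nice)


def time_slices_ms(
    nice_values: List[int],
    sched_latency_ms: float = 6.0,      # assumed default, for illustration
    min_granularity_ms: float = 0.75,   # assumed default, for illustration
) -> List[float]:
    """Return the slice each runnable task gets within one scheduling period."""
    if not nice_values:
        return []
    # With many runnable tasks the period is stretched so that an
    # average task still gets at least the minimum granularity.
    period = max(sched_latency_ms, len(nice_values) * min_granularity_ms)
    weights = [nice_to_weight(nice) for nice in nice_values]
    total_weight = sum(weights)
    # Each task's share of the period is proportional to its weight.
    return [period * weight / total_weight for weight in weights]


if __name__ == "__main__":
    print(time_slices_ms([0, 0]))       # two equal tasks split the period evenly
    print(time_slices_ms([0, 5]))       # the nice 5 task gets the smaller share
```

The point to take away is the shape of the formula: the period depends on how many tasks are runnable, and each task's slice is its weight divided by the total weight of the queue.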

ULK3 also gives a general introduction to design principles and ideas, usually in the opening paragraphs of a topic, but most of it is a detailed analysis of the main functions that implement those principles and ideas. Typically the first paragraph summarizes what the function does in one sentence, and then the implementation is explained as steps 1, 2, 3 or a, b, c. I read these selectively, and sometimes compared them with the source code opened in Source Insight to confirm that the code basically follows the steps described in the book, which builds intuition. The steps are interleaved with various safety and validity checks; skip those if you do not understand them. This does not prevent you from grasping the function body as a whole.

PLKA sits between LKD3 and ULK3. I think the author of PLKA (judging by the photo, a handsome German guy, and with such impressive skills) must have read ULK. Whatever his intention, PLKA differs from ULK: in its detailed explanations of functions it strips away the corner cases of the function body, such as the handling of special cases and validity checks, without hindering your understanding of what the function does, and he states explicitly that he has done so. Like LKD3, it also gives instructive programming advice at certain points. The authors even emphasize different things when explaining the same main function, which helps those of us who are studying to deepen our understanding. Another very important point is the kernel version: PLKA targets 2.6.24, ULK targets 2.6.11, and LKD3 targets 2.6.34. In some respects PLKA is closer to a modern implementation. The reason the authors chose 2.6.11 and 2.6.24 respectively is that those releases made significant changes in some areas, or marked a turning point (most of this is in the introductions of the books; I cannot remember the details).

Intel V3 (volume 3 of the Intel architecture software developer's manual) is naturally the authority on system programming for x86 CPUs. The basis for parts of the kernel implementation can be found in it. So when reading about a subsystem in the three books above, do not forget that the corresponding chapters of V3 contain the underlying supporting information.

While reading, you will undoubtedly have a lot of questions, ranging from not understanding a design idea down to not understanding the purpose of a single line of code. You can write down everything you do not understand (although I did not do this myself; I only marked a few issues I considered critical). Write them on a sheet of paper, or rather in a notebook, because I am sure there will be that many questions; otherwise the kernel forums would have closed long ago. In fact, most of the problems (many of which are a matter of whether you know that such a thing exists) can be solved easily, as long as you are willing to go back and reread. As the saying goes, read a book a hundred times and its meaning will reveal itself. Read it a few more times and the connections between the earlier and later parts become clear. That is what I did: I read about certain subsystems several times and experienced this firsthand.

When you study these subsystems in order, earlier chapters are likely to refer to later ones. As the author of PLKA says, it is impossible to avoid forward references entirely; all he can do is keep them to a minimum without compromising your understanding of the topic at hand. If you do not understand one, it does not matter, just skip it. Later chapters also refer back to earlier ones, but that case is simpler: go back and read the corresponding introduction, and what you did not understand at the time will probably make sense now, including the purpose of its design and its concrete use. A lack of understanding is only temporary. For example, the interaction between kernel subsystems shows up in the code as functions calling across subsystems. If you have already learned the memory allocation and release functions in the memory management chapter, then when you meet those calls while learning drivers or modules they are easier to accept and you are not at a loss. As another example, once you understand how system time and timers are managed, going back to the bottom-half scheduling of interrupts and exceptions will deepen your understanding of it.

Managing a subsystem takes a lot of data structures. One way subsystems interact is that their main data structures refer to each other through pointer members. When explaining a subsystem, a reference book describes the purpose of the main members of its data structures, but certainly not all of them (especially for structures with many members, such as task_struct). Members that belong to other subsystems may or may not be explained, or the book may say where the member is covered further. So do not fret over a point you do not understand; set it aside for now, and you will meet it again later. The connections can be made once you understand each subsystem. Here I am again stressing the importance of understanding the concepts and the framework first.
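
Here is a rough Python picture of what "refer to each other through pointer members" means. In the kernel, task_struct has members such as mm and files that point at structures owned by the memory management and file subsystems. The classes below only borrow those names; their contents (regions, open_files) and the helper spawn_thread are invented for illustration and are far simpler than the real structures.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MmStruct:
    """Stand-in for the address space descriptor (memory management)."""
    regions: Dict[str, int] = field(default_factory=dict)


@dataclass
class FilesStruct:
    """Stand-in for the open file table (file subsystem)."""
    open_files: Dict[int, str] = field(default_factory=dict)


@dataclass
class TaskStruct:
    """Stand-in for the process descriptor (process management)."""
    pid: int
    comm: str
    mm: Optional[MmStruct] = None        # None models a kernel thread
    files: Optional[FilesStruct] = None


def spawn_thread(parent: TaskStruct, pid: int) -> TaskStruct:
    """A new thread refers to the same mm and files objects as its parent."""
    return TaskStruct(pid=pid, comm=parent.comm, mm=parent.mm, files=parent.files)


if __name__ == "__main__":
    parent = TaskStruct(pid=100, comm="demo", mm=MmStruct(), files=FilesStruct())
    thread = spawn_thread(parent, pid=101)
    thread.files.open_files[3] = "/tmp/example"
    # The parent sees the new file because both tasks share one table.
    print(parent.files.open_files)
```

Reading the process management chapter, you only need to know that mm "is the address space"; what is inside it can wait until the memory management chapter.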

Once the framework is in place, you can pick a subsystem that interests you, such as drivers, networking, or file systems, and go deeper into the underlying code. This is easier than delving into the code at the very beginning, and if you run into something you do not understand, or have forgotten how some aspect is implemented, you know which subsystem to go back to and where to fill in the gap. You not only finish studying the current topic but also review the earlier material. This is when everything comes together.

"Understanding the Linux Virtual Memory Manager" (2.4 kernel), LDD3 (Linux Device Drivers, 3rd edition), "Understanding Linux Network Internals": almost every subsystem takes a whole book to explain, so it is not a good idea to go too deep into one module when you are just starting. Wait until you have a good understanding of every subsystem, and then study a specific one in a targeted way. By then, its calls into other subsystems will no longer leave you feeling lost and confused.

For example, these chapters of LDD3: building and running modules, concurrency and race conditions, time, delays, and deferred work, allocating memory, interrupt handling, and so on, all cover subsystems that support driver development. The book devotes a chapter to each of them, but the level of detail cannot compare with PLKA, ULK3, and LKD3. After reading those three books, reading these chapters of LDD3 is like drinking plain water, almost effortless, because LDD3's explanations are even more sketchy than LKD3's. With a good foundation in place, what remains to be learned (PCI, USB, TTY drivers, block device drivers, network card drivers) is more targeted. The supporting subsystems are general-purpose; once you understand them, the work built on top of them, such as drivers (which further require knowledge of the hardware) and networking (which further requires understanding the protocols), becomes much less difficult, and you learn faster and more efficiently. Easier said than done. The prerequisite is that you must calm down, read carefully, and read thoroughly. PLKA and ULK3 are as thick as bricks, which is daunting. Without interest, enthusiasm, and perseverance it will not work, because it takes a long time. I am not saying you must lay this foundation before developing drivers, only that development will be easier and more efficient once the foundation is laid, and your command of the kernel code will be stronger. This is just my personal opinion and my own way of learning, for reference only.

Impressions of the API

"Compared with understanding the significance of the technology you use, being an expert in a particular field is not important. Knowing a specific API call brings no benefit at all; just look it up when you need it." This quotation comes from a translated blog post I read. What I want to stress is that it fits application programming well, but for kernel APIs it is not quite true.

The kernel is quite complex and not easy to learn, but once you reach a certain level you find that, if you plan to write kernel code, what you end up caring about is still the API. Most of these APIs, however, are cross-platform and portable. Kernel hackers have largely standardized and documented these interfaces, and all you have to do is call them. Of course, it is best to be familiar with the kernel's coding conventions on portability so that your own code is portable too. It is like application development, where you use a dynamic library API provided by a developer, or an open source API. In both cases you are calling an API; the difference is that using a kernel API requires knowing a lot more than using an application API.

When you understand how the operating system is implemented (and this implementation is the basic support for applications), then when you write an application and use multithreading, timers, synchronization and locking mechanisms, and so on through a shared library API, you can relate them to the operating system, combining the API documentation with what you know about the supporting implementation in the kernel. This guides you in choosing which API to use and picking the most efficient approach. A good understanding of system programming benefits application programming, arguably a great deal.

The essence of design and implementation: knowing versus understanding

The operating system is an interface between the underlying hardware and application software, and the implementation of its subsystems depends largely on the characteristics of the hardware. When a book introduces the design and implementation of these subsystems and we read it, we know it. If we think further, we ask why the overall architecture is organized this way and why a local function is handled that way. Once you know the reason, for instance that a function is implemented a certain way because the chip is designed like this and the CPU works like this, your question basically ends there. Going further means studying the design and implementation of the chip architecture. For programmers, whether system or application programmers, many questions are answered by tracing them back this far, because our work is soft by nature, and those things are hard enough.

For example, the root cause of the implementation of interrupts and exceptions explained in ULK3 is that the Intel x86 series is designed that way. Go to the corresponding chapters of the Intel V3 manual and you will find the explanation for the code described in ULK3. The same goes for time and timer management: Intel V3's description of the APIC gives you plenty of information. The operating system builds its software mechanisms on these hardware characteristics.

It comes back to the same point: it is not a question of understanding or not, but of knowing or not knowing. Sometimes, once you know, you understand. The whole learning process repeats knowing, understanding, knowing, understanding, knowing... Why does it begin and end with knowing, with understanding only an intermediate step? Everything in the world has its own laws, and human beings merely discover them. Practice comes first: practice is the process of knowing, practice produces experience, and experience summarized becomes theory. Theory comes from practice, and theory has to be understood. We study the kernel, dig into it, go back and forth, and end up at the chip. The chip is material, and its function rests on the physical and electronic properties of matter in nature. That is what tracing something back to its source means.

Write code

What you learn on paper always feels shallow; to truly know a thing you must do it yourself. Just reading is not enough. You must type the code yourself, following the programming advice given in the books. At the beginning, test it as a module, or build a development kernel yourself and debug it on a single machine with UML (User-Mode Linux), where you control how far the kernel runs. Single-stepping through the program to watch its execution is more intuitive and clearer than the explanation in any book. Be sure to get your hands dirty.

Better to have no books at all than to believe everything in them.

End

The power of interest is endless. Interest brings passion; if work can be combined with interest, the work has passion in it, and then it is no longer just work but a kind of enjoyment.

Linux, my interest, my motivation, my direction, my future!

 


Origin blog.csdn.net/Linuxhus/article/details/114294447