Linux operating system study notes (1): Getting started

Preface

  The Linux kernel is the foundation for learning server-side development, and studying it is also an important way to improve programming ability, source-reading ability, and the ability to pick up advanced topics. This series records my process of studying the source code of each part of the Linux operating system.

  As for how to study source code, I personally think the following perspectives make reading and learning more efficient. (Learn the language first, since that is the basic skill. For an IDE, I recommend Source Insight or Visual Studio.)

  • Understand the organization of the code. Take the Linux source code as an example: first you need to know which parts the operating system is divided into, what each part does on its own, and how the parts cooperate to provide more specific functions. An overall impression makes the later in-depth study easier. After all, code is written to do something, not to be looked at, and knowing what it is for helps you understand why it is written the way it is.
  • Study each module in depth.

    • Module interface: Microsoft's drawing tool Visio or the mind-mapping tool XMind is recommended here. Use it to list the interfaces of each module and draw the relationships between modules. Understanding the interfaces clarifies how the modules relate to each other; in other words, draw a module organization chart.

    • Workflow: With the module relationships from the previous step in hand, use breakpoints or logs to observe the overall workflow, and draw a program flow chart based on the module organization chart.

    • Module glue layer: A lot of code exists only to glue other code together, such as middleware, the Promise pattern, callbacks, proxies and delegation, dependency injection, and so on. These glue techniques matter because they break up code that would otherwise be flat and make the relationships hard to see. Treat them as a supplement to the program flow chart, so that the places where the flow seems disconnected become clear.

    • Module implementation: This is the hardest part and involves a lot of detailed source reading. It is easy to get lost in the ocean of details, so focus on the key points and skip the rest. Learning to draw a module's architecture diagram and the sequence diagrams of its algorithms helps you grasp the essence of the source code.

      Things to pay attention to include

      • Code logic. Code contains two kinds of logic. One is business logic, which does the real business processing. The other is control logic, which only steers the program and carries no business meaning, for example control variables such as flags, multithreading code, asynchronous control code, remote communication code, and object serialization and deserialization code. You have to separate the two. A lot of code is messy precisely because it mixes them together.
      • Important algorithms. Code usually contains several important algorithms. These are not necessarily sorting or searching algorithms; they may be other core algorithms, such as index table algorithms, globally unique ID algorithms, information recommendation algorithms, statistical algorithms, read-through algorithms (such as Gossip), etc. These core algorithms may be very difficult to read, but they are often the most technical part.
      • Low-level interaction. Some code interacts with the underlying system, usually the operating system or the JVM. Reading it requires some knowledge of the underlying technology; otherwise it is hard to follow.

      What can be ignored includes

      • Error handling. According to the 80/20 rule, 20% of the code is normal logic and 80% deals with various errors. So when reading, you can mentally strip out all the error-handling code, which leaves relatively clean and simple normal logic. With the interference removed, you can read the code more efficiently.
      • Data processing. If you look carefully, you will find that a lot of code only moves data around, such as DAOs and DTOs, or JSON and XML handling. This code is tedious and is not the main logic, so it can be ignored.

  Enough preamble. Here the journey of studying and recording the operating system in depth officially begins.

Chaos Begins

  This article analyzes the entire process from pressing the power button to running the BIOS and then the bootloader. Just as Pangu split heaven from earth, this process divides the chaotic world of the operating system into a clearly separated kernel mode and user mode, and takes the CPU from real mode to protected mode. Here is a brief introduction to these two modes to make the rest easier to follow.

  • Real Mode: Also known as Real Address Mode. In this mode an address refers directly to a physical memory location. A 20-bit (1 MB) address space is available, and software can access all of memory and all I/O devices without restriction.
  • Protected Mode: Also known as Protected Virtual Address Mode. Memory is protected by mechanisms such as virtual memory and paging. Compared with real mode it is safer and more reliable, and it is also more flexible and scalable.

From power on to BIOS

  When we press the power button, the motherboard sends a signal to the power supply. After receiving the signal, the power supply provides the proper voltage to the computer. Once the motherboard receives the signal that the power supply has started normally, it starts the CPU. The CPU resets all of its registers and loads predefined initial values. On the x86 architecture these values are:

IP          0xfff0
CS selector 0xf000
CS base     0xffff0000
  • IP/EIP (Instruction Pointer): the instruction pointer register, which holds the offset within the code segment of the next instruction to execute
  • CS (Code Segment Register): the code segment register, which points to the region of memory holding the code the CPU is currently executing (it defines the starting address of that region)

  Real mode uses memory segments to manage the 1 MB address space 0-0xFFFFF. Because the registers are only 16 bits wide, a single register can express at most 0xFFFF (64 KB), so memory is divided into 64 KB segments in order to reach the full 1 MB. That is, an address is written as segment selector + offset, as shown above. A similar base-plus-offset idea reappears in the design of paging in protected mode, which can be described as ancestral wisdom. The formula is:

PhysicalAddress = Segment Selector * 16 + Offset

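  This calculation is done by hardware, and for the selector and offset above it yields 0xFFFF0. (On 80386 and later processors the CPU uses the cached CS base 0xffff0000 for the first fetch after reset, so the first instruction actually comes from 0xFFFFFFF0, 16 bytes below 4 GB, where the firmware ROM is also mapped.) If there is no executable code at this location, the computer cannot start. If there is, the CPU executes it. This is the beginning of our story: the BIOS program.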
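
  The formula can be checked with a few lines of Python. This is only an illustrative sketch: the function name real_mode_address and the constant names are made up for this example, and the three register values are the ones listed in the table above.

```python
def real_mode_address(segment, offset):
    """Physical address of a 16-bit segment:offset pair in real mode."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset  # same as segment * 16 + offset


# Register values right after reset, taken from the table above.
RESET_IP = 0xFFF0
RESET_CS_SELECTOR = 0xF000
RESET_CS_BASE = 0xFFFF0000

# Address produced by the segment formula.
legacy_entry = real_mode_address(RESET_CS_SELECTOR, RESET_IP)

# Address obtained when the CS base is added to IP directly.
reset_vector = RESET_CS_BASE + RESET_IP

# Largest address the formula can express (slightly above 1 MB).
highest = real_mode_address(0xFFFF, 0xFFFF)

print(hex(legacy_entry), hex(reset_vector), hex(highest))
```

  Shifting left by 4 bits is the same as multiplying by 16, which is why a 16-bit selector plus a 16-bit offset can cover a 20-bit address space.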

BIOS to BootLoader

  The BIOS program is stored in ROM, and its entry point is at 0xFFFF0. When CS:IP points to this position, the BIOS starts to execute. While the BIOS is running, the first megabyte of memory is laid out as follows:

0x00000000 - 0x000003FF - Real Mode Interrupt Vector Table
0x00000400 - 0x000004FF - BIOS Data Area
0x00000500 - 0x00007BFF - Unused
0x00007C00 - 0x00007DFF - Our Bootloader
0x00007E00 - 0x0009FFFF - Unused
0x000A0000 - 0x000BFFFF - Video RAM (VRAM) Memory
0x000B0000 - 0x000B7FFF - Monochrome Video Memory
0x000B8000 - 0x000BFFFF - Color Video Memory
0x000C0000 - 0x000C7FFF - Video ROM BIOS
0x000C8000 - 0x000EFFFF - BIOS Shadow Area
0x000F0000 - 0x000FFFFF - System BIOS
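
  To get a feel for the map, the top-level regions can be put in a small lookup table. This is a sketch for illustration only: the names LOW_MEMORY_MAP, region_of and region_size are invented here, and the two video sub-ranges inside the VRAM region are left out so that the regions do not overlap.

```python
# Top-level regions of the first megabyte, copied from the map above.
LOW_MEMORY_MAP = [
    (0x00000000, 0x000003FF, "Real Mode Interrupt Vector Table"),
    (0x00000400, 0x000004FF, "BIOS Data Area"),
    (0x00000500, 0x00007BFF, "Unused"),
    (0x00007C00, 0x00007DFF, "Our Bootloader"),
    (0x00007E00, 0x0009FFFF, "Unused"),
    (0x000A0000, 0x000BFFFF, "Video RAM (VRAM) Memory"),
    (0x000C0000, 0x000C7FFF, "Video ROM BIOS"),
    (0x000C8000, 0x000EFFFF, "BIOS Shadow Area"),
    (0x000F0000, 0x000FFFFF, "System BIOS"),
]


def region_of(address):
    """Return the name of the region that contains the given address."""
    for start, end, name in LOW_MEMORY_MAP:
        if start <= address <= end:
            return name
    raise ValueError(f"{address:#x} is outside the first megabyte")


def region_size(start, end):
    """Size in bytes of an inclusive address range."""
    return end - start + 1


print(region_of(0x7C00))             # where the boot sector is loaded
print(region_of(0xFFFF0))            # where the BIOS entry point lives
print(region_size(0x7C00, 0x7DFF))   # size of the bootloader slot
```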

  The most important parts are the interrupt vector table and the interrupt service routines. The BIOS uses 1 KB of memory (0x00000~0x003FF) at the very beginning of memory to build the interrupt vector table, and the 256 bytes right after it (0x00400~0x004FF) to build the BIOS data area. About 57 KB further on, at 0x0E05B, it loads roughly 8 KB of interrupt service routines that correspond to the interrupt vector table. The table contains 256 interrupt vectors. Each vector occupies 4 bytes: two bytes hold the CS value and two bytes hold the IP value. Each vector points to a specific interrupt service routine.
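
  The layout of the table can be sketched as follows. The helper names are invented for this example, and the table contents are fabricated test data, not a dump from a real machine. One detail not stated above but standard on x86: within each 4-byte vector the IP (offset) word is stored first and the CS (segment) word second, both little-endian.

```python
import struct

VECTOR_COUNT = 256
VECTOR_SIZE = 4  # 2 bytes of IP followed by 2 bytes of CS


def vector_address(n):
    """Memory address of interrupt vector n inside the table."""
    if not 0 <= n < VECTOR_COUNT:
        raise ValueError("interrupt number must be in 0..255")
    return n * VECTOR_SIZE


def decode_vector(ivt, n):
    """Return (cs, ip, physical address) of the handler for vector n."""
    ip, cs = struct.unpack_from("<HH", ivt, vector_address(n))
    return cs, ip, (cs << 4) + ip


# Build an empty 1 KB table and fill in one made-up entry for vector 0x10
# that points at 0x0E05B, the routine address mentioned in the text.
ivt = bytearray(VECTOR_COUNT * VECTOR_SIZE)
struct.pack_into("<HH", ivt, vector_address(0x10), 0xE05B, 0x0000)

cs, ip, handler = decode_vector(ivt, 0x10)
print(len(ivt), hex(vector_address(0x10)), hex(handler))
```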

  The BIOS then selects a boot device and transfers control to the code in its boot sector. Its main task here is to use the interrupt vectors and interrupt service routines to load the bootloader: boot.img is loaded at 0x7C00 and started. The Linux kernel defines how a boot program must be implemented through the Boot Protocol. There are several concrete implementations, such as GRUB 2 and syslinux. Only GRUB 2 is covered here.

BootLoader's work

  boot.img is assembled from boot.S. It is 512 bytes long and is installed in the first sector of the boot disk, that is, the MBR. Because space is so limited, its code is very simple and only plays a guiding role: it points to the next image file, core.img. core.img consists of several important parts, such as lzma_decompress.img, diskboot.img, kernel.img, etc. The structure is as shown in the figure below.

[Figure: structure of core.img]
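
  A 512-byte boot sector is easy to inspect from a script. The sketch below relies on a convention that is not mentioned above but is standard on PC BIOS systems: a bootable sector ends with the two signature bytes 0x55 0xAA at offsets 510 and 511. The function name and the sample sector are made up for this example.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a bootable sector


def is_boot_sector(sector):
    """True if the data is one full sector ending with the boot signature."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


# A fabricated sector: 510 zero bytes of "code" followed by the signature.
fake_sector = bytes(510) + BOOT_SIGNATURE

print(is_boot_sector(fake_sector))   # sector with signature
print(is_boot_sector(bytes(512)))    # sector without signature
```

  To try it on a real disk you would read the first 512 bytes of the device or of a disk image and pass them to is_boot_sector; doing so usually needs root privileges.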

The entire loading process is as follows:

  1. boot.img loads the first sector of core.img, namely diskboot.img. The corresponding code is diskboot.S.
  2. diskboot.img loads the rest of core.img: first the decompression program lzma_decompress.img, then kernel.img, and finally the images of the individual modules. Note that kernel.img here is not the Linux kernel but GRUB's own kernel. The code for lzma_decompress.img is startup_raw.S. kernel.img is stored compressed, so it has to be decompressed before it is executed.
  3. After the core has been loaded, the grub_main function is started.
  4. The grub_main function initializes the console, calculates the base address of the modules, sets the root device, reads the GRUB configuration file, and loads modules. Finally it puts GRUB into normal mode, in which grub_normal_execute (from grub-core/normal/main.c) is called to complete the final preparations and then display a menu listing the available operating systems. When an operating system is selected, grub_menu_execute_entry runs and invokes the GRUB boot command to boot the selected operating system.

  Up to this point, all the programs we have met are very small and can run in real mode. However, as the things being loaded grow larger, the 1 MB address space of real mode is no longer enough. So before the actual decompression, lzma_decompress.img takes an important step: it calls real_to_prot and switches to protected mode, so that more can be loaded into a larger address space.
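
  On x86, which mode the CPU is in is decided by bits in the CR0 control register: bit 0 (PE, Protection Enable) turns on protected mode and bit 31 (PG, Paging) turns on paging. The following sketch only models those two bits as plain integers. It does not touch real hardware, and the function name is made up for this example.

```python
CR0_PE = 1 << 0    # Protection Enable: set to enter protected mode
CR0_PG = 1 << 31   # Paging: only meaningful when PE is also set


def describe_mode(cr0):
    """Describe the CPU mode implied by a CR0 value (PE and PG bits only)."""
    if not cr0 & CR0_PE:
        return "real mode"
    if cr0 & CR0_PG:
        return "protected mode with paging"
    return "protected mode"


cr0 = 0                      # simplified starting point: PE and PG clear
print(describe_mode(cr0))

cr0 |= CR0_PE                # the essential step of a real-to-protected switch
print(describe_mode(cr0))

cr0 |= CR0_PG                # paging is enabled later in the boot process
print(describe_mode(cr0))
```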

  There is a big gap between the 16-bit real mode at boot and the 32-bit protected mode that the kernel needs in order to execute its main function. Who fills this gap? This is what head.S does. As the kernel boot protocol describes, the boot program must fill in the necessary fields of the kernel setup header, which sits at offset 0x01f1 of the kernel setup code (in current x86 kernels this header is defined in arch/x86/boot/header.S). During this period, the head program enables the A20 line, sets the PE and PG bits, discards the old 16-bit interrupt mechanism, and sets up a new 32-bit IDT. Once all of this is done, the computer is in 32-bit protected mode. Now all the conditions for calling the 32-bit kernel are in place, and calling the main function follows naturally. Everything after that is handled by the 32-bit compiled main function, which officially starts the kernel and enters the magnificent world of the Linux operating system.
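
  The setup header can be read from a kernel image with a few struct calls. This is a minimal sketch, not a complete parser: the field offsets (setup_sects at 0x1F1, boot_flag at 0x1FE, the "HdrS" magic at 0x202, the protocol version at 0x206) come from the Linux x86 boot protocol documentation rather than from this article, the function name is invented, and the sample image is fabricated so the example runs without a real kernel file.

```python
import struct

SETUP_HEADER_OFFSET = 0x01F1  # start of the setup header in the image


def parse_setup_header(image):
    """Extract a few setup header fields from the start of a kernel image."""
    if len(image) < 0x208:
        raise ValueError("image too short to contain a setup header")
    setup_sects = image[SETUP_HEADER_OFFSET]            # setup size in sectors
    (boot_flag,) = struct.unpack_from("<H", image, 0x1FE)
    magic = bytes(image[0x202:0x206])
    (version,) = struct.unpack_from("<H", image, 0x206)
    return {
        "setup_sects": setup_sects,
        "boot_flag": boot_flag,
        "magic": magic,
        "protocol": (version >> 8, version & 0xFF),     # (major, minor)
    }


# Fabricated image: only the fields read above are filled in.
sample = bytearray(0x208)
sample[SETUP_HEADER_OFFSET] = 4
struct.pack_into("<H", sample, 0x1FE, 0xAA55)
sample[0x202:0x206] = b"HdrS"
struct.pack_into("<H", sample, 0x206, 0x020F)

header = parse_setup_header(sample)
print(header)
```

  With a real kernel, the bytes would come from reading the beginning of the bzImage file instead of the fabricated buffer.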

Summary

  This article covered the whole process from pressing the power switch to loading the bootloader. The following articles will continue with the path from real mode to protected mode, starting the kernel, and creating processes 0, 1, and 2. A lot of assembly code, as well as some knowledge that is important but not part of the basic flow, has been left out here. Readers who are interested can study further using the links in the article, the source code listed at the end, and the reference materials.

Source information

[1] src/cpu/x86/16bit

[2] arch/x86/boot/

[3] GRUB 2

[4] syslinux

Reference

[1] Linux-insides

[2] Deep understanding of Linux kernel source code

[3] The Art of Linux Kernel Design

[4] Geek Time Talks about Linux Operating System

Origin blog.csdn.net/u013354486/article/details/105828458