"Operating System: Restoring the Truth" study notes, Chapter 0

0x1 How software accesses hardware

Software accesses hardware through I/O interfaces. An interface is a standard that hardware is built to: every device must follow it so that software and hardware can interoperate.
Hardware transfers data in one of two ways, serially or in parallel, and the corresponding interfaces are serial and parallel interfaces. A serial device communicates with the CPU through a serial interface, which carries data between the CPU and the device one bit at a time. Parallel devices work the same way through a parallel interface; the only difference is the interface.
There are two ways to access external hardware:
1, Memory mapping: the peripheral's memory is mapped into a range of the address space. When the CPU puts an address in that range on the address bus, the access is routed to the peripheral's memory instead of to the physical memory on the motherboard, so the CPU reaches the peripheral with ordinary memory accesses. A typical example is the display adapter (graphics card). The CPU never talks to the monitor directly, only to the graphics card. The card's memory is called video memory, and in text mode it is mapped into the low 1MB of physical address space at 0xB8000 to 0xBFFFF. For the CPU, accessing video memory is therefore just a memory access, and writing into this region prints characters on the screen.
2, Port I/O: the peripheral communicates with the CPU through an I/O interface. When the CPU accesses a peripheral, it is really accessing the I/O interface, which passes the information on to the peripheral at the other end. The CPU only sends data to, and receives data from, the designated peripheral; the I/O interface processes that data, converting it into a form the peripheral understands, and passes the peripheral's data back to the CPU so the program can respond. An I/O interface is essentially a set of registers, and a register is essentially a group of storage circuits (an 8-bit register is eight one-bit cells). (PS: see "x86 Assembly Language: From Real Mode to Protected Mode" for details.)

0x2 How applications and the operating system fit together

An application is written in a programming language, and turning it into machine code depends on the compiler. In that sense there is no language by itself, only compilers: the compiler decides how to interpret a language's keywords and syntax. A language is just a convention agreed between compiler writers and programmers. Once code is written, the machine instructions it becomes are determined by the compiler, not by the language. For example, a non-C/C++ compiler could compile the keyword printf into a loop, a newline, or any other machine code, rather than into code that prints the given characters on the screen.

0x2.1 The programming language runtime

The compiler provides a set of library functions, some of which wrap system calls; this collection of code is called the runtime library. The C language's runtime is called the C runtime library, or CRT. (So it is not accurate to say that C is independent of the operating system: C depends on the CRT, and the CRT depends on system calls. On Linux the CRT uses Linux system calls; on Windows it uses Windows system calls.)

0x2.2 What is an application

An application becomes a complete program only once the functionality provided by the operating system is added to it. Programs rely on the operating system to run properly, so the programs we usually write are "half programs": to get things done, they must call functions the operating system already provides, and those functions are system calls.

0x2.3 What are user mode and kernel mode

User mode and kernel mode are a CPU mechanism, not an operating system one. They describe whether the CPU is running at privilege level 3 (user mode) or privilege level 0 (kernel mode), regardless of which operating system or application is running.
When a user process "enters kernel mode", this is what actually happens: an internal or external interrupt occurs, the current process is suspended, the kernel's interrupt handler saves its context, and the CPU then starts executing kernel code. Kernel code does not belong to the user program, and the user program's code never lives in the kernel.
Once the application has entered the kernel, it has been taken off the CPU and knows nothing about what happens next. Its context has been saved on its privilege-level-0 stack, and from then on the kernel is what runs on the CPU. A user process never turns into the operating system just because the CPU switched to kernel mode.

0x3 Why it is called "trapping into" the kernel

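Applications run at privilege level 3, and the operating system kernel runs at privilege level 0. When a user program needs system resources (whether hardware or kernel data structures), it must make a system call. The CPU then drops from the outermost privilege level 3 to the innermost level 0 and enters kernel mode, also known as supervisor mode. This fall into the more privileged level is why it is described as "trapping" into the kernel.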

0x4 Why memory is accessed in segments

First, segmented memory access is a CPU mechanism; only the CPU cares about segments. Segmentation is also a historical legacy. Starting with the 8086, technical and economic limits meant CPU registers were only 16 bits wide, so a register alone could address just 64KB, while the 8086's address bus was 20 bits wide (1MB). At that time computers had only physical addresses and no virtual addresses, and compilers produced absolute physical addresses. If two compiled programs had the same load address, or their address ranges overlapped, only one of them could run. To solve this, the CPU was made to access any memory as base address + offset. Even if two programs' physical addresses collide, a program can then be relocated by changing its segment base while its offsets stay the same, which is what makes it possible to run multiple programs.
Random-access memory means that, given an address, the device goes straight to that location without scanning from address zero. For example, to access memory at 0xC00, the CPU only has to put 0xC00 on the address bus.

0x5 Segmenting code

Segments in a program are usually laid out by the compiler, though programmers can also divide them themselves. In the multi-segment model, several segments are allocated, and the segment registers are switched repeatedly so that each one points at the segment whose data is to be accessed. (In fact, 16-bit assembly was designed around a multi-segment model precisely because a 16-bit offset can only reach 64KB beyond a segment base.)
In flat mode, a segment register can cover the entire 4GB space, so there is no need to switch segment registers to reach data as in the multi-segment model. Whether code is divided into segments therefore depends on whether the operating system uses flat mode.
The CPU is a highly integrated chip. Once it is given the address of the first instruction, it automatically fetches the next instruction while executing the current one. For this to work, instructions must be laid out with no gaps between them, so that the address of the next instruction follows from the address and size of the previous one. This is how the program counter of Intel processors, cs:eip, obtains the next instruction automatically.
For instructions to execute one after another, all instructions should be placed together to form a contiguous instruction region: this is the code segment. An instruction consists of an opcode and operands, and the operands are the program's data. Data stored contiguously together forms a data region, called the data segment.
The attributes of code and data segments result from the interaction of the CPU, the operating system and the compiler:
1) At compile time, the compiler assigns different attributes to different data and divides the program into sections according to those attributes.
2) The operating system builds segment descriptors in the GDT (Global Descriptor Table). A descriptor specifies the segment's location, size and attributes (including the segment's TYPE field and S bit), and this is where a segment's attributes actually take effect.
3) The CPU's segment registers hold segment selectors that the operating system set up in advance, and these determine which segment each register points to. When executing an instruction, the CPU checks the instruction's behavior against the segment's attributes and raises an exception if they conflict.
Memory segmentation is the CPU's mechanism for accessing memory, while a program's segments are regions into which software logically divides memory. Those regions are still memory, so the processor accesses them through the same segmentation mechanism, with a segment register pointing at the region's start address.

0x6 The difference between physical, logical, effective, linear and virtual addresses

In real mode, the segment unit computes "segment base address + offset" and outputs the result directly as a physical address, which the CPU can use to address memory. (Here the segment base is the value of the segment register shifted left by 4 bits.)
In protected mode, "segment base address + offset" is called a linear address. The segment register no longer holds the real segment base. Instead, it holds a data structure called a segment selector. A selector is essentially an index, like an array index, used to find the corresponding segment descriptor in the GDT. The descriptor records the segment's start address, size and other information, and that is where the segment base address comes from. If paging is not enabled, the linear address is used directly as the physical address. If paging is enabled, the linear address is a virtual address (under paging, "virtual address" and "linear address" mean the same thing). In that case, the CPU's paging unit must translate it into a physical address before the CPU can put it on the address bus to access memory.
Whether in real mode or protected mode, the offset within a segment is called the effective address, also called the logical address.

0x7 Why Linux applications cannot run on Windows

Different operating systems use different executable file formats (ELF on Linux, PE on Windows) and different system APIs, so applications built for one system cannot run on the other.

0x8 Why local variables and function parameters are placed on the stack

Static local variables are global in nature: like global variables, they can be accessed at any time. Ordinary local variables can only be accessed under certain conditions (while their function is running) and can be discarded as soon as it returns, so they are placed on the stack. (In Chinese the stack is often called 堆栈, literally "heap-stack", but the term means the stack alone and has nothing to do with the heap; the name is just habit.)
In the memory layout of a C program, the heap and the stack share a region of address space. The stack grows from high addresses toward low addresses and the heap grows from low addresses toward high addresses, so the two may eventually meet. How much space each takes depends on actual usage, so the boundary between them is not fixed. This may be why the stack came to be called "heap-stack".
Why function parameters are placed on the stack:
1) Locality: parameters are used only inside the function, just like its local variables.
2) Uncertainty: the compiler cannot predict how many times a function will be called (recursion is the extreme case), yet every call needs memory for its parameters and return value, so the amount of memory required is not known in advance.

0x9 Why assembly language is "faster" than C

This claim is wrong. Whatever the language, a program has to run on the CPU. The CPU knows nothing about assembly, C, or even Java, PHP, Python and so on; it has no idea what interpretation or compilation its instructions went through. Every compiler ultimately translates its language into machine instructions, and the machine instructions produced by a C compiler are no different.
Assembly only seems faster than C because, for the same task, hand-written assembly usually produces fewer machine instructions. Writing in assembly is equivalent to writing machine instructions directly: assembly adds no extra statements, so the CPU executes no unrelated instructions and wastes no time on them, and the result naturally runs faster.
The C compiler does a lot of work behind the scenes to make programming easier. On top of that, for generality, ease of use and other reasons, it often adds extra code. The assembler then translates that code into machine instructions, so the generated machine code contains redundancy.

0xA The difference between compiled and interpreted programs

Interpreted languages, also called scripting languages, include JavaScript, Python, Perl, PHP and shell scripts. The scripts themselves are text files that serve as input to an application, and that application is the script interpreter.
Because scripts are only text, the script interpreter sees them as no different from strings in its own code. In other words, script code is never executed by the CPU directly: the CPU's cs:ip never points at it, and all the CPU sees is the script interpreter. Executing a script really means that the interpreter keeps analyzing the script and performs the corresponding actions dynamically according to its keywords and syntax. So if a script contains an error, the parts before the error still execute normally. A compiled program behaves very differently: a syntax error stops compilation before anything runs.

0xB BIOS interrupts, DOS interrupts and Linux interrupts

In a computer system, whether in real mode or protected mode, events will always occur, either outside or inside the CPU. An event from inside the CPU is called an exception. For example, if the CPU finds during a division that the divisor is 0, it raises a divide-by-zero exception. An event from outside, raised by an external device that then notifies the CPU, is called an interrupt.
BIOS and DOS are both programs that run in real mode, and the interrupt calls they set up are all stored in the interrupt vector table (Interrupt Vector Table, IVT). Programs invoke them with the software interrupt instruction "int interrupt-number".
Each entry in the interrupt vector table, an interrupt vector, is 4 bytes long. Those 4 bytes describe the segment base address and offset of an interrupt handling routine (program), with the offset in the low word and the segment in the high word. The interrupt vector table is 1024 bytes long, so it can hold at most 256 interrupt vectors (1024 / 4). When the computer starts, the BIOS builds the interrupt vector table at physical address 0x0000, initializing it and filling in the vectors for its various interrupt handling routines.
The main purpose of BIOS interrupt calls is to provide a way to access hardware. The BIOS only makes hardware easier to operate; it is not the only way to reach the hardware. Reading and writing hardware is simply done with in/out instructions on the peripheral's ports, and the BIOS interrupt handlers are made up of various sequences of in/out instructions.
Why the BIOS fills in interrupt handling routines:
1) For its own use. The BIOS is itself a program, and it writes code it is likely to run repeatedly as interrupt functions so it can call them directly.
2) For later programs, such as a loader or boot loader, so that they can use hardware resources without rewriting the hardware-access code themselves.
When setting up its interrupt routines, the BIOS in turn calls routines written by others: the BIOS also depends on other people's software. To make their products easy to use, hardware manufacturers must provide a call interface that is as simple as possible. Passing parameters to an interface function should be enough to get the hardware's output back.
Every peripheral, including graphics cards, keyboards and various controllers, has its own memory, and this memory is read-only ROM. (The motherboard likewise has its own memory, which stores the BIOS.) The hardware's routines and its initialization code live in this ROM. By specification, the first byte of the ROM is 0x55, the second byte is 0xAA, and the third byte gives the length of the ROM code in units of 512 bytes. The actual code starts at the fourth byte, and the third byte gives its length.
There are two ways to access peripherals:
1) Memory mapping: the peripheral's own memory is mapped through the address bus into a region of the memory address space. It is not mapped into the memory modules plugged into the motherboard.
2) Port I/O: the peripheral has its own controller, and the controller's registers are the so-called ports. The hardware is read and written through these ports with in/out instructions.
DOS runs in real mode, so the interrupt calls it sets up also live in the interrupt vector table. Its interrupt vector numbers do not conflict with the BIOS's: 0x20 to 0x27 are DOS interrupts.
DOS interrupt calls are multi-function: you first write the sub-function number into the ah register, then execute int 0x21. The CPU looks up entry 0x21 in the interrupt vector table, at physical address 0x21 * 4 = 0x84, to find the start address of the interrupt handler. The handler then calls the sub-function selected by the value in ah.
The Linux kernel sets up its interrupt routines only after entering protected mode. In protected mode, the interrupt vector table has been replaced by the interrupt descriptor table (Interrupt Descriptor Table, IDT).

0xC Library functions as the bridge between user processes and the kernel

When programming in C on Linux, the programs we write are usually user-level programs. To print text, we usually include <stdio.h> at the top of the file so the program can use printf for output. A user program has no ability of its own to print characters; it has to rely on the operating system. The operating system provides a system call interface that user processes can simply call. The library functions we use wrap that interface, so calling a library function amounts to using the system call interface.

0xD What MBR, EBR, DBR and OBR each are

The BIOS is a small program stored in memory on the motherboard. That space is limited, so its code is small and its functionality limited, and it has to pass control on in a relay. The BIOS only performs simple initialization and checks, then hands the processor over to the program in the MBR. So that the BIOS can find the MBR, the MBR program must be stored at a fixed location: the first sector of the whole disk, which is why that sector is also called the boot sector.
MBR stands for Master Boot Record (or Main Boot Record). It is located in the very first sector of the whole hard disk, that is, cylinder 0, head 0, sector 1. A sector is generally 512 bytes, although that is only the common case.
Contents of the MBR boot sector:
1) 446 bytes of boot program and parameters
2) a 64-byte partition table
3) a 2-byte end signature, 0x55 0xAA
The boot program stored in the MBR boot sector takes over control of the system from the BIOS, that is, the right to use the processor. The BIOS reads the boot sector into memory at physical address 0x7C00 and then jumps there with a far jump that sets cs:ip, starting execution of the MBR program. This is the hand-over of control.
Besides the MBR, which is the master boot program, there are "secondary" boot programs. The MBR's main task is to pick the most suitable secondary boot program for the current situation and hand control of the system to it; beyond that, the MBR does little. In addition to its boot program, the MBR contains the 64-byte partition table, which holds the partition information, with one 16-byte entry per partition. The MBR partition table can therefore hold only four entries, and the MBR chooses among those four entries the one to hand control to.
The secondary boot program is usually the operating system's loader, so the MBR's task is to hand control of the system to the operating system loader.
How does the MBR know where the operating system is? When we partition the disk and install an operating system in a partition, we use the partitioning tool to make that partition active. Making a partition active really means setting the active flag of its partition table entry to 0x80. A value of 0x80 means the partition contains a boot program; 0 means it does not, and the partition is not bootable. This boot program is typically the kernel loader. The sector holding the kernel loader is called the operating system boot record, OBR (OS Boot Record), and this sector is also called the OBR boot sector.
The jump instruction at the start of the OBR has no fixed target address; the target depends on the file system created on the partition. For FAT32, this jump instruction goes to the operating system boot routine at byte offset 0x5A within this sector. Whatever the target address is, the code there is usually the operating system kernel loader.
OBR is a holdover from DBR. DBR stands for DOS Boot Record, the boot program of the DOS system. A DBR roughly contains:
1) a jump instruction, reached when the MBR hands over control, that jumps to the boot code
2) vendor information and DOS version information
3) the BPB, i.e. the BIOS Parameter Block
4) the operating system boot program
5) the end signature 0x55 0xAA
In the DOS era there were only four partitions and no extended partitions, so all four were primary partitions, and the sector at the start of each primary partition was called the DBR boot sector. Other operating systems kept this convention, and since DOS has left the stage of history, the name OBR is used instead.
Extended partitions were introduced to overcome the limited number of partitions, and the EBR was proposed mainly to stay compatible with the MBR partition table. An extended partition is subdivided into logical partitions, so it needs partition tables of its own. The sector that stores such a partition table is called an EBR, Extended Boot Record. Because its partition table is compatible with the MBR's, an EBR has the same structure as the MBR but sits at a different location: an EBR is located at the beginning of each sub-extended partition. (Do not confuse the sector at the beginning of each sub-extended partition, the EBR, with the operating system boot sector at the beginning of each primary or logical partition, the OBR. They are two completely different concepts.)
The MBR and EBRs are created by partitioning tools and are outside the operating system's scope of management, so the operating system does not write to them. This is a convention rather than a limitation, since the operating system can in fact read and write any location. Each sub-extended partition has exactly one EBR.