Detailed 80x86 basic knowledge

80x86 Basics

X86 register summary

First, let's review the x86 instruction set used on Intel processors.

x86 is a family of instruction sets developed by Intel Corporation. Its 32-bit form has been in use since the 80386, and Intel officially calls it "IA-32". It is a complex instruction set (CISC) architecture.

x86_64 is the 64-bit extension of this instruction set, so the main difference people mean when they compare x86_64 with x86 is 32-bit versus 64-bit. The instruction set is the language of the CPU. A 32-bit instruction set means that the general-purpose registers and addresses are 32 bits wide, so the CPU can operate on 32 bits of data in a single instruction. A 64-bit instruction set widens both to 64 bits, which lets the CPU handle larger values and a much larger address space directly. In both cases the instructions themselves are variable in length, so "32-bit" and "64-bit" do not describe the length of an instruction.

The terms 32-bit and 64-bit also correspond to what we usually call a 32-bit or a 64-bit operating system.

The "x" in x86 is a placeholder: it stands for the whole series of processor models whose names end in 86. Later, by convention, x86 came to refer to the 32-bit instruction set.

The processors in the x86 family mainly include:

  • 8086, 8088: 16-bit registers
  • 80186 and 80286: two transitional products, still with 16-bit registers
  • 80386, 80486, and later models: 32-bit registers

x86 registers

Readers who follow my work know that I have written several articles on assembly language, and the series is still being updated. Those articles are based on 8086 assembly, where the registers are 16 bits wide. In x86 the registers are 32 bits wide, and they are mainly divided into:

  • 8 general-purpose registers, namely EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP
  • 1 flags register: EFLAGS
  • 6 segment registers: CS, DS, ES, FS, GS, SS
  • 5 Control Registers: CR0 - CR4
  • 8 debug registers: DR0 - DR7
  • 4 system address registers: GDTR, IDTR, LDTR, TR
  • Other registers: EIP, TSC, etc.

general purpose register

We learned this when studying assembly (if you haven't studied assembly, don't worry, I'll be as straightforward as possible): the 16-bit registers AX, BX, CX, and DX can each be subdivided, as shown in the figure below.

Take AX, AH, and AL as an example: we can store data in AX as a whole, or store data in AL and AH separately. The two halves can be used on their own or together.

Note that AL and AH are simply the low 8 bits and the high 8 bits of AX. A value written to AX that fits in 8 bits lands entirely in AL and leaves AH at zero; a larger value spills over into AH. AL and AH can also each be read and written directly as independent 8-bit registers. AX = AH × 256 + AL: understand this formula well. The other three registers behave the same way; they differ only in the kinds of data and instructions they are conventionally used with.

Going back to the 32-bit x86 registers, the same is true. The four 32-bit registers EAX, EBX, ECX, and EDX can be subdivided, as shown below:

Only these four 32-bit registers can be divided down to 8-bit registers. The other general-purpose registers (ESI, EDI, ESP, EBP) cannot be divided into 8-bit registers.
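The way the 32-bit, 16-bit, and 8-bit names overlap can be sketched in Python. This is only an illustrative model, not how hardware is programmed: the class `Reg32` and its attribute names `full`, `word`, `high`, and `low` are hypothetical stand-ins for EAX, AX, AH, and AL.

```python
class Reg32:
    """Toy model of a 32-bit register (e.g. EAX) and its smaller views."""

    def __init__(self, value=0):
        self.full = value & 0xFFFFFFFF      # all 32 bits (EAX)

    @property
    def word(self):                         # low 16 bits (AX)
        return self.full & 0xFFFF

    @word.setter
    def word(self, v):                      # writing AX keeps bits 16..31
        self.full = (self.full & 0xFFFF0000) | (v & 0xFFFF)

    @property
    def high(self):                         # bits 8..15 (AH)
        return (self.full >> 8) & 0xFF

    @high.setter
    def high(self, v):                      # writing AH touches only bits 8..15
        self.full = (self.full & 0xFFFF00FF) | ((v & 0xFF) << 8)

    @property
    def low(self):                          # bits 0..7 (AL)
        return self.full & 0xFF

    @low.setter
    def low(self, v):                       # writing AL touches only bits 0..7
        self.full = (self.full & 0xFFFFFF00) | (v & 0xFF)


eax = Reg32(0x12345678)
eax.high = 0xAB                             # like: mov ah, 0ABh
eax.low = 0xCD                              # like: mov al, 0CDh
print(hex(eax.full), hex(eax.word))         # upper 16 bits are unchanged
```

Writing either 8-bit half changes only its own bits, and the 16-bit view always equals `high * 256 + low`.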

Let's introduce the main functions of these registers:

General-purpose registers that can be subdivided into 8-bit registers (4):

  • EAX: Accumulator. Its lower 16 bits are AX, and AX can be subdivided into AH and AL. EAX is the default register for many multiplication and division instructions. For example, in 8086 the dividend is placed in AX, or in DX and AX together, by default; EAX works the same way. It is also used to hold a function's return value, an offset within a segment, and so on.
  • EBX: Base Register. Its lower 16 bits are BX, and BX can be subdivided into BH and BL. It is mainly used to hold the base address when addressing memory.
  • ECX: Count Register. Its lower 16 bits are CX, and CX can be subdivided into CH and CL. ECX mainly holds the loop count for loop instructions; CL holds the shift count in shift instructions.
  • EDX: Data Register. Its lower 16 bits are DX, and DX can be subdivided into DH and DL. EDX holds the high-order half of a multiplication result, and in division it holds the high-order half of the dividend and then the remainder. It can also hold an I/O port address.
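As a rough illustration of how EAX and EDX pair up in multiplication and division, the following Python sketch models the unsigned 32-bit forms of MUL and DIV. The function names `mul32` and `div32` are hypothetical, and Python exceptions stand in for the processor's divide error.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, operand):
    """Model of unsigned MUL with a 32-bit operand.

    The 64-bit product is split across EDX (high half) and EAX (low half).
    Returns (edx, eax).
    """
    product = (eax & MASK32) * (operand & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def div32(edx, eax, divisor):
    """Model of unsigned DIV with a 32-bit divisor.

    The 64-bit dividend is EDX:EAX. Returns (edx, eax) = (remainder, quotient).
    """
    divisor &= MASK32
    if divisor == 0:
        raise ZeroDivisionError("divide error")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK32:
        # The real instruction also faults when the quotient does not fit.
        raise OverflowError("quotient does not fit in EAX")
    return remainder, quotient

edx, eax = mul32(0xFFFFFFFF, 2)   # product needs 33 bits, so EDX becomes 1
print(hex(edx), hex(eax))
print(div32(0, 17, 5))            # remainder in EDX, quotient in EAX
```

This is why EDX is usually cleared (or sign-extended from EAX for signed division) before a 32-bit division: whatever it holds is treated as the upper half of the dividend.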

General-purpose registers that cannot be subdivided into 8-bit registers (4):

  • ESI/EDI: These two form a pair, called the Source Index Register and the Destination Index Register. They can only be divided into the two 16-bit registers SI and DI. They hold addressing offsets and are used together with EBX for more flexible memory addressing.

    The most common use of ESI and EDI is in string operations. In string instructions, DS:ESI points to the source string and ES:EDI points to the destination string.

  • EBP/ESP: The Base Pointer Register and the Stack Pointer Register, respectively. Their lower 16 bits are BP and SP, which cannot be subdivided further. As a general-purpose register, BP can also hold the operands and results of arithmetic and logic operations. As a pointer register, BP can be used to access memory data directly. SP is the stack pointer register: together with SS it points to the top of the stack, and it is used only to access the top of the stack.

flag register

EFLAGS is the status register, also known as the flags register. The flags are critical to how a program executes and what results it produces. The layout of EFLAGS is shown in the figure below.

The system flags and the IOPL field in EFLAGS control I/O access, maskable hardware interrupts, debugging, task switching, and virtual-8086 mode. The others are general flags: CF (Carry flag), PF (Parity flag), AF (Auxiliary carry flag), ZF (Zero flag), SF (Sign flag), DF (Direction flag), and OF (Overflow flag). Bits 1, 3, 5, 15, and 18 - 31 are reserved.

Instructions such as LAHF/SAHF/PUSHF/POPF/PUSHFD/POPFD move groups of flag bits from EFLAGS to the stack or to the AH register, or load EFLAGS from the stack or from AH.

Below we describe these flags.

  1. Status flag (Status Flag)

Bits 0, 2, 4, 6, 7, and 11 of the EFLAGS register are status flags that report the results of arithmetic instructions such as ADD, SUB, MUL, and DIV. The functions of these status flags are as follows:

  • CF (bit 0): Carry flag, the flag for the results of unsigned operations. It is set to 1 if an arithmetic operation produces a carry out of, or a borrow into, the most significant bit. For example, for an N-bit unsigned number the most significant bit is bit N - 1, and bit N is an imaginary bit above it. If the operation carries out of bit N - 1 into that imaginary bit, CF = 1; the same applies to a borrow.
  • PF (bit 2): Parity flag. PF = 1 if the low 8 bits of the result contain an even number of 1 bits; otherwise PF = 0.
  • AF (bit 4): Adjust flag (auxiliary carry). It is set to 1 if an arithmetic operation produces a carry out of, or a borrow into, bit 3 of the result; otherwise it is cleared to 0. This flag is used in BCD (binary-coded decimal) arithmetic.
  • ZF (bit 6): Zero flag. ZF = 1 if the result of the operation is 0.
  • SF (bit 7): Sign flag. It is a copy of the most significant bit of the result, so SF = 1 means the result is negative when treated as a signed number.
  • OF (bit 11): Overflow flag, the flag for the results of signed operations. It is set to 1 when the signed result is too large or too small to fit in the destination.

Simply put:

The result of an arithmetic instruction can be interpreted as one of three types: an unsigned integer, a signed integer, or a BCD integer. BCD encodes each decimal digit in four binary bits. If the result is treated as a signed number, OF indicates whether it overflowed; if it is treated as an unsigned number, CF indicates a carry or borrow; if it is treated as BCD, AF indicates a carry or borrow between digits. SF gives the sign of a signed result, and ZF indicates a zero result for both signed and unsigned numbers.
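To make the status flags concrete, here is a small Python sketch that computes the six flags for an 8-bit ADD. The function name `add8_flags` is hypothetical, and the 8-bit operand size is chosen only to keep the numbers short; the same rules apply at 16 and 32 bits with a wider sign bit.

```python
def add8_flags(a, b):
    """Return (result, flags) for an 8-bit ADD of a and b."""
    a &= 0xFF
    b &= 0xFF
    full = a + b                 # exact sum, may need 9 bits
    result = full & 0xFF         # what is actually stored in the destination
    flags = {
        # Carry out of bit 7: overflow of the unsigned interpretation.
        "CF": int(full > 0xFF),
        # Even number of 1 bits in the low 8 bits of the result.
        "PF": int(bin(result).count("1") % 2 == 0),
        # Carry out of bit 3 (between the two BCD digits).
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),
        "ZF": int(result == 0),
        # Copy of the most significant bit of the result.
        "SF": (result >> 7) & 1,
        # Both operands have the same sign and the result's sign differs:
        # overflow of the signed interpretation.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

print(add8_flags(0x7F, 0x01))    # 127 + 1: signed overflow, no carry
print(add8_flags(0xFF, 0x01))    # 255 + 1: carry, result wraps to zero
```

The two calls show the difference between OF and CF: 0x7F + 0x01 overflows as a signed addition but not as an unsigned one, while 0xFF + 0x01 does the opposite.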

  2. Control flag (DF - Direction flag)

DF is the direction flag; it controls the string instructions. In string operations such as the MOVS instruction, it determines whether SI and DI are incremented or decremented after each operation. DF = 0 means that SI and DI increase after each operation; DF = 1 means that SI and DI decrease after each operation. The CLD and STD instructions control this: CLD clears DF to 0 (increment), and STD sets DF to 1 (decrement). DF = 0 can be thought of as a forward transfer, and DF = 1 as a reverse transfer.
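The effect of DF on a repeated byte move can be sketched as follows. This is a simplified model under stated assumptions: `rep_movsb` is a hypothetical function name, memory is a flat Python `bytearray`, and the DS and ES segment parts of the addresses are ignored.

```python
MASK32 = 0xFFFFFFFF

def rep_movsb(memory, esi, edi, ecx, df):
    """Model of REP MOVSB on a flat bytearray.

    Copies ecx bytes from memory[esi] to memory[edi], moving both index
    registers up when df == 0 and down when df == 1.
    Returns the final (esi, edi, ecx).
    """
    step = -1 if df else 1
    while ecx != 0:
        memory[edi] = memory[esi]
        esi = (esi + step) & MASK32   # registers wrap at 32 bits
        edi = (edi + step) & MASK32
        ecx -= 1
    return esi, edi, ecx

mem = bytearray(b"HELLO" + bytes(5))
forward = rep_movsb(mem, esi=0, edi=5, ecx=5, df=0)   # after CLD
print(mem, forward)
```

With DF = 1 the same copy would start from the last byte of each string, which is the usual choice when the source and destination ranges overlap and the destination lies at the higher address.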

  3. System flags and IOPL field (System Flags and IOPL Field)

These flags in the EFLAGS register control operating system or executive operations, and application programs must not modify them. These flags work as follows:

  • TF (bit 8): Trap flag. Setting this bit to 1 enables single-step debugging; clearing it to 0 disables it. In single-step mode, a debug exception is generated after each instruction executes, so that we can observe the state after each instruction.

  • IF (bit 9): Interrupt enable flag. This flag controls the processor's response to maskable interrupt requests. IF = 1 means the processor responds to maskable interrupts; IF = 0 means maskable interrupts are disabled.

  • IOPL (bits 12-13): I/O privilege level field. It indicates the I/O privilege level of the currently running program or task. The CPL of the currently running program or task must be less than or equal to IOPL for I/O access to be allowed. Only a program running at CPL 0 can modify this field, using the POPF and IRET instructions.

  • NT (bit 14): Nested task flag. It controls the chaining of the interrupted task and the called task. The processor sets this flag when a task is invoked through a CALL instruction, an interrupt, or an exception. When returning from a task via IRET, the processor checks the NT flag. The flag can also be modified with POPF/POPFD.

  • RF (bit 16): Resume flag. It controls the processor's response to instruction breakpoints. When set, this flag temporarily disables the debug exceptions generated by instruction breakpoints; when clear, instruction breakpoints generate debug exceptions.

  • VM (bit 17): Virtual-8086 mode flag. Setting this bit enables virtual-8086 mode; clearing it returns the processor to protected mode.
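The bit positions listed in this section can be collected into a small decoder. The sketch below is illustrative only: the names `FLAG_BITS`, `decode_eflags`, and `cpl_within_iopl` are hypothetical, and the privilege check models just the CPL <= IOPL comparison described above, nothing else the processor may consult.

```python
# Bit position of each single-bit flag described in this section.
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
    "NT": 14, "RF": 16, "VM": 17,
}

def decode_eflags(value):
    """Split an EFLAGS value into named fields."""
    fields = {name: (value >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["IOPL"] = (value >> 12) & 0b11   # two-bit field, bits 12-13
    return fields

def cpl_within_iopl(cpl, eflags):
    """True when the current privilege level passes the IOPL comparison."""
    return cpl <= decode_eflags(eflags)["IOPL"]

flags = decode_eflags(0x246)                # an example value: PF, ZF, IF set
print({name: bit for name, bit in flags.items() if bit})
print(cpl_within_iopl(3, 0x246), cpl_within_iopl(3, 0x3246))
```

Because a lower CPL number means more privilege, a level 3 program passes the comparison only when IOPL is also 3.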

segment register

There are six segment registers: CS, DS, ES, FS, GS, and SS. These segment registers are also called sreg; in contrast, reg is used to refer to the general registers.

Segment registers exist so that memory can be divided into segments, and a memory address is formed from a segment register plus an offset within the segment. That is the job of CS, DS, ES, FS, GS, and SS. You can think of segment register + offset as community + room number: the community is the base address of the segment, and the room number is the offset within the segment. Together, the community and room number uniquely identify a location (a physical memory address). Each segment register is 16 bits wide. Let's look at which segment each of these six registers is used for:

CS: The code segment register. The code segment holds the instructions being executed. When the processor fetches instructions from the code segment, it uses the logical address formed from the segment selector in the CS register and the offset in the EIP register. Instruction fetch and execution is a very important concept in computing: the processor fetches the instruction at the address CS:IP (CS:EIP on 32-bit x86) and executes it.

DS: The data segment register, generally used for the segment that holds the program's data structures.

ES/FS/GS: These three segment registers can be understood together. They refer to additional data segments, and can also be understood together with DS. For example, a program can set up four data segments: the first holds the data structures of the current program module, the second holds data exported by a higher-level program module, the third holds dynamically created data structures, and the last holds data shared with another program.

SS: Holds the selector of the stack segment. The stack segment stores the stack of the program, task, or handler currently being executed. All stack operations are located through SS:SP (SS:ESP on 32-bit x86), which always points to the top of the stack.
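The "community + room number" idea is easiest to see in 8086 real mode, where the segment value is simply shifted left by 4 bits and added to the offset. The sketch below assumes that mode and a 20-bit address bus; `real_mode_physical` is a hypothetical name. In protected mode the segment register instead holds a selector that is looked up in a descriptor table, which this sketch does not model.

```python
def real_mode_physical(segment, offset):
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    return linear & 0xFFFFF       # the 8086 has only 20 address lines

print(hex(real_mode_physical(0x1234, 0x0010)))   # e.g. CS=1234h, IP=0010h
print(hex(real_mode_physical(0x1235, 0x0000)))   # a different pair, same address
```

Different segment and offset pairs can name the same physical address, because segments start every 16 bytes and each one spans 64 KB.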

control register

This section covers four control registers: CR0, CR1, CR2, and CR3 (CR4 was added on later processors). The control registers determine the operating mode of the processor and the characteristics of the currently executing task. These registers are 32 bits wide, and their functions are shown below:

These registers are closely related to the paging mechanism, so they are involved in process management and virtual memory management. The control registers are read and written with the MOV instruction.

  • CR0: Contains the system control flags that control the operating mode and state of the processor. They fall into two groups: coprocessor control bits and protection control bits. First, the coprocessor control bits:

    The extension type bit ET, the task switched bit TS, the emulation bit EM, and the math present bit MP control x86 floating-point operation, that is, the operation of the math coprocessor.

What is a coprocessor? It is a chip used to offload specific processing tasks from the system microprocessor.

The ET bit (flag) of CR0 is used to select the protocol used to communicate with the coprocessor, that is, to indicate whether the 80387 or 80287 coprocessor is used in the system. The TS, MP and EM bits are used to determine whether a floating-point instruction or a WAIT instruction should generate a Device Not Available (DNA) exception. This exception can be used to save and restore floating-point registers only for tasks that use floating-point arithmetic. Doing so can speed up switching between tasks that do not use floating-point arithmetic.

Protection control bits:

(1) PE: Bit 0 of CR0 is the Protection Enable flag. Setting this bit turns on protected mode; clearing it puts the processor in real-address mode. This flag enables only segment-level protection and does not enable paging. To enable paging, both the PE and PG flags must be set.
(2) PG: Bit 31 of CR0 is the Paging flag. Setting this bit enables paging; clearing it disables paging, and all linear addresses are then treated as physical addresses. PE must already be set, or be set at the same time, when PG is set.

(3) WP: On the Intel 80486 and later CPUs, bit 16 of CR0 is the Write Protect flag. When this flag is set, supervisor-level code is prevented from writing to user-level read-only pages; when it is clear, such writes are allowed. This flag helps UNIX-like systems implement copy-on-write.

(4) NE: On the Intel 80486 and later CPUs, bit 5 of CR0 is the Numeric Error flag, which controls how coprocessor errors are reported.

  • CR1 is reserved.
  • CR2 and CR3 are used by the paging mechanism. CR2 is the page fault linear address register, which holds the full 32-bit linear address that caused the last page fault. When a page fault is reported, the processor stores the linear address that caused it in CR2, so the page fault handler in the operating system can examine CR2 to determine which page in the linear address space caused the fault.
  • CR3 contains the physical address of the page directory, so this register is also called the page directory base register, PDBR (Page-Directory Base Register).
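The CR0 bits named above can be decoded in the same way as EFLAGS. This is a minimal sketch: `CR0_BITS`, `decode_cr0`, and `paging_enabled` are hypothetical names, the PE, PG, WP, and NE positions come from the text, and the MP, EM, TS, and ET positions (bits 1 to 4) are the standard IA-32 assignments.

```python
CR0_BITS = {
    "PE": 0,    # protection enable
    "MP": 1,    # math present
    "EM": 2,    # emulation
    "TS": 3,    # task switched
    "ET": 4,    # extension type
    "NE": 5,    # numeric error (80486 and later)
    "WP": 16,   # write protect (80486 and later)
    "PG": 31,   # paging
}

def decode_cr0(value):
    """Split a CR0 value into its named single-bit flags."""
    return {name: (value >> bit) & 1 for name, bit in CR0_BITS.items()}

def paging_enabled(value):
    """Paging is in effect only when both PE and PG are set."""
    flags = decode_cr0(value)
    return bool(flags["PE"] and flags["PG"])

cr0 = 0x80010011          # an example value with PG, WP, ET, and PE set
print(decode_cr0(cr0))
print(paging_enabled(cr0), paging_enabled(0x00000011))
```

The second call shows protected mode without paging: PE is set but PG is clear, so linear addresses are used directly as physical addresses.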

memory management register

There are four memory management registers: GDTR, LDTR, IDTR, and TR.

GDTR, LDTR, IDTR, and TR all hold segment base addresses, and they contain important information about the segmentation mechanism. GDTR, LDTR, and IDTR are used to locate the segments that hold the descriptor tables. TR is used to locate a special segment, the task state segment (TSS), which contains important information about the currently executing task.

After the processor enters protected mode, the code segment, data segment, and stack segment can no longer be addressed directly. The information about these segments is stored in the global descriptor table (GDT), a table that records the information for each segment.

  • Global Descriptor Table Register GDTR

The GDTR register holds the 32-bit linear base address and the 16-bit table limit of the global descriptor table (GDT). The LGDT and SGDT instructions load and store the contents of the GDTR register, respectively. After the machine is powered on or the processor is reset, the base address defaults to 0 and the limit to 0xFFFF.

  • Interrupt Descriptor Table Register IDTR

Similar to GDTR, except that it describes the interrupt descriptor table (IDT). The LIDT and SIDT instructions load and store the contents of the IDTR register. After the machine is powered on or the processor is reset, the base address defaults to 0 and the limit to 0xFFFF.

  • Local Descriptor Table Register LDTR

LDTR holds the 32-bit linear base address, the 16-bit segment limit, and the descriptor attributes of the local descriptor table (LDT). The LLDT and SLDT instructions load and store the segment selector part of the LDTR register, respectively. A segment containing an LDT must have a segment descriptor entry in the GDT.

  • Task Register TR

The TR register holds the 16-bit segment selector, the 32-bit base address, the 16-bit segment limit, and the descriptor attributes of the TSS of the current task. It refers to a descriptor of type TSS in the GDT. The LTR and STR instructions load and store the segment selector part of the TR register, respectively.
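As a rough sketch of the data these registers work with, the Python below packs and unpacks the 6-byte memory image that LGDT and SGDT use on 32-bit x86 (a 16-bit limit followed by a 32-bit base, little-endian) and splits a 16-bit segment selector into its fields. The function names are hypothetical. The selector layout (index in bits 3-15, table indicator in bit 2, requested privilege level in bits 0-1) and the 8-byte descriptor size are standard IA-32 facts that the text above does not spell out.

```python
import struct

DESCRIPTOR_SIZE = 8   # each GDT/LDT entry is 8 bytes on IA-32

def pack_gdtr(base, limit):
    """Build the 6-byte image read by LGDT: limit first, then base."""
    return struct.pack("<HI", limit & 0xFFFF, base & 0xFFFFFFFF)

def unpack_gdtr(image):
    """Parse the 6-byte image written by SGDT. Returns (base, limit)."""
    limit, base = struct.unpack("<HI", image)
    return base, limit

def descriptor_count(limit):
    """The limit is the offset of the last valid byte, so add 1."""
    return (limit + 1) // DESCRIPTOR_SIZE

def decode_selector(selector):
    """Split a 16-bit segment selector into index, table, and RPL."""
    return {
        "index": (selector >> 3) & 0x1FFF,   # entry number in the table
        "ti": (selector >> 2) & 1,           # 0 = GDT, 1 = LDT
        "rpl": selector & 0b11,              # requested privilege level
    }

image = pack_gdtr(base=0x00100000, limit=0x07FF)
print(image.hex(), unpack_gdtr(image), descriptor_count(0x07FF))
print(decode_selector(0x002B))
```

The same image format applies to IDTR with LIDT and SIDT, whereas LLDT and LTR take a selector that is then resolved through the GDT.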

other registers

EIP: The instruction pointer, which holds the offset address of the next instruction. It cannot be accessed directly by instructions; it is changed implicitly by control-transfer instructions, interrupts, and exceptions. To read it, a program executes a CALL instruction and then reads the return address saved on the stack. To write it, a program modifies the return instruction pointer on the stack and executes a RET/IRET instruction. So although this register is quite important, it is not a focus of attention when implementing an operating system.

TSC (time stamp counter): Its value increases by 1 every clock cycle, and it is reset to 0 when the processor is reset.

Floating-point registers: Because the 80486 microprocessor contains a floating-point unit, it has the corresponding registers: eight 80-bit data registers, one 48-bit instruction pointer register, one 48-bit data pointer register, one 16-bit control word register, one 16-bit status word register, and one 16-bit tag word register.

system instructions

System instructions handle system-level operations, such as loading system registers and managing interrupts. Most of them can only be executed by operating system software at privilege level 0.

It should be explained here that the operating system enforces hierarchical privileges, and applications cannot access or operate on the kernel at will. There are 4 privilege levels, numbered 0 to 3.

The operating system runs at privilege level 0, where it can control the hardware and core data directly; system programs run at level 1 or level 2, mainly system services such as some virtual machines and drivers; and ordinary applications run at level 3.

Some system instructions are listed below, along with whether each one is protected:

(Figure: table of system instructions and whether each is protected)

Origin blog.csdn.net/zy_dreamer/article/details/132589884