32-bit x86 processor programming architecture

1. Basic execution environment of the IA-32 architecture

1.1 Extended registers

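  In 32-bit mode, assembly language programs can use the extended (Extended) general-purpose registers: EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP. Each is 32 bits wide, and its lower 16 bits are still accessible under the original 16-bit name (AX, BX, and so on).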
  In 32-bit mode, to generate 32-bit physical addresses, the processor uses a 32-bit instruction pointer register, EIP. The flags register is also extended to 32 bits (EFLAGS); its lower 16 bits are the same as the original 16-bit flags register.
  A 32-bit processor still accesses memory through segments, but it can use a single segment whose base address is 0x00000000 and whose length is 4 GB. In this case memory can be regarded as unsegmented; this is the flat model (Flat Mode).

  In 32-bit mode, the traditional segment registers no longer hold a 16-bit segment address but a segment selector. Two additional segment registers, FS and GS, are added, and each segment register also contains an invisible part called the descriptor cache.

1.2 Linear address

  Segment management is performed by the processor's segmentation unit, which adds the segment base address and the offset to obtain the address used to access memory. Normally, the address generated by the segmentation unit is the physical address.

  The paging function divides the physical memory space into logical pages. The page size is fixed, typically 4 KB. Paging simplifies memory management and allows small pieces of memory to be assigned to a task.

  When paging is enabled, the address generated by the segmentation unit is no longer a physical address but a linear address (Linear Address). The linear address must then be translated by the paging unit to obtain the physical address.
  The linear address describes the address space of a task. On IA-32 processors each task has 4 GB of virtual memory, a flat 4 GB space that resembles a long straight line, hence the name linear address space. Accordingly, every address generated by the segmentation unit corresponds to a point in the linear address space and is a linear address.
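
  The two translation steps can be sketched in a few lines of Python. This is only an illustrative model: the function names are made up, and the page table is a hypothetical flat dictionary from page number to frame number, which is far simpler than the structures a real processor walks.

```python
MASK32 = 0xFFFFFFFF
PAGE_SIZE = 4096  # 4 KB pages, as in the text


def linear_address(segment_base, offset):
    """Segmentation step: base + offset, wrapped to 32 bits."""
    return (segment_base + offset) & MASK32


def physical_address(linear, page_table):
    """Paging step: replace the page number with a frame number.

    page_table is a hypothetical {page number: frame number} mapping;
    a missing key stands in for a page fault.
    """
    page, page_offset = divmod(linear, PAGE_SIZE)
    frame = page_table[page]
    return frame * PAGE_SIZE + page_offset


# Flat model: the segment base is 0, so the offset is the linear address.
page_table = {0x00012: 0x00345}  # assumed mapping for the example
lin = linear_address(0x00000000, 0x00012ABC)
phys = physical_address(lin, page_table)
print(hex(lin), hex(phys))
```

  With paging disabled, the second step is skipped and the linear address is used directly as the physical address.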

2. The structure and characteristics of modern processors

2.1 Pipeline

  To improve the efficiency and speed of the processor, the execution of an instruction is broken down into a number of small steps, each handled by a dedicated execution unit. The units are independent and work in parallel, so the steps of successive instructions overlap in time. This method of executing instructions is called pipelining (Pipe-Line).
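
  A rough way to see the benefit is to count cycles. The sketch below assumes an idealized pipeline in which every stage takes exactly one cycle and nothing ever stalls; the function names are illustrative only.

```python
def sequential_cycles(instructions, stages):
    """Each instruction finishes all of its stages before the next one starts."""
    return instructions * stages


def pipelined_cycles(instructions, stages):
    """Idealized pipeline: once full, one instruction completes every cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


print(sequential_cycles(10, 5), pipelined_cycles(10, 5))
```

  Under these assumptions, ten instructions on a five-stage pipeline take 14 cycles instead of 50. Real pipelines fall short of this figure because of stalls and branches.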

2.2 Cache

  Registers are the fastest storage because they are built from flip-flops, which store data using feedback circuits; they operate at the nanosecond (ns) level. A memory (DRAM, dynamic memory) cell is typically built from a single transistor and a capacitor. Because the capacitor needs periodic refreshing, memory access is much slower, usually several tens of nanoseconds. Finally, a hard disk is an electromechanical device whose access time is usually measured in milliseconds (ms).

  To reduce the time spent waiting for slow devices such as memory and hard disks, the cache (Cache) came into being: a block of fast SRAM placed between the processor and main memory (DRAM) to bridge the speed gap between them.

  Exploiting the principle of locality in running programs, the instructions and data that the processor is using or is about to use can be transferred from memory into the cache in blocks. Whenever the processor accesses memory, it first looks in the cache. If the requested content is already there, it is obtained directly from the cache at high speed, which is called a hit (Hit); otherwise it is called a miss (Miss). On a miss, the processor must load the cache before it obtains the required content, rather than simply fetching that content from memory. The cache is loaded in units of blocks, which include the data adjacent to the requested item. Waiting for a block to be loaded from memory takes extra time, and the time lost is called the miss penalty (Miss Penalty).
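
  The hit and miss behaviour can be illustrated with a toy direct-mapped cache. The block size, the number of lines, and the class name are assumptions chosen for the example; real caches are larger and usually set-associative.

```python
BLOCK_SIZE = 64  # bytes per block (assumed for the example)
NUM_LINES = 4    # number of cache lines in this toy model (assumed)


class ToyCache:
    """Direct-mapped cache that only tracks which block each line holds."""

    def __init__(self):
        self.lines = [None] * NUM_LINES
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit, False on a miss."""
        block = address // BLOCK_SIZE
        index = block % NUM_LINES
        if self.lines[index] == block:
            self.hits += 1
            return True
        # Miss: the whole block, including neighbouring data, is loaded.
        self.lines[index] = block
        self.misses += 1
        return False


cache = ToyCache()
results = [cache.access(a) for a in (0x100, 0x104, 0x13C, 0x140, 0x100)]
print(results, cache.hits, cache.misses)
```

  The first access to 0x100 misses and loads its block, so the nearby addresses 0x104 and 0x13C hit. This is the locality effect described above.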

2.3 Out-of-order execution

  To implement pipelining, an instruction must be split into smaller parts that can be executed independently, that is, into micro-operations (Micro-Operations), abbreviated μops.
Some simple instructions consist of only one micro-operation:

add eax,ebx

  Other instructions are split into two micro-operations. In the following example, one reads the data from memory into a temporary register, and the other adds the temporary register to the EAX register:

add eax,[mem]

  An instruction such as add [mem],eax can be split into three micro-operations: the first reads the data from memory, the second performs the addition, and the third writes the result back to memory. Once instructions are split into micro-operations, the processor can execute the program out of order (Out-Of-Order Execution) when necessary:

mov eax,[mem1]
shl eax,5
add eax,[mem2]
mov [mem3],eax

  Here, the instruction add eax,[mem2] can be split into two micro-operations, so while the logical shift left is being executed, the processor can read the content of mem2 from memory ahead of time. Typically, if the data is not in the cache, the processor starts fetching the content of mem2 immediately after obtaining the content of mem1, and by then the shl instruction has already begun to execute. Similarly, out-of-order execution can greatly speed up instructions such as push and call.

2.4 Register renaming

mov eax,[mem1]
shl eax,3
mov [mem2],eax
mov eax,[mem3]
add eax,2
mov [mem4],eax

  This code does two things: it shifts the content of mem1 left by 3 bits, and it adds 2 to the content of mem3. If the last three instructions used a different register, the result would not be affected. The processor therefore uses a different temporary register for the last three instructions, so the shift and the addition can be carried out in parallel. Now consider a second example:

mov eax,[mem1]
mov ebx,[mem2]
add ebx,eax
shl eax,3
mov [mem3],eax
mov [mem4],ebx

  Suppose the content of mem1 is in the cache and is available immediately, but the content of mem2 is not. The left shift can then be carried out before the add. The processor allocates a new temporary register for the result of the shift, so the earlier value of EAX is preserved until the content of EBX is ready and the addition can be performed. Without register renaming, the shift would have to wait until mem2 had been read into EBX and the addition had completed.

  After all the operations are complete, the content of the temporary register that holds the final result is written to the real EAX register; this process is called retirement (Retirement). All general-purpose registers, the stack pointer, the flags, the floating-point registers, and possibly even the segment registers can be renamed.

2.5 Branch target prediction
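
  The renaming idea can be sketched in Python using the first listing of this section. The model is deliberately simple and is not how any particular processor implements renaming: every write to a register is given a fresh temporary name (t0, t1, ...), every read uses the most recent name, and memory operands are ignored. The tuple encoding of instructions is an assumption made for the example.

```python
from itertools import count


def rename(instructions):
    """Give every register write a fresh temporary name.

    instructions is a list of (destination, sources) tuples that name
    architectural registers; destination is None for a store to memory.
    """
    mapping = {}     # architectural register -> current temporary name
    fresh = count()
    renamed = []
    for dest, sources in instructions:
        # Reads use the latest temporary assigned to each source register.
        phys_sources = tuple(mapping.get(s, s) for s in sources)
        phys_dest = None
        if dest is not None:
            phys_dest = f"t{next(fresh)}"
            mapping[dest] = phys_dest
        renamed.append((phys_dest, phys_sources))
    return renamed


program = [
    ("eax", ()),        # mov eax,[mem1]
    ("eax", ("eax",)),  # shl eax,3
    (None, ("eax",)),   # mov [mem2],eax
    ("eax", ()),        # mov eax,[mem3]
    ("eax", ("eax",)),  # add eax,2
    (None, ("eax",)),   # mov [mem4],eax
]
renamed = rename(program)
for entry in renamed:
    print(entry)
```

  After renaming, the first three instructions touch only t0 and t1 and the last three only t2 and t3, so the two groups no longer depend on each other even though both were written in terms of EAX.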

  The pipeline is not a perfect solution. When a branch instruction is taken, the instructions that have already entered the pipeline behind it become invalid. Branch prediction (Branch Prediction) was introduced for this reason: it predicts whether the branch will be taken. When the processor executes a branch instruction, it records the address of the instruction, the address of the branch target, and the prediction result in a small cache inside the processor called the branch target buffer (Branch Target Buffer, BTB). The next time, before the branch instruction is actually executed, the processor searches the BTB for a recent record of it. If a matching entry is found, the processor speculates that the branch will behave as it did last time and feeds the instructions at the predicted target into the pipeline.

  When the instruction is actually executed, if the prediction turns out to be wrong, the pipeline is flushed and the BTB record is updated. The cost of a misprediction is therefore high.
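
  The lookup and update cycle can be modelled with a small Python sketch. It is a simplification under stated assumptions: the BTB is an unbounded dictionary, a branch with no record is predicted as not taken, and the prediction is simply "same as last time". The class and function names are hypothetical.

```python
class BranchTargetBuffer:
    """Maps a branch address to (target address, taken last time?)."""

    def __init__(self):
        self.entries = {}

    def predict(self, branch_address, fallthrough_address):
        """Return the address the pipeline should fetch from next."""
        entry = self.entries.get(branch_address)
        if entry is None:
            return fallthrough_address  # no record: assume not taken
        target, taken = entry
        return target if taken else fallthrough_address

    def update(self, branch_address, target, taken):
        self.entries[branch_address] = (target, taken)


def count_mispredictions(btb, outcomes, branch_address, target, fallthrough):
    """Run one branch through a sequence of actual outcomes."""
    mispredictions = 0
    for taken in outcomes:
        predicted = btb.predict(branch_address, fallthrough)
        actual = target if taken else fallthrough
        if predicted != actual:
            mispredictions += 1  # here the pipeline would be flushed
        btb.update(branch_address, target, taken)
    return mispredictions


# A loop branch that is taken four times and then falls through.
btb = BranchTargetBuffer()
missed = count_mispredictions(
    btb, [True, True, True, True, False], 0x1000, 0x0F00, 0x1002
)
print(missed)
```

  In this model a loop branch is mispredicted only on its first and last iterations, which is why even a simple "same as last time" rule pays off for loops.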

3. Instructions in 32-bit mode

3.1 Addressing modes of 32-bit processors

  The default addressing mode depends on the processor mode, and a prefix byte switches it for a single instruction. In 16-bit mode, an instruction without the address-size prefix 0x67 uses the traditional 16-bit addressing mode, and with the prefix it uses 32-bit addressing. In 32-bit mode, an instruction without the prefix uses 32-bit addressing, and with the prefix it uses 16-bit addressing. The prefix 0x66 plays the same role for the operand size. In 32-bit mode, instructions default to 32-bit registers and 32-bit immediates, and if a memory operand is present, its offset is 32 bits.
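
  The toggling rule is small enough to state as a function. This is a sketch of the rule only, not of instruction decoding; the function name and parameters are illustrative.

```python
def effective_size(mode_bits, has_override_prefix):
    """Size in effect for one instruction.

    mode_bits is the default size of the current mode (16 or 32);
    a size-override prefix switches to the other size.
    """
    if mode_bits not in (16, 32):
        raise ValueError("mode_bits must be 16 or 32")
    if has_override_prefix:
        return 32 if mode_bits == 16 else 16
    return mode_bits


for mode in (16, 32):
    for prefix in (False, True):
        print(mode, prefix, effective_size(mode, prefix))
```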

  In 32-bit mode, any 32-bit general-purpose register can be used as the base register of a memory operand, and any 32-bit general-purpose register except ESP can be added as the index register. The index register may also be multiplied by a scale factor of 1, 2, 4, or 8. Finally, an 8-bit or 32-bit displacement can be added.

3.2 Operand-size instruction prefix
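
  The resulting effective address is base + index * scale + displacement, truncated to 32 bits. The Python sketch below computes it for plain integer register values; the function name and keyword arguments are illustrative, and a scale of 1 stands for an unscaled index.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base=0, index=0, scale=1, displacement=0):
    """Compute base + index * scale + displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & MASK32


# Models an operand such as [ebx+esi*4+0x10] with EBX=0x1000 and ESI=3.
print(hex(effective_address(base=0x1000, index=3, scale=4, displacement=0x10)))
```

  The masking step reflects that the calculation wraps around at 4 GB instead of producing a wider result.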

  Each instruction can have prefixes, such as the repeat prefixes (REP/REPE/REPNE), segment override prefixes (e.g., ES:), and the bus lock prefix (LOCK). Prefixes are optional. Each prefix is 1 byte long, and an instruction can have from 1 to 4 prefixes or none at all.
  To indicate the default operating environment of a program, the assembler provides the bits directive, which specifies whether subsequent instructions should be assembled as 16-bit or 32-bit code.

3.3 Extended general-purpose instructions

  Because the processor has 32-bit registers and a 32-bit arithmetic logic unit, and the data path to the memory chips is at least 32 bits wide, all instructions that operate on registers or memory have been extended to support 32-bit arithmetic and logic operations. These extended operations are also available in 16-bit mode (real mode and 16-bit protected mode).

  32-bit processors allow doubleword operands to be pushed onto the stack. In particular, they now support pushing an immediate. A general-purpose register operand must not be qualified with the keywords byte, word, or dword, whereas a memory operand must be qualified with a size keyword such as word or dword. When pushing a memory operand, the processor behaves the same way as when pushing an immediate.

  • If a byte is pushed, it must be qualified with byte. However, the processor never actually pushes a single byte. In 32-bit mode the sign bit of the byte is extended into the upper 24 bits, the push uses the ESP register, and ESP is first decremented by 4.
  • If a word is pushed, it must be qualified with word. The operand size is then 16 bits: the word is pushed without extension, and the stack pointer register (SP or ESP) is first decremented by 2.
  • If a doubleword immediate is pushed, whether in 32-bit mode or in 16-bit mode, it must be qualified with dword, and the stack pointer register (SP or ESP) is decremented by 4.
  • Pushing a segment register is a special case. In 16-bit mode, SP is first decremented by 2 and the content of the segment register is then pushed directly. In 32-bit mode, the content of the segment register is first zero-extended to 32 bits, so the upper 16 bits are all zeros; ESP is then decremented by 4 and the extended 32-bit value is pushed.
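
  The 32-bit mode cases for a byte immediate, a doubleword immediate, and a segment register can be modelled in Python as follows. This is a sketch of the behaviour described above, not of the hardware: the stack is a dictionary keyed by address, and the class and method names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_byte(value):
    """Extend bit 7 of a byte into the upper 24 bits of a 32-bit value."""
    value &= 0xFF
    if value & 0x80:
        return value | 0xFFFFFF00
    return value


class Stack32:
    """Toy model of the stack in 32-bit mode."""

    def __init__(self, esp):
        self.esp = esp
        self.memory = {}  # address -> 32-bit value stored there

    def _push32(self, value):
        self.esp = (self.esp - 4) & MASK32  # ESP is decremented first
        self.memory[self.esp] = value & MASK32

    def push_byte_immediate(self, value):
        self._push32(sign_extend_byte(value))

    def push_dword_immediate(self, value):
        self._push32(value)

    def push_segment_register(self, selector):
        self._push32(selector & 0xFFFF)  # zero-extended to 32 bits


stack = Stack32(esp=0x1000)
stack.push_byte_immediate(0x80)      # models: push byte 0x80
stack.push_dword_immediate(0x55)     # models: push dword 0x55
stack.push_segment_register(0x0010)  # models pushing a segment register
print(hex(stack.esp), {hex(a): hex(v) for a, v in stack.memory.items()})
```

  Every push in this model moves ESP down by 4, which matches the point made in the list: a byte is never pushed on its own.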

Origin www.cnblogs.com/chengmf/p/12561282.html