80x86 Assembly—80x86 Architecture

It is easy to get started with assembly, but difficult to learn the 8086 architecture in depth. If you have not studied computer organization, this chapter may be a bit hard to follow. Since I have studied it, I will bring in computer organization knowledge to explain things as I take notes. If you don't understand an explanation, just skip it.

how computers work

To put it simply, applications and programs written in high-level languages eventually become machine instructions made of 0s and 1s. For the CPU to execute an instruction, the bus system must first drive certain lines to open a path, so that the instruction we want to fetch has a way to reach the CPU. That is roughly how data is fetched from memory.
To interact or communicate with external devices, the computer needs an interface. An external device can be connected as long as it conforms to the interface specification the computer was designed with, and our assembly instructions work with the device through that interface. In assembly the interface is called a port, but this port must be distinguished from the port used in networking.
Wenxin Yiyan explains the two kinds of port: in network communication, "port" refers to a port number, while in assembly language, "port" refers to a port address on a hardware interface.

Possible doubts:
What is a bus system?
The bus is covered in detail in computer organization. For assembly, you only need to know that an assembly instruction carries out its operation by controlling the bus. For example, with mov ax, 7 the control logic sends 7 to the AX register.

Another thing to note: don't worry about whether or how the machine actually processes our assembly instructions. I will explain it, but you don't need to read it. I am writing it here as my own review notes:
First, following the clock beats and phases, the CPU's control unit sends out the instruction fetch signal. The address of the instruction is taken from the PC (program counter), the instruction is fetched from memory and placed in the IR (instruction register), and then it is handed to the control unit. Control units come in two designs, hard-wired or microprogrammed. Whichever one is used, the CPU finally sends out control signals that complete all the micro-operations of the fetched instruction, for example mov ax, 7. Meanwhile, the PC moves on to point at the next instruction. So all we need to know is that these signals carry out the instruction, and the cycle continues...
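The fetch, decode, execute cycle described above can be sketched as a toy loop. This is only an illustration: the tuple-based instruction format and the `run` function are made up for this example and have nothing to do with real 8086 machine code or timing.

```python
# Toy fetch-decode-execute loop. The instruction format (tuples) is
# invented for illustration; it is not real 8086 machine code.

def run(program):
    registers = {"ax": 0, "bx": 0}
    pc = 0                       # program counter: index of the next instruction
    while pc < len(program):
        ir = program[pc]         # fetch: copy the instruction into the "IR"
        pc += 1                  # the PC already points at the next instruction
        op, dst, src = ir        # decode
        value = registers[src] if src in registers else src
        if op == "mov":          # execute
            registers[dst] = value
        elif op == "add":
            registers[dst] = (registers[dst] + value) & 0xFFFF
        else:
            raise ValueError(f"unknown instruction: {op}")
    return registers

result = run([("mov", "ax", 7), ("mov", "bx", "ax"), ("add", "ax", "bx")])
print(result)  # {'ax': 14, 'bx': 7}
```

Note that the PC is advanced right after the fetch, before the instruction executes, which is why it always holds the address of the next instruction.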
There is another question you may have, or maybe it was just me. I used to wonder how a high-level language could make the CPU do such complex things, when I can't even understand some of the code myself!?
That worry is unnecessary. The CPU only ever executes instructions made of 0s and 1s. Assembly language corresponds to those instructions: a tool called an assembler converts assembly into machine code (machine code is exactly those 0-and-1 instructions). That settles assembly, so keep going upward to a high-level language such as C. How is C converted into assembly? Simple: by the compiler (I will not go into how the compiler does it). Now everything is explained. C can be converted into assembly, assembly can be converted into machine code, and the machine code is given to the CPU, so the CPU has essentially no idea what you are doing; you just give it correct instructions and it executes them. This also explains why the computer ecosystem is so powerful. The bottom layer may have very few instructions, but we climb step by step: machine code is used to build an assembler, then we program in assembly, then we use assembly to write a C compiler, then we use C to write even more impressive programs, and eventually our Windows appeared (just joking, please don't hit me).
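As a rough idea of what an assembler does, here is a minimal sketch that encodes a single instruction form, `mov r16, imm16`. The function name `assemble_mov` is hypothetical; the encoding itself (opcode B8h plus the register number, followed by the 16-bit immediate with the low byte first) is the standard 8086 one. A real assembler handles far more than this.

```python
# Encodes only "mov r16, imm16": opcode B8h + register number, then the
# 16-bit immediate, low byte first. Everything else is rejected.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def assemble_mov(line):
    mnemonic, operands = line.strip().lower().split(None, 1)
    dst, imm = [part.strip() for part in operands.split(",")]
    if mnemonic != "mov" or dst not in REG16:
        raise ValueError(f"unsupported instruction: {line!r}")
    value = int(imm, 0) & 0xFFFF            # accepts 7 or 0x7
    return bytes([0xB8 + REG16[dst], value & 0xFF, value >> 8])

code = assemble_mov("mov ax, 7")
print(code.hex(" "))                        # b8 07 00
print(" ".join(f"{b:08b}" for b in code))   # the same bytes as 0s and 1s
```

The second line of output is what the CPU actually receives: three bytes of 0s and 1s with no trace of the text `mov ax, 7`.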

memory

In the 8086 architecture:

  • Memory is addressed in bytes.
  • Each byte has its own address, so to fetch the next piece of data (at the adjacent address) you add 1 to the address. The unit of that +1 is one byte, which is why we say memory is byte-addressed.
  • An address is an unsigned integer (naturally), and in assembly it is written in hexadecimal.
    Why hexadecimal? Binary or decimal would also work when passing data around, but debuggers display everything in hexadecimal, so by default we use hexadecimal too.
  • When we store or fetch a word-sized piece of data, the address of its first (lowest) byte serves as the address of the whole word; we do not use two addresses, even though memory is addressed byte by byte.
    Explanation: assembly has three commonly used units. The byte is the most basic, a word is two bytes, and a dword is two words, i.e. 4 bytes. In every case the address of the data is the address of its lowest byte. For example, to fetch a dword we only supply that first address and the whole value is fetched (how many bytes are taken depends on the operand size of the instruction, for example the width of the register involved).
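A small sketch of the "lowest byte address names the whole value" rule. The `memory` contents and the `read` helper are made up for this example. One detail not spelled out above is assumed here: the 8086 is little-endian, so the lowest address holds the least significant byte.

```python
# Byte-addressed memory: one address per byte.
memory = bytearray(16)
memory[4:8] = bytes([0x78, 0x56, 0x34, 0x12])   # four consecutive bytes

def read(memory, address, size):
    """Read size bytes (1, 2 or 4) starting at the lowest byte address."""
    return int.from_bytes(memory[address:address + size], "little")

print(hex(read(memory, 4, 1)))  # 0x78        byte at address 4
print(hex(read(memory, 4, 2)))  # 0x5678      word at address 4
print(hex(read(memory, 4, 4)))  # 0x12345678  dword at address 4
```

The same address, 4, is used in all three reads; only the operand size changes how many bytes are taken.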

logical address to physical address

First of all, I think this scheme solves a problem we meet while writing programs. Some people may worry: if we operate on memory this freely, could we damage the machine code or data the computer itself depends on?
That could happen on old DOS systems, but not now. In the real-mode environment we work in today, whatever you are allowed to modify can be modified, while whatever must not be modified is protected. Even if you really want to change it, you cannot, because some of it is burned into the hardware, unless you buy new hardware and write the program onto it yourself.

  • Logical address: 16-bit segment address : 16-bit offset address.
    A logical address belongs to the continuous space that each program (process) has. In other words, what we programmers see is a run of continuous addresses, while the real memory may be small or not continuous at all. Anyone who has studied computer organization or operating systems knows that real physical addresses are discrete. The point of the scheme is that the computer maps logical addresses to physical addresses for us, so we only deal with the side that is simple to work with, the continuous one. This gives us a lot of flexibility and also adds a layer of protection for the computer.
    In the 8086 the logical address has two parts:
    • Segment address: a program can be divided into segments. Each segment represents an address space, and the space one segment can cover is 64 KB (explained later).
    • Offset address: the offset of a location relative to the start of its segment.

Once physical addresses have been covered, you will see why one segment covers 64 KB.

  • Physical address: the real 20-bit physical address.
  • Converting a logical address to a physical address:
    16-bit segment address × 16 (decimal) + offset address = physical address
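The conversion formula can be tried out with a short sketch. The function name `physical_address` is made up for this example, and the final mask reflects an assumption not stated above: with only 20 address lines, a sum that overflows 20 bits wraps around on the 8086.

```python
def physical_address(segment, offset):
    """segment and offset are 16-bit values; the result is a 20-bit address."""
    # Multiplying by 16 is the same as shifting left by 4 binary digits.
    return ((segment << 4) + offset) & 0xFFFFF   # keep 20 address lines

print(hex(physical_address(0x1234, 0x5678)))  # 0x179b8
print(hex(physical_address(0x1000, 0x79B8)))  # 0x179b8
```

The two calls show that different segment:offset pairs can name the same physical byte, because segments overlap every 16 bytes.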

Let's use computer organization and operating system knowledge to explain why. If you don't understand it, just memorize the formula:
First, as explained above, logical addresses exist for programming convenience and computer security. Second, the formula comes from the 8086 architecture itself. A physical address has 20 bits, which means the CPU's address bus in the 8086 is 20 bits wide, the same as the number of bits in a physical address. (Possible question: what if the address needs more than 16 bits? Can't I send it in two parts? The answer is no; in the 8086 the physical address has to be sent out in one go.) Back to the topic: since the address bus is 20 bits wide, multiplying the segment address by 16 means shifting it left by 4 binary digits. In other words, the hexadecimal segment address is shifted left by one digit, because four binary digits correspond to one hexadecimal digit. This turns the 16-bit segment address into a 20-bit value, which is the starting address of the segment. We then add the offset address to get the address we really want. That concludes the explanation; the pictures below should make it clearer.
Insert image description here
Insert image description here

When a word takes two memory accesses, we still use the low address of the word to find its first byte. Accessing memory twice does not mean that we need two addresses to access the word.
@@@@@@@@@@@@@@@@@@@@@
The following is an understanding of segment addresses from our teacher’s ppt:
Insert image description here

  • Back to the earlier question: why can a segment hold 64 KB?
    If we take the first offset in a segment to be 0, then the largest offset is all Fs, so the offsets range over 0000H~FFFFH, which is 64K addresses.
    The addressing unit in the 8086 is the byte,
    so a segment holds 64K × 1 B = 64 KB in total.
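The 64 KB figure can be checked with a few lines. The helper name `segment_range` and the example segment 2000H are arbitrary choices for illustration; the sketch ignores wrap-around at the top of the 1 MB address space.

```python
def segment_range(segment):
    """Return (first address, last address, size in bytes) of a segment."""
    first = segment << 4        # offset 0000H
    last = first + 0xFFFF       # offset FFFFH
    return first, last, last - first + 1

first, last, size = segment_range(0x2000)
print(f"{first:05X}H..{last:05X}H, {size} bytes = {size // 1024} KB")
# 20000H..2FFFFH, 65536 bytes = 64 KB
```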

register

There are two important components in the CPU: the ALU and the controller. The ALU performs calculations. The controller receives instructions, interprets them, and sends out the signals that carry out the corresponding operations. Then there are the registers, and the 8086 has far fewer of them than modern CPUs.

Data register:

  • Registers such as AX, BX, CX, and DX
    What you need to know is that each of them can be split into two halves, a high-order half and a low-order half (this is for compatibility with earlier 8-bit registers), which is also very convenient in assembly. Example: AX can be split into AH and AL, where H means high and L means low. Similarly, BX can be split into BH and BL, and so on.

Pointer register:

  • Registers such as SP, BP, SI, and DI
    These are generally used for offset addresses, but they can also hold data. However, SP must not be used to hold data: it is the stack pointer, and in the programs we write SP should only ever hold the stack pointer address. BP can hold data, but we generally don't use it that way. It usually holds the pointer to the bottom of the stack, and BP can be read as "base pointer". SI and DI are generally used as offset addresses, with S for the source address and D for the destination address. (If you really have to store data in one of these, use only SI and DI.)

segment register

  • CS, DS, SS, ES
    CS is the code segment register: it holds the segment address of the instruction we want to execute and is used together with the IP register. CS:IP together form a logical address, and that address is the address of the next instruction to execute (why the next one? Because while an instruction executes, the program counter has already advanced to the following instruction; how far it advances depends on the size of the instruction). DS is the data segment register: data is stored in this segment. SS is the stack segment register: this segment is used to store stack data. ES is the extra segment register; it can hold temporary data, much like a temp variable in our programs.
    All of the segment registers above can be changed from assembly code (CS is changed by far jumps, calls and returns rather than by mov). I don't understand why people keep saying that segment registers cannot be modified. In fact, a well-trained assembly programmer who is not writing a virus is simply very careful whenever they modify one.

control register

  • IP, FLAGS
    IP holds the offset part of the instruction address. FLAGS is the flag register. As the name says, it is used for marking; what makes it different is that each individual bit in the register, 0 or 1, represents one state. For example, if the sign bit is 1 the result is negative, and if it is 0 the result is not negative.
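To make "each bit represents one state" concrete, here is a hedged sketch of how three flags could be derived from a 16-bit addition. The function `add16` is made up for illustration. The bit positions (CF at bit 0, ZF at bit 6, SF at bit 7) are the 8086's; only the sign bit is mentioned in the text above, and the other two flags are included as extra examples.

```python
# Bit positions inside the FLAGS register.
CF, ZF, SF = 1 << 0, 1 << 6, 1 << 7   # carry, zero, sign

def add16(a, b):
    """Add two 16-bit values; return (result, flags) with CF, ZF, SF only."""
    total = a + b
    result = total & 0xFFFF
    flags = 0
    if total > 0xFFFF:          # carry out of bit 15
        flags |= CF
    if result == 0:
        flags |= ZF
    if result & 0x8000:         # top bit set: negative when read as signed
        flags |= SF
    return result, flags

print(add16(0x7FFF, 1))   # (32768, 128): SF set
print(add16(0xFFFF, 1))   # (0, 65): CF and ZF set
```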

In the 8086 architecture, only the data registers AX, BX, CX, and DX can be split into high-order and low-order registers. The high and low halves do not interfere with each other when operated on separately, because they are essentially two 8-bit registers; for compatibility, the two are combined into one 16-bit register such as AX.
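The independence of the two halves can be modelled with a tiny class. `Reg16` and its attribute names `x`, `h`, `l` are invented for this sketch; they mirror AX, AH, and AL.

```python
class Reg16:
    """A 16-bit register whose high and low bytes can be accessed separately."""

    def __init__(self, value=0):
        self.x = value & 0xFFFF           # the full register, e.g. AX

    @property
    def h(self):                          # high byte, e.g. AH
        return self.x >> 8

    @h.setter
    def h(self, value):
        self.x = ((value & 0xFF) << 8) | (self.x & 0x00FF)

    @property
    def l(self):                          # low byte, e.g. AL
        return self.x & 0xFF

    @l.setter
    def l(self, value):
        self.x = (self.x & 0xFF00) | (value & 0xFF)

ax = Reg16(0x1234)
ax.l = 0xFF          # like: mov al, 0FFh  (AH is untouched)
print(hex(ax.x))     # 0x12ff
ax.h = 0xAB          # like: mov ah, 0ABh  (AL is untouched)
print(hex(ax.x))     # 0xabff
```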

Data register usage details

  • AX generally acts like a general-purpose variable and appears frequently in code. Calculation results and the like are usually stored here.
  • CX generally holds a loop count and is used together with several different instructions. For example, the number of iterations of the loop instruction is determined by CX.
  • DX helps hold data. For example, when a calculation produces a result with more digits than AX can hold, DX stores the high-order part and AX stores the low-order part.
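Two of these roles can be sketched in a few lines. The function names `mul16` and `loop_sum` are made up for this example; `mul16` models how a 16-bit by 16-bit unsigned multiply leaves its 32-bit product in DX:AX, and `loop_sum` models a body that is repeated while the loop instruction decrements CX.

```python
def mul16(ax, operand):
    """Unsigned 16x16 multiply; returns (DX, AX) = (high word, low word)."""
    product = (ax & 0xFFFF) * (operand & 0xFFFF)
    return product >> 16, product & 0xFFFF

def loop_sum(cx):
    """Add CX to AX on every pass; 'loop' decrements CX and repeats until 0."""
    ax = 0
    while True:
        ax = (ax + cx) & 0xFFFF      # loop body
        cx = (cx - 1) & 0xFFFF       # loop: decrement CX...
        if cx == 0:                  # ...and jump back only while CX != 0
            break
    return ax

dx, ax = mul16(0x1234, 0x1000)
print(hex(dx), hex(ax))   # 0x123 0x4000
print(loop_sum(4))        # 10
```

The product 1234H × 1000H is 1234000H, which does not fit in 16 bits, so the high part 0123H lands in DX and the low part 4000H stays in AX.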

Other knowledge point details

Stack

The stack is first-in, last-out. In the 8086, the top of the stack is at the low address and the bottom of the stack is at the high address. Explanation: the stack makes function calls much more convenient. With a stack we can pass several parameters, such as the parameters of a C function. Generally the rightmost parameter is pushed first, so when they are popped the leftmost parameter comes out first. This also matches how we think: the first parameter used is the first one from the left (a forced explanation).
Since the top of the stack is at the low address and the bottom is at the high address, when you set up a stack area in assembly remember to point SP at the high end before using it, because the stack fills from the high address toward the low address.
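The direction of growth is easy to see in a small model. `Stack8086` is a made-up class for illustration: push subtracts 2 from SP and then stores a word, pop loads a word and then adds 2 to SP. It does no bounds checking, just as the real hardware does not.

```python
class Stack8086:
    def __init__(self, size=16):
        self.memory = bytearray(size)
        self.sp = size                 # empty stack: SP starts at the high end

    def push(self, word):
        self.sp -= 2                   # the stack grows toward low addresses
        self.memory[self.sp:self.sp + 2] = (word & 0xFFFF).to_bytes(2, "little")

    def pop(self):
        word = int.from_bytes(self.memory[self.sp:self.sp + 2], "little")
        self.sp += 2                   # the stack shrinks toward high addresses
        return word

s = Stack8086()
s.push(0x1111)
s.push(0x2222)
print(s.sp)            # 12
print(hex(s.pop()))    # 0x2222  (last in, first out)
print(hex(s.pop()))    # 0x1111
print(s.sp)            # 16
```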

flag register

Find out in this article

interrupt

There is some foreshadowing to clear up here. When we learned about int interrupts, one very annoying thing was that nobody told us that each interrupt number corresponds to a CS:IP entry, stored in segment 0, for its interrupt routine. Later you find out that you can overwrite that entry with the address of an interrupt routine you wrote yourself, and that the entry is in segment 0 too, but some teachers did not mention this, and the textbook did not call it out either.
Secondly, there is a free space at offsets 0200-02FF in segment 0. Data in this range is not overwritten while programs run, so you can load the interrupt routine you wrote into this space. Then change the CS:IP stored in segment 0 for the corresponding interrupt number to the entry address at 0200, and execution will transfer to your own interrupt routine.
The offsets in segment 0 start from 0, and every four bytes hold the CS:IP entry address of one interrupt number. For example, the entry address for int 0 is at offsets 0 to 3 of the segment. Execution jumps to the interrupt routine at whatever address is read from there, so once that address is changed, the jump no longer goes to the computer's default routine for that interrupt number. This is a big pitfall.

The summary of this passage is:

  • Starting from offset 0 of segment 0, each interrupt number uses two words (four bytes) to store the CS and IP of its interrupt routine, followed immediately by the entry for the next interrupt number. For example, offsets 0-3 hold the CS:IP for interrupt 0, and offsets 4-7 hold the CS:IP for interrupt 1.
  • The interrupt entry addresses live in segment 0. At the same time, offsets 0200-02FF in segment 0 can be treated as unused and never overwritten, so we can put a routine we wrote into this space and use it as an interrupt routine. (Code placed anywhere else is likely to be overwritten during execution, and an overwritten interrupt routine cannot run.)
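The table layout summarized above can be sketched over a simulated segment 0. All function names here are hypothetical. One detail is assumed beyond what the text states: within each four-byte entry the IP word comes first (lower address) and the CS word second, which is the 8086 layout.

```python
def vector_offset(number):
    """Offset in segment 0 of the 4-byte entry for an interrupt number."""
    return number * 4

def install_handler(memory, number, segment, offset):
    base = vector_offset(number)
    memory[base:base + 2] = offset.to_bytes(2, "little")        # IP word
    memory[base + 2:base + 4] = segment.to_bytes(2, "little")   # CS word

def read_vector(memory, number):
    base = vector_offset(number)
    ip = int.from_bytes(memory[base:base + 2], "little")
    cs = int.from_bytes(memory[base + 2:base + 4], "little")
    return cs, ip

segment0 = bytearray(0x400)   # simulated first 1 KB: 256 entries of 4 bytes
install_handler(segment0, 0x7C, 0x0000, 0x0200)   # point int 7ch at 0000:0200
print(hex(vector_offset(0x7C)))      # 0x1f0
print(read_vector(segment0, 0x7C))   # (0, 512)
```

Here interrupt 7ch is pointed at 0000:0200, the start of the free area described in the text; on a real machine the routine's code would also have to be copied to that address first.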

I can't help adding once more: remember that the space 0200-02FF does not overlap the interrupt entry addresses we use, and it is otherwise unused. For example, the entry address of interrupt number 21 (decimal) is stored in segment 0 at the position that corresponds to 21 (offsets 84-87, since each entry takes four bytes), and there are many other interrupt numbers, such as 7CH in hexadecimal. Always keep in mind that 0200-02FF, which is also in segment 0, is not used and will not be overwritten; this matters because if an interrupt routine or its entry address gets overwritten, there will be a big problem. (Of course, 21 can also be written in hexadecimal as 0x15.)


Summary: This chapter is about understanding how the computer executes assembly instructions, but only in general terms; for the specifics you have to study computer organization. A general understanding is enough here. Assembly can be learned without this knowledge, but learning it without knowing the principles is a bit painful, just like when I first learned C as a beginner. I hope I will still be able to understand these rough notes when I come back to review them...

Origin blog.csdn.net/weixin_60521036/article/details/135074932