Assembly language (6): Operands

 

0. Preface

The x86 instruction format is:

[label:] mnemonic [operands][ ;comment ]

An instruction can contain 0, 1, 2, or 3 operands. For clarity, labels and comments are omitted here:

mnemonic

mnemonic [destination]

mnemonic [destination] , [source]

mnemonic [destination] , [source-1] , [source-2]

There are 3 basic types of operands:

  • Immediate operand: uses a numeric literal expression

  • Register operand: uses a named register in the CPU

  • Memory operand: references a memory location

The following table describes the standard operand types. It uses simple operand symbols (for 32-bit mode) that have been adapted from the Intel manuals. This tutorial uses these symbols to describe the syntax of each instruction.

Operand     Description
reg8        8-bit general-purpose register: AH, AL, BH, BL, CH, CL, DH, DL
reg16       16-bit general-purpose register: AX, BX, CX, DX, SI, DI, SP, BP
reg32       32-bit general-purpose register: EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP
reg         Any general-purpose register
sreg        16-bit segment register: CS, DS, SS, ES, FS, GS
imm         8-bit, 16-bit, or 32-bit immediate value
imm8        8-bit immediate value (byte)
imm16       16-bit immediate value (word)
imm32       32-bit immediate value (doubleword)
reg/mem8    8-bit operand, which can be an 8-bit general-purpose register or a memory byte
reg/mem16   16-bit operand, which can be a 16-bit general-purpose register or a memory word
reg/mem32   32-bit operand, which can be a 32-bit general-purpose register or a memory doubleword
mem         8-bit, 16-bit, or 32-bit memory operand

A variable name refers to an offset within the data segment. An instruction that uses a variable name as an operand reads or writes the memory at that address; this is called a direct memory operand. There is also an alternative notation. Some programmers prefer to write direct operands as follows, because the brackets suggest a dereference operation:

mov al, [var1]

MASM allows this notation, so you can use it in your programs if you want. Because most programs (including those from Microsoft) are written without the brackets, this tutorial uses the bracketed notation only when an arithmetic expression is involved:

mov al, [var1 + 5]
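The two notations can be compared in a minimal sketch. It assumes 32-bit MASM on Windows; the ExitProcess boilerplate, the contents of var1, and the register choices are illustrative and not taken from the original.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
var1 BYTE 10h, 20h, 30h, 40h, 50h, 60h   ; assumed sample data

.code
main PROC
    mov al, var1          ; direct memory operand: AL = 10h
    mov al, [var1]        ; same instruction written with brackets: AL = 10h
    mov al, [var1 + 5]    ; arithmetic expression: byte at offset var1 + 5, AL = 60h
    INVOKE ExitProcess, 0
main ENDP
END main
```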

1. MOV instruction

The MOV instruction copies the source operand to the destination operand. As a data transfer instruction, it is used in almost all programs. In its basic format, the first operand is the destination operand, and the second operand is the source operand:

MOV destination,source

The contents of the destination operand change, but the source operand is left unchanged.

In almost all assembly language instructions, the operand on the left is the destination operand and the operand on the right is the source operand. MOV is very flexible in its use of operands, as long as the following rules are observed:

  • Both operands must be the same size.
  • The two operands cannot both be memory operands.
  • The instruction pointer register (IP, EIP, or RIP) cannot be used as the destination operand.

A single MOV instruction cannot transfer data directly from one memory location to another. Instead, the source value must first be moved into a register, and then the register is moved to the destination memory operand. When copying an integer constant to a variable or register, consider the minimum number of bytes the constant requires.

MOV also cannot copy a smaller operand directly into a larger one, but there is a workaround. Suppose you want to move count (unsigned, 16 bits) into ECX (32 bits): first set ECX to 0, then move count into CX.
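Both workarounds can be sketched as follows. This is a minimal example assuming 32-bit MASM on Windows; the ExitProcess boilerplate, the variables var1 and var2, and the initial values are hypothetical (only count appears in the text above).

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
var1  WORD 1234h          ; assumed source variable
var2  WORD ?              ; assumed destination variable
count WORD 1              ; unsigned 16-bit value

.code
main PROC
    ; memory-to-memory copy routed through a register
    mov ax, var1          ; AX = 1234h
    mov var2, ax          ; var2 = 1234h

    ; copy a 16-bit unsigned value into a 32-bit register
    mov ecx, 0            ; clear all 32 bits first
    mov cx, count         ; ECX = 00000001h
    INVOKE ExitProcess, 0
main ENDP
END main
```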

The MOVZX instruction (move with zero-extend) copies the source operand into the destination operand and zero-extends the value to 16 or 32 bits. This instruction is used only with unsigned integers and has three forms:

MOVZX reg32,reg/mem8
MOVZX reg32,reg/mem16
MOVZX reg16,reg/mem8

In the three forms, the first operand (register) is the destination operand, and the second operand is the source operand. Note that the source operand cannot be a constant.
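The three forms can be illustrated with a short sketch, assuming 32-bit MASM on Windows. The variable names, values, and ExitProcess boilerplate are illustrative assumptions.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
byteVal BYTE 8Fh          ; assumed sample values
wordVal WORD 0A69Bh

.code
main PROC
    movzx eax, byteVal    ; reg32, mem8:  EAX = 0000008Fh
    movzx ecx, wordVal    ; reg32, mem16: ECX = 0000A69Bh
    movzx dx, byteVal     ; reg16, mem8:  DX  = 008Fh
    mov   bl, 0FFh
    movzx eax, bl         ; reg32, reg8:  EAX = 000000FFh
    INVOKE ExitProcess, 0
main ENDP
END main
```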

The MOVSX instruction (move with sign-extend) copies the contents of the source operand into the destination operand and sign-extends the value to 16 or 32 bits. This instruction is used only with signed integers and has three forms:

MOVSX reg32, reg/mem8
MOVSX reg32, reg/mem16
MOVSX reg16, reg/mem8

When the operand is sign-extended, the highest bit of the smaller operand is repeated (copied) on all the extended bits of the destination operand.
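A brief sketch of sign extension, assuming 32-bit MASM on Windows. The register values and ExitProcess boilerplate are chosen for illustration only.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov   bx, 0A69Bh      ; high bit of BX is 1, high bit of BL (9Bh) is 1
    movsx eax, bx         ; EAX = FFFFA69Bh
    movsx edx, bl         ; EDX = FFFFFF9Bh
    movsx cx, bl          ; CX  = FF9Bh
    mov   bl, 7Fh         ; high bit of BL is 0
    movsx eax, bl         ; EAX = 0000007Fh
    INVOKE ExitProcess, 0
main ENDP
END main
```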

1. LAHF instruction

The LAHF (load status flags into AH) instruction copies the low byte of the EFLAGS register into AH. The flags copied are the sign, zero, auxiliary carry, parity, and carry flags. Using this instruction, you can easily save a copy of the flags in a variable.

2. SAHF instruction

The SAHF (store AH into status flags) instruction copies AH into the low byte of the EFLAGS (or RFLAGS) register.
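LAHF and SAHF are often paired to save and later restore the flags. The following is a minimal sketch assuming 32-bit MASM on Windows; the variable name saveflags and the ExitProcess boilerplate are assumptions.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
saveflags BYTE ?          ; assumed variable that holds the saved flags

.code
main PROC
    lahf                  ; AH = low byte of EFLAGS
    mov saveflags, ah     ; keep a copy in memory

    sub eax, eax          ; any instruction that changes the flags (sets ZF here)

    mov ah, saveflags     ; reload the saved copy
    sahf                  ; low byte of EFLAGS restored from AH
    INVOKE ExitProcess, 0
main ENDP
END main
```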

3. XCHG instruction

The XCHG (Exchange Data) instruction exchanges the contents of the two operands. The instruction has three forms:

XCHG reg, reg
XCHG reg, mem
XCHG mem, reg
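None of the forms listed takes two memory operands, so exchanging two variables needs a register as an intermediary. The sketch below assumes 32-bit MASM on Windows; val1, val2, and the ExitProcess boilerplate are illustrative.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
val1 WORD 1000h           ; assumed sample variables
val2 WORD 2000h

.code
main PROC
    mov  ax, 1111h
    mov  bx, 2222h
    xchg ax, bx           ; reg, reg: AX = 2222h, BX = 1111h

    ; swap val1 and val2 through AX
    mov  ax, val1         ; AX = 1000h
    xchg ax, val2         ; reg, mem: AX = 2000h, val2 = 1000h
    mov  val1, ax         ; val1 = 2000h
    INVOKE ExitProcess, 0
main ENDP
END main
```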

4. Direct offset operand

A variable name plus an offset forms a direct-offset operand. This gives access to memory locations that have no label of their own. In an array of 16-bit words, the offset of each element is 2 bytes greater than that of the previous one. Similarly, in a doubleword array, adding 4 to the offset of the first element points to the second element.
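The element sizes described above translate into offsets as in this sketch, which assumes 32-bit MASM on Windows. The array names, their contents, and the ExitProcess boilerplate are illustrative.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
arrayB BYTE  10h, 20h, 30h          ; assumed sample arrays
arrayW WORD  100h, 200h, 300h
arrayD DWORD 10000h, 20000h

.code
main PROC
    mov al,  [arrayB + 1]     ; second byte:       AL  = 20h
    mov bx,  [arrayW + 2]     ; second word:       BX  = 200h
    mov ecx, [arrayD + 4]     ; second doubleword: ECX = 20000h
    INVOKE ExitProcess, 0
main ENDP
END main
```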

5. INC and DEC instructions

The INC (increment) and DEC (decrement) instructions add 1 to and subtract 1 from a register or memory operand, respectively. The syntax is as follows:

INC reg/mem
DEC reg/mem

The overflow, sign, zero, auxiliary carry, and parity flags change according to the value of the destination operand. The INC and DEC instructions do not affect the carry flag, which may be surprising.
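The difference from ADD can be seen in a small sketch, assuming 32-bit MASM on Windows. The variable myWord, the register values, and the ExitProcess boilerplate are assumptions.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
myWord WORD 1000h         ; assumed sample variable

.code
main PROC
    inc myWord            ; myWord = 1001h
    mov bx, myWord
    dec bx                ; BX = 1000h

    clc                   ; CF = 0
    mov al, 0FFh
    inc al                ; AL = 00h, ZF = 1, CF is still 0
    mov al, 0FFh
    add al, 1             ; AL = 00h, ZF = 1, CF = 1
    INVOKE ExitProcess, 0
main ENDP
END main
```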

6. Arithmetic instructions

The ADD instruction adds a source operand to a destination operand of the same size. The syntax is as follows:

ADD dest,source

The SUB instruction subtracts the source operand from the destination operand. The syntax is as follows:

SUB dest, source

The NEG (negate) instruction reverses the sign of its operand by converting it to its two's complement. The following operands can be used with this instruction:

NEG reg
NEG mem

Tip: To find the two's complement of a number, invert every bit of the destination operand and add 1.

Flags: The carry, zero, sign, overflow, auxiliary carry, and parity flags change according to the value stored in the destination operand.

The CPU status flags are used to check the results of arithmetic operations. They also drive the conditional branch instructions, which are the basic tools of program logic. The following is a brief overview of the status flags:

  • The carry flag indicates unsigned integer overflow. For example, if an instruction has an 8-bit destination operand and produces a result larger than binary 1111 1111, the carry flag is set to 1.
  • The overflow flag indicates signed integer overflow. For example, if an instruction has a 16-bit destination operand and produces a negative result smaller than decimal -32,768, the overflow flag is set to 1.
  • The zero flag indicates that the result of an operation is 0. For example, if two equal operands are subtracted, the zero flag is set to 1.
  • The sign flag indicates that the result of an operation is negative. If the most significant bit (MSB) of the destination operand is 1, the sign flag is set to 1.
  • The parity flag indicates whether the least significant byte of the destination operand contains an even number of 1 bits immediately after an arithmetic or Boolean instruction has executed.
  • The auxiliary carry flag is set to 1 when there is a carry out of bit 3 of the least significant byte of the destination operand.
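The flag descriptions above can be traced through a few ADD, SUB, and NEG instructions. This is a minimal sketch assuming 32-bit MASM on Windows; the operand values and the ExitProcess boilerplate are chosen for illustration, and the comments list only the flags relevant to each case.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov al, 0FFh
    add al, 1             ; AL = 00h: CF = 1 (unsigned overflow), ZF = 1, OF = 0

    mov al, 7Fh
    add al, 1             ; AL = 80h: OF = 1 (+127 + 1 does not fit), SF = 1, CF = 0

    mov al, 1
    sub al, 2             ; AL = FFh: CF = 1 (borrow), SF = 1, OF = 0

    mov ecx, 5
    sub ecx, 5            ; ECX = 0: ZF = 1

    mov al, 0Fh
    add al, 1             ; AL = 10h: auxiliary carry = 1 (carry out of bit 3)

    mov eax, 5
    neg eax               ; EAX = FFFFFFFBh (-5): SF = 1, CF = 1
    INVOKE ExitProcess, 0
main ENDP
END main
```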

7. OFFSET operator

The OFFSET operator returns the offset of the data label. This offset is calculated in bytes, and represents the distance between the data label and the start address of the data segment.
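As an illustration, the sketch below loads the offsets of four consecutive variables. It assumes 32-bit MASM on Windows; the variable names and the ExitProcess boilerplate are hypothetical, and the comments give offsets relative to bVal because the actual addresses depend on where the data segment is loaded.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
bVal  BYTE  ?             ; assumed variables, declared back to back
wVal  WORD  ?
dVal  DWORD ?
dVal2 DWORD ?

.code
main PROC
    mov esi, OFFSET bVal      ; ESI = offset of bVal
    mov esi, OFFSET wVal      ; ESI = offset of bVal + 1
    mov esi, OFFSET dVal      ; ESI = offset of bVal + 3
    mov esi, OFFSET dVal2     ; ESI = offset of bVal + 7
    INVOKE ExitProcess, 0
main ENDP
END main
```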

8. ALIGN directive

The ALIGN directive aligns a variable to a byte boundary, word boundary, double word boundary, or paragraph boundary. The syntax is as follows:

ALIGN bound
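A sketch of how ALIGN inserts padding, assuming 32-bit MASM on Windows. The variable names and ExitProcess boilerplate are illustrative, and the offsets in the comments assume that bVal sits at the start of a 4-byte-aligned data segment.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
bVal  BYTE  ?             ; offset 0
ALIGN 2
wVal  WORD  ?             ; offset 2 (1 padding byte skipped)
bVal2 BYTE  ?             ; offset 4
ALIGN 4
dVal  DWORD ?             ; offset 8 (3 padding bytes skipped)
dVal2 DWORD ?             ; offset 12

.code
main PROC
    mov eax, dVal         ; doubleword read from an aligned address
    INVOKE ExitProcess, 0
main ENDP
END main
```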

9. PTR operator

The PTR operator overrides the declared size of an operand. It is required whenever you access an operand using a size attribute different from the one the assembler assumes for it.
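For example, reading part of a doubleword, or reading two words as one doubleword, might look like the following sketch. It assumes 32-bit MASM on Windows; myDouble, wordList, and the ExitProcess boilerplate are illustrative. The results reflect the little-endian storage order used by x86.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
myDouble DWORD 12345678h          ; assumed sample data
wordList WORD  5678h, 1234h

.code
main PROC
    ; mov ax, myDouble            ; rejected by the assembler: operand sizes differ
    mov ax, WORD PTR myDouble          ; low word:   AX = 5678h
    mov dx, WORD PTR [myDouble + 2]    ; high word:  DX = 1234h
    mov bl, BYTE PTR myDouble          ; low byte:   BL = 78h
    mov ecx, DWORD PTR wordList        ; two words read as one: ECX = 12345678h
    INVOKE ExitProcess, 0
main ENDP
END main
```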

10. TYPE operator

The TYPE operator returns the size of a single element of the variable, which is calculated in bytes.
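A short sketch of TYPE applied to variables of different sizes, assuming 32-bit MASM on Windows. The variable names and ExitProcess boilerplate are hypothetical.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
var1 BYTE  ?              ; assumed variables
var2 WORD  ?
var3 DWORD ?
var4 QWORD ?

.code
main PROC
    mov eax, TYPE var1    ; EAX = 1
    mov eax, TYPE var2    ; EAX = 2
    mov eax, TYPE var3    ; EAX = 4
    mov eax, TYPE var4    ; EAX = 8
    INVOKE ExitProcess, 0
main ENDP
END main
```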

11. LENGTHOF operator

The LENGTHOF operator counts the number of elements in an array, as defined by the values that appear on the same line as the array's label. If the array definition contains nested DUP operators, LENGTHOF returns the product of the two counts. If the array definition spans more than one source line, LENGTHOF counts only the data defined on the first line.
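The three cases described above can be sketched as follows, assuming 32-bit MASM on Windows. The array names, their contents, and the ExitProcess boilerplate are illustrative.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
array1  WORD 30 DUP(?), 0, 0          ; 30 + 2 elements
array2  WORD 5 DUP(3 DUP(?))          ; nested DUP: 5 * 3 elements
myArray BYTE 10, 20, 30, 40, 50       ; only this first line is counted
        BYTE 60, 70, 80, 90, 100

.code
main PROC
    mov eax, LENGTHOF array1      ; EAX = 32
    mov ebx, LENGTHOF array2      ; EBX = 15
    mov ecx, LENGTHOF myArray     ; ECX = 5
    INVOKE ExitProcess, 0
main ENDP
END main
```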

12. LABEL directive

The LABEL directive inserts a label and gives it a size attribute without allocating any storage. All standard size attributes can be used with LABEL, such as BYTE, WORD, DWORD, QWORD, or TBYTE. A common use of LABEL is to provide an alternative name and size attribute for the variable declared next in the data segment.
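The common use mentioned above might look like this sketch, which assumes 32-bit MASM on Windows. The names val16 and val32 and the ExitProcess boilerplate are illustrative.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
val16 LABEL WORD          ; no storage: a 16-bit name for the address that follows
val32 DWORD 12345678h

.code
main PROC
    mov ax, val16         ; low word of val32:  AX = 5678h
    mov dx, [val16 + 2]   ; high word of val32: DX = 1234h
    INVOKE ExitProcess, 0
main ENDP
END main
```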

13. Indirect addressing

Direct addressing is rarely used for array processing, because it is impractical to address many array elements with constant offsets. Instead, a register is used as a pointer (this is called indirect addressing) and the value of the register is modified as needed. An operand that uses indirect addressing is called an indirect operand.

Any 32-bit general-purpose register (EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP) enclosed in brackets forms an indirect operand. The register is assumed to contain the address of the data. If the destination operand is an indirect operand, the new value is stored in the memory location addressed by the register. An indexed operand adds a constant to a register to generate an effective address.
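Indirect and indexed operands can be sketched as follows, assuming 32-bit MASM on Windows. The array names, their contents, and the ExitProcess boilerplate are assumptions. When the destination is an indirect operand and the source is a constant, the operand size is ambiguous, so the example states it with BYTE PTR.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
arrayB BYTE  10h, 20h, 30h        ; assumed sample arrays
arrayD DWORD 10000h, 20000h

.code
main PROC
    mov esi, OFFSET arrayB
    mov al, [esi]                 ; indirect source: AL = 10h
    inc esi                       ; point to the next byte
    mov al, [esi]                 ; AL = 20h
    mov BYTE PTR [esi], 99h       ; indirect destination: second byte of arrayB = 99h

    mov esi, OFFSET arrayD
    mov eax, [esi + 4]            ; indexed operand: EAX = 20000h
    INVOKE ExitProcess, 0
main ENDP
END main
```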

A variable that contains the address of another variable is called a pointer. Pointers are an important tool for manipulating arrays and data structures, because the address they hold can be modified at runtime. For example, you can use a system call to allocate (reserve) a block of memory and then save the address of that block in a variable.
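A pointer variable can be sketched as below, assuming 32-bit MASM on Windows. The names arrayD and ptrD and the ExitProcess boilerplate are hypothetical; the pointer here is initialized at assembly time rather than by a system call.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
arrayD DWORD 10000h, 20000h       ; assumed sample array
ptrD   DWORD OFFSET arrayD        ; pointer variable holding the address of arrayD

.code
main PROC
    mov esi, ptrD                 ; ESI = address of the first element
    mov eax, [esi]                ; EAX = 10000h

    add ptrD, TYPE arrayD         ; modify the pointer at runtime: next element
    mov esi, ptrD
    mov eax, [esi]                ; EAX = 20000h
    INVOKE ExitProcess, 0
main ENDP
END main
```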

14. JMP and LOOP (transfer) instructions

Assembly language programs use conditional instructions to implement high-level constructs such as IF statements and loops. Each conditional instruction involves a possible transfer (jump) to a different memory address. A transfer of control, or branch, is a way of changing the order in which statements are executed. There are two basic types:

  • Unconditional transfer: control is transferred to a new address in all cases. The new address is loaded into the instruction pointer register so that execution continues at the new address. The JMP instruction performs this kind of transfer.
  • Conditional transfer: the program branches when a certain condition is met. A variety of conditional transfer instructions can be combined to build conditional logic structures. The CPU interprets true/false conditions based on the contents of ECX and the flags register.

The JMP instruction unconditionally jumps to the target address, which is identified by a code label and converted into an offset by the assembler. The syntax is as follows:

JMP destination

When the CPU performs an unconditional transfer, the offset of the target address is moved into the instruction pointer register, and execution continues from the new address.
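A minimal sketch of an unconditional jump over one instruction, assuming 32-bit MASM on Windows. The label name skip, the register values, and the ExitProcess boilerplate are illustrative.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov eax, 0
    jmp skip              ; control always transfers to the label skip
    mov eax, 0FFh         ; never executed
skip:
    inc eax               ; EAX = 1
    INVOKE ExitProcess, 0
main ENDP
END main
```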

The LOOP instruction, formally known as Loop According to ECX Counter, repeats a block of statements a specific number of times. ECX is automatically used as the counter and is decremented by 1 each time the loop repeats. The syntax is as follows:

LOOP destination

The loop target must be within -128 to +127 bytes of the current location counter. The LOOP instruction executes in two steps:

  • In the first step, ECX is reduced by 1.
  • The second step is to compare ECX with 0.

If ECX is not equal to 0, a jump is taken to the label given by the destination. Otherwise, if ECX equals 0, no jump takes place and control passes to the instruction following the loop. In real-address mode, CX is the default loop counter for the LOOP instruction. The LOOPD instruction, on the other hand, always uses ECX as the loop counter, and the LOOPW instruction always uses CX.
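A typical use of LOOP is to sum an array, as in the sketch below. It assumes 32-bit MASM on Windows; the array intArray, the label L1, and the ExitProcess boilerplate are illustrative.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
intArray WORD 100h, 200h, 300h, 400h     ; assumed sample data

.code
main PROC
    mov edi, OFFSET intArray      ; EDI points to the first element
    mov ecx, LENGTHOF intArray    ; loop counter: ECX = 4
    mov ax, 0                     ; running sum
L1:
    add ax, [edi]                 ; add the current element
    add edi, TYPE intArray        ; advance to the next word
    loop L1                       ; decrement ECX, jump to L1 while ECX is not 0
    ; here AX = 0A00h and ECX = 0
    INVOKE ExitProcess, 0
main ENDP
END main
```

Because LOOP decrements ECX before comparing it with 0, entering the loop with ECX already 0 wraps the counter around to FFFFFFFFh and the body repeats about 4 billion times, so the counter should be initialized to a nonzero value.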

Origin blog.csdn.net/qq_35789421/article/details/113736373