Assembly language fundamentals

Assembly

The work of the CPU can be divided into three stages:

  1. Instruction fetch
  2. Instruction decode
  3. Execution (compute, read and write memory, set registers, jump)

Fetching instructions and reading or writing memory both require the CPU to control other hardware, such as memory and the graphics card.

Registers

Registers in a 32-bit CPU

  • 4 data registers (EAX, EBX, ECX, and EDX)
  • 2 index registers (ESI and EDI)
  • 2 pointer registers (ESP and EBP)
  • 6 segment registers (ES, CS, SS, DS, FS, and GS)
  • 1 instruction pointer register (EIP)
  • 1 flag register (EFLAGS)

Data sizes: BYTE (1 byte), WORD (2 bytes), DWORD (4 bytes), QWORD (8 bytes)

32-bit general-purpose registers

Register  Index  Value range    Description
EAX       0      0-0xFFFFFFFF   Data register (accumulator)
ECX       1      0-0xFFFFFFFF   Data register (counter)
EDX       2      0-0xFFFFFFFF   Data register (data)
EBX       3      0-0xFFFFFFFF   Data register (base address)
ESP       4      0-0xFFFFFFFF   Pointer register (points to the top of the stack)
EBP       5      0-0xFFFFFFFF   Pointer register (points to the bottom of the stack)
ESI       6      0-0xFFFFFFFF   Index register (source index: offset of a storage unit within a segment)
EDI       7      0-0xFFFFFFFF   Index register (destination index: offset of a storage unit within a segment)

32-bit  16-bit                      8-bit
EAX     AX (lower 16 bits of EAX)   AH, AL
ECX     CX (lower 16 bits of ECX)   CH, CL
EDX     DX (lower 16 bits of EDX)   DH, DL
EBX     BX (lower 16 bits of EBX)   BH, BL
ESP     SP (lower 16 bits of ESP)   -
EBP     BP (lower 16 bits of EBP)   -
ESI     SI (lower 16 bits of ESI)   -
EDI     DI (lower 16 bits of EDI)   -
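
The way AX, AH, and AL overlap with EAX can be sketched with a little bit arithmetic. This is only a Python model of the layout in the table above; the function names `split_register` and `write_al` are made up for illustration.

```python
def split_register(eax: int) -> dict:
    """Return the sub-register views of a 32-bit value, as for EAX/AX/AH/AL."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF            # lower 16 bits of EAX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # bits 8-15
        "AL": ax & 0xFF,         # bits 0-7
    }


def write_al(eax: int, value: int) -> int:
    """Model `mov al, value`: only the lowest byte of EAX changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


views = split_register(0x12345678)
print({name: hex(v) for name, v in views.items()})
print(hex(write_al(0x12345678, 0xFF)))
```

Writing to AL leaves the upper 24 bits of EAX untouched, which is why the sketch masks with 0xFFFFFF00 before merging the new byte.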

Segment registers

Segment register  Description
CS                Code segment
DS                Data segment
SS                Stack segment
ES                Extra segment register
FS                Extra segment register
GS                Extra segment register

Instruction Pointer Register (EIP)

EIP: holds the address of the next instruction the program is about to execute

  1. It cannot be written directly (for example with mov)
  2. To change it, use a control-transfer instruction such as jmp, jcc, call, or retn

Flag Register (EFLAGS)

EFLAGS (bits 0-15; bit positions not listed are not covered here):

Bit  Flag  Meaning
11   OF    Overflow
10   DF    Direction
9    IF    Interrupt enable
8    TF    Trap
7    SF    Sign
6    ZF    Zero
4    AF    Auxiliary carry
2    PF    Parity
0    CF    Carry

Note that the flags below are changed by arithmetic and logical instructions. Data-transfer instructions such as mov do not count and leave the flags unchanged.

  • CF (Carry Flag): set to 1 when the operation produces a carry out of (or a borrow into) the most significant bit of the result, otherwise 0

mov al, 0x80; add al, 0x80 ; CF is set to 1 (0x80 + 0x80 = 0x100 does not fit in 8 bits)
mov al, 0x80; add ax, 0x80 ; CF is not set, assuming AH is 0 (0x0080 + 0x80 = 0x0100 fits in 16 bits)

  • OF (Overflow Flag): set to 1 when the result overflows, otherwise 0. Note: overflow is defined for signed numbers. For example:

mov al, 0x80; sub al, 0x1 ; overflows, because AL is an 8-bit register whose signed range is -128 to 127.
0x80 is -128, and subtracting 1 gives -129, which is out of range.
How to tell: compare the sign bits. For addition, OF=1 when both operands have the same sign and the result has the opposite sign. For subtraction, OF=1 when the operands have different signs and the sign of the result differs from the sign of the first operand. A change in the highest bit alone is not enough: 0xFF + 0x01 changes the highest bit, but -1 + 1 = 0 does not overflow.

  • ZF (Zero Flag): if the result of the operation is 0, ZF=1, otherwise ZF=0
  • SF (Sign Flag): if the sign bit of the result is 1, SF=1, otherwise SF=0 (the most significant bit of a signed number is its sign)
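
The CF and OF examples above can be checked with a small model. The following is a minimal Python sketch, assuming the usual definitions of the four flags; `add` and `sub` are illustrative helper names, and the sketch models only these flags, not AF or PF.

```python
def add(a: int, b: int, bits: int = 8) -> dict:
    """Model `add a, b` on a register of the given width and return the flags."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b
    result = full & mask
    return {
        "result": result,
        "CF": int(full > mask),                                # carry out of the top bit
        "OF": int(((a ^ result) & (b ^ result) & sign) != 0),  # operands share a sign, result does not
        "ZF": int(result == 0),
        "SF": int((result & sign) != 0),
    }


def sub(a: int, b: int, bits: int = 8) -> dict:
    """Model `sub a, b` on a register of the given width and return the flags."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "result": result,
        "CF": int(a < b),                                      # borrow
        "OF": int(((a ^ b) & (a ^ result) & sign) != 0),       # signs differ and result sign differs from a
        "ZF": int(result == 0),
        "SF": int((result & sign) != 0),
    }


print(add(0x80, 0x80))               # mov al, 0x80; add al, 0x80
print(add(0x0080, 0x80, bits=16))    # add ax, 0x80 with AH = 0
print(sub(0x80, 0x01))               # mov al, 0x80; sub al, 1
print(add(0xFF, 0x01))               # -1 + 1: carry, but no signed overflow
```

The last call is the interesting one: the top bit of the result changes and CF is set, yet OF stays 0 because -1 + 1 = 0 is within the signed range.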

Assembly instructions

Assembly instruction sets

  • X64 instruction set
    • X64 programs use the X64 instruction set. Three 64-bit instruction sets are usually mentioned: AMD64, EM64T, and IA-64.
    • AMD64 was the first 64-bit extension of x86. IA-64, launched jointly by Intel and HP, is a separate architecture that is not compatible with x86 and saw little adoption.
    • Intel later adopted AMD's instruction set and launched it as IA-32E, which was renamed EM64T and is now called Intel 64.
    • AMD64 and Intel 64 are nearly identical, so they are collectively called the X64 instruction set.
  • Differences between X64 and X86:
    • The X86 registers are extended to 64 bits in X64, and the leading E of each name becomes R. The lower parts can still be used: RAX contains EAX, AX, AL, and AH
    • EIP becomes RIP
    • EFLAGS becomes RFLAGS
    • X64 adds 8 more 64-bit general-purpose registers: R8-R15. Each can also be accessed as 32, 16, or 8 bits, for example R8D, R8W, R8B
  • Changes in X64 programs:
    • The separate function calling conventions of 32-bit programs are no longer used
    • x64 applications have only one calling convention, similar to fastcall. The first 4 parameters are passed in registers, which are fixed: RCX for the first, RDX for the second, R8 for the third, and R9 for the fourth. Any further parameters are placed on the stack from right to left, and the caller balances the stack.
    • When an x64 application calls a function, parameters are typically found in the region between rsp and rbp, and local variables are addressed as rbp+xxx. In a 32-bit program, parameters are addressed as ebp+xxx, and local variables lie in the region between esp and ebp.

An assembly instruction consists of an opcode and operands. The operand forms are:

Operand  Description
r8       8-bit register
r16      16-bit register
r32      32-bit register
reg      Any register
sreg     Segment register
m8       8-bit memory location
m16      16-bit memory location
m32      32-bit memory location
mem      Any memory location
i8       8-bit immediate
i16      16-bit immediate
i32      32-bit immediate
imm      Immediate of any size

Data transfer instructions

  • mov

mov a,b: copies b into a

  1. a cannot be an immediate value
  2. a and b must have the same width
  3. a and b cannot both be memory operands
  4. Be careful with addresses: do not access memory that the program is not allowed to access

Examples:

  • mov eax, 0x123 : immediate value
  • mov eax, ecx : register
  • mov bh, al
  • mov eax, dword ptr ds:[0x12121212] : memory ([] denotes an address; dword means 4 bytes are read from memory)
  • mov ax, word ptr ds:[0x12121212]
  • mov word ptr ds:[0x12121212], ax
  • movzx

movzx a,b: copies b into a with zero extension

  1. a must be a register; b can be a register or memory. Neither a nor b can be an immediate value
  2. b must be narrower than a
  3. The extra high bits of a are filled with 0

Examples:

  • movzx eax, ax
  • movsx

movsx a,b: copies b into a with sign extension

  1. a must be a register; b can be a register or memory. Neither a nor b can be an immediate value
  2. b must be narrower than a
  3. The extra high bits of a are filled with the sign bit of b

Examples:

  • movsx eax, ax
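
The difference between the two extensions shows up only when the sign bit of the source is set. Here is a minimal Python sketch of both; the helper names mirror the mnemonics but are otherwise illustrative, and a 32-bit destination is assumed by default.

```python
def movzx(value: int, src_bits: int) -> int:
    """Zero-extend: keep the low src_bits, the upper bits become 0."""
    return value & ((1 << src_bits) - 1)


def movsx(value: int, src_bits: int, dst_bits: int = 32) -> int:
    """Sign-extend: copy the source sign bit into all upper bits."""
    src_mask = (1 << src_bits) - 1
    dst_mask = (1 << dst_bits) - 1
    value &= src_mask
    if value & (1 << (src_bits - 1)):      # sign bit of the source is set
        value |= dst_mask & ~src_mask      # fill the upper bits with 1
    return value


ax = 0xFFFE                                # -2 as a signed 16-bit value
print(hex(movzx(ax, 16)))                  # models movzx eax, ax
print(hex(movsx(ax, 16)))                  # models movsx eax, ax
print(hex(movsx(0x7FFF, 16)))              # positive source: same as movzx
```

Use movzx for unsigned data and movsx for signed data, so that the widened register holds the same numeric value as the source.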
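  • lea

lea a,b: computes the address described by b and stores that address in a, without reading memory

  1. a must be a register and b must be a memory operand; neither can be an immediate value
  2. Because lea does not access memory, the size keyword on b (dword, word) does not affect the result; only the width of a matters

Examples:

  • lea ecx, dword ptr ds:[eax] : copies the value of eax into ecx (eax is not dereferenced as an address); equivalent to mov ecx, eax
  • lea cx, word ptr ds:[edx] : stores the low 16 bits of edx in cx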

Arithmetic instructions

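  • add

add a,b: a = a + b

  1. a cannot be an immediate value
  2. a and b must have the same width. If b is an immediate with fewer bits than a, its high bits are filled with 0; if b is wider than a, it is an error
  3. a and b cannot both be memory operands
  4. Be careful with addresses: do not access memory that the program is not allowed to access

Examples:

  • add ecx, 0x12345678
  • add eax, ecx
  • add dword ptr ds:[0x12121212], eax
  • sub

sub a,b: a = a - b

  1. a cannot be an immediate value
  2. a and b must have the same width. If b is an immediate with fewer bits than a, its high bits are filled with 0; if b is wider than a, it is an error
  3. a and b cannot both be memory operands
  4. Be careful with addresses: do not access memory that the program is not allowed to access

Examples:

  • sub ecx, 0x12345678
  • sub eax, ecx
  • sub dword ptr ds:[0x12121212], eax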

Bitwise instructions

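  • and

and a,b: a = a AND b

  1. a cannot be an immediate value
  2. a and b must have the same width. If b is an immediate with fewer bits than a, its high bits are filled with 0; if b is wider than a, it is an error
  3. a and b cannot both be memory operands
  4. Be careful with addresses: do not access memory that the program is not allowed to access

Examples:

  • and ecx, 0x12345678
  • and eax, ecx
  • and dword ptr ds:[0x12121212], eax
  • or

or a,b: a = a OR b

  1. a cannot be an immediate value
  2. a and b must have the same width. If b is an immediate with fewer bits than a, its high bits are filled with 0; if b is wider than a, it is an error
  3. a and b cannot both be memory operands
  4. Be careful with addresses: do not access memory that the program is not allowed to access

Examples:

  • or ecx, 0x12345678
  • or eax, ecx
  • or dword ptr ds:[0x12121212], eax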
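  • xor

xor a,b: a = a XOR b

  1. a cannot be an immediate value
  2. a and b must have the same width. If b is an immediate with fewer bits than a, its high bits are filled with 0; if b is wider than a, it is an error
  3. a and b cannot both be memory operands
  4. Be careful with addresses: do not access memory that the program is not allowed to access

Uses:

  1. Commonly used to clear a register: xor eax, eax
  2. Commonly used for simple encryption and decryption: xor eax, 0x51515151; xor eax, 0x51515151; XORing eax with the same value twice restores its original value.

Examples:

  • xor ecx, 0x12345678
  • xor eax, ecx
  • xor dword ptr ds:[0x12121212], eax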
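  • not

not a: inverts every bit of a

  1. a cannot be an immediate value
  2. Be careful with addresses: do not access memory that the program is not allowed to access

Examples:

  • not eax
  • not word ptr ds:[0x12121212]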
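
The two xor uses mentioned in this section (clearing a register and reversible encryption) both follow from the same property: XORing with the same value twice cancels out. A minimal Python sketch, reusing the constant 0x51515151 from the xor example; `xor32` is an illustrative name for a 32-bit xor.

```python
KEY = 0x51515151  # the constant used in the xor example in this section


def xor32(value: int, key: int = KEY) -> int:
    """Model `xor value, key` on a 32-bit register."""
    return (value ^ key) & 0xFFFFFFFF


original = 0x12345678
encrypted = xor32(original)       # first xor: scrambles the value
decrypted = xor32(encrypted)      # second xor with the same key: restores it
print(hex(encrypted), hex(decrypted))

print(hex(xor32(original, original)))  # xor eax, eax: any value XOR itself is 0
```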

Comparison instructions: cmp and test

JCC instructions

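CALL and RETN instructions

call instruction:

Like JMP, CALL changes the value of EIP. It takes one operand, which can be a register, a memory address, or an immediate value.

  • Difference between CALL and JMP:
    Besides changing the EIP register, CALL also changes ESP. It is equivalent to:

push address of the instruction after the CALL; (saves the position where the caller resumes)
JMP start address of the function;

RETN instruction:

Similar to POP EIP. It pops the value on top of the stack (the caller's saved position) into EIP.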
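
The interplay of CALL, RETN, EIP, and ESP can be followed in a small simulation. This is a minimal Python sketch, not a real emulator: the `Cpu` class, the addresses, and the 5-byte length assumed for the call instruction are all illustrative choices.

```python
class Cpu:
    """Tiny model of EIP, ESP, and a sparse stack memory."""

    def __init__(self, eip: int = 0, esp: int = 0x1000):
        self.eip = eip
        self.esp = esp
        self.memory = {}                    # address -> 32-bit value

    def push(self, value: int) -> None:
        self.esp -= 4
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, target: int, instruction_length: int = 5) -> None:
        """Push the address of the next instruction, then jump to target."""
        self.push(self.eip + instruction_length)
        self.eip = target

    def retn(self) -> None:
        """Pop the saved return address into EIP."""
        self.eip = self.pop()


cpu = Cpu(eip=0x401000)
cpu.call(0x402000)
print(hex(cpu.eip), hex(cpu.esp), hex(cpu.memory[cpu.esp]))
cpu.retn()
print(hex(cpu.eip), hex(cpu.esp))
```

After `retn`, ESP is back at its starting value and EIP points just past the call, which is exactly the condition the function body has to preserve.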

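MOVS and STOS instructions

MOVS instruction:

Copies data from the address in ESI to the address in EDI, then advances ESI and EDI by the operand size (upward when DF=0, downward when DF=1).

There are only three forms:

  • MOVS BYTE PTR ES:[EDI], BYTE PTR DS:[ESI]    short form: MOVSB
  • MOVS WORD PTR ES:[EDI], WORD PTR DS:[ESI]    short form: MOVSW
  • MOVS DWORD PTR ES:[EDI], DWORD PTR DS:[ESI]  short form: MOVSD

STOS instruction:

Stores AL, AX, or EAX at the address in EDI, then advances EDI by the operand size (upward when DF=0, downward when DF=1).

There are only three forms:

  • STOS BYTE PTR ES:[EDI]   short form: STOSB
  • STOS WORD PTR ES:[EDI]   short form: STOSW
  • STOS DWORD PTR ES:[EDI]  short form: STOSD

REP:

Repeat prefix. It is usually placed in front of instructions such as MOVS and STOS and repeats the instruction that follows. The number of repetitions is given by the ECX register, which counts down to 0.

  • rep movsb
  • rep stosb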
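
What `rep movsb` and `rep stosb` do is roughly what `memcpy` and `memset` do in C. The following minimal Python sketch models them on a `bytearray`, assuming DF=0 (addresses increase) and a flat memory where segment registers can be ignored; the function names are illustrative.

```python
def rep_movsb(mem: bytearray, esi: int, edi: int, ecx: int):
    """Model `rep movsb` with DF=0: copy ECX bytes from [ESI] to [EDI]."""
    while ecx != 0:
        mem[edi] = mem[esi]   # movsb: copy one byte
        esi += 1
        edi += 1
        ecx -= 1              # rep: count down until ECX is 0
    return esi, edi, ecx


def rep_stosb(mem: bytearray, edi: int, ecx: int, al: int):
    """Model `rep stosb` with DF=0: store AL into ECX bytes starting at [EDI]."""
    while ecx != 0:
        mem[edi] = al & 0xFF  # stosb: store one byte
        edi += 1
        ecx -= 1
    return edi, ecx


mem = bytearray(b"hello" + bytes(11))        # 16 bytes of model memory
print(rep_movsb(mem, esi=0, edi=8, ecx=5))   # copy "hello" to offset 8
print(rep_stosb(mem, edi=13, ecx=3, al=0x2E))
print(bytes(mem))
```

Both helpers return the final register values, which shows that ESI and EDI end up just past the copied region and ECX ends at 0.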

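Addressing modes

Operand              Addressing mode                     Example
Number               Immediate addressing                mov eax,0x12345678
Register             Register addressing                 mov eax,ecx
Memory               Direct addressing                   mov eax,[0x12345678]
[reg]                Register indirect addressing        mov eax,[ecx]
[reg+imm]            Register relative addressing        mov eax,[eax+0xc]
[reg+reg+imm]        Relative based indexed addressing   mov eax,[ecx+ebx+4]
[reg+reg*1,2,4,8]    Scaled index addressing             mov eax,[ecx+ebx*2]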
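
All of the memory forms in the table are special cases of one formula: base + index * scale + displacement. The following minimal Python sketch computes that effective address and contrasts a `mov` load (which reads memory at the address) with `lea` (which yields the address itself). The register values and the memory contents are made-up sample data.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute base + index*scale + disp, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


ecx, ebx = 0x1000, 0x10                      # sample register values
memory = {0x1020: 0xDEADBEEF}                # sample memory contents

addr = effective_address(base=ecx, index=ebx, scale=2)   # [ecx+ebx*2]
mov_result = memory[addr]                    # mov eax,[ecx+ebx*2] reads memory
lea_result = addr                            # lea eax,[ecx+ebx*2] keeps the address
print(hex(addr), hex(mov_result), hex(lea_result))

print(hex(effective_address(base=ecx, index=ebx, disp=4)))  # [ecx+ebx+4]
print(hex(effective_address(disp=0x12345678)))              # [0x12345678]
```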

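Stack

Stack concepts and operations

Stack: the stack region of a program's memory space

Stack memory:

  1. Does not need to be requested explicitly; the system allocates it automatically
  2. Does not need to be freed explicitly; the system releases it automatically
  3. Holds local variables and function arguments

Characteristics as a data structure:

  1. Last in, first out
  2. Grows from high addresses toward low addresses (stack bottom (EBP register): high address; stack top (ESP register): low address)

Ways to access stack memory:

  1. Stack top + offset
  2. Stack bottom - offset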

Stack memory operations:

Pushing onto the stack:

Method 1:
lea ecx, dword ptr ds:[esp-0x4];
mov dword ptr ds:[ecx], 0x12345678;
sub esp, 0x4;

Method 2:
sub esp, 0x4;
mov dword ptr ds:[esp], 0x12345678;

Reading a value from the stack:

Method 1 (stack top + offset):
mov eax, dword ptr ds:[esp + 0x4]

Method 2 (stack bottom - offset):
mov eax, dword ptr ds:[ebp - 0x4]

Popping from the stack:

mov ecx, dword ptr ds:[esp];
add esp, 0x4;
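
The manual sequences above are what a push and a pop amount to. The following minimal Python sketch models them on a dictionary-based memory; the starting ESP of 0x1000 and the helper names are assumptions for illustration only.

```python
def push(state: dict, value: int) -> None:
    """Model a 32-bit push: sub esp, 4 then mov [esp], value."""
    state["esp"] -= 4
    state["mem"][state["esp"]] = value & 0xFFFFFFFF


def pop(state: dict) -> int:
    """Model a 32-bit pop: mov reg, [esp] then add esp, 4."""
    value = state["mem"][state["esp"]]
    state["esp"] += 4
    return value


state = {"esp": 0x1000, "mem": {}}
push(state, 0x11111111)
push(state, 0x22222222)

top = state["mem"][state["esp"]]           # [esp]: the value pushed last
below = state["mem"][state["esp"] + 4]     # [esp+0x4]: the value pushed before it
print(hex(state["esp"]), hex(top), hex(below))

first = pop(state)                         # last in, first out
second = pop(state)
print(hex(first), hex(second), hex(state["esp"]))
```

Each push lowers ESP by 4 and each pop raises it by 4, so after an equal number of both, ESP is back where it started.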

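  • push

push a (push onto the stack):

  1. a must be at least 16 bits wide
  2. Pushing a 32-bit operand or an immediate takes 4 bytes of stack space; pushing a 16-bit operand such as ax moves ESP by only 2 bytes

Examples:

  • push eax;
  • push 0x12;
  • push ax;
  • push al; (not allowed)
  • push word ptr ds:[0x12121212]
  • push dword ptr ds:[0x12121212]
  • pop

pop a (pop from the stack):

  1. Takes the element on top of the stack and stores it in a
  2. a must be a container (a register or memory)
  3. a must be at least 16 bits wide

Examples:

  • pop ax
  • pop eax
  • pop word ptr ds:[0x12121212]
  • pop dword ptr ds:[0x12121212]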
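  • pushad

pushad

  1. Pushes the eight general-purpose registers onto the stack in index order: eax first, edi last (note that the esp value pushed is the value esp had before pushad)
  2. Takes no operands
  3. Uses 4*8 bytes of stack space at once

Use: saving the register values of a program

  • popad

popad

  1. Pops values from the top of the stack into the registers in reverse index order: the first value popped goes to edi, the last to eax
  2. Takes no operands
  3. Releases 4*8 bytes of stack space at once

Use: restoring the register values saved by pushad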

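Function calling conventions

A function call goes through the following steps:

  1. Pass the arguments: push them onto the stack with push (in C, from right to left)
  2. call the function's entry address (jumps to the function and pushes the address of the next instruction onto the stack)
  3. push ebp (saves the stack bottom pointer so it can be restored later)
  4. mov ebp, esp (moves the stack bottom pointer up, in preparation for allocating the function's stack space)
  5. sub esp, xxx (moves the stack top pointer up, reserving stack space for this function's local variables)
  6. push registers (saves the register values from before the call so they can be restored afterwards)
  7. Execute the function body
  8. pop registers
  9. mov esp, ebp (moves the stack top pointer back down, releasing the function's stack space)
  10. pop ebp (restores the stack bottom pointer to its original value)
  11. retn (pop eip; at this point the top of the stack holds the address of the instruction that follows the call)
  12. add esp, xxx (pushing the arguments before the call moved the stack top pointer up, so the argument space must be released after the call)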
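
Following ESP through the twelve steps makes it easier to see why the stack ends up balanced. Here is a minimal Python sketch that only tracks the pointer values; the starting ESP, the caller's EBP, and the default sizes (two arguments, 8 bytes of locals, one saved register) are arbitrary assumptions.

```python
def trace_call(arg_count=2, local_bytes=8, saved_regs=1, esp=0x1000):
    """Follow ESP and EBP through a stack-based call where the caller cleans up."""
    caller_ebp = 0x2000                    # arbitrary value of EBP in the caller
    trace = [("before call", esp)]

    esp -= 4 * arg_count                   # 1. push the arguments
    trace.append(("push args", esp))
    esp -= 4                               # 2. call pushes the return address
    trace.append(("call", esp))
    esp -= 4                               # 3. push ebp
    ebp = esp                              # 4. mov ebp, esp
    trace.append(("push ebp", esp))
    esp -= local_bytes                     # 5. sub esp, xxx
    trace.append(("sub esp", esp))
    esp -= 4 * saved_regs                  # 6. push registers
    trace.append(("push regs", esp))

    # 7. inside the body, everything is addressed relative to EBP
    frame = {"first_arg": ebp + 8, "return_address": ebp + 4, "first_local": ebp - 4}

    esp += 4 * saved_regs                  # 8. pop registers
    esp = ebp                              # 9. mov esp, ebp
    esp += 4                               # 10. pop ebp
    ebp = caller_ebp
    esp += 4                               # 11. retn pops the return address
    trace.append(("retn", esp))
    esp += 4 * arg_count                   # 12. add esp, xxx
    trace.append(("add esp", esp))
    return trace, frame, ebp


trace, frame, ebp = trace_call()
for step, value in trace:
    print(f"{step:12} esp={value:#06x}")
print({name: hex(addr) for name, addr in frame.items()})
```

The first and last ESP values in the trace are equal, and the first argument sits at ebp+8 because the return address and the saved ebp lie between it and the frame base.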

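Questions:

  • Why does the stack need to be balanced?

Because the default user-mode stack size on Windows is 1 MB, and every function call reserves a piece of stack space. If that space is not released after use, repeated calls will eventually exhaust the stack and crash the program.

  • Where is the function's return value stored?

In most cases the return value is placed in eax, but not always.

  • Can arguments only be passed by pushing them onto the stack?

No, they can also be passed in registers.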

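Calling conventions:

  • __cdecl: the default calling convention in C/C++

Feature 1: arguments are pushed, from right to left.
Feature 2: the stack is balanced externally (the argument space is released after retn, outside the function).

  • __stdcall: the calling convention of Windows API functions, usually written with the WINAPI macro

Feature 1: arguments are pushed, from right to left.
Feature 2: the stack is balanced internally (the argument space is released inside the function, as part of its retn).

  • __fastcall: the fast calling convention, which passes arguments in registers first

Feature 1: the first two arguments are passed in registers (ecx for the first, edx for the second); if there are more than two, the rest are pushed, from right to left.
Feature 2: the stack is balanced internally (the argument space is released inside the function, as part of its retn).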

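  • Changes in X64 programs:
    • The separate function calling conventions of 32-bit programs are no longer used
    • x64 applications have only one calling convention, similar to fastcall. The first 4 parameters are passed in registers, which are fixed: RCX for the first, RDX for the second, R8 for the third, and R9 for the fourth. Any further parameters are placed on the stack from right to left, and the caller balances the stack.
    • When an x64 application calls a function, parameters are typically found in the region between rsp and rbp, and local variables are addressed as rbp+xxx. In a 32-bit program, parameters are addressed as ebp+xxx, and local variables lie in the region between esp and ebp.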

Origin blog.csdn.net/Dajian1040556534/article/details/129617880