80x86 Assembly Language Basics

Today's post is an excerpt.
The original text comes from the WeChat public account Computer and Network Security.

Assembly language is a low-level language for electronic computers, microprocessors, microcontrollers, and other programmable devices; it is also called symbolic language. In assembly language, mnemonics replace the opcodes of machine instructions, and address symbols or labels replace the addresses of instructions and operands. Each kind of device has its own machine instruction set and therefore its own assembly language, and assembly programs are converted into machine instructions through the assembly process. Generally speaking, a given assembly language corresponds one-to-one to a given machine instruction set, so it is not directly portable between platforms.

Many assemblers provide additional mechanisms that support program development, assembly control, and debugging. Assemblers that also provide macros are called macro assemblers.

Assembly language is not as widely used as most other programming languages. In practice today it is mostly used for low-level work, direct hardware operation, and performance-critical optimization. Drivers, embedded operating systems, and real-time programs typically need at least some assembly language.

0x01 Development History

To explain how assembly language came about, we must first talk about machine language.

Machine language is the set of machine instructions, the commands a machine can execute directly. For an electronic computer, a machine instruction is a string of binary digits. The computer converts it into a sequence of high and low voltage levels, which drive the computer's electronic components to perform operations.

The computer mentioned above is a machine that can execute machine instructions and perform operations. This is the early concept of a computer. In the PCs we commonly use, a single chip performs this function: the CPU (Central Processing Unit). Because microprocessors differ in hardware design and internal structure, each needs different voltage pulses to control it, so each kind of microprocessor has its own machine instruction set, that is, its own machine language.

Early programming was done in machine language. Programmers punched the program, made up of 0s and 1s, onto paper tape or cards (a hole for 1, no hole for 0) and then fed it into the computer through a tape or card reader. A program made of nothing but 0s and 1s is hard to read, hard to modify, and error-prone. Programmers quickly found that machine language was difficult to recognize and remember and that it held back the whole industry, so assembly language was born.

The core of assembly language is the assembly instruction. Assembly instructions and machine instructions differ only in how they are written: an assembly instruction is an easy-to-remember written form of a machine instruction.

1000100111011000	machine instruction
mov ax,bx			assembly instruction

Since then, programmers have written source programs with assembly instructions. However, a computer can only read machine instructions, so how can it execute a program written in assembly instructions? A translation program is needed to convert assembly instructions into machine instructions, and such a program is called an assembler. The programmer writes the source program in assembly language, the assembler translates it into machine code, and the computer finally executes that machine code.
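The translation step can be sketched for a single instruction form. The Python below is only an illustration, not a real assembler: it assumes 16-bit operand size and handles just the register-to-register form of MOV (opcode 89h, "MOV r/m16, r16"). The function names `assemble_mov_r16_r16` and `to_bits` are made up for this sketch.

```python
# Register numbers used by the x86 ModR/M byte for 16-bit registers.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def assemble_mov_r16_r16(dst: str, src: str) -> bytes:
    """Encode `mov dst, src` with opcode 89h (MOV r/m16, r16)."""
    opcode = 0x89
    # ModR/M layout: mod (2 bits) | reg (3 bits) | r/m (3 bits).
    # mod = 11b means r/m names a register; for opcode 89h the reg field
    # holds the source and the r/m field holds the destination.
    modrm = (0b11 << 6) | (REG16[src] << 3) | REG16[dst]
    return bytes([opcode, modrm])


def to_bits(code: bytes) -> str:
    """Render machine code as a string of 0s and 1s."""
    return "".join(f"{byte:08b}" for byte in code)


machine_code = assemble_mov_r16_r16("ax", "bx")
print(machine_code.hex(), to_bits(machine_code))  # 89d8 1000100111011000
```

The printed bit string is the same sixteen bits shown next to `mov ax,bx` above, which is all an assembler does for this instruction: look up the opcode and pack the register numbers into the ModR/M byte.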

0x02 Language Features

Assembly language is a programming language oriented directly to the processor. The processor works under the control of instructions, and each instruction the processor can recognize is called a machine instruction. Each type of processor recognizes its own set of instructions, called its instruction set. When the processor executes an instruction, it takes a different action depending on the instruction, which can change its own internal state and also control the state of peripheral circuits.

Another feature of assembly language is that it operates not on abstract data but directly on registers and memory. This lets the programmer control exactly what the processor does, which is why carefully written assembly code can run faster than code produced from a high-level language, but it also makes programming more complicated. In the example above, for instance, we cannot simply use a value the way a high-level language would; the data must first be placed in registers such as AX and BX and taken from there. In a high-level language this addressing work is done by the compiler, while in assembly language the programmer must do it, which increases the complexity of programming.
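To make the extra work concrete, here is a rough Python model of what a single high-level statement `c = a + b` turns into at the register level. The memory addresses, the variable names `A`, `B`, `C`, and the two-register machine are all made up for illustration; each Python line is annotated with the assembly instruction it stands in for.

```python
# Hypothetical memory: three 16-bit variables at made-up addresses.
A, B, C = 0x0100, 0x0102, 0x0104
memory = {A: 7, B: 5, C: 0}
regs = {"ax": 0, "bx": 0}

# High-level language: c = a + b
# Assembly language: the programmer moves every value by hand.
regs["ax"] = memory[A]                            # mov ax, [a]
regs["bx"] = memory[B]                            # mov bx, [b]
regs["ax"] = (regs["ax"] + regs["bx"]) & 0xFFFF   # add ax, bx (16-bit wrap)
memory[C] = regs["ax"]                            # mov [c], ax

print(memory[C])  # 12
```

One statement becomes four instructions, and choosing the registers and addresses is the programmer's job rather than the compiler's.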

Furthermore, assembly instructions are a symbolic representation of machine instructions, and different types of CPUs have different machine instruction sets and therefore different assembly languages. Assembly language programs are thus closely tied to the machine. Apart from a certain degree of portability between different models in the same CPU series, assembly language programs cannot be ported between different types of CPUs (for example, between minicomputers and microcomputers). In other words, assembly language programs are less general and less portable than high-level language programs.

0x03 Language Composition

Data transfer

This group includes the general data transfer instruction MOV, the conditional move instructions CMOVcc, the stack instructions PUSH/PUSHA/PUSHAD/POP/POPA/POPAD, the exchange and translate instructions XCHG/XLAT/BSWAP, and the address and segment selector load instructions LEA/LDS/LES/LFS/LGS/LSS. Note that CMOVcc is not a single instruction but a family of instructions, each of which decides whether to perform the move according to the state of certain bits in the EFLAGS register.
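As an illustration of the stack instructions, the following Python sketch models PUSH and POP on a 16-bit stack. The class name `Stack16` and the 16-byte stack size are assumptions made for the example. It then swaps two values through the stack, which has the same effect as a single `xchg ax, bx`.

```python
class Stack16:
    """Toy model of a 16-bit stack that grows toward lower addresses."""

    def __init__(self, size: int = 16):
        self.mem = bytearray(size)
        self.sp = size  # empty stack: SP points just past the last byte

    def push(self, value: int) -> None:
        # PUSH: SP decreases by 2, then the word is stored little-endian.
        self.sp -= 2
        self.mem[self.sp:self.sp + 2] = (value & 0xFFFF).to_bytes(2, "little")

    def pop(self) -> int:
        # POP: the word at SP is read, then SP increases by 2.
        value = int.from_bytes(self.mem[self.sp:self.sp + 2], "little")
        self.sp += 2
        return value


stack = Stack16()
ax, bx = 0x1234, 0xABCD

stack.push(ax)    # push ax
stack.push(bx)    # push bx
ax = stack.pop()  # pop ax  -> receives the old BX
bx = stack.pop()  # pop bx  -> receives the old AX

print(hex(ax), hex(bx))  # 0xabcd 0x1234
```

Because the stack is last-in, first-out, popping in the same order as pushing reverses the values.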

Integer and logical operations

This group performs arithmetic and logical operations. It includes the addition instructions ADD/ADC, the subtraction instructions SUB/SBB, the increment instruction INC, the decrement instruction DEC, the comparison instruction CMP, the multiplication instructions MUL/IMUL, the division instructions DIV/IDIV, the sign extension instructions CBW/CWDE/CDQE, the decimal adjustment instructions DAA/DAS/AAA/AAS, and the logical instructions NOT/AND/OR/XOR/TEST.
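The reason ADD and ADC come as a pair is multi-word arithmetic: ADC adds the carry flag left behind by the previous addition. The Python sketch below models this for a 32-bit addition built from two 16-bit additions. The helper names `add16` and `add32_via_16` are invented here, and the register names in the comments are just one possible assignment.

```python
MASK16 = 0xFFFF


def add16(a: int, b: int, carry_in: int = 0):
    """Model ADD (carry_in=0) or ADC (carry_in=CF). Returns (result, CF)."""
    total = a + b + carry_in
    return total & MASK16, total >> 16


def add32_via_16(x: int, y: int):
    """Add two 32-bit values using only 16-bit additions."""
    low, cf = add16(x & MASK16, y & MASK16)   # add ax, cx  (low words)
    high, cf = add16(x >> 16, y >> 16, cf)    # adc dx, bx  (high words + CF)
    return (high << 16) | low, cf


print(hex(add32_via_16(0x0001FFFF, 0x00000001)[0]))  # 0x20000
```

SUB and SBB work the same way for subtraction, with the carry flag acting as a borrow.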

Shift instructions

This group shifts a register or memory operand by a specified number of bits. It includes the logical left shift SHL, the logical right shift SHR, the arithmetic left shift SAL (the same operation as SHL), the arithmetic right shift SAR, the rotate left ROL, and the rotate right ROR.
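The difference between the logical, arithmetic, and rotate variants is easiest to see on a small operand. The following Python sketch models them on 8-bit values, assuming a shift count between 0 and 7; the function names are chosen for this example and the flags are ignored.

```python
def shl8(value: int, count: int) -> int:
    """SHL/SAL: shift left, zeros enter from the right."""
    return (value << count) & 0xFF


def shr8(value: int, count: int) -> int:
    """SHR: logical right shift, zeros enter from the left."""
    return (value & 0xFF) >> count


def sar8(value: int, count: int) -> int:
    """SAR: arithmetic right shift, the sign bit is replicated."""
    value &= 0xFF
    signed = value - 0x100 if value & 0x80 else value
    return (signed >> count) & 0xFF


def rol8(value: int, count: int) -> int:
    """ROL: bits shifted out on the left re-enter on the right."""
    value &= 0xFF
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


print(hex(shr8(0xF0, 2)), hex(sar8(0xF0, 2)), hex(rol8(0x81, 1)))
# 0x3c 0xfc 0x3
```

Read as a signed byte, 0xF0 is -16: SAR by 2 gives -4 (0xFC), while SHR treats the same bits as the unsigned value 240 and gives 60 (0x3C).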

Bit manipulation instructions

This group includes the bit test instruction BT, the bit test and set instruction BTS, the bit test and reset instruction BTR, the bit test and complement instruction BTC, the bit scan forward instruction BSF, and the bit scan reverse instruction BSR.
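A compact way to see what these instructions compute is the Python model below. It is a sketch only: each bit test function returns the new operand value together with the old bit (which the real instruction places in the carry flag), and the scan functions return `None` for a zero operand, where the real instructions leave the destination undefined and set ZF.

```python
def bt(value: int, index: int) -> int:
    """BT: return the selected bit (the CPU copies it into CF)."""
    return (value >> index) & 1


def bts(value: int, index: int):
    """BTS: test the bit, then set it. Returns (new_value, old_bit)."""
    return value | (1 << index), bt(value, index)


def btr(value: int, index: int):
    """BTR: test the bit, then reset (clear) it."""
    return value & ~(1 << index), bt(value, index)


def btc(value: int, index: int):
    """BTC: test the bit, then complement (flip) it."""
    return value ^ (1 << index), bt(value, index)


def bsf(value: int):
    """BSF: index of the lowest set bit, or None if value is 0."""
    if value == 0:
        return None
    return (value & -value).bit_length() - 1


def bsr(value: int):
    """BSR: index of the highest set bit, or None if value is 0."""
    if value == 0:
        return None
    return value.bit_length() - 1


flags = 0b00101000  # bits 3 and 5 are set
print(bt(flags, 3), bsf(flags), bsr(flags))  # 1 3 5
```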

Conditional set instructions

SETcc is not a single instruction but a family of about 30 instructions that set an 8-bit register or memory operand to 1 or 0 according to the state of certain bits in the EFLAGS register. Examples are SETE, SETNE, and SETGE.
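The sketch below shows how three members of the family could be modeled in Python. It assumes 8-bit operands and that the flags were produced by a preceding CMP; `cmp8` and the dictionary of flags are constructs of this example, not part of any real API. SETE tests ZF, SETNE tests its opposite, and SETGE (signed greater-or-equal) tests whether SF equals OF.

```python
def cmp8(a: int, b: int) -> dict:
    """Flags after `cmp a, b` on 8-bit operands (a - b, result discarded)."""
    a &= 0xFF
    b &= 0xFF
    diff = (a - b) & 0xFF
    zero = int(diff == 0)
    sign = diff >> 7
    # Signed overflow on subtraction: the operands have different signs
    # and the sign of the result differs from the sign of a.
    overflow = int(((a ^ b) & (a ^ diff) & 0x80) != 0)
    return {"ZF": zero, "SF": sign, "OF": overflow}


def sete(flags: dict) -> int:
    return flags["ZF"]


def setne(flags: dict) -> int:
    return 1 - flags["ZF"]


def setge(flags: dict) -> int:
    return int(flags["SF"] == flags["OF"])


# 0xFE is -2 as a signed byte, so "-2 >= 3" is false.
print(setge(cmp8(0xFE, 3)))  # 0
```

The result byte can then be used directly as a boolean value, which is how compilers typically avoid a branch for expressions such as `x = (a >= b)`.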

Control transfer instructions

This group includes the unconditional jump instruction JMP, the conditional jump instructions Jcc/JCXZ, the loop instructions LOOP/LOOPE/LOOPNE, the procedure call instruction CALL, the procedure return instruction RET, and the interrupt instructions INT n, INT3, INTO, and IRET. Note that Jcc is a family of instructions that decide whether to jump according to the state of certain bits in the EFLAGS register. INT n is the software interrupt instruction, where n is a number between 0 and 255 that gives the interrupt vector number.

For example, INT 13h is the BIOS interrupt used for disk reads and writes.
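Of the control transfer instructions listed above, LOOP has a detail that is easy to miss: it decrements CX first and only then tests for zero. The Python sketch below models a loop of the form `body: ... / loop body` with a 16-bit CX; the function name `run_loop` is made up for the example.

```python
def run_loop(cx: int) -> int:
    """Count how often the body runs for `body: ... / loop body`."""
    iterations = 0
    while True:
        iterations += 1          # the loop body executes once
        cx = (cx - 1) & 0xFFFF   # LOOP decrements CX first ...
        if cx == 0:              # ... and falls through once CX is zero
            break
    return iterations


print(run_loop(3), run_loop(0))  # 3 65536
```

Entering the loop with CX = 0 does not skip it; CX wraps around to 0xFFFF and the body runs 65536 times. This is one reason JCXZ exists: placed before the loop, it jumps past the body when CX is already zero.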

String manipulation

This group operates on strings of data. It includes the string move instruction MOVS, the string compare instruction CMPS, the string scan instruction SCAS, the string load instruction LODS, and the string store instruction STOS. These instructions can optionally be combined with the prefixes REP, REPE/REPZ, and REPNE/REPNZ to operate repeatedly.
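As a rough model of how a repeat prefix drives a string instruction, the Python below simulates `rep movsb`. It simplifies heavily: memory is one flat `bytearray` with no DS/ES segments, and the function name and return tuple are inventions of this sketch. SI is the source index, DI the destination index, CX the count, and DF the direction flag.

```python
def rep_movsb(mem: bytearray, si: int, di: int, cx: int, df: int = 0):
    """Copy CX bytes from mem[SI] to mem[DI]. Returns final (SI, DI, CX)."""
    step = -1 if df else 1   # DF=0: addresses go up; DF=1: addresses go down
    while cx != 0:           # REP repeats until CX reaches zero
        mem[di] = mem[si]    # MOVSB copies one byte ...
        si += step           # ... then advances both index registers
        di += step
        cx -= 1
    return si, di, cx


memory = bytearray(b"hello\x00\x00\x00\x00\x00")
final = rep_movsb(memory, si=0, di=5, cx=5)
print(memory, final)  # bytearray(b'hellohello') (5, 10, 0)
```

Because the copy proceeds one byte at a time, overlapping source and destination ranges behave differently depending on DF, which is why the direction flag matters when moving data within a buffer.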

I/O instructions

This group exchanges data with peripheral devices. It includes the port input instructions IN/INS and the port output instructions OUT/OUTS.

High-level language support

This group makes work easier for high-level language compilers. It includes the instruction ENTER, which creates a stack frame, and the instruction LEAVE, which releases it.
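What ENTER and LEAVE do to the stack can be sketched as follows. This Python model assumes 16-bit registers and the simple form `ENTER n, 0` (nesting level 0), which behaves like `push bp / mov bp, sp / sub sp, n`; LEAVE behaves like `mov sp, bp / pop bp`. The class name `FrameModel` and the starting register values are arbitrary choices for the example.

```python
class FrameModel:
    """Toy model of SP/BP while a stack frame is created and released."""

    def __init__(self):
        self.mem = {}      # sparse stack memory: address -> 16-bit word
        self.sp = 0x1000   # arbitrary initial stack pointer
        self.bp = 0x2222   # arbitrary frame pointer of the caller

    def push(self, value: int) -> None:
        self.sp -= 2
        self.mem[self.sp] = value

    def pop(self) -> int:
        value = self.mem[self.sp]
        self.sp += 2
        return value

    def enter(self, local_bytes: int) -> None:
        """ENTER local_bytes, 0"""
        self.push(self.bp)       # save the caller's frame pointer
        self.bp = self.sp        # new frame starts here
        self.sp -= local_bytes   # reserve room for local variables

    def leave(self) -> None:
        """LEAVE"""
        self.sp = self.bp        # discard the local variables
        self.bp = self.pop()     # restore the caller's frame pointer


cpu = FrameModel()
cpu.enter(8)
inside = (cpu.sp, cpu.bp)
cpu.leave()
print([hex(v) for v in inside], hex(cpu.sp), hex(cpu.bp))
# ['0xff6', '0xffe'] 0x1000 0x2222
```

After LEAVE both registers hold their original values again, so the caller's frame is intact when RET executes.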

Control and Privilege Instructions

This group includes the no-operation instruction NOP, the halt instruction HLT, the wait instructions WAIT/MWAIT, the escape instruction ESC, the bus lock prefix LOCK, the array bounds check instruction BOUND, the global descriptor table instructions LGDT/SGDT, the interrupt descriptor table instructions LIDT/SIDT, the local descriptor table instructions LLDT/SLDT, the segment limit load instruction LSL, the access rights load instruction LAR, the task register instructions LTR/STR, the requested privilege level adjustment instruction ARPL, the task-switched flag clear instruction CLTS, the MOV forms that transfer data to and from control and debug registers, the cache control instructions INVD/WBINVD/INVLPG, the model-specific register read and write instructions RDMSR/WRMSR, the processor information instruction CPUID, the timestamp counter read instruction RDTSC, etc.

Floating-point and multimedia instructions

This group accelerates floating-point operations and includes the single-instruction-multiple-data (SIMD) instructions and their SSEx extensions, which accelerate multimedia data processing. The number of instructions in this group is very large and they cannot be listed one by one; please refer to the Intel manuals.

Virtual machine extensions

This group includes INVEPT/INVVPID/VMCALL/VMCLEAR/VMLAUNCH/VMRESUME/VMPTRLD/VMPTRST/VMREAD/VMWRITE/VMXOFF/VMXON, etc.


Origin blog.csdn.net/weixin_43466027/article/details/117430012