Assembly language study notes 01: basic knowledge of assembly

Introduction

Apart from machine language, assembly language is the lowest-level programming language. A machine can only read 0s and 1s, so we need a notation that is easier for humans to work with (high-level languages such as C are designed with the same goal in mind).

Assembly language is also the bridge between many high-level languages and the machine. A C program, for example, is first compiled into assembly language, which is then translated into machine language.

Given that, a natural question is: can I use decompilation to crack software (reverse engineering)?

It is not that simple. Only machine language and assembly language correspond roughly one-to-one. A single statement in a high-level language often stands for many assembly instructions, so going forward (compiling) is easy, while going backward (recovering the source) is error-prone. A compiled C program, for example, cannot be cracked by reading its source code, because the source is no longer there. What you can recover from it is assembly language, which is why hackers are so fond of it.

In short, because assembly language is so closely tied to the machine, learning it helps us understand how the hardware operates and how to make it work more efficiently.

This blog is mainly based on the fourth edition of Assembly Language and Interface Technology and on the Little Turtle online course (address: fishc.com). I am a junior majoring in computer science and am learning assembly language for the first time, so these notes are only suitable for fellow beginners.

1.1 The origin of assembly language

Machine language is a collection of machine instructions;
likewise, the main body of assembly language is assembly instructions.

The two kinds of instruction differ only in how they are written. An assembly instruction is a machine instruction written in a form that is easier for us to remember. In other words, assembly instructions are mnemonics for machine instructions.

Here is an example.
Take the operation: move the contents of register BX to register AX.

Written in machine language, it looks like this: 1000100111011000

Hard to read, isn't it? Written as an assembly instruction, the same operation becomes:
MOV AX, BX

MOV is short for move. Mnemonics are usually abbreviations of about three letters, which makes them easy to remember. Note that the second operand is moved into the first; the details of the operation are discussed later.
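To make the link between the bit string and the mnemonic concrete, here is a minimal Python sketch that takes the 16 bits apart. It assumes the standard 8086 encoding in which the first byte 89h means "MOV r/m16, r16" and the second byte packs a 2-bit mode, a 3-bit source register, and a 3-bit destination register. The function name `decode_mov_reg_reg` and the table `REG16` are made up for this illustration, and the sketch handles only this one register-to-register form.

```python
# 16-bit register numbers as used in the second byte of the instruction.
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_mov_reg_reg(bits: str) -> str:
    """Decode a 16-bit 'MOV register, register' instruction (opcode 89h only)."""
    value = int(bits, 2)
    opcode = value >> 8            # first byte: which operation
    modrm = value & 0xFF           # second byte: which operands
    mod = modrm >> 6               # top 2 bits: 11 means both operands are registers
    reg = (modrm >> 3) & 0b111     # middle 3 bits: source register
    rm = modrm & 0b111             # low 3 bits: destination register
    if opcode != 0x89 or mod != 0b11:
        raise ValueError("this sketch only handles opcode 89h with mod = 11")
    return f"MOV {REG16[rm]}, {REG16[reg]}"


print(decode_mov_reg_reg("1000100111011000"))  # MOV AX, BX
```

The destination is printed first and the source second, which matches the operand order of the assembly instruction.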

Registers

We just mentioned the registers AX and BX. What are they?

Simply put, a register can be thought of as storage inside the CPU (not to be confused with the L1 and L2 caches). A CPU has many registers, such as flag registers and data registers, each with its own job. For now, we only need to know that they hold the data and instructions the CPU is working with.
For the time being, think of AX and BX simply as the names of two registers (this is how Little Turtle puts it).

1.2 The composition of assembly language

Assembly language is mainly composed of the following three parts:

Assembly instructions.
These are the mnemonics for machine instructions and the core of assembly language, for example MOV and ADD. The operands that follow the mnemonic are part of the instruction as well.
(Figure: the 8086 instruction format. There is no need to understand it all at once.)
Pseudo-instructions.
Pseudo-instructions must be distinguished from assembly instructions. An assembly instruction is translated into machine code that the machine executes; a pseudo-instruction is not. Pseudo-instructions assist the assembly of the source program, for example by specifying where data is stored. The machine never sees them; they are recognized and carried out by the assembler.
For specific examples, see https://blog.csdn.net/houyichaochao/article/details/80686076
Other symbols.
Symbols such as +, -, * and / are, like pseudo-instructions, recognized and evaluated by the assembler and have no corresponding machine code.

1.3 Memory

Most components in a computer have some storage of their own. What we study here is main memory.

The CPU controls the operation of the whole computer and performs its calculations, but we cannot keep all our data inside the CPU (there is too much of it, and CPU storage is expensive), so data generally lives on the hard disk. The CPU cannot read data directly from the hard disk, however. This is where memory comes in.

Memory holds the instructions and data that the CPU needs to process. Instructions tell the CPU what to do; data is what the CPU does it to.

In memory or on the hard disk, there is no difference between data and instructions. Both are just binary codes.
For example:
1000100111011000 -> 89D8H (data)
1000100111011000 -> MOV AX, BX (instruction)
Which of the two a given pattern represents is decided by how it is used.
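The same point can be tried out in a few lines of Python. This is only a sketch: the variable names are invented for the example, and the signed reading assumes a 16-bit two's complement interpretation, which is one possible choice among several.

```python
bits = "1000100111011000"

as_unsigned = int(bits, 2)                  # read the bits as an unsigned number
as_hex = f"{as_unsigned:04X}H"              # the same value written in hexadecimal
# Read the bits as a signed 16-bit number (two's complement).
as_signed = as_unsigned - (1 << 16) if as_unsigned & 0x8000 else as_unsigned
as_bytes = as_unsigned.to_bytes(2, "big")   # read the bits as two separate bytes

print(as_unsigned)      # 35288
print(as_hex)           # 89D8H
print(as_signed)        # -30248
print(list(as_bytes))   # [137, 216]
```

Nothing in the bit pattern itself says which reading is the right one. That is fixed only by the program that uses it.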

1.4 Storage unit

Memory is divided into storage units, and the units are numbered starting from 0. A memory with 128 storage units, for example, has them numbered 0~127.

One storage unit holds 1 byte, so a 1 KB memory has 1024 storage units, numbered 0 to 1023.
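A rough way to picture this in Python is a `bytearray`, where each element is one byte and the index plays the role of the unit's number. This is only an analogy for the numbering scheme, not a model of real hardware.

```python
KB = 1024
memory = bytearray(1 * KB)       # 1 KB of memory: 1024 storage units, all zero

first_address = 0
last_address = len(memory) - 1   # units are numbered from 0, so the last one is 1023

memory[3] = 0xD8                 # store one byte in the unit numbered 3

print(len(memory))               # 1024
print(first_address, last_address)  # 0 1023
print(memory[3])                 # 216
```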

1.5 CPU reads and writes to memory: the bus

As we said earlier, the CPU does not hold much data itself. It reads data from memory and writes data back to it. To do so, it has to exchange three kinds of information with the memory:

Address of the storage unit (address information)
Device selection and read and write commands (control information)
Read and write data (data information)

So how exactly is this information transmitted?

We have all heard of electrical signals:

(Figure: level and pulse signals)

The signals that an electronic computer can process and transmit are all electrical signals, so they must be transmitted by wires.

If you have ever assembled a computer yourself, what you remember most is probably plugging in all the cables (such a hassle...). The wires you plug in are part of the bus.

The bus is logically divided into:

Address bus
Data Bus
Control bus

These correspond to the three kinds of information mentioned above.

(Figures: the steps of a memory read and a memory write, from the Little Turtle online course)

To read, the CPU first sends the address of the storage unit it wants over the address lines, telling the memory: I need what is at this address.

Then, over the control lines, the CPU sends a read command, telling the memory that the operation it wants is a read.

Now the memory knows what to do, and it sends the data in the unit with address 3 back to the CPU over the data lines.

A write works the same way as a read, except that the data travels from the CPU to the memory.
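The read and write sequences can be mimicked with a small Python sketch. Everything here is a simplification made up for illustration: the class `ToyMemory`, the method `bus_cycle`, and the command strings "READ" and "WRITE" are hypothetical names, and the three arguments merely stand in for what would travel on the address, control, and data lines.

```python
from typing import Optional


class ToyMemory:
    """A toy memory: a row of one-byte storage units numbered from 0."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)

    def bus_cycle(self, address: int, control: str,
                  data: Optional[int] = None) -> Optional[int]:
        """Handle one transfer: address and control come in, data goes in or out."""
        if control == "READ":
            return self.cells[address]      # data travels from memory to the CPU
        if control == "WRITE":
            if data is None:
                raise ValueError("a write needs a data value")
            self.cells[address] = data      # data travels from the CPU to memory
            return None
        raise ValueError(f"unknown control command: {control}")


memory = ToyMemory(128)
memory.bus_cycle(address=3, control="WRITE", data=0x08)   # CPU writes 8 to unit 3
value = memory.bus_cycle(address=3, control="READ")       # CPU reads unit 3 back
print(value)  # 8
```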

Remember what was said above: instructions and data in memory cannot be told apart, and the same machine code can stand for an operation or for a piece of data. At that point we only said that we decide what it represents. Now we can add that the bus a piece of information travels on tells us its category: what travels on the address bus is an address, what travels on the control bus is a command, and what travels on the data bus is data.

If you have ever looked at a CPU chip, you will have seen its gold-colored pins. These pins connect to the bus; you could also say that the bus is led out through them.

The width of each of the three buses that a CPU brings out reflects its capability in one of three respects:

The width of the address bus determines the addressing ability of the CPU
The width of the data bus determines the amount of data transferred at one time
The width of the control bus determines its ability to control other components

1.5.1 Address bus

As the figure above shows, the address bus is the set of lines used to transmit addresses. Let us look at it a little more closely.

You probably know that today's computers are generally 32-bit or 64-bit. The term covers two things, though: the bit width of the operating system and the bit width of the CPU. The two are related. Put simply, if the CPU is 64-bit but the operating system is narrower, the CPU's capability cannot be fully used.

For the CPU, this figure indicates addressing ability. A 64-bit processor, for example, can in principle address 2 to the 64th power locations. In theory each address line is one wire, so a 64-bit processor would seem to have 64 of them, but in practice that may not be the case.

More precisely, if a CPU has N address lines, we say that the width of its address bus is N, and it can address 2 to the Nth power storage units. However many distinct values the address bus can carry, that is how many storage units the CPU can address.
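The rule is easy to check numerically. The sketch below uses a hypothetical helper `addressable_units`; the widths in the loop are just sample values. The 20-line case corresponds to the 8086, whose address bus is 20 bits wide.

```python
def addressable_units(address_lines: int) -> int:
    """Each line carries one bit, so N lines can express 2**N distinct addresses."""
    return 2 ** address_lines


for n in (10, 16, 20, 32):
    print(n, addressable_units(n))
# 10 1024
# 16 65536
# 20 1048576
# 32 4294967296
```

With one byte per storage unit, 20 address lines therefore reach 1 MB of memory.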

1.5.2 Data Bus

The data bus also has a width, which determines the speed of data transmission.

It is like a highway: the wider the road, the more cars pass per unit of time (the speed of each car being the same; for a bus I believe the counterpart is the clock frequency).

It can also be understood in terms of bits.

Take the 8086 processor, which has a 16-bit data bus.
If the data we want to transmit is 1101100010001001 in binary, exactly 16 bits, it can be sent in a single transfer:

(Figure: data transfer on the 8086 data bus)

On the 8088 processor, whose data bus is only 8 bits wide, the same data has to be sent in two transfers. Note that the low byte goes first, then the high byte.
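The difference between one transfer and two can be sketched in Python. The function `split_into_transfers` is a hypothetical helper written for this note; it simply cuts a value into bus-width pieces, lowest piece first, and says nothing about timing or real bus signals.

```python
def split_into_transfers(value: int, value_bits: int, bus_width: int) -> list:
    """Cut a value into bus-width chunks, ordered from the low bits to the high bits."""
    mask = (1 << bus_width) - 1
    transfers = []
    for shift in range(0, value_bits, bus_width):
        transfers.append((value >> shift) & mask)
    return transfers


word = 0b1101100010001001

# 16-bit data bus (8086): the whole word goes in one transfer.
print([f"{t:04X}" for t in split_into_transfers(word, 16, 16)])  # ['D889']

# 8-bit data bus (8088): two transfers, low byte first.
print([f"{t:02X}" for t in split_into_transfers(word, 16, 8)])   # ['89', 'D8']
```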

This is why a wider data bus means faster transfers. Just think of today's 64-bit CPUs.

1.5.3 Control Bus

The control bus is a general term for control lines, which are mainly used by the CPU to control other devices.

So the number of control lines determines how many kinds of control the CPU can exert over other devices.

Note that the read and write commands mentioned earlier are each sent over several control lines. The textbook does not seem to require the details, so I will not go into them.

To close this chapter, note that everything above is theoretical and does not go into how real hardware behaves.