Instruction Set
Save and load
-
IMM
Full name: load immediately
Loads an immediate value (a constant) into the register
-
LEA
Load effective address
Loads an address into the register
-
LC/LI/SC/SI
load char / load int: load a char or an int from memory into the register
save char / save int: save a char or an int from the register to memory
-
PUSH
Pushes the data in the register onto the top of the stack (stack peek)
Example
-
ax is a general-purpose register
-
pc is the program counter, a pointer into the code area
It points to the instruction currently being executed
The immediate operands that follow these instructions are also stored in the code area
For example, IMM 2 loads the constant (literal value) 2 into the ax register
After this instruction is parsed, both IMM and 2 are placed in the code area
Once IMM has been read, pc has moved on to the position of 2
The op variable in the code holds the instruction just read, IMM
ax = *pc dereferences pc, and the content at that address is 2
That value is put into the ax register
A post-increment pc++ then moves on to the next instruction
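The fetch and execute steps for IMM can be sketched as below. This is only a minimal sketch: the opcode numbers, the EXIT marker used to stop the loop, and the names run_imm and demo_imm are assumptions made for illustration, not taken from the actual source.

```c
#include <assert.h>
#include <stdint.h>

enum { IMM = 1, EXIT = 2 };     /* hypothetical opcode numbers */

/* Runs a code area until something other than IMM is read; returns ax. */
static intptr_t run_imm(const intptr_t *code) {
    const intptr_t *pc = code;  /* program counter, points into the code area */
    intptr_t ax = 0;            /* general-purpose register */
    for (;;) {
        intptr_t op = *pc++;    /* fetch the opcode; pc now points at its operand */
        if (op == IMM) {
            ax = *pc++;         /* ax = *pc, then the post-increment skips the operand */
        } else {
            return ax;          /* EXIT (or anything unknown) stops the sketch */
        }
    }
}

/* "IMM 2" laid out in the code area, followed by the stop marker. */
static intptr_t demo_imm(void) {
    const intptr_t code[] = { IMM, 2, EXIT };
    return run_imm(code);
}
```

Calling demo_imm() leaves 2 in ax, which is the value the sketch returns.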
As an example of what the LEA instruction does in x86:
Suppose a register bx contains a memory address
bx = 430430
I want to compute the address 8 bytes past this address
and store the address bx+8 in ax
Without LEA, it would have to be implemented like this:
ADD bx, 8 and then MOV ax, bx to transfer the data in bx to ax
There are problems with doing this:
1. It changes the original value of bx
2. The addition may overflow
The result of the addition may not fit in 32 bits
and the addition changes the CF and OF flags
This adds complexity to the subsequent calculations
So LEA simplifies this into a single operation such as
LEA ax, [bx+8]
Our implementation of LEA does not compute addresses relative to arbitrary locations
because that is not needed
We mainly want to compute addresses relative to the stack
It is mostly used to read the parameters and local variables of a function
which are all located relative to the stack's bp
So our LEA adds to or subtracts from bp
This makes it possible to reach the n-th parameter or the n-th local variable of the function
For example, LEA -1 takes the position of bp in the stack and subtracts 1. Because the stack grows from high addresses to low addresses, subtracting 1 actually moves one slot forward in the stack, to the position right after bp. This position may hold the first local variable of the function, for example with the value 35: LEA -1 produces the address of this location, and taking the value stored there gives 35
So LEA computes a position relative to the address in bp
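A small sketch of this bp-relative LEA, assuming the stack is an array of pointer-sized cells. The helper names lea and demo_lea and the stack layout are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* LEA n: the address of the slot n cells away from bp (n may be negative). */
static intptr_t lea(intptr_t *bp, intptr_t offset) {
    return (intptr_t)(bp + offset);
}

static intptr_t demo_lea(void) {
    intptr_t stack[8] = {0};
    intptr_t *bp = &stack[4];   /* pretend the frame base is here */
    stack[3] = 35;              /* the first local variable lives at bp - 1 */
    intptr_t ax = lea(bp, -1);  /* LEA -1: ax holds the address of that local */
    return *(intptr_t *)ax;     /* taking the value at that address gives 35 */
}
```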
Continue analyzing the code
Next are the load operations
ax itself holds an address
The value in memory at this address is loaded into ax
i.e. ax = *ax
Because the types differ (char or int), ax is cast to a different pointer type for each
SC stores data through the top of the stack: sp points to the top of the stack, which holds an address
This address is converted to the corresponding pointer type
for example a char pointer (char*)
The address is then dereferenced
to get the memory location it refers to
and the data in ax is stored in that location
After the top of the stack has been used, sp is incremented
so the stack shrinks by one slot
That is, once the top entry has been consumed, it is popped
Suppose the address stored on the stack is 430430
and suppose it is a location in the data area
sp points to this entry, so first fetch 430430
Take it out and convert it into an int pointer (this is the SI case)
which points to a location in the data area
Then dereference it to get that location
Assuming ax = 1, the data at address 430430 is now set to 1
This completes saving the data in ax to location 430430 in the data area of memory
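The SI and LI steps can be sketched as follows. The variable data_cell stands in for location 430430 in the data area, and the cell type intptr_t is an assumption; SC and LC would look the same with a char pointer cast.

```c
#include <assert.h>
#include <stdint.h>

static intptr_t data_cell;      /* stands in for location 430430 in the data area */

static intptr_t demo_save_load(void) {
    intptr_t stack[4];
    intptr_t *sp = stack + 4;   /* empty stack, grows toward lower addresses */
    intptr_t ax;

    *--sp = (intptr_t)&data_cell;   /* the target address sits on top of the stack */
    ax = 1;

    /* SI: take the address at the top of the stack, cast it to an int-sized
     * pointer, store ax there, and pop the used slot (sp++). */
    *(intptr_t *)*sp++ = ax;

    /* LI: ax holds an address; replace it with the value stored there. */
    ax = (intptr_t)&data_cell;
    ax = *(intptr_t *)ax;
    return ax;
}
```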
Continue looking at the code
The PUSH instruction: for example, with the value 2 loaded in ax, PUSH puts it on the stack: sp is first decremented (--sp), then the value 2 is written to the new top of the stack
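As a sketch, PUSH is a pre-decrement followed by a store. The helper names push and demo_push are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* PUSH: decrement sp first, then store ax in the new top slot. */
static intptr_t *push(intptr_t *sp, intptr_t ax) {
    *--sp = ax;
    return sp;
}

static intptr_t demo_push(void) {
    intptr_t stack[4] = {0};
    intptr_t *sp = stack + 4;   /* empty stack: sp is one past the highest slot */
    sp = push(sp, 2);
    /* sp moved down by exactly one slot, and that slot now holds 2 */
    return (sp == stack + 3) ? *sp : -1;
}
```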
Operation instructions
Arithmetic
The four basic arithmetic operations
ADD/SUB/MUL/DIV
MOD
modulo
Bit operations
OR
bitwise or
XOR
bitwise exclusive or
AND
bitwise and
SHL/SHR
shift left / shift right
Logic operations
EQ
equal
NQ
not equal
LT/LE/GT/GE
less than / less than or equal to / greater than / greater than or equal to
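A possible shape for these binary instructions, assuming each one pops the stack top as the left operand and uses ax as the right operand, the way ADD is described later in these notes. The operand order for the non-commutative operations is an assumption, and binop and eval are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

enum opcode { ADD, SUB, MUL, DIV, MOD, OR, XOR, AND, SHL, SHR,
              EQ, NQ, LT, LE, GT, GE };

/* Combines the popped stack top (left operand) with ax (right operand).
 * Division or modulo by zero is not guarded against in this sketch. */
static intptr_t binop(enum opcode op, intptr_t top, intptr_t ax) {
    switch (op) {
    case ADD: return top + ax;
    case SUB: return top - ax;
    case MUL: return top * ax;
    case DIV: return top / ax;
    case MOD: return top % ax;
    case OR:  return top | ax;
    case XOR: return top ^ ax;
    case AND: return top & ax;
    case SHL: return top << ax;
    case SHR: return top >> ax;
    case EQ:  return top == ax;
    case NQ:  return top != ax;
    case LT:  return top < ax;
    case LE:  return top <= ax;
    case GT:  return top > ax;
    case GE:  return top >= ax;
    }
    return 0;
}

/* PUSH the left operand, load the right one into ax, then execute op. */
static intptr_t eval(enum opcode op, intptr_t left, intptr_t right) {
    intptr_t stack[4];
    intptr_t *sp = stack + 4;
    intptr_t ax = left;
    *--sp = ax;                 /* PUSH */
    ax = right;                 /* e.g. IMM right */
    ax = binop(op, *sp++, ax);  /* the operator pops the top and leaves the result in ax */
    return ax;
}
```

For instance, eval(ADD, 3, 4) gives 7 and eval(GT, 5, 2) gives 1.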
Branch and jump instructions
JMP-related instructions
-
JMP moves the pc pointer to a specified location in the code area
Suppose 430430 is an address in the code area
JMP 430430 moves pc directly to address 430430, skipping the instructions in between
-
JZ/JNZ decide whether to jump, and where to, based on the current value of the register
JZ checks whether ax is equal to 0: if it is, it jumps like JMP; if it is not, pc just adds 1 to move past the operand and the next instruction is executed
JNZ is the opposite
For example:
while(a>b){
...
}
How can such a while loop be implemented with the simplest VM instructions?
Assume the expression a>b has already been evaluated to either 0 or 1, and the result is in ax
Define two positions in the code area: the loop point and the end point
The JZ instruction then checks whether ax is equal to 0; if it is, it jumps to the end point
If it is not equal to 0, the next instruction (the start of the loop body) is executed directly
After the loop body has executed, a JMP to the loop point goes back to the beginning
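The loop point / end point layout can be tried out with a tiny interpreter. This is a sketch under assumptions: the loop body a = a - 1 is invented so that the loop terminates, jump targets are indexes into the code array rather than real addresses, and the opcode numbering and the names run and demo_while are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { IMM, LI, SI, PUSH, SUB, GT, JMP, JZ, EXIT };

/* Runs code[] until EXIT. Jump targets are indexes into code[]. */
static void run(const intptr_t *code) {
    intptr_t stack[16];
    intptr_t *sp = stack + 16;
    intptr_t ax = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case IMM:  ax = code[pc++]; break;
        case LI:   ax = *(intptr_t *)ax; break;
        case SI:   *(intptr_t *)*sp++ = ax; break;
        case PUSH: *--sp = ax; break;
        case SUB:  ax = *sp++ - ax; break;
        case GT:   ax = *sp++ > ax; break;
        case JMP:  pc = code[pc]; break;
        case JZ:   pc = ax ? pc + 1 : code[pc]; break;  /* skip operand or jump */
        default:   return;                              /* EXIT */
        }
    }
}

/* while (a > b) { a = a - 1; }  returns the final a */
static intptr_t demo_while(intptr_t a0, intptr_t b0) {
    intptr_t a = a0, b = b0;
    const intptr_t A = (intptr_t)&a, B = (intptr_t)&b;
    const intptr_t code[] = {
        /*  0 loop point */ IMM, A, LI, PUSH,   /* push a                   */
        /*  4 */            IMM, B, LI,         /* ax = b                   */
        /*  7 */            GT,                 /* ax = (a > b)             */
        /*  8 */            JZ, 23,             /* ax == 0: go to end point */
        /* 10 */            IMM, A, PUSH,       /* push &a                  */
        /* 13 */            IMM, A, LI, PUSH,   /* push a                   */
        /* 17 */            IMM, 1, SUB,        /* ax = a - 1               */
        /* 20 */            SI,                 /* a = ax                   */
        /* 21 */            JMP, 0,             /* back to the loop point   */
        /* 23 end point */  EXIT
    };
    run(code);
    return a;
}
```

With a = 5 and b = 2 the loop runs three times and leaves a at 2; with a = 1 and b = 2 the JZ jumps to the end point immediately.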
Another example:
if(a>b){
...
}else{
...
}
There are 3 anchor points in the code area: the true point, the false point and the end point
If the condition is true, execution starts from the true point
If it is false, execution starts from the false point
When the true branch reaches the false point, it must skip the instructions after the false point
and jump to the end point
First, compute the value of a>b and store it in ax
When it is true, execute straight on without jumping
When it is false, JZ false point jumps to the false point
and the else branch runs through to the end point
If the true branch was executed, it ends with JMP end point
which jumps directly to the end point
Then the statements after the if are executed
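The three anchor points can be laid out like this. It is a sketch only: the condition is loaded with IMM instead of being computed from a>b, the two branch bodies just load 10 or 20 into ax, and jump targets are indexes into the code array. All of these are simplifications chosen for the illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { IMM, JMP, JZ, EXIT };

/* Returns 10 if the true branch ran, 20 if the else branch ran. */
static intptr_t run_if(intptr_t cond) {
    const intptr_t code[] = {
        /*  0 */ IMM, cond,  /* stands in for evaluating a > b into ax    */
        /*  2 */ JZ, 8,      /* condition false: jump to the false point  */
        /*  4 */ IMM, 10,    /* true point: body of the if branch         */
        /*  6 */ JMP, 10,    /* skip the else branch, go to the end point */
        /*  8 */ IMM, 20,    /* false point: body of the else branch      */
        /* 10 */ EXIT        /* end point: statements after the if        */
    };
    intptr_t ax = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case IMM: ax = code[pc++]; break;
        case JMP: pc = code[pc]; break;
        case JZ:  pc = ax ? pc + 1 : code[pc]; break;
        default:  return ax;  /* EXIT */
        }
    }
}
```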
With just the two branching constructs while and if, Turing-complete computation
can be realized without involving functions
Why have a stack (in addition to the code and data spaces)?
How would function calls be implemented if there were no stack?
int add(int a, int b) {
    int ret;
    ret = a + b;
    return ret;
}
int main() {
    int a = 3;
    int b = 4;
    int ret = add(a, b);
    return 0;
}
Execution starts in main, and main calls the add function
To do this, several pieces of information are needed
a. The address of the function to be called
that is, the position of the function in the code area once it has been translated into assembly
b. The values of the parameters passed at execution time
The local variables and parameter values of add are useless once the function has finished; at that point the return address is looked up and execution returns to the call site
The key is to keep the parameters a and b, the local variables and the return address for the duration of the function's execution
Without a stack, add would first store 3 in the data area, then 4, then the return address (for example, position 430430 in the code area)
After the call ends, that part of the data area is deleted, clearing the stored values such as the return address 430430
The data area would then hold space for different functions in different places
There must be one unified place to manage these spaces
Logically, this place can be thought of as a hash table
that can quickly locate the right space each time a function is called
For example, when add has been called and I want to return to main
the parameter values and return address belonging to main must be restored
so this space has to be looked up in the hash table
This solution works, but it makes maintaining the data area very complicated
Any complex problem in computing can be solved by adding an intermediate layer, that is, an abstraction layer
During a chain of calls, the last function called is also the most recent one
and the function being executed is the first to be released
I return to the place that called me after I finish executing
This is a last-in-first-out process
which naturally suggests a stack
Therefore a stack is generally used to describe the local space of functions
The return address is saved in the add stack frame
Parameters are generally placed at the bottom end of the main stack frame, although they can also be considered part of the add frame
The base point of the stack frame is bp
followed by the local variable ret
and finally the top of the stack
When the add call ends, sp returns directly to the bp position
All local variables are no longer needed
Then the bp saved in the add frame is restored as the bp of the main frame
and the return address is loaded into pc
which causes a jump in the code area
This is the process of a function call
Abstracting this space as a LIFO
greatly simplifies
the memory relationships between the stack frames of different functions
But it is not a must-have concept; it simply reduces maintenance cost greatly
Are function calls strictly necessary?
There is a language called Brainfuck whose symbols are <>+-[],.
The whole code consists of only these 8 symbols, and its readability is zero
Reading such code requires painful calculation in your head
hence the name Brainfuck
The language assumes that there are 2 paper tapes
The first tape holds the program itself, i.e. the code
The second tape holds the data
There is a single probe, which starts at some initial position on the data tape
The less-than sign moves the probe one cell to the left
The greater-than sign moves the probe one cell to the right
The plus sign operates on the data the probe points to
The default value is 0, and a plus sign adds 1 to it, making it 1
The minus sign subtracts 1, making it 0 again
The left bracket makes the code tape decide whether to jump based on a condition
The condition is whether the cell the probe currently points to is equal to 0
If it is not equal to 0, execution continues directly
If it is equal to 0, jump to the matching right bracket
The right bracket: when the cell the probe points to is equal to 0
execution simply continues
If it is not equal to 0, jump back to the matching left bracket
The left bracket corresponds to the JZ instruction of the virtual machine defined above
and the right bracket corresponds to JNZ
The comma reads a number from the IO device
The period outputs a number
How can two numbers be added with such a simple grammar?
First, a comma reads a number with the probe at its initial position
which is the first cell of the data tape
Say the input is 3
Move the probe right
Then read another number, say 4
Next comes a left bracket
which checks whether the value the probe points to is 0
It is not 0, so there is no jump and execution proceeds to the next symbol
The next symbol moves the probe left, back to the first cell
Add 1 to it, so 3 becomes 4
After adding 1 the probe moves right
After the move right, the next symbol subtracts 1
4-1=3
Now the probe is on the cell holding 3, and the right bracket checks whether it is 0
It is not 0, so jump back to the matching left bracket
Then move left again
After moving left, add 1 again, so 4 becomes 5
Move right and subtract 1, so 3 becomes 2
The loop keeps jumping until the second cell is reduced to 0
Once it is 0, the right bracket (JNZ) no longer jumps
The next symbol is EOF, so execution ends
The net effect is that 4 is added to 3, giving 7
This is how Brainfuck implements addition
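The walkthrough above corresponds to the program ,>,[<+>-] as reconstructed from the description. Below is a minimal interpreter sketch to try it. It assumes balanced brackets and a probe that stays on the tape, reads input from an array instead of an IO device, and appends <. to the program so that the sum can be observed; bf_run and demo_bf_add are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Runs a Brainfuck program. ',' reads the next number from input[],
 * '.' appends the current cell to output[]. Returns the count written. */
static size_t bf_run(const char *code, const unsigned char *input, size_t n_in,
                     unsigned char *output, size_t out_cap) {
    unsigned char tape[256] = {0};      /* the data tape, all cells start at 0 */
    size_t probe = 0, in_pos = 0, out_len = 0;
    for (size_t pc = 0; code[pc] != '\0'; pc++) {
        switch (code[pc]) {
        case '>': probe++; break;
        case '<': probe--; break;
        case '+': tape[probe]++; break;
        case '-': tape[probe]--; break;
        case ',': tape[probe] = in_pos < n_in ? input[in_pos++] : 0; break;
        case '.': if (out_len < out_cap) output[out_len++] = tape[probe]; break;
        case '[':                       /* like JZ: skip the loop if the cell is 0 */
            if (tape[probe] == 0) {
                int depth = 1;
                while (depth > 0) {
                    pc++;
                    if (code[pc] == '[') depth++;
                    else if (code[pc] == ']') depth--;
                }
            }
            break;
        case ']':                       /* like JNZ: loop again if the cell is not 0 */
            if (tape[probe] != 0) {
                int depth = 1;
                while (depth > 0) {
                    pc--;
                    if (code[pc] == '[') depth--;
                    else if (code[pc] == ']') depth++;
                }
            }
            break;
        }
    }
    return out_len;
}

/* Feeds 3 and 4 to the addition program and returns the printed cell. */
static int demo_bf_add(void) {
    const unsigned char input[] = {3, 4};
    unsigned char out[1] = {0};
    size_t n = bf_run(",>,[<+>-]<.", input, 2, out, 1);
    return n == 1 ? out[0] : -1;
}
```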
This language is very close to a Turing machine
So such complex operations can be done without function calls
although function calls simplify them
But there is no concept of a stack in Brainfuck, so implementing function calls in it is quite complicated
With an extra paper tape, function calls would be easy to implement
With function calls, coding the main logic becomes very simple
This explains why there are separate areas such as the code area, the data area and the stack area
c. After execution, the return value of the called function is needed
After the calculation, the result is stored in an agreed register, for example the general-purpose register ax
On returning to the call site, the value in ax is the return value of the called function
It is then assigned to ret
d. The return address
Function call (jump) related instructions
-
CALL -
RETURN -
NVAR: new stack frame for variables -
DARG: delete stack frame for arguments
Again take main calling add from above as the example
First, main needs to prepare the two parameters before the call
Assume the stack area here grows from high addresses to low addresses
Ignore the earlier part of the main stack frame
The next 2 slots are used to store the 2 parameters
Say one is 3 and the other is 4
Next, the add function is called
Now look at the code area
at the moment of the call
The main code and the add code are contiguous in the code area; they are drawn separately only for convenience of description
The address of the add code is 430430
The main code contains CALL 430430
After the call has finished, DARG 2 cleans up the parameter slots
Suppose the CALL instruction is at address 430420
Its operand also occupies an address, 430424
so the address to return to is 430428
After add has executed, the pc register must be set back to 430428
to execute the next instruction
On the call, pc jumps to position 430430
that is, pc becomes the number stored in the operand, 430430
As part of the call, sp also moves down one slot
and the value 430428 is saved there
which stores the return address on the stack
When add returns after executing
this slot tells it where to return to
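The CALL step with the addresses used above can be sketched as follows. Code addresses are modelled as plain numbers here, the target is passed in rather than read from the code area, and the struct vm and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct vm { intptr_t pc; intptr_t *sp; };

/* CALL: pc sits on the operand. Push the address of the instruction after
 * the operand as the return address, then jump to the target. */
static void vm_call(struct vm *vm, intptr_t target, intptr_t cell_size) {
    *--vm->sp = vm->pc + cell_size;  /* sp moves down one slot, holds the return address */
    vm->pc = target;                 /* pc jumps to the called function */
}

static int demo_call(void) {
    intptr_t stack[4] = {0};
    struct vm vm = { 430424, stack + 4 };  /* pc is on the operand of CALL at 430420 */
    vm_call(&vm, 430430, 4);
    return vm.pc == 430430 && *vm.sp == 430428;
}
```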
Next, look at the VM instructions in the add function
First, NVAR sets up the initial space of the stack frame for the function's local variables
This initial space must first store the bp address
that is, save the old bp
because once execution jumps into the new frame, this new position becomes bp
The old bp is the bp of main, and it has to be saved
because the stack must later be restored to its original state
Restoring involves two things: first, the position in the code goes back to where it was
second, the stack goes back to its state before the call
So the old bp is recorded in order to restore things when the call ends
Then some more space is added to the stack
to store the local variables; here there is only one local variable, ret
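A sketch of what NVAR does to sp and bp, assuming its operand is the number of local variable slots. The struct frame, the function names and the starting positions of sp and bp are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

struct frame { intptr_t *sp; intptr_t *bp; };

/* NVAR n: save the caller's bp, start a new frame, reserve n local slots. */
static void nvar(struct frame *f, intptr_t n_locals) {
    *--f->sp = (intptr_t)f->bp;  /* push the old bp */
    f->bp = f->sp;               /* the slot holding the old bp is the new base */
    f->sp -= n_locals;           /* one slot per local variable */
}

static int demo_nvar(void) {
    intptr_t stack[8] = {0};
    struct frame f = { stack + 5, stack + 8 };  /* caller's sp and bp (hypothetical) */
    intptr_t *old_bp = f.bp;
    nvar(&f, 1);                                /* add has one local, ret */
    return f.bp == stack + 4                    /* new base is the pushed slot */
        && (intptr_t *)*f.bp == old_bp          /* which holds the old bp */
        && f.sp == f.bp - 1;                    /* ret lives at bp - 1 */
}
```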
Then execute ret = a+b
First, save the address of ret, the target of the assignment
so that once the right-hand side has been computed there is a place to store the result
So this address has to be fetched
Since ret is the first local variable, it is at the stack base minus 1
so its address is obtained with LEA -1
and stored in the ax register
That is, after this instruction has executed, ax holds the address of ret
This address is then stored on the stack with a PUSH
where it is kept temporarily
Suppose it is 330330
Now ax is free and can be used for calculation
The ADD instruction adds the content of ax to the content at the top of the stack (stack peek) and saves the result back to ax
and destroys the top of the stack
Fetch a first
It is three slots above bp
so it is LEA 3
This puts its address into ax
Then LI loads the integer at that address
and PUSH puts the data in ax onto the stack
The top of the stack is now 3
The second parameter is two slots above bp
After getting this address with LEA 2, LI loads the data into ax
ax is equal to 4
Now the top of the stack is 3 and ax is 4
Executing ADD destroys the top of the stack
and adds it to 4
ax is equal to 7
The addition is now complete; next comes the assignment
Find the address of ret
Because the previous top of the stack has been destroyed, sp now points to the slot holding that address
Then the data in ax is stored at the address found at sp, which is the SI (save integer) instruction
After the save, the stack slot that held the address is also destroyed
and 7 has been written to ret
Then return this variable
It is the first local variable, at bp-1
so the address bp-1 is loaded into ax again
Because the return value must be in ax, do LEA -1 followed by LI
and the content of ret is now in ax
Then return
Look at what RETURN does
First sp = bp
which destroys the local space, no matter how many locals there are
Then bp is set to the value stored at the address in sp
The value stored there is the bp of main
so bp is back at the old bp position
sp+1, so the slot that held the old bp is also destroyed
Then pc jumps back
to location 430428
that is, the position right after the CALL
sp is incremented once more, so the slot holding 430428 is destroyed as well
There, DARG 2 (delete stack frame for args) is executed
This space was set up for the parameters, and now it is deleted
Since the operand of DARG is 2, the two parameters 3 and 4 are deleted
Then sp is back at its position before the call
as if nothing had ever happened
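The whole main-calls-add sequence can be put together in one small interpreter. This is a sketch under assumptions rather than the actual implementation: the opcode numbering is made up, RETURN is spelled RET, jump targets and return addresses are indexes into the code array instead of addresses like 430430, and the function returns -1 if sp does not end up where it started.

```c
#include <assert.h>
#include <stdint.h>

enum { IMM, LEA, LI, SI, PUSH, ADD, CALL, NVAR, RET, DARG, EXIT };
enum { STACK_SIZE = 32 };

static intptr_t run_add_program(void) {
    static const intptr_t code[] = {
        /* main */
        /*  0 */ IMM, 3, PUSH,       /* first parameter a = 3            */
        /*  3 */ IMM, 4, PUSH,       /* second parameter b = 4           */
        /*  6 */ CALL, 11,           /* call add, return address is 8    */
        /*  8 */ DARG, 2,            /* delete the two parameter slots   */
        /* 10 */ EXIT,
        /* add */
        /* 11 */ NVAR, 1,            /* save old bp, one local: ret      */
        /* 13 */ LEA, -1, PUSH,      /* push the address of ret          */
        /* 16 */ LEA, 3, LI, PUSH,   /* push a (three slots above bp)    */
        /* 20 */ LEA, 2, LI,         /* ax = b (two slots above bp)      */
        /* 23 */ ADD,                /* ax = a + b                       */
        /* 24 */ SI,                 /* ret = ax                         */
        /* 25 */ LEA, -1, LI,        /* ax = ret, the return value       */
        /* 28 */ RET
    };
    intptr_t stack[STACK_SIZE];
    intptr_t *sp = stack + STACK_SIZE, *bp = sp;  /* stack grows downward */
    intptr_t ax = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case IMM:  ax = code[pc++]; break;
        case LEA:  ax = (intptr_t)(bp + code[pc++]); break;
        case LI:   ax = *(intptr_t *)ax; break;
        case SI:   *(intptr_t *)*sp++ = ax; break;
        case PUSH: *--sp = ax; break;
        case ADD:  ax = *sp++ + ax; break;
        case CALL: *--sp = pc + 1; pc = code[pc]; break;
        case NVAR: *--sp = (intptr_t)bp; bp = sp; sp -= code[pc++]; break;
        case RET:  sp = bp; bp = (intptr_t *)*sp++; pc = *sp++; break;
        case DARG: sp += code[pc++]; break;
        default:   /* EXIT: the result is valid only if the stack is fully restored */
            return sp == stack + STACK_SIZE ? ax : -1;
        }
    }
}
```

Running it leaves 7 in ax, and sp ends exactly where it started, which matches the "as if nothing had ever happened" observation.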
Native-Call
These are copied entirely from c4
IO-related instructions
OPEN/CLOS/READ
Why is there no write?
Because c4 designed its native calls mainly to achieve bootstrapping, and the c4 source code does not use write
PRTF: writes data to standard output (fd=1)
Dynamic Memory Related Instructions
-
MALC
-
FREE
-
MSET
-
MCMP
-
EXIT is used to terminate the program