Virtual machine instruction set & stack and function calls

Instruction Set

Save & load

IMM

Full name: load immediate. Loads an immediate value (a constant that follows the instruction) into the register.

LEA

Full name: load effective address. Loads an address into the register.

LC/LI/SC/SI

load char/load int: load a char or an int from memory into the register

save char/save int: store a char or an int from the register to memory

PUSH

Push the register's value onto the top of the stack

Example

ax is a general-purpose register.

pc is the program counter, a pointer into the code area that points to the instruction currently being executed.

The immediate operands that follow these instructions are also stored in the code area.

For example, IMM 2 loads the constant (literal value) 2 into the ax register.

After this instruction is parsed, it is placed in the code area as two consecutive cells: IMM, then 2.

When IMM is read, pc moves on to the cell holding 2, and the op variable in the code holds the instruction just read, IMM.

ax = *pc: dereferencing pc yields the content at that address, which is 2, and it is put in the ax register.

The post-increment pc++ then moves pc on to the next instruction.
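The fetch-and-execute steps for IMM can be sketched as follows. This is a minimal sketch, not the actual implementation: the function name run_imm, the opcode numbering, and the use of intptr_t as the cell type are assumptions made for illustration.

```c
#include <stdint.h>

enum { IMM, EXIT }; /* illustrative opcode numbering (assumption) */

/* Executes a code area containing only IMM and EXIT; returns ax at EXIT. */
intptr_t run_imm(const intptr_t *code) {
    const intptr_t *pc = code; /* pc: points at the next cell to read */
    intptr_t ax = 0;           /* ax: general-purpose register */
    for (;;) {
        intptr_t op = *pc++;   /* fetch the opcode; pc now points at its operand */
        if (op == IMM) {
            ax = *pc++;        /* ax = operand, then step to the next instruction */
        } else {
            return ax;         /* EXIT (or anything unknown) stops the loop */
        }
    }
}
```

Running it on the cells IMM, 2, EXIT leaves 2 in ax.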

The LEA instruction in x86 works as follows.

Suppose a register bx contains a memory address:

bx=430430

I want to compute the address 8 bytes past this address and store the result, bx+8, in ax.

Without LEA, it would have to be implemented like this:

ADD bx 8
MOV bx ax

That is, add 8 to bx and then transfer the data in bx to ax.

There are two problems with doing this:

1. It changes the original value of bx.

2. The addition may overflow.

The result of the addition may not fit in 32 bits, which changes flags such as CF and OF.

This adds complexity to the calculations that follow.

LEA simplifies this into a single operation that leaves bx and the flags untouched:

LEA ax (bx+8)

Our implementation of LEA does not compute addresses relative to an arbitrary location, because that is not needed.

We mainly want to compute addresses relative to the stack.

LEA is used almost exclusively to reach the parameters and local variables of a function, and these are all located relative to the bp of the stack.

So our LEA adds to or subtracts from bp.

This makes it possible to reach the n-th parameter or the n-th local variable of the current function.

For example, LEA -1 takes the position of bp in the stack and subtracts 1. Because the stack grows from high addresses to low, bp minus 1 is the next slot after bp in the direction of growth. This slot may hold the first local variable of the function. If that variable is 35, LEA -1 yields the address of the slot, and dereferencing that address gives 35.

In short, LEA computes a position relative to the address in bp.
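A minimal sketch of this bp-relative LEA, assuming stack slots of type intptr_t and a hypothetical helper name op_lea; the address is returned as an integer, the way it would sit in ax.

```c
#include <stdint.h>

/* LEA offset: ax = address of the stack slot `offset` cells away from bp.
   Negative offsets reach local variables, positive offsets reach parameters. */
intptr_t op_lea(intptr_t *bp, intptr_t offset) {
    return (intptr_t)(bp + offset);
}
```

With 35 stored in the slot just below bp, dereferencing the result of op_lea(bp, -1) gives 35.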

Continuing with the code, next come the load operations.

ax itself holds an address.

The value in memory at that address is loaded into ax,

i.e. ax=*ax

Depending on the type being loaded, the address is cast to a different pointer type before it is dereferenced.

SC stores data through the top of the stack: sp points to the top of the stack, and the top of the stack holds an address.

This address is cast to the corresponding pointer type,

for example a char pointer (char*).

The pointer is then dereferenced

to get the memory cell at that address,

and the data in ax is stored in that cell.

Once the top of the stack has been used, sp is incremented.

The stack shrinks by one slot.

That is, the slot that was just consumed is popped.

SI works the same way with an int pointer. Suppose the address stored on the stack is 430430,

a location in the data area.

sp points to the slot that holds 430430.

The address is taken out and cast to an int pointer,

which points to that location in the data area.

Dereferencing it gives the memory cell itself.

Assuming ax=1, the data at address 430430 is now set to 1.

This completes saving the data in ax to location 430430 in the data area.
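The load and store instructions can be sketched as below. This is a hedged sketch: the VM struct, the op_* function names, and the choice of intptr_t as the VM's "int" are assumptions for illustration.

```c
#include <stdint.h>

typedef struct {
    intptr_t ax;   /* general-purpose register */
    intptr_t *sp;  /* top of stack; the stack grows toward lower addresses */
} VM;

/* LI / LC: ax holds an address; replace it with the value stored there. */
void op_li(VM *vm) { vm->ax = *(intptr_t *)vm->ax; }
void op_lc(VM *vm) { vm->ax = *(char *)vm->ax; }

/* SI / SC: pop an address from the stack and store ax at that address. */
void op_si(VM *vm) { *(intptr_t *)*vm->sp++ = vm->ax; }
void op_sc(VM *vm) { *(char *)*vm->sp++ = (char)vm->ax; }
```

The only difference between the int and char variants is the pointer type used for the cast before dereferencing.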

Continuing with the code:

The PUSH instruction. For example, with the value 2 loaded in ax, PUSH puts it on the stack: sp is first decremented (--sp), then the value 2 is written to the slot sp now points to.

Operation-related instructions

Arithmetic

The four basic operations

ADD/SUB/MUL/DIV

MOD: modulo

Bit operations

OR: or

XOR: exclusive or

AND: and

SHL/SHR: shift left/right

Logic operations

EQ: equal

NQ: not equal

LT/LE/GT/GE: less than / less than or equal to / greater than / greater than or equal to
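All of these are two-operand instructions. As the ADD walkthrough later in these notes describes, the left operand is taken from the top of the stack (and popped), the right operand is in ax, and the result goes back to ax. A minimal sketch of a few of them, with the VM struct and op_* names as illustrative assumptions:

```c
#include <stdint.h>

typedef struct {
    intptr_t ax;   /* right operand before the operation, result after it */
    intptr_t *sp;  /* top of stack; the stack grows toward lower addresses */
} VM;

void op_push(VM *vm) { *--vm->sp = vm->ax; }          /* sp--, then store ax */

void op_add(VM *vm) { vm->ax = *vm->sp++ + vm->ax; }  /* pop left operand, add */
void op_sub(VM *vm) { vm->ax = *vm->sp++ - vm->ax; }
void op_and(VM *vm) { vm->ax = *vm->sp++ & vm->ax; }
void op_eq(VM *vm)  { vm->ax = *vm->sp++ == vm->ax; } /* result is 0 or 1 */
void op_lt(VM *vm)  { vm->ax = *vm->sp++ < vm->ax; }
```

The remaining instructions in the list would follow the same pattern with a different C operator.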

Branch and jump instructions

JMP-related instructions

JMP moves the pc pointer to a specified location in the code area.

Suppose 430430 is an address in the code area.

JMP 430430 moves the pc pointer directly to address 430430, skipping the instructions in between.

JZ/JNZ decide whether to jump, and where to, based on the current value of the register.

JZ checks whether ax equals 0. If it does, it jumps like JMP; if it does not, execution continues with the next instruction, so pc is simply incremented by 1 to step past the jump target.

JNZ is the opposite: it jumps when ax is not 0.
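A minimal sketch of the three jumps, assuming the jump target is stored in the code cell right after the opcode and that each helper (the op_* names are hypothetical) receives pc pointing at that cell and returns the new pc:

```c
#include <stdint.h>

/* JMP: always continue at the target address stored in the operand. */
const intptr_t *op_jmp(const intptr_t *pc) {
    return (const intptr_t *)*pc;
}

/* JZ: jump when ax == 0, otherwise step past the operand. */
const intptr_t *op_jz(const intptr_t *pc, intptr_t ax) {
    return ax ? pc + 1 : (const intptr_t *)*pc;
}

/* JNZ: jump when ax != 0, otherwise step past the operand. */
const intptr_t *op_jnz(const intptr_t *pc, intptr_t ax) {
    return ax ? (const intptr_t *)*pc : pc + 1;
}
```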

For example:

while(a>b){
    ...
}

How can such a while loop be implemented with the simplest VM instructions?

Assume the expression a>b has already been evaluated to either 0 or 1 and the result is saved in ax.

Define two positions in the code area: the loop point, where the condition is evaluated, and the end point, just after the loop.

The JZ instruction then checks whether ax equals 0; if it does, it jumps to the end point.

If ax is not 0, execution falls through to the next instruction, the loop body.

After the loop body has executed, control must go back to the beginning, so the body is followed by JMP loop point.
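The layout can be tried out with a small interpreter. The sketch below is an illustration under assumptions, not the actual VM: the opcode numbering, the function names run and countdown, and the loop body a = a - 1 are all made up for the example. It lays out while (a > b) { a = a - 1; } with a loop point and an end point and then executes it.

```c
#include <stdint.h>

enum { IMM, LI, SI, PUSH, SUB, GT, JZ, JMP, EXIT }; /* illustrative numbering */

/* Minimal interpreter for the handful of instructions the loop needs. */
static void run(intptr_t *pc) {
    intptr_t stack[16], *sp = stack + 16, ax = 0;
    for (;;) {
        switch (*pc++) {
        case IMM:  ax = *pc++; break;
        case LI:   ax = *(intptr_t *)ax; break;
        case SI:   *(intptr_t *)*sp++ = ax; break;
        case PUSH: *--sp = ax; break;
        case SUB:  ax = *sp++ - ax; break;
        case GT:   ax = *sp++ > ax; break;
        case JZ:   pc = ax ? pc + 1 : (intptr_t *)*pc; break;
        case JMP:  pc = (intptr_t *)*pc; break;
        default:   return; /* EXIT */
        }
    }
}

/* Builds and runs `while (a > b) { a = a - 1; }`, then returns a. */
intptr_t countdown(intptr_t a, intptr_t b) {
    intptr_t code[32], *e = code;
    intptr_t *loop_point, *jz_operand;

    loop_point = e;                          /* loop point: condition starts here */
    *e++ = IMM; *e++ = (intptr_t)&a; *e++ = LI; *e++ = PUSH;
    *e++ = IMM; *e++ = (intptr_t)&b; *e++ = LI; *e++ = GT;   /* ax = a > b */
    *e++ = JZ;  jz_operand = e++;            /* JZ end point (filled in below) */
    *e++ = IMM; *e++ = (intptr_t)&a; *e++ = PUSH;            /* address of a */
    *e++ = IMM; *e++ = (intptr_t)&a; *e++ = LI; *e++ = PUSH; /* value of a */
    *e++ = IMM; *e++ = 1; *e++ = SUB; *e++ = SI;             /* a = a - 1 */
    *e++ = JMP; *e++ = (intptr_t)loop_point; /* back to the loop point */
    *jz_operand = (intptr_t)e;               /* end point: first cell after the loop */
    *e++ = EXIT;

    run(code);
    return a;
}
```

The end point is not known when JZ is emitted, so its operand is filled in once the body and the closing JMP have been laid out.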

Another example:

if(a>b){
    ...
}else{
    ...
}

There are three anchor points in the code area: the true point, the false point, and the end point.

If the condition is true, execution starts from the true point.

If it is false, execution starts from the false point.

When the true branch reaches the false point, it must skip the instructions of the false branch

by jumping to the end point.

First, the value of a>b is computed and stored in ax.

When it is true, execution continues directly without jumping.

When it is false, JZ false point jumps to the false point,

and the false branch runs to its end.

If the true branch was executed, it ends with JMP end point,

which jumps directly to the end point.

Execution then continues with the statements after the if.
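From the code generator's side, the three anchor points can be produced by emitting the jumps with empty operands and filling them in once the positions are known. This is a hedged sketch: emit_if_else is a hypothetical helper, jump targets are plain indices into the code array (an assumption made to keep the example small), and the branch bodies are passed in as already generated cells.

```c
#include <stddef.h>
#include <stdint.h>

enum { JZ = 1, JMP = 2 }; /* illustrative opcode numbers (assumption) */

/* Emits the if/else layout into `code`, assuming the condition value is
   already in ax at run time. Returns the end point (index after the else). */
size_t emit_if_else(intptr_t *code,
                    const intptr_t *then_body, size_t then_len,
                    const intptr_t *else_body, size_t else_len) {
    size_t e = 0, jz_operand, jmp_operand, i;

    code[e++] = JZ;  jz_operand = e++;      /* JZ false point (patched below) */
    for (i = 0; i < then_len; i++)          /* true point: the then-branch */
        code[e++] = then_body[i];
    code[e++] = JMP; jmp_operand = e++;     /* JMP end point (patched below) */
    code[jz_operand] = (intptr_t)e;         /* false point starts here */
    for (i = 0; i < else_len; i++)
        code[e++] = else_body[i];
    code[jmp_operand] = (intptr_t)e;        /* end point: first cell after the else */
    return e;
}
```

The true point needs no jump of its own: it is simply the cell after the JZ operand, which is why a true condition "executes directly without jumping".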

With the two branching constructs, while and if, Turing-complete computation can be realized without involving functions at all.

Why have a stack (alongside the code and data areas)?

How would function calls be implemented if there were no stack?

int add(int a,int b){
   int ret;
   ret = a+b;
   return ret;
}

int main(){
   int a=3;
   int b=4;
   int ret=add(a,b);
   return 0;
}

Execution starts in the main function, and main calls the add function.

To make the call, the following pieces of information are needed:

a. The address of the function to be called

This is the position of the function in the code area once it has been translated into assembly.

b. The values of the parameters passed in for this execution

The local variables and parameter values of add are useless once the function has finished; at that point the return address is looked up and control returns to the call site.

The key is to keep the parameters a and b, the local variables, and the return address available while the function executes.

Without a stack, add would first store 3 in the data area, then 4, and then the return address (for example, position 430430 in the code area).

After the function call ends, that part of the data area is deleted, including the stored return address 430430.

The data area would then hold space for different functions in different places.

There must be one unified place that manages these spaces.

Logically, this place can be thought of as a hash table,

which can quickly locate the right space every time a function is called.

For example, when add has been called and is about to return to main,

the parameter values and return value of main must be restored,

so their space has to be looked up in the hash table.

This solution works, but it makes maintaining the data area very complicated.

Any complex problem in computing can be solved by adding an intermediate layer, or abstraction layer.

During a chain of calls, the function called last is the most recent call,

and the function currently executing is the first to be released:

once it has finished, it returns to the place that called it.

This is a last-in-first-out process,

which naturally suggests a stack.

That is why a stack is generally used to hold the local space of a function.

The return address is saved in the add stack.

Parameters are generally placed at the end of the main stack, though they can also be considered part of the add stack.

The base point of the stack is bp; the add stack records the bp of the main stack there.

Next comes the local variable ret.

Last is the top of the stack.

When the call to add ends, sp goes straight back to the bp position,

since all the local variables are no longer needed.

Then the bp saved in the add stack is restored as the bp of the main stack.

The return address is loaded into pc,

which causes a jump in the code area.

This is the process of a function call.

Abstracting this space as a LIFO structure

greatly simplifies

the memory relationships between the stacks of different functions.

But the stack is not a must-have concept; it simply reduces maintenance cost considerably.

Are function calls strictly necessary?

There is a language called Brainfuck: <>+-[],.

The whole code consists of just these 8 symbols, and its readability is zero.

Reading such code requires painful calculation in your head,

hence the name.

The language assumes that there are 2 paper tapes.

The first tape holds the program itself, the code.

The second tape holds the data.

A single probe sits on the data tape, initially at some starting position.

The less-than sign < moves the probe one cell to the left.

The greater-than sign > moves the probe one cell to the right.

The plus sign + acts on the cell the probe points to.

Cells default to 0, and a plus sign adds 1, so the cell becomes 1.

The minus sign - subtracts 1, so 1 becomes 0 again.

The left bracket [ makes the code jump depending on a condition.

The condition is whether the cell under the probe equals 0.

If it is not equal to 0, execution simply continues.

If it is equal to 0, execution jumps to the matching right bracket.

The right bracket ] checks the cell under the probe as well: when it equals 0,

execution simply continues.

If it is not equal to 0, execution jumps back to the matching left bracket.

The left bracket is equivalent to JZ in the virtual machine instructions defined above.

The right bracket is equivalent to JNZ.

The comma , reads a number from the IO device.

The period . outputs a number.

How can two numbers be added with such a simple grammar? The steps below trace the program ,>,[<+>-]

The first comma reads a number into the cell under the probe, which is at its initial position,

the first cell of the data area.

Say the input is 3.

The probe moves right.

Then another number is read, say 4.

Next comes a left bracket,

which checks whether the value under the probe is 0.

It is not 0, so there is no jump and execution proceeds to the next symbol.

The next symbol moves the probe back to the left.

The plus sign adds 1 there, and 3 becomes 4.

After adding 1, the probe moves right again.

After the move, the next symbol subtracts 1:

4-1=3

The probe is now on the cell holding 3, and the right bracket checks whether it is 0.

It is not 0, so execution jumps back to the matching left bracket.

The loop body runs again, starting with the move to the left.

After moving left, 1 is added again, so 4 becomes 5.

After moving right, 1 is subtracted, so 3 becomes 2.

The loop keeps jumping back until the second cell is reduced to 0.

Once it is 0, the right bracket does not jump (JNZ).

Execution moves on to the next symbol; at the end of the code, the program is over.

The net effect is adding 4 to 3 to get 7.

This is how Brainfuck implements addition.
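The trace can be checked with a small interpreter. The sketch below is an illustration under assumptions: the function name bf_run is made up, input and output are passed as byte arrays instead of a real IO device, the program is assumed to have balanced brackets, and the tape and buffers are assumed to be large enough (there is no bounds checking).

```c
#include <stddef.h>

/* Runs a Brainfuck program on `tape`. `,` reads from `input`, `.` appends
   to `output`. Returns the number of bytes written to `output`. */
size_t bf_run(const char *code, const unsigned char *input,
              unsigned char *output, unsigned char *tape) {
    size_t pc = 0, probe = 0, in = 0, out = 0;
    for (; code[pc] != '\0'; pc++) {
        int depth;
        switch (code[pc]) {
        case '>': probe++; break;
        case '<': probe--; break;
        case '+': tape[probe]++; break;
        case '-': tape[probe]--; break;
        case ',': tape[probe] = input[in++]; break;
        case '.': output[out++] = tape[probe]; break;
        case '[': /* like JZ: skip to the matching ] when the cell is 0 */
            if (tape[probe] == 0)
                for (depth = 1; depth > 0; ) {
                    pc++;
                    if (code[pc] == '[') depth++;
                    else if (code[pc] == ']') depth--;
                }
            break;
        case ']': /* like JNZ: go back to the matching [ when the cell is not 0 */
            if (tape[probe] != 0)
                for (depth = 1; depth > 0; ) {
                    pc--;
                    if (code[pc] == ']') depth++;
                    else if (code[pc] == '[') depth--;
                }
            break;
        }
    }
    return out;
}
```

Feeding the inputs 3 and 4 to ,>,[<+>-] leaves 7 in the first cell and 0 in the second; appending <. to the program outputs the 7.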

This language is very close to a bare Turing machine.

So even complex operations can be done without function calls;

function calls merely simplify them.

But Brainfuck has no concept of a stack, so implementing function calls in it is quite complicated.

With an extra paper tape serving as a stack, function calls would be easy to implement.

With function calls, coding the main logic becomes much simpler.

This explains why memory is partitioned into a code area, a data area, and a stack area.

c. The return value, which the caller needs after execution

After the calculation, the result is stored in an agreed register, for example the general-purpose register ax.

On returning to the call site, the value in ax is the return value of the called function.

ax is then assigned to ret.

d. The return address

Function call (jump) related instructions

CALL
RETURN
NVAR: new stack frame for variables
DARG: delete stack frame for arguments

Take main calling add, as above, as the example again.

First, main has to prepare the two arguments before the call.

Suppose the stack area grows from high addresses to low.

Ignore the earlier part of the main stack.

The next 2 slots are used to store the 2 arguments.

Suppose one is 3 and the other is 4.

Next, add is called.

Consider the state of the code area

at the moment of the call.

The main code and the add code are contiguous in the code area; they are drawn separately only for convenience of description.

The add code starts at address 430430.

So the main code contains CALL 430430.

After the call finishes, the arguments are cleaned up with DARG 2.

Suppose the CALL instruction is at address 430420.

Its operand occupies an address of its own, 430424.

The address to return to is therefore 430428.

After add has executed, pc must be set back to 430428

to execute the next instruction.

On CALL, pc jumps to position 430430:

pc is set to the operand of CALL, the number 430430.

sp also moves down one slot,

and the return address 430428

is stored in that slot.

When add returns after executing,

this slot tells it where to return to.
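A minimal sketch of CALL, assuming pc points at the operand (the cell holding the callee's address) when the instruction executes; the VM struct and the name op_call are illustrative assumptions.

```c
#include <stdint.h>

typedef struct {
    intptr_t *pc;  /* next cell of the code area to read */
    intptr_t *sp;  /* top of stack; the stack grows toward lower addresses */
} VM;

/* CALL target: save where to come back to, then jump to the callee. */
void op_call(VM *vm) {
    *--vm->sp = (intptr_t)(vm->pc + 1);  /* return address: the cell after the operand */
    vm->pc = (intptr_t *)*vm->pc;        /* pc = address of the callee's first instruction */
}
```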

Next, look at the VM instructions in the add function.

First comes NVAR, which reserves the initial space of the stack frame for the local variables of the function.

The first thing stored in this space is the bp address,

that is, the old bp address.

Once execution has jumped into the new stack frame, bp takes a new position.

The old bp is the bp of main, and it has to be saved

so that the stack can later be restored to its original shape.

Returning restores two things: first, the position in the code goes back to where it was;

second, the stack goes back to its state before the call.

So the old bp is recorded for the recovery at the end of the call.

Then some space is added to the stack

to store the local variables. Here there is only one local variable, ret.
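A minimal sketch of NVAR under the same conventions (stack growing toward lower addresses); the VM struct and the name op_nvar are illustrative assumptions.

```c
#include <stdint.h>

typedef struct {
    intptr_t *sp;  /* top of stack; the stack grows toward lower addresses */
    intptr_t *bp;  /* base of the current stack frame */
} VM;

/* NVAR n: open a new stack frame with room for n local variables. */
void op_nvar(VM *vm, intptr_t n) {
    *--vm->sp = (intptr_t)vm->bp;  /* save the caller's (old) bp */
    vm->bp = vm->sp;               /* the new base is the slot holding the old bp */
    vm->sp = vm->sp - n;           /* reserve n slots: bp-1 down to bp-n */
}
```

For add, NVAR 1 reserves the single slot bp-1 for ret.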

Then ret = a+b is executed.

First, the address of ret, the target of the assignment, is saved,

so that once the right-hand side has been computed there is a place to store the result.

So this address has to be obtained first.

Since ret is the first local variable, it sits at the stack base minus 1,

so its address is obtained with LEA -1

and placed in the ax register.

That is, after LEA has executed, ax holds the address of ret.

This address is then stored on the stack with a PUSH,

where it is kept temporarily.

Suppose it is 330330.

ax is now free and can be used for the calculation.

The ADD instruction adds the content of ax to the content of the top of the stack, saves the result back in ax,

and pops the top of the stack.

First, a is fetched.

It sits three slots above bp,

so the instruction is LEA 3,

which loads its address into ax.

Then LI loads the integer at that address into ax,

and a PUSH puts the ax data on the stack.

The top of the stack is now 3.

The second parameter sits two slots above bp.

After its address has been obtained with LEA 2, LI loads the data into ax,

so ax is equal to 4.

At this point, the top of the stack is 3 and ax is 4.

ADD pops the top of the stack

and adds it to 4,

so ax is equal to 7.

The addition is now complete, and the assignment follows.

The address of ret has to be found.

Because the previous top of the stack has been popped, sp now points to the slot holding that address.

The ax data is stored at the address found through sp; this is the SI (save integer) instruction.

After the store, the stack slot that held the address is popped as well,

and 7 has been written into ret.

Then this variable is returned.

It is the first local variable, at bp-1.

So the address bp-1 is loaded into ax again.

Because the return value must be in ax, this takes LEA -1 followed by LI,

after which the content of ret is in ax.

Then comes the return.

Look at what RETURN does.

First, sp=bp.

The local space is released, no matter how many local variables there are.

Then bp is set to the value stored at the address in sp.

That value is the bp of main,

so bp is back at the old bp position.

sp+1 then pops the slot that held the saved bp.

Next, pc jumps back

to location 430428,

that is, the position right after the CALL.

There, DARG 2 (delete stack frame for args) runs.

The space that was created for the arguments is now deleted.

sp keeps increasing: the slot holding 430428 has already been popped by RETURN.

Since the operand of DARG is 2, the two arguments 3 and 4 are removed.

sp is then back at its position before the call,

as if nothing had ever happened.
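The whole walkthrough, from pushing the arguments to DARG 2, can be run end to end. The sketch below is an illustration under assumptions rather than the actual VM: the opcode numbering, the function name run_add_call, the stack and code sizes, and the use of intptr_t cells are all made up for the example. The instruction sequence follows the steps described above.

```c
#include <stdint.h>

enum { IMM, LEA, LI, SI, PUSH, ADD, CALL, NVAR, DARG, RETURN, EXIT };

/* Runs main's call to add(3, 4) and returns ax. *sp_restored is set to 1
   when sp ends up exactly where it started. */
intptr_t run_add_call(int *sp_restored) {
    intptr_t stack[32];
    intptr_t *sp = stack + 32, *bp = sp, *pc, ax = 0;
    intptr_t code[40], *e = code, *call_operand;

    /* main: push the arguments, call add, then delete the arguments */
    *e++ = IMM; *e++ = 3; *e++ = PUSH;
    *e++ = IMM; *e++ = 4; *e++ = PUSH;
    *e++ = CALL; call_operand = e++;
    *e++ = DARG; *e++ = 2;
    *e++ = EXIT;
    /* add: ret = a + b; return ret; */
    *call_operand = (intptr_t)e;
    *e++ = NVAR; *e++ = 1;
    *e++ = LEA; *e++ = -1; *e++ = PUSH;             /* address of ret */
    *e++ = LEA; *e++ = 3;  *e++ = LI; *e++ = PUSH;  /* a, pushed on the stack */
    *e++ = LEA; *e++ = 2;  *e++ = LI;               /* b, left in ax */
    *e++ = ADD; *e++ = SI;                          /* ret = a + b */
    *e++ = LEA; *e++ = -1; *e++ = LI;               /* ax = ret */
    *e++ = RETURN;

    for (pc = code;;) {
        switch (*pc++) {
        case IMM:  ax = *pc++; break;
        case LEA:  ax = (intptr_t)(bp + *pc++); break;
        case LI:   ax = *(intptr_t *)ax; break;
        case SI:   *(intptr_t *)*sp++ = ax; break;
        case PUSH: *--sp = ax; break;
        case ADD:  ax = *sp++ + ax; break;
        case CALL: *--sp = (intptr_t)(pc + 1); pc = (intptr_t *)*pc; break;
        case NVAR: *--sp = (intptr_t)bp; bp = sp; sp = sp - *pc++; break;
        case DARG: sp = sp + *pc++; break;
        case RETURN:
            sp = bp;                   /* release the local variables */
            bp = (intptr_t *)*sp++;    /* restore the old bp */
            pc = (intptr_t *)*sp++;    /* pop the return address into pc */
            break;
        default:                       /* EXIT */
            *sp_restored = (sp == stack + 32);
            return ax;
        }
    }
}
```

At EXIT, ax holds 7 and sp is back at its starting position, which matches the "as if nothing had ever happened" observation.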

Native-Call

These are copied entirely from c4.

IO-related instructions

OPEN/CLOS/READ

Why is there no write? Because c4 designed Native-CALL mainly to make bootstrapping possible, and the c4 source code does not use the write function.

PRTF: write data to standard output (fd=1)

Dynamic memory related instructions

MALC
FREE
MSET
MCMP

EXIT is used to terminate the program