Blockchain: A Detailed Explanation of the Ethereum Virtual Machine (EVM)

1. Virtual machine

The virtual machine is used to:

  • Execute transactions on Ethereum,
  • Change the state of Ethereum.

There are two types of transactions:

  • Ordinary transactions,
  • Smart contract transactions.

Executing a transaction costs gas.

A smart contract can call another smart contract in four ways.

 

2. The Ethereum virtual machine

The Ethereum Virtual Machine, EVM for short, is used to execute transactions on Ethereum.

The business process is as follows:

[Figure: Business process]

When a transaction comes in, it is converted internally into a Message object and passed to the EVM for execution.

If it is an ordinary transfer transaction, it is enough to modify the balances of the corresponding accounts in StateDB directly.

If it creates or invokes a smart contract, the bytecode is loaded and executed by the interpreter in the EVM, and StateDB may be queried or modified during execution.
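The branching described above can be sketched in a few lines of Python. This is only an illustration: `apply_message`, the dictionary-based state and the message fields are hypothetical stand-ins for the Message object and StateDB, and the sketch does not run any bytecode.

```python
def apply_message(balances, code, msg):
    """Classify a message and apply an ordinary transfer directly.

    balances: dict address -> balance (stand-in for the balances in StateDB)
    code:     dict address -> bytecode (missing or empty for ordinary accounts)
    msg:      dict with "sender", "to" (None for contract creation) and "value"
    """
    if msg["to"] is None:
        return "create"      # contract creation: the interpreter would run the init code
    if code.get(msg["to"]):
        return "call"        # contract invocation: the interpreter would run the bytecode
    # Ordinary transfer: only the two account balances change.
    if balances.get(msg["sender"], 0) < msg["value"]:
        raise ValueError("insufficient balance")
    balances[msg["sender"]] -= msg["value"]
    balances[msg["to"]] = balances.get(msg["to"], 0) + msg["value"]
    return "transfer"
```

A real client also handles gas, nonces and value transfers attached to contract calls; those are left out here to keep the three cases visible.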

 

3. Intrinsic Gas

Every transaction is charged a fixed amount of gas, regardless of what it does. It is calculated as follows:

[Figure: Transaction gas fee calculation]

If your transaction carries no additional data (payload), as in an ordinary transfer, it is charged 21,000 gas.

If your transaction carries additional data, that data is charged as well, byte by byte:

  • 4 gas for each byte that is 0,
  • 68 gas for each byte that is not 0 (the value at the time of writing; EIP-2028 later reduced it to 16).

This is why many contract optimizations aim to reduce the number of non-zero bytes in the data, which reduces gas consumption.
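As a rough illustration of the rule above, the intrinsic gas of a transaction that calls an existing contract can be computed as follows. The function name and constants are chosen for this sketch; the per-byte costs are the ones quoted in this article, and the non-zero cost is a parameter so that another fee schedule can be plugged in.

```python
TX_GAS = 21_000          # base cost of a transaction without payload
ZERO_BYTE_GAS = 4        # cost of each zero byte in the payload
NONZERO_BYTE_GAS = 68    # cost of each non-zero byte as quoted here (16 after EIP-2028)

def intrinsic_gas(payload: bytes, nonzero_byte_gas: int = NONZERO_BYTE_GAS) -> int:
    """Return the gas charged before any bytecode is executed."""
    zero_bytes = payload.count(0)
    nonzero_bytes = len(payload) - zero_bytes
    return TX_GAS + zero_bytes * ZERO_BYTE_GAS + nonzero_bytes * nonzero_byte_gas
```

For a 36-byte payload with 5 non-zero bytes and 31 zero bytes, this gives 21,000 + 31 × 4 + 5 × 68 = 21,464 gas.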

 

4. Generate the Contract object

The transaction is converted into a Message object and sent to the EVM, and the EVM generates a Contract object from the Message for subsequent execution:

[Figure: Generating the Contract object from a transaction]

As the figure shows, the Contract loads the corresponding code from StateDB according to the contract address, after which it can be sent to the interpreter for execution.

In addition, there is an upper limit on the gas that executing the contract can consume: the GasLimit that each block can accommodate, as set in the node configuration.

 

5. Sent to the interpreter for execution

Once the code and input are available, they can be sent to the interpreter for execution. The EVM is a stack-based virtual machine. The interpreter operates on four components:

  • PC: similar to the PC register in a CPU; points to the instruction currently being executed
  • Stack: the execution stack; each item is 256 bits wide and the maximum depth is 1024
  • Memory: the memory space
  • Gas: the gas pool; if the gas is used up, the transaction execution fails

 

[Figure: Four components of the interpreter]

The execution process is shown in detail in the figure below:

[Figure: Interpreter execution flow]

Each EVM instruction is called an OpCode and occupies one byte, so the instruction set contains at most 256 instructions. See https://ethervm.io for a description of each one.

The following figure shows an example (PUSH1=0x60, MSTORE=0x52):

[Figure: OpCode instructions]

  • First, the interpreter reads the OpCode at the PC from the contract code.
  • It then looks up the corresponding operation in a JumpTable, that is, the set of functions associated with that OpCode.
  • Next, it calculates the gas cost of the operation. If the gas runs out, execution fails and an ErrOutOfGas error is returned.
  • If there is enough gas, it calls execute() to execute the instruction. Depending on the instruction, this reads or writes the Stack, Memory or StateDB.
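The fetch, look-up, charge and execute steps above can be mimicked with a toy interpreter. This is a sketch under simplifying assumptions, not the go-ethereum implementation: only PUSH1 and MSTORE are supported, each is charged a flat 3 gas (the real MSTORE also pays for memory expansion), and all Python names are invented for the example.

```python
class OutOfGas(Exception):
    """Raised when the gas pool cannot cover the next instruction."""

STACK_LIMIT = 1024

def op_push1(vm):
    # PUSH1 pushes the byte that follows the opcode and skips over it.
    vm["stack"].append(vm["code"][vm["pc"] + 1])
    vm["pc"] += 2

def op_mstore(vm):
    # MSTORE pops an offset and a value, then writes the value as a 32-byte word.
    offset = vm["stack"].pop()
    value = vm["stack"].pop()
    end = offset + 32
    if len(vm["memory"]) < end:
        vm["memory"].extend(bytes(end - len(vm["memory"])))
    vm["memory"][offset:end] = value.to_bytes(32, "big")
    vm["pc"] += 1

# opcode -> (operation, simplified flat gas cost)
JUMP_TABLE = {0x60: (op_push1, 3), 0x52: (op_mstore, 3)}

def run(code: bytes, gas: int) -> dict:
    vm = {"code": code, "pc": 0, "stack": [], "memory": bytearray(), "gas": gas}
    while vm["pc"] < len(code):
        opcode = code[vm["pc"]]                  # 1. read the OpCode at the PC
        execute, cost = JUMP_TABLE[opcode]       # 2. look it up in the jump table
        if vm["gas"] < cost:                     # 3. charge gas or fail
            raise OutOfGas(f"opcode {opcode:#04x} needs {cost} gas")
        vm["gas"] -= cost
        execute(vm)                              # 4. execute the instruction
        if len(vm["stack"]) > STACK_LIMIT:
            raise OverflowError("stack limit exceeded")
    return vm
```

Running `run(bytes.fromhex("6080604052"), gas=100)` executes PUSH1 0x80, PUSH1 0x40, MSTORE, which stores the word 0x80 at memory offset 0x40 and leaves 91 gas under the flat-cost assumption.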

 

6. Calling a contract function

Having walked through the main flow of how the EVM interprets and executes code, you may ask: how does the EVM know which function of the contract the transaction wants to call? As mentioned earlier, an Input is sent to the interpreter along with the contract code, and this Input data is provided by the transaction.

[Figure: Input data]

Input data usually consists of two parts:

  • The first 4 bytes are called the "4-byte signature". They are the first 4 bytes of the Keccak hash of the function signature and serve as the identifier of the function. (Known function signatures can be looked up in online signature databases.)

  • What follows are the arguments of the function call; their length is variable.

For example, after I deployed contract A, the input data for the call add(1) is:

0x87db03b70000000000000000000000000000000000000000000000000000000000000001
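To make the two parts concrete, the following sketch splits such an input into the 4-byte signature and 32-byte argument words. `split_calldata` is a name invented here, and the sketch assumes that every argument is a static 32-byte value, which holds for this example but not for dynamic types such as strings or arrays.

```python
def split_calldata(data_hex: str):
    """Split input data into (4-byte signature as hex, list of 32-byte words as ints)."""
    raw = bytes.fromhex(data_hex[2:] if data_hex.startswith("0x") else data_hex)
    selector, args = raw[:4], raw[4:]
    words = [int.from_bytes(args[i:i + 32], "big") for i in range(0, len(args), 32)]
    return selector.hex(), words

# The input data of the add(1) call, built from its two parts.
example = "0x87db03b7" + "00" * 31 + "01"
```

Here `split_calldata(example)` returns `("87db03b7", [1])`: the signature of the function, followed by the single argument 1.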

When we compile a smart contract, the compiler automatically adds a piece of function selection logic to the front of the generated bytecode:

  • First, the CALLDATALOAD instruction is used to push the "4-byte signature" onto the stack.
  • It is then compared in turn with the signature of each function in the contract. On a match, the JUMPI instruction jumps to that function's code and execution continues there.
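The selection logic can be pictured as a chain of comparisons. The sketch below imitates it in Python rather than bytecode; the `dispatch` helper and the handler are hypothetical, and the only signature in the table is the one from the add(1) example.

```python
def dispatch(calldata: bytes, functions):
    """Compare the 4-byte signature with each known function in turn.

    functions: list of (signature, handler) pairs, checked in order, much like
    the sequence of comparisons and conditional jumps the compiler emits.
    """
    signature = calldata[:4]
    for known_signature, handler in functions:
        if signature == known_signature:      # match: jump into the function body
            return handler(calldata[4:])
    raise ValueError("no matching function")  # real contracts fall back or revert here

def handle_add(args: bytes):
    # Hypothetical handler: decodes the first 32-byte argument only.
    return ("add", int.from_bytes(args[:32], "big"))

FUNCTIONS = [(bytes.fromhex("87db03b7"), handle_add)]
```

Calling `dispatch` with the add(1) input reaches `handle_add` with the 32 argument bytes; an unknown signature ends in the error branch.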

This may be a bit abstract; the disassembly of the contract in the figure above makes it clear at a glance:

[Figure: Function signature]

[Figure: Disassembly code]

Since CALLDATALOAD came up, here are the instructions related to loading data. There are four:

  • CALLDATALOAD: Load the input data into the Stack
  • CALLDATACOPY: Load input data into Memory
  • CODECOPY: Copy the current contract code to Memory
  • EXTCODECOPY: Copy the external contract code to Memory

The last one, EXTCODECOPY, is not very commonly used. It is generally used to audit whether the bytecode of a third-party contract meets the specifications, and generally consumes more gas.

The operations corresponding to these instructions are shown in the figure below:

[Figure: Operations corresponding to the instructions]

 

 

7. A contract calling a contract

There are four ways to call another contract from inside a contract:

  • CALL
  • CALLCODE
  • DELEGATECALL
  • STATICCALL

Their similarities and differences are compared later in this article. Let's take the simplest one, CALL, as an example. The calling process is shown in the following figure:

[Figure: CALL call process]

As the figure shows, the caller stores the call parameters in memory and then executes the CALL instruction.

When the CALL instruction is executed, a new Contract object is created, and the call parameters in memory are used as its Input.

The interpreter creates a new Stack and a new Memory for the execution of the new contract, so as not to disturb the execution environment of the original contract.

After the new contract finishes executing, the result is written via the RETURN instruction to the memory address specified earlier, and the original contract then continues executing.
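The isolation between the two execution environments can be sketched as follows. This is a simplified model, not client code: `Frame` and `call` are invented names, and the callee is a plain Python function standing in for bytecode that ends with RETURN.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """Execution environment of one contract: its own input, stack and memory."""
    input: bytes = b""
    stack: list = field(default_factory=list)
    memory: bytearray = field(default_factory=bytearray)

def call(caller, callee_code, args_offset, args_size, ret_offset, ret_size):
    # The call parameters are read from the caller's memory and become the callee's Input.
    call_input = bytes(caller.memory[args_offset:args_offset + args_size])
    callee = Frame(input=call_input)       # fresh Stack and Memory for the callee
    return_data = callee_code(callee)      # stands in for running bytecode up to RETURN
    # The result is copied to the memory range the caller specified beforehand.
    out = return_data[:ret_size]
    needed = ret_offset + len(out)
    if len(caller.memory) < needed:
        caller.memory.extend(bytes(needed - len(caller.memory)))
    caller.memory[ret_offset:ret_offset + len(out)] = out
    return callee

def increment(frame):
    # Hypothetical callee: returns its 32-byte input plus one.
    value = int.from_bytes(frame.input, "big") + 1
    frame.stack.append(value)              # touches only the callee's own stack
    frame.memory.extend(value.to_bytes(32, "big"))
    return bytes(frame.memory[:32])
```

After `call(caller, increment, 0, 32, 32, 32)`, the caller's stack is untouched and only the reserved return range of its memory has changed.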

 

8. Create a contract

Everything above was about calling contracts. What about creating one?

If the to address of a transaction is nil, the transaction creates a smart contract.

First, a contract address has to be created, using the following calculation: Keccak(RLP(call_addr, nonce))[12:].

In other words, RLP-encode the address and nonce of the transaction initiator, calculate the Keccak hash of the result, and take the last 20 bytes as the address of the contract.
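The calculation can be sketched in Python as shown below. The RLP helpers cover only the short encodings needed for an address and a nonce. Because the Python standard library has no Keccak-256 (`hashlib.sha3_256` uses different padding and produces different digests), the hash function is passed in as a parameter and has to come from a third-party library. All function names are chosen for this sketch.

```python
def rlp_bytes(data: bytes) -> bytes:
    """RLP-encode a byte string shorter than 56 bytes."""
    if len(data) == 1 and data[0] < 0x80:
        return data                                   # a single small byte encodes as itself
    if len(data) >= 56:
        raise ValueError("long strings are not handled in this sketch")
    return bytes([0x80 + len(data)]) + data

def rlp_sender_nonce(sender: bytes, nonce: int) -> bytes:
    """RLP-encode the list [sender, nonce]."""
    nonce_bytes = nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")  # 0 becomes b""
    payload = rlp_bytes(sender) + rlp_bytes(nonce_bytes)
    if len(payload) >= 56:
        raise ValueError("long lists are not handled in this sketch")
    return bytes([0xC0 + len(payload)]) + payload

def contract_address(sender: bytes, nonce: int, keccak256) -> bytes:
    """Return the last 20 bytes of Keccak(RLP(sender, nonce)).

    keccak256 must be a real Keccak-256 function supplied by the caller.
    """
    return keccak256(rlp_sender_nonce(sender, nonce))[12:]
```

For a 20-byte sender and nonce 0, the encoded list is 0xd6, 0x94, the 20 address bytes, then 0x80; hashing those 23 bytes and dropping the first 12 bytes of the digest yields the contract address.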

The next step is to create a corresponding stateObject based on the contract address, and then store the contract code contained in the transaction.

All state changes of the contract are stored in a storage trie, which is ultimately stored in StateDB in Key-Value form.

Once the code is stored, it cannot be changed, whereas the content of the storage trie can be modified by calling the contract, for example through the SSTORE instruction.

[Figure: Generate contract address]

 

9. Gas fee calculation

Finally, a few words on how gas fees are calculated. The formulas basically follow the definitions in the Ethereum Yellow Paper.

[Figure: Ethereum Yellow Paper gas]

Of course, you can also read the code directly; it is located in core/vm/gas.go and core/vm/gas_table.go.

 

10. The four ways of calling a contract

In medium and large projects, it is impossible to implement all functionality in a single smart contract, and doing so would also hinder division of labor and cooperation.

Normally, we split the code into different libraries or contracts by function, and they then provide interfaces for calling each other.

In Solidity, if the goal is only code reuse, we extract the common code and deploy it as a library, which can then be called much like a C library or a Java library.

However, a library is not allowed to define any storage variables, which means that a library cannot hold contract state of its own.

If we need to modify contract state, we have to deploy a new contract, which involves one contract calling another.

There are four ways to call a contract:

  • CALL
  • CALLCODE
  • DELEGATECALL
  • STATICCALL

1. CALL vs. CALLCODE

The difference between CALL and CALLCODE is that the context of code execution is different.

Specifically, CALL modifies the storage of the callee, while CALLCODE modifies the storage of the caller.

[Figure: storage]

Let's write a contract to verify our understanding:

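pragma solidity ^0.4.25;

contract A {
  // Occupies storage slot 0, the same slot as x in contract B.
  int public x;

  // CALL: B.inc() runs in B's context, so B's storage is modified.
  function inc_call(address _contractAddress) public {
      _contractAddress.call(bytes4(keccak256("inc()")));
  }
  // CALLCODE: B's code runs in A's context, so A's storage is modified.
  function inc_callcode(address _contractAddress) public {
      _contractAddress.callcode(bytes4(keccak256("inc()")));
  }
}

contract B {
  int public x;

  function inc() public {
      x++;
  }
}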

Let's first call inc_call() and then check the value of x in contracts A and B:

[Figure: the value of x]

We can see that x in contract B has been modified, while x in contract A is still equal to 0.

Now let's call inc_callcode():

[Figure: x is also equal to 0]

This time it is x in contract A that is modified, while x in contract B remains unchanged.
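The outcome of this experiment can be reproduced with a small Python model. It is only a mental model of the storage context, with invented names: each account has its own storage, the body of `inc()` is a function that receives whichever storage it should operate on, and the two call types differ only in which storage they pass.

```python
class Account:
    """A contract account with its own storage (just the variable x here)."""
    def __init__(self, name):
        self.name = name
        self.storage = {"x": 0}

def inc(storage):
    # Body of B.inc(): increments x in whatever storage it is executed against.
    storage["x"] += 1

def call(caller, callee, code):
    code(callee.storage)      # CALL: the code runs against the callee's storage

def callcode(caller, callee, code):
    code(caller.storage)      # CALLCODE: the callee's code runs against the caller's storage
```

With accounts `a` and `b`, `call(a, b, inc)` changes only `b.storage["x"]`, and a following `callcode(a, b, inc)` changes only `a.storage["x"]`.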

 

2. CALLCODE vs. DELEGATECALL

In fact, DELEGATECALL can be regarded as a bug-fix version of CALLCODE, and CALLCODE is no longer recommended.

The difference between CALLCODE and DELEGATECALL is that msg.sender differs.

Specifically, DELEGATECALL always keeps the address of the original caller as msg.sender, while CALLCODE does not.

[Figure: The difference between CALLCODE and DELEGATECALL]

Again, let's write some code to verify our understanding:

pragma solidity ^0.4.25;

contract A {
  int public x;

  // CALLCODE: inside B's code, msg.sender is contract A.
  function inc_callcode(address _contractAddress) public {
      _contractAddress.callcode(bytes4(keccak256("inc()")));
  }
  // DELEGATECALL: inside B's code, msg.sender is still whoever called A.
  function inc_delegatecall(address _contractAddress) public {
      _contractAddress.delegatecall(bytes4(keccak256("inc()")));
  }
}

contract B {
  int public x;

  // Logs the msg.sender seen by the executing code.
  event senderAddr(address);
  function inc() public {
      x++;
      emit senderAddr(msg.sender);
  }
}

We first call inc_callcode() and observe the log output:

[Figure: log output of inc_callcode()]

We can see that msg.sender points to the address of contract A, not the address of the transaction initiator.

Now let's call inc_delegatecall() and observe the log output:

[Figure: log output of inc_delegatecall()]

We can see that msg.sender points to the initiator of the transaction.
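The two log outputs can be summarized in a small model of the execution context. The names below are invented for illustration; `frame_sender` stands for the msg.sender of the frame that issues the call, which is the transaction initiator when contract A is called directly.

```python
from collections import namedtuple

# What the called code sees: whose storage it modifies and who msg.sender is.
Context = namedtuple("Context", ["storage_owner", "msg_sender"])

def callcode(frame_sender, caller, callee):
    # CALLCODE: runs in the caller's storage, and msg.sender becomes the caller.
    return Context(storage_owner=caller, msg_sender=caller)

def delegatecall(frame_sender, caller, callee):
    # DELEGATECALL: runs in the caller's storage, and msg.sender is passed through.
    return Context(storage_owner=caller, msg_sender=frame_sender)
```

For a user calling A, which in turn targets B, both variants modify A's storage, but only `delegatecall` reports the user as msg.sender.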

 

3. STATICCALL

Listing STATICCALL here is a little awkward, because at the time of writing Solidity has no low-level API that can call it directly. The plan is only for the compiler to compile calls to view and pure functions into STATICCALL instructions in the future. (Solidity 0.5.0 has since added address.staticcall and uses STATICCALL for such calls.)

A view function may not modify state variables; a pure function is stricter and may not even read them.

This is currently checked at compile time, and a violation causes a compilation error. Once the STATICCALL instruction is used instead, the guarantee is enforced at runtime, and you may see a transaction fail during execution.

Without further ado, here is the implementation of STATICCALL:

[Figure: STATICCALL implementation code]

As you can see, the interpreter has a readOnly attribute, which STATICCALL sets to true. If a write to a state variable then occurs, an errWriteProtection error is returned.
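The effect of the readOnly attribute can be imitated with a few lines of Python. This is a sketch of the idea only, not the go-ethereum code: the class, its methods and the exception are hypothetical names, and storage is a plain dictionary.

```python
class WriteProtectionError(Exception):
    """Stand-in for the errWriteProtection error."""

class Interpreter:
    def __init__(self, storage):
        self.storage = storage
        self.read_only = False

    def sload(self, key):
        return self.storage.get(key, 0)        # reads are always allowed

    def sstore(self, key, value):
        if self.read_only:                     # writes are rejected in read-only mode
            raise WriteProtectionError("state modification inside a static call")
        self.storage[key] = value

    def static_call(self, code):
        # STATICCALL: switch to read-only mode for the duration of the call.
        previous = self.read_only
        self.read_only = True
        try:
            return code(self)
        finally:
            self.read_only = previous
```

A callee that only calls `sload` succeeds inside `static_call`, while one that calls `sstore` fails at runtime, which is the behavior the paragraph above describes.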

 

The content comes from https://learnblockchain.cn/2019/04/09/easy-evm


Origin blog.csdn.net/u013288190/article/details/112978914