I. Virtual machine
The virtual machine is used to:
- Execute transactions on Ethereum,
- Change the state of Ethereum.
There are two types of transactions:
- Ordinary transactions
- Smart contract transactions.
Executing a transaction costs gas.
There are four ways for one smart contract to call another.
II. The Ethereum Virtual Machine
The Ethereum Virtual Machine, EVM for short, is used to execute transactions on Ethereum.
The business process is as follows:
When a transaction comes in, it is converted internally into a Message object and passed to the EVM for execution.
If it is an ordinary transfer, the corresponding account balances in StateDB are simply modified directly.
If it creates or invokes a smart contract, the bytecode is loaded and executed by the interpreter in the EVM, and StateDB may be queried or modified during execution.
III. Intrinsic Gas
Every transaction is charged a fixed amount of gas up front, regardless of what it does. It is calculated as follows:
If the transaction carries no additional data (payload), as in an ordinary transfer, it is charged 21,000 gas.
If the transaction carries additional data, that data is charged as well, on top of the 21,000 base, byte by byte:
- 4 gas for each byte equal to 0,
- 68 gas for each non-zero byte (lowered to 16 by EIP-2028 in the Istanbul upgrade).
This is why many contract optimizations aim to reduce the number of non-zero bytes in the data, thereby reducing gas consumption.
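The charging rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the client's actual code: the function name intrinsic_gas and the constant names are made up for this example, and the per-byte prices are the ones quoted in this article.

```python
BASE_GAS = 21_000        # charged for every transaction
ZERO_BYTE_GAS = 4        # per payload byte equal to 0
NONZERO_BYTE_GAS = 68    # per non-zero payload byte (16 after EIP-2028)


def intrinsic_gas(payload: bytes, nonzero_byte_gas: int = NONZERO_BYTE_GAS) -> int:
    """Return the up-front gas charge for a transaction carrying `payload`."""
    zero_bytes = payload.count(0)
    nonzero_bytes = len(payload) - zero_bytes
    return BASE_GAS + zero_bytes * ZERO_BYTE_GAS + nonzero_bytes * nonzero_byte_gas


if __name__ == "__main__":
    # An ordinary transfer carries no payload.
    print(intrinsic_gas(b""))
    # A 4-byte selector followed by one 32-byte argument equal to 1.
    print(intrinsic_gas(bytes.fromhex("87db03b7" + "00" * 31 + "01")))
```

The second payload has 5 non-zero bytes and 31 zero bytes, so under the prices above it costs 21,000 + 5 × 68 + 31 × 4 = 21,464 gas.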
IV. Generate the Contract object
The transaction is converted into a Message object and sent to the EVM, and the EVM generates a Contract object from the Message for subsequent execution:
As can be seen, the Contract loads the corresponding code from StateDB according to the contract address, after which it can be handed to the interpreter for execution.
In addition, there is an upper limit on the gas that executing a contract can consume, namely the GasLimit that the node configuration allows each block to accommodate.
V. Send to the interpreter for execution
Once the code and input are available, they can be sent to the interpreter for execution. The EVM is a stack-based virtual machine. The interpreter operates on four components:
- PC: similar to the PC register in a CPU, pointing to the instruction currently being executed
- Stack: the execution stack, 256 bits wide, with a maximum depth of 1024
- Memory: the memory space
- Gas: the gas pool; if the gas is used up, the transaction execution fails
The detailed explanation of the execution process is shown in the figure below:
Each EVM instruction is called an OpCode and occupies one byte, so the instruction set contains at most 256 instructions. Please refer to https://ethervm.io for a detailed description.
The following figure shows an example (PUSH1=0x60, MSTORE=0x52):
- First, an OpCode is read from the contract code at the position given by the PC,
- Then the corresponding operation, that is, the set of functions associated with the OpCode, is looked up in a JumpTable.
- Next, the gas cost of this operation is calculated. If the gas runs out, execution fails and an ErrOutOfGas error is returned.
- If there is enough gas, execute() is called to execute the instruction. Depending on the type of instruction, it reads or writes the Stack, Memory or StateDB.
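The fetch, look up, charge, execute loop described above can be sketched as a toy interpreter in Python. This is a simplified model under stated assumptions, not the real implementation: it supports only PUSH1 and MSTORE, the names Machine, OutOfGas, JUMP_TABLE and run are invented for this example, and the flat cost of 3 gas per instruction ignores the extra memory-expansion cost that MSTORE has in the real EVM.

```python
from dataclasses import dataclass, field

STACK_LIMIT = 1024  # maximum stack depth stated in the text


class OutOfGas(Exception):
    """Stands in for the ErrOutOfGas error mentioned in the text."""


@dataclass
class Machine:
    code: bytes
    gas: int
    pc: int = 0
    stack: list = field(default_factory=list)
    memory: bytearray = field(default_factory=bytearray)


def op_push1(m: Machine) -> None:
    """PUSH1: push the byte following the opcode onto the stack."""
    if len(m.stack) >= STACK_LIMIT:
        raise OverflowError("stack limit reached")
    m.stack.append(m.code[m.pc + 1])
    m.pc += 2  # skip the opcode and its one-byte operand


def op_mstore(m: Machine) -> None:
    """MSTORE: pop an offset and a value, write the value as a 32-byte word."""
    offset, value = m.stack.pop(), m.stack.pop()
    if len(m.memory) < offset + 32:
        m.memory.extend(bytes(offset + 32 - len(m.memory)))
    m.memory[offset:offset + 32] = value.to_bytes(32, "big")
    m.pc += 1


# opcode -> (simplified gas cost, function that executes it)
JUMP_TABLE = {0x60: (3, op_push1), 0x52: (3, op_mstore)}


def run(code: bytes, gas: int) -> Machine:
    m = Machine(code=code, gas=gas)
    while m.pc < len(m.code):
        cost, execute = JUMP_TABLE[m.code[m.pc]]  # read the opcode, look it up
        if m.gas < cost:
            raise OutOfGas(f"need {cost} gas, have {m.gas}")
        m.gas -= cost
        execute(m)
    return m


if __name__ == "__main__":
    # PUSH1 0x80, PUSH1 0x40, MSTORE
    result = run(bytes.fromhex("6080604052"), gas=100)
    print(result.memory[0x40:0x60].hex(), result.gas)
```

Running the three-instruction program stores the value 0x80 as a 32-byte word at memory offset 0x40 and consumes 9 gas in this simplified cost model.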
VI. Call a contract function
Having walked through the main flow of how the EVM interprets and executes code, some readers may ask: how does the EVM know which function of the contract the transaction wants to call? Don't worry. As mentioned earlier, an Input is sent to the interpreter along with the contract code, and this Input data is provided by the transaction.
Input data is usually divided into two parts:
- The first 4 bytes are called the "4-byte signature": the first 4 bytes of the Keccak hash of a function signature, used to identify that function. (You can look up known function signatures on this website.)
- What follows are the arguments needed to call the function; their length is variable.
For example, after I deployed contract A, the input data for the call add(1) is:
0x87db03b70000000000000000000000000000000000000000000000000000000000000001
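To make the two parts concrete, here is a small Python sketch that splits such input data. The helper name split_input is made up for this example, and reading the arguments as 32-byte big-endian words is an assumption that only holds for simple fixed-size argument types like the one used here.

```python
def split_input(data_hex: str):
    """Split hex-encoded input data into (selector, list of 32-byte words)."""
    raw = bytes.fromhex(data_hex[2:] if data_hex.startswith("0x") else data_hex)
    selector, args = raw[:4], raw[4:]
    # Assumption: every argument occupies exactly one 32-byte word.
    words = [int.from_bytes(args[i:i + 32], "big") for i in range(0, len(args), 32)]
    return selector, words


if __name__ == "__main__":
    selector, words = split_input(
        "0x87db03b7"
        "0000000000000000000000000000000000000000000000000000000000000001"
    )
    print(selector.hex(), words)
```

For the input data above, the selector is 87db03b7 and the single argument decodes to 1.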
When we compile the smart contract, the compiler automatically adds a piece of function selection logic to the front of the generated bytecode:
- First, the CALLDATALOAD instruction pushes the "4-byte signature" onto the stack,
- Then it is compared in turn with the functions contained in the contract, and on a match the JUMPI instruction jumps to the corresponding code to continue execution.
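The selection logic can be imitated in Python as follows. This is a rough model rather than compiler output: calldataload and dispatch are names chosen for this example, the entry 0x12345678 is a made-up selector, and the right shift by 224 bits reflects the fact that CALLDATALOAD reads a full 32-byte word from which the compiled code keeps only the top 4 bytes.

```python
def calldataload(calldata: bytes, offset: int) -> int:
    """Read a 32-byte word from the input data, zero-padded past the end."""
    word = calldata[offset:offset + 32].ljust(32, b"\x00")
    return int.from_bytes(word, "big")


def dispatch(calldata: bytes, functions):
    """Compare the selector with each known function in turn.

    `functions` is a list of (selector, jump target) pairs; the return value
    plays the role of the destination that JUMPI would jump to.
    """
    selector = calldataload(calldata, 0) >> 224  # keep only the top 4 bytes
    for known_selector, target in functions:
        if selector == known_selector:
            return target
    return None  # no match: a real contract would hit its fallback or revert


if __name__ == "__main__":
    table = [(0x12345678, "other"), (0x87DB03B7, "add")]
    data = bytes.fromhex("87db03b7" + "00" * 31 + "01")
    print(dispatch(data, table))
```

The linear comparison mirrors the "in turn" wording above: each additional function adds one more comparison in front of the jump.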
This may sound a bit abstract; the disassembly of the contract from the figure above makes it clear at a glance:
Having mentioned CALLDATALOAD, let's also cover the instructions related to data loading. There are four of them:
- CALLDATALOAD: Load the input data into the Stack
- CALLDATACOPY: Load input data into Memory
- CODECOPY: Copy the current contract code to Memory
- EXTCODECOPY: Copy the external contract code to Memory
The last one, EXTCODECOPY, is not commonly used. It generally serves to audit whether the bytecode of a third-party contract meets a specification, and it generally consumes more gas.
The operations corresponding to these instructions are shown in the figure below:
VII. A contract calls another contract
There are 4 ways to call another contract from inside a contract:
- CALL
- CALLCODE
- DELEGATECALL
- STATICCALL
Their similarities and differences are compared later. Let's take the simplest one, CALL, as an example. The calling process is shown in the following figure:
It can be seen that the caller stores the call parameters in memory and then executes the CALL instruction.
When the CALL instruction is executed, a new Contract object is created, and the call parameters in memory are used as its Input.
The interpreter creates a new Stack and Memory for the execution of the new contract, so as not to disturb the execution environment of the original contract.
After the new contract finishes executing, the execution result is written to the previously specified memory address through the RETURN instruction, and the original contract then continues executing.
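The isolation between the two execution environments can be sketched in Python. This is a deliberately small model with invented names (Frame, write_memory, call, double): the callee is represented by an ordinary Python function instead of bytecode, and gas and value transfer are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One execution environment: its Input plus a private Stack and Memory."""
    input: bytes = b""
    stack: list = field(default_factory=list)
    memory: bytearray = field(default_factory=bytearray)


def write_memory(memory: bytearray, offset: int, data: bytes) -> None:
    """Write `data` at `offset`, growing the memory with zeros if needed."""
    end = offset + len(data)
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))
    memory[offset:end] = data


def call(caller: Frame, callee_code, args_offset, args_size, ret_offset, ret_size):
    """Model of CALL: copy arguments out, run in a fresh frame, copy result back."""
    args = bytes(caller.memory[args_offset:args_offset + args_size])
    child = Frame(input=args)            # new Stack and Memory for the callee
    output = callee_code(child)          # what the callee hands back via RETURN
    write_memory(caller.memory, ret_offset, output[:ret_size])


def double(frame: Frame) -> bytes:
    """Hypothetical callee: doubles its 32-byte argument."""
    frame.stack.append(99)               # only touches the callee's own stack
    frame.memory.extend(b"\xff" * 32)    # only touches the callee's own memory
    value = int.from_bytes(frame.input, "big")
    return (value * 2).to_bytes(32, "big")


if __name__ == "__main__":
    caller = Frame(stack=[7], memory=bytearray((21).to_bytes(32, "big")))
    call(caller, double, args_offset=0, args_size=32, ret_offset=32, ret_size=32)
    print(int.from_bytes(caller.memory[32:64], "big"), caller.stack)
```

The callee scribbles on its own stack and memory, yet the caller only sees the returned word at the offset it specified, and its own stack is untouched.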
VIII. Create a contract
Everything discussed above concerned contract calls. What about the process of creating a contract?
If the to address of a transaction is nil, the transaction is used to create a smart contract.
First a contract address has to be created, using the following calculation: Keccak(RLP(call_addr, nonce))[12:].
In other words, RLP-encode the address and nonce of the transaction initiator, calculate the Keccak hash of the result, and take the last 20 bytes as the address of the contract.
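The calculation can be sketched in Python as below. Two caveats apply. First, the RLP helpers only cover the short cases needed here (byte strings and lists under 56 bytes). Second, Python's standard library has no Keccak-256 (hashlib.sha3_256 uses different padding and gives different digests), so the hash is passed in as a parameter; all function names are chosen for this example.

```python
def rlp_bytes(data: bytes) -> bytes:
    """RLP-encode a byte string shorter than 56 bytes."""
    if len(data) == 1 and data[0] < 0x80:
        return data                      # a single small byte encodes as itself
    if len(data) >= 56:
        raise ValueError("long strings are outside the scope of this sketch")
    return bytes([0x80 + len(data)]) + data


def rlp_int(value: int) -> bytes:
    """RLP-encode a non-negative integer (zero becomes the empty string)."""
    return rlp_bytes(value.to_bytes((value.bit_length() + 7) // 8, "big"))


def rlp_list(encoded_items) -> bytes:
    """RLP-encode a list whose already-encoded payload is shorter than 56 bytes."""
    payload = b"".join(encoded_items)
    if len(payload) >= 56:
        raise ValueError("long lists are outside the scope of this sketch")
    return bytes([0xC0 + len(payload)]) + payload


def encode_sender_and_nonce(sender: bytes, nonce: int) -> bytes:
    return rlp_list([rlp_bytes(sender), rlp_int(nonce)])


def contract_address(sender: bytes, nonce: int, keccak256) -> bytes:
    """Last 20 bytes of the hash of RLP([sender, nonce]).

    `keccak256` must be a function bytes -> 32-byte digest supplied by the
    caller, for example from a third-party Keccak library.
    """
    digest = keccak256(encode_sender_and_nonce(sender, nonce))
    return digest[12:]
```

For a 20-byte sender and nonce 0, the encoding is the prefix d6 94, then the 20 address bytes, then 80; the 32-byte digest of that is cut down to its final 20 bytes.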
The next step is to create a corresponding stateObject based on the contract address, and then store the contract code contained in the transaction.
All state changes of the contract are stored in a storage trie, and are ultimately stored in StateDB in Key-Value form.
Once the code is stored, it cannot be changed, while the content of the storage trie can be modified by calling the contract, for example through the SSTORE instruction.
IX. Gas cost calculation
Finally, a few words about how gas costs are calculated: the formulas basically follow the definitions in the Ethereum Yellow Paper.
Of course, you can also read the source code directly; it is located in core/vm/gas.go and core/vm/gas_table.go.
X. Four ways to call a contract
In medium and large projects, it is impossible to implement all functionality in a single smart contract, and doing so would not be conducive to division of labor and cooperation.
Under normal circumstances, we divide the code into different libraries or contracts by function, and then provide interfaces for them to call each other.
In Solidity, if the goal is only code reuse, we extract the common code and deploy it as a library, which can then be called much like a C or Java library.
However, a library is not allowed to define any storage variables, which means that a library has no contract state of its own to modify.
If we need to modify contract state, we need to deploy a new contract, which involves one contract calling another.
There are four ways to call a contract:
- CALL
- CALLCODE
- DELEGATECALL
- STATICCALL
1. CALL vs. CALLCODE
The difference between CALL and CALLCODE is that the context of code execution is different.
Specifically, CALL modifies the storage of the callee, while CALLCODE modifies the storage of the caller.
Let's write a contract to verify our understanding:
pragma solidity ^0.4.25;

contract A {
    int public x;

    // Invoke inc() on the given contract with CALL.
    function inc_call(address _contractAddress) public {
        _contractAddress.call(bytes4(keccak256("inc()")));
    }

    // Invoke inc() on the given contract with CALLCODE.
    function inc_callcode(address _contractAddress) public {
        _contractAddress.callcode(bytes4(keccak256("inc()")));
    }
}

contract B {
    int public x;

    // Increment x in whichever storage the call executes against.
    function inc() public {
        x++;
    }
}
Let's first call inc_call(), and then check how the value of x changes in contracts A and B:
It can be seen that x in contract B has been modified, while x in contract A is still equal to 0.
Now let's try calling inc_callcode():
It can be seen that this time x in contract A is modified, while x in contract B remains unchanged.
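The same experiment can be mimicked with a tiny Python model. It is only an analogy under simplifying assumptions: storage is a plain dictionary, B's code is the Python function inc, and the helper names call and callcode are made up to mirror the two opcodes.

```python
def inc(storage: dict) -> None:
    """Stand-in for B.inc(): increments x in whatever storage it is given."""
    storage["x"] = storage.get("x", 0) + 1


def call(caller_storage: dict, callee_storage: dict, code) -> None:
    """CALL: the callee's code runs against the callee's storage."""
    code(callee_storage)


def callcode(caller_storage: dict, callee_storage: dict, code) -> None:
    """CALLCODE: the callee's code runs against the caller's storage."""
    code(caller_storage)


if __name__ == "__main__":
    storage_a, storage_b = {"x": 0}, {"x": 0}

    call(storage_a, storage_b, inc)
    print(storage_a, storage_b)   # only B's x has changed

    callcode(storage_a, storage_b, inc)
    print(storage_a, storage_b)   # now A's x has changed as well
```

The code being executed is identical in both cases; the only thing that differs is which storage it is pointed at.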
2. CALLCODE vs. DELEGATECALL
In fact, it can be considered that DELEGATECALL is a bugfix version of CALLCODE, and CALLCODE is no longer recommended.
The difference between CALLCODE and DELEGATECALL lies in msg.sender.
Specifically, DELEGATECALL keeps the msg.sender that the calling contract itself received, that is, the original caller, while CALLCODE sets msg.sender to the calling contract.
Again, let's write a piece of code to verify our understanding:
pragma solidity ^0.4.25;

contract A {
    int public x;

    // Invoke inc() on the given contract with CALLCODE.
    function inc_callcode(address _contractAddress) public {
        _contractAddress.callcode(bytes4(keccak256("inc()")));
    }

    // Invoke inc() on the given contract with DELEGATECALL.
    function inc_delegatecall(address _contractAddress) public {
        _contractAddress.delegatecall(bytes4(keccak256("inc()")));
    }
}

contract B {
    int public x;

    event senderAddr(address);

    // Increment x and log the msg.sender seen by this code.
    function inc() public {
        x++;
        emit senderAddr(msg.sender);
    }
}
We first call inc_callcode() and observe the log output:
It can be found that msg.sender points to the address of contract A, not the address of the transaction initiator.
Let's call inc_delegatecall() again and observe the log output:
It can be found that msg.sender points to the initiator of the transaction.
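The way each call type derives the execution context can be summarized in a short Python model. It is a sketch with hypothetical names (Context, call, callcode, delegatecall) and string labels in place of real addresses; it tracks only whose storage is used and what msg.sender the called code sees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    address: str   # whose storage the running code reads and writes
    sender: str    # what the running code sees as msg.sender


def call(current: Context, target: str) -> Context:
    """CALL: run in the target's storage; the caller becomes msg.sender."""
    return Context(address=target, sender=current.address)


def callcode(current: Context, target: str) -> Context:
    """CALLCODE: stay in the caller's storage; the caller becomes msg.sender."""
    return Context(address=current.address, sender=current.address)


def delegatecall(current: Context, target: str) -> Context:
    """DELEGATECALL: stay in the caller's storage and keep its msg.sender."""
    return Context(address=current.address, sender=current.sender)


if __name__ == "__main__":
    # The transaction initiator "user" calls contract "A", which invokes "B".
    in_a = Context(address="A", sender="user")
    print(callcode(in_a, "B"))
    print(delegatecall(in_a, "B"))
```

With CALLCODE the code of B sees A as msg.sender, while with DELEGATECALL it sees the transaction initiator, matching the two log outputs described above.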
3. STATICCALL
Listing STATICCALL here is a bit of a stretch, because the Solidity version used in this article (0.4.25) has no low-level API that invokes it directly. The plan is only that, in the future, the compiler will compile calls to view and pure functions into STATICCALL instructions.
A view function is not allowed to modify state variables, while a pure function is stricter still and is not even allowed to read state variables.
At present this is checked at compile time, and a violation results in a compilation error. Once the STATICCALL instruction is used, the rule can be enforced fully at runtime, and you may then see transactions that fail during execution.
Without further ado, let's take a look at the implementation code of STATICCALL:
As you can see, the interpreter adds a readOnly attribute. STATICCALL will set this attribute to true. If a write operation to the state variable occurs, an errWriteProtection error will be returned.
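The effect of the readOnly attribute can be imitated in Python. This is a loose sketch, not the interpreter's code: the class and method names are invented, WriteProtection stands in for the errWriteProtection error, and only a storage write is modeled as a state-changing operation.

```python
class WriteProtection(Exception):
    """Stands in for the errWriteProtection error mentioned in the text."""


class Interpreter:
    def __init__(self):
        self.read_only = False
        self.storage = {}

    def sstore(self, key, value):
        """State-changing operation: refused while in read-only mode."""
        if self.read_only:
            raise WriteProtection("state modification inside a static call")
        self.storage[key] = value

    def sload(self, key):
        """Reading state is always allowed."""
        return self.storage.get(key, 0)

    def static_call(self, code):
        """Run `code` with read_only set, restoring the old value afterwards."""
        previous = self.read_only
        self.read_only = True
        try:
            return code(self)
        finally:
            self.read_only = previous


if __name__ == "__main__":
    evm = Interpreter()
    evm.sstore("x", 1)
    print(evm.static_call(lambda i: i.sload("x")))   # reading works
    try:
        evm.static_call(lambda i: i.sstore("x", 2))  # writing is rejected
    except WriteProtection as err:
        print("failed:", err)
```

Restoring the previous flag in the finally block keeps the restriction in force for nested calls made from inside a static call, and lifts it again once the outermost static call returns.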
The content comes from https://learnblockchain.cn/2019/04/09/easy-evm