【Smart Contract】Ethereum Contract Execution Analysis

Keywords: EVM execution engine, assembly instructions, opcodes, bytecode

If you find this article useful, you can visit its original publication address to read more of the author's work.

1. Basic concepts

1.1 EVM

The EVM is a stack-based, big-endian virtual machine. It is not a virtual machine in the VMware sense but one similar to the JVM, so we can understand the EVM the same way we understand the JVM.

Like the JVM, the EVM is a computer designed and built on top of a real computer to support a custom instruction set. It contains a stack and two storage domains: memory and storage.

A custom instruction set generally needs a corresponding assembly language, and above the assembly sit the high-level languages that developers use, such as Solidity and Vyper.

However, unlike the JVM, which can be installed directly on all kinds of physical machines, the EVM is designed to be embedded in an Ethereum client; that is, the EVM runs inside the Ethereum system. The role of the EVM is to run Ethereum smart contracts.
A contract is created by a transaction sent from an external account, with the contract bytecode attached to the transaction's data field. Similarly, transactions can
carry out other kinds of interaction with contracts, such as calling and destroying them.

1.2 Contract Bytecode

Contract bytecode consists of a series of opcodes (also called instructions). Every opcode is encoded as a single byte; only the PUSHn instructions are followed by additional data bytes.
The EVM instruction set supports multiple PUSH instructions, such as PUSH1 and PUSH2. The number in the name is the size in bytes of the data pushed onto the stack: PUSH1 pushes 1 byte of data, and so on. Since PUSHn carries data,
the space occupied by the instruction varies (PUSH1 0x00 occupies 2 bytes, PUSH2 0x0010 occupies 3 bytes, etc.).
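
To make the variable width of PUSHn concrete, here is a minimal disassembler sketch in Python. The function name disassemble and the tiny OPCODES table are assumptions made for illustration only (a real table has more than 140 entries); the one thing the sketch is meant to show is that PUSH1..PUSH32 occupy the opcodes 0x60..0x7f and are followed by 1 to 32 data bytes.

```python
# Minimal EVM disassembler sketch. The opcode table is deliberately tiny
# (an assumption for illustration); unknown bytes are shown as raw hex.
OPCODES = {0x00: "STOP", 0x52: "MSTORE", 0x55: "SSTORE", 0x57: "JUMPI", 0x5B: "JUMPDEST"}


def disassemble(code: bytes):
    """Return a list of (offset, mnemonic, immediate_hex) tuples."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:            # PUSH1 (0x60) .. PUSH32 (0x7f)
            n = op - 0x5F                 # number of immediate data bytes
            data = code[pc + 1 : pc + 1 + n]
            out.append((pc, f"PUSH{n}", data.hex()))
            pc += 1 + n                   # one opcode byte plus its data bytes
        else:
            out.append((pc, OPCODES.get(op, f"0x{op:02x}"), ""))
            pc += 1                       # every other opcode is one byte wide
    return out


for offset, name, data in disassemble(bytes.fromhex("6080604052")):
    print(f"{offset:05x}: {name} {'0x' + data if data else ''}")
```

The input 6080604052 is the first five bytes of the compiled Example contract used later in this article.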

1.3 Contract constructor

After the contract is successfully created, its constructor is discarded; that is, the constructor code does not appear in the deployed contract.

1.4 Interact with the contract

A contract exposes an ABI (Application Binary Interface) that allows the outside world to interact with it.

1.5 Call Data

Call data is the information attached to the data field of the transaction when calling a contract. It usually begins with a 4-byte method identifier (selector), constructed as keccak256("somefunc(uint256)")[:4]: the first 4 bytes of the keccak256 hash of the function signature. The signature consists of the function name and the canonical parameter types only (uint is written as uint256); the return type is not part of it.
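
As a rough illustration of what call data looks like on the wire, the Python sketch below builds the call data for set_val(uint256) of the Example contract used later in this article. The selector 0x4edd1483 is copied from that contract's compiled dispatcher (PUSH4 0x4edd1483) rather than computed here, because keccak256 is not in the Python standard library; encode_set_val is a hypothetical helper name.

```python
# Selector copied from the compiled Example contract shown later in this
# article (PUSH4 0x4edd1483); it is not recomputed here.
SET_VAL_SELECTOR = bytes.fromhex("4edd1483")


def encode_set_val(value: int) -> bytes:
    """Build call data: 4-byte selector followed by one 32-byte big-endian argument."""
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit in uint256")
    return SET_VAL_SELECTOR + value.to_bytes(32, "big")


calldata = encode_set_val(7)
print(calldata.hex())
```

The result is 36 bytes long: 4 bytes of selector plus one 32-byte word, which is why the runtime code first checks that calldatasize is at least 4.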

1.6 Program Counter

The counter cooperates with the stack, Memory and Storage to complete the execution of the contract bytecode.

The essence of the counter is an offset of the deployed opcode sequence, which can be understood as an execution pointer, and the position pointed by the counter is the position where the next instruction will be executed. The following is a brief sequence of opcodes (if you don’t understand, you can read the text first and then look back):

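// The instructions before JUMPDEST span 2+2+1=5 bytes: each PUSH1 takes 2 bytes (1 byte for the instruction, 1 byte for its data) and JUMPI takes 1 byte
PUSH1 2 // offset=0. When the pointer is at offset=0, this instruction is executed next. Offset of the next instruction = offset of this instruction + number of bytes this instruction occupies
PUSH1 5 // offset=2
JUMPI    // offset=4. JUMPI is a conditional jump: it pops the top 2 stack elements (5, then 2) and checks whether the second element (2) is 0. If it is not 0, execution jumps to the offset given by the first element (5), i.e. position 0x05
         // If the second element is 0, the pointer is incremented and execution continues downward
JUMPDEST // offset=5. This instruction marks a jump landing point. JUMP and JUMPI must land on a JUMPDEST; otherwise the jump is invalid and execution fails with an error.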
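
The walk-through above can be replayed with a small Python sketch that records every offset the program counter visits. It handles only the three opcodes of the example (PUSH1 = 0x60, JUMPI = 0x57, JUMPDEST = 0x5b); trace_pc is a hypothetical name, and the destination check is simplified (a real EVM also rejects a 0x5b byte that sits inside PUSH data).

```python
def trace_pc(code: bytes):
    """Execute PUSH1/JUMPI/JUMPDEST only and return the offsets visited by the pc."""
    stack, pc, visited = [], 0, []
    while pc < len(code):
        visited.append(pc)
        op = code[pc]
        if op == 0x60:                    # PUSH1: opcode byte + 1 data byte
            stack.append(code[pc + 1])
            pc += 2
        elif op == 0x57:                  # JUMPI: pops destination, then condition
            dest, cond = stack.pop(), stack.pop()
            if cond != 0:
                if dest >= len(code) or code[dest] != 0x5B:
                    raise ValueError("invalid jump destination")
                pc = dest                 # jump: pc is set directly
            else:
                pc += 1                   # fall through to the next instruction
        elif op == 0x5B:                  # JUMPDEST: only a marker
            pc += 1
        else:
            raise ValueError(f"opcode 0x{op:02x} not handled in this sketch")
    return visited


# PUSH1 2, PUSH1 5, JUMPI, JUMPDEST
print(trace_pc(bytes.fromhex("60026005575b")))
```

Changing the destination to an offset that does not hold a JUMPDEST (for example 6001600457) makes the sketch raise, mirroring the "must land on JUMPDEST" rule.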

1.7 Execution environment (Context)

When the EVM starts executing contract bytecode, it creates a temporary, independent context for the contract. Specifically, it sets up several separate data areas, each with a different purpose:

  • Code area: a static, read-only area that stores the contract bytecode. It can be read with the CODESIZE and CODECOPY instructions, and the code of other contracts can be read with the EXTCODESIZE
    and EXTCODECOPY instructions;
  • Stack: an array of 32-byte elements with a capacity (length) of 1024, used to hold the arguments required by EVM instructions and the results they return. Instructions can only access stack elements starting from the top of the stack. Instructions such as PUSH1, DUP1, SWAP1 and POP
    operate on the stack;
  • Memory: an array of single-byte elements, used to hold transient data during contract execution. Memory is accessed by byte offset. The instructions MLOAD, MSTORE and MSTORE8
    operate on Memory (note the M prefix);
  • Storage: unlike the previous two structures, it is a map that holds persistent data, with both key and value of type uint256. The instructions SLOAD and SSTORE
    operate on Storage (note the S prefix);
  • calldata: the data attached to the call; a static, read-only area. It is read with the instructions CALLDATALOAD, CALLDATASIZE and CALLDATACOPY.
    Note that in a contract-creation transaction the data field carries the creation code, which the EVM executes as code instead of exposing it as calldata;
  • return data: the area that holds the data returned by a contract call. It is written by the RETURN and REVERT instructions and read with the RETURNDATASIZE and RETURNDATACOPY instructions.
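
The data areas listed above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the class names Stack and Memory are hypothetical, gas accounting is left out, and growing memory in 32-byte words follows the comment in the geth code quoted in section 1.10 ("memory is expanded in words of 32 bytes").

```python
WORD = 32            # size of one stack element / memory word in bytes
STACK_LIMIT = 1024   # maximum number of stack elements


class Stack:
    def __init__(self):
        self.items = []

    def push(self, value: int):
        if len(self.items) >= STACK_LIMIT:
            raise OverflowError("stack overflow")
        self.items.append(value % 2**256)    # every element is a 256-bit word

    def pop(self) -> int:
        if not self.items:
            raise IndexError("stack underflow")
        return self.items.pop()


class Memory:
    def __init__(self):
        self.data = bytearray()              # byte-addressed, transient

    def _expand(self, end: int):
        if end > len(self.data):             # grow to the next multiple of 32 bytes
            new_size = (end + WORD - 1) // WORD * WORD
            self.data.extend(bytes(new_size - len(self.data)))

    def mstore(self, offset: int, value: int):
        self._expand(offset + WORD)
        self.data[offset : offset + WORD] = value.to_bytes(WORD, "big")

    def mload(self, offset: int) -> int:
        self._expand(offset + WORD)
        return int.from_bytes(self.data[offset : offset + WORD], "big")


storage = {}         # persistent map uint256 -> uint256; missing keys read as 0

memory = Memory()
memory.mstore(0x40, 0x80)
print(len(memory.data), hex(memory.mload(0x40)), storage.get(0x01, 0))
```

Storing one word at offset 0x40 grows memory to 0x60 (96) bytes, which shows that memory is sized by the highest offset touched, not by the amount of data written.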

1.8 OpCode (opcode/EVM instruction/mnemonic)

OpCodes form an instruction set tailored for the EVM. It supports arithmetic, logical and bitwise operations as well as conditional jumps, and is Turing-complete. Languages such as Solidity that are built on top of it are therefore Turing-complete as well.

An OpCode may also be called an opcode, an EVM assembly instruction, or a mnemonic; the mnemonic form exists to help people read the code logic. The final compiled and deployed contract consists of a series of OpCodes and their operand data.
The EVM executes the contract logic by fetching OpCodes one by one from the compiled bytecode sequence. If an OpCode fails to execute (for example, because of too few stack arguments or insufficient gas), the EVM reverts all changes.

In official documents, the name OpCode is used more often, so readers can use OpCode as a keyword when querying.

OpCodes do not only consume gas; some operations refund gas. Two operations are known to refund gas: destroying a contract (24000 gas) and clearing a storage slot (15000 gas).
Note, however, that during execution the refund is accumulated in a separate refund counter and does not increase the remaining gas directly. If gas runs out later, execution still reverts. In other words, gas that has just been refunded cannot be spent during the same execution;
it is returned to the account only after execution completes. Finally, the refunded amount cannot exceed half of the gas consumed by the transaction, so at least half of the consumed gas is paid to the miners. These figures are the rules from before the London upgrade: since EIP-3529, the self-destruct refund has been removed, clearing a storage slot refunds 4800 gas, and the refund is capped at one fifth of the gas used.
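
The refund counter and its cap can be sketched as plain arithmetic. In the Python sketch below, effective_refund and billed_gas are hypothetical names, and max_refund_quotient is an assumed parameter: the value 2 reproduces the "at most half of the consumed gas" rule described above, and other values let you try a different cap.

```python
def effective_refund(gas_used: int, refund_counter: int, max_refund_quotient: int = 2) -> int:
    """Refund actually granted: the counter, capped at gas_used / quotient."""
    return min(refund_counter, gas_used // max_refund_quotient)


def billed_gas(gas_used: int, refund_counter: int, max_refund_quotient: int = 2) -> int:
    """Gas the sender finally pays for, applied only after execution has finished."""
    return gas_used - effective_refund(gas_used, refund_counter, max_refund_quotient)


# 50000 gas used, 15000 on the refund counter: the cap (25000) is not reached.
print(billed_gas(50_000, 15_000))
# 20000 gas used, 15000 on the refund counter: the refund is capped at 10000.
print(billed_gas(20_000, 15_000))
```

Because the refund is subtracted only at the end, an execution that runs out of gas midway gets no benefit from refunds it accumulated earlier.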

There are currently more than 140 OpCodes. Each is encoded as a single byte (opcodes combined with their data form the bytecode), except for the PUSH
instructions, which additionally carry immediate data. That data is not taken from the stack; it is fixed when the code is written. The ones you usually see are PUSH1 and PUSH2.
The number after the instruction name is the byte length of the data it carries.
Currently the PUSH instructions range from PUSH1 to PUSH32. The arguments required by each OpCode are taken from the stack (stack input), and the results are pushed onto the stack (stack output).

Since each OpCode must be encoded in a single byte, at most 256 instructions can be defined (a single byte ranges over 0~255 in decimal, or 0x00~0xFF in hexadecimal, i.e. at most 256 distinct values).

1.9 Gas consumption

As an incentive for providing the resources for transaction execution, a certain amount of ETH is paid to miners. This amount is determined by two factors: the gas price and the amount of work required to complete the transaction.

Gas costs come in two kinds: fixed and dynamic. Fixed costs are set by the Ethereum platform for certain operations; for example, a simple transfer transaction costs a fixed 21000 gas. The upper bound of what a transaction can cost is calculated with the following formula:

gas_price * gas_limit = total max gas costs

Both values are set by the transaction initiator. gas_price is the price in ETH of 1 unit of gas, for example gas_price=10wei (wei
is a denomination of ether, which is not covered in detail here). gas_limit is the maximum amount of gas that the initiator is willing to pay for executing the transaction.
Not all of this gas is necessarily consumed; the unconsumed gas is returned to the initiator after the transaction completes. If the gas limit is not enough to complete the transaction, the transaction fails and the gas already consumed is not refunded.
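
A short worked example of the formula, written as a Python sketch. settle_fee is a hypothetical name, and the numbers (10 wei per gas, a limit of 30000, 21000 gas actually used) are chosen only for illustration.

```python
def settle_fee(gas_price_wei: int, gas_limit: int, gas_used: int):
    """Return (maximum fee, fee actually paid, amount returned to the sender), all in wei."""
    if gas_used > gas_limit:
        raise ValueError("execution cannot use more gas than the limit")
    max_fee = gas_price_wei * gas_limit      # gas_price * gas_limit = total max gas costs
    paid = gas_price_wei * gas_used          # only consumed gas is charged
    return max_fee, paid, max_fee - paid     # the rest goes back to the sender


# A simple transfer: limit 30000, but only the fixed 21000 gas is consumed.
print(settle_fee(10, 30_000, 21_000))
# Out of gas: the whole limit is consumed, nothing is returned.
print(settle_fee(10, 30_000, 30_000))
```

The second call models the failure case from the text: when the limit is exhausted, the sender still pays for all of it.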

The gas cost of a transaction that carries contract bytecode is composed of: 21000 gas for the transaction itself + the gas cost of the executed OpCodes.

The gas cost of an OpCode is again of two kinds: a fixed cost, and a dynamic cost that usually depends on the size or number of the arguments the instruction requires. Details can be looked up on evm.codes.

1.10 Contract Execution Process

The general process by which the EVM executes a contract is as follows:

  • Each instruction executed in the EVM is called an OpCode (operation code);
  • When a contract is executed, the EVM decodes the compiled bytecode into opcodes and the data they operate on, and then executes them;
  • First, the opcode at the position given by the program counter (PC, similar to a register) is read from the contract bytecode, and the corresponding operation function, gas cost and other information are looked up in the JumpTable (an array of length 256);
    then gas is deducted (for an opcode with a dynamic gas cost, the cost is calculated first). If there is enough gas, the opcode is executed; if not, all remaining gas is consumed and the effects of the instructions already executed are rolled back. (Depending on the instruction, it may operate on the stack, memory or StateDB.)
  • Then, once the opcode has executed successfully, the program counter is incremented and the next iteration begins (the next instruction is executed).

The actual execution happens in a for loop. The function that geth uses to execute contracts, EVMInterpreter.Run(), is pasted here as is; please read it together with the comments:

// This is the core method the EVM uses to execute a contract
func (in *EVMInterpreter) Run(contract *Contract, input []byte, readOnly bool) (ret []byte, err error) {
  // part of the code omitted

  // Initialize the variables needed to execute the contract, including the stack, memory and pc objects
  var (
      op          OpCode // the opcode currently being executed; it changes on every iteration of the for loop below
      mem = NewMemory()  // bound memory; internally a struct that wraps a []byte
      stack = newstack() // local stack; internally a []uint256 array
      callContext = &ScopeContext{ // the execution context that belongs to the current contract
        Memory:   mem,
        Stack:    stack,
        Contract: contract,
      }
      // For optimisation reason we're using uint64 as the program counter.
      // It's theoretically possible to go above 2^64. The YP defines the PC
      // to be uint256. Practically much less so feasible.
      pc = uint64(0) // program counter
      cost uint64
      // copies used by tracer
      pcCopy  uint64 // needed for the deferred EVMLogger
      gasCopy uint64 // for EVMLogger to log gas remaining before execution
      logged  bool   // deferred EVMLogger should ignore already logged steps
      res     []byte // result of the opcode execution function (return value of the current call)
  )

  for {
      if in.cfg.Debug {
          // Capture pre-execution values for tracing.
          logged, pcCopy, gasCopy = false, pc, contract.Gas
      }
      // Get the operation from the jump table and validate the stack to ensure there are
      // enough stack items available to perform the operation.
      // Read the next opcode to execute at the program counter; it is actually a byte
      op = contract.GetOp(pc)
      operation := in.cfg.JumpTable[op]  // look up the matching operation object in the array
      cost = operation.constantGas // For tracing (the constant gas cost of the operation)
      // Validate stack (check up front that the stack holds enough elements for this operation)
      if sLen := stack.len(); sLen < operation.minStack {
          return nil, &ErrStackUnderflow{stackLen: sLen, required: operation.minStack}
      } else if sLen > operation.maxStack {
          return nil, &ErrStackOverflow{stackLen: sLen, limit: operation.maxStack}
      }
      // Deduct the constant gas part
      if !contract.UseGas(cost) {
          return nil, ErrOutOfGas
      }
      // Does this operation have a dynamic gas cost (determined by the length or number of its arguments)?
      if operation.dynamicGas != nil {
          // All ops with a dynamic memory usage also has a dynamic gas cost.
          var memorySize uint64
          // calculate the new memory size and expand the memory to fit
          // the operation
          // Memory check needs to be done prior to evaluating the dynamic gas portion,
          // to detect calculation overflows
          if operation.memorySize != nil {
              memSize, overflow := operation.memorySize(stack)
              if overflow {
                  return nil, ErrGasUintOverflow
              }
              // memory is expanded in words of 32 bytes. Gas
              // is also calculated in words.
              if memorySize, overflow = math.SafeMul(toWordSize(memSize), 32); overflow {
                  return nil, ErrGasUintOverflow
              }
          }
          // Consume the gas and return an error if not enough gas is available.
          // cost is explicitly set so that the capture state defer method can get the proper cost
          // Calculate the dynamic gas cost
          var dynamicCost uint64
          dynamicCost, err = operation.dynamicGas(in.evm, contract, stack, mem, memorySize)
          cost += dynamicCost // for tracing
          // Deduct the dynamic gas part
          if err != nil || !contract.UseGas(dynamicCost) {
              return nil, ErrOutOfGas
          }
          // Do tracing before memory expansion
          if in.cfg.Debug {
              in.cfg.Tracer.CaptureState(pc, op, gasCopy, cost, callContext, in.returnData, in.evm.depth, err)
              logged = true
          }
          if memorySize > 0 {
              mem.Resize(memorySize)
          }
      } else if in.cfg.Debug {
          in.cfg.Tracer.CaptureState(pc, op, gasCopy, cost, callContext, in.returnData, in.evm.depth, err)
          logged = true
      }
      // execute the operation (pc is the program counter, in is the EVM engine that holds the ledger DB object, callContext holds the stack and memory objects)
      res, err = operation.execute(&pc, in, callContext)
      if err != nil {
          break
      }
      pc++ // increment the counter to prepare for the next instruction (note that pc may already have changed while the operation executed: if the operation read data from the bytecode, the counter has already moved right by the length of that data)
  }
}
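
For readers who find the full geth loop hard to follow, the Python sketch below keeps only its skeleton: fetch the opcode at pc, look it up in a jump table, validate the stack, deduct the constant gas, execute, increment pc. It is not geth's implementation: the table holds just PUSH1 and ADD (both cost 3 gas), dynamic gas, memory and tracing are left out, STOP is handled inline, and all names are hypothetical.

```python
class OutOfGas(Exception):
    pass


class StackUnderflow(Exception):
    pass


def op_push1(code, pc, stack):
    stack.append(code[pc + 1])
    return pc + 1            # skip the data byte; the loop adds the usual +1


def op_add(code, pc, stack):
    stack.append((stack.pop() + stack.pop()) % 2**256)
    return pc


# opcode byte -> (constant gas, minimum stack items, handler)
JUMP_TABLE = {
    0x60: (3, 0, op_push1),
    0x01: (3, 2, op_add),
}


def run(code: bytes, gas: int):
    """Return (final stack, remaining gas); raise on stack underflow or out of gas."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op == 0x00:                        # STOP ends execution
            break
        constant_gas, min_stack, execute = JUMP_TABLE[op]
        if len(stack) < min_stack:            # validate the stack before executing
            raise StackUnderflow(f"opcode 0x{op:02x} needs {min_stack} items")
        if gas < constant_gas:                # deduct the constant gas part
            raise OutOfGas()
        gas -= constant_gas
        pc = execute(code, pc, stack)         # the handler may move pc itself
        pc += 1                               # then advance to the next instruction
    return stack, gas


# PUSH1 1, PUSH1 2, ADD
print(run(bytes.fromhex("6001600201"), gas=100))
```

As in geth, the handler receives the program counter and may change it (here PUSH1 skips its data byte) before the loop applies the final increment.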

2. Detailed explanation of the process

The Solidity code we write can be compiled into the corresponding assembly code with Remix or with a local compiler such as solc or solcjs, and then converted into the pure hexadecimal code that the machine executes.

  1. In Remix, the assembly code and bytecode can be viewed under [Solidity Compiler --> Compile Details];
  2. You can also install the solc compiler locally and use it to compile the code into assembly code and bytecode.

To install solc, the full-featured C++ implementation (recommended), follow the official installation guide.
To install solcjs, which supports a subset of the features, run: npm install -g solc

The following simple piece of code serves as the example:

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Example {
  address _owner;
  uint abc = 0;
  constructor() {
    _owner = msg.sender;
  }
  function set_val(uint _value) public {
    abc = _value;
  }
}

This is human-readable Solidity code that implements custom logic. To be executed by a machine, it must be compiled into low-level assembly code (also called opcodes) and then converted into hexadecimal code that the machine executes.
Assembly code can be regarded as the form of code closest to the CPU execution layer. Through it we can see more clearly how the Solidity code actually behaves at the assembly level, for example which assembly instructions a function uses, which is very helpful for troubleshooting,
especially in the debugging phase. The following converts the Solidity code into a compact opcode sequence:

// First install the solc compiler locally
// solc -o learn_bytecode --opcodes 0x00_learn_bytecode.sol
// This generates the file learn_bytecode/Example.opcode
PUSH1 0x80 PUSH1 0x40 MSTORE ... (omitted)

The opcode sequence consists entirely of EVM instructions and data, arranged linearly.

Take the first part of the Example contract's opcode sequence as an example: PUSH1 0x80 PUSH1 0x40 MSTORE

  • First of all, opcodes are not bytecode: opcodes are still readable, whereas bytecode is a string of hexadecimal characters, such as 6080604052, that cannot be read directly. Each opcode can be converted into one byte.
  • PUSH1 0x80 PUSH1 0x40 pushes the 1-byte value 0x80 onto the stack, followed by 0x40 (keep in mind that a stack element is at most 32 bytes, or 256 bits)
  • The MSTORE instruction saves a value to the temporary memory of the EVM. It takes 2 arguments: the first is the memory address at which to store the value, and the second is the value to store. Note that, as specified, the instruction takes its arguments
    from the stack (rather than from external input), so the logic here is MSTORE 0x40 0x80 (store the value 0x80 at address 0x40)
  • For the meaning of the other instructions, see the instruction set table described below
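
The stack order in this three-instruction prefix is easy to get backwards, so here is a small Python replay of it. It is a sketch only: run_push_push_mstore is a hypothetical name and memory is pre-sized to 0x60 bytes instead of being expanded on demand.

```python
def run_push_push_mstore() -> bytearray:
    """Replay PUSH1 0x80, PUSH1 0x40, MSTORE and return the resulting memory."""
    stack = []
    memory = bytearray(0x60)         # room for the 32-byte word written at 0x40

    stack.append(0x80)               # PUSH1 0x80
    stack.append(0x40)               # PUSH1 0x40, now on top of the stack

    offset = stack.pop()             # MSTORE pops the address first (0x40) ...
    value = stack.pop()              # ... and then the value (0x80)
    memory[offset : offset + 32] = value.to_bytes(32, "big")
    return memory


memory = run_push_push_mstore()
print(memory[0x40:0x60].hex())
```

The value is written as a full 32-byte big-endian word, so the byte 0x80 ends up at the last position of the word (offset 0x5f), not at 0x40 itself.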

The opcode sequence is hard to read side by side with the source code, so we generate assembly code that is annotated line by line:

// solc -o learn_bytecode --asm 0x00_learn_bytecode.sol   generates learn_bytecode/Example.evm
  /* "0x00_learn_bytecode.sol":57:241  contract Example {... */
  mstore(0x40, 0x80)
  /* "0x00_learn_bytecode.sol":111:112  0 */
  0x00
  /* "0x00_learn_bytecode.sol":100:112  uint abc = 0 */
  0x01
  sstore
  /* "0x00_learn_bytecode.sol":118:168  constructor() {... */
  callvalue
  ... (omitted)
  dataOffset(sub_0)
  0x00
  codecopy
  0x00
  return
stop

sub_0 : assembly {
  ...
  auxdata : 0xa264697066735822122035b90a279bfd69292250dbe6e9f45c70ac30c03c0f50b99a887b24d9b292edce64736f6c63430008110033
}

This code is divided into two parts, with sub_0: assembly as the boundary: the code above it is the deployment code, and the code inside the sub_0 block is the runtime code.
With the comments in the code above, the assembly code is much easier to read against the Solidity source.
The auxdata field at the end is a CBOR-encoded binary value that describes the contract metadata, such as the Solidity version. For details, refer to the Metadata documentation.
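
As a hedged illustration of what the auxdata holds, the Python sketch below extracts the compiler version from the auxdata value shown above. It relies on two assumptions about the metadata layout: the last two bytes give the length of the CBOR payload, and the payload contains the key "solc" followed by a 3-byte byte string with the version. It searches for that key directly instead of using a CBOR parser, and solc_version is a hypothetical name.

```python
# auxdata of the Example contract, copied from the assembly output above
AUXDATA = bytes.fromhex(
    "a264697066735822122035b90a279bfd69292250dbe6e9f45c70ac30c03c0f50"
    "b99a887b24d9b292edce64736f6c63430008110033"
)


def solc_version(auxdata: bytes) -> str:
    """Extract the compiler version from CBOR-encoded contract metadata."""
    cbor_len = int.from_bytes(auxdata[-2:], "big")   # trailing 2 bytes: payload length
    payload = auxdata[-2 - cbor_len : -2]
    key = b"\x64solc"                                # CBOR text string of length 4: "solc"
    idx = payload.rfind(key)
    if idx < 0:
        raise ValueError("no solc entry found")
    header = payload[idx + len(key)]
    if header >> 5 != 2:                             # CBOR major type 2 = byte string
        raise ValueError("unexpected encoding of the version")
    n = header & 0x1F                                # length of the byte string
    start = idx + len(key) + 1
    return ".".join(str(b) for b in payload[start : start + n])


print(solc_version(AUXDATA))
```

The other entry in the payload is keyed "ipfs" and holds a hash of the metadata file; the sketch ignores it.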

2.1 About deploying code

As the name suggests, it executes when deployed. It mainly performs three tasks:

  1. Payable check, if the contract constructor does not declare payable, injecting ether during deployment will cause the deployment to fail;
  2. Run the constructor and initialize the declared state variables;
  3. Calculate the bytecode of the runtime code and return it to the EVM and store it in StateDB (key is the contract address);

2.2 runtime code

It is executed when an external call is received after deployment. Although the runtime code is only returned once the deployment code has executed, it is produced by fixed logic, so the solc compiler can generate it directly.

2.3 Final Bytecode

The final bytecode is the data carried in the data field of the transaction that deploys the contract. It is a long string of hexadecimal characters and is generated as follows:

//solc -o learn_bytecode --bin 0x00_learn_bytecode.sol
6080604052600060015534801561001557600080fd5b50336000806101000a81548173ffffffffffffffffffffffffffffffffffffffff021916908373ffffffffffffffffffffffffffffffffffffffff16021790555060e3806100646000396000f3fe6080604052348015600f57600080fd5b506004361060285760003560e01c80634edd148314602d575b600080fd5b60436004803603810190603f91906085565b6045565b005b8060018190555050565b600080fd5b6000819050919050565b6065816054565b8114606f57600080fd5b50565b600081359050607f81605e565b92915050565b6000602082840312156098576097604f565b5b600060a4848285016072565b9150509291505056fea264697066735822122035b90a279bfd69292250dbe6e9f45c70ac30c03c0f50b99a887b24d9b292edce64736f6c63430008110033

The final bytecode is also generated according to a fixed format: deployment code + runtime code + auxdata.

Among them, runtime code + auxdata can be generated as follows:

//solc -o learn_bytecode --bin-runtime 0x00_learn_bytecode.sol
6080604052348015600f57600080fd5b506004361060285760003560e01c80634edd148314602d575b600080fd5b60436004803603810190603f91906085565b6045565b005b8060018190555050565b600080fd5b6000819050919050565b6065816054565b8114606f57600080fd5b50565b600081359050607f81605e565b92915050565b6000602082840312156098576097604f565b5b600060a4848285016072565b9150509291505056fea264697066735822122035b90a279bfd69292250dbe6e9f45c70ac30c03c0f50b99a887b24d9b292edce64736f6c63430008110033

2.4 Instruction set designed for EVM

Ethereum's developers designed a dedicated instruction set for the EVM: the OpCodes introduced briefly above. This section covers them in more depth.

Instruction-set code consists of instructions from the assembly instruction set supported by a code execution engine, together with the data they operate on. Different execution engines generally support different instruction sets, and an instruction set can be defined by the engine's developers. The EVM execution engine runs on OpCodes,
so OpCodes are also called the EVM instruction set. The complete EVM instruction set table can be consulted for reference.

An excerpt of the table is shown in the following figure:

[Figure: ethereum_opcodes_example.jpg, an excerpt of the EVM opcode table]

For ease of reading, the same content in tabular form:

uint8 (bytecode) | Mnemonic (instruction name) | StackInput (stack elements the instruction requires) | StackOutput (elements written to the stack after execution) | Expression | Notes
00 | STOP | none | none | STOP() | Stops contract execution
01 | ADD | top of stack / a / b / bottom of stack | a+b | a+b | Adds the top two int256 or uint256 elements of the stack

Note that the elements in the StackInput column of the table are listed starting from the top of the stack. For example, the StackInput of the subtraction instruction SUB
is /a/b/ in the table, so consider the following code:

// Interactive EVM playground: https://www.evm.codes/playground
PUSH1 1 // represents b
PUSH1 2 // represents a
SUB     // subtraction: a - b

The output of this instruction is 1 instead of FFFFFF... (the uint256 form of -1).
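
The operand order can be checked with a few lines of Python. The sketch models SUB as popping a (the top of the stack) and then b, and computing a - b modulo 2^256; evm_sub is a hypothetical helper name.

```python
UINT256_MOD = 2**256


def evm_sub(stack: list) -> None:
    """SUB: pop a (top of stack), pop b, push (a - b) mod 2**256."""
    a = stack.pop()
    b = stack.pop()
    stack.append((a - b) % UINT256_MOD)


stack = [1, 2]          # PUSH1 1 then PUSH1 2: the value 2 is on top, so a = 2, b = 1
evm_sub(stack)
print(hex(stack[0]))

swapped = [2, 1]        # reversed push order: a = 1, b = 2
evm_sub(swapped)
print(hex(swapped[0]))
```

With the pushes reversed, the result wraps around to 2^256 - 1, which is the FFFFFF... value mentioned above.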

The EVM currently has more than 140 instructions in total. Note that the number of arguments of some instructions is not fixed. For simplicity, all opcodes can be divided into the following categories (with representative opcodes listed):

  • Stack manipulation opcodes (POP, PUSH, DUP, SWAP)
  • Arithmetic/comparison/bitwise opcodes (ADD, SUB, GT, LT, AND, OR)
  • Environment opcodes (CALLER, CALLVALUE, NUMBER)
  • Memory operation opcodes (MLOAD, MSTORE, MSTORE8, MSIZE)
  • Storage operation opcodes (SLOAD, SSTORE)
  • Program counter related opcodes (JUMP, JUMPI, PC, JUMPDEST)
  • Stop opcodes (STOP, RETURN, REVERT, INVALID, SELFDESTRUCT)

Readers can look up the instruction set table and the gas consumed by each instruction on evm.codes.
The site also offers an online playground for programming with opcodes and for converting between opcodes, bytecode and Solidity code in real time.

The most accurate list of the instructions supported by Ethereum has to
be looked up in the Geth source code; the version referred to here is the instruction-set Go code of Geth v1.10.26.

2.5 Explain the above assembly instructions in detail

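    /* "0x00_learn_bytecode.sol":57:241  contract Example {... */
mstore(0x40, 0x80)   // Store the value 0x80 at position 0x40 in Memory. Position 0x40 holds the free-memory pointer, so this reserves the first 0x80 bytes of Memory and lets later allocations start at 0x80; in decimal that is 8x16^1+0x16^0=128 bytes
/* "0x00_learn_bytecode.sol":111:112  0 */
0x00 // push 0x00 (PUSH omitted)
/* "0x00_learn_bytecode.sol":100:112  uint abc = 0 */
0x01 // push 0x01 (PUSH omitted)
sstore // store the value 0x00 at position 0x01 in storage, i.e. storage[0x01] = 0x00
/* "0x00_learn_bytecode.sol":118:168  constructor() {... */
callvalue // push the amount of ether sent with this call, 0 if none (creating a contract and calling a contract are both message calls, and both can carry ether)
dup1      // duplicate the top of the stack, i.e. the amount of ether sent with this call. Stack now: [top, 0, 0], assuming 0 ether was sent.
iszero    // pop the top value and check whether it is 0; push 1 if so, otherwise push 0. stack: [top, 1, 0]
tag_1     // push tag_1, stack: [top, tag_1, 1, 0]. Note: tag_1 is only assembly syntax, not an EVM instruction; it usually marks the start of a function and is essentially an offset into the opcode sequence.
jumpi     // pop the top 2 elements (tag_1, then 1); if the element second from the top is non-zero, jump to tag_1, otherwise continue downward. Here the jump to tag_1 is clearly taken, leaving stack: [top, 0]
0x00      // reached only if the jump is not taken (ether was sent); push 0x00, stack: [top, 0, callvalue]
dup1      // duplicate the top element, stack: [top, 0, 0, callvalue]
revert    // revert: this aborts execution and rolls back all previous logic.
// Section note: the part from callvalue to revert means that sending ether to the contract address causes execution to revert (because the constructor is not marked payable); in short, the payable check.

tag_1:    // the logic inside constructor() starts here, i.e. _owner = msg.sender
pop   // pop and discard the top stack element
/* "0x00_learn_bytecode.sol":151:161  msg.sender */
caller // push msg.sender
/* "0x00_learn_bytecode.sol":142:148  _owner */
0x00   // push 0x00
dup1   // duplicate 0x00
/* "0x00_learn_bytecode.sol":142:161  _owner = msg.sender */
0x0100 // push 0x0100
exp    // exponentiation: pops the top 2 elements, 0x0100^0x00 = 1. Stack now: [top, 1, 0, msg.sender]
dup2   // duplicate the element just below the top, stack: [top, 0, 1, 0, msg.sender]
sload  // load from storage the value whose key is the top element and push it, stack: [top, storage[0x00], 1, 0, msg.sender]
dup2   // stack: [top, 1, storage[0x00], 1, 0, msg.sender]
0xffffffffffffffffffffffffffffffffffffffff // stack: [top, 0xffffffffffffffffffffffffffffffffffffffff, 1, storage[0x00], 1, 0, msg.sender]
mul   // the following instructions are fairly simple and are not annotated
not
and
swap1
dup4
0xffffffffffffffffffffffffffffffffffffffff
and
mul
or
swap1
sstore
pop
/* "0x00_learn_bytecode.sol":57:241  contract Example {... */
dataSize(sub_0)  // not an instruction; it can be read as shorthand for "PUSH dataSize(sub_0)", which pushes the size of the sub_0 code block below onto the stack
dup1      // duplicate the size value on top of the stack
dataOffset(sub_0) // like dataSize, this can be read as shorthand for "PUSH dataOffset(sub_0)", which pushes the offset of the sub_0 code block below onto the stack
0x00      // push 0x00. Assuming size is 0x36 and offset is 0x1c, stack: [top, 0, 0x1c, 0x36, 0x36]
codecopy  // codecopy(destOffset, offset, size): this instruction consumes three stack elements and copies size (0x36) bytes from position offset (0x1c) of the code area to position destOffset (0x00) of Memory
0x00      // push 0x00, stack: [top, 0, 0x36]
return // return ends the current call; it consumes 2 stack elements and returns the data in Memory at offset 0x00 with size 0x36
stop // stop execution
// Section note: everything from tag_1 to stop is constructor logic; its main work is initializing the state variable (_owner = msg.sender) and copying and returning the code of the sub_0 block.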

sub_0 : assembly {
/* "0x00_learn_bytecode.sol":57:241  contract Example {... */
mstore(0x40, 0x80)
callvalue
dup1
iszero
tag_1
jumpi
0x00
dup1
revert
tag_1 :
pop
jumpi(tag_2, lt(calldatasize, 0x04))
shr(0xe0, calldataload(0x00))
dup1
0x4edd1483
eq
tag_3
jumpi
tag_2 :
0x00
dup1
revert
/* "0x00_learn_bytecode.sol":173:239  function set_val(uint _value) public {... */
tag_3 :
tag_4
0x04
dup1
calldatasize
sub
dup2
add
swap1
tag_5
swap2
swap1
tag_6
jump    // in
tag_5 :
tag_7
jump    // in
tag_4 :
stop
tag_7:
/* "0x00_learn_bytecode.sol":226:232  _value */
dup1
/* "0x00_learn_bytecode.sol":220:223  abc */
0x01
/* "0x00_learn_bytecode.sol":220:232  abc = _value */
dup2
swap1
sstore
pop
/* "0x00_learn_bytecode.sol":173:239  function set_val(uint _value) public {... */
pop
jump    // out
/* "#utility.yul":88:205   */
tag_10:
/* "#utility.yul":197:198   */
... (a long section of utility.yul code omitted)
auxdata : 0xa264697066735822122035b90a279bfd69292250dbe6e9f45c70ac30c03c0f50b99a887b24d9b292edce64736f6c63430008110033
}
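
The un-annotated mul / not / and / ... / or / sstore run in the constructor above is a masked write of a 20-byte address into storage slot 0. The Python sketch below expresses the same arithmetic; write_address_to_slot and byte_offset are hypothetical names, and byte_offset = 0 corresponds to the 0x0100 ^ 0x00 = 1 factor computed by exp.

```python
ADDRESS_MASK = (1 << 160) - 1     # 0xffffffffffffffffffffffffffffffffffffffff
UINT256_MASK = (1 << 256) - 1


def write_address_to_slot(old_slot: int, sender: int, byte_offset: int = 0) -> int:
    """Replace the 20 bytes at byte_offset in a 32-byte slot, keeping the other bits."""
    shift = 0x0100 ** byte_offset                                    # exp
    cleared = old_slot & (~(ADDRESS_MASK * shift) & UINT256_MASK)    # mul, not, and
    return cleared | ((sender & ADDRESS_MASK) * shift)               # and, mul, or


# An empty slot simply receives the address.
print(hex(write_address_to_slot(0, 0x1234)))
# Bits above the low 160 bits (here 0xAB) survive the write.
print(hex(write_address_to_slot((0xAB << 160) | 0xFFFF, 0x1234)))
```

The compiler emits this read-modify-write pattern because an address fills only 20 of the 32 bytes in a slot, and the remaining bytes may hold other packed variables.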

Alternatively, if you have the Ethereum client geth installed, you can use its evm command to convert the bytecode file learn_bytecode/Example.bin into readable opcodes listed by offset:

lei@WilldeMacBook-Pro test_solidity % evm disasm learn_bytecode/Example.bin
6080604052600060015534801561001557600080fd5b50336000806101000a81548173ffffffffffffffffffffffffffffffffffffffff021916908373ffffffffffffffffffffffffffffffffffffffff16021790555060e3806100646000396000f3fe6080604052348015600f57600080fd5b506004361060285760003560e01c80634edd148314602d575b600080fd5b60436004803603810190603f91906085565b6045565b005b8060018190555050565b600080fd5b6000819050919050565b6065816054565b8114606f57600080fd5b50565b600081359050607f81605e565b92915050565b6000602082840312156098576097604f565b5b600060a4848285016072565b9150509291505056fea264697066735822122035b90a279bfd69292250dbe6e9f45c70ac30c03c0f50b99a887b24d9b292edce64736f6c63430008110033
00000: PUSH1 0x80
00002: PUSH1 0x40
00004: MSTORE
00005: PUSH1 0x00
00007: PUSH1 0x01
00009: SSTORE
0000a: CALLVALUE
0000b: DUP1
0000c: ISZERO
0000d: PUSH2 0x0015
00010: JUMPI
00011: PUSH1 0x00
00013: DUP1
00014: REVERT
00015: JUMPDEST
00016: POP
00017: CALLER
00018: PUSH1 0x00
0001a: DUP1
0001b: PUSH2 0x0100
0001e: EXP
0001f: DUP2
00020: SLOAD
00021: DUP2
00022: PUSH20 0xffffffffffffffffffffffffffffffffffffffff
00037: MUL
00038: NOT
00039: AND
0003a: SWAP1
0003b: DUP4
0003c: PUSH20 0xffffffffffffffffffffffffffffffffffffffff
00051: AND
00052: MUL
00053: OR
00054: SWAP1
00055: SSTORE
00056: POP
00057: PUSH1 0xe3
00059: DUP1
0005a: PUSH2 0x0064
0005d: PUSH1 0x00
0005f: CODECOPY
00060: PUSH1 0x00
00062: RETURN
00063: INVALID
00064: PUSH1 0x80
00066: PUSH1 0x40
00068: MSTORE
00069: CALLVALUE
0006a: DUP1
0006b: ISZERO
0006c: PUSH1 0x0f
0006e: JUMPI
0006f: PUSH1 0x00
00071: DUP1
00072: REVERT
00073: JUMPDEST
00074: POP
00075: PUSH1 0x04
00077: CALLDATASIZE
00078: LT
00079: PUSH1 0x28
0007b: JUMPI
0007c: PUSH1 0x00
0007e: CALLDATALOAD
0007f: PUSH1 0xe0
00081: SHR
00082: DUP1
00083: PUSH4 0x4edd1483
00088: EQ
00089: PUSH1 0x2d
0008b: JUMPI
0008c: JUMPDEST
0008d: PUSH1 0x00
0008f: DUP1
00090: REVERT
00091: JUMPDEST
00092: PUSH1 0x43
00094: PUSH1 0x04
00096: DUP1
00097: CALLDATASIZE
00098: SUB
00099: DUP2
0009a: ADD
0009b: SWAP1
0009c: PUSH1 0x3f
0009e: SWAP2
0009f: SWAP1
000a0: PUSH1 0x85
000a2: JUMP
000a3: JUMPDEST
000a4: PUSH1 0x45
000a6: JUMP
000a7: JUMPDEST
000a8: STOP
000a9: JUMPDEST
000aa: DUP1
000ab: PUSH1 0x01
000ad: DUP2
000ae: SWAP1
000af: SSTORE
000b0: POP
000b1: POP
000b2: JUMP
000b3: JUMPDEST
000b4: PUSH1 0x00
000b6: DUP1
000b7: REVERT
000b8: JUMPDEST
000b9: PUSH1 0x00
000bb: DUP2
000bc: SWAP1
000bd: POP
000be: SWAP2
000bf: SWAP1
000c0: POP
000c1: JUMP
000c2: JUMPDEST
000c3: PUSH1 0x65
000c5: DUP2
000c6: PUSH1 0x54
000c8: JUMP
000c9: JUMPDEST
000ca: DUP2
000cb: EQ
000cc: PUSH1 0x6f
000ce: JUMPI
000cf: PUSH1 0x00
000d1: DUP1
000d2: REVERT
000d3: JUMPDEST
000d4: POP
000d5: JUMP
000d6: JUMPDEST
000d7: PUSH1 0x00
000d9: DUP2
000da: CALLDATALOAD
000db: SWAP1
000dc: POP
000dd: PUSH1 0x7f
000df: DUP2
000e0: PUSH1 0x5e
000e2: JUMP
000e3: JUMPDEST
000e4: SWAP3
000e5: SWAP2
000e6: POP
000e7: POP
000e8: JUMP
000e9: JUMPDEST
000ea: PUSH1 0x00
000ec: PUSH1 0x20
000ee: DUP3
000ef: DUP5
000f0: SUB
000f1: SLT
000f2: ISZERO
000f3: PUSH1 0x98
000f5: JUMPI
000f6: PUSH1 0x97
000f8: PUSH1 0x4f
000fa: JUMP
000fb: JUMPDEST
000fc: JUMPDEST
000fd: PUSH1 0x00
000ff: PUSH1 0xa4
00101: DUP5
00102: DUP3
00103: DUP6
00104: ADD
00105: PUSH1 0x72
00107: JUMP
00108: JUMPDEST
00109: SWAP2
0010a: POP
0010b: POP
0010c: SWAP3
0010d: SWAP2
0010e: POP
0010f: POP
00110: JUMP
00111: INVALID
00112: LOG2
00113: PUSH5 0x6970667358
00119: opcode 0x22 not defined
0011a: SLT
0011b: KECCAK256
0011c: CALLDATALOAD
0011d: opcode 0xb9 not defined
0011e: EXP
0011f: opcode 0x27 not defined
00120: SWAP12
00121: REVERT
00122: PUSH10 0x292250dbe6e9f45c70ac
0012d: ADDRESS
0012e: opcode 0xc0 not defined
0012f: EXTCODECOPY
00130: opcode 0xf not defined
00131: POP
00132: opcode 0xb9 not defined
00133: SWAP11
00134: DUP9
incomplete push instruction at 309

In the opcode sequence above, the hexadecimal number on the left is the offset, within the code area, of the instruction on its right. The offset and dest arguments of instructions such as CODECOPY and JUMP
all refer to offsets of instructions in this code area.
The EVM executes the opcode sequence in order from top to bottom, and jumps or stops when it meets an instruction such as JUMP or RETURN. For example, the first time the sequence above is executed, it runs at most up to the RETURN instruction at offset 00062.
This is because the part of the sequence up to that point is, strictly speaking, deployment code, which is executed only during deployment. On deployment, this code returns the opcode sequence that follows the RETURN at 00062 (the runtime code, starting at offset 00064), which the EVM saves and executes when the contract is called externally.


Origin blog.csdn.net/sc_lilei/article/details/129251495