Notes | CPython | How Python programs run

Python is an interpreted language, as opposed to a compiled one, although the distinction is blurred by the existence of the bytecode compiler. The source code is not compiled to machine code ahead of time; instead, CPython compiles it to bytecode, which the Python virtual machine then executes. This means that source files can be run directly, without explicitly creating an executable first.

Source 1: Python Documentation > glossary > interpreted

In a nutshell, running a Python script can be summarized in the following two steps:

  1. Python compiler: compiles the Python source code to bytecode
  2. Python virtual machine: executes the bytecode one instruction at a time
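The two steps can be made visible from within Python itself. The following is a minimal sketch, assuming a small made-up source string and the placeholder file name "<example>"; the built-in compile performs the first step and exec hands the resulting code object to the virtual machine for the second.

```python
# Step 1: the compiler turns source text into a code object holding bytecode.
source = "GOLD = 0.618\nresult = GOLD * 3\n"
code_obj = compile(source, "<example>", "exec")

# Step 2: the virtual machine executes that bytecode; exec() hands it over.
namespace = {}
exec(code_obj, namespace)

print(type(code_obj).__name__)   # name of the code object's type
print(namespace["result"])       # value computed by the executed bytecode
```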

How Python code works

Next, we take a script containing a function that computes the golden section as an example, and look at how a Python script is compiled to bytecode and how that bytecode is run:

GOLD = 0.618

def get_golden_ratio(x):
    """Compute the golden section value."""
    return GOLD * x

print(get_golden_ratio(3))

We first store the above Python code in a string, script, and then compile it with the built-in function compile to get a code object, code.

>>> script = ("GOLD = 0.618\n"
...           "def get_golden_ratio(x):\n"
...           "    return GOLD * x\n"
...           "print(get_golden_ratio(3))")
>>> code = compile(script, "test.py", "exec")
>>> code
<code object <module> at 0x000002BCD0AE2290, file "test.py", line 1>
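The third argument of compile selects the compilation mode. As a small illustrative sketch (the source strings and the "<exec-demo>" and "<eval-demo>" file names are made up for this example), "exec" compiles a sequence of statements, while "eval" compiles a single expression whose value can be retrieved:

```python
# "exec" mode: a sequence of statements; running it yields no value.
module_code = compile("GOLD = 0.618\nvalue = GOLD * 3\n", "<exec-demo>", "exec")
# "eval" mode: a single expression; running it yields the expression's value.
expr_code = compile("0.618 * 3", "<eval-demo>", "eval")

module_ns = {}
exec_result = exec(module_code, module_ns)  # statements run inside module_ns
eval_result = eval(expr_code)               # the expression's value is returned

print(exec_result, module_ns["value"], eval_result)
```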

The commonly used attributes of code objects and their meanings are as follows:

Attribute name   Meaning
co_filename      Name of the file in which the code object was created
co_firstlineno   Number of the first line in the Python source code
co_name          Name with which the code object was defined
co_code          Raw compiled bytecode (a bytes object in Python 3)
co_consts        Tuple of constants used in the bytecode
co_varnames      Tuple of names of arguments and local variables
co_names         Tuple of names other than arguments and function locals
co_cellvars      Tuple of names of cell variables (referenced by containing scopes)
co_freevars      Tuple of names of free variables (referenced via a function's closure)
co_stacksize     Virtual machine stack space required

Source 2: Python Documentation > inspect
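To see several of these attributes on real objects, here is a short sketch. It reuses the script from this article; make_scaler and scale are hypothetical helper functions added only to produce a closure, since the example script has no cell or free variables.

```python
script = ("GOLD = 0.618\n"
          "def get_golden_ratio(x):\n"
          "    return GOLD * x\n"
          "print(get_golden_ratio(3))")
code = compile(script, "test.py", "exec")  # compiled only, not executed

# Print a few attributes of the module-level code object.
for attr in ("co_filename", "co_firstlineno", "co_name",
             "co_names", "co_varnames", "co_stacksize"):
    print(f"{attr:15} {getattr(code, attr)!r}")

# Hypothetical closure, used only to show co_cellvars and co_freevars.
def make_scaler(factor):
    def scale(x):
        return factor * x
    return scale

scale = make_scaler(0.618)
print(make_scaler.__code__.co_cellvars)  # names captured by an inner function
print(scale.__code__.co_freevars)        # names taken from the enclosing scope
```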

For example, we can view the bytecode of this code object (based on Python 3.10) through the co_code attribute:

>>> code.co_code
b'd\x00Z\x00d\x01d\x02\x84\x00Z\x01e\x02e\x01d\x03\x83\x01\x83\x01\x01\x00d\x04S\x00'
>>> [ch for ch in code.co_code]
[100, 0, 90, 0, 100, 1, 100, 2, 132, 0, 90, 1, 101, 2, 101, 1, 100, 3, 131, 1, 131, 1, 1, 0, 100, 4, 83, 0]

In Python 3.6 and above, each bytecode instruction occupies 2 bytes (that is, every 2 integers between 0 and 255 in the list above make up one instruction): the first byte is the opcode and the second byte is its argument. If an instruction takes no argument, the second byte is a placeholder (0 in the listing above). Note that bytecode is an implementation detail of the CPython interpreter, and there is no guarantee that bytecodes will not be added, removed, or changed between Python versions.

Source 3: Python Documentation > glossary > bytecode

Source 4: Python Documentation > dis
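The 2-byte layout can be checked by splitting co_code into (opcode, argument) pairs and translating each opcode with dis.opname. This is only a sketch: the exact instructions depend on the interpreter version, and on Python 3.11 and later the listing may also contain extra entries such as RESUME or CACHE that do not appear in the 3.10 output shown in this article.

```python
import dis

code = compile("GOLD = 0.618\n", "test.py", "exec")
raw = code.co_code

# Every instruction is one opcode byte followed by one argument byte.
pairs = [(raw[i], raw[i + 1]) for i in range(0, len(raw), 2)]

for offset, (opcode, arg) in zip(range(0, len(raw), 2), pairs):
    print(offset, dis.opname[opcode], arg)
```

Arguments that do not fit into one byte are handled by CPython with a preceding EXTENDED_ARG instruction, which this sketch simply prints like any other instruction.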

You can also view all the names used by this code through the co_names attribute, or all the constants used by this code through the co_consts attribute:

>>> code.co_names
('GOLD', 'get_golden_ratio', 'print')
>>> code.co_consts
(0.618, <code object get_golden_ratio at 0x000002BCD0AE3E10, file "test.py", line 2>, 'get_golden_ratio', 3, None)

As the constants show, the function get_golden_ratio is compiled into a separate code object, which the module's code object references as a constant. We can therefore go one step further and view the bytecode of the get_golden_ratio function (based on Python 3.10):

>>> code.co_consts[1].co_code
b't\x00|\x00\x14\x00S\x00'
>>> [ch for ch in code.co_consts[1].co_code]
[116, 0, 124, 0, 20, 0, 83, 0]
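The index 1 used above is specific to this script and interpreter version. A slightly more robust sketch, assuming only that nested functions appear as code objects inside co_consts, filters the constants by type instead:

```python
import types

script = ("GOLD = 0.618\n"
          "def get_golden_ratio(x):\n"
          "    return GOLD * x\n"
          "print(get_golden_ratio(3))")
code = compile(script, "test.py", "exec")

# Collect every constant that is itself a code object (one per nested function).
nested = [const for const in code.co_consts if isinstance(const, types.CodeType)]

for func_code in nested:
    print(func_code.co_name, func_code.co_varnames, func_code.co_names)
```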

The Python virtual machine is a fully software-defined computer that executes the bytecode generated by the bytecode compiler.

Source 5: Python Documentation > glossary > virtual machine

Besides reading the raw bytecode through the co_code attribute, we can use the standard library module dis to disassemble the Python code above and analyze how the Python virtual machine executes the bytecode (based on Python 3.10):

>>> import dis
>>> dis.dis(script)
  1           0 LOAD_CONST               0 (0.618)
              2 STORE_NAME               0 (GOLD)
  2           4 LOAD_CONST               1 (<code object get_golden_ratio at 0x000002BCBEECFAA0, file "<dis>", line 2>)
              6 LOAD_CONST               2 ('get_golden_ratio')
              8 MAKE_FUNCTION            0
             10 STORE_NAME               1 (get_golden_ratio)
  4          12 LOAD_NAME                2 (print)
             14 LOAD_NAME                1 (get_golden_ratio)
             16 LOAD_CONST               3 (3)
             18 CALL_FUNCTION            1
             20 CALL_FUNCTION            1
             22 POP_TOP
             24 LOAD_CONST               4 (None)
             26 RETURN_VALUE
Disassembly of <code object get_golden_ratio at 0x000002BCBEECFAA0, file "<dis>", line 2>:
  3           0 LOAD_GLOBAL              0 (GOLD)
              2 LOAD_FAST                0 (x)
              4 BINARY_MULTIPLY
              6 RETURN_VALUE
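When the disassembly needs to be processed rather than read, dis.get_instructions yields one Instruction object per bytecode instruction. The sketch below collects the module-level instructions of the same script; the instruction names and offsets it prints will differ from the 3.10 listing above on other Python versions.

```python
import dis

script = ("GOLD = 0.618\n"
          "def get_golden_ratio(x):\n"
          "    return GOLD * x\n"
          "print(get_golden_ratio(3))")
code = compile(script, "test.py", "exec")

# One tuple per instruction: byte offset, mnemonic, raw argument, readable argument.
rows = [(ins.offset, ins.opname, ins.arg, ins.argrepr)
        for ins in dis.get_instructions(code)]

for offset, opname, arg, argrepr in rows:
    print(f"{offset:4} {opname:20} {arg!s:6} {argrepr}")
```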

Working from the bottom of the listing upward, in order of reference, we first explain the bytecode for line 3 (return GOLD * x):

  • LOAD_GLOBAL(116), 0: Take the 0th name in co_names of the current code object (GOLD), look it up in the global namespace, and push a reference to its value onto the top of the stack; the stack now holds 1 element;
  • LOAD_FAST(124), 0: Push a reference to the local variable named by the 0th entry of co_varnames in the current code object (x) onto the top of the stack; the stack now holds 2 elements;
  • BINARY_MULTIPLY(20), 0: Pop the top two elements of the stack (the references to x and GOLD), apply the * operator to them, and push the result onto the top of the stack; the stack now holds 1 element;
  • RETURN_VALUE(83), 0: Pop the top element of the stack and return it to the caller; the stack is now empty, and line 3 ends.
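The four steps above can be replayed with a toy stack machine. This is a simplified sketch, not CPython's evaluation loop: run_function_body is a hypothetical helper, instructions are given as (name, argument) tuples taken from the 3.10 listing, and the namespaces are plain dictionaries.

```python
def run_function_body(instructions, co_names, co_varnames, global_ns, local_ns):
    """Replay a tiny subset of 3.10 opcodes and record the stack depth after each."""
    stack, depths = [], []
    for opname, arg in instructions:
        if opname == "LOAD_GLOBAL":
            stack.append(global_ns[co_names[arg]])      # look the name up in globals
        elif opname == "LOAD_FAST":
            stack.append(local_ns[co_varnames[arg]])    # read a local variable
        elif opname == "BINARY_MULTIPLY":
            right = stack.pop()                         # top of stack: x
            left = stack.pop()                          # next element: GOLD
            stack.append(left * right)
        elif opname == "RETURN_VALUE":
            value = stack.pop()
            depths.append(len(stack))
            return value, depths
        else:
            raise ValueError(f"unsupported opcode: {opname}")
        depths.append(len(stack))
    raise RuntimeError("no RETURN_VALUE reached")

body = [("LOAD_GLOBAL", 0), ("LOAD_FAST", 0),
        ("BINARY_MULTIPLY", 0), ("RETURN_VALUE", 0)]
value, depths = run_function_body(body, ("GOLD",), ("x",), {"GOLD": 0.618}, {"x": 3})
print(value, depths)
```

The recorded depths follow the element counts given in the list: 1, 2, 1, and finally 0.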

The bytecode for line 1 (GOLD = 0.618) means the following:

  • LOAD_CONST(100), 0: Push a reference to the 0th value in co_consts of the current code object (0.618) onto the top of the stack; the stack now holds 1 element;
  • STORE_NAME(90), 0: Pop the top element of the stack (the reference to 0.618) and bind it to the 0th name in co_names of the current code object (GOLD); the stack is now empty, and line 1 ends.

The bytecode for line 2 (def get_golden_ratio(x):) means the following:

  • LOAD_CONST(100), 1: Push a reference to the value at index 1 in co_consts of the current code object (the code object of get_golden_ratio) onto the top of the stack; the stack now holds 1 element;
  • LOAD_CONST(100), 2: Push a reference to the value at index 2 in co_consts of the current code object (the string "get_golden_ratio") onto the top of the stack; the stack now holds 2 elements;
  • MAKE_FUNCTION(132), 0: Pop the top element of the stack (the reference to "get_golden_ratio") as the name of the function; then pop the next element (the reference to the code object of get_golden_ratio) as the code associated with the function; construct a new function object and push it onto the top of the stack; the stack now holds 1 element;
  • STORE_NAME(90), 1: Pop the top element of the stack (the newly constructed get_golden_ratio function object) and bind it to the name at index 1 in co_names of the current code object (get_golden_ratio); the stack is now empty, and line 2 ends.
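What MAKE_FUNCTION does, pairing a code object with a name and a global namespace, can be approximated from Python with types.FunctionType. The following is an illustrative sketch only: func_globals is a hand-built dictionary standing in for the module namespace, and the source omits the print call so that nothing runs at compile time.

```python
import types

source = ("GOLD = 0.618\n"
          "def get_golden_ratio(x):\n"
          "    return GOLD * x\n")
module_code = compile(source, "test.py", "exec")

# The function body is stored as a code object among the module's constants.
func_code = next(const for const in module_code.co_consts
                 if isinstance(const, types.CodeType))

# Build a function object by hand: code + globals + name.
func_globals = {"GOLD": 0.618}  # stand-in for the module's global namespace
func = types.FunctionType(func_code, func_globals, "get_golden_ratio")

print(func.__name__, func(3))
```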

The bytecode for line 4 (print(get_golden_ratio(3))) means the following:

  • LOAD_NAME(101), 2: Look up the name at index 2 in co_names of the current code object (print) and push a reference to the object it refers to (the built-in function print) onto the top of the stack; the stack now holds 1 element;
  • LOAD_NAME(101), 1: Look up the name at index 1 in co_names of the current code object (get_golden_ratio) and push a reference to the object it refers to (the get_golden_ratio function object) onto the top of the stack; the stack now holds 2 elements;
  • LOAD_CONST(100), 3: Push a reference to the value at index 3 in co_consts of the current code object (the integer 3) onto the top of the stack; the stack now holds 3 elements;
  • CALL_FUNCTION(131), 1: Pop one element from the top of the stack (the reference to the integer 3) as the argument of the call; then pop the next element (the get_golden_ratio function object) as the callable; call the callable with the argument, and push its return value (the return value of get_golden_ratio) onto the top of the stack; the stack now holds 2 elements;
  • CALL_FUNCTION(131), 1: Pop one element from the top of the stack (the reference to the return value of get_golden_ratio) as the argument of the call; then pop the next element (the built-in function print) as the callable; call the callable with the argument, and push its return value (None, the return value of print) onto the top of the stack; the stack now holds 1 element;
  • POP_TOP(1), 0: Pop the top element of the stack and discard it; the stack is now empty;
  • LOAD_CONST(100), 4: Push a reference to the value at index 4 in co_consts of the current code object (None) onto the top of the stack; the stack now holds 1 element;
  • RETURN_VALUE(83), 0: Pop the top element of the stack and return it to the caller; the stack is now empty, and line 4 (and with it the module's code) ends.
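The whole module-level sequence (lines 1, 2, and 4) can be replayed with a toy interpreter to check the element counts given above. This is a heavily simplified sketch, not how CPython is implemented: run_module is a hypothetical helper, the instruction tuples are copied from the 3.10 listing, and the function's code object is replaced by a stand-in lambda that receives the module namespace explicitly.

```python
import builtins

def run_module(instructions, co_consts, co_names):
    """Replay a small subset of 3.10 opcodes; return result, namespace, depths."""
    stack, namespace, depths = [], {}, []
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(co_consts[arg])
        elif opname == "STORE_NAME":
            namespace[co_names[arg]] = stack.pop()
        elif opname == "LOAD_NAME":
            name = co_names[arg]
            # Module namespace first, then the built-ins.
            stack.append(namespace[name] if name in namespace
                         else getattr(builtins, name))
        elif opname == "MAKE_FUNCTION":
            func_name = stack.pop()   # top of stack: the function's name
            body = stack.pop()        # next: the stand-in for its code object
            def function(*args, _body=body):
                return _body(namespace, *args)
            function.__name__ = func_name
            stack.append(function)
        elif opname == "CALL_FUNCTION":
            args = [stack.pop() for _ in range(arg)][::-1]
            callee = stack.pop()
            stack.append(callee(*args))
        elif opname == "POP_TOP":
            stack.pop()
        elif opname == "RETURN_VALUE":
            result = stack.pop()
            depths.append(len(stack))
            return result, namespace, depths
        else:
            raise ValueError(f"unsupported opcode: {opname}")
        depths.append(len(stack))
    raise RuntimeError("no RETURN_VALUE reached")

# Stand-in for the nested code object: computes GOLD * x from the namespace.
consts = (0.618, lambda ns, x: ns["GOLD"] * x, "get_golden_ratio", 3, None)
names = ("GOLD", "get_golden_ratio", "print")
program = [("LOAD_CONST", 0), ("STORE_NAME", 0),
           ("LOAD_CONST", 1), ("LOAD_CONST", 2),
           ("MAKE_FUNCTION", 0), ("STORE_NAME", 1),
           ("LOAD_NAME", 2), ("LOAD_NAME", 1), ("LOAD_CONST", 3),
           ("CALL_FUNCTION", 1), ("CALL_FUNCTION", 1), ("POP_TOP", 0),
           ("LOAD_CONST", 4), ("RETURN_VALUE", 0)]

result, module_ns, depths = run_module(program, consts, names)
print(result, depths)
```

Running the sketch calls the real print once with the computed value, and the recorded depths match the per-instruction element counts listed for lines 1, 2, and 4.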

Origin blog.csdn.net/Changxing_J/article/details/129779161