Write a CPU in 40 lines of Python code!

Table of contents

1. Introduction

2. Composition of the CPU

3. Working principle

4. Detailed analysis of how CPU instructions work

5. Implementing the CPU components in Python

6. Integrating the CPU

7. Programming the CPU: experiencing the workflow of early programmers

8. Summary


1. Introduction

How does a CPU work? For beginners this is a hazy question. We may know terms such as program counter, RAM, and register, yet still lack a clear overall picture of what each part does and how the whole system works together.

This article implements a minimal CPU in forty lines of Python. It is programmable and supports addition and subtraction, memory reads and writes, unconditional jumps, and conditional jumps. The CPU is deliberately simple, so that you can grasp how a CPU works as a whole without getting bogged down in details too early.

A real CPU is made by etching transistors (bipolar transistors or MOSFETs) onto a silicon wafer. These transistors act as switches, and addition, subtraction, and storage are all realized by opening and closing them. I previously wrote a Python simulation of a CPU similar to the one in this article, starting from individual switches. Here, we simulate the CPU at a higher level instead: the code models the large components, so that the working principle is easy to follow.

2. Composition of the CPU

The CPU consists of a few large components, as shown in the figure below:

[Figure: Harvard architecture CPU]

This CPU uses the Harvard architecture, which is relatively simple to understand and to implement. The figure above contains an instruction RAM and a separate data RAM; having two RAMs is the hallmark of the Harvard architecture.

The two common CPU architectures are the Harvard architecture and the von Neumann architecture. The Harvard architecture stores program instructions and data in separate memories. The von Neumann architecture, also known as the Princeton architecture, stores program instructions and data in one combined memory.

In short, the main difference is that the Harvard architecture keeps the program and the data in two RAMs, while the von Neumann architecture keeps both in a single RAM.

3. Working principle

Let's walk through how each part works individually, and then how they work together.

3.1 Working principle of each component

Each component in the figure above corresponds to a physical circuit in a real CPU. Their functions are:

  • The program counter pc outputs 0, 1, 2, ..., counting up from 0. It can be cleared, or it can be loaded with a number from outside and continue counting from that number; this is called setting. The pc indicates which location of the program and data is accessed.

  • RAM, random access memory, stores the program and data. It reads data at an address (an integer such as 0x01) and writes data to an address when the write signal w is 1.

  • A register is a memory that holds 8 bits of information. When the w signal is 1 it writes (stores) the current data; when w is 0 it only reads (outputs) the stored value. It is similar to RAM but holds only 8 bits. Registers typically hold instructions, addresses, and intermediate results of calculations.

  • The adder adds or subtracts two numbers: when sub is 1 it subtracts, and when ci is 1 it adds a carry. It is the core of the ALU (arithmetic logic unit). In a real CPU the adder is built from logic gates, and the ALU also contains multipliers, logic operation units, and so on.

  • The 2-to-1 selector (21 selector) works like a single-pole double-throw switch: the s21 signal decides whether its 8-bit output comes from the left or the right 8-bit input.
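The article keeps pc as a plain integer inside cpu(), but the counter described above can also be modeled as a component in the same closure style as register(). The sketch below is an assumption, not part of the original design: counter, clear, set_ and data are hypothetical names, and each call models one clock tick that outputs the current count and then advances (or is cleared, or is set).

```python
def counter():
    """Hypothetical program counter component with count, clear and set."""
    value = 0

    def counter_inner(clear=0, set_=0, data=0):
        nonlocal value
        current = value      # output the current count on this tick
        if clear == 1:
            value = 0        # clear: restart from 0 on the next tick
        elif set_ == 1:
            value = data     # set: continue counting from data
        else:
            value += 1       # normal operation: count up by one
        return current
    return counter_inner


pc = counter()
print([pc(), pc(), pc()])    # [0, 1, 2]
pc(set_=1, data=6)           # outputs 3, then loads 6 (a jump)
print(pc())                  # 6
```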

3.2 How the components work together

The arrows in the figure indicate the direction of data flow, so the figure is also called a data path diagram. From it, we can analyze how the CPU is designed.

The data path starts at the program counter pc, which outputs 0, 1, 2, 3, 4, ... starting from 0. The program code and the data are stored in the instruction RAM and the data RAM respectively, and each RAM accesses its contents by numeric address. At counter value 0, 1, 2, and so on, the contents of the two RAMs at that address are loaded into the instruction register IR and the data register DR respectively. Registers act like containers or variables that hold the data RAM gives them.

The instruction in the instruction register is decoded into the CPU's control signals. Each 0 or 1 stands for a low or high voltage level, and these levels control, for example, whether the adder takes a carry, whether subtraction is enabled, whether a register is written, which input the 21 selector passes through, and whether the counter is reset. Instructions, therefore, are really electrical signals that coordinate the CPU's components.

The data in the data register either goes through the adder for addition or subtraction and then to the 21 selector, or goes directly to the 21 selector. After the selection, the data reaches the accumulator register AC, which is written only when the ac signal is at high level 1. The value in AC then takes part in the next calculation, or is stored in the data RAM when the w signal is 1.

At this point one calculation is complete: the program counter increments by 1 and the next calculation begins. If the instruction is a jump, the jump target address is loaded directly into the program counter, and execution continues from the new address.

Below we use a practical example to illustrate the CPU execution process.

4. Detailed analysis of how CPU instructions work

A complete program consists of instructions and data. Instructions control how the CPU components cooperate, while data take part in the actual calculations. The sample program computes 10+2-3, then repeatedly subtracts 3 until the result is 0, at which point the program ends. The instruction RAM is ramc and the data RAM is ramd.

ramc = [0x18, 0x19, 0x1d, 0x02, 0x31, 0x30, 0x00]
ramd = [10, 2, 3, 0xff, 0x06, 0x02]

Let's learn how instructions control the CPU.

4.1 Instructions

As described above, the program counter starts at 0. After the CPU completes an operation, the counter increments by 1 to fetch the next instruction. Let's take the first instruction, 0x18, fetched when pc is 0, as an example of how an instruction is decoded and then controls each device of the CPU.

0x18 is 0b0001 1000 in binary. Each bit is wired to the enable terminal of a device; an enable terminal is the switch that makes that device operate. Here the two 1s are high levels connected to the write-enable terminals w of the data register DR and the accumulator register AC. So when the CPU reads 0x18, it writes the data currently on the data path into both DR and AC.

The mapping from each bit to the enable terminal of each device is what defines the instructions. The wiring used in this article is shown below, in the state produced by instruction 0x18.

[Figure: control signals set by instruction 0x18]

By analogy, to obtain any function, set the bits wired to the corresponding devices' enable signals to high level, that is, to 1. This yields the following instruction set.

[Figure: CPU instruction set]
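The bit wiring can be recovered from the decoding line in the cpu() code later in the article, which unpacks the low six bits of the instruction, from most to least significant, as pre, dr, ac, sub, w and s21. The sketch below (decode, encode, SIGNALS and INSTRUCTION_SET are hypothetical names) builds each opcode by setting the enable bits that the walkthrough describes, and prints the resulting table; it reproduces 0x18, 0x19, 0x1d, 0x02, 0x31, 0x30 and 0x00.

```python
# Control signals in bit order 5..0, as unpacked in the article's cpu() code.
SIGNALS = ("pre", "dr", "ac", "sub", "w", "s21")


def decode(instruction):
    """Split an instruction byte into its six control signals."""
    return {name: (instruction >> (5 - i)) & 1 for i, name in enumerate(SIGNALS)}


def encode(*enabled):
    """Build an instruction byte by setting the named enable signals to 1."""
    value = 0
    for name in enabled:
        value |= 1 << (5 - SIGNALS.index(name))
    return value


INSTRUCTION_SET = {
    "Load": encode("dr", "ac"),                # DR and AC both latch the data
    "Add": encode("dr", "ac", "s21"),          # AC = AC + DR
    "Sub": encode("dr", "ac", "sub", "s21"),   # AC = AC - DR
    "Store": encode("w"),                      # data RAM = AC
    "Jz": encode("pre", "dr", "s21"),          # jump to DR if AC is zero
    "Jmp": encode("pre", "dr"),                # always jump to DR
    "Hlt": encode(),                           # all signals low: stop
}

for name, code in INSTRUCTION_SET.items():
    print(f"{name:5} 0x{code:02x} 0b{code:08b}")
```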

Let's analyze the specific execution process with the above example.

4.2 Analysis of instruction execution process

Let's power it on; pc starts counting from 0.

Instruction 0x18

  • When the program counter is 0, address 0 of the instruction RAM and of the data RAM is accessed, and 0x18 and 10 are stored in the instruction register and the data register respectively.

    • Instruction 0x18, binary 0b0001 1000, is the Load instruction: it sets the write-enable terminals w of the DR and AC registers to 1.

    • The data 10 is stored in both DR and AC.

This completes loading 10.

Instruction 0x19

  • When pc is 1, address 1 of the instruction RAM and data RAM is accessed, and 0x19 and 2 are stored in the instruction register and data register respectively.

    • Instruction 0x19, binary 0b0001 1001, is the Add instruction. It means: DR stores the data; the 21 selector passes through the adder's result; the result is stored in AC.

    • The data 2 is stored in DR, added to 10 (the AC value from the previous step), and the sum 12 is stored in AC.

This completes 10+2.

Instruction 0x1d

  • When pc is 2, address 2 of the instruction RAM and data RAM is accessed, and 0x1d and 3 are stored in the instruction register and data register respectively.

    • Instruction 0x1d, binary 0b0001 1101, is the Sub instruction. It means: DR stores the data; the adder's subtraction is enabled; the 21 selector passes through the adder's result; the result is stored in AC.

    • The data 3 is stored in DR; the adder computes 12 (the AC value from the previous step) minus 3, and the difference 9 is stored in AC.

This completes 12-3.

Instruction 0x02

  • When pc is 3, address 3 of the instruction RAM and data RAM is accessed, and 0x02 is stored in the instruction register.

    • Instruction 0x02, binary 0b0000 0010, is the Store instruction. Its w signal is 1, which enables writing to the data RAM, so the 9 in AC is stored at address 3 of the data RAM.

    • The data at this address is 0xff, but because dr is 0, the data register does not store it.

This completes writing 9 to data RAM address 3.

Instruction 0x31

  • When pc is 4, address 4 of the instruction RAM and data RAM is accessed, and 0x31 is stored in the instruction register.

    • Instruction 0x31, binary 0b0011 0001, is the Jz (jump if zero) instruction: whether the pc counter is set depends on whether AC is zero and whether the program counter set signal pre is 1.

    • The data 0x06 is stored in DR as the jump target.

    • If pre is 1 and AC is 0, 0x06 is written into the pc counter, and execution jumps to address 6, the final Hlt instruction, which stops the program.

    • If AC is not zero (or pre is not 1), execution continues with pc+1, that is, pc becomes 5.

This completes jumping to different locations depending on whether the result is zero.

Instruction 0x30

  • When pc is 5, address 5 of the instruction RAM and data RAM is accessed, and 0x30 is stored in the instruction register.

    • Instruction 0x30, binary 0b0011 0000, is the unconditional jump instruction Jmp: it always sets the pc counter.

    • The data 0x02 is stored in DR, and the pre signal loads it into the pc counter.

    • Address 0x02 holds instruction 0x1d, the subtraction instruction, so pc now points to 0x02 and the -3 operation is performed again.

This completes continuing with -3.

Instruction 0x00

  • When pc is 6, address 6 of the instruction RAM and data RAM is accessed, and 0x00 is stored in the instruction register.

    • Instruction 0x00, binary 0b0000 0000, is the Hlt instruction, which stops the CPU.

    • The data for this instruction is unused.

    • Execution can only reach this address through the jump taken at pc 4.

Once you understand the execution process in this example, you essentially understand the basic working principle of a CPU. Next, let's implement these devices in Python.

5. Implementing the CPU components in Python

5.1 RAM memory

We use a Python list to store data, which is a simple and direct design.

ramc = [0x18, 0x19, 0x1d, 0x02, 0x31, 0x30, 0x00]

Memory is read and written at the address given by the pc pointer: ramc[pc] = data writes to memory, and ramc[pc] on its own reads from it.

5.2 Adder
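If you want RAM to look like the other components, with an address input and a write signal w, it can be wrapped in a closure the same way register() is. This is a hypothetical sketch (ram is not part of the article's code); it copies the list so that writes do not change the caller's original program.

```python
def ram(contents):
    """Hypothetical RAM component: read at an address, write when w is 1."""
    cells = list(contents)  # private copy, so the caller's list is not modified

    def ram_inner(address=0, data=0, w=0):
        if w == 1:
            cells[address] = data  # write enable: store data at this address
        return cells[address]      # always output the value at this address
    return ram_inner


data_ram = ram([10, 2, 3, 0xff, 0x06, 0x02])
print(data_ram(0))          # read address 0: 10
data_ram(3, data=9, w=1)    # the Store step: write 9 to address 3
print(data_ram(3))          # 9
```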

def adder(a=0, b=0, ci=0, sub=0):
    return a-b+ci if sub == 1 else a+b+ci

A real adder is made of logic gates, which amount to many switches combined in a particular arrangement. Simulating it in a high-level language greatly simplifies the implementation. This adder adds a and b; ci is the carry input, and sub selects subtraction.

5.3 Register

The register uses a Python closure, in which a free variable remembers the register's last state. When we call AC = register(), AC is the returned inner function register_inner, and temp is a free variable captured in the same closure. Reads and writes of temp therefore persist between calls, which is how the register maintains its state.

Data is written only when the w signal is 1, which models the register's enable terminal w.
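To see what switches combined in a particular arrangement means, here is a sketch of an 8-bit ripple-carry adder built only from AND, OR and XOR on single bits. It is an illustration, not the article's implementation: full_adder and adder8 are hypothetical names, the ci input is left out for brevity, results wrap modulo 256 as in 8-bit hardware, and subtraction uses the standard two's-complement trick a - b = a + (NOT b) + 1.

```python
def full_adder(a, b, cin):
    """One-bit full adder made only of XOR, AND and OR."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def adder8(a=0, b=0, sub=0):
    """Hypothetical 8-bit ripple-carry adder/subtractor (wraps modulo 256)."""
    carry = sub                        # subtraction needs the extra +1
    result = 0
    for i in range(8):                 # from the least significant bit upward
        bit_a = (a >> i) & 1
        bit_b = ((b >> i) & 1) ^ sub   # XOR with sub inverts b when subtracting
        s, carry = full_adder(bit_a, bit_b, carry)
        result |= s << i
    return result                      # the final carry out is dropped


print(adder8(12, 3, sub=1))  # 9
print(adder8(0, 1, sub=1))   # 255
```

Unlike adder(), which can return negative numbers, adder8() returns 255 for 0 - 1, the 8-bit two's-complement form of -1.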

def register():
    temp = 0

    def register_inner(data=0, w=0):
        nonlocal temp
        if w == 1:
            temp = data
        return temp
    return register_inner
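For readers less familiar with closures, the same behavior can be written as a class whose instances are callable. This is an equivalent alternative, not the article's code (Register is a hypothetical name); self.temp plays the role of the free variable temp.

```python
class Register:
    """Hypothetical class-based equivalent of register()."""

    def __init__(self):
        self.temp = 0          # the stored 8-bit value

    def __call__(self, data=0, w=0):
        if w == 1:
            self.temp = data   # write enable: latch the new data
        return self.temp       # always output the stored value


AC, DR = Register(), Register()
AC(10, w=1)
DR(3, w=1)
print(AC(), DR())  # 10 3: each register keeps its own state
```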

How registers are designed is a big topic in real CPU design. Even in the simplified CPU model of a microcomputer principles course, it takes some thought to see how relays and transistors can remember a value. Because this article simulates the underlying hardware in a high-level language, we have to take a detour, which is why a solid understanding of closures is needed here.

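As a glimpse of how hardware remembers a bit without any variable, the sketch below simulates a gated D latch made of two cross-coupled NOR gates, and an 8-bit register made of eight such latches. It is a simplified model under stated assumptions: gates switch instantly, the feedback loop is iterated until the outputs stop changing, and nor, GatedDLatch and register8_step are hypothetical names.

```python
def nor(a, b):
    """NOR gate on single bits."""
    return 1 - (a | b)


class GatedDLatch:
    """One bit of memory: two cross-coupled NOR gates plus enable gating."""

    def __init__(self):
        self.q, self.q_bar = 0, 1

    def step(self, d, enable):
        s = d & enable           # set input, active only while enable is 1
        r = (1 - d) & enable     # reset input, active only while enable is 1
        for _ in range(4):       # let the feedback loop settle
            q = nor(r, self.q_bar)
            q_bar = nor(s, q)
            if (q, q_bar) == (self.q, self.q_bar):
                break            # outputs are stable
            self.q, self.q_bar = q, q_bar
        return self.q


def register8_step(latches, data, w):
    """Drive eight latches with the bits of data; w is the shared enable."""
    return sum(latch.step((data >> i) & 1, w) << i for i, latch in enumerate(latches))


bits = [GatedDLatch() for _ in range(8)]
print(register8_step(bits, 10, 1))  # 10: written
print(register8_step(bits, 99, 0))  # 10: w is 0, so the old value is held
```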

5.4 8-bit 21 selector

The 21 selector returns b when the sel input is 1 and returns a when sel is 0; that is, it selects one of its two inputs as the output.

def b8_21selector(a=0, b=0, sel=0):
    return a if sel == 0 else b
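In hardware, the selector is typically a small AND/OR/NOT circuit per bit: out = (a AND NOT sel) OR (b AND sel). The hypothetical function below applies that formula to all 8 bits at once with Python bitwise operators, and behaves like b8_21selector for 8-bit inputs.

```python
def b8_21selector_gates(a=0, b=0, sel=0):
    """Hypothetical gate-style 2-to-1 selector for 8-bit values."""
    mask = 0xFF if sel == 1 else 0x00   # copy sel onto every bit line
    not_mask = ~mask & 0xFF             # NOT sel, kept to 8 bits
    return (a & not_mask) | (b & mask)  # (a AND NOT sel) OR (b AND sel)


print(b8_21selector_gates(10, 12, 0))  # 10
print(b8_21selector_gates(10, 12, 1))  # 12
```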

6. Integrating the CPU

To integrate the CPU, we first create the components, then initialize them, and finally set pc to zero and start an infinite loop.

In each iteration, the instruction at address pc in the instruction RAM is written into the instruction register, the control signals are decoded from the instruction register, and the components then operate under those control signals.

First, if the instruction register IR holds the Hlt instruction, the loop breaks. Otherwise, the dr signal decides whether the data is written into the data register DR.

The adder works automatically: one input is the accumulator AC, the other is the data register DR, and the sub signal controls whether it subtracts.

The adder's result goes to the 21 selector together with the data coming directly from the data bus; the s21 signal selects one of them, and the ac signal decides whether it is stored in the accumulator AC for the next calculation.

zf serves as the zero flag: if the value in AC is zero, zf is 1. In that case, if pre is 1 and s21 is 1 (the Jz instruction), pc is set to the value in DR, which implements the jump when the result is zero. If pre is 1 and s21 is 0 (the Jmp instruction), pc is set to DR unconditionally. Otherwise, execution continues with the next instruction.
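The two pc assignments at the end of the loop in the integrated code can be hard to read together. As a sketch (next_pc is a hypothetical helper, not in the original), here is the same rule written as a single function.

```python
def next_pc(pc, dr_value, pre, s21, zf):
    """Hypothetical helper: next program counter value for one instruction."""
    if pre == 1 and s21 == 1 and zf:  # Jz: jump only when AC is zero
        return dr_value
    if pre == 1 and s21 == 0:         # Jmp: always jump
        return dr_value
    return pc + 1                     # every other case: next instruction


print(next_pc(4, 6, pre=1, s21=1, zf=False))  # 5: Jz not taken
print(next_pc(5, 2, pre=1, s21=0, zf=False))  # 2: Jmp always taken
```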

After overall integration, the code is as follows:

def adder(a=0, b=0, ci=0, sub=0):
    return a-b+ci if sub == 1 else a+b+ci
def b8_21selector(a=0, b=0, sel=0):
    return a if sel == 0 else b
def register():
    temp = 0
    def register_inner(data=0, w=0):
        nonlocal temp
        if w == 1:
            temp = data
        return temp
    return register_inner
def int2bin(data=0, length=8, tuple_=1, string=0, humanOrder=0):
    # Helper: convert an integer to a binary string or tuple (humanOrder=1 keeps the most significant bit first; tuple_ is unused).
    r = bin(data)[2:].zfill(length)
    r = r[::-1] if humanOrder == 0 else r
    return r if string == 1 else tuple(int(c) for c in r)
def cpu():
    pc = 0 # the pc counter starts at 0; the loop runs until HLT
    IR, DR, AC = register(), register(), register() # create the registers
    ramc = [0x18, 0x19, 0x1d, 0x02, 0x31, 0x30, 0x00] # initialize the program (instruction RAM)
    ramd = [10, 2, 3, 0xff, 0x06, 0x02] # initialize the data (data RAM)

    IR(0, w=1) # initialize the registers
    DR(0, w=1)
    AC(0, w=1)
    while True:
        IR(ramc[pc], w=1) # fetch: write the instruction into IR
        *_, pre, dr, ac, sub, w, s21 = int2bin(IR(), humanOrder=1) # decode: low six bits are pre, dr, ac, sub, w, s21
        if IR() == 0:
            break # HLT
        DR(ramd[pc], w=dr) # write the data into DR when dr is 1
        r = adder(AC(), DR(), sub=sub) # the adder computes automatically
        AC(b8_21selector(DR(), r, s21), w=ac) # selector picks DR or the adder result; AC is written when ac is 1
        ramd[pc] = AC() if w else ramd[pc] # when w is 1, store AC into data RAM
        zf = (AC() == 0) # zero flag
        pc = DR() if (pre == 1 and zf == True and s21 == 1) else pc + 1 # Jz: conditional jump
        pc = DR() if (pre == 1 and s21 == 0) else pc # Jmp: unconditional jump
        print(AC())
if __name__ == '__main__':
    cpu()

The output is 10, 12, 9, 9, 9, 9, 6, 6, 6, 6, 3, 3, 3, 3, 0, 0, 0 (one value per line): the program runs correctly, and the CPU works!
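Because cpu() hard-codes ramc and ramd and prints as it goes, it is awkward to reuse or test. The sketch below is a hypothetical refactoring, run(), that takes the program and data as arguments and returns the list of AC values instead of printing them. It inlines the components as plain integers and adds a max_steps limit, which the original does not have, so that a program that never halts still returns.

```python
def run(ramc, ramd, max_steps=1000):
    """Hypothetical refactoring of cpu(): return the AC value after each step."""
    ramd = list(ramd)                   # copy, because Store writes into data RAM
    pc, dr_reg, ac_reg = 0, 0, 0
    trace = []
    for _ in range(max_steps):          # safety limit, not in the original
        ir = ramc[pc]                   # fetch
        if ir == 0:                     # Hlt
            break
        # decode: low six bits, most significant first
        pre, dr, ac, sub, w, s21 = ((ir >> shift) & 1 for shift in range(5, -1, -1))
        if dr:
            dr_reg = ramd[pc]           # load the data register
        result = ac_reg - dr_reg if sub else ac_reg + dr_reg  # adder
        if ac:
            ac_reg = result if s21 else dr_reg  # 21 selector, then AC
        if w:
            ramd[pc] = ac_reg           # Store
        zf = ac_reg == 0                # zero flag
        if pre and (s21 == 0 or zf):    # Jmp always jumps, Jz only when zero
            pc = dr_reg
        else:
            pc += 1
        trace.append(ac_reg)
    return trace


print(run([0x18, 0x19, 0x1d, 0x02, 0x31, 0x30, 0x00], [10, 2, 3, 0xff, 0x06, 0x02]))
```

With this version, the countdown program in the next section can be run by passing its ramc and ramd directly, without editing the function.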

7. Programming the CPU: experiencing the workflow of early programmers

Let's write a program for our toy CPU that counts 5 down to 0 by subtracting 1: load 5, subtract 1, check whether the result is zero, jump to the stop instruction if it is, and otherwise jump back to the subtract-1 instruction.

The code and data are written separately; to run them, replace the ramc and ramd initialization lines in cpu() with:

    ramc = [0x18, 0x1d, 0x31, 0x30, 0x00]
    ramd = [5,    1,    0x04, 0x01]

Program output:

5,4,4,4,3,3,3,2,2,2,1,1,1,0,0

It works!
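Hand-assembling opcodes is error-prone, which is why assemblers appeared early. As a hypothetical sketch (OPCODES and assemble are illustrative names; the opcode values are the ones used in this article), the countdown program can be written with mnemonics and turned into the two RAM lists. Each instruction is paired with its data word, so the Hlt line adds a fifth data entry, 0, which the original ramd leaves out.

```python
OPCODES = {"LOAD": 0x18, "ADD": 0x19, "SUB": 0x1d, "STORE": 0x02,
           "JZ": 0x31, "JMP": 0x30, "HLT": 0x00}


def assemble(lines):
    """Turn (mnemonic, data) pairs into instruction RAM and data RAM lists."""
    ramc, ramd = [], []
    for mnemonic, data in lines:
        ramc.append(OPCODES[mnemonic])
        ramd.append(data)
    return ramc, ramd


program = [
    ("LOAD", 5),    # 0: AC = 5
    ("SUB", 1),     # 1: AC = AC - 1
    ("JZ", 0x04),   # 2: if AC == 0, jump to address 4
    ("JMP", 0x01),  # 3: otherwise jump back to address 1
    ("HLT", 0),     # 4: stop (data unused)
]
ramc, ramd = assemble(program)
print([hex(c) for c in ramc], ramd)
```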

Written in binary, each of these values is 8 bits of information. Punch each one into paper tape as a row of 8 holes, and the program could run on an early computer. This is what the first generation of programmers did.
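As a playful illustration, the function below prints each byte as a row of eight holes: 'o' for a punched 1 and '.' for a blank 0. It is a simplification and an assumption: real paper-tape formats differed in channel count and also carried sprocket holes, and to_tape is a hypothetical name.

```python
def to_tape(program_bytes):
    """Hypothetical: render each byte as a row of 8 holes, 'o' = 1, '.' = 0."""
    return [format(byte, "08b").replace("1", "o").replace("0", ".")
            for byte in program_bytes]


for row in to_tape([0x18, 0x1d, 0x31, 0x30, 0x00]):
    print(row)  # first row: ...oo...
```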

8. Summary

To understand how a CPU works, the key point is that pc keeps incrementing the address so that program instructions execute in sequence; when a jump instruction is encountered, pc is set to the new address. At each step of this sequence, the CPU decodes the current instruction into control signals, which set the working state of every related device, produce the program's calculation results, and save them in registers or RAM.

From a macro perspective, a CPU reads data from memory, performs the calculation in the ALU, and writes the result back to memory, while the input/output system handles interaction with peripherals. From a meso (intermediate) perspective, it fetches each program instruction into the instruction register, decodes it, and uses it to control each component. From a micro perspective, every CPU component, including the program counter pc, the ALU, and the RAM, is ultimately made of transistors, which switch on or off under the action of current to carry out digital logic operations, hold memory states, and generate pulse signals.

This article builds and simulates a CPU at the meso level, implementing a simple toy CPU in 40 lines of Python that can add and subtract, read and write memory, jump, and jump conditionally. The article is fairly dry; thank you for reading!


Origin blog.csdn.net/weixin_69999177/article/details/128541944