Have you mastered these embedded-systems basics that novices often overlook?


Any sufficiently advanced technology is indistinguishable from magic.

- Arthur C. Clarke

Countless machines have been invented to solve problems. Embedded devices range from the computers aboard Mars rovers to the systems that run the navigation of nuclear submarines.

In 1945, von Neumann described the stored-program computing model, and almost all computers since, whether laptops or telephones, have followed the same working principle.

So do you understand how computers work? This article will discuss these:

◎ Understand the basics of computer architecture

◎ Choose a compiler to convert the code into instructions that the computer can execute

◎ Use the memory hierarchy to speed up access to data

After all, programming may look like magic to non-programmers, but we programmers know better.

Architecture

A computer is a machine that manipulates data according to instructions, and is mainly composed of a processor and a memory. Memory, also known as RAM (Random Access Memory), is used to store instructions and data that needs to be manipulated. The processor, also known as the CPU (Central Processing Unit), fetches instructions and data from memory and performs corresponding calculations. Next, we'll discuss how these two parts work.

Memory

Memory is divided into cells, each cell storing a small amount of data, identified by a numerical address. When reading or writing data in memory, one cell at a time is operated on.

To read or write to a particular memory cell, the numerical address of that cell must be found.

Since memory is an electrical component, cell addresses are transmitted over signal lines as binary numbers.

Binary numbers are represented in base 2 and work as follows:

[Figure: how base-2 (binary) numbers represent values]
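For instance, a string of binary digits can be decoded by summing powers of two. A quick sketch in Python (this is for illustration; it is not how hardware decodes numbers):

```python
# Each binary digit is a power of 2; summing the powers where a 1 appears
# gives the value of the number.

def from_binary(digits):
    value = 0
    for digit in digits:              # leftmost digit is most significant
        value = value * 2 + int(digit)
    return value

print(from_binary("101"))       # 4 + 1 = 5
print(from_binary("10000011"))  # 128 + 2 + 1 = 131
print(int("10000011", 2))       # Python's built-in conversion agrees: 131
```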

Each signal line transmits one bit, with a high voltage representing a signal "1" and a low voltage representing a signal "0", as shown in Figure 7-1.

[Figure 7-1: high and low voltages on signal lines representing the bits 1 and 0]

For a given cell address, memory can do two things: get its value or store a new value, as shown in Figure 7-2. The memory includes a special signal line used to set the operating mode.

[Figure 7-2: the two operating modes of memory: read and write]

Each memory cell usually stores an 8-bit binary number, which is called a byte. When set to "read" mode, the memory retrieves the bytes stored in the cell and outputs them over 8 data lines, as shown in Figure 7-3.

[Figure 7-3: reading a byte from a memory cell]

When set to "write" mode, the memory takes a byte from the data transfer line and writes it to the corresponding location, as shown in Figure 7-4.

[Figure 7-4: writing a byte to a memory cell]

A group of signal lines that operate together is called a bus. The 8 signal lines used to transmit addresses form the address bus, and another 8 signal lines used to transfer data to and from memory cells form the data bus. The address bus is unidirectional (it only carries addresses into the memory), while the data bus is bidirectional (data flows both in and out).

In all computers, the CPU and RAM are constantly exchanging data: the CPU is constantly fetching instructions and data from RAM, and occasionally storing outputs and some computations in RAM, as shown in Figure 7-5.

[Figure 7-5: the CPU and RAM constantly exchanging data]

CPU

The CPU contains several internal storage units called registers. It can perform simple mathematical operations on the numbers stored in these registers, and it can transfer data between RAM and the registers. Typical actions a CPU can be instructed to perform:

◎ Copy data from storage location 220 to register 3;

◎ Add the numbers in register 3 and register 1.

The set of all operations a CPU can perform is called its instruction set, and each operation in the instruction set is assigned a number. Computer code is essentially a sequence of numbers representing CPU operations; it is stored in RAM as numbers, alongside input/output data and intermediate results.

The code can even modify itself by including instructions in RAM that rewrite parts of the code, a common way for computer viruses to evade detection by antivirus software. Similarly, biological viruses evade the host's immune system by altering their own DNA.

Figure 7-6, taken from the Intel 4004 Operation Manual, shows how some CPU instructions are mapped to numbers. As the manufacturing process evolves, the CPU supports more and more operations. Modern CPUs have extremely large instruction sets, but the most important ones existed decades ago.

[Figure 7-6: mapping of some Intel 4004 CPU instructions to numbers, from the Intel 4004 Operation Manual]

The CPU operates in a never-ending cycle, constantly fetching instructions from memory and executing them. At the core of this cycle is the PC, the program counter: a special register that holds the memory address of the next instruction to be executed. The CPU's workflow is as follows:

(1) Get the instruction from the storage address specified by the PC;

(2) Increment the PC;

(3) Execute the instruction;

(4) Return to step 1.
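These four steps can be sketched in a few lines of Python. The toy instruction set below (LOAD_R1, ADD_R1, and so on) is invented purely for illustration, not taken from any real CPU:

```python
# A toy fetch-execute loop. Each instruction is a (name, argument) pair.

def run(program, registers):
    pc = 0                                   # the program counter
    while pc < len(program):
        op, arg = program[pc]                # 1. fetch instruction at PC
        pc += 1                              # 2. increment the PC
        if op == "LOAD_R1":                  # 3. execute the instruction
            registers["r1"] = arg
        elif op == "ADD_R1":
            registers["r1"] += arg
        elif op == "JUMP_IF_R1_ZERO":
            if registers["r1"] == 0:
                pc = arg                     # branching: overwrite the PC
        elif op == "HALT":
            break
    return registers                         # 4. loop back and fetch again

regs = run([
    ("LOAD_R1", 0),
    ("JUMP_IF_R1_ZERO", 3),  # r1 is 0, so jump over the next instruction
    ("ADD_R1", 100),         # never executed
    ("ADD_R1", 5),
    ("HALT", None),
], {"r1": 0})
print(regs["r1"])  # 5
```

Note how the conditional jump simply writes a new value into the PC, which is exactly the branching mechanism described below.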

The PC is reset to a default value when the CPU powers up: the address of the first instruction to be executed. That address usually points to an immutable built-in program responsible for loading the computer's basic functions.

In many personal computers, this program is called BIOS (Basic Input Output System).

This fetch-execute cycle begins when the CPU is powered up and continues until it is powered off. However, if the CPU could only follow an ordered, sequential list of operations, it would be no different from a fancy calculator. The magic of the CPU is that it can be instructed to write new values to the PC, branching execution by "jumping" to another location in memory. Such branching can be conditional. Take the following CPU instruction as an example: "If register 1 equals 0, set PC to address 200". This instruction is equivalent to:

if x = 0
    compute_this()
else
    compute_that()

That's all there is to it. Whether opening a website, playing a computer game, or editing a spreadsheet, the computation involved is the same: a series of simple operations that sum, compare, or move data in memory.

A large number of simple operations can be combined to express complex processes. Take the classic Space Invaders game, for example, whose code consists of about 3000 machine instructions.

[Figure: Space Invaders]

CPU Clock

Space Invaders was all the rage in the 1980s. The game ran on an arcade machine with a 2 MHz CPU. "2 MHz" is the CPU's clock: the number of basic operations it can perform per second. A CPU clocked at 2 million hertz (2 MHz) performs roughly 2 million basic operations per second. Completing a machine instruction takes 5 to 10 basic operations, so those old arcade machines ran hundreds of thousands of machine instructions per second.
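The arithmetic can be checked directly, assuming (as the text does) 5 to 10 basic operations per machine instruction:

```python
# Rough instructions-per-second estimate from clock speed and the
# approximate number of basic operations each machine instruction takes.

def instructions_per_second(clock_hz, ops_per_instruction):
    return clock_hz // ops_per_instruction

arcade = 2_000_000  # 2 MHz, like the Space Invaders arcade CPU
print(instructions_per_second(arcade, 10))  # 200000
print(instructions_per_second(arcade, 5))   # 400000
# i.e. hundreds of thousands of machine instructions per second
```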

With the advancement of modern technology, ordinary desktop computers and smartphones are usually equipped with 2 GHz CPUs, which can execute hundreds of millions of machine instructions per second. Today, multi-core CPUs have been put into large-scale applications, such as quad-core 2 GHz CPUs that can execute nearly 1 billion machine instructions per second. Going forward, CPUs may be equipped with more and more cores.

CPU Architecture

Have you ever wondered why PlayStation game discs don't work on desktop computers, or why iPhone apps won't run on a Mac? The reason is simple: their CPU architectures are different.

The x86 architecture is an industry standard, so the same code can execute on most personal computers. Mobile phones, however, use a different CPU architecture because of power-saving requirements. Different CPU architectures mean different instruction sets, and thus different ways of encoding instructions as numbers. Instructions for a desktop CPU are not valid instructions for a phone CPU, and vice versa.

32-Bit and 64-Bit Architectures

The first CPU was the Intel 4004, which used a 4-bit architecture: it could sum, compare, and move binary numbers of up to 4 bits in a single machine instruction. Its data bus and address bus were each only 4 signal lines wide.

Before long, 8-bit CPUs, which were used in early personal computers running DOS, became popular. In the 1980s and 1990s, the famous portable game console Game Boy used 8-bit processors. Such CPUs can operate on 8-bit binary numbers in one instruction.

Rapid advances in technology let 16-bit and later 32-bit architectures dominate. CPU registers were enlarged to hold 32-bit numbers. Larger registers naturally lead to larger data and address buses: an address bus with 32 signal lines can address 2³² bytes (4 GB) of memory.

People's thirst for computing power never stops. Computer programs grew more complex and consumed more memory, until 4 GB was no longer sufficient. Addressing more than 4 GB of memory with addresses that fit in 32-bit registers is tricky, which motivated the rise of the 64-bit architectures that dominate today. 64-bit CPUs can operate on extremely large numbers in a single instruction, and 64-bit registers can address massive amounts of memory: 2⁶⁴ bytes is over 17 billion gigabytes (GB).
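These address-space sizes follow directly from the register width; a quick check:

```python
# Number of distinct byte addresses reachable with an n-bit address.

def addressable_bytes(bits):
    return 2 ** bits

GB = 2 ** 30  # one gigabyte, in bytes

print(addressable_bytes(32) // GB)  # 4 -> a 32-bit address bus reaches 4 GB
print(addressable_bytes(64) // GB)  # 17179869184 -> over 17 billion GB
```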

Big-Endian Versus Little-Endian

Some computer designers believe numbers should be stored in RAM and the CPU in left-to-right order, a pattern known as little-endian. Others prefer to write data in memory right to left, a pattern known as big-endian. The binary sequence 1-0-0-0-0-0-1-1 therefore represents a different number depending on the "endianness":

◎ Big-endian: 2⁷ + 2¹ + 2⁰ = 131

◎ Little-endian: 2⁰ + 2⁶ + 2⁷ = 193

Most current CPUs are little-endian, but many computers use big-endian mode. If a big-endian CPU must interpret data produced by a little-endian CPU, steps must be taken to avoid an endianness mismatch. This is especially important for programmers who work directly with binary data, for example when parsing data coming off the network. Although most computers today are little-endian, Internet traffic is standardized on big-endian, because most early network routers had big-endian CPUs. Big-endian data reads as garbage when interpreted in little-endian mode, and vice versa.
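Python makes it easy to see the same bytes produce different numbers depending on endianness (shown here at byte granularity, which is how real CPUs actually differ):

```python
import struct

# The same two bytes interpreted with different endianness.
data = bytes([0x01, 0x00])

print(int.from_bytes(data, "big"))     # 256: leftmost byte most significant
print(int.from_bytes(data, "little"))  # 1: leftmost byte least significant

# The struct module does the same job when parsing binary network data;
# ">" means big-endian ("network order"), "<" means little-endian.
print(struct.unpack(">H", data)[0])  # 256
print(struct.unpack("<H", data)[0])  # 1
```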

Emulators

Sometimes we need to run code designed for a different CPU: to test an iPhone application without an iPhone, or to play a classic Super Nintendo game. The software that makes this possible is called an emulator.

An emulator imitates the target machine: it pretends to have the same CPU, RAM, and other hardware. The emulator decodes the instructions and executes them inside the emulated machine. As you can imagine, emulating one machine inside another with a different architecture is not trivial. Fortunately, modern computers are far faster than their predecessors, so emulation is feasible. With a Game Boy emulator we can create a virtual Game Boy in the computer and play its games as if on the real console.

Compiler

Through programming, computers can run MRI machines, recognize speech, explore other planets, and perform many other complex tasks. Remarkably, everything a computer does is ultimately carried out by simple CPU instructions, which boil down to summing and comparing numbers. A complex program such as a web browser requires millions or even billions of these machine instructions.

But we rarely write programs directly in CPU instructions; no one could develop a realistic 3D game that way. Programming languages were created to express commands in a more "natural" and compact form. We write code in these languages, and a program called a compiler converts our commands into machine instructions the CPU can execute.

We use a simple mathematical analogy to explain what a compiler does. Suppose we ask someone to calculate the factorial of 5.

5! = ?

But if the respondent doesn't understand what a factorial is, it doesn't make sense to ask this question. We must reformulate the problem with simpler operations.

5×4×3×2×1 = ?

However, what if the respondent can only do additions? We must further simplify the formulation of the problem.

5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 = ?

As you can see, the simpler the form of expressing the computation, the greater the number of operations required. The same goes for computer code. A compiler converts complex instructions in a programming language into equivalent CPU instructions. Combined with powerful external libraries, complex programs containing billions of CPU instructions can be represented in relatively few lines of code that are easy to understand and modify.
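The reduction above can be written out explicitly: multiplication becomes repeated addition, and factorial becomes repeated multiplication. This mirrors, in miniature, how a compiler lowers rich operations into many simple ones (a sketch; a real compiler emits machine instructions, not Python):

```python
# Build factorial out of nothing but addition.

def multiply(a, b):
    total = 0
    for _ in range(b):   # b repeated additions of a
        total += a
    return total

def factorial(n):
    result = 1
    for i in range(2, n + 1):
        result = multiply(result, i)
    return result

print(factorial(5))  # 120, the same as 5 x 4 x 3 x 2 x 1
```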

Alan Turing, the father of computing, discovered that simple machines are capable of computing anything computable. If a machine has general computing power, it must be able to follow a program containing instructions to:

◎ Read and write data in the memory;

◎ Execute conditional branch: If the memory address has the given value, jump to another point in the program.

We call a machine with this general computing power Turing complete. Regardless of the complexity or difficulty of the computation, it can be expressed in simple read/write/branch instructions. These instructions can compute anything, as long as enough time and storage are allocated.


It was recently discovered that a CPU instruction called MOV (data move) is Turing complete. This means that a CPU that can only execute MOV instructions is functionally no different from a full-fledged CPU: in other words, any kind of code can be expressed strictly through MOV instructions.

The important concept is that, simple or not, if a program can be coded in a programming language, it can be rewritten and run on any Turing-complete machine. A compiler is a magical program that automatically converts code from a complex language to a simple language.

Operating System

Essentially, a compiled computer program is a sequence of CPU instructions. As mentioned earlier, code compiled for a desktop computer cannot run in a smartphone because the two use different CPU architectures. However, since the program must communicate with the computer's operating system to run, the compiled program may also not work on two computers that share the same CPU architecture.

In order to communicate with the outside world, programs must perform input and output operations, such as opening files, displaying messages on the screen, and opening network connections. But different computers use different hardware, so it's impossible for a program to directly support all different types of screens, sound cards, or network cards.

This is why programs depend on an operating system to run. With the operating system's help, programs can work with different hardware without hassle. A program makes special system calls that request the operating system to perform the required input/output operations, and the compiler translates input/output commands into the appropriate system calls.

However, different operating systems often use mutually incompatible system calls. For example, Windows uses different system calls to print information on the screen than macOS or Linux.

Therefore, programs compiled on Windows with x86 processors will not run on Macs with x86 processors. In addition to targeting a specific CPU architecture, compiled code also targets a specific operating system.

Compiler Optimization

Good compilers work hard to optimize the machine code they generate. If the compiler sees that part of the code can be modified to execute more efficiently, it will do so. It may try to apply hundreds of optimization rules before producing the binary output.

Therefore, favor code that is easy to read over code that is micro-optimized; the compiler will take care of the subtle optimizations anyway. For example, some people object to the following code:

function factorial(n)
    if n > 1
        return factorial(n - 1) * n
    else
        return 1

They believe the following modifications should be made:

function factorial(n)
    result ← 1
    while n > 1
        result ← result * n
        n ← n - 1
    return result

Granted, computing the factorial without recursion consumes fewer resources, but that is still no reason to change the code: modern compilers rewrite simple recursive functions like this automatically. Compilers also detect repeated computations. Consider the following code:

i ← x + y + 1
j ← x + y

To avoid doing two x+y calculations, the compiler rewrites the above code as:

t1 ← x + y
i ← t1 + 1
j ← t1

You should focus on writing clear, self-explanatory code. If performance becomes a problem, use profiling tools to find the bottlenecks in your code and try a better way to compute the problematic parts. Avoid wasting time on unnecessary micro-optimizations.

But in some cases we want to skip compilation, which will be discussed next.

Scripting Languages

Some languages are not compiled to machine code before execution. These are scripting languages, and they include JavaScript, Python, and Ruby. In a scripting language, code is executed by an interpreter rather than directly by the CPU, and the interpreter must be installed on every machine where the code runs.

The interpreter interprets and executes the code in real time, so it usually runs much slower than the compiled code. On the other hand, programmers can run the code immediately without waiting for the compilation process.

For very large projects, compilation can take hours.

Google engineers must constantly compile large amounts of code, which caused programmers to lose a great deal of time (Figure 7-9). Google couldn't switch to a scripting language, since it needs the performance of compiled binaries, so the company developed the Go language for the purpose: it compiles extremely fast while maintaining high performance.

[Figure 7-9: programmers waiting for code to compile]

Disassembly and Reverse Engineering

Given a compiled computer program, the source code it was compiled from cannot be recovered. But we can decode the binary, converting the numbers that encode CPU instructions back into a human-readable sequence of instructions. This process is called disassembly.

Next, you can look at these CPU instructions and try to analyze their purpose, which is called reverse engineering. Certain disassemblers are helpful in this process by automatically detecting and annotating system calls and commonly used functions. With disassembly tools, hackers know every aspect of the binary code. I believe that many of the top IT companies have secret reverse engineering labs in order to study competitor software.

Underground hackers often analyze the binary code of proprietary programs like Windows, Photoshop, and Grand Theft Auto to determine which part of the code validates the software license. They modify the binary, adding an instruction that jumps directly to the code that executes after the license is verified. When the modified binary runs, it hits the injected jump before the license check, allowing pirated copies to run without payment.

Secretive government intelligence agencies also run labs where security researchers and engineers study popular consumer software such as iOS, Windows, and Internet Explorer. They look for security holes in these programs, to defend against cyberattacks or to intrude into high-value targets. The best-known attack of this kind is Stuxnet, a cyberweapon developed by U.S. and Israeli intelligence agencies. By infecting the computers that controlled underground uranium-enrichment centrifuges, Stuxnet slowed Iran's nuclear program.

Open Source Software

As mentioned earlier, we can recover a program's instructions from its binary executable, but we cannot recover the original source code used to generate that binary.

Without the original source code, it is practically impossible to make any major change to a program (such as adding new features), even though the binary can be slightly modified, as in the license hack described above. Some people favor a collaborative approach to building code, and so open up their source code for others to modify. This is the main concept of "open source": software that everyone can use and modify freely. Linux-based operating systems such as Ubuntu, Fedora, and Debian are open source, while Windows and macOS are closed source.

One of the interesting things about open source operating systems is that anyone can inspect the source code for security holes. Government agencies have been shown to exploit and surveil millions of civilians through unpatched security flaws in everyday consumer software.

With open source software, the code gets more attention, making it difficult for malicious third parties or government agencies to plant surveillance backdoors. When using macOS or Windows, users must trust that Apple or Microsoft will not compromise their security and will do its best to prevent any serious breach. Open source systems sit under public scrutiny, so security vulnerabilities are less likely to go unnoticed.

Memory Hierarchy

We know that the operation of a computer boils down to having the CPU execute simple instructions, and that these instructions can only operate on data stored in CPU registers. But register storage usually totals less than 1000 bytes, which means data must constantly be transferred between the CPU registers and RAM.

If memory access is too slow, the CPU is forced to sit idle, waiting for RAM to complete the transfer. The time the CPU takes to read and write data in memory is directly tied to computer performance: faster memory access means faster computing. The CPU can access data stored in its registers at near-instant speed (within one cycle), but RAM is much slower.

For a CPU clocked at 1 GHz, a cycle lasts about a billionth of a second, which is the time it takes for light to travel from the book into the reader's eye.

The divide between processor and memory

Technological developments in recent years have multiplied the speed of CPUs, while memory speed has improved far more slowly. This performance gap between CPU and RAM is called the "processor-memory gap". CPU instructions are "cheap" because we can execute huge numbers of them; fetching data from RAM takes much longer, so it is "expensive". As the gap widens, improving the efficiency of memory access becomes ever more important.

[Figure: the processor-memory gap widening over time]

Modern computers take about 1000 CPU cycles (roughly 1 microsecond) to fetch data from RAM. That may sound fast, but it is slow compared to the time needed to access a CPU register. Reducing the number of RAM operations a computation requires is a goal computer scientists constantly pursue.

For scale: in one microsecond, a sound wave travels only about a third of a millimeter.

Temporal locality and spatial locality

When trying to minimize access to RAM, computer scientists began to notice two facts.

◎ Temporal locality: When a memory address is accessed, it may be accessed again soon.

◎ Spatial locality: When accessing a certain storage address, the adjacent address may be accessed soon.

Keeping such memory addresses in CPU registers would therefore avoid most of the "expensive" operations on RAM. However, chip engineers never found a feasible way to fit enough registers inside the CPU. They did find an effective way to exploit temporal and spatial locality, discussed next.
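A toy model shows why spatial locality pays off. Suppose, hypothetically, that every cache miss loads a 16-byte block from RAM: sequential accesses then reuse each loaded block, while widely strided accesses miss every time. This is an invented simplification, not a model of any real CPU cache:

```python
# Count cache misses for a stream of byte addresses, assuming each miss
# loads a whole 16-byte block that stays cached afterwards.

BLOCK = 16  # bytes fetched from RAM per miss (hypothetical)

def count_misses(addresses):
    cached_blocks = set()
    misses = 0
    for addr in addresses:
        block = addr // BLOCK
        if block not in cached_blocks:   # not cached yet: go to RAM
            cached_blocks.add(block)
            misses += 1
    return misses

sequential = list(range(256))               # addresses 0, 1, 2, ..., 255
strided = [i * BLOCK for i in range(256)]   # one address per block

print(count_misses(sequential))  # 16: one miss per 16-byte block
print(count_misses(strided))     # 256: every access misses
```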

L1 cache

An extremely fast auxiliary memory can be built into the CPU: the L1 cache. Reading data from the L1 cache into a register is only slightly slower than reading it from another register.

With L1 cache, we can load data into CPU registers extremely fast by copying the contents of potentially accessed memory addresses into close proximity to CPU registers. Reading data from the L1 cache into a register only takes about 10 CPU cycles, which is nearly a hundred times faster than fetching data from RAM.

With about 10 KB of L1 cache and good use of temporal and spatial locality, more than half of memory accesses can be served by the cache alone. This innovation revolutionized computing: the L1 cache greatly reduces CPU waiting time, so the CPU spends more time actually computing and less time idle.

L2 cache

Increasing the size of the L1 cache reduces fetches from RAM, which in turn reduces CPU latency. However, growing the L1 cache also slows it down; once it reaches around 50 KB, further capacity becomes prohibitively expensive. The better solution is a second cache level: the L2 cache, slightly slower but much larger than L1. Modern CPUs carry about 200 KB of L2 cache, and reading data from it into CPU registers takes about 100 CPU cycles.

We copy the addresses most likely to be needed into the L1 cache, and the next most likely into the L2 cache. If the CPU doesn't find a memory address in L1, it can still try L2; only if the address is in neither cache does it have to access RAM.
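The lookup order can be sketched with dictionaries standing in for the hardware; the cycle costs are the approximate figures quoted in this section:

```python
# A sketch of the L1 -> L2 -> RAM lookup order.

L1_COST, L2_COST, RAM_COST = 10, 100, 1000  # CPU cycles, approximate

def read(address, l1, l2, ram):
    """Return (value, cycles spent) for one memory read."""
    if address in l1:                 # fastest: found in L1
        return l1[address], L1_COST
    if address in l2:                 # slower: found in L2, promote to L1
        l1[address] = l2[address]
        return l2[address], L2_COST
    value = ram[address]              # slowest: go all the way to RAM
    l1[address] = value               # cache it for next time
    return value, RAM_COST

ram = {addr: addr * 2 for addr in range(1024)}
l1, l2 = {}, {7: 14}

print(read(7, l1, l2, ram))   # hits L2: (14, 100)
print(read(7, l1, l2, ram))   # now hits L1: (14, 10)
print(read(42, l1, l2, ram))  # misses both caches: (84, 1000)
```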

At present, many manufacturers have introduced processors with L3 cache. The capacity of the L3 cache is larger than that of the L2 cache, and although it is not as fast as the L2 cache, it is still much faster than the RAM. L1/L2/L3 caches are very important, they take up most of the silicon space inside the CPU chip. See Figure 7-11.

[Figure 7-11: a CPU chip, with the L1/L2/L3 caches occupying most of the silicon space]

Using L1/L2/L3 cache can significantly improve computer performance. With a 200 KB L2 cache, less than 10% of the storage requests issued by the CPU must be fetched directly from RAM.

When buying a computer, remember to compare the L1/L2/L3 cache sizes of the CPUs you are considering: better CPUs carry bigger caches. In general, prefer a CPU with a slightly lower clock frequency but a larger cache.

Primary Memory and Secondary Memory

As mentioned earlier, computers come with different types of memory, which are arranged in a hierarchy. The best performing memories are limited in capacity and extremely expensive. Down the hierarchy, more and more storage space is available, but access becomes slower and slower.

[Figure: the memory hierarchy]

Below the CPU registers and caches in the memory hierarchy is RAM, which is responsible for storing data and code for all currently running processes. As of 2017, computers are typically equipped with 1 GB to 10 GB of RAM. But in many cases, RAM may not be enough for the operating system and all running programs.

Therefore, we dig deeper into the memory hierarchy and use the hard disk below RAM. As of 2017, computers typically carry hard drives of several hundred gigabytes, enough to hold the data of all currently running programs. When RAM fills up, data not currently in use is moved to the hard disk to free some memory.

The problem is that hard disks are very slow: transferring data between disk and RAM typically takes a million CPU cycles (about 1 millisecond). Accessing the disk may seem fast in human terms, but remember that RAM takes only 1000 cycles while disk takes 1 million. RAM is often called primary memory, while the disks that store programs and data are called secondary memory.
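Putting the approximate cycle counts from this section side by side makes the hierarchy vivid:

```python
# Relative cost of a read at each level, using the cycle counts quoted
# in the text. At a 1 GHz clock, one cycle is about one nanosecond.

CYCLES = {
    "register": 1,
    "L1 cache": 10,
    "L2 cache": 100,
    "RAM": 1_000,
    "disk": 1_000_000,
}

for level, cycles in CYCLES.items():
    print(f"{level:9} ~{cycles:>9} cycles")

# One disk read costs as much as a thousand RAM reads,
# or a million register operations:
print(CYCLES["disk"] // CYCLES["RAM"])       # 1000
print(CYCLES["disk"] // CYCLES["register"])  # 1000000
```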

Standard photos capture light in about 4 milliseconds.

The CPU cannot access secondary memory directly. Before a program stored in secondary memory can execute, it must be copied into primary memory. In fact, every time you start your computer, even the operating system is copied from disk into RAM; otherwise the CPU couldn't run it.

Ensuring RAM Never Runs Out

It is critical that all the data and programs a computer works with during typical activity fit into RAM. Otherwise, the computer constantly swaps data between disk and RAM, and because this operation is extremely slow, performance degrades severely or the machine becomes unusable: it spends its time waiting on data transfers rather than doing actual computation.

A computer that constantly has to read data from disk into RAM is said to be thrashing. Servers must be monitored continuously: if a server starts processing data that cannot fit into RAM, thrashing can bring the whole server down. The result is long queues at the bank or the checkout counter, with the clerk left to blame the "slow system". Insufficient memory is one of the main causes of server failure.

External memory and tertiary memory

We continue our analysis down the memory hierarchy. After connecting to a network, a computer can access storage managed by other computers. They are either on the local network or on the Internet (i.e. in the cloud). But accessing this data takes longer: reading local disk takes 1 millisecond, while fetching data from the network can take hundreds of milliseconds. It takes about 10 milliseconds for a network packet to travel from one computer to another, or 200 to 300 milliseconds over the Internet, about the blink of an eye.

At the bottom of the memory hierarchy is tertiary memory: storage that is not always online and available. Storing gigabytes of data on magnetic tape or optical disc is inexpensive, but accessing it requires inserting the medium into some kind of reading device, which can take minutes or even days (try getting IT to restore a tape backup on a Friday night...). For this reason, tertiary storage is only suitable for archiving rarely accessed data.

Development Trend of Storage Technology

On the one hand, it is difficult to significantly improve the technology used in "fast" memory (at the top of the memory hierarchy); on the other hand, "slow" memory is getting faster and cheaper. The cost of hard drive storage has been falling for decades, and that trend looks set to continue.

New technologies have also made disks faster. People are moving from spinning disks to solid-state drives (SSDs), which have no moving parts, making them faster, more reliable, and more power efficient.

SSDs are getting cheaper and faster, but they remain expensive per byte. With this in mind, some manufacturers offer hybrid disks that combine SSD and magnetic technology: frequently accessed data is stored on the SSD, and less frequently accessed data on the slower magnetic disk. When previously cold data starts being accessed often, it is copied over to the faster SSD, much as the CPU uses its internal caches to speed up access to RAM.


Summary

This article describes some basic computer working principles. Anything computable can be represented by simple instructions. To convert complex computational commands into simple instructions that the CPU can execute, a program called a compiler is used. Computers can perform complex calculations only because the CPU can perform a large number of basic operations.

Computers have fast processors but relatively slow memory. Memory is not accessed at random: accesses follow the principles of spatial and temporal locality, so more frequently used data can be cached in faster memory. This principle is applied at multiple levels of caching, from the L1 cache all the way down to tertiary storage.

The caching principles discussed in this article can be applied in a variety of scenarios. Identifying data that is frequently used by applications and finding ways to increase the speed of access to this data is one of the most common strategies for reducing the runtime of computer programs.
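In everyday code this strategy often takes the form of memoization, for example with Python's functools.lru_cache (the expensive computation here is a hypothetical stand-in):

```python
# Cache the results of an expensive, frequently repeated computation,
# just as hardware caches frequently accessed memory.
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)
def slow_square(n):
    global calls
    calls += 1          # count how often the real work actually runs
    return n * n        # stand-in for an expensive computation

results = [slow_square(i % 3) for i in range(300)]  # 300 lookups...
print(calls)  # ...but only 3 real computations: for 0, 1 and 2
```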

——This article is selected from "The Essence of Computer Science"


Origin blog.csdn.net/u010632165/article/details/124138691