Condensed Notes on Operating Systems (1) - Comprehensive Basics

Main sources for these notes:

Axiu’s study notes (interviewguide.cn)

Xiaolin Coding (xiaolincoding.com)

Do you understand virtualization technology?

Virtualization converts one physical entity into multiple logical entities.

There are two main virtualization techniques: time-division multiplexing and space-division multiplexing.

Multiple processes and threads: time-division multiplexing lets multiple processes execute concurrently on the same processor. The processes take turns occupying the processor, each running for only a small time slice before the processor quickly switches to the next.
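The turn-taking described above can be sketched as a round-robin scheduler. This is only a minimal simulation: the process names, the work units, and the `round_robin` function are hypothetical and do not model a real kernel scheduler.

```python
from collections import deque

def round_robin(jobs, time_slice):
    """Simulate time-division multiplexing of a single processor.

    jobs maps a process name to its remaining work units (made-up data).
    Returns the list of (name, units_run) slices in execution order.
    """
    queue = deque(jobs.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)      # run for at most one time slice
        timeline.append((name, run))
        if remaining > run:                   # unfinished: go to the back of the queue
            queue.append((name, remaining - run))
    return timeline

timeline = round_robin({"A": 3, "B": 5, "C": 2}, time_slice=2)
print(timeline)
```

Each process only ever holds the processor for one slice at a time, yet all of them finish, which is what makes them appear to run simultaneously.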

Virtual memory uses space-division multiplexing. It abstracts physical memory into virtual address spaces, and each process has its own address space. Pages in a virtual address space are mapped to physical memory, but not every page needs to be resident in physical memory. When a process accesses a page that is not in physical memory, a page fault occurs and the page is loaded into memory; if no free frame is available, a page replacement algorithm chooses a resident page to evict.
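As a rough illustration of demand paging, the sketch below counts page faults for a sequence of page accesses. FIFO replacement is an assumption chosen for simplicity (the notes do not name a specific algorithm), and `simulate_paging` is a hypothetical helper.

```python
from collections import OrderedDict

def simulate_paging(accesses, num_frames):
    """Count page faults for a sequence of virtual page numbers (FIFO eviction)."""
    resident = OrderedDict()                  # page -> frame, kept in load order
    free_frames = list(range(num_frames))
    faults = 0
    for page in accesses:
        if page in resident:
            continue                          # page is already mapped to a frame
        faults += 1                           # page fault: the page must be loaded
        if free_frames:
            frame = free_frames.pop(0)
        else:
            _, frame = resident.popitem(last=False)   # evict the oldest resident page
        resident[page] = frame
    return faults, dict(resident)

faults, page_table = simulate_paging([0, 1, 2, 0, 3, 0, 4], num_frames=3)
print(faults, page_table)
```

The process touches five distinct pages while only three frames exist, so pages are brought in on demand and older ones are evicted to make room.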

Do you know the principle of locality? What are the two main kinds of locality, and what does each mean?

Locality is mainly divided into temporal locality and spatial locality.

Temporal locality: if an instruction in a program is executed, it is likely to be executed again soon; if a piece of data has been accessed, it is likely to be accessed again soon. (This is because programs contain many loops.)

Spatial locality: once a program accesses a storage unit, nearby storage units are very likely to be accessed shortly afterwards. (This is because much data is stored contiguously in memory, and program instructions are also stored sequentially.)
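The effect of both kinds of locality can be seen with a toy direct-mapped cache. The cache geometry (4 lines of 8 addresses each) and the `hit_rate` function are illustrative assumptions, not a model of any real processor.

```python
def hit_rate(addresses, line_size=8, num_lines=4):
    """Hit rate of a tiny direct-mapped cache for a sequence of addresses."""
    lines = [None] * num_lines               # each slot remembers which block it holds
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        block = addr // line_size
        slot = block % num_lines
        if lines[slot] == block:
            hits += 1
        else:
            lines[slot] = block              # miss: load the whole block into the slot
    return hits / total

sequential = hit_rate(range(64))             # spatial locality: neighbours share a line
strided = hit_rate(range(0, 64 * 8, 8))      # no locality: every access hits a new block
repeated = hit_rate([0, 1, 2, 3] * 16)       # temporal locality: the same data is reused
print(sequential, strided, repeated)
```

Sequential and repeated accesses mostly hit the cache, while the strided pattern misses on every access.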

What is the difference between ASCII, Unicode and UTF-8 encoding?

ASCII

ASCII defines only 128 characters (codes 0 to 127), covering upper- and lower-case English letters, digits, and some symbols. One byte is not enough to represent other languages: commonly used Chinese characters, for example, need at least two bytes and must not conflict with ASCII, so China defined the GB2312 encoding. Other countries likewise defined their own encodings for their languages.

Unicode

Because each country's language had its own encoding, text that mixed several languages would come out garbled, so Unicode came into being. Unicode unifies all these languages into a single character set. In its common fixed-width form, most characters are represented with two bytes (rarer characters need four), whereas ASCII uses one byte per character. If your text is all in English, storing it this way takes twice as much space as ASCII, which is uneconomical for storage and transmission.

UTF-8

To solve this problem, Unicode can be stored using the variable-length UTF-8 encoding. UTF-8 encodes each Unicode character into 1 to 4 bytes depending on the value of its code point (the original design allowed up to 6 bytes, but the current standard, RFC 3629, limits it to 4). English letters are encoded into one byte, and commonly used Chinese characters into three bytes. If your text is pure English, UTF-8 saves a lot of space, and ASCII is a subset of UTF-8.
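The variable length is easy to check with Python's built-in `str.encode`. The sample characters below are arbitrary choices for illustration.

```python
samples = {
    "Latin letter A": "A",
    "e with acute accent": "\u00e9",
    "Chinese character zhong": "\u4e2d",
    "grinning face emoji": "\U0001F600",
}

# Number of bytes each character occupies once encoded as UTF-8
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

for name, ch in samples.items():
    print(f"{name}: U+{ord(ch):04X} -> {ch.encode('utf-8').hex()} ({utf8_lengths[name]} bytes)")

# Pure ASCII text produces identical bytes under ASCII and UTF-8
ascii_compatible = "hello".encode("ascii") == "hello".encode("utf-8")
```

The English letter takes one byte and the Chinese character three, matching the description above, and the last comparison shows why ASCII text is already valid UTF-8.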

The connection between the three

With the relationship between ASCII, Unicode, and UTF-8 clarified, we can summarize how character encodings are commonly used in computer systems:

(1) In memory, text is generally held as Unicode. When it needs to be saved to disk or transmitted, it is converted to UTF-8.

(2) When editing with Notepad, the UTF-8 bytes read from the file are converted into Unicode characters and held in memory. When the file is saved, the Unicode characters are converted back into UTF-8 and written to the file.

(3) When browsing the web, the server converts dynamically generated Unicode content into UTF-8 before transmitting it to the browser.
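The decode-on-read, encode-on-write cycle can be sketched in a few lines. The sample string is made up, and a real editor or web server would of course read the bytes from a file or socket rather than from a literal.

```python
# Bytes as they would sit on disk or travel over the network (UTF-8)
raw = "OS \u7b14\u8bb0".encode("utf-8")

# Reading: decode the bytes into an in-memory string of Unicode characters
text = raw.decode("utf-8")

# Editing happens on characters, not bytes
edited = text + "!"

# Saving or sending: encode back to UTF-8
saved = edited.encode("utf-8")

print(len(raw), len(text), len(saved))
```

The character count and the byte count differ because the two Chinese characters take three bytes each once encoded.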

How are atomic operations implemented?

Atomic operation: while one processor is reading or writing a byte, other processors cannot access the memory address of that byte. For more complex memory operations, the processor provides two mechanisms to guarantee atomicity: bus locking and cache locking.

(1) Using a bus lock to guarantee atomicity: while CPU1 is reading and modifying a shared variable, CPU2 must not be able to operate on the cache line that holds the memory address of that variable.

The processor solves this with a bus lock. A bus lock uses the LOCK# signal provided by the processor: while one processor asserts this signal on the bus, requests from other processors are blocked, so that processor has exclusive access to shared memory.

(2) Using a cache lock to guarantee atomicity: at any given moment we only need the operation on one particular memory address to be atomic, but a bus lock blocks all communication between the CPUs and memory, so its overhead is relatively large.

Frequently used memory is cached in the processor's L1, L2, and L3 caches, so atomic operations can be performed directly in the processor's internal cache without asserting a bus lock.

A "cache lock" means that if the memory region is cached in the processor's cache line and is locked for the duration of the LOCK operation, then when the operation writes its result back, the processor does not assert LOCK# on the bus. Instead, it modifies the memory address internally and relies on the cache coherence mechanism to guarantee atomicity, because cache coherence prevents a memory region cached by two or more processors from being modified at the same time. When another processor writes back data belonging to a locked cache line, that cache line is invalidated. For example, while CPU1 modifies i in a cache line under a cache lock, CPU2 cannot use its own cached copy of the line containing i.

However, there are two situations in which the processor does not use cache locking. First, when the data being operated on cannot be cached inside the processor, or spans multiple cache lines, the processor uses a bus lock. Second, some processors do not support cache locking: on Intel 486 and Pentium processors, a bus lock is used even if the locked memory region is in the processor's cache line.
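To see why a read-modify-write such as i += 1 needs atomicity at all, the sketch below replays the steps of two CPUs by hand. It is a pure simulation with hypothetical function names; it does not use or model real bus or cache locking.

```python
def interleaved_increment():
    """Two CPUs each perform i += 1 as separate read, add, write steps that interleave."""
    memory = {"i": 0}
    cpu1 = memory["i"]          # CPU1 reads 0
    cpu2 = memory["i"]          # CPU2 reads 0 before CPU1 has written back
    memory["i"] = cpu1 + 1      # CPU1 writes 1
    memory["i"] = cpu2 + 1      # CPU2 also writes 1: one update is lost
    return memory["i"]

def atomic_increment():
    """Each read-modify-write completes before the other CPU may start."""
    memory = {"i": 0}
    for _cpu in ("CPU1", "CPU2"):
        value = memory["i"]     # read
        memory["i"] = value + 1 # modify and write with nothing in between
    return memory["i"]

print(interleaved_increment(), atomic_increment())
```

Bus locking and cache locking both exist to force the second behaviour: no other processor can slip in between the read and the write.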

When is data in the Cache written back to memory?

  • Write Through: every write updates both the Cache and memory at the same time.
  • Write Back: a write updates only the Cache Block, which is marked dirty. The modified Cache Block is written to memory only when it is replaced.
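The difference between the two policies shows up in how often memory is actually written. The one-block `Cache` class below is a deliberately tiny, hypothetical model that handles writes only; real caches have many blocks and also serve reads.

```python
class Cache:
    """A one-block cache in front of a dict that stands in for main memory."""

    def __init__(self, memory, write_back):
        self.memory = memory
        self.write_back = write_back
        self.addr = None
        self.value = None
        self.dirty = False
        self.memory_writes = 0

    def _store_to_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1

    def write(self, addr, value):
        if self.addr is not None and self.addr != addr and self.dirty:
            # The cached block is being replaced and is dirty: flush it first
            self._store_to_memory(self.addr, self.value)
            self.dirty = False
        self.addr, self.value = addr, value
        if self.write_back:
            self.dirty = True                   # defer the memory write
        else:
            self._store_to_memory(addr, value)  # write through immediately

def count_memory_writes(write_back):
    cache = Cache({}, write_back)
    for v in range(5):
        cache.write(0x10, v)     # five writes to the same block
    cache.write(0x20, 99)        # a different block forces a replacement
    return cache.memory_writes, cache.memory

print(count_memory_writes(write_back=False))
print(count_memory_writes(write_back=True))
```

With Write Through every write reaches memory, while with Write Back only the final value of the replaced block does; the newest block stays in the cache only.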

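Cache inconsistency

On a multi-core CPU each core has its own cache. If two cores cache the same data and one core modifies its copy without the other seeing the change, the caches of the two CPU cores become inconsistent.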

MESI protocol

  • Modified (M)
  • Exclusive (E)
  • Shared (S)
  • Invalid (I)

The "Modified" state corresponds to a dirty block: the data in the Cache Block has been updated but not yet written back to memory. The "Invalid" state means the data in the Cache Block is stale and must not be read.

Both the "Exclusive" and "Shared" states mean that the data in the Cache Block is clean, that is, consistent with the data in memory.

The difference between "Exclusive" and "Shared" is that in the Exclusive state the data is held in the cache of only one CPU core; no other core's cache has it. A write to an Exclusive Cache Block can therefore proceed directly, without notifying other CPU cores: since only this core holds the data, no cache coherence problem can arise.

If another core then reads the same data from memory into its own cache, the data in the "Exclusive" state changes to the "Shared" state.

The "Shared" state means that the same data exists in the caches of multiple CPU cores. To update it, a core cannot simply modify its own copy: it must first broadcast a request to all other CPU cores asking them to mark the corresponding Cache Line as "Invalid", and only then update the data in its own cache.
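The state transitions described above can be traced with a small simulation of a single cache line. `MesiBus` is a hypothetical, heavily simplified model: it tracks only the state letters, not the data or the write-back traffic.

```python
class MesiBus:
    """Tracks the MESI state of one cache line in each core's cache."""

    def __init__(self, num_cores):
        self.state = ["I"] * num_cores

    def read(self, core):
        if self.state[core] != "I":
            return                               # M, E or S: read hit, no change
        holders = [c for c in range(len(self.state))
                   if c != core and self.state[c] != "I"]
        for c in holders:
            self.state[c] = "S"                  # other copies drop to Shared
        self.state[core] = "S" if holders else "E"

    def write(self, core):
        for c in range(len(self.state)):
            if c != core:
                self.state[c] = "I"              # invalidate every other copy
        self.state[core] = "M"                   # the local copy is now dirty

bus = MesiBus(2)
trace = []
for op, core in [("read", 0), ("read", 1), ("write", 0), ("read", 1)]:
    getattr(bus, op)(core)
    trace.append("".join(bus.state))
print(trace)
```

Each trace entry lists the state in core 0 followed by core 1: the first read is Exclusive, the second read makes both copies Shared, and the write leaves one Modified copy and one Invalid copy.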

Can you distinguish between concurrency and parallelism?

Concurrency is the ability to have multiple programs making progress over the same period of time when viewed at a macro level, while parallelism is the ability to execute multiple instructions at the same instant.

Parallelism requires hardware support, such as multiple pipelines, multi-core processors, or distributed computing systems.

The operating system enables programs to run concurrently by introducing processes and threads.

What is sharing?

Sharing means that resources in the system can be used by multiple concurrent processes.

There are two sharing methods: mutually exclusive sharing and simultaneous sharing.

Mutually exclusive shared resources are called critical resources; a printer is an example. Only one process may access such a resource at a time, and a synchronization mechanism is needed to enforce mutually exclusive access. Simultaneously shared resources, by contrast, can be accessed by several processes over the same period of time.
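Mutually exclusive access to a critical resource can be sketched with Python's `threading.Lock`. Threads stand in for processes here, and the "printer" is just a shared list; both are assumptions made for illustration.

```python
import threading

printer_lock = threading.Lock()    # guards the hypothetical critical resource
printed = []                       # stands in for the printer's output

def print_job(name, pages):
    with printer_lock:             # only one thread may hold the printer at a time
        for page in range(1, pages + 1):
            printed.append((name, page))

threads = [threading.Thread(target=print_job, args=(name, 3))
           for name in ("P1", "P2", "P3")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(printed)
```

Because each job holds the lock for its whole run, the pages of one job are never interleaved with another's, whatever order the threads happen to run in.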

Origin blog.csdn.net/shisniend/article/details/131863648