Operating system threads and processes

1. What is an operating system

The operating system is essentially a piece of management software: it manages both hardware and software so that they run and are used in an orderly way.

Toward the layers below it, the operating system must manage the hardware well; toward the layers above, it must provide a stable runtime environment for software. The operating system is therefore the medium through which software, hardware, and users interact.


Hardware devices: the physical components, such as what you see when you open a computer's back cover.

Driver: hardware manufacturers provide a driver along with the hardware; the corresponding driver must be installed on the computer for the system to correctly recognize the device.

Operating system kernel: the core of the operating system, bridging the layers above and below it.

System calls: API interfaces that the operating system provides to applications. For example, when a program wants to operate a hardware device, it issues a system call to pass the command to the kernel, and the kernel invokes the driver to actually operate the device.

Application programs: QQ, the Java programs we write, and so on.

Common operating systems:

Windows: Windows 98, 2000, XP, Vista, 7, 10, 11 (the most familiar operating systems).

Linux: especially suitable for development and deployment (a system programmers must master).

Mac: the system used on Apple computers, which has some similarities with Linux.

Mobile phones also run operating systems: Android, iOS, etc.

2. Process and thread

1. Process

Simply put, a process is a running program. When you double-click an executable file (.exe), the operating system loads it into memory and executes its instructions on the CPU; once running, it becomes a process, also called a task.

Most computers today run in multi-process mode: macroscopically, many processes appear to execute at the same time.


Some of these processes are applications we launched ourselves; the rest are processes started automatically by the operating system.
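On a modern JVM (Java 9+), this process list can be glimpsed from Java itself via the `ProcessHandle` API; a small sketch (the class name is just for illustration):

```java
import java.util.List;
import java.util.stream.Collectors;

public class ProcessList {
    public static void main(String[] args) {
        // The pid of the current JVM process, as the task manager would show it
        long myPid = ProcessHandle.current().pid();
        System.out.println("my pid: " + myPid);

        // Enumerate the processes visible to this user, like a tiny task manager
        List<String> processes = ProcessHandle.allProcesses()
                .map(h -> h.pid() + "  " + h.info().command().orElse("?"))
                .collect(Collectors.toList());
        System.out.println("visible processes: " + processes.size());
    }
}
```

Running this typically shows far more processes than the user ever started by hand, most of them launched by the system.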

2. Thread

A thread is a part of a process. If a process is a factory, threads are the production lines in that factory: a process can contain many threads, and every process has at least one.

The Java code we write every day actually runs inside a Java process (the JVM).
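A minimal Java sketch of "a production line inside the factory": the JVM process starts with a main thread, and we can add a second thread alongside it.

```java
public class FirstThread {
    public static void main(String[] args) throws InterruptedException {
        // main() itself runs on the process's first thread
        System.out.println("running on: " + Thread.currentThread().getName()); // main

        // Start a second "production line" inside the same process
        Thread worker = new Thread(
                () -> System.out.println("running on: " + Thread.currentThread().getName()),
                "worker");
        worker.start();
        worker.join(); // wait for the worker to finish
    }
}
```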

3. Process management

A process is an important "software resource", managed by the operating system kernel. A process contains one or more threads. When the first thread of a process starts, resources must be allocated; threads started later in the same process reuse those resources.

Each thread is described by a C structure (operating systems are mostly implemented in C/C++); this structure is called the process control block (PCB), and a set of such structures describes a process. Once described, the processes must also be organized: the descriptions are themselves data, and data is organized with data structures. In Linux, the kernel strings the PCBs together with a doubly linked list.

Creating a thread essentially means creating a PCB structure and inserting it into the linked list.

Destroying a thread essentially means deleting its PCB node from the linked list.

The process list shown by the task manager is essentially obtained by traversing the PCB linked list.

So creating, destroying, querying, and modifying threads boils down to insertions, deletions, lookups, and updates on a doubly linked list. Of course, the kernel's list is a complex one, but the basic idea of operating on threads is the same as operating on a doubly linked list.
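As a toy model of this bookkeeping (class and field names are invented for illustration; the kernel's real PCB is a C struct), Java's doubly linked `java.util.LinkedList` can play the role of the kernel's list:

```java
import java.util.LinkedList;

public class PcbListDemo {
    // Invented stand-in for the kernel's process control block
    static class Pcb {
        final long pid;
        Pcb(long pid) { this.pid = pid; }
    }

    public static void main(String[] args) {
        LinkedList<Pcb> taskList = new LinkedList<>(); // LinkedList is doubly linked

        Pcb a = new Pcb(1001);
        taskList.add(a);              // "creating a thread": insert a PCB node
        taskList.add(new Pcb(1002));

        // "task manager": traverse the list
        taskList.forEach(p -> System.out.println("pid " + p.pid));

        taskList.remove(a);           // "destroying a thread": delete its node
        System.out.println("remaining: " + taskList.size()); // remaining: 1
    }
}
```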

3.1 Some attributes in the PCB

The PCB contains many attributes; some core ones are introduced below.

Process id (pid): the identifier of the process, a unique number.


Memory pointer: indicates the memory allocated to the process.

File descriptor table: records the files and other resources the process has opened on the hard disk.

Once a process runs, the operating system automatically opens at least three files for it: standard input (System.in), standard output (System.out), and standard error (System.err).
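In Java these three pre-opened files surface directly as `System.in`, `System.out`, and `System.err`:

```java
public class StdStreams {
    public static void main(String[] args) throws java.io.IOException {
        // The three streams the OS opens for every process:
        System.out.println("to standard output");  // fd 1 -> System.out
        System.err.println("to standard error");   // fd 2 -> System.err

        // fd 0 -> System.in (standard input); available() peeks without blocking
        System.out.println("bytes waiting on stdin: " + System.in.available());
    }
}
```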

For a process to work normally, resources must be allocated to it, including but not limited to memory, hard disk, and CPU.

3.2 Concurrency and Parallelism

When allocating resources, most kinds are easy to hand out, but CPU time is not: a computer may be running dozens or hundreds of processes, while its CPU (most computers have multi-core CPUs) has only a limited number of cores, far too few to serve all processes simultaneously.

But we still want processes to appear to run at the same time. As mentioned above, a process can contain multiple threads, and multiple threads can execute concurrently (time-sharing multiplexing). So what are concurrency and parallelism? They are two related but distinct concepts: parallelism means two or more events occur at the same instant, while concurrency means two or more events occur within the same time interval.

From a macroscopic point of view there is no difference between parallelism and concurrency; we cannot perceive it. From a microscopic point of view, parallelism means two CPU cores run the code of two tasks at the same moment, while concurrency means one CPU core runs task A, then switches to task B, and so on, rapidly switching multiple tasks on and off the core; as long as the switching is fast enough, the tasks can be regarded as running at the same time.
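A small sketch: `availableProcessors()` reports how many tasks the machine can truly run in parallel, while starting more threads than that still works because the OS time-shares the cores (concurrency). The class and task names here are invented for illustration.

```java
public class CoresDemo {
    // A CPU-bound task: a busy loop standing in for real work
    static void task(String name) {
        long sum = 0;
        for (int i = 0; i < 50_000_000; i++) sum += i;
        System.out.println(name + " done (" + sum + ")");
    }

    public static void main(String[] args) throws InterruptedException {
        // How many tasks can truly run in *parallel* on this machine
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU cores: " + cores);

        // Two threads: on a multi-core CPU they may run in parallel;
        // on a single core the OS rapidly switches between them (concurrency).
        Thread a = new Thread(() -> task("A"));
        Thread b = new Thread(() -> task("B"));
        a.start(); b.start();
        a.join(); b.join();
    }
}
```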

3.3 Scheduling of processes

The thread is the basic unit of scheduling and execution in the system. To simplify the description of process scheduling, assume here that each process has only one thread, so thread scheduling can be treated as process scheduling. The discussion also assumes a multitasking operating system, which, viewed macroscopically, runs multiple processes at the same time.

Process states:

  • Ready state: the process is ready and can be scheduled onto the CPU at any time.
  • Running state: the process is currently executing on the CPU.
  • Blocked state: due to some factor the process cannot respond to scheduling and temporarily cannot run on the CPU.
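Java exposes a related (not identical) notion through `Thread.getState()`; as a rough illustration, a thread waiting for a monitor held by another thread reports `BLOCKED`:

```java
public class ThreadStates {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        Thread t = new Thread(() -> {
            synchronized (lock) { } // will wait here while main holds the lock
        });
        synchronized (lock) {
            t.start();
            // Spin until t is parked on the monitor held by main
            while (t.getState() != Thread.State.BLOCKED) {
                Thread.onSpinWait();
            }
            System.out.println("while main holds the lock: " + t.getState()); // BLOCKED
        }
        t.join();
        System.out.println("after join: " + t.getState()); // TERMINATED
    }
}
```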

Priority:

When the operating system schedules multiple processes, they are not treated completely equally; each has a certain priority.

Context:

Records the running state of the program at the moment the process was last switched off the CPU (it saves the "intermediate state" of the process), so that the next time the process is scheduled onto the CPU it can continue from where it left off.

Concretely, a process's context is the values of the CPU registers. Before the process is switched off the CPU, the register contents are saved to memory, like saving a game; the next time the process is scheduled onto the CPU, those saved values are restored into the registers, like loading the save.

Accounting information:

The operating system tracks how much CPU time and how many instructions each process has used, and uses these statistics to decide how to schedule the next period.

An illustrative scenario to help understand these ideas:

Suppose there are three processes A, B, and C. Compare the operating system to a young lady, and A, B, and C to three suitors: A is handsome, B is rich, and C is attentive.

The young lady dates A, B, and C at the same time. How can she manage three boyfriends at once? Simple: she practices good time management for A, B, and C, so that none of them is aware of the others.

So she arranges her time with the three suitors carefully, making sure the dates never overlap. Since she likes the handsome one best, the rich one second, and the attentive one last, she assigns A, B, and C priorities: A gets the most time, then B, and C the least. She draws up a weekly schedule, a true master of time management.

| Time | Plan |
| --- | --- |
| Monday | date C |
| Tuesday to Wednesday | date B |
| Thursday to Saturday | date A |
| Sunday | rest |

Her weekly schedule can be understood as process scheduling. If B has to travel for a month, then in her eyes B is in the blocked (sleeping) state; a boyfriend who can be on call at any time is in the ready state; and A, whom she is currently dating, is in the running state.

Dating three people at once inevitably leads to mix-ups. For example, A asks her to prepare a gift before meeting his parents, while B wants her to get ready for a trip to Sanya. One day A asks whether she is ready, and she replies that she has bought two swimsuits, an awkward slip she barely manages to explain away. She learns her lesson: after every date she writes down how far things went and which tasks remain unfinished, so the next date can resume from the previous state and she never slips again. This record corresponds to the context in the operating system.

After being with A, B, and C for a while, she takes stock periodically: she tallies how much energy she has spent on each of them and uses this summary to plan the next period, keeping each relationship at just the right distance, neither too far nor too close. This corresponds to the accounting information in process scheduling.

3.4 Memory Management

The memory address a program obtains in code is actually not the real address on the hardware, but a virtual address produced by a translation step.

Memory (physically, a RAM stick) stores a great deal of data. Imagine it as a very long corridor with many rooms, each room holding 1 byte. Every room has a number, starting from 0 and increasing one by one; this number is the "address", also called the "physical address".

The picture below is the real memory stick.

An amazing property of memory is random access: reading data at any address is extremely fast and takes almost the same time regardless of the address. It is this property that makes array subscripting O(1).

(image: a physical memory stick)

If programs accessed physical addresses directly, all would be well as long as the code behaves correctly, but a bug could cause an out-of-bounds access that tampers with another process's data. For example, a bug makes a pointer variable in process 1 become 0x8000, an address belonging to some other process; the bug is clearly in process 1, yet it is the other process that gets corrupted.


To avoid this kind of problem, the memory used by each process is "isolated" by introducing a virtual address space.

The code no longer uses real physical addresses directly, but uses virtual addresses; the operating system and specialized hardware devices are responsible for converting virtual addresses to physical addresses.


The virtual address space hides and isolates physical addresses, preventing processes from interfering with one another. A dedicated MMU hardware unit (often integrated into the CPU) makes virtual-to-physical address translation fast.

When the operating system kernel finds that a translated address falls outside the range the process may access, it reports an error back to the process directly; that process crashes without affecting any other process.

3.5 Communication between processes

The virtual address spaces introduced above isolate processes from one another: each process can only access data in its own address space and cannot touch another process's, so when one process crashes, the others are unaffected.

However, in real development work, processes sometimes need to exchange data. To enable inter-process communication, the operating system provides a "public space" that multiple processes can access: process A first stores data in the public space, and process B then retrieves it, achieving communication between the two.

This "public space" takes many concrete forms; the most common inter-process communication mechanisms today are:

  • File-based operations.
  • Based on network operation (socket).
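A hedged sketch of socket-based IPC: here two threads in one JVM stand in for two processes, one listening on a loopback port and the other connecting to it (in real IPC they would be separate programs).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketIpc {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // "process B" listens
            int port = server.getLocalPort();
            Thread sender = new Thread(() -> {            // stands in for "process A"
                try (Socket s = new Socket("127.0.0.1", port);
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    out.println("hello from process A");
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            sender.start();
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                System.out.println(in.readLine()); // hello from process A
            }
            sender.join();
        }
    }
}
```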

3.6 Concurrent programming

CPUs have entered the multi-core era. To further improve program speed, the multiple cores must be fully utilized, and this is where concurrent programming comes in; a major purpose of introducing processes in the first place was to support concurrent programming, so that programs can make full use of the CPU cores.

In fact, multi-process programming can already use the CPU's multiple cores and thus solve the concurrency problem, but it has drawbacks: creating a process requires allocating resources and destroying one requires reclaiming them, and allocation and reclamation are relatively slow. In other words, process creation, destruction, and scheduling carry a relatively high overhead.

Addressing the overhead:

  • Plan 1:

Use a process pool, similar in spirit to the string constant pool: keep created processes in the pool and take one directly when needed. The drawback is that idle processes still occupy resources, so this trades space for time, spending memory to gain efficiency.

  • Plan 2:

Use multithreading for concurrent programming. A process contains one or more threads, and threads within a process share its resources, so threads are lighter than processes and are sometimes called "lightweight processes". Because most resource allocation and reclamation is skipped, creating, destroying, and scheduling threads is much cheaper and faster, which makes multithreading a better fit for concurrent programming than multi-process designs.
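The pooling idea from Plan 1, applied to threads rather than processes, is built into Java's standard library as `ExecutorService`; a minimal sketch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolDemo {
    public static void main(String[] args) throws Exception {
        // Pre-create 4 worker threads; submitted tasks reuse them instead of
        // paying thread creation/destruction costs each time (space for time)
        ExecutorService pool = Executors.newFixedThreadPool(4);

        Future<Integer> result = pool.submit(() -> 21 + 21);
        System.out.println("result: " + result.get()); // result: 42

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```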

Compare the following scenario to understand multithreading:

Think of the process as a factory and threads as its production lines. Suppose a task requires producing some hardware devices, and we want production to go faster; there are two options:

  1. Build two factories for production.
  2. Only build one factory, and add a batch of production lines.

The two schemes produce equipment equally fast, but scheme 1 requires building an extra factory, while scheme 2 only adds a batch of production lines and reuses the same factory's resources (raw materials, transportation, and so on). Scheme 1 therefore pays a higher price than scheme 2. By analogy, scheme 1 corresponds to concurrent programming with multiple processes, and scheme 2 to concurrent programming with multiple threads.

4. The difference between process and thread

  • A process contains threads: every process has one or more threads.
  • Processes are independent: each process has its own address space, and one process crashing does not affect the rest. Threads within the same process, however, share the process's resources, and a crashing thread can bring down the entire process, killing all of its threads.
  • A process is the basic unit for the operating system to allocate resources, and a thread is the basic unit for the operating system to schedule and execute.
  • Compared with multithreading, multi-process programs avoid thread-safety problems, whereas multithreaded programs may run into them.
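A minimal demonstration of a thread-safety problem: two threads increment a shared plain `int` and can lose updates, while `AtomicInteger` stays correct.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RaceDemo {
    static int unsafeCount = 0;                       // plain int: increments can be lost
    static AtomicInteger safeCount = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCount++;                        // read-modify-write, not atomic
                safeCount.incrementAndGet();          // atomic, thread-safe
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("unsafe: " + unsafeCount);     // often less than 200000
        System.out.println("safe:   " + safeCount.get()); // always 200000
    }
}
```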


Origin: blog.csdn.net/Trong_/article/details/128317791