Python (48): processes, threads, and coroutines

the difference

Process: owns its code, open file resources, data resources, and an independent memory space.

Thread: a thread belongs to a process and is the actual executor of the program. A process contains at least one main thread and may have additional child threads. Each thread has its own stack space.

To the operating system, the thread is the smallest unit of execution, and the process is the smallest unit of resource management.

Process: the smallest unit of resource allocation; thread: the smallest unit of CPU scheduling.

Coroutine: a coroutine is an even more lightweight construct than a thread. Just as a process can have multiple threads, a thread can also have multiple coroutines.

Most importantly, coroutines are not managed by the operating system kernel but are controlled entirely by the program (user code or a library); that is, they execute in user mode.

The benefit is a large performance gain: switching between coroutines does not consume resources the way thread switching does.

Resource comparison

Process: has its own independent heap and stack, sharing neither with other processes. Processes are scheduled by the operating system.

Thread: has its own independent stack but shares the process's heap; threads do not share stacks. Standard threads are scheduled by the operating system.

Coroutine: has its own independent stack but shares the heap; coroutines do not share stacks. Coroutines are scheduled by the programmer, in the coroutine's own code.

 

process

Processes and Resources

When learning Linux, you often see two terms: user space and kernel space.

Simply put, kernel space is where the Linux kernel runs, and user space is where user programs run. For safety, the two are isolated, so even if a user program crashes, the kernel is unaffected.

address space

The 4 GB process virtual address space (on a 32-bit system) is divided into two parts: user space and kernel space.

user space

User space is divided into five memory areas, following the principle that addresses with the same access attributes are stored together. Access attributes means readable, writable, executable, and so on.

  • code segment

The code segment stores the executable file's instructions: the in-memory image of the executable program. The code segment must be protected from illegal modification at runtime, so it allows only read operations and is not writable.

  • data segment

The data segment stores the initialized global variables from the executable file; in other words, it holds the program's statically allocated variables and initialized global variables.

  • BSS segment

The BSS segment holds the program's uninitialized global variables; in memory, the BSS segment is zero-filled.

  • heap

The heap stores memory dynamically allocated while the process runs. Its size is not fixed and can grow or shrink dynamically. When the process calls a function such as malloc to allocate memory, the newly allocated memory is added to the heap (the heap grows); when memory is released with a function such as free, it is removed from the heap (the heap shrinks).

  • stack

The stack is where the program's temporarily created local variables live, that is, variables defined inside functions (but not those declared static; static places a variable in the data segment). In addition, when a function is called, its arguments are pushed onto the stack of the calling process, and when the call completes, the function's return value is also passed back via the stack. Because of the stack's last-in-first-out property, it is particularly convenient for saving and restoring call contexts. In this sense, we can think of the stack as a memory area for storing and exchanging temporary data.

Of the memory areas above, the data segment, BSS segment, and heap are usually stored contiguously in memory, while the code segment and stack are stored independently. On the i386 architecture, the stack grows downward and the heap grows upward.

You can also use the size command on Linux to check the size of each memory area of the compiled program:

[lemon ~]# size /usr/local/sbin/sshd
   text    data     bss     dec     hex filename
1924532   12412  426896 2363840  2411c0 /usr/local/sbin/sshd

kernel space

On 32-bit x86 systems, the Linux kernel address space is the high-end virtual address range from 0xC0000000 to 0xFFFFFFFF, 1 GB in total. The kernel image, physical page tables, drivers, and so on run in kernel space.

thread

A thread is the smallest unit that the operating system can schedule for execution. Threads are contained within processes and are the actual units of execution in a process. A process can contain multiple threads, and the thread is the smallest unit of CPU scheduling.

Thread resources and overhead

Multiple threads in the same process share all of the process's system resources, such as the virtual address space, file descriptors, and signal handlers. However, each thread has its own call stack, register context, thread-local storage, and so on.

The cost of creating a thread is mainly setting up the thread's stack and allocating memory. This cost is small; the largest cost comes from thread context switches.

Features:

CPU time among concurrently running threads is allocated by the system scheduler, so thread safety must be considered when writing multithreaded code.
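For example, with Python's threading module, a Lock protects a shared counter from lost updates caused by interleaved scheduling (a minimal sketch; the counter name and thread count are arbitrary):

```python
import threading

total = 0
lock = threading.Lock()

def add_many(n):
    global total
    for _ in range(n):
        # without the lock, the read-modify-write of `total` can interleave
        # between threads and lose updates
        with lock:
            total += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)  # 400000: the lock serializes every update
```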

coroutine 

By analogy: just as a process can have multiple threads, a thread can have multiple coroutines, so coroutines are also called micro-threads or fibers.

Python and Go provide good support for coroutines from the language level. 

Scheduling overhead

Threads are scheduled by the kernel. When the scheduler switches from one thread to another, it must save the current thread's state to memory, restore another thread's state into the registers, and update the scheduler's data structures. These steps involve switching between user mode and kernel mode, which is expensive.

Coroutine scheduling is controlled entirely by the user. A coroutine has its own register context and stack. When a coroutine switch occurs, the register context and stack are saved elsewhere; when switching back, the previously saved register context and stack are restored. This manipulates the user-space stack directly, with no kernel-switching overhead at all.
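User-controlled switching can be sketched with plain generators, the mechanism Python's coroutines were originally built on: yield saves the function's stack frame and returns control to the caller, and next() restores it (a toy example; the function and list names are illustrative):

```python
log = []

def coroutine_a():
    log.append("A1")
    yield              # save this frame, hand control back to the caller
    log.append("A2")

def coroutine_b():
    log.append("B1")
    yield
    log.append("B2")

# a toy "scheduler": user code decides which coroutine runs next,
# with no kernel involvement at all
tasks = [coroutine_a(), coroutine_b()]
for _ in range(2):                 # two rounds of resuming each task
    for task in tasks:
        try:
            next(task)             # restore the saved frame, run to next yield
        except StopIteration:
            pass                   # this generator has finished

print(log)  # ['A1', 'B1', 'A2', 'B2']: interleaved by user-level switching
```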

Several concepts in Python coroutines:

event_loop: the event loop, essentially an endless loop. We can register functions with it, and when their conditions are met, the corresponding handlers are called.

coroutine: in Python this usually refers to the coroutine object type. We can register a coroutine object with the event loop, which will then call it. A function defined with the async keyword is not executed when called; instead, the call returns a coroutine object.

task: a further wrapper around a coroutine object that also tracks the task's state.

future: represents the result of a task that may or may not have executed yet; in practice there is no essential difference from a task.

async keyword: async defines a coroutine function;

await keyword: suspends the coroutine at a point that would otherwise block, yielding control until the awaited object completes.
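These concepts fit together in a minimal asyncio sketch (the function name fetch and the delays are illustrative): async defines the coroutine, create_task wraps coroutine objects into tasks on the event loop, await suspends until they complete, and asyncio.run drives the loop.

```python
import asyncio

async def fetch(name, delay):
    # calling fetch(...) does not run it; the call returns a coroutine object
    await asyncio.sleep(delay)  # suspend here; the event loop runs other tasks
    return f"{name} done"

async def main():
    # create_task schedules the coroutine objects on the event loop,
    # so both "requests" run concurrently
    t1 = asyncio.create_task(fetch("a", 0.1))
    t2 = asyncio.create_task(fetch("b", 0.1))
    return [await t1, await t2]

results = asyncio.run(main())  # start the event loop, run main to completion
print(results)  # ['a done', 'b done']
```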

Note: calls into modules that do not support async (for example, time.sleep or the requests library) must not be made inside a coroutine, or they will block the entire event loop.
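The effect of a blocking call is easy to measure (a minimal sketch; function names are illustrative): two coroutines using await asyncio.sleep overlap their waits, while two using time.sleep freeze the loop and run one after the other.

```python
import asyncio
import time

async def blocking(n):
    time.sleep(n)  # blocking call: freezes the whole event loop while it runs
    return n

async def non_blocking(n):
    await asyncio.sleep(n)  # suspends this coroutine; others keep running
    return n

async def main():
    start = time.perf_counter()
    await asyncio.gather(non_blocking(0.2), non_blocking(0.2))
    concurrent = time.perf_counter() - start  # roughly 0.2 s: the waits overlap

    start = time.perf_counter()
    await asyncio.gather(blocking(0.2), blocking(0.2))
    serial = time.perf_counter() - start  # roughly 0.4 s: each sleep blocks the loop
    return concurrent, serial

concurrent, serial = asyncio.run(main())
print(f"asyncio.sleep: {concurrent:.2f}s, time.sleep: {serial:.2f}s")
```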

Advantages and Disadvantages of Coroutines

Advantages

1. No thread context-switch overhead. Coroutines avoid meaningless scheduling, which can improve performance (but for the same reason, the programmer must take on scheduling responsibility, and coroutines lose a standard thread's ability to use multiple CPUs).
2. No locking or synchronization overhead for atomic operations.

Disadvantages

1. Cannot take advantage of multiple cores.

2. A blocking operation (such as blocking IO) blocks the entire program.

Applicable scenarios: crawlers and other IO-intensive programs.

Crawler coroutines: "Python crawler advanced | Asynchronous coroutines" - PythonGirl - cnblogs

Comparison of multi-coroutine and multi-threading in crawler scenarios:

With multiple threads, when a request is initiated, the thread waits for the response and the system switches to another thread to run.

With multiple coroutines, after one coroutine initiates a request, execution switches to another coroutine at the code level. This keeps the thread fully utilized, with no system context-switch overhead.
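A sketch of the multi-coroutine crawler pattern, with asyncio.sleep standing in for real network IO (a real crawler would use an async HTTP client such as aiohttp; the URLs here are placeholders):

```python
import asyncio
import time

async def crawl(url):
    # stand-in for a real async HTTP request; the sleep models network latency
    await asyncio.sleep(0.2)
    return f"fetched {url}"

async def main(urls):
    # while one "request" waits, the loop runs the others: total time is
    # roughly one latency, not len(urls) latencies
    return await asyncio.gather(*(crawl(u) for u in urls))

urls = [f"https://example.com/page/{i}" for i in range(10)]
start = time.perf_counter()
pages = asyncio.run(main(urls))
elapsed = time.perf_counter() - start
print(len(pages), f"{elapsed:.2f}s")  # 10 pages in about 0.2 s, not about 2 s
```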

Question:

Why is the thread the smallest unit of operating-system scheduling?
A thread is a single sequential control flow within a process; multiple threads can run concurrently in one process, each performing a different task.

reference

Dry goods | 10 diagrams that explain processes, threads, and coroutines clearly! - Zhihu

Detailed explanation of threads, processes, and coroutines - Youyuyuyou's blog - CSDN

Python coroutines and asynchronous IO - Liu Jiang's Python tutorial

https://www.jianshu.com/p/7c851145ee4c

Understanding Python's Coroutine - Juejin

Origin blog.csdn.net/qq_37674086/article/details/126433311