Python crawler interview picks, part 02 (processes and threads)

Python process and thread interview experience summary

It is graduation season. Are you still worried about interviews? Here is a summary of interview notes on Python processes and threads.

Process and thread concepts

  • Process and thread relationship
    [Figure: relationship between processes and threads]

  • Process

    A process is a single execution of a program on the computer.

    A program is an executable file that sits statically on disk.

    A process is the dynamic description of that execution: it occupies the computer's runtime resources and has a definite life cycle.

  • The state of the process

    • Ready state: The process meets the conditions for execution and is waiting for the system scheduler to allocate CPU time

    • Running state: The process occupies the CPU and is running

    • Waiting state: The process is blocked and waiting, and it releases the CPU during this time

    • Five states (the three states above plus new and terminated)

        New: the process is being created and is acquiring resources

        Terminated: the process has ended and is releasing its resources
      
  • Thread

    A thread is often called a lightweight process; threading is also a way of programming multiple tasks.

    Threads can also make use of the computer's multiple CPUs.

    A thread can be understood as a branch task started inside a process.

  • Thread characteristics

    1. A process can contain multiple threads

    2. A thread is also a running activity and consumes computer resources

    3. All threads in a process share the resources of that process

    4. Threads are scheduled and run independently of one another

    5. Creating and destroying a thread consumes far fewer resources than creating and destroying a process

  • A process is the smallest unit of resource allocation; a thread is the smallest unit of CPU scheduling

The difference between threads and processes can be summarized as the following 4 points:

  • Address space and other resources (such as open files):

    Processes are independent of one another; threads within the same process share its resources.

    Threads in one process are not visible to other processes.

  • Communication:

    Processes communicate through inter-process communication (IPC). Threads can communicate by directly reading and writing the process's data segment (such as global variables), but they need synchronization and mutual exclusion to keep the data consistent.

  • Scheduling and switching:

    Thread context switching is much faster than process context switching.

  • In a multithreaded OS, the process is not the executable entity; its threads are what get scheduled and run.

Comparison of multi-process and multi-thread

Dimension | Multi-process | Multi-thread | Summary
--- | --- | --- | ---
Data sharing and synchronization | Data sharing is complex; synchronization is simple | Data sharing is simple; synchronization is complex | Each has its own advantages
Memory, CPU | Uses more memory; switching is complex; CPU utilization is lower | Uses less memory; switching is simple; CPU utilization is higher | Threads have the advantage
Create, destroy, switch | Complex and slow | Simple and fast | Threads have the advantage
Programming, debugging | Simple to program and debug | Complex to program and debug | Processes have the advantage
Reliability | Processes do not affect one another | One thread crashing brings down the whole process | Processes have the advantage
Distribution | Suits multi-core and multi-machine setups; easy to extend to more machines | Suits multi-core setups | Processes have the advantage

Process and thread summary

  • A thread runs inside a process (a carriage on its own cannot run)
  • A process can contain multiple threads (a train can have multiple carriages)
  • It is hard to share data between different processes (passengers on one train cannot easily move to another train; they need something like a station transfer)
  • It is easy to share data between threads in the same process (moving from carriage A to carriage B is easy)
  • Processes consume more computer resources than threads (multiple trains consume more resources than multiple carriages)
  • Processes do not affect one another, but one thread crashing brings down its whole process (one train does not affect another, but if a middle carriage catches fire, every carriage of that train is affected)
  • Processes can be spread across multiple machines, while threads are suited to multiple cores at most (different trains can run on different tracks, but the carriages of one train cannot be on different tracks)
  • A memory address used by a process can be locked: while one thread is using some shared memory, other threads must wait until it finishes before they can use it (like a toilet on a train). This is a "mutex"
  • A memory address used by a process can have a usage limit (like a dining car that admits a maximum number of people; when it is full you wait at the door until someone comes out). This is a "semaphore"
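
The mutex and semaphore from the last two points can be sketched with Python's `threading` module. This is a minimal illustration, not part of the original notes: the names `add_many`, `dine`, `seats` and the counts (4 workers, 6 diners, 2 seats) are made up for the example.

```python
import threading
import time

counter = 0                       # shared data, protected by counter_lock
counter_lock = threading.Lock()   # the "toilet": one thread at a time

seats = threading.Semaphore(2)    # the "dining car": at most 2 threads inside
inside = 0                        # how many threads are inside right now
max_inside = 0                    # highest occupancy observed
inside_lock = threading.Lock()    # protects inside and max_inside


def add_many(n):
    """Increment the shared counter n times while holding the mutex."""
    global counter
    for _ in range(n):
        with counter_lock:
            counter += 1


def dine():
    """Enter the dining car only when the semaphore has a free seat."""
    global inside, max_inside
    with seats:
        with inside_lock:
            inside += 1
            max_inside = max(max_inside, inside)
        time.sleep(0.05)          # pretend to eat
        with inside_lock:
            inside -= 1


workers = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
diners = [threading.Thread(target=dine) for _ in range(6)]
for t in workers + diners:
    t.start()
for t in workers + diners:
    t.join()

print("counter =", counter)        # 40000: no increment was lost
print("max inside =", max_inside)  # never more than 2
```

The `with` statement acquires the lock or semaphore on entry and releases it on exit, even if the body raises an exception.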

Process programming

  • Module used: multiprocessing

  • Creating a process

    1. Wrap the work that the new process should do in a function

    2. Create a process object with the module's Process class and bind the function to it

    3. Set process information and attributes through the process object if needed

    4. Call start on the process object to launch the process

    5. Call join on the process object to reclaim the process's resources

"""
multiprocessing

1.将需要新进程执行的事件封装为函数

2 .通过模块的Process类创建进程对象,关联函数

3 .通过进程对象调用start启动进程

4 .通过进程对象调用join回收进程资源
"""
import multiprocessing as mp
from time import  sleep

a = 1

# 进程函数
def fun():
    global a
    print("开始一个进程")
    sleep(2)
    a = 1000
    print("a = ",a)
    print("进程结束了,实际也没干啥")

if __name__ == '__main__':

    # 创建进程对象
    p = mp.Process(target=fun) # 绑定函数 此时还没有创建进程

    # start启动进程 自动执行fun函数,作为一个进程执行
    p.start()  # 此时进程才产生

    print("原有进程也干点事")
    sleep(3)
    print("原有进程其实也没干啥")

    # 回收进程
    p.join()

    print("a :",a) # a = 1

Thread programming

  • Creation steps

    1. Inherit from the Thread class

    2. Override the __init__ method to add your own attributes, and use super() to load the attributes of the parent class

    3. Override the run() method

  • Usage

    1. Instantiate the object

    2. Call start, which automatically executes the run method

    3. Call join to reclaim the thread
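
The subclassing steps above can be sketched as follows. The class name `SumWorker` and its `numbers` and `total` attributes are hypothetical, chosen only to show where each step goes.

```python
from threading import Thread


class SumWorker(Thread):
    """Hypothetical worker: summing numbers stands in for real work."""

    def __init__(self, numbers):
        super().__init__()        # load the parent class's attributes first
        self.numbers = numbers    # our own attribute
        self.total = None         # result, filled in by run()

    def run(self):
        # start() arranges for run() to execute in the new thread
        self.total = sum(self.numbers)


t = SumWorker([1, 2, 3, 4])   # 1. instantiate the object
t.start()                     # 2. start the thread, which executes run()
t.join()                      # 3. wait for the thread and reclaim it
print(t.total)                # 10
```

Storing the result on the instance is one simple way to get a value back, since `run()` has no return channel of its own.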

"""
threading 创建线程演示
"""
from threading import Thread
from time import sleep
import os

a = 1  # 全局变量

# 线程函数
def music():
    for i in range(3):
        sleep(2)
        print(os.getpid(),"播放:黄河大合唱")

    global a
    print("a =",a)
    a = 1000

# 创建线程对象
t = Thread(target=music)
t.start() # 启动线程 执行music

for i in range(4):
    sleep(1)
    print(os.getpid(),"播放:葫芦娃")

t.join() # 回收线程资源

print("a:",a)"""
threading 创建线程演示
"""
from threading import Thread
from time import sleep
import os

a = 1  # 全局变量

# 线程函数
def music():
    for i in range(3):
        sleep(2)
        print(os.getpid(),"播放:黄河大合唱")

    global a
    print("a =",a)
    a = 1000

# 创建线程对象
t = Thread(target=music)
t.start() # 启动线程 执行music

for i in range(4):
    sleep(1)
    print(os.getpid(),"播放:葫芦娃")

t.join() # 回收线程资源

print("a:",a)

Zombies and Orphans

  • Orphan process: The parent process exits before the child process, and the child process then becomes an orphan process. [Like a child who loses its parents: the parent leaves first]

    • Features: An orphan process is adopted by a system process, which becomes its new parent. When the orphan process exits, the system process handles its exit automatically.
  • Zombie process: The child process exits before the parent process, and the parent process does not handle the child's exit status. The child process is then called a zombie process.

    • Features: Although a zombie process has ended, some of its process information remains in memory, and a large number of zombie processes wastes system memory.

    • How to avoid zombie processes

      1. Use join() to reclaim the child
      2. Handle it with signal in the parent process [a one-time fix that covers every child; not available on Windows]
"""
孤儿进程和僵尸进程演示
"""
from multiprocessing import Process
from time import sleep
import os
from signal import *
def fun():
    print("这是一个子进程",os.getppid(),'---',os.getpid())
    sleep(3)
    print("注定成为孤儿进程", os.getppid(), '---', os.getpid())
if __name__ == '__main__':
    signal(SIGCHLD,SIG_IGN) # 系统方法处理僵尸进程,所有子进程退出由系统处理
    p = Process(target=fun)
    p.start()
    p.join() # 防止僵尸产生
    # 大量工作进入死循环
    while True:
        pass

Deadlock

  • A deadlock is a situation in which two or more threads block each other during execution, either because they compete for resources or because they wait on one another. Without outside intervention none of them can make progress, and the system is said to be in a deadlock state. [Caused by confused logic: multiple locks block each other and the program cannot continue]
  • Deadlock conditions

    • Mutual exclusion condition: threads use a resource exclusively, so while one thread is using it, other threads cannot. [Like three parties who owe each other money: the tangle has to be resolved through logic]

    • Hold and wait condition: a thread already holds at least one resource, requests a new one, and does not release what it holds before it gets the new resource.

    • No preemption condition: a resource held by a thread cannot be taken away from outside the thread, for example by the system forcibly terminating the thread.

    • Circular wait condition: when a deadlock occurs there must be a circular chain of threads and resources: T0 is waiting for a resource held by T1, T1 is waiting for a resource held by T2, ..., and Tn is waiting for a resource held by T0.

  • How to avoid deadlock

    • Keep the logic clear, so that the four conditions above never all hold at the same time
    • Deadlock detection by test engineers
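
One common way to keep the four conditions from holding together is to break the circular wait by always acquiring locks in the same order. The sketch below assumes a made-up `Account` class and `transfer` function. Two threads transfer in opposite directions, which could deadlock if each one locked its own source account first.

```python
import threading


class Account:
    """Hypothetical account with its own lock."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Always lock in one fixed global order (here: by account name),
    # no matter which account is the source. This rules out a cycle.
    first, second = sorted((src, dst), key=lambda acc: acc.name)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def many_transfers(src, dst, n):
    for _ in range(n):
        transfer(src, dst, 1)


a = Account("A", 100)
b = Account("B", 100)

t1 = threading.Thread(target=many_transfers, args=(a, b, 1000))
t2 = threading.Thread(target=many_transfers, args=(b, a, 1000))
t1.start()
t2.start()
t1.join()
t2.join()

print(a.balance, b.balance)  # 100 100
```

Because both threads take the lock of account "A" before the lock of account "B", neither can hold one lock while waiting for the other in the opposite order.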

GIL global interpreter lock

  • What is the GIL (global interpreter lock) problem? [Are Python threads really of so little use?]

    Because an interpreter lock was built into the design of the Python interpreter, the interpreter can execute only one thread at a time, which greatly reduces the execution efficiency of threads.

  • Consequences

    When a thread blocks, it voluntarily gives up the interpreter so that other threads can run. Python multithreading therefore improves a program's efficiency when its tasks block often; in other situations it does not.

    [Multithreading gains its efficiency from the time threads spend blocked, for example in sleep]

    [Direct consequence: only one thread is interpreted at any moment]

    [Efficiency: low when processing non-blocking tasks]

    [Conclusion: Python threads are efficient only for tasks with high blocking latency]

  • GIL problem suggestions

    • Prefer processes for concurrency in non-blocking tasks

    • Use an interpreter that is not implemented in C (Java, C#)

    Guido's statement: <http://www.artima.com/forums/flat.jsp?forum=106&thread=214235>
  • Conclusion
    • The GIL problem has nothing to do with the Python language itself; it is a legacy of the interpreter's design.
    • For non-blocking tasks, multithreaded programs do not run efficiently and can even be slower than a single thread.
    • Python multithreading is only suitable for tasks with blocking delays.
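
The claim that threads help with blocking tasks can be checked with a small timing sketch. Here `time.sleep` stands in for a blocking call such as network I/O. The function names and the 0.2 second delay are assumptions made for the example, and the measured times will vary between machines.

```python
import time
from threading import Thread


def blocking_task():
    time.sleep(0.2)   # a blocked thread gives up the interpreter


def run_sequential(n):
    """Run n blocking tasks one after another and return the elapsed time."""
    start = time.perf_counter()
    for _ in range(n):
        blocking_task()
    return time.perf_counter() - start


def run_threaded(n):
    """Run n blocking tasks in n threads and return the elapsed time."""
    start = time.perf_counter()
    threads = [Thread(target=blocking_task) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


sequential = run_sequential(4)
threaded = run_threaded(4)
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The sequential run waits for each sleep in turn, while the threaded run lets the four sleeps overlap, so it finishes in roughly the time of one. If `blocking_task` did pure computation instead, the threaded version would not be expected to show this gain.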

Origin blog.csdn.net/weixin_38640052/article/details/107781525