Python GIL, data structures, and multiprocessing

1 Data structures and the GIL

1 Queue

The standard library queue module provides FIFO queues, LIFO queues, and priority queues.
The Queue class is thread-safe and suited to exchanging data safely between threads; internally it uses Lock and Condition.


Why is the container's size not reliable? Even with locking, the size you read is only a snapshot: you have read a number, not removed an item, and another thread may modify the queue right afterwards. Although qsize() in the Queue class does take the lock, it still cannot guarantee that an immediate get() or put() will succeed, because reading the size and calling get() or put() are separate operations.
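A minimal sketch of that point, assuming a tiny queue for illustration: treat qsize() as a snapshot only, and prefer attempting get()/put() and handling queue.Empty or queue.Full over checking the size first.

import queue

q = queue.Queue(maxsize=2)   # thread-safe FIFO queue
q.put("a")

# Only a snapshot: another thread may put or get items right after this call.
print(q.qsize())

# The robust pattern is to attempt the operation and handle failure,
# instead of trusting the size that was read a moment ago.
try:
    item = q.get(timeout=0.1)
    print("got", item)
except queue.Empty:
    print("queue was empty after all")

try:
    q.put_nowait("b")
    q.put_nowait("c")
except queue.Full:
    print("queue was full after all")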

2 GIL

1 Introduction

GIL stands for Global Interpreter Lock, a lock at the level of the interpreter process.
The CPython interpreter process holds one such lock, called the GIL (global interpreter lock).

The GIL guarantees that within a CPython process only one thread executes Python bytecode at any moment, even on a multi-core machine.

2 IO-intensive and CPU-intensive

In CPython:
IO-intensive: when a thread blocks on IO, another thread is scheduled, so threads help.
CPU-intensive: the current thread tends to keep reacquiring the GIL, so other threads hardly ever get the CPU; waking another thread up means preparing its context, which is costly.


So: solve IO-intensive problems with multithreading, and CPU-intensive problems with multiprocessing, which bypasses the GIL.

Most single read and write operations on Python's built-in data structures are atomic operations.


Because of the GIL, Python's built-in data types look safe in multithreaded programs, but they are not thread-safe types in themselves; compound operations still need a lock, as the sketch below shows.
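A minimal sketch of why compound operations are not safe despite the GIL: count += 1 is a read-modify-write spread over several bytecodes, so a thread switch can land in the middle and lose updates (whether the loss actually shows up depends on the CPython version and switch interval).

import threading

def run(worker):
    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

count = 0
lock = threading.Lock()

def unsafe():
    global count
    for _ in range(100000):
        count += 1          # load, add, store: a switch in between loses an update

def safe():
    global count
    for _ in range(100000):
        with lock:          # the lock makes the read-modify-write atomic
            count += 1

run(unsafe)
print("without lock:", count)   # may fall short of 400000, depending on the interpreter

count = 0
run(safe)
print("with lock:", count)      # always 400000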

3 Reasons for keeping the GIL

Guido sticks to a simple philosophy: keep the entry barrier low for beginners, so Python can be used safely and simply without advanced systems knowledge.
Besides, with the GIL removed, single-threaded CPython becomes very inefficient.

4 Verifying the effectively single-threaded behavior

Single-threaded example:

import logging
import datetime

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s")
start = datetime.datetime.now()

def calc():
    # CPU-bound work: count to one billion
    sum = 0
    for _ in range(1000000000):
        sum += 1

# run the calculation five times in a row, in a single thread
calc()
calc()
calc()
calc()
calc()
delta = (datetime.datetime.now() - start).total_seconds()
logging.info(delta)


The same workload in multi-threaded mode:

import logging
import datetime
import threading

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s")
start = datetime.datetime.now()

def calc():
    # same CPU-bound work as before
    sum = 0
    for _ in range(1000000000):
        sum += 1

# run the calculation in five threads instead of sequentially
lst = []
for _ in range(5):
    t = threading.Thread(target=calc)
    t.start()
    lst.append(t)

for t in lst:
    t.join()

delta = (datetime.datetime.now() - start).total_seconds()
print(delta)

The results are as follows


Comparing the two programs, CPython gains no advantage from multithreading here: the execution time is about the same as with a single thread, because of the GIL.

2 Multiprocessing

1 Concept

Description of multiprocessing:

Because of the GIL in Python, multithreading is not the best choice for CPU-intensive programs.

Multiple processes run programs in completely separate processes and can take full advantage of multiple processors.

But processes bring isolation with them, so sharing data is no longer trivial, and a process is heavier than a lightweight thread.

Multiprocessing is one means of achieving concurrency.

2 Similarities and differences between processes and threads

Similarities:

A process can be terminated from outside, while a thread cannot be terminated on command: a thread ends either by raising an exception or by running its code to completion.

Inter-process synchronization provides the same classes as thread synchronization, used the same way and with similar effect; however, synchronization between processes costs more than between threads, and the underlying implementation is different (a sketch follows after this list).

multiprocessing also provides shared memory and a server process for sharing data, as well as Queue queues and Pipe pipes for inter-process communication.
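A minimal sketch of that similarity (the output file name is made up for illustration): multiprocessing.Lock is used with exactly the same idiom as threading.Lock, only the cost and the underlying implementation differ.

import multiprocessing

def append_line(lock, path):
    # the same "with lock:" idiom as threading.Lock, but across processes
    with lock:
        with open(path, "a") as f:
            f.write("one line from {}\n".format(multiprocessing.current_process().name))

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    procs = [multiprocessing.Process(target=append_line, args=(lock, "out.txt"))
             for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()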


Differences:

1 Different ways of communicating
Multiprocessing starts multiple interpreter processes, so inter-process communication must be serialized and deserialized.
2 Data safety issues

Multiprocessing code is best run from the main module (under if __name__ == '__main__').
With multiple threads the data already lives in the same process and does not need to be serialized again.

Data transferred between processes must be serialized and deserialized; a minimal sketch follows.
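A minimal sketch of that serialization requirement: anything that crosses the process boundary (arguments, results, Queue items) is pickled on one side and unpickled on the other, so only picklable objects can be exchanged.

import multiprocessing

def produce(q):
    # this dict is pickled in the child process and unpickled in the parent
    q.put({"pid": multiprocessing.current_process().pid, "data": [1, 2, 3]})

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=produce, args=(q,))
    p.start()
    print(q.get())     # a deserialized copy, not the original object
    p.join()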

3 Where processes are applied

Remote calls, RPC, anything that goes across the network.

2 Parameter description

Multiprocessing is based on the multiprocessing module.

The Process class follows the Thread class API, which flattens the learning curve.
Different processes can be scheduled onto different CPUs and truly execute in parallel.

IO-intensive work is best handled with multithreading;
CPU-intensive work is best handled with multiprocessing.

Process attributes and methods:

Name          Meaning
pid           Process ID
exitcode      The process's exit status code
terminate()   Terminates the process

3 Example

import logging
import datetime
import multiprocessing

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s")

def calc(i):
    sum = 0
    for _ in range(1000000000):
        sum += 1

if __name__ == '__main__':  # processes are best created from the main module
    start = datetime.datetime.now()
    lst = []
    for i in range(5):
        p = multiprocessing.Process(target=calc, args=(i,), name="P-{}".format(i))
        p.start()
        lst.append(p)
    for p in lst:
        p.join()

    delta = (datetime.datetime.now() - start).total_seconds()
    print(delta)

The results are as follows


Between processes there is no GIL contention: apart from ordinary CPU scheduling by the operating system, each process can use its own core, so the improvement from multiple processes is obvious.
The single-threaded and multi-threaded versions ran for a long time, while the multi-process version finished in about a minute and a half: real parallelism.

4 Process pool

import logging
import datetime
import multiprocessing

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s")

def calc(i):
    sum = 0
    for _ in range(1000000000):
        sum += 1
    print(i, sum)

if __name__ == '__main__':
    start = datetime.datetime.now()
    p = multiprocessing.Pool(5)  # initialize a process pool; the processes in the pool are reused
    for i in range(5):
        p.apply_async(calc, args=(i,))
    p.close()  # close() must be called before join()
    p.join()
    delta = (datetime.datetime.now() - start).total_seconds()
    print(delta)

The results are as follows


Using a process pool, which creates processes once and reuses them, is also a good approach; a short Pool.map sketch follows.
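As an illustrative sketch (with a much smaller loop so it finishes quickly), multiprocessing.Pool can also spread the same kind of work over its workers with map, which blocks until every result has come back:

import multiprocessing

def calc(i):
    s = 0
    for _ in range(10000000):   # deliberately smaller than the examples above
        s += 1
    return i, s

if __name__ == '__main__':
    with multiprocessing.Pool(5) as pool:      # the pool itself is a context manager
        print(pool.map(calc, range(5)))        # blocks until all five results arrive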

5 Choosing between multiprocessing and multithreading

1 Selection

1 CPU-intensive
Because CPython uses the GIL, threads compete with one another and cannot exploit multiple cores, so in Python multiple processes are the more efficient choice.

2 IO-intensive

Multithreading is suitable: it avoids the serialization overhead of inter-process communication, and while one thread waits on IO the interpreter switches to another thread, so efficiency is good. Of course, multiprocessing also works for IO-intensive tasks.

2 Applications

Request/response model: the common processing model of web applications.

The master launches several worker processes, usually as many as there are CPUs.
Each worker process starts several threads to raise its concurrency; the workers handle user requests and often have to wait for data.
This is how nginx operates.

The number of worker processes generally matches the number of CPU cores to exploit CPU affinity, because migrating a process between CPUs is relatively expensive.

3 The concurrent package

1 Concept

concurrent.futures
The module was introduced in version 3.2.
It provides a high-level, convenient asynchronous interface for executing parallel tasks.

It offers two pool executors:

ThreadPoolExecutor: executor for asynchronous calls backed by a thread pool
ProcessPoolExecutor: executor for asynchronous calls backed by a process pool

2 Parameter description

Method                              Meaning
ThreadPoolExecutor(max_workers=1)   Creates a pool that executes at most max_workers calls asynchronously at the same time; returns an Executor instance
submit(fn, *args, **kwargs)         Submits the function and its arguments for execution; returns a Future instance
shutdown(wait=True)                 Cleans up the pool

Future class

Method                   Meaning
done()                   Returns True if the call was successfully cancelled or has finished executing
cancelled()              Returns True if the call was successfully cancelled
running()                Returns True if the call is running and can no longer be cancelled
cancel()                 Attempts to cancel the call; returns False if it is already executing and cannot be cancelled, otherwise True
result(timeout=None)     Returns the call's return value; with timeout=None it waits indefinitely, otherwise it raises concurrent.futures.TimeoutError when the timeout expires
exception(timeout=None)  Returns the exception raised by the call; with timeout=None it waits indefinitely, otherwise it raises concurrent.futures.TimeoutError when the timeout expires
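A minimal sketch of those Future methods (the sleep time and the deliberate error are made up for illustration):

from concurrent import futures
import time

def work(n):
    time.sleep(n)
    if n == 0:
        raise ValueError("n must be positive")   # deliberate failure to show exception()
    return n * n

with futures.ThreadPoolExecutor(max_workers=2) as executor:
    ok = executor.submit(work, 1)
    bad = executor.submit(work, 0)

    print(ok.running(), ok.done())      # typically True, False right after submission
    print(ok.result(timeout=5))         # waits up to 5 seconds, then prints 1
    print(bad.exception())              # the ValueError raised inside the worker
    print(bad.done(), bad.cancelled())  # True, False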

3 Thread pool example

import logging
import threading
import time
from concurrent import futures

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)-15s\t [%(processName)s:%(threadName)s,%(process)d:%(thread)8d] %(message)s")

def worker(n):  # the task to be executed later
    logging.info("begin to work{}".format(n))
    time.sleep(5)
    logging.info("finished{}".format(n))

# create a thread pool with a capacity of 3
executor = futures.ThreadPoolExecutor(max_workers=3)

fs = []
for i in range(3):
    f = executor.submit(worker, i)  # pass in the argument, get a Future back
    fs.append(f)

for i in range(3, 6):
    f = executor.submit(worker, i)  # pass in the argument, get a Future back
    fs.append(f)

while True:
    time.sleep(2)
    logging.info(threading.enumerate())  # list of live threads
    flag = True
    for f in fs:
        logging.info(f.done())  # True once the call was cancelled or has finished
        flag = flag and f.done()  # stays True only if every future is done
    if flag:
        executor.shutdown()  # every call finished, clean up the pool
        logging.info(threading.enumerate())
        break

The results are as follows


The threads in the pool are reused: once created they stay the same threads. The only drawback is that their names do not change, which mostly affects log output; see the hedged note below.
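A hedged note: from Python 3.6 onwards, ThreadPoolExecutor accepts a thread_name_prefix argument, which at least gives the pool's threads a recognizable name in the logs.

from concurrent import futures
import threading
import time

def worker(n):
    time.sleep(0.1)
    return threading.current_thread().name   # e.g. "calc_0", "calc_1", ...

with futures.ThreadPoolExecutor(max_workers=3, thread_name_prefix="calc") as executor:
    for f in [executor.submit(worker, i) for i in range(3)]:
        print(f.result())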

4 Process pool example

import logging
import time
from concurrent import futures

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)-15s\t [%(processName)s:%(threadName)s,%(process)d:%(thread)8d] %(message)s")

def worker(n):  # the task to be executed later
    logging.info("begin to work{}".format(n))
    time.sleep(5)
    logging.info("finished{}".format(n))

# create a process pool with a capacity of 3
# (on platforms that spawn processes, such as Windows, this belongs under if __name__ == '__main__')
executor = futures.ProcessPoolExecutor(max_workers=3)

fs = []
for i in range(3):
    f = executor.submit(worker, i)  # pass in the argument, get a Future back
    fs.append(f)

for i in range(3, 6):
    f = executor.submit(worker, i)  # pass in the argument, get a Future back
    fs.append(f)

while True:
    time.sleep(2)
    flag = True
    for f in fs:
        logging.info(f.done())  # True once the call was cancelled or has finished
        flag = flag and f.done()  # stays True only if every future is done
    if flag:
        executor.shutdown()  # every call finished, clean up the pool
        break

The results are as follows


5 Context manager support

concurrent.futures.ProcessPoolExecutor inherits from concurrent.futures._base.Executor, which defines __enter__ and __exit__ methods, so the executors support context management and can be used in a with statement.

import logging
import time
from concurrent import futures

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)-15s\t [%(processName)s:%(threadName)s,%(process)d:%(thread)8d] %(message)s")

def worker(n):  # the task to be executed later
    logging.info("begin to work{}".format(n))
    time.sleep(5)
    logging.info("finished{}".format(n))

fs = []
with futures.ProcessPoolExecutor(max_workers=3) as executor:
    for i in range(6):
        f = executor.submit(worker, i)  # do not call this variable "futures": that would shadow the imported module
        fs.append(f)
# leaving the with block calls executor.shutdown(wait=True), so every task has finished here

while True:
    time.sleep(2)
    flag = True
    for f in fs:
        logging.info(f.done())  # True once the call was cancelled or has finished
        flag = flag and f.done()  # stays True only if every future is done
    if flag:
        break

The results are as follows


6 Summary

concurrent.futures unifies the calling interface for thread pools and process pools, which simplifies programming and reflects Python's philosophy of providing one simple way of doing things.
Its only drawback: you cannot set the thread names (though see the note on thread_name_prefix above).


Origin blog.51cto.com/11233559/2432516