One. Data structures and the GIL
1 queue
The standard library module queue provides a FIFO queue (Queue), a LIFO queue (LifoQueue) and a priority queue (PriorityQueue).
The Queue class is thread-safe and is suited to exchanging data safely between threads. Internally it uses Lock and Condition.
Why is the reported size of the container not reliable? Without a lock it is impossible to get an accurate size: between reading the size and acting on it, another thread may already have modified the queue. The Queue class does take its lock when reading the size (qsize()), but that still cannot guarantee that a get or put issued right afterwards will succeed, because reading the size and calling get or put are separate method calls.
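A minimal sketch of the usual way around this: instead of checking the size and then calling get, attempt the get and handle queue.Empty. The helper name drain is hypothetical and only used for illustration.

```python
import queue

def drain(q):
    """Take every item currently available without trusting qsize()."""
    items = []
    while True:
        try:
            # get_nowait() either returns an item or raises queue.Empty,
            # so there is no gap between "check" and "take".
            items.append(q.get_nowait())
        except queue.Empty:
            return items

q = queue.Queue()
for n in range(3):
    q.put(n)

drained = drain(q)
print(drained)  # [0, 1, 2]
```

The same idea applies to put on a bounded queue: call put_nowait and handle queue.Full rather than testing full() first.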
2 GIL
1 Introduction
Global Interpreter Lock (GIL), a process-level lock
CPython has a lock inside the interpreter process called the GIL (Global Interpreter Lock). The GIL ensures that within a CPython process only one thread executes code at any given moment, even on a multi-core machine.
2 IO-intensive and CPU-intensive
In CPython:
IO-intensive: when a thread blocks on IO, another thread is scheduled.
CPU-intensive: the current thread may keep reacquiring the GIL, so other threads get almost no CPU time. Waking another thread also requires preparing its data, which is costly.
Use multithreading for IO-intensive work and multiprocessing for CPU-intensive work, which bypasses the GIL.
The vast majority of read and write operations on Python's built-in data structures are atomic.
Because of the GIL, Python's built-in data types become safe to use in multithreaded programs, but the types themselves are not designed to be thread-safe.
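As an illustration of the atomicity point, the following sketch has several threads append to one shared list without any explicit lock. The function name append_many and the counts are assumptions chosen for the example. It relies on a single list.append call being atomic in CPython; compound operations (for example check-then-modify) still need a Lock.

```python
import threading

shared = []  # built-in list shared by all threads, no explicit lock

def append_many(count):
    for i in range(count):
        shared.append(i)  # a single list.append call is atomic in CPython

threads = [threading.Thread(target=append_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 40000: no append was lost
```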
3 Reasons for keeping the GIL
Guido holds to a philosophy of simplicity: a low threshold for beginners, so that Python can be used safely and simply without advanced systems knowledge.
In addition, removing the GIL would make single-threaded CPython much less efficient.
4 Verifying that execution is effectively single-threaded
Single-threaded version:
import logging
import datetime

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s ")
start = datetime.datetime.now()

def calc():
    # CPU-bound loop: one billion additions
    total = 0
    for _ in range(1000000000):
        total += 1

# Run the calculation five times in sequence on the main thread
calc()
calc()
calc()
calc()
calc()
delta = (datetime.datetime.now() - start).total_seconds()
logging.info(delta)
Multi-threaded version:
import logging
import datetime
import threading

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s ")
start = datetime.datetime.now()

def calc():
    # CPU-bound loop: one billion additions
    total = 0
    for _ in range(1000000000):
        total += 1

# Run the same calculation in five threads
lst = []
for _ in range(5):
    t = threading.Thread(target=calc)
    t.start()
    lst.append(t)
# Wait for every thread to finish before measuring
for t in lst:
    t.join()
delta = (datetime.datetime.now() - start).total_seconds()
print(delta)
The results are as follows
Comparing the two programs, CPython gains nothing from multithreading here: the multi-threaded run takes about as long as the single-threaded one, because of the GIL.
Two. Multiple processes
1 Concept
Description of multiprocessing
Because of the GIL in Python, multithreading is not the best choice for CPU-intensive programs.
Multiple processes run as completely separate programs and can take full advantage of multiple processors.
However, the isolation between processes makes sharing data a problem, and threads are much more lightweight than processes.
Multiprocessing is another means of achieving concurrency.
2 Similarities and differences between processes and threads
Similarities:
A process can be terminated, whereas a thread cannot be terminated on command: a thread ends only when it raises an exception or its code runs to completion.
Inter-process synchronization and thread synchronization offer the same classes, used in the same way and with similar effect. However, inter-process synchronization costs more than thread synchronization, and the underlying implementation differs.
multiprocessing also provides shared memory and server processes for sharing data, as well as Queue and Pipe for inter-process communication.
Differences:
1 Different ways of communicating
Multiprocessing starts multiple interpreter processes, so data exchanged between processes must be serialized and deserialized.
2 Data security
Multi-process code is best run under the if __name__ == '__main__' guard.
Data shared between threads lives in the same memory and does not need to be serialized again. Data passed between processes must be serialized and deserialized.
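To make the serialization step concrete, here is a small sketch using pickle, the serializer that multiprocessing uses when objects are passed between processes. It runs in a single process and only imitates the two ends of the transfer; the sample record is invented for the example.

```python
import pickle

record = {"id": 1, "tags": ["a", "b"]}

payload = pickle.dumps(record)    # serialize: what the sending side does
received = pickle.loads(payload)  # deserialize: what the receiving side does

print(received == record)  # True: same content
print(received is record)  # False: the receiver works on its own copy

received["tags"].append("c")
print(record["tags"])      # ['a', 'b']: the original is unaffected
```

This is why a change made to an object in a child process is not visible in the parent unless the data is sent back explicitly.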
3 Uses of multiple processes
Remote calls (RPC) across the network
2 Parameter description
The multiprocessing module provides the Process class
The Process class follows the Thread class API, which reduces the learning curve
Different processes can be scheduled to run on different CPUs
IO-intensive work is best handled with multithreading
CPU-intensive work is best handled with multiprocessing
Process-related attributes and methods
Name | Meaning |
---|---|
pid | Process ID |
exitcode | Exit status code of the process |
terminate() | Terminates the process |
3 Example
import logging
import datetime
import multiprocessing

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s ")

def calc(i):
    # CPU-bound loop: one billion additions
    total = 0
    for _ in range(1000000000):
        total += 1

if __name__ == '__main__':
    # The guard keeps child processes from re-running this block on import
    start = datetime.datetime.now()
    lst = []
    for i in range(5):
        p = multiprocessing.Process(target=calc, args=(i,), name="P-{}".format(i))
        p.start()
        lst.append(p)
    # Wait for every process to finish before measuring
    for p in lst:
        p.join()
    delta = (datetime.datetime.now() - start).total_seconds()
    print(delta)
The results are as follows
With multiple processes, each process can run on its own core, so all cores are put to use; what remains is the CPU's scheduling of those processes.
The speedup from multiple processes on a multi-core CPU is obvious.
The single-threaded and multi-threaded versions ran for a long time, but the multi-process version took only about a minute and a half. This is true parallelism.
4 Process pool
import logging
import datetime
import multiprocessing

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s ")

def calc(i):
    # CPU-bound loop: one billion additions
    total = 0
    for _ in range(1000000000):
        total += 1
    print(i, total)

if __name__ == '__main__':
    start = datetime.datetime.now()
    p = multiprocessing.Pool(5)  # Initialize the process pool; its worker processes are reused
    for i in range(5):
        p.apply_async(calc, args=(i,))
    p.close()  # close() must be called before join()
    p.join()
    delta = (datetime.datetime.now() - start).total_seconds()
    print(delta)
The results are as follows
Using a process pool to create and manage multiple processes is also a good approach.
5 Choosing between multiprocessing and multithreading
1 Selection
1 CPU-intensive
CPython uses the GIL, so threads compete with each other and the advantages of multiple cores cannot be exploited. In Python, multiprocessing is the more efficient choice here.
2 IO-intensive
Suited to multithreading, which avoids the serialization overhead of passing data between processes. While one thread waits on IO, execution switches to another thread, which is efficient. Multiprocessing also works for IO-intensive tasks.
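A rough sketch of the IO-intensive case, using time.sleep as a stand-in for a blocking read (the function name fake_io and the 0.2 second delay are assumptions for the example). A sleeping thread releases the GIL, so the four waits overlap instead of adding up.

```python
import threading
import time

def fake_io(delay):
    time.sleep(delay)  # stands in for a blocking read; sleeping releases the GIL

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four waits of 0.2s each finish in roughly 0.2s total, not 0.8s
print("{:.2f}s for 4 waits of 0.2s each".format(elapsed))
```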
2 Applications
Request/response model: a common processing model for web applications
The master starts multiple worker processes, generally as many as there are CPUs
Each worker process starts multiple threads to improve concurrent processing capacity. Workers handle user requests and often have to wait for data
nginx works in a similar master/worker mode
The number of worker processes is generally the same as the number of CPU cores, with CPU affinity set, because migrating a process between CPUs is relatively costly.
Three. The concurrent package
1 Concept
concurrent.futures
A module introduced in Python 3.2
A module for asynchronous parallel task programming that provides a convenient high-level interface for asynchronous execution
It offers two pool executors
ThreadPoolExecutor: an Executor that makes asynchronous calls using a thread pool
ProcessPoolExecutor: an Executor that makes asynchronous calls using a process pool
2 Parameter Description
method | meaning |
---|---|
ThreadPoolExecutor(max_workers=1) | Creates a pool of at most max_workers threads to execute calls asynchronously and returns an Executor instance |
submit(fn, *args, **kwargs) | Submits a function and its arguments for execution and returns a Future instance |
shutdown(wait=True) | Cleans up the pool |
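The three calls in the table can be sketched together as follows. The task function square is a hypothetical example; everything else is the standard concurrent.futures API.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

executor = ThreadPoolExecutor(max_workers=3)
fs = [executor.submit(square, n) for n in range(5)]  # each submit returns a Future
executor.shutdown(wait=True)                         # block until all tasks finish

results = [f.result() for f in fs]
print(results)  # [0, 1, 4, 9, 16]
```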
Future class
method | meaning |
---|---|
result() | Returns the value returned by the call |
done() | Returns True if the call was cancelled or finished executing |
cancelled() | Returns True if the call was successfully cancelled |
running() | Returns True if the call is currently running and cannot be cancelled |
cancel() | Attempts to cancel the call; returns False if the call is already running or finished and cannot be cancelled, otherwise True |
result(timeout=None) | Returns the result; if timeout is None, waits until the call returns; if a timeout is set and expires, raises concurrent.futures.TimeoutError |
exception(timeout=None) | Returns the exception raised by the call; if timeout is None, waits until the call returns; if a timeout is set and expires, raises concurrent.futures.TimeoutError |
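A short sketch exercising several of these Future methods. It assumes a pool with a single worker so that later submissions wait in the queue; the task names slow and broken are invented for the example.

```python
import time
from concurrent import futures

def slow():
    time.sleep(0.5)
    return "done"

def broken():
    raise ValueError("boom")

with futures.ThreadPoolExecutor(max_workers=1) as executor:
    f_slow = executor.submit(slow)
    f_queued = executor.submit(slow)    # waits in the queue behind f_slow
    f_broken = executor.submit(broken)

    cancelled = f_queued.cancel()       # still pending, so it can be cancelled

    try:
        f_slow.result(timeout=0.1)      # the task needs 0.5s, so this times out
        timed_out = False
    except futures.TimeoutError:
        timed_out = True

    final = f_slow.result()             # no timeout: wait until the call returns
    error = f_broken.exception()        # returns the exception instead of raising it

print(cancelled, timed_out, final, repr(error))
```

Note that exception() hands the exception back as a value, whereas calling result() on the same future would re-raise it.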
3 Thread pool example
import logging
import threading
import time
from concurrent import futures

logging.basicConfig(level=logging.INFO, format="%(asctime)-15s\t [%(processName)s:%(threadName)s,%(process)d:%(thread)8d] %(message)s")

def worker(n):  # Task to be executed later
    logging.info("begin to work{}".format(n))
    time.sleep(5)
    logging.info("finished{}".format(n))

# Create a thread pool with a capacity of 3
executor = futures.ThreadPoolExecutor(max_workers=3)
fs = []
for i in range(3):
    f = executor.submit(worker, i)  # Pass the arguments; returns a Future object
    fs.append(f)
for i in range(3, 6):
    f = executor.submit(worker, i)  # Pass the arguments; returns a Future object
    fs.append(f)

while True:
    time.sleep(2)
    logging.info(threading.enumerate())  # List of threads currently alive
    flag = True
    for f in fs:
        logging.info(f.done())  # True if the call finished or was cancelled
        flag = flag and f.done()  # Stays True only if every future is done
    if flag:
        executor.shutdown()  # All tasks are done, so clean up the pool
        logging.info(threading.enumerate())
        break
The results are as follows
The threads in the pool are reused: once a thread has been created it is kept and does not change. The only drawback is that the thread names do not change either, which mainly affects the log output.
4 Process pool example
import logging
import time
from concurrent import futures

logging.basicConfig(level=logging.INFO, format="%(asctime)-15s\t [%(processName)s:%(threadName)s,%(process)d:%(thread)8d] %(message)s")

def worker(n):  # Task to be executed later
    logging.info("begin to work{}".format(n))
    time.sleep(5)
    logging.info("finished{}".format(n))

if __name__ == '__main__':
    # Create a process pool with a capacity of 3
    executor = futures.ProcessPoolExecutor(max_workers=3)
    fs = []
    for i in range(3):
        f = executor.submit(worker, i)  # Pass the arguments; returns a Future object
        fs.append(f)
    for i in range(3, 6):
        f = executor.submit(worker, i)  # Pass the arguments; returns a Future object
        fs.append(f)

    while True:
        time.sleep(2)
        flag = True
        for f in fs:
            logging.info(f.done())  # True if the call finished or was cancelled
            flag = flag and f.done()  # Stays True only if every future is done
        if flag:
            executor.shutdown()  # All tasks are done, so clean up the pool
            break
The results are as follows
5 Context management support
concurrent.futures.ProcessPoolExecutor inherits from concurrent.futures._base.Executor, and the parent class defines the __enter__ and __exit__ methods, so it supports context management and can be used with the with statement
import logging
import time
from concurrent import futures

logging.basicConfig(level=logging.INFO, format="%(asctime)-15s\t [%(processName)s:%(threadName)s,%(process)d:%(thread)8d] %(message)s")

def worker(n):  # Task to be executed later
    logging.info("begin to work{}".format(n))
    time.sleep(5)
    logging.info("finished{}".format(n))

if __name__ == '__main__':
    fs = []
    with futures.ProcessPoolExecutor(max_workers=3) as executor:
        for i in range(6):
            f = executor.submit(worker, i)  # Named f so the futures module is not shadowed
            fs.append(f)
        while True:
            time.sleep(2)
            flag = True
            for f in fs:
                logging.info(f.done())  # True if the call finished or was cancelled
                flag = flag and f.done()  # Stays True only if every future is done
            if flag:
                break  # Leaving the with block shuts the pool down
The results are as follows
6 Summary
concurrent.futures unifies the way thread pools and process pools are called and simplifies programming, which reflects Python's philosophy of simplicity
The only drawback: individual thread names cannot be set directly
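A partial workaround worth knowing: since Python 3.6, ThreadPoolExecutor accepts a thread_name_prefix argument, which gives every pool thread a common prefix even though individual names still cannot be chosen. A minimal sketch, where the prefix "worker" and the helper current_name are arbitrary choices for the example:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def current_name(_):
    # Report the name of the pool thread that runs this task
    return threading.current_thread().name

with ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker") as executor:
    names = list(executor.map(current_name, range(4)))

print(names)  # every name starts with the chosen prefix
```

With a prefix in place, the %(threadName)s field in the logging format becomes much easier to read.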