Million annual salary Python road: concurrent programming, multithreading (part 2)

1. Deadlocks and recursive locks

Processes can also deadlock and also have recursive locks; deadlocks and recursive locks work the same way for processes as they do for threads.

Deadlock: a situation in which two or more processes or threads, while running, each wait for a resource that another one holds. Without outside intervention none of them can make progress. The system is then said to be in a deadlock state, and the processes that wait on each other forever are called deadlocked processes.

# Multiple threads with multiple locks can deadlock
from threading import Thread
from threading import Lock
import time

lock_A = Lock()
lock_B = Lock()

class MyThread(Thread):

    def run(self):
        self.f1()
        self.f2()
    def f1(self):
        lock_A.acquire()
        print(f'{self.name} got lock A')

        lock_B.acquire()
        print(f'{self.name} got lock B')
        lock_B.release()

        lock_A.release()
    def f2(self):
        lock_B.acquire()
        print(f'{self.name} got lock B')
        time.sleep(0.1)  # while this thread holds B, another thread takes A in f1

        lock_A.acquire()  # blocks forever: the other thread holds A and waits for B
        print(f'{self.name} got lock A')
        lock_A.release()

        lock_B.release()

if __name__ == '__main__':
    for i in range(3):
        t = MyThread()
        t.start()

    print('main....')

Solution: a recursive lock. To let the same thread request the same resource several times, Python provides the reentrant lock RLock. Internally an RLock keeps a lock and a counter; the counter records how many times the owning thread has called acquire, so that thread can acquire the resource repeatedly. Other threads can obtain the resource only after the owning thread has released every acquire. If the example above uses a single RLock shared by both names instead of two Lock objects, the deadlock does not occur:

from threading import Thread
from threading import RLock
import time
lock_A = lock_B = RLock()   # recursive lock: both names refer to the SAME RLock
# lock_A = RLock()  # these two lines would still deadlock: two separate RLocks are two
# lock_B = RLock()  # different locks, so two threads can each hold one and wait for the other
# A recursive lock keeps a count that starts at 0: each acquire adds 1, each release subtracts 1.
# While the count is non-zero, no other thread can take the lock.
class MyThread(Thread):
    def run(self):
        self.f1()
        self.f2()
        
    def f1(self):
        lock_A.acquire()
        print(f'{self.name} got lock A')

        lock_B.acquire()
        print(f'{self.name} got lock B')

        lock_B.release()

        lock_A.release()

    def f2(self):
        lock_B.acquire()
        print(f'{self.name} got lock B')
        time.sleep(0.1)
        lock_A.acquire()
        print(f'{self.name} got lock A')
        lock_A.release()

        lock_B.release()

if __name__ == '__main__':
    for i in range(3):
        t = MyThread()
        t.start()

A recursive lock can resolve this kind of deadlock. When a task needs several locks, consider sharing one recursive lock among them.
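The counting behaviour of RLock described above can be checked directly. The following is a minimal sketch, not part of the original post; the helper name `can_reacquire` is made up for illustration. It uses `acquire(blocking=False)` so that the plain Lock case returns immediately instead of hanging.

```python
from threading import Lock, RLock

def can_reacquire(lock):
    """Return True if the thread already holding `lock` can take it again."""
    lock.acquire()
    try:
        # blocking=False returns at once instead of waiting forever
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()   # undo the second acquire
        return got_it
    finally:
        lock.release()       # undo the first acquire

plain_result = can_reacquire(Lock())       # a plain Lock refuses the second acquire
recursive_result = can_reacquire(RLock())  # an RLock just raises its counter
print(plain_result, recursive_result)
```

With a plain Lock and a blocking acquire, the second call would never return, which is the single-thread version of the deadlock shown earlier.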

2. Semaphore

A semaphore is also a kind of lock; it controls how many threads can run concurrently.

It works the same way as the semaphore for processes.

A semaphore manages a built-in counter:
each call to acquire() decrements the counter by 1;
each call to release() increments the counter by 1;
the counter can never go below 0; when it is 0, acquire() blocks until another thread calls release().

Example (at most five threads can hold the semaphore at the same time, which can cap the maximum number of connections at 5):

from threading import Thread, Semaphore, current_thread
import time
import random
sem = Semaphore(5)

def task():
    sem.acquire()
    print(f"{current_thread().name} is using the WC.....")
    time.sleep(random.randint(1,3))
    sem.release()

if __name__ == '__main__':
    for i in range(20):
        t = Thread(target=task)
        t.start()
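To see that the semaphore really caps concurrency, one can count how many threads are inside the guarded section at once. This is a small sketch under assumed values (a limit of 3, ten workers, a 0.05 second pause); the names `LIMIT`, `running` and `max_running` are illustrative only.

```python
import threading
import time

LIMIT = 3                          # assumed limit for this sketch
sem = threading.Semaphore(LIMIT)
state_lock = threading.Lock()      # protects the two counters below
running = 0                        # threads currently inside the guarded section
max_running = 0                    # highest value `running` ever reached

def worker():
    global running, max_running
    with sem:                      # acquire() on entry, release() on exit
        with state_lock:
            running += 1
            max_running = max(max_running, running)
        time.sleep(0.05)           # stand-in for real work
        with state_lock:
            running -= 1

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_running)
```

All ten threads are created, but `max_running` never exceeds `LIMIT`, which matches the point that a semaphore limits execution rather than creation.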

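Differences between a process pool and a semaphore:

  With a process pool, the tasks to be executed queue up outside the pool and wait for a process object to run them; with a semaphore, a set of already-created processes waits to execute one section of code.

  A semaphore cannot control how many processes are created, only how many can execute at the same time; a process pool controls how many processes are created.

    Semaphore: only a fixed number of processes may operate at once. The memory and creation time of the processes are not reduced; only the load on the operating system is.
    Process pool: caps the number of processes that are started, which saves memory and creation time.

  An analogy with coal trucks: a semaphore is like a road with only five lanes. Only 5 trucks can pass at a time, but nothing stops you from building 100 trucks. A process pool is like owning only 5 trucks: 5 trucks haul at a time, and when one finishes it comes back to be used for the next load.

  Some other languages have more advanced process pools that create processes dynamically: the pool adds processes when demand rises, removes them when demand falls, and lets you set an upper limit on their number. Python's standard library does not provide this.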

3. GIL: the Global Interpreter Lock

Many self-proclaimed experts say that the GIL is Python's fatal flaw: that Python cannot use multiple cores, cannot do concurrency, and so on.

First, some languages (Java, C++, C) let multiple threads within the same process run on multiple CPU cores, which is what the now-familiar 4-core and 8-core CPUs are for. Recall the data-safety problem from the multiprocessing discussion: several processes grab the same data in a file at the same time, one of them changes the data, and before the change is written back another process has already computed with the old value. That problem is solved by locking. Multithreading has the same concurrent-execution problem.

Early Python also locked its threads, but in a more extreme way. At that time computers really had only one CPU core, so Python added a GIL, a Global Interpreter Lock. It works at the interpreter level and locks the whole thread rather than individual data operations inside the thread: at any moment only one thread can use the CPU. This is why people say Python multithreading cannot use multiple cores. It is not a property of the Python language, however, but of the CPython interpreter; Jython, which is implemented in Java, does not have this problem. CPython is the default interpreter because it is fast. Under CPython there is no way for the threads of one process to use multiple cores. This is a long-standing, historical weakness; many experts on the Python team have worked to change it, but it has not yet been fully resolved.

(An open question, still to be determined: is this related to the difference between interpreted languages such as Python and PHP and compiled languages? A compiled language can generally plan resource allocation at compile time, while an interpreted language interprets as it executes, so a lock is added to keep data safe. Is that a drawback of all interpreted languages?)

First, look at how a .py file is executed:

[Figure: execution of a .py file]

In theory a single Python process should be able to use multiple cores, but the CPython interpreter was given the GIL during its initial development, so within one process only one thread at a time can execute in the CPython interpreter.

Why add the lock?

  1. It was the single-core era, and CPUs were very expensive.

  2. Without a global interpreter lock, the programmers developing the CPython interpreter would have had to lock and unlock explicitly all over the source code, which is tedious and invites all kinds of deadlocks. To save that trouble, they put a single lock on the threads entering the interpreter.

    Advantage: it keeps the data and resources inside the CPython interpreter safe.

    Disadvantage: the threads of a single process cannot take advantage of multiple cores.

Jython has no GIL.

PyPy, like CPython, has a GIL.

Now that we are in the multi-core era, can the GIL simply be removed from CPython?

Because the logic inside the CPython interpreter was built around this one lock, removing the GIL is extremely difficult.

The threads of a single process can run concurrently but cannot take advantage of multiple cores, so they do not run in parallel. Threads that belong to different processes, however, can run on more than one CPU at the same time.

But does the GIL mean we cannot have concurrency at all? If the program is compute-bound, that is, it keeps the CPU busy calculating the whole time, then multithreading does not help. But if the program is I/O-bound (and most programs are: waiting for input, network latency when accessing a website, opening, closing, reading and writing files), it is a different story. Heavy computation appears in areas such as financial calculations and artificial intelligence (AlphaGo), but less often in general business scenarios such as web crawling, multi-user websites, chat software, and file processing. I/O operations rarely occupy the CPU, so many threads can still run concurrently: the CPU switches between threads very quickly, and the threads themselves do little computation. With a pile of network requests, for example, the fast CPU schedules each thread in turn and each thread goes off to perform its I/O operation.

IO-intensive: multithreading is suitable

Compute-intensive: multithreading is not suitable

Detailed description of the GIL lock: https://www.cnblogs.com/jin-xin/articles/11232225.html

4. The difference between the GIL and an ordinary lock

What they have in common: both are mutexes.

Differences:

The GIL (Global Interpreter Lock) protects the data and resources inside the interpreter.

The GIL is acquired and released automatically; no manual operation is needed.

A mutex defined in your own code protects the data and resources of your own process.

A mutex you define yourself must be acquired and released manually.
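Because the GIL only protects the interpreter's own state, shared data in your program still needs its own mutex. The sketch below is an illustration, not code from the original post; the names `counter`, `counter_lock` and `add_many` are hypothetical. It guards a read-modify-write on a shared counter with a Lock, so the final value is always the full total.

```python
from threading import Thread, Lock

counter = 0
counter_lock = Lock()   # our own mutex, acquired and released by our code

def add_many(n):
    global counter
    for _ in range(n):
        with counter_lock:   # acquire on entry, release on exit
            counter += 1     # read-modify-write on shared data

threads = [Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 40000
```

The `with` statement is the usual way to pair acquire and release, and it also releases the lock if the guarded code raises.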

5. Verifying efficiency: compute-intensive vs IO-intensive

Code validation:

Compute-intensive: multithreading within a single process VS multiple processes

# Compute-intensive: threads of a single process (concurrent) VS multiple processes
from threading import Thread
from multiprocessing import Process
import time
import random
def task():
    count = 0
    for i in range(10000000):
        count += 1
        
if __name__ == '__main__':
    # Multiple processes
    start_time = time.time()
    l = []
    for i in range(4):
        p = Process(target=task)
        l.append(p)
        p.start()
    for i in l:
        i.join()

    print(f"Elapsed time: {time.time() - start_time}")
    # Elapsed time: 1.6398890018463135

    # # Multiple threads
    # start_time = time.time()
    # l = []
    # for i in range(4):
    #     t = Thread(target=task)
    #     l.append(t)
    #     t.start()
    #
    # for i in l:
    #     i.join()
    #
    # print(f"Elapsed time: {time.time() - start_time}")
    # # Elapsed time: 2.6619932651519775
    
# Conclusion: for compute-intensive work, multiple processes running in parallel are more efficient.

IO-intensive: multithreading within a single process (concurrent) VS multiple processes (concurrent and parallel)

# IO-intensive: threads of a single process (concurrent) VS multiple processes (concurrent and parallel)
from threading import Thread
from multiprocessing import Process
import time
import random
def task():
    count = 0
    time.sleep(random.randint(1,3))
    count += 1

if __name__ == '__main__':
    # Multiple processes (concurrent and parallel)
    # start_time = time.time()
    # l = []
    # for i in range(50):
    #     p = Process(target=task)
    #     l.append(p)
    #     p.start()
    #
    # for i in l:
    #     i.join()
    #
    # print(f"Elapsed time: {time.time() - start_time}")
    # Elapsed time: 8.976764440536499


    # Multiple threads (concurrent)
    start_time = time.time()
    l = []
    for i in range(50):
        t = Thread(target=task)
        l.append(t)
        t.start()
    for i in l:
        i.join()
        
    print(f"Elapsed time: {time.time() - start_time}")
    # Elapsed time: 3.0085208415985107
    
    
# For IO-intensive work, the threads of a single process are more efficient.
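The reason threads do well on I/O-bound work is that their waits overlap. As a rough sketch, with `time.sleep(0.1)` standing in for a real I/O wait and hypothetical helper names `run_sequential` and `run_threaded`, the same five waits can be timed one after another and then in threads:

```python
import time
from threading import Thread

def fake_io():
    time.sleep(0.1)   # stand-in for a network or disk wait

def run_sequential(n):
    start = time.perf_counter()
    for _ in range(n):
        fake_io()
    return time.perf_counter() - start

def run_threaded(n):
    start = time.perf_counter()
    threads = [Thread(target=fake_io) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

sequential_time = run_sequential(5)   # the waits add up
threaded_time = run_threaded(5)       # the waits overlap
print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

The exact numbers depend on the machine, so none are quoted here; the sequential run takes about the sum of the waits, while the threaded run takes about one wait.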

6. Multithreaded socket communication

# server
import socket
from threading import Thread

def communicate(conn,addr):
    while 1:
        try:
            from_client_data = conn.recv(1024)
            if not from_client_data:  # empty bytes: the client closed the connection
                break
            print(f'Message from client {addr[1]}: {from_client_data.decode("utf-8")}')
            to_client_data = input('>>>').strip()
            conn.send(to_client_data.encode('utf-8'))
        except Exception:
            break
    conn.close()



def _accept():
    server = socket.socket()

    server.bind(('127.0.0.1', 8848))

    server.listen(5)

    while 1:
        conn, addr = server.accept()
        t = Thread(target=communicate,args=(conn,addr))
        t.start()

if __name__ == '__main__':
    _accept()
# client
import socket

client = socket.socket()

client.connect(('127.0.0.1',8848))

while 1:
    try:
        to_server_data = input('>>>').strip()
        client.send(to_server_data.encode('utf-8'))

        from_server_data = client.recv(1024)
        print(f'Message from the server: {from_server_data.decode("utf-8")}')

    except Exception:
        break
client.close()
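The server and client above need two terminals and typed input. For a quick check of the one-thread-per-connection pattern in a single script, the following sketch may help. It is an illustrative variant, not the original code: it replies automatically with the upper-cased message, binds to port 0 so the OS picks a free port, and the helper names `handle` and `serve` are made up.

```python
import socket
from threading import Thread

def handle(conn):
    """Echo each message back in upper case until the client disconnects."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:              # empty bytes: the peer closed the connection
                break
            conn.sendall(data.upper())

def serve(server, n_clients):
    """Accept n_clients connections and start one handler thread for each."""
    handlers = []
    for _ in range(n_clients):
        conn, _addr = server.accept()
        t = Thread(target=handle, args=(conn,))
        t.start()
        handlers.append(t)
    for t in handlers:
        t.join()

server = socket.socket()
server.bind(("127.0.0.1", 0))         # port 0: let the OS choose a free port
server.listen(5)
port = server.getsockname()[1]

acceptor = Thread(target=serve, args=(server, 2))
acceptor.start()

replies = []
for text in ("hello", "world"):
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(text.encode("utf-8"))
        replies.append(client.recv(1024).decode("utf-8"))

acceptor.join()
server.close()
print(replies)
```

The two clients here connect one after the other to keep the script short; the handler threads would serve them equally well if they connected at the same time.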

7. Process pools and thread pools

Python standard library module: concurrent.futures

ThreadPoolExecutor and ProcessPoolExecutor are used in the same way, and both can be imported directly from concurrent.futures.

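The concurrent.futures module provides a high-level interface for asynchronous calls
ThreadPoolExecutor: thread pool, provides asynchronous calls
ProcessPoolExecutor: process pool, provides asynchronous calls
Both implement the same interface, which is defined by the abstract Executor class.

# Basic methods
#submit(fn, *args, **kwargs)
Submit a task asynchronously

#map(func, *iterables, timeout=None, chunksize=1) 
Replaces a for loop of submit calls

#shutdown(wait=True) 
Equivalent to pool.close() + pool.join() on a multiprocessing Pool
wait=True: wait until all tasks in the pool have finished and resources are reclaimed before continuing
wait=False: return immediately without waiting for the tasks in the pool to finish
Whatever the value of wait, the program as a whole does not exit until all tasks have finished
submit and map must be called before shutdown

#result(timeout=None)
Get the result

#add_done_callback(fn)
Register a callback function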
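The methods listed above can be exercised together in a few lines. This is a minimal sketch with made-up names (`square`, `on_done`, `callback_results`); it uses the executor as a context manager, which calls shutdown(wait=True) on exit.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

callback_results = []

def on_done(future):
    # a done-callback receives the Future itself, not the bare result
    callback_results.append(future.result())

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(square, i) for i in range(5)]   # submit: one Future per task
    for f in futures:
        f.add_done_callback(on_done)
    submitted = [f.result() for f in futures]              # result(): blocks until that task is done
    mapped = list(pool.map(square, range(5)))              # map: results in input order

print(submitted, mapped, sorted(callback_results))
```

`submitted` and `mapped` follow the order of submission, while callbacks fire in completion order, which is why the callback list is sorted before printing.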

Basic use of ThreadPoolExecutor:

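import time
import os
import threading
from concurrent.futures import ThreadPoolExecutor,ProcessPoolExecutor

def func(n):
    time.sleep(2)
    print('%s printed:'%(threading.get_ident()),n)
    return n*n
tpool = ThreadPoolExecutor(max_workers=5) # rule of thumb: start no more than CPU count * 5 threads
# tpool = ProcessPoolExecutor(max_workers=5) # to use a process pool, just replace ThreadPoolExecutor above with ProcessPoolExecutor; nothing else changes
# Asynchronous execution
t_lst = []
for i in range(5):
    t = tpool.submit(func,i) # submit the function for execution; returns a result object (a Future). i is the argument of the task; the signature is submit(self, fn, *args, **kwargs), so any arguments can be passed
    t_lst.append(t)
    # print(t.result())
    # Do not call result() on t right here: result() blocks until the task finishes, which would make the loop serial. Think of t as a ticket number and fetch the results through it after all tasks have been submitted
tpool.shutdown() # like the old close (no new tasks accepted) + join (wait for all threads to finish)
print('main thread')
for ti in t_lst:
    print('>>>>',ti.result())

# Results can also be collected without shutdown(), by polling:
# fetched = set()
# while len(fetched) < len(t_lst):
#     for n,ti in enumerate(t_lst):
#         if n not in fetched and ti.done():  # done() does not block; result() would block until the task finishes
#             print('>>>>', ti.result(),n)
#             fetched.add(n)  # remember which positions have already been collected, so they are not fetched again
#     time.sleep(2) # poll every two seconds, so that fast results can be taken without waiting for the slow tasks

# Result analysis: the lines printed by the workers come in no fixed order, because threads switch while func sleeps. The results fetched at the end are ordered, because the main thread appended the result objects to the list in order.
# 37220 printed: 0
# 32292 printed: 4
# 33444 printed: 1
# 30068 printed: 2
# 29884 printed: 3
# main thread
# >>>> 0
# >>>> 1
# >>>> 4
# >>>> 9
# >>>> 16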
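If the goal is to take each result as soon as it is ready, the standard library already offers `concurrent.futures.as_completed`, which avoids a hand-written polling loop. A small sketch follows, with a hypothetical `slow_square` task and assumed delays:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_square(n, delay):
    time.sleep(delay)   # pretend some tasks take longer than others
    return n * n

jobs = [(1, 0.3), (2, 0.1), (3, 0.2)]   # (argument, delay) pairs, assumed values
results = {}
finished_order = []

with ThreadPoolExecutor(max_workers=3) as pool:
    future_to_n = {pool.submit(slow_square, n, delay): n for n, delay in jobs}
    for future in as_completed(future_to_n):   # yields each Future as it finishes
        n = future_to_n[future]
        finished_order.append(n)
        results[n] = future.result()

print(finished_order, results)
```

The dictionary that maps each Future back to its argument plays the role that enumerate played in the polling version: it records which task a finished result belongs to.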

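Using ProcessPoolExecutor:

Only this one line needs to change to the line below it; the rest of the code stays the same
tpool = ThreadPoolExecutor(max_workers=5) # rule of thumb: start no more than CPU count * 5 threads
# tpool = ProcessPoolExecutor(max_workers=5)

This shows why the thread pool and the process pool live in the same module: they are used the same way.
Yes, this is duck typing.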

Using map:

from concurrent.futures import ThreadPoolExecutor,ProcessPoolExecutor
import threading
import os,time,random
def task(n):
    print('%s is running' %threading.get_ident())
    time.sleep(random.randint(1,3))
    return n**2

if __name__ == '__main__':

    executor=ThreadPoolExecutor(max_workers=3)

    # for i in range(11):
    #     future=executor.submit(task,i)

    s = executor.map(task,range(1,5)) # map replaces the for loop + submit
    print([i for i in s])

Simple use of a callback function:

import time
import os
import threading
from concurrent.futures import ThreadPoolExecutor,ProcessPoolExecutor

def func(n):
    time.sleep(2)
    return n*n

def call_back(m):
    print('Result: %s'%(m.result()))

tpool = ThreadPoolExecutor(max_workers=5)
for i in range(5):
    # add_done_callback returns None, so there is nothing useful to assign here
    tpool.submit(func,i).add_done_callback(call_back)

A simple callback exercise:

from concurrent.futures import ThreadPoolExecutor,ProcessPoolExecutor
from multiprocessing import Pool
import requests
import json
import os

def get_page(url):
    print('<process %s> get %s' %(os.getpid(),url))
    response=requests.get(url)
    if response.status_code == 200:
        return {'url':url,'text':response.text}

def parse_page(res):
    res=res.result()
    if res is None:  # get_page returns None when the status code is not 200
        return
    print('<process %s> parse %s' %(os.getpid(),res['url']))
    parse_res='url:<%s> size:[%s]\n' %(res['url'],len(res['text']))
    with open('db.txt','a') as f:
        f.write(parse_res)

if __name__ == '__main__':
    urls=[
        'https://www.baidu.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/'
    ]

    # p=Pool(3)
    # for url in urls:
    #     p.apply_async(get_page,args=(url,),callback=parse_page)
    # p.close()
    # p.join()

    p=ProcessPoolExecutor(3)
    for url in urls:
        p.submit(get_page,url).add_done_callback(parse_page) # parse_page receives a Future object obj; call obj.result() to get the result

Thread pool: a container that fixes the number of threads you can start. With a limit of 4, for example, only four tasks can be handled concurrently at first; as soon as one task finishes, its thread immediately picks up the next task.

This trades time for space: tasks may wait longer, but the number of threads, and the memory they use, stays bounded.

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os
import time
import random

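# print(os.cpu_count())   # show the number of CPUs on this machine

def task(n):
    print(f"{os.getpid()} handling a task!")
    time.sleep(random.randint(1,3))


if __name__ == '__main__':
    # # Start a process pool (parallel: concurrency + parallelism)
    # p = ProcessPoolExecutor()   # default: the number of processes equals the number of CPUs, e.g. 8
    # for i in range(20):
    #     p.submit(task,i)



    # Start a thread pool (concurrent)
    # t = ThreadPoolExecutor()    # default: CPU count * 5 threads on Python 3.5-3.7 (e.g. 8*5=40); min(32, CPU count + 4) since Python 3.8
    t = ThreadPoolExecutor(100)
    for i in range(20):
        t.submit(task,i)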
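That a pool handles many tasks by reusing a fixed set of workers, rather than starting one thread per task, can be observed by recording which thread ran each task. The following is a sketch with assumed sizes (2 workers, 10 tasks) and an illustrative function name `which_thread`:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def which_thread(_):
    time.sleep(0.01)                  # give both workers a chance to pick up tasks
    return threading.get_ident()      # identifier of the thread running this task

with ThreadPoolExecutor(max_workers=2) as pool:
    idents = list(pool.map(which_thread, range(10)))

distinct_workers = len(set(idents))
print(f"{len(idents)} tasks ran on {distinct_workers} thread(s)")
```

Ten tasks complete, yet the number of distinct thread identifiers never exceeds `max_workers`.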

In practice, a server cannot accept an unlimited number of requests at the same time (say, a million), so a thread pool is used to cap them. A quick example follows:

server side:

from threading import Thread
from concurrent.futures import ThreadPoolExecutor
import os
import socket
import time
import random
def communicate(conn, addr):
    while 1:
        try:
            from_client_data = conn.recv(1024).decode("utf-8")
            if not from_client_data or from_client_data.lower() == "q":
                # empty data means the client closed the connection
                print(f"Client {addr} exited normally!")
                break
            print(f"Message from client {addr}: {from_client_data}")
            msg = input(">>>").encode("utf-8")
            conn.send(msg)
        except Exception:
            print(f"Client {addr} disconnected abnormally")
            break
    conn.close()

def accept_():
    sk = socket.socket()
    sk.bind(("127.0.0.1", 8080))
    sk.listen(5)
    t = ThreadPoolExecutor(3)
    while 1:
        conn, addr = sk.accept()
        # t = Thread(target=communicate,args=(conn,addr))
        t.submit(communicate, conn, addr)
    sk.close()

if __name__ == '__main__':
    accept_()

client side:

import socket

sk = socket.socket()
sk.connect(("127.0.0.1",8080))

while 1:
    try:
        msg = input(">>>")
        if msg == "":
            print("Input cannot be empty, please try again!")
            continue
        sk.send(msg.encode("utf-8"))
        if msg.lower() == "q":
            break
        from_server_data = sk.recv(1024).decode('UTF-8')
        print(from_server_data)
    except Exception:
        break
sk.close()

Origin www.cnblogs.com/zhangchaoyin/p/11415404.html