Python multi-threading: this stuff is amazing, take a look

Table of contents

Multitasking

Simulating multitasking in a program

Understanding multitasking

Threads for multitasking

Viewing the number of threads

Verifying the creation and execution of sub-threads

Inheriting the Thread class to create threads

Sharing global variables between threads (inter-thread communication)

Thread arguments: args

Resource competition over shared global variables

Mutex

Deadlock

Avoiding deadlock

Thread-safe queues

_______________________________________________

Multitasking

Many things in life happen at the same time: when you drive, your hands and feet work together; a performer can sing and dance simultaneously. Doing several things at once, instead of finishing one before starting the next, is multitasking, and each of those things is a task.
First, here is how our code usually runs:
def sing():
    for i in range(5):
        print("I love python")
    print("I love python --------- finished")


def dance():
    for i in range(5):
        print("Python is fun")
    print("Python is fun --------- finished")


if __name__ == '__main__':
    sing()
    dance()

result:

As you can see, the code runs from top to bottom, one call after another. But what if one of those calls blocks on a delay? Could the code below it still run, with the delayed part finishing later? Yes: that is exactly what multi-threading gives us. Let me introduce it below.

Understanding multitasking:

You may have heard of concurrency and parallelism; both terms come up often when talking about threads. Code runs on the CPU, and CPUs come in single-core and multi-core varieties. (The original post illustrates this with a diagram.)

On a single-core CPU, processes such as QQ and WeChat cannot truly run at the same time. Instead, QQ runs on the CPU for a short time slice, then WeChat takes its turn, then the next process, and so on. Because the switching is so fast, it looks as if the processes run simultaneously, but they do not. This is concurrency: fake multitasking, where the number of CPU cores is smaller than the number of tasks currently executing.

A multi-core CPU, on the other hand, really can run several tasks at once. This is parallelism: true multitasking, where the number of CPU cores is at least as large as the number of tasks currently executing.
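To see why this matters in Python, here is a minimal sketch (not from the original post) in which two threads each wait half a second, standing in for I/O-bound work; run together, they finish in about half a second instead of a full second:

```python
import threading
import time

def wait_task(results, idx):
    time.sleep(0.5)   # stand-in for I/O-bound work (network, disk, ...)
    results[idx] = True

results = [False, False]
start = time.time()
threads = [threading.Thread(target=wait_task, args=(results, i)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print(elapsed)  # roughly 0.5, not 1.0: the two waits overlap
```

Note that because of CPython's GIL, threads overlap while *waiting* (I/O), not while doing CPU-bound computation; that is why the web crawler later in this post benefits so much from threading.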

Threads for multitasking

Here is our code demonstration:

from threading import Thread

def sing():
    # child thread
    for i in range(5):
        print("I love python")
    print("I love python --------- finished")


def dance():
    # child thread
    for i in range(5):
        print("Python is fun")
    print("Python is fun --------- finished")


"""This is the main thread"""
if __name__ == '__main__':
    for i in range(2):
        # Create the thread object (the thread is not actually created yet)
        sin = Thread(target=sing)
        # start() is what actually creates and launches the thread
        sin.start()

result:

It's okay if you don't understand this yet; let me explain it piece by piece.

First we need the threading module. It ships with Python's standard library, so there is nothing to download; just import it:

import threading

Creating the Thread object does not yet create the operating-system thread; that only happens when start() is called.

Verify the execution and creation of sub-threads:

When Thread() is called, no thread is created yet; only the thread object is built.
When start() is called on that object, the thread is actually created and begins
running.
Parameter target: the function the thread will run.
Parameter args: a tuple of positional arguments for that function, e.g. args=(3, 5). The function here is the child thread; think of the function (child thread) as the task.
start(): start the thread.
join(): make the main program wait until the child thread has finished.
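A minimal sketch of target, args, and join() working together (the function name greet is just for illustration):

```python
import threading

messages = []

def greet(name, times):
    for _ in range(times):
        messages.append("hello, " + name)

# target is the function to run; args is a tuple of its arguments
t = threading.Thread(target=greet, args=("python", 3))
t.start()
t.join()  # the main thread waits here until greet() has finished
print(len(messages))  # 3
```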
So far we have just created threads one at a time, much as we usually write code; now let's look at multi-threading with timing:
from threading import Thread
import time

def sing():
    # child thread
    for i in range(5):
        print("I love python")
    print("I love python --------- finished")
    time.sleep(10)


def dance():
    # child thread
    for i in range(5):
        print("Python is fun")
    print("Python is fun --------- finished")


"""This is the main thread"""
if __name__ == '__main__':
    a = time.time()
    for i in range(2):
        # Create the thread object (the thread is not actually created yet)
        sin = Thread(target=sing)
        # start() is what actually creates and launches the thread
        sin.start()
    b = time.time()
    print(b - a)

result:

The printed time is the running time of the main program only; the child threads are still sleeping when it is printed. Now let's measure the time including the child threads:

from threading import Thread
import threading
import time

def sing():
    # child thread
    for i in range(100):
        print("I love python")
    print("I love python --------- finished")
    time.sleep(2)


def dance():
    # child thread
    for i in range(100):
        print("Python is fun")
    print("Python is fun --------- finished")


"""This is the main thread"""
if __name__ == '__main__':
    a = time.time()
    lis = []
    for i in range(1):
        # Create the thread object (the thread is not actually created yet)
        t = threading.Thread(target=sing)
        # start() is what actually creates and launches the thread
        t.start()
        lis.append(t)
    for i in lis:
        i.join()
    b = time.time()
    print(b - a)
    print("Main program reached the end")

result:

join() makes the main program wait until the child thread has finished running before it continues.

Let's crawl a website both ways (single-threaded and multi-threaded) and compare the running times:

import requests
import threading
from lxml import etree
import time


def prase_url(url, header):
    response = requests.get(url, headers=header)
    return response


def parse_data(html):
    e_html = etree.HTML(html)
    new_html = e_html.xpath('//div[@id="htmlContent"]//text()')
    # print("".join(new_html).strip())
    h1 = e_html.xpath('//div[@class="chapter-detail"]/h1/text()')[0]
    print(h1)
    return h1, "".join(new_html).strip()


def save_data(data):
    with open("./小说/{}.txt".format(data[0]), "w", encoding="utf-8") as f:
        f.write(data[1])


def main(urls):
    """Main business logic"""
    for url in urls:
        header = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        # Send the request and get the response
        response = prase_url(url, header)
        html = response.text
        # print(html)

        # Extract the data
        data = parse_data(html)
        # Save it
        save_data(data)


if __name__ == '__main__':
    a = time.time()
    lis = []
    urls = []
    for i in range(56, 93):
        url = "http://www.quannovel.com/read/620/2467{}.html".format(i)
        urls.append(url)
    # Multi-threaded
    # for i in range(2):
    #     t1 = threading.Thread(target=main, args=(urls,))
    #     t1.start()
    #     lis.append(t1)
    # for t in lis:
    #     t.join()

    # Single-threaded
    main(urls)
    b = time.time()
    print(b - a)
    print("Main thread finished; waiting for child threads to finish")

The first multi-threaded version:

import requests
import threading
from lxml import etree
import time


def prase_url(url, header):
    response = requests.get(url, headers=header)
    return response


def parse_data(html):
    e_html = etree.HTML(html)
    new_html = e_html.xpath('//div[@id="htmlContent"]//text()')
    # print("".join(new_html).strip())
    h1 = e_html.xpath('//div[@class="chapter-detail"]/h1/text()')[0]
    print(h1)
    return h1, "".join(new_html).strip()


def save_data(data):
    with open("./小说/{}.txt".format(data[0]), "w", encoding="utf-8") as f:
        f.write(data[1])


def main(urls):
    """Main business logic"""
    for url in urls:
        header = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        # Send the request and get the response
        response = prase_url(url, header)
        html = response.text
        # print(html)

        # Extract the data
        data = parse_data(html)
        # Save it
        save_data(data)


if __name__ == '__main__':
    a = time.time()
    lis = []
    urls = []
    for i in range(56, 93):
        url = "http://www.quannovel.com/read/620/2467{}.html".format(i)
        urls.append(url)
    for i in range(2):
        t1 = threading.Thread(target=main, args=(urls,))
        t1.start()
        lis.append(t1)
    for t in lis:
        t.join()

    # Single-threaded
    # main(urls)
    b = time.time()
    print(b - a)
    print("Main thread finished; waiting for child threads to finish")
The second multi-threaded version (one thread per URL):

import requests
import threading
from lxml import etree
import time


def prase_url(url, header):
    response = requests.get(url, headers=header)
    return response


def parse_data(html):
    e_html = etree.HTML(html)
    new_html = e_html.xpath('//div[@id="htmlContent"]//text()')
    # print("".join(new_html).strip())
    h1 = e_html.xpath('//div[@class="chapter-detail"]/h1/text()')[0]
    print(h1)
    return h1, "".join(new_html).strip()


def save_data(data):
    with open("./小说/{}.txt".format(data[0]), "w", encoding="utf-8") as f:
        f.write(data[1])


def main(i):
    """Main business logic"""
    url = "http://www.quannovel.com/read/620/2467{}.html".format(i)
    header = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
    }
    # Send the request and get the response
    response = prase_url(url, header)
    html = response.text
    # print(html)

    # Extract the data
    data = parse_data(html)
    # Save it
    save_data(data)


if __name__ == '__main__':
    a = time.time()
    lis = []
    for i in range(56, 93):
        t1 = threading.Thread(target=main, args=(i,))
        t1.start()
        lis.append(t1)
    for t in lis:
        t.join()

    # Single-threaded
    # for i in range(56, 93):
    #     main(i)
    b = time.time()
    print(b - a)
    print("Main thread finished; waiting for child threads to finish")

result:

Looking at this, you will notice something odd: the threaded versions can still take a surprising amount of time, because both have problems. Let's analyze them one by one.

The first multi-threaded version: resource competition occurs, because multiple threads write at the same time, and the order in which threads run is decided by the CPU scheduler.

When you run it, you will also find that the results are wrong. Why? Each thread runs main() from beginning to end, and each thread was handed the entire list in t1=threading.Thread(target=main, args=(urls,)). So the two threads both crawl every page and interfere with each other. Either we should give each thread its own share of the URLs, or we should let one thread do it all.

If you look carefully, you will also see that the saved data is garbled, mainly because of resource competition. Let me explain:

As the figure above shows, this is what happens in the first multi-threaded version. To prevent it, you need a lock, so that while one thread is writing, the next thread cannot write. We will cover the use of locks later.

The second multi-threaded version: we created many threads, but each one crawls only a single page. Creating one thread per page adds overhead to the program; it works, but the cost is high. The number of threads we create should be appropriate for the workload.
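One common fix, sketched here with placeholder URLs and a placeholder worker function (not the crawler from the post), is to split the URL list among a fixed number of worker threads, so each thread gets its own share and no page is crawled twice:

```python
import threading

def crawl_chunk(urls, results):
    # Placeholder for the real download/parse/save work in main()
    for url in urls:
        results.append(url)

all_urls = ["page-%d" % i for i in range(10)]  # stand-ins for the real URLs
n_threads = 3
results = []
threads = []
for i in range(n_threads):
    chunk = all_urls[i::n_threads]  # every n-th URL goes to worker i
    t = threading.Thread(target=crawl_chunk, args=(chunk, results))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
print(len(results))  # 10: every URL handled exactly once
```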

Inherit the Thread class to create threads

import requests
import threading
from lxml import etree
import time
import queue


# This class crawls the data
class My_Thread(threading.Thread):
    def __init__(self, urls, header, datas):
        super().__init__()
        self.urls = urls
        self.header = header
        self.datas = datas
        # print(self.urls.qsize())

    def prase_url(self, url):
        response = requests.get(url, headers=self.header)
        return response

    def parse_data(self, html):
        e_html = etree.HTML(html)
        new_html = e_html.xpath('//div[@id="htmlContent"]//text()')
        # print("".join(new_html).strip())
        h1 = e_html.xpath('//div[@class="chapter-detail"]/h1/text()')[0]
        print("fetching")
        return (h1, "".join(new_html).strip())

    def run(self):
        """Main business logic"""
        while not self.urls.empty():
            url = self.urls.get()
            # Send the request and get the response
            response = self.prase_url(url)
            html = response.text
            # Extract the data
            data = self.parse_data(html)
            self.datas.put(data)
            # print(self.datas.qsize())


# This class saves the files
class Save_data(threading.Thread):
    def __init__(self, datas):
        super().__init__()
        self.datas = datas

    def run(self):
        while not self.datas.empty():
            a = self.datas.get()
            print("saving")
            with open("./小说/{}.txt".format(a[0]), "w", encoding="utf-8") as f:
                f.write(a[1])


def main():
    # Build the queue of URLs and the queue for crawled data
    urls = queue.Queue()
    datas = queue.Queue()
    for i in range(56, 93):
        url = "http://www.quannovel.com/read/620/2467{}.html".format(i)
        urls.put(url)
    # print(urls.qsize())
    header = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
    }
    # Create the crawler threads
    lis = []
    for i in range(5):
        my_thead = My_Thread(urls, header, datas)
        my_thead.start()
        lis.append(my_thead)
    for i in lis:
        i.join()
    # Create the saver threads
    for i in range(5):
        sa_da = Save_data(datas)
        sa_da.start()


if __name__ == '__main__':
    a = time.time()
    main()
    b = time.time()
    print(b - a)
    print("Main thread finished; waiting for child threads to finish")

result:

Looking at the multi-threaded run, it crawls everything in one go. Now let me analyze the code:

1. To create our own thread class, we must inherit from the parent class threading.Thread.

2. The run(self) method can be overridden, but its name must not be changed. Put the code the thread should execute there; it plays the role that main() played before we learned multi-threading.

3. The main() function now mainly just creates the threads.

4.

for i in lis:
    i.join()

We write this so that all the crawled data has been fetched before the saving threads start; otherwise the saving threads might find nothing to save.

5. queue.Queue(): creates a thread-safe queue.
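The crawler above is long, so here is the same pattern boiled down to a minimal sketch (the class name CounterThread is just for illustration): subclass threading.Thread, call the parent constructor, and put the work in run():

```python
import threading

class CounterThread(threading.Thread):
    def __init__(self, n):
        super().__init__()   # always call the parent constructor first
        self.n = n
        self.total = 0

    def run(self):
        # run() is the thread body; start() invokes it, never call run() directly
        for i in range(self.n):
            self.total += i

t = CounterThread(5)
t.start()
t.join()
print(t.total)  # 0 + 1 + 2 + 3 + 4 = 10
```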

Multi-thread shared global variables (inter-thread communication)

Simply put, global variables serve as the medium through which threads pass data to each other.

This raises a question: do we need the global keyword to modify a global variable?

Actually, inside a function, whether modifying a global variable requires global depends on whether the variable's reference is rebound. If the name is reassigned to a new object, global is required. If only the data inside the object it points to is changed, global is not needed.
Here is a simple demonstration:
import threading

# Whether modifying a global needs `global` depends on whether the
# assignment rebinds the name to a new object.
num = 0

# writer
def task1(nu, n):
    global num
    num += nu
    print("task1=", num)
    print("n=%d" % n)

# reader
def task2():
    print("task2=", num)


def main():
    # Create the child threads
    t1 = threading.Thread(target=task1, args=(3, 4))
    t2 = threading.Thread(target=task2)
    # Start them (this is when the threads are actually created)
    t1.start()
    t2.start()
    print("main.....", num)


if __name__ == '__main__':
    main()

result:

So whether a function needs global depends on whether changing the value rebinds the name to a new object.
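To make the distinction concrete, here is a small sketch contrasting the two cases: rebinding an int (needs global) versus mutating a list in place (does not):

```python
import threading

counter = 0   # an int: += rebinds the name to a new object
items = []    # a list: append mutates the object in place

def rebind():
    global counter       # required, because counter += 1 reassigns the name
    counter += 1

def mutate():
    items.append(1)      # no global needed: the name is never reassigned

t1 = threading.Thread(target=rebind)
t2 = threading.Thread(target=mutate)
t1.start(); t2.start()
t1.join(); t2.join()
print(counter, items)  # 1 [1]
```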

Mutex

A mutex is used to solve resource competition. Resource competition means multiple threads operating on the same data at the same time, as we saw earlier.

Here is another simple example.

Without locking:

import threading
import time

"""Two threads writing at the same time, without a lock"""
num = 0

def task1():
    global num
    for i in range(100000000):
        num += 1
    print("task1.......%d" % num)


def task2():
    global num
    for i in range(100000000):
        num += 1
    print("task2.......%d" % num)


def main():
    # Create the child threads
    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    # Start them
    t1.start()
    t2.start()
    print("main....%d" % num)


if __name__ == '__main__':
    main()

result:

 

With locking, using RLock():

threading.RLock() is a re-entrant lock: the same thread may acquire() it multiple times, as long as it release()s it the same number of times.

import threading
import time
"""With a lock"""

num = 0
# Create a lock
# mutex = threading.Lock()
mutex = threading.RLock()

def task1():
    global num
    # Lock (an RLock lets the same thread acquire it more than once)
    mutex.acquire()
    mutex.acquire()
    for i in range(100000000):
        num += 1
    mutex.release()
    mutex.release()
    # Unlocked: the next thread can now proceed
    print("task1.......%d" % num)


def task2():
    global num
    # Lock (an RLock lets the same thread acquire it more than once)
    mutex.acquire()
    mutex.acquire()
    for i in range(100000000):
        num += 1
    mutex.release()
    mutex.release()
    # Unlocked: the next thread can now proceed
    print("task2.......%d" % num)

def main():
    # Create the child threads
    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    # Start them
    t1.start()
    t2.start()
    print("main....%d" % num)


if __name__ == '__main__':
    main()

result:

Creating a lock with Lock():

import threading
import time
"""With a lock"""

num = 0
# Create a lock
mutex = threading.Lock()
# mutex = threading.RLock()

def task1():
    global num
    # Lock (so the data is updated safely)
    mutex.acquire()
    for i in range(100000000):
        num += 1
    mutex.release()
    # Unlocked: the next thread can now proceed
    print("task1.......%d" % num)


def task2():
    global num
    # Lock (so the data is updated safely)
    mutex.acquire()
    for i in range(100000000):
        num += 1
    mutex.release()
    # Unlocked: the next thread can now proceed
    print("task2.......%d" % num)

def main():
    # Create the child threads
    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    # Start them
    t1.start()
    t2.start()
    print("main....%d" % num)


if __name__ == '__main__':
    main()

result:

The same thing using a with statement:

import threading
import time

num = 0
mutex = threading.Lock()

def task1():
    global num
    with mutex:  # acquires the lock on entry, releases it on exit
        for i in range(100000000):
            num += 1
    print("task1.......%d" % num)


def task2():
    global num
    with mutex:  # acquires the lock on entry, releases it on exit
        for i in range(100000000):
            num += 1
    print("task2.......%d" % num)


def main():
    # Create the child threads
    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    # Start them
    t1.start()
    t2.start()
    print("main....%d" % num)


if __name__ == '__main__':
    main()

result:

threading.Lock(): a plain lock; it can only be acquired once, and must be released before it can be acquired again, even by the same thread.

threading.RLock(): a re-entrant lock; the same thread can acquire it multiple times, releasing it the same number of times.

acquire(): lock
release(): unlock
with mutex: locks and unlocks automatically, with the same effect as with open(...)

Thread-safe queues

In threads, accessing shared global variables and locking around them is routine. If instead you want to store the data in a queue, Python has a built-in thread-safe module for that: the queue module. It provides synchronized, thread-safe queue classes, including the FIFO (first in, first out) queue Queue and the LIFO (last in, first out) queue LifoQueue. These queues all use lock primitives internally (think of an atomic operation: it either happens completely or not at all), so they can be used directly across multiple threads and can be used to synchronize them.
Simply put, queues exist to spare you the trouble of managing locks yourself and to simplify the code.
The code is shown below:
import queue
import threading

# Queue basics
# a = queue.Queue(5)
# for i in range(5):
#     a.put(i)  # store an element
#     print(a.full())
# print(a)
# for i in range(5):
#     # print(a.get())
#     print(a.get_nowait())
#     print(a.empty())


# Create the queue
q = queue.Queue()

num = 0
q.put(num)  # store the initial value of num


def task1():
    for i in range(10000000):
        num = q.get()  # a local variable named num
        num += 1
        q.put(num)


def task2():
    for i in range(1000000):
        num = q.get()  # a local variable named num
        num += 1
        q.put(num)


def main():
    # Create the child threads
    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    # Start them
    t1.start()
    t2.start()
    t1.join()
    t2.join()

    print("main....%d" % q.get())


if __name__ == '__main__':
    main()
Queue(maxsize): create a first-in, first-out queue; maxsize limits how many items it can hold.
qsize(): return the number of items currently in the queue.
empty(): return True if the queue is empty.
full(): return True if the queue is full.
get(): remove and return the item at the front of the queue.
put(): put an item into the queue.
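These methods can be exercised in a few lines:

```python
import queue

q = queue.Queue(maxsize=3)   # FIFO queue holding at most 3 items
for i in range(3):
    q.put(i)
was_full = q.full()
out = [q.get() for _ in range(3)]
print(was_full, out, q.empty())  # True [0, 1, 2] True
```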

Summary

In general, the purpose of threads is to make much better use of time and improve the efficiency of the program. When crawling many pages, running without threads is very slow; with threads, the waiting overlaps.


Origin blog.csdn.net/m0_69984273/article/details/130976961