Table of contents
Multitasking
Simulating multitasking in a program
Understanding multitasking
Threads for multitasking
Viewing the number of threads
Verifying the execution and creation of sub-threads
Inheriting the Thread class to create threads
Multi-thread shared global variables (inter-thread communication)
Multithreading arguments (args)
Shared global variables and resource competition
Mutex
Deadlock
Avoiding deadlock
Queues and threads
_______________________________________________
Multitasking
```python
def sing():
    for i in range(5):
        print("I love python")
    print("I love python --------- finished")

def dence():
    for i in range(5):
        print("Python is fun")
    print("Python is fun --------- finished")

if __name__ == '__main__':
    sing()
    dence()
```
As you can see, the code runs from top to bottom, one call after another. But what if one of those calls is delayed? Could the rest of the code keep running while the delayed part waits? It can, and that is exactly what multithreading gives us. Let me introduce it below.
Understanding multitasking:
You may have heard of concurrency and parallelism; both terms come up a lot when discussing threads. Code runs on the CPU, and CPUs come in single-core and multi-core variants.
On a single-core CPU, several processes cannot truly run at the same time. Instead, QQ runs on the CPU for a short time slice, then WeChat takes its place, then the next process, and so on. Because the switches happen so fast, it looks as if the processes run simultaneously, but they do not. This is concurrency: "fake" multitasking, where the number of CPU cores is smaller than the number of tasks to execute.
A multi-core CPU, by contrast, really can run several tasks at once. This is parallelism: true multitasking, where the number of CPU cores is greater than or equal to the number of tasks to execute.
Threads for multitasking
Here is our code demonstration:
```python
from threading import Thread

def sing():
    # sub-thread
    for i in range(5):
        print("I love python")
    print("I love python --------- finished")

def dence():
    # sub-thread
    for i in range(5):
        print("Python is fun")
    print("Python is fun --------- finished")

"""This is the main thread"""
if __name__ == '__main__':
    for i in range(2):
        # Create the thread object (the thread is not fully created yet)
        sin = Thread(target=sing)
        # start() is what actually creates and runs the thread
        sin.start()
```
result:
Don't worry if that's not clear yet; let me walk through it step by step.
First we need the threading module. It ships with the standard library and was introduced earlier, so I won't repeat the details here:
import threading
Thread(target=...) only creates a thread object; the actual OS thread does not exist until you call start().
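Since start() is what actually spawns the OS thread, you can watch the thread count change around it. Here is a minimal sketch (the worker function and its 0.5-second sleep are my own stand-ins) using threading.active_count() and threading.enumerate():

```python
import threading
import time

def worker():
    # sleep briefly so the sub-threads are still alive when we count them
    time.sleep(0.5)

threads = [threading.Thread(target=worker) for _ in range(3)]

# Before start() the Thread objects exist, but no OS threads run yet
before = threading.active_count()

for t in threads:
    t.start()

# After start() the sub-threads show up in the count and in enumerate()
print(threading.active_count() - before)  # 3 new threads
print([t.name for t in threading.enumerate()])

for t in threads:
    t.join()
```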
Verifying the execution and creation of sub-threads:

```python
from threading import Thread
import time

def sing():
    # sub-thread
    for i in range(5):
        print("I love python")
    print("I love python --------- finished")
    time.sleep(10)

def dence():
    # sub-thread
    for i in range(5):
        print("Python is fun")
    print("Python is fun --------- finished")

"""This is the main thread"""
if __name__ == '__main__':
    a = time.time()
    for i in range(2):
        # Create the thread object (the thread is not fully created yet)
        sin = Thread(target=sing)
        # start() is what actually creates and runs the thread
        sin.start()
    b = time.time()
    print(b - a)
```
result:
The printed time is only how long the main thread took to create and start the sub-threads; the sub-threads are still sleeping. Let's measure the running time of the sub-threads themselves:
```python
from threading import Thread
import threading
import time

def sing():
    # sub-thread
    for i in range(100):
        print("I love python")
    print("I love python --------- finished")
    time.sleep(2)

def dence():
    # sub-thread
    for i in range(100):
        print("Python is fun")
    print("Python is fun --------- finished")

"""This is the main thread"""
if __name__ == '__main__':
    a = time.time()
    lis = []
    for i in range(1):
        # Create the thread object (the thread is not fully created yet)
        t = threading.Thread(target=sing)
        # start() is what actually creates and runs the thread
        t.start()
        lis.append(t)
    for i in lis:
        i.join()
    b = time.time()
    print(b - a)
    print("The main program has reached the end")
```
result:
join() makes the main thread wait until the child thread has finished before it continues running.
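A minimal sketch of the difference join() makes (the 0.2-second sleep is just a stand-in for real work):

```python
import threading
import time

def worker():
    time.sleep(0.2)  # simulate some work

# Without join(): start() returns immediately, so the main thread's
# timing measures only thread creation, not the work itself.
t1 = threading.Thread(target=worker)
start = time.time()
t1.start()
no_join = time.time() - start    # close to 0

# With join(): the main thread blocks until the worker finishes.
t2 = threading.Thread(target=worker)
start = time.time()
t2.start()
t2.join()
with_join = time.time() - start  # at least ~0.2 s

print(no_join, with_join)
t1.join()  # tidy up the first thread
```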
Let's crawl a web page together (a single-threaded version and two multi-threaded versions) and compare the running times:
```python
import requests
import threading
from lxml import etree
import time

def prase_url(url, header):
    response = requests.get(url, headers=header)
    return response

def parse_data(html):
    e_html = etree.HTML(html)
    new_html = e_html.xpath('//div[@id="htmlContent"]//text()')
    # print("".join(new_html).strip())
    h1 = e_html.xpath('//div[@class="chapter-detail"]/h1/text()')[0]
    print(h1)
    return h1, "".join(new_html).strip()

def save_data(data):
    with open("./小说/{}.txt".format(data[0]), "w", encoding="utf-8") as f:
        f.write(data[1])

def main(urls):
    """Main business logic"""
    for url in urls:
        header = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        # Send the request and get the response
        response = prase_url(url, header)
        html = response.text
        # Extract the data
        data = parse_data(html)
        # Save it
        save_data(data)

if __name__ == '__main__':
    a = time.time()
    lis = []
    urls = []
    for i in range(56, 93):
        url = "http://www.quannovel.com/read/620/2467{}.html".format(i)
        urls.append(url)
    # Multi-threaded version (commented out):
    # for i in range(2):
    #     t1 = threading.Thread(target=main, args=(urls,))
    #     t1.start()
    #     lis.append(t1)
    # for t in lis:
    #     t.join()
    # Single-threaded version
    main(urls)
    b = time.time()
    print(b - a)
    print("Main thread finished; waiting for sub-threads to finish")
```
The first multi-threaded version:
```python
import requests
import threading
from lxml import etree
import time

def prase_url(url, header):
    response = requests.get(url, headers=header)
    return response

def parse_data(html):
    e_html = etree.HTML(html)
    new_html = e_html.xpath('//div[@id="htmlContent"]//text()')
    h1 = e_html.xpath('//div[@class="chapter-detail"]/h1/text()')[0]
    print(h1)
    return h1, "".join(new_html).strip()

def save_data(data):
    with open("./小说/{}.txt".format(data[0]), "w", encoding="utf-8") as f:
        f.write(data[1])

def main(urls):
    """Main business logic"""
    for url in urls:
        header = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        # Send the request and get the response
        response = prase_url(url, header)
        html = response.text
        # Extract the data
        data = parse_data(html)
        # Save it
        save_data(data)

if __name__ == '__main__':
    a = time.time()
    lis = []
    urls = []
    for i in range(56, 93):
        url = "http://www.quannovel.com/read/620/2467{}.html".format(i)
        urls.append(url)
    for i in range(2):
        t1 = threading.Thread(target=main, args=(urls,))
        t1.start()
        lis.append(t1)
    for t in lis:
        t.join()
    # Single-threaded version
    # main(urls)
    b = time.time()
    print(b - a)
    print("Main thread finished; waiting for sub-threads to finish")
```
The second multi-threaded version:

```python
import requests
import threading
from lxml import etree
import time

def prase_url(url, header):
    response = requests.get(url, headers=header)
    return response

def parse_data(html):
    e_html = etree.HTML(html)
    new_html = e_html.xpath('//div[@id="htmlContent"]//text()')
    h1 = e_html.xpath('//div[@class="chapter-detail"]/h1/text()')[0]
    print(h1)
    return h1, "".join(new_html).strip()

def save_data(data):
    with open("./小说/{}.txt".format(data[0]), "w", encoding="utf-8") as f:
        f.write(data[1])

def main(i):
    """Main business logic"""
    url = "http://www.quannovel.com/read/620/2467{}.html".format(i)
    header = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
    }
    # Send the request and get the response
    response = prase_url(url, header)
    html = response.text
    # Extract the data
    data = parse_data(html)
    # Save it
    save_data(data)

if __name__ == '__main__':
    a = time.time()
    lis = []
    for i in range(56, 93):
        t1 = threading.Thread(target=main, args=(i,))
        t1.start()
        lis.append(t1)
    for t in lis:
        t.join()
    # Single-threaded version
    # for i in range(56, 93):
    #     main(i)
    b = time.time()
    print(b - a)
    print("Main thread finished; waiting for sub-threads to finish")
```
Result
Looking at the results, you will notice something is off: the threaded runs take more time than expected, because each version has a problem. Let's analyze them one by one.
The first multi-threaded version: resource competition occurs, because both threads write at the same time, and when each thread gets to run is decided by the CPU scheduler, not by us.
If you run it, you will also find that the results are wrong. The reason is that every thread runs main() from beginning to end: t1 = threading.Thread(target=main, args=(urls,)) hands each thread the full URL list, so every page is downloaded and saved twice, and the threads interfere with each other's writes. The fix is either to give each thread its own portion of the URLs, or to let a single pass handle everything.
If you look closely, the saved data is also garbled; that is resource competition at work. To prevent it, a lock is needed, so that while one thread is writing, the next thread cannot write. We will cover locks below.
The second multi-threaded version: we created many threads, but each thread only crawls a single page. Creating and destroying a thread is not free, so spawning one thread per URL adds overhead. It works, but the number of threads should be kept reasonable.
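One way to fix the first version is the "give each thread its own portion" idea: split the URL list into non-overlapping slices before starting the threads. The sketch below uses a chunk() helper of my own and a dummy crawl() in place of the real downloader, so it runs without any network access:

```python
import threading

def chunk(items, n):
    """Split items into at most n roughly equal, non-overlapping slices."""
    size = (len(items) + n - 1) // n
    return [items[i:i + size] for i in range(0, len(items), size)]

results = []
lock = threading.Lock()

def crawl(urls):
    # Hypothetical stand-in for main(urls); the real version would
    # download, parse, and save each page.
    for url in urls:
        with lock:            # serialize writes to the shared list
            results.append(url)

urls = ["page-{}".format(i) for i in range(56, 93)]
threads = []
for part in chunk(urls, 5):   # 5 threads, each with its own slice
    t = threading.Thread(target=crawl, args=(part,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()

print(len(results))  # 37 — every URL handled exactly once
```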
Inheriting the Thread class to create threads
```python
import requests
import threading
from lxml import etree
import time
import queue

# This class crawls the data
class My_Thread(threading.Thread):
    def __init__(self, urls, header, datas):
        super().__init__()
        self.urls = urls
        self.header = header
        self.datas = datas

    def prase_url(self, url):
        response = requests.get(url, headers=self.header)
        return response

    def parse_data(self, html):
        e_html = etree.HTML(html)
        new_html = e_html.xpath('//div[@id="htmlContent"]//text()')
        h1 = e_html.xpath('//div[@class="chapter-detail"]/h1/text()')[0]
        print("fetching")
        return (h1, "".join(new_html).strip())

    def run(self):
        """Main business logic"""
        while True:
            try:
                # get_nowait() instead of checking empty() first:
                # empty() can race when several threads share the queue
                url = self.urls.get_nowait()
            except queue.Empty:
                break
            # Send the request and get the response
            response = self.prase_url(url)
            html = response.text
            # Extract the data
            data = self.parse_data(html)
            self.datas.put(data)

# This class saves the files
class Save_data(threading.Thread):
    def __init__(self, datas):
        super().__init__()
        self.datas = datas

    def run(self):
        while True:
            try:
                a = self.datas.get_nowait()
            except queue.Empty:
                break
            print("saving")
            with open("./小说/{}.txt".format(a[0]), "w", encoding="utf-8") as f:
                f.write(a[1])

def main():
    urls = queue.Queue()
    datas = queue.Queue()
    for i in range(56, 93):
        url = "http://www.quannovel.com/read/620/2467{}.html".format(i)
        urls.put(url)
    header = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
    }
    # Create the crawler threads
    lis = []
    for i in range(5):
        my_thead = My_Thread(urls, header, datas)
        my_thead.start()
        lis.append(my_thead)
    for i in lis:
        i.join()
    # Create the saver threads
    for i in range(5):
        sa_da = Save_data(datas)
        sa_da.start()

if __name__ == '__main__':
    a = time.time()
    main()
    b = time.time()
    print(b - a)
    print("Main thread finished; waiting for sub-threads to finish")
```
Result:
With multiple threads the whole novel is crawled in one go. Now let me analyze the code:
1. To create our own thread class, we inherit from the parent class threading.Thread.
2. It has a method run(self). You may override it, but you must not rename it; the code you put inside is what the thread executes when start() is called, much like main() before we learned multithreading.
3. The main() function is now mainly used to create the threads.
4.
for i in lis:
    i.join()
This waits for all the crawler threads to finish downloading, so that when the saver threads start later there is data ready to save.
5. queue.Queue() creates a thread-safe queue.
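When several threads share one queue, checking empty() before get() can race: another thread may take the last item between the two calls. The idiomatic pattern is to catch queue.Empty (via get_nowait() or a timeout), and for producer/consumer setups to pair put() with task_done() and Queue.join(). A sketch with names of my own choosing:

```python
import queue
import threading

q = queue.Queue()
saved = []

def consumer():
    while True:
        try:
            # get with a timeout instead of testing empty() first
            item = q.get(timeout=0.5)
        except queue.Empty:
            break
        saved.append(item)  # list.append is atomic under the GIL
        q.task_done()       # mark this item as fully processed

for i in range(10):
    q.put(i)

workers = [threading.Thread(target=consumer) for _ in range(3)]
for w in workers:
    w.start()

q.join()          # blocks until every item has been task_done()'d
for w in workers:
    w.join()

print(sorted(saved))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```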
Multi-thread shared global variables (inter-thread communication)
Simply put, threads use global variables as a shared medium to pass data between each other.
This raises a question: do we need global in order to modify a global variable?
```python
import threading

# Does modifying a global variable require global?
# (It depends on whether the modification rebinds the name to a new object;
# if it does, global is required.)
num = 0

# writer
def task1(nu, n):
    global num
    num += nu
    print("task1=", num)
    print("n=%d" % n)

# reader
def task2():
    print("task2=", num)

def main():
    # Create the sub-threads; args passes the arguments as a tuple
    t1 = threading.Thread(target=task1, args=(3, 4))
    t2 = threading.Thread(target=task2)
    # start() is what actually creates the sub-threads
    t1.start()
    t2.start()
    print("main.....", num)

if __name__ == '__main__':
    main()
```
Result:
Whether a function needs global depends on whether it assigns to the global name. num += nu rebinds num to a new int object, so global is required; mutating a mutable object in place (for example, appending to a list) does not rebind the name and needs no global.
Mutex
A mutex (mutual-exclusion lock) solves resource competition: the situation, mentioned earlier, where multiple threads modify the same data at the same time.
Here is another simple example:
Without locking:

```python
import threading
import time

"""Two threads writing at the same time, without a lock"""
num = 0

def task1():
    global num
    for i in range(100000000):
        num += 1
    print("task1.......%d" % num)

def task2():
    global num
    for i in range(100000000):
        num += 1
    print("task2.......%d" % num)

def main():
    # Create the sub-threads
    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    # Start them
    t1.start()
    t2.start()
    print("main....%d" % num)

if __name__ == '__main__':
    main()
```
result:
Locking with RLock() (a re-entrant lock: the same thread may acquire it more than once, as long as each acquire() is matched by a release()):
```python
import threading
import time

"""With a lock"""
num = 0

# Create a lock
# mutex = threading.Lock()
mutex = threading.RLock()

def task1():
    global num
    # Lock (RLock lets the same thread acquire twice)
    mutex.acquire()
    mutex.acquire()
    for i in range(100000000):
        num += 1
    # Unlock (each acquire needs a matching release)
    mutex.release()
    mutex.release()
    print("task1.......%d" % num)

def task2():
    global num
    mutex.acquire()
    mutex.acquire()
    for i in range(100000000):
        num += 1
    mutex.release()
    mutex.release()
    print("task2.......%d" % num)

def main():
    # Create the sub-threads
    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    # Start them
    t1.start()
    t2.start()
    print("main....%d" % num)

if __name__ == '__main__':
    main()
```
result:
Locking with Lock():
```python
import threading
import time

"""With a lock"""
num = 0

# Create a lock
mutex = threading.Lock()
# mutex = threading.RLock()

def task1():
    global num
    # Lock (a plain Lock can only be acquired once at a time)
    mutex.acquire()
    for i in range(100000000):
        num += 1
    # Unlock (so the next thread can enter)
    mutex.release()
    print("task1.......%d" % num)

def task2():
    global num
    mutex.acquire()
    for i in range(100000000):
        num += 1
    mutex.release()
    print("task2.......%d" % num)

def main():
    # Create the sub-threads
    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    # Start them
    t1.start()
    t2.start()
    print("main....%d" % num)

if __name__ == '__main__':
    main()
```
result:
Locking with a with statement:
```python
import threading
import time

num = 0  # the shared counter
mutex = threading.Lock()

def task1():
    global num
    with mutex:
        for i in range(100000000):
            num += 1
    print("task1.......%d" % num)

def task2():
    global num
    with mutex:
        for i in range(100000000):
            num += 1
    print("task2.......%d" % num)

def main():
    # Create the sub-threads
    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    # Start them
    t1.start()
    t2.start()
    print("main....%d" % num)

if __name__ == '__main__':
    main()
```
Result:
threading.Lock(): a plain lock; a thread may hold it only once, and a second acquire() before release() will block.
threading.RLock(): a re-entrant lock; the same thread may acquire it multiple times, as long as every acquire() is matched by a release().
acquire() locks, release() unlocks.
with mutex: locks and unlocks automatically, just as with open() closes the file for you.
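The difference between Lock and RLock can be shown directly; using acquire(timeout=...) keeps the Lock demo from blocking forever:

```python
import threading

lock = threading.Lock()
lock.acquire()
# A second acquire() on a plain Lock by the same thread would block
# forever; the timeout makes it give up and return False instead.
print(lock.acquire(timeout=0.1))  # False — not re-entrant
lock.release()

rlock = threading.RLock()
rlock.acquire()
print(rlock.acquire(timeout=0.1))  # True — re-entrant for the owner
rlock.release()
rlock.release()  # each acquire needs a matching release
```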
Queues and threads
```python
import queue
import threading

# Basic queue operations:
# a = queue.Queue(5)         # a queue with capacity 5
# for i in range(5):
#     a.put(i)               # store an element
# print(a.full())            # True once the queue is full
# for i in range(5):
#     print(a.get_nowait())  # take an element without blocking
# print(a.empty())           # True once the queue is empty

# Create the queue used for communication
q = queue.Queue()
num = 0
q.put(num)  # store the initial value

def task1():
    for i in range(10000000):
        num = q.get()  # a local variable named num
        num += 1
        q.put(num)

def task2():
    for i in range(1000000):
        num = q.get()  # a local variable named num
        num += 1
        q.put(num)

def main():
    # Create the sub-threads
    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    # Start them
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    # The result lives in the queue, not in the global num
    print("main....%d" % q.get())

if __name__ == '__main__':
    main()
```
Summary
In general, the purpose of threads is to make better use of time and improve the program's efficiency: when there is a lot of waiting involved, such as crawling many pages, a single-threaded version runs very slowly, while threads let the waiting overlap.