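Using Queue with Multithreading in Python

Hi everyone, I'm W.

Preface: when you write multithreaded code, combining it with Queue is probably one of the first things that comes to mind. So how exactly do you use it? This post explores that.

Learning Queue

Importing the library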
```
 from queue import Queue
```

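Creating a queue

 q = Queue(maxsize=10)
 q = Queue()
 # maxsize: the maximum number of items the queue can hold (10 is only an example value);
 # once the queue is full, a further put() blocks the calling thread until a slot frees up
 # Optional; if omitted, the queue has no upper limit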
 
 Create a queue object with a given maximum size.
 If maxsize is <= 0, the queue size is infinite.
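
As a small sketch of how maxsize behaves: the variable names and the size of 2 are illustrative choices, and put_nowait() is used only so that the example raises instead of blocking.

```python
from queue import Queue, Full

bounded = Queue(maxsize=2)   # holds at most 2 items
unbounded = Queue()          # maxsize defaults to 0, i.e. no upper limit

bounded.put("a")
bounded.put("b")

try:
    # put_nowait() raises queue.Full instead of blocking the thread
    bounded.put_nowait("c")
    overflowed = False
except Full:
    overflowed = True

print(overflowed)          # True
print(unbounded.maxsize)   # 0
```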

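Putting items in and taking items out

 q.put(item, block=True, timeout=None)
 # item: the object to put into the queue
 # block=True: whether to block when the queue is full; defaults to True
 # timeout=None: the longest time to block, in seconds; None waits indefinitely, and queue.Full is raised if the timeout expires

 q.get(block=True, timeout=None)
 # Removes and returns one object from the queue (FIFO); the object no longer exists in the queue afterwards
 # block: whether to block when the queue is empty; defaults to True
 # timeout: the longest time to block, in seconds; queue.Empty is raised if the timeout expires

 q.get_nowait()
 # Takes no arguments
 # Removes and returns one object without waiting; raises queue.Empty if the queue has nothing in it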

 Remove and return an item from the queue without blocking.
 Only get an item if one is immediately available. Otherwise
 raise the Empty exception.
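
The calls above can be tried out in a single thread. This is a minimal sketch; the queue size of 1, the 0.1-second timeout, and the result variable names are assumptions made for the demonstration.

```python
from queue import Queue, Empty, Full

q = Queue(maxsize=1)
q.put("first")                      # succeeds immediately, the queue has room

try:
    q.put("second", timeout=0.1)    # queue is full: wait 0.1 s, then raise Full
    put_result = "stored"
except Full:
    put_result = "full"

item = q.get()                      # FIFO: returns "first" and removes it

try:
    q.get(block=False)              # same as q.get_nowait(); the queue is empty now
    get_result = "got item"
except Empty:
    get_result = "empty"

print(put_result, item, get_result)  # full first empty
```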

Checking empty, full, and size

 q.full()
 # Returns True when the queue is full, otherwise False
 Source:
 with self.mutex:
     return 0 < self.maxsize <= self._qsize()

 q.qsize()
 # Returns the number of items currently in the queue
 Source:
 with self.mutex:
     return self._qsize()

 q.empty()
 # Returns True when the queue is empty, otherwise False
 Source:
 with self.mutex:
     return not self._qsize()
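
A quick single-threaded sketch of the three checks (the `states` list is just a helper for this example). Keep in mind that with several threads running, these values are only snapshots and may already be outdated by the time you act on them.

```python
from queue import Queue

q = Queue(maxsize=2)
states = [(q.qsize(), q.empty(), q.full())]   # nothing stored yet

q.put(1)
states.append((q.qsize(), q.empty(), q.full()))

q.put(2)
states.append((q.qsize(), q.empty(), q.full()))

print(states)
# [(0, True, False), (1, False, False), (2, False, True)]

# With maxsize <= 0 the check `0 < self.maxsize` fails, so full() is always False
print(Queue().full())   # False
```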

join() and task_done()

 q.join()
 # From the docstring:
 '''Blocks until all items in the Queue have been gotten and processed.
 i.e. the calling thread blocks until every item in the queue has been processed.

 The count of unfinished tasks goes up whenever an item is added to the
 queue. The count goes down whenever a consumer thread calls task_done()
 to indicate the item was retrieved and all work on it is complete.
 i.e. every put() increases the count of unfinished tasks. A consumer thread calls task_done() to signal
 that the item was retrieved and all the work on it is done, which decreases the count.

 When the count of unfinished tasks drops to zero, join() unblocks.
 i.e. once the count of unfinished tasks reaches 0, join() stops blocking and returns.
 '''

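 q.task_done()
 # From the docstring:
 '''Indicate that a formerly enqueued task is complete.
 i.e. it marks a task that was previously put into the queue as finished.

 Used by Queue consumer threads.  For each get() used to fetch a task,
 a subsequent call to task_done() tells the queue that the processing
 on the task is complete.
 i.e. this is meant to be called by consumer threads. For each get(), call task_done() once the work on that item has actually finished, not right after fetching it.

 If a join() is currently blocking, it will resume when all items
 have been processed (meaning that a task_done() call was received
 for every item that had been put() into the queue).
 i.e. if join() is blocking the main thread, it wakes up once every item has been processed.

 Raises a ValueError if called more times than there were items
 placed in the queue.
 i.e. calling task_done() more often than items were put() raises ValueError; the limit is the number of items put in, not the current queue length.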
 '''
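
To see join() and task_done() working together, here is a minimal sketch. The worker function, the `results` list, the doubling "work", and the `None` sentinel used to stop the worker are all assumptions of this example, not part of the queue API.

```python
import threading
from queue import Queue

q = Queue()
results = []

def worker():
    while True:
        item = q.get()
        if item is None:          # hypothetical sentinel telling the worker to exit
            q.task_done()
            break
        results.append(item * 2)  # the "work" done on this item
        q.task_done()             # called only after the work is finished

t = threading.Thread(target=worker)
t.start()

for n in range(5):
    q.put(n)

q.join()                          # returns once task_done() was called for all 5 items
print(results)                    # [0, 2, 4, 6, 8]

q.put(None)                       # stop the worker
t.join()

try:
    q.task_done()                 # one call too many: nothing is left unfinished
    too_many = False
except ValueError:
    too_many = True
print(too_many)                   # True
```

Note that the sentinel also gets its own task_done() call, so the count of unfinished tasks stays consistent with the number of put() calls.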

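Learning threads

I covered the basics of Python threads in an earlier post. If you are not familiar with them, see "Python3多线程基础" (Python 3 Multithreading Basics).

Hands-on practice (the code can be copied and run as-is)

The producer-consumer model

I first met this concept in an operating systems course. In the producer-consumer model a program has three roles (or two, if you don't count the pipe): a producer, a pipe, and a consumer. The producer creates objects, goods, or items; the pipe stores what the producer has made; the consumer consumes (processes) whatever is in the pipe.

The flow resembles workers on an assembly line. The producer assembles a basic product A from parts and places it in the pipe (the conveyor belt). The consumer (the next group of workers) takes product A out of the pipe and finishes it into the final product B.

That is the producer-consumer model. Real systems can be many times larger than this basic version, but they are all built from this smallest producer-consumer unit.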

import time
from queue import Queue
import threading

q = Queue()

def consumer():
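    while True:  # infinite loop: in the ideal case the consumer works forever
        time.sleep(1)  # pretend it takes one second to process an item
        print(q.get())  # take an item out of the pipe (queue)


def producer():
    while True:
        time.sleep(1)  # pretend it takes one second to produce an item
        q.put(time.time())  # use a timestamp as the item the producer makes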

if __name__ == '__main__':
    t_pro = threading.Thread(target=producer)
    t_con = threading.Thread(target=consumer)
    t_pro.start()
    t_con.start()

Output:

1582264349.2540727
1582264350.2546353
1582264351.2551694
1582264352.2555282
......

Process finished with exit code -1

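In the example above we start two threads, each running a different function, and use time.sleep() to throttle production and consumption so that we can follow what happens. Here both producing and consuming take one second per item. What happens if we change the timing?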

import time
from queue import Queue
import threading

q = Queue()

def consumer():
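    while True:  # infinite loop: in the ideal case the consumer works forever
        # time.sleep(1)  # sleep disabled: the consumer is now faster than the producer
        print(q.get())  # take an item out of the pipe (queue)


def producer():
    while True:
        time.sleep(1)  # pretend it takes one second to produce an item
        q.put(time.time())  # use a timestamp as the item the producer makes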

if __name__ == '__main__':
    t_pro = threading.Thread(target=producer)
    t_con = threading.Thread(target=consumer)
    t_pro.start()
    t_con.start()

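Here we disabled the consumer's sleep, which means the consumer can consume faster than the producer produces. In that case the consumption rate is actually set by the producer, because q.get() blocks until the next item arrives. Judging from the output alone, there is no visible difference.

Next, let's add a timeout to q.get().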

import time
from queue import Queue
import threading

q = Queue()

def consumer():
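    while True:  # infinite loop: in the ideal case the consumer works forever
        # time.sleep(1)  # sleep disabled: the consumer is now faster than the producer
        print(q.get(timeout=0.5))  # get() now has a 0.5-second timeout


def producer():
    while True:
        time.sleep(1)  # pretend it takes one second to produce an item
        q.put(time.time())  # use a timestamp as the item the producer makes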


if __name__ == '__main__':
    t_pro = threading.Thread(target=producer)
    t_con = threading.Thread(target=consumer)
    t_pro.start()
    t_con.start()

Output:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "d:\my_ide\python3\Lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "d:\my_ide\python3\Lib\threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "D:/My_IDE/PyCharm/Project/python_basic/TEST02/test_consumer_produser.py", line 11, in consumer
    print(q.get(timeout=0.5))  # take an item out of the pipe (queue)
  File "d:\my_ide\python3\Lib\queue.py", line 178, in get
    raise Empty
_queue.Empty

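Clearly, if the timeout expires and there is still no item to take, get() raises queue.Empty. So whenever you use timeout, remember to handle that exception.

Now let's change q.get(timeout=0.5) to q.get_nowait() and see what happens.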
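
One way to handle the exception is to catch queue.Empty inside the consumer loop and simply try again. The sketch below is a terminating variant of the example: the `ITEMS` limit, the `received` list, the `timeouts` counter, and the shortened delays (0.2 s to produce, 0.05 s timeout) are assumptions chosen so that it finishes quickly.

```python
import time
import threading
from queue import Queue, Empty

q = Queue()
received = []
timeouts = 0
ITEMS = 3                          # illustrative: stop after 3 items

def producer():
    for _ in range(ITEMS):
        time.sleep(0.2)            # production is slower than the consumer's timeout
        q.put(time.time())

def consumer():
    global timeouts
    while len(received) < ITEMS:
        try:
            received.append(q.get(timeout=0.05))
        except Empty:
            timeouts += 1          # nothing arrived in time; retry instead of crashing

t_pro = threading.Thread(target=producer)
t_con = threading.Thread(target=consumer)
t_pro.start()
t_con.start()
t_pro.join()
t_con.join()

print(len(received), timeouts > 0)  # 3 True
```

The consumer thread no longer dies on the first timeout; it records the miss and goes back to waiting.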

import time
from queue import Queue
import threading

q = Queue()


def consumer():
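    while True:  # infinite loop: in the ideal case the consumer works forever
        # time.sleep(1)  # sleep disabled: the consumer is now faster than the producer
        # print(q.get(timeout=0.5))  # take an item out of the pipe (queue)
        print(q.get_nowait())

def producer():
    while True:
        time.sleep(1)  # pretend it takes one second to produce an item
        q.put(time.time())  # use a timestamp as the item the producer makes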


if __name__ == '__main__':
    t_pro = threading.Thread(target=producer)
    t_con = threading.Thread(target=consumer)
    t_pro.start()
    t_con.start()

Output:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "d:\my_ide\python3\Lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "d:\my_ide\python3\Lib\threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "D:/My_IDE/PyCharm/Project/python_basic/TEST02/test_consumer_produser.py", line 12, in consumer
    print(q.get_nowait())
  File "d:\my_ide\python3\Lib\queue.py", line 198, in get_nowait
    return self.get(block=False)
  File "d:\my_ide\python3\Lib\queue.py", line 167, in get
    raise Empty
_queue.Empty

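Pasting the tracebacks doesn't make the difference very obvious; run both versions yourself and you will notice it right away. get(timeout=0.5) waits for you for a while: if an item arrives within that time there is no error, and if not it raises Empty, which is reasonable. get_nowait() is less forgiving: it doesn't wait at all, so everything is fine if an item is already there and it raises Empty immediately if not. It has about as much patience as someone who has already been kept waiting too long.

Summary

After studying queues and threads and combining them in the producer-consumer model, we now have a basic understanding of how to use multithreading together with Queue. With this groundwork done, we can move on to using multithreading to speed up web crawlers!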
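
The difference in waiting time can also be measured directly. This is a small single-threaded sketch on a deliberately empty queue; the variable names are made up for the example, and the exact numbers printed will vary from machine to machine.

```python
import time
from queue import Queue, Empty

q = Queue()   # empty on purpose

start = time.monotonic()
try:
    q.get(timeout=0.5)             # waits up to 0.5 s before giving up
except Empty:
    waited_with_timeout = time.monotonic() - start

start = time.monotonic()
try:
    q.get_nowait()                 # gives up immediately
except Empty:
    waited_nowait = time.monotonic() - start

print(f"get(timeout=0.5) waited {waited_with_timeout:.2f}s")
print(f"get_nowait() waited {waited_nowait:.4f}s")
```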

Alian_W
