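Concurrent programming can greatly improve a program's performance and is widely used in servers, web crawlers, performance testing, and similar applications. Python offers three ways to write concurrent code: multithreading, multiprocessing, and coroutines. This article focuses on multithreading and multiprocessing.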

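1. Key concepts

Before writing concurrent Python code, we need to be clear about a few important concepts: concurrency versus parallelism, synchronous versus asynchronous calls, and blocking versus non-blocking states.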

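1.1 Concurrent and parallel execution

Parallelism: doing several things at the same instant without interfering with one another. In Python this corresponds to multiprocessing, which can take advantage of multi-core processors and is usually applied to CPU-heavy work such as compute-intensive tasks.
Concurrency: doing several things during the same period of time, though not necessarily at the same instant. In Python this corresponds to multithreading or coroutines, and it is usually applied where I/O is frequent, for example when sending network requests.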

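1.2 Synchronous and asynchronous calls

Synchronous and asynchronous calls are two ways of submitting a task.

Synchronous call: submit a task, wait in place until it finishes, take its return value, and only then run the next line of code. Tasks therefore execute serially.
Asynchronous call: submit a task and move on to the next line of code without waiting. Tasks therefore execute concurrently.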
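
The difference between the two styles can be sketched with the standard library's concurrent.futures module, which section 2 covers in more detail. This is only an illustration: task(), run_sync() and run_async() are hypothetical names, and time.sleep() stands in for a blocking operation such as a network request.

```python
import concurrent.futures
import time


def task(n):
    time.sleep(0.1)  # stand-in for a blocking operation (assumption)
    return n * n


def run_sync(numbers):
    # Synchronous style: submit one task and wait for its result
    # before submitting the next, so the tasks run one after another.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        return [executor.submit(task, n).result() for n in numbers]


def run_async(numbers):
    # Asynchronous style: submit every task first, then collect the
    # results, so the tasks can overlap.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(task, n) for n in numbers]
        return [future.result() for future in futures]


if __name__ == '__main__':
    for fn in (run_sync, run_async):
        start = time.perf_counter()
        result = fn([1, 2, 3, 4])
        elapsed = time.perf_counter() - start
        print('{}: {} in {:.2f} seconds'.format(fn.__name__, result, elapsed))
```

Both functions return the same list. Only the elapsed time should differ, because run_sync() pays for each wait in turn while run_async() lets the waits overlap.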

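1.3 Blocking and non-blocking states

Blocking and non-blocking describe the running state of a program.

Blocking: when the program reaches an I/O operation it waits in place, that is, it is in the blocked state.
Non-blocking: when the program is not waiting on I/O, it is either running or ready to run.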

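1.4 Process pools and thread pools

Process pools and thread pools are used to limit the number of processes or threads.

If the number of processes or threads a server starts keeps growing with the number of concurrent clients, the server comes under enormous pressure. A "pool" puts an upper bound on how many processes or threads the server starts.

Process pool: a "pool" that holds processes
Thread pool: a "pool" that holds threads

When the server receives a client request, it takes a thread or process from the pool to handle it, and returns that thread or process to the pool once the request has been handled.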
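
A minimal sketch of that reuse, assuming a hypothetical handle_request() function and a pool of three threads: ten requests are handled, but no more than three worker threads are ever created. It uses ThreadPoolExecutor from concurrent.futures, which section 2 introduces.

```python
import concurrent.futures
import threading
import time


def handle_request(request_id):
    # Hypothetical request handler: simulate some work and report
    # which pooled thread handled the request.
    time.sleep(0.05)
    return threading.current_thread().name


def serve(num_requests, pool_size):
    # The pool never creates more than pool_size worker threads,
    # however many requests are submitted.
    with concurrent.futures.ThreadPoolExecutor(max_workers=pool_size) as executor:
        return list(executor.map(handle_request, range(num_requests)))


if __name__ == '__main__':
    names = serve(num_requests=10, pool_size=3)
    print('{} requests handled by {} distinct threads'.format(len(names), len(set(names))))
```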

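2. Comparing single-threaded and multithreaded performance

First, here is a single-threaded program that sends the network requests one after another: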

import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))

def download_all(sites):
    for site in sites:
        download_one(site)

if __name__ == '__main__':
    sites = [
        'https://golang.google.cn/',
        'https://www.python.org/',
        'http://www.php.net/',
        'https://www.javascript.com/',
        'http://mqtt.org/',
        'https://www.mysql.com/',
        'https://www.java.com/zh_CN/',
        'https://developers.google.cn/protocol-buffers/'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

Output of the code above:

Read 7181 from https://golang.google.cn/
Read 48634 from https://www.python.org/
Read 62050 from http://www.php.net/
Read 32850 from https://www.javascript.com/
Read 17336 from http://mqtt.org/
Read 31275 from https://www.mysql.com/
Read 10454 from https://www.java.com/zh_CN/
Read 34218 from https://developers.google.cn/protocol-buffers/
Download 8 sites in 11.896329030999999 seconds

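Requesting these 8 sites took about 11.9 seconds in total. Now let's look at the multithreaded version.

The Python standard library provides the threading and multiprocessing modules for writing asynchronous multithreaded and multiprocess code. Since Python 3.2 it also provides the concurrent.futures module, which offers two classes, ThreadPoolExecutor and ProcessPoolExecutor. The code below uses ThreadPoolExecutor to implement multithreading.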

import concurrent.futures

import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))
    return {'url': url, 'content_length': len(resp.content)}

def download_all(sites_list):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_one, sites_list)

if __name__ == '__main__':
    sites = [
        'https://golang.google.cn/',
        'https://www.python.org/',
        'http://www.php.net/',
        'https://www.javascript.com/',
        'http://mqtt.org/',
        'https://www.mysql.com/',
        'https://www.java.com/zh_CN/',
        'https://developers.google.cn/protocol-buffers/'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

Output of this code:

Read 650 from http://www.php.net/
Read 7181 from https://golang.google.cn/
Read 10454 from https://www.java.com/zh_CN/
Read 48634 from https://www.python.org/
Read 32850 from https://www.javascript.com/
Read 17336 from http://mqtt.org/
Read 31275 from https://www.mysql.com/
Read 34218 from https://developers.google.cn/protocol-buffers/
Download 8 sites in 1.8238722280000002 seconds

The multithreaded program is clearly much faster than the single-threaded loop. The main difference between the two versions is:

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(download_one, sites_list)

Here we create a thread pool with 5 threads available. executor.map() is similar to Python's built-in map(): it calls download_one() concurrently for each element of sites_list.
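
The download example ignores the iterator that executor.map() returns. As a small offline sketch of what that iterator carries (word_length() and the sample inputs are made up for illustration), the results can be collected with list(), and an exception raised inside a worker is re-raised when its value is retrieved from the iterator:

```python
import concurrent.futures


def word_length(word):
    return len(word)


def map_in_threads(fn, items):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # executor.map() returns an iterator of results; list() consumes it.
        # If a call raised, the exception is re-raised at this point.
        return list(executor.map(fn, items))


if __name__ == '__main__':
    print(map_in_threads(word_length, ['go', 'python', 'php']))
    try:
        map_in_threads(int, ['1', 'two', '3'])
    except ValueError as exc:
        print('a worker failed:', exc)
```

If the iterator is never consumed, as in download_all() above, an exception raised inside a worker goes unnoticed.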

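As a rule, we should avoid writing programs whose number of threads can grow without limit. Creating a large number of threads can exhaust the server's resources and crash it. It is better to use a pre-initialized thread pool that caps the number of threads running at the same time.

Because of the global interpreter lock (GIL), CPython allows only one thread to execute Python bytecode at any given moment. Python threads are therefore better suited to I/O and other blocking operations that need to run concurrently (waiting on I/O, waiting for data from a database, and so on).

For CPU-intensive tasks, ProcessPoolExecutor is the better choice. It is used in much the same way as ThreadPoolExecutor: to reimplement the example above with processes, simply replace ThreadPoolExecutor with ProcessPoolExecutor. With ProcessPoolExecutor the max_workers argument can be omitted, in which case it defaults to the number of CPUs on the machine.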

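3. Multithreading with the submit method

The executor.submit() method also achieves multithreaded execution, although it takes more code. The download_all function from the example above can also be written as follows: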

def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        to_do = []
        for site in sites:
            future = executor.submit(download_one, site)
            to_do.append(future)

        for future in concurrent.futures.as_completed(to_do):
            future.result()

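Two loops are needed here. The first loop calls executor.submit() for each site, which produces a Future object, future, and appends it to to_do to await execution.

The second loop calls result() on each future that has finished to obtain its result. as_completed(fs) takes an iterable of futures, fs, and returns an iterator that yields each future as it completes.

Note, however, that the order in which the futures in the list complete is not necessarily the order in which they appear in the list. Which one finishes first depends on system scheduling and on how long each future takes to run.

executor.map() is usually the recommended choice: it is simple and efficient, and it returns results in the same order as the arguments that were passed in.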
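
The ordering difference can be seen without any network access. The following is a sketch under stated assumptions: wait_and_return() and compare_orders() are hypothetical helpers, and time.sleep() stands in for requests that take different amounts of time.

```python
import concurrent.futures
import time


def wait_and_return(delay):
    time.sleep(delay)  # stand-in for a request that takes `delay` seconds
    return delay


def compare_orders(delays):
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(delays)) as executor:
        # map() yields results in the order of the inputs.
        map_order = list(executor.map(wait_and_return, delays))

        # as_completed() yields futures in the order they finish.
        futures = [executor.submit(wait_and_return, d) for d in delays]
        completion_order = [f.result() for f in concurrent.futures.as_completed(futures)]
    return map_order, completion_order


if __name__ == '__main__':
    map_order, completion_order = compare_orders([0.3, 0.1, 0.2])
    print('map order:         ', map_order)
    print('as_completed order:', completion_order)
```

map_order always matches the input list. completion_order normally starts with the shortest delay, but it depends on scheduling, so code that uses as_completed() should not rely on a particular order.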

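4. Putting add_done_callback to good use

Much like a coroutine, a Future can trigger a callback as soon as its result is set. add_done_callback(fn) registers fn to be called, with the Future as its only argument, once the Future completes.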

import concurrent.futures
import requests
import time

def download_one(url):
    resp = requests.get(url)
    return {'url': url, 'content_length': len(resp.content)}

def parse(res):
    res = res.result()
    print('Read {} from {}'.format(res['content_length'], res['url']))

def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        for site in sites:
            executor.submit(download_one, site).add_done_callback(parse)

if __name__ == '__main__':
    sites = [
        'https://golang.google.cn/',
        'https://www.python.org/',
        'http://www.php.net/',
        'https://www.javascript.com/',
        'http://mqtt.org/',
        'https://www.mysql.com/',
        'https://www.java.com/zh_CN/',
        'https://developers.google.cn/protocol-buffers/'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))
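
The callback mechanism can also be tried without network access. In the sketch below, square() and run_with_callbacks() are hypothetical names used only for illustration. It shows two things: each callback receives the finished Future, and a callback added to a Future that is already done is called immediately.

```python
import concurrent.futures


def square(n):
    return n * n


def run_with_callbacks(numbers):
    seen = []

    def record(future):
        # The callback is passed the finished Future, not the raw result.
        seen.append(future.result())

    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(square, n) for n in numbers]
        for future in futures:
            future.add_done_callback(record)

    # Leaving the with block waits for every task, so every callback has run.
    # Adding a callback to a Future that is already done calls it right away.
    late = []
    futures[0].add_done_callback(lambda f: late.append(f.result()))
    return seen, late


if __name__ == '__main__':
    seen, late = run_with_callbacks([1, 2, 3])
    print('results seen by callbacks:', sorted(seen))
    print('late callback saw:', late)
```

The results are sorted before printing because callbacks run in completion order, which is not guaranteed to match submission order.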