Python advanced: multithreading and multiprocessing (continuously updated)

Python threads have a classic problem: the GIL (Global Interpreter Lock), which allows only one thread to execute Python bytecode at any given time. So how can multithreading help at all, and how is it implemented?

Answer: For I/O-bound tasks, Python's blocking I/O functions release the GIL while they wait for a response, so another thread can run in the meantime. The waits overlap, which shortens the total I/O time; this is concurrency, not parallelism. It can be implemented with the concurrent.futures module (the futures.ThreadPoolExecutor class) or the threading module.
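
As a rough illustration of the point about waiting, here is a minimal sketch that compares running several blocking calls one after another with running them in a thread pool. The function names (fake_io, run_sequential, run_threaded) are made up for this example, and time.sleep stands in for a real network or disk call, on the assumption that it releases the GIL in the same way while waiting.

```python
import time
from concurrent import futures


def fake_io(seconds):
    # Hypothetical stand-in for a blocking I/O call; time.sleep releases the GIL.
    time.sleep(seconds)
    return seconds


def run_sequential(delays):
    # Each wait finishes before the next one starts.
    t0 = time.perf_counter()
    results = [fake_io(d) for d in delays]
    return results, time.perf_counter() - t0


def run_threaded(delays):
    # One thread per task, so the waits overlap.
    t0 = time.perf_counter()
    with futures.ThreadPoolExecutor(max_workers=len(delays)) as executor:
        results = list(executor.map(fake_io, delays))
    return results, time.perf_counter() - t0


if __name__ == '__main__':
    delays = [0.2] * 5
    _, seq = run_sequential(delays)
    _, thr = run_threaded(delays)
    print(f'sequential: {seq:.2f}s, threaded: {thr:.2f}s')
```

The threaded version should take roughly as long as the single longest wait, while the sequential version takes roughly the sum of all waits.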

Can Python achieve true parallelism? What exactly is multiprocessing? Is it always faster?

Answer: Yes. Each process has its own interpreter and its own GIL, so you can choose how many CPUs to use and run tasks in parallel across them. Multiprocessing is not necessarily faster than multithreading, though; the two suit different workloads. Multiprocessing suits CPU-bound work (heavy numerical computation done entirely in memory), while multithreading suits I/O-bound work (interaction with the disk or network). It can be implemented with the multiprocessing module or the futures.ProcessPoolExecutor class.
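
For the CPU-bound case, a minimal sketch with futures.ProcessPoolExecutor might look like the following. The prime-counting workload and the names count_primes and count_primes_parallel are assumptions chosen only to keep the CPU busy; they are not from the original post. Whether this actually beats a thread pool depends on the workload and on the cost of starting processes.

```python
import os
from concurrent import futures


def count_primes(limit):
    """Count primes below limit by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=None):
    # workers=None leaves the choice of pool size to ProcessPoolExecutor.
    with futures.ProcessPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(count_primes, limits))


if __name__ == '__main__':
    # The guard matters: worker processes may re-import this module on start-up.
    limits = [20_000] * 4
    print(count_primes_parallel(limits, workers=os.cpu_count()))
```

The function passed to the pool has to be defined at module level so that it can be pickled and sent to the worker processes.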

Let's implement multithreading first, since it seems simpler.

In [65]: import requests

In [66]: import tqdm

In [67]: import os

In [68]: import time

In [69]: import sys

In [70]: POP_CC = 'CC,IN,US,ID,BP'.split(',')

In [71]: BASE_URL = 'http://flupy.org/data/flags'

In [72]: DEST_DIR = 'downloads/'

In [73]: def save_flag(img,filename):
    ...:     os.makedirs(DEST_DIR,exist_ok=True) # create the download directory if it is missing
    ...:     path = os.path.join(DEST_DIR,filename)
    ...:     with open(path,'wb') as fp:
    ...:         fp.write(img)
    ...:

In [74]: def get_flag(cc):
    ...:     url = '{}/{cc}/{cc}.gif'.format(BASE_URL,cc=cc.lower())
    ...:     resp = requests.get(url)
    ...:     return resp.content
    ...:

In [75]: def show(text):
    ...:     print(text,end='')
    ...:     sys.stdout.flush()
    ...:

In [76]: def download_one(cc): # Tip: keep all the basic steps in one function
    ...:     image = get_flag(cc)
    ...:     show(cc)
    ...:     save_flag(image,cc.lower()+'.gif')
    ...:     return cc
    ...:

In [77]: MAX_WORKERS = 20 # 20 threads

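In [78]: from concurrent import futures
    ...:
    ...: def download_many(cc_list):
    ...:     workers = min(MAX_WORKERS,len(cc_list)) # more threads than tasks is pointless
    ...:     with futures.ThreadPoolExecutor(workers) as executor:
    ...:         # map creates Future instances under the hood; worth exploring, but not needed just to make this work
    ...:         # tqdm displays a progress bar
    ...:         res = executor.map(download_one,tqdm.tqdm(sorted(cc_list)))
    ...:     return len(list(res))
    ...: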

In [79]: def main(download_many):
    ...:     t0 = time.time()
    ...:     count = download_many(POP_CC)
    ...:     elapsed = time.time()-t0
    ...:     msg = '\n{} flags downloaded in {:.2f}s'
    ...:     print(msg.format(count,elapsed))
    ...:

In [80]: if __name__ == '__main__':
    ...:     main(download_many)
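
The comment in download_many mentions that executor.map goes through Future instances behind the scenes. As a minimal sketch of what that looks like when written out by hand, the version below uses executor.submit and futures.as_completed. The names fake_download and download_many_futures are hypothetical, and fake_download sleeps instead of fetching a flag so that the example runs without network access.

```python
import time
from concurrent import futures


def fake_download(cc):
    # Hypothetical stand-in for download_one: sleeps instead of hitting the network.
    time.sleep(0.05)
    return cc


def download_many_futures(cc_list, max_workers=3):
    with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # submit schedules the call and returns a Future immediately.
        to_do = {executor.submit(fake_download, cc): cc for cc in sorted(cc_list)}
        done = []
        # as_completed yields each Future as soon as it finishes,
        # which is not necessarily the order of submission.
        for future in futures.as_completed(to_do):
            done.append(future.result())  # result() re-raises any worker exception
    return done


if __name__ == '__main__':
    print(sorted(download_many_futures(['CN', 'IN', 'US'])))
```

Compared with executor.map, this form gives access to each result as it arrives and lets errors be handled per task, at the cost of a little more code.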

 


Origin blog.csdn.net/weixin_40539952/article/details/107488111