Python threads have a classic problem: the GIL (Global Interpreter Lock), which means only one thread can execute Python bytecode at any given time. So how does multithreading help, and how is it implemented?
Answer: For I/O-bound tasks, Python's blocking I/O functions release the GIL while they wait for a response, so another thread can run in the meantime. The waiting periods overlap and the total elapsed time drops; this is concurrency. It can be implemented with the concurrent.futures module (the futures.ThreadPoolExecutor class) or the threading module.
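The overlap of waiting periods can be seen with the threading module alone. The following is a minimal sketch, assuming that time.sleep stands in for a blocking network request (like blocking I/O, it releases the GIL while waiting); fake_request and run_threaded are hypothetical names used only for illustration.

```python
import threading
import time

def fake_request(delay, results, index):
    """Hypothetical stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)
    results[index] = delay

def run_threaded(delays):
    """Start one thread per delay, wait for all of them, and time the whole run."""
    results = [None] * len(delays)
    threads = [
        threading.Thread(target=fake_request, args=(d, results, i))
        for i, d in enumerate(delays)
    ]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - t0

results, elapsed = run_threaded([0.2, 0.2, 0.2])
# The three waits overlap, so elapsed should be close to 0.2s rather than 0.6s
print(results, '{:.2f}s'.format(elapsed))
```

Run sequentially, the same three calls would take about the sum of the delays, which is the gap that threads close for I/O-bound work.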
Can Python achieve true parallelism? What exactly is multiprocessing? Will multiple processes be faster?
Answer: Yes. Each process has its own interpreter and its own GIL, so several processes can run on several CPU cores at the same time, and you can choose how many CPUs to use. Multiple processes are not necessarily faster than multiple threads; the two suit different workloads. Multiprocessing suits CPU-bound work (heavy numerical computation done entirely in memory), while multithreading suits I/O-bound work (waiting on the disk or the network). Implementation options include the multiprocessing module and the futures.ProcessPoolExecutor class.
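For the CPU-bound case, a process pool can be sketched as follows. This is an illustration under stated assumptions, not a benchmark: count_primes and count_many are hypothetical names, and naive prime counting is just a convenient example of work that never waits on I/O.

```python
import os
from concurrent import futures

def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def count_many(limits, workers=None):
    """Run count_primes for each limit in a pool of separate processes."""
    workers = workers or os.cpu_count()
    with futures.ProcessPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(count_primes, limits))

# The guard matters on platforms where child processes re-import this module
if __name__ == '__main__':
    print(count_many([10000, 20000, 30000], workers=2))
```

Whether this beats a plain loop depends on how heavy each task is, because starting processes and passing arguments between them has a cost of its own.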
Let's implement the multithreaded version first, since it is simpler.
In [65]: import requests
In [66]: import tqdm
In [67]: import os
In [68]: import time
In [69]: import sys
    ...: from concurrent import futures
In [70]: POP_CC = 'CC,IN,US,ID,BP'.split(',')
In [71]: BASE_URL = 'http://flupy.org/data/flags'
In [72]: DEST_DIR = 'downloads/'
In [73]: def save_flag(img, filename):
    ...:     os.makedirs(DEST_DIR, exist_ok=True)  # create the target directory if it is missing
    ...:     path = os.path.join(DEST_DIR, filename)
    ...:     with open(path, 'wb') as fp:
    ...:         fp.write(img)
    ...:
In [74]: def get_flag(cc):
    ...:     url = '{}/{cc}/{cc}.gif'.format(BASE_URL, cc=cc.lower())
    ...:     resp = requests.get(url)
    ...:     return resp.content
    ...:
In [75]: def show(text):
    ...:     print(text, end='')
    ...:     sys.stdout.flush()
    ...:
In [76]: def download_one(cc):  # tip: put all the basic operations in one function
    ...:     image = get_flag(cc)
    ...:     show(cc)
    ...:     save_flag(image, cc.lower() + '.gif')
    ...:     return cc
    ...:
In [77]: MAX_WORKERS = 20  # 20 threads
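In [78]: def download_many(cc_list):
    ...:     workers = min(MAX_WORKERS, len(cc_list))  # no point starting more workers than tasks
    ...:     with futures.ThreadPoolExecutor(workers) as executor:
    ...:         # executor.map creates Future instances under the hood; worth studying,
    ...:         # but not needed just to use it
    ...:         res = executor.map(download_one, sorted(cc_list))
    ...:         # wrap the results (not the inputs) so the tqdm bar advances as downloads finish
    ...:         return len(list(tqdm.tqdm(res, total=len(cc_list))))
    ...: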
In [79]: def main(download_many):
    ...:     t0 = time.time()
    ...:     count = download_many(POP_CC)
    ...:     elapsed = time.time() - t0
    ...:     msg = '\n{} flags downloaded in {:.2f}s'
    ...:     print(msg.format(count, elapsed))
    ...:
In [80]: if __name__ == '__main__':
    ...:     main(download_many)
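The Future instances that executor.map hides can also be handled directly with executor.submit and futures.as_completed. The following is a rough sketch rather than the code used above: fake_download is a hypothetical stand-in that sleeps instead of fetching a flag, so it runs without network access, and the delays are made-up values.

```python
import time
from concurrent import futures

def fake_download(cc, delay):
    """Hypothetical stand-in for download_one: sleeps instead of fetching."""
    time.sleep(delay)
    return cc

def download_as_completed(jobs, max_workers=3):
    """Submit each (cc, delay) job and collect results in completion order."""
    done = []
    with futures.ThreadPoolExecutor(max_workers) as executor:
        # submit() schedules the call and returns a Future immediately
        to_do = {executor.submit(fake_download, cc, delay): cc
                 for cc, delay in jobs}
        # as_completed() yields each Future as soon as it has finished
        for future in futures.as_completed(to_do):
            done.append(future.result())
    return done

finished = download_as_completed([('US', 0.3), ('IN', 0.1), ('ID', 0.2)])
print(finished)  # shortest delay usually comes first, regardless of submission order
```

Unlike executor.map, which returns results in the order the arguments were given, as_completed hands back whichever task finishes first, which is convenient for progress reporting and per-task error handling.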