1. Test Learning
(1) Single-threaded:
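```python
from time import sleep
import time

def request(url):
    print('Requesting:', url)
    sleep(2)  # simulate a 2-second blocking download
    print('Download succeeded:', url)

urls = ['www.baidu.com', 'www.sogou.com', 'www.goubanjia.com']
start = time.time()
for url in urls:  # each request must finish before the next one starts
    request(url)
print(time.time() - start)
```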
Test result: just over 6 seconds, because the three 2-second requests run one after another.
```
Requesting: www.baidu.com
Download succeeded: www.baidu.com
Requesting: www.sogou.com
Download succeeded: www.sogou.com
Requesting: www.goubanjia.com
Download succeeded: www.goubanjia.com
6.001747369766235
```
(2) Using a thread pool: the run takes just over 2 seconds.
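```python
from time import sleep
import time
from multiprocessing.dummy import Pool  # thread-based Pool with the multiprocessing API

def request(url):
    print('Requesting:', url)
    sleep(2)  # simulate a 2-second blocking download
    print('Download succeeded:', url)

urls = ['www.baidu.com', 'www.sogou.com', 'www.goubanjia.com']
start = time.time()
pool = Pool(3)           # three worker threads, one per URL
pool.map(request, urls)  # blocks until every URL has been processed
pool.close()
pool.join()
print(time.time() - start)
```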
Test result:

```
Requesting: www.baidu.com
Requesting: www.sogou.com
Requesting: www.goubanjia.com
Download succeeded: www.goubanjia.com
Download succeeded: www.sogou.com
Download succeeded: www.baidu.com
2.034695625305176
```
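The same thread-pool experiment can also be written with the standard library's `concurrent.futures.ThreadPoolExecutor`, a higher-level alternative to `multiprocessing.dummy.Pool` (this is a swapped-in API, not the one used above). A minimal sketch, assuming as before that `sleep` stands in for a real download; nothing is actually fetched from the URLs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def request(url):
    print('Requesting:', url)
    time.sleep(2)  # simulated blocking download, as in the original example
    print('Download succeeded:', url)
    return url

urls = ['www.baidu.com', 'www.sogou.com', 'www.goubanjia.com']

start = time.time()
# max_workers=3 mirrors Pool(3): one thread per URL
with ThreadPoolExecutor(max_workers=3) as executor:
    # executor.map returns results in input order, whatever the finishing order
    results = list(executor.map(request, urls))
elapsed = time.time() - start
print(results)
print(elapsed)  # roughly 2 seconds rather than 6
```

The `with` block waits for all worker threads before exiting, so the timing covers every request.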
(3) Should we blindly use multithreading and multiprocessing in our programs?
Recommended: a single thread plus asynchronous coroutines. This is the most efficient approach; not many people use it, but it is what you reach for when crawling large amounts of data. Details below.
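For comparison with the two runs above, here is the same three-URL experiment as a single thread running asynchronous coroutines with `asyncio`. A minimal sketch, assuming `asyncio.sleep(2)` stands in for a non-blocking download; a real crawler would need an async HTTP client instead.

```python
import asyncio
import time

async def request(url):
    print('Requesting:', url)
    # asyncio.sleep yields control to the event loop instead of blocking the thread
    await asyncio.sleep(2)
    print('Download succeeded:', url)
    return url

async def main():
    urls = ['www.baidu.com', 'www.sogou.com', 'www.goubanjia.com']
    # gather runs all three coroutines concurrently on one event loop
    return await asyncio.gather(*(request(url) for url in urls))

start = time.time()
results = asyncio.run(main())
elapsed = time.time() - start
print(results)
print(elapsed)  # roughly 2 seconds, using only one thread
```

Note that calling `time.sleep` inside a coroutine would block the whole thread and bring the total back to about 6 seconds; inside coroutines the awaitable `asyncio.sleep` has to be used.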
Coroutines (a concept found in Go and Python, among other languages) take up very little memory.
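To get a feel for how lightweight coroutines are, the sketch below starts ten thousand of them in a single thread, each waiting half a second. It is an illustration under assumed numbers (the count and delay are arbitrary); the exact elapsed time depends on the machine, but it should stay close to one wait rather than ten thousand.

```python
import asyncio
import time

async def tiny_task(i):
    await asyncio.sleep(0.5)  # all coroutines wait concurrently on the same loop
    return i

async def main(n):
    return await asyncio.gather(*(tiny_task(i) for i in range(n)))

start = time.time()
results = asyncio.run(main(10_000))
elapsed = time.time() - start
print(len(results), round(elapsed, 2))
```

Starting ten thousand operating-system threads for the same job would cost far more memory and scheduling overhead.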
What management cares about is the data you crawl.
The focus is still on learning the modules. Learn the following concepts first; they will show up in the code shortly.
event_loop: the event loop, essentially an infinite loop. We can register special functions on it, and when certain conditions are met, those functions are executed. An ordinary program runs in a fixed order from beginning to end, and the number of steps it performs is fixed. In an asynchronous program, some parts inevitably take a long time, so we hand control away from them and let them run in the background while another part of the program gets going. When the background part finishes, the main task must be notified promptly so it can take the next step. How long that takes is unpredictable, so the program has to keep monitoring the state and proceed as soon as it receives a completion message. This continuous monitoring loop is the event loop.

coroutine: refers to the coroutine object type in Python. We can register coroutine objects on the event loop, and the loop will call them. A method defined with the async keyword is not executed immediately when it is called; instead, the call returns a coroutine object.

task: a further wrapper around a coroutine object that also records the state of the task.

future: represents a task that will be executed or has not been executed yet. There is no essential difference between it and a task (in asyncio, Task is a subclass of Future).

async/await: these keywords were introduced in Python 3.5 specifically for defining coroutines. async defines a coroutine, and await suspends the coroutine at a blocking operation until that operation completes, letting the event loop run other work in the meantime.
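The concepts above can be seen together in one small `asyncio` program: calling an `async` function yields a coroutine object, `asyncio.create_task` wraps it in a task that tracks its state, a bare future is a placeholder filled in later, and an explicitly created event loop runs everything. A minimal sketch; the URL and the 0.1-second delays are illustrative assumptions, not a real download.

```python
import asyncio

async def request(url):
    # await suspends this coroutine and hands control back to the event loop
    await asyncio.sleep(0.1)
    return 'done: ' + url

async def main():
    # Calling an async function does not run its body; it returns a coroutine object
    coro = request('www.baidu.com')
    print(type(coro))  # <class 'coroutine'>

    # A Task wraps the coroutine, schedules it on the loop and tracks its state
    task = asyncio.create_task(coro)
    print(task.done())  # False: scheduled but not finished yet
    result = await task
    print(task.done(), result)  # True done: www.baidu.com

    # A bare Future is a placeholder whose result is set later by someone else
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    loop.call_later(0.1, fut.set_result, 'future result')
    fut_result = await fut

    # Task is a subclass of Future, which is why the two behave so similarly
    return result, fut_result, isinstance(task, asyncio.Future)

# Explicit event loop: register main() on it and run until it completes
loop = asyncio.new_event_loop()
try:
    outcome = loop.run_until_complete(main())
finally:
    loop.close()
print(outcome)
```

In modern code, `asyncio.run(main())` performs the same create, run and close steps in a single call.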