Question: I give you the URLs of 10 images; download all 10 of them.
The straightforward approach:
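```python
import requests

urls = [
    'http://www.xx1.png',
    'http://www.xx2.png',
    # ... up to 10 image URLs
    'http://www.xx10.png',
]

for url in urls:
    response = requests.get(url)       # blocks until this download has finished
    filename = url.rsplit('/', 1)[-1]  # e.g. 'www.xx1.png'; url + '.png' would not be a valid file name
    with open(filename, 'wb') as f:
        f.write(response.content)
```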
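This works, but it is inefficient: the downloads run strictly one after another, so the total time is the per-URL I/O time multiplied by the number of URLs. If each URL spends 2 s on I/O, the three URLs listed already take 6 s, and all ten take 20 s.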
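The cost is easy to see with a simulation. The sketch below swaps the real network call for a hypothetical `fake_download()` that just sleeps (0.1 s here, an assumed value), so it runs offline; the URLs `xx1` to `xx10` are placeholders. Because each call waits for the previous one, the total is roughly 10 × 0.1 s = 1 s.

```python
import time


def fake_download(url, delay=0.1):
    """Hypothetical stand-in for requests.get(url): sleeps to simulate network I/O."""
    time.sleep(delay)
    return b'image bytes for ' + url.encode()


urls = ['http://www.xx%d.png' % i for i in range(1, 11)]  # placeholder URLs

start = time.perf_counter()
results = [fake_download(url) for url in urls]  # strictly one after another
elapsed = time.perf_counter() - start

print(len(results), 'downloads in', round(elapsed, 2), 's')  # roughly 1 s
```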
Several schemes achieve much better performance:
1. Multithreading:
Drawback: threads are poorly utilized; each thread handles a single URL and spends most of its time idle, waiting on I/O.
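```python
"""Multithreading"""
import threading

import requests

urls = [
    'http://www.baidu.com/',
    'https://www.cnblogs.com/',
    'https://www.cnblogs.com/news/',
    'https://cn.bing.com/',
    'https://stackoverflow.com/',
]


def task(url):
    response = requests.get(url)  # this thread blocks here until the response arrives
    print(response)


# One thread per URL, so the requests wait on the network concurrently.
threads = []
for url in urls:
    t = threading.Thread(target=task, args=(url,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()  # wait until every download has finished
```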
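One common way to soften the "one thread per URL" drawback is a thread pool: the standard library's `ThreadPoolExecutor` keeps a fixed number of worker threads and hands each the next URL as soon as it is free. This is a minimal sketch, assuming a hypothetical `fake_download()` that sleeps instead of hitting the network and an arbitrary pool size of 5. Threads still block while waiting, so this improves reuse rather than removing the idle time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url, delay=0.1):
    """Hypothetical stand-in for requests.get(url): sleeps to simulate waiting on the network."""
    time.sleep(delay)
    return url


urls = ['http://www.xx%d.png' % i for i in range(1, 11)]  # placeholder URLs

start = time.perf_counter()
# 5 worker threads are reused across all 10 URLs instead of creating one thread per URL.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_download, urls))  # map() returns results in input order
elapsed = time.perf_counter() - start

print(results[0], round(elapsed, 2))  # two rounds of 0.1 s, so roughly 0.2 s
```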
2. coroutine:
Because a coroutine switches to another coroutine whenever it hits I/O (the switching happens internally), this approach is efficient.
```python
"""
Coroutines + switching on I/O.
pip3 install gevent
gevent calls greenlet internally to implement coroutines.
"""
from gevent import monkey; monkey.patch_all()  # patch the stdlib so blocking I/O yields to other greenlets
import gevent
import requests


def func(url):
    response = requests.get(url)
    print(response)


urls = [
    'http://www.baidu.com/',
    'https://www.cnblogs.com/',
    'https://www.cnblogs.com/news/',
    'https://cn.bing.com/',
    'https://stackoverflow.com/',
]

spawn_list = []
for url in urls:
    # No request is sent yet; each greenlet is only created and placed in the list.
    spawn_list.append(gevent.spawn(func, url))

# Start the requests and wait for all of them; gevent switches whenever a greenlet hits I/O.
gevent.joinall(spawn_list)
```
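The same idea can be sketched with the standard library's `asyncio` instead of gevent. This is a swapped-in technique, not what the gevent code above does: in asyncio the switch points are explicit (`await`) rather than created by monkey-patching. The hypothetical `fake_download()` coroutine stands in for an HTTP request by sleeping 0.1 s, so all ten "downloads" overlap and finish in about 0.1 s total.

```python
import asyncio
import time


async def fake_download(url, delay=0.1):
    """Hypothetical stand-in for an HTTP request; `await` is where this coroutine yields."""
    await asyncio.sleep(delay)
    return url


async def main(urls):
    # Similar to gevent.spawn + gevent.joinall: start all coroutines, then wait for them together.
    return await asyncio.gather(*(fake_download(url) for url in urls))


urls = ['http://www.xx%d.png' % i for i in range(1, 11)]  # placeholder URLs
start = time.perf_counter()
results = asyncio.run(main(urls))  # gather() keeps results in input order
elapsed = time.perf_counter() - start
print(len(results), round(elapsed, 2))
```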
3. An asynchronous, non-blocking module based on an event loop (internally it uses I/O multiplexing): Twisted
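```python
"""Asynchronous non-blocking module based on an event loop: Twisted"""
from twisted.internet import defer, reactor
from twisted.web.client import getPage  # deprecated in newer Twisted releases in favor of Agent


def stop_loop(arg):
    reactor.stop()  # stop the event loop once every request has finished


def get_response(contents):
    print(contents)  # callback: runs automatically when a page has been downloaded


deferred_list = []
url_list = [
    'http://www.baidu.com/',
    'https://www.cnblogs.com/',
    'https://www.cnblogs.com/news/',
    'https://cn.bing.com/',
    'https://stackoverflow.com/',
]
for url in url_list:
    deferred = getPage(bytes(url, encoding='utf8'))  # starts the request and returns a Deferred immediately
    deferred.addCallback(get_response)
    deferred_list.append(deferred)

dlist = defer.DeferredList(deferred_list)  # fires once all the Deferreds have fired
dlist.addBoth(stop_loop)
reactor.run()
```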
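For readers without Twisted installed, the same structure (a callback per request, a "wait for all" object, then stop the loop) can be sketched with the standard library's `asyncio`. This is an analogy, not Twisted's API: `add_done_callback` plays the role of `addCallback`, `asyncio.gather` the role of `DeferredList`, and `asyncio.run` stops the loop when `main()` returns. `fake_get_page()` is a hypothetical stand-in for `getPage()` that only sleeps.

```python
import asyncio


async def fake_get_page(url, delay=0.05):
    """Hypothetical stand-in for getPage(): finishes after `delay` seconds."""
    await asyncio.sleep(delay)
    return ('contents of ' + url).encode('utf8')


received = []


def get_response(task):
    # Runs automatically when the task finishes, like deferred.addCallback(get_response).
    received.append(task.result())
    print(task.result())


async def main(url_list):
    tasks = []
    for url in url_list:
        task = asyncio.create_task(fake_get_page(url))
        task.add_done_callback(get_response)
        tasks.append(task)
    # Like DeferredList: completes once every task is done; asyncio.run() then stops the loop.
    await asyncio.gather(*tasks)


url_list = ['http://www.baidu.com/', 'https://cn.bing.com/', 'https://stackoverflow.com/']
asyncio.run(main(url_list))
```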
Question: how is an event-loop-based asynchronous non-blocking module implemented internally? With I/O multiplexing.
- At bottom, a crawler is just code that writes requests to sockets and reads responses from them.
- The benefit of non-blocking: after sending a request you no longer have to wait for it.
Internal implementation mechanism: I/O multiplexing.
```python
import select
import socket


class ChunSheng(object):
    def __init__(self):
        self.socket_list = []     # sockets still waiting for a response
        self.conn_list = []       # sockets still waiting for the connection to complete
        self.conn_func_dict = {}  # socket -> url_func, where url_func = (host, callback)

    def add_request(self, url_func):
        conn = socket.socket()
        conn.setblocking(False)   # connect() now returns at once instead of waiting
        try:
            conn.connect((url_func[0], 80))  # note: the DNS lookup for the host still blocks
        except BlockingIOError:
            pass                  # expected: the connection is still being established
        self.conn_func_dict[conn] = url_func
        self.socket_list.append(conn)
        self.conn_list.append(conn)

    def run(self):
        """Loop until every socket in self.socket_list has connected and received a response."""
        while True:
            # select.select(rlist, wlist, xlist, timeout)
            # first argument: sockets to check for received response data
            # second argument: sockets to check for a successful connection
            # r: the sockets that have received data
            # w: the sockets whose connection has succeeded
            r, w, e = select.select(self.socket_list, self.conn_list, [], 0.05)
            for sock in w:
                host = self.conn_func_dict[sock][0]
                request = 'GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' % host
                sock.send(request.encode('utf-8'))
                self.conn_list.remove(sock)
            for sock in r:
                data = sock.recv(8096)  # first chunk of the response only
                func = self.conn_func_dict.pop(sock)[1]
                func(data)
                sock.close()
                self.socket_list.remove(sock)
            if not self.socket_list:
                break
```
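To try the select loop without touching the internet, here is a self-contained sketch of the same two-phase pattern (writable means connected, so send the request; readable means the response arrived, so read it) against a throwaway local server in a background thread. The server, its canned response, and `N_CLIENTS = 3` are hypothetical test fixtures; only the `socket`, `select`, and `threading` calls are real standard-library API.

```python
import select
import socket
import threading

N_CLIENTS = 3


def serve(listener, n):
    """Hypothetical local stand-in for a web server: answer n connections, then stop."""
    for _ in range(n):
        conn, _ = listener.accept()
        conn.recv(1024)  # read the request
        conn.sendall(b'HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok')
        conn.close()


listener = socket.socket()
listener.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
listener.listen(N_CLIENTS)
port = listener.getsockname()[1]
threading.Thread(target=serve, args=(listener, N_CLIENTS), daemon=True).start()

results = []
waiting_connect, waiting_response = [], []
for _ in range(N_CLIENTS):
    sock = socket.socket()
    sock.setblocking(False)
    try:
        sock.connect(('127.0.0.1', port))
    except BlockingIOError:
        pass  # connection still in progress
    waiting_connect.append(sock)
    waiting_response.append(sock)

while waiting_response:
    r, w, _ = select.select(waiting_response, waiting_connect, [], 0.05)
    for sock in w:  # writable: the connection succeeded, so send the request
        sock.send(b'GET / HTTP/1.1\r\nHost: 127.0.0.1\r\n\r\n')
        waiting_connect.remove(sock)
    for sock in r:  # readable: the response (or its first chunk) arrived
        results.append(sock.recv(8096))
        sock.close()
        waiting_response.remove(sock)

listener.close()
print(results)
```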
What is asynchronous?
In short, it means callbacks: a function is executed automatically when a task completes.
Where we have already met it:
- crawlers: the callback function runs automatically once a download finishes;
- Ajax: a request is sent to the backend, and the callback function runs once the request completes.
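The callback idea can be shown with the standard library alone. In this sketch, `fake_download()` is a hypothetical stand-in for a real download, and `on_done()` is registered with `Future.add_done_callback`, so it runs automatically when each download finishes; nothing in the main code waits on an individual result.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Hypothetical stand-in for a real download."""
    time.sleep(0.05)
    return 'content of ' + url


finished = []


def on_done(future):
    # The callback: executed automatically once the download has completed.
    finished.append(future.result())


with ThreadPoolExecutor(max_workers=2) as pool:
    for url in ['http://www.xx1.png', 'http://www.xx2.png']:  # placeholder URLs
        future = pool.submit(fake_download, url)
        future.add_done_callback(on_done)
# Leaving the with-block waits for the workers, so both callbacks have run by now.

print(sorted(finished))
```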
- What is non-blocking?
It simply means not waiting: once you call setblocking(False) on a socket, its calls no longer block. Instead of waiting, a call that cannot complete right away raises BlockingIOError.
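A minimal demonstration, using `socket.socketpair()` so no network is involved: a non-blocking `recv()` on a socket with no data raises `BlockingIOError` immediately, where a blocking socket would sit and wait.

```python
import socket

a, b = socket.socketpair()  # two connected sockets, no network needed
a.setblocking(False)

try:
    a.recv(1024)            # nothing has been sent yet
    outcome = 'got data'
except BlockingIOError:
    outcome = 'would block' # a blocking socket would have waited here instead

b.send(b'hello')
a.setblocking(True)         # switch back only to read the data deterministically
data = a.recv(1024)
a.close()
b.close()
print(outcome, data)        # would block b'hello'
```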
- What does I/O multiplexing do?
It monitors the state of many sockets at once:
- whether a connection has been established
- whether a response (data) has arrived
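Implementations of I/O multiplexing:
- select: can watch at most 1024 sockets (the usual FD_SETSIZE limit); internally it loops over every socket to check its state;
- poll: no limit on the number of sockets, but internally it still loops over all of them;
- epoll: no limit on the number of sockets, and it is callback-driven: only sockets that become ready are reported, so there is no full scan.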
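In Python you rarely pick select, poll, or epoll by hand: the standard `selectors` module's `DefaultSelector` chooses the most efficient mechanism available on the platform (for example epoll on Linux). This sketch registers three socketpair readers with a callback stored in `data`, sends data to only one of them, and dispatches on what the selector reports as ready. The callback-in-`data` convention is a common pattern, not something the module requires.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # e.g. epoll on Linux, kqueue on macOS/BSD, select on Windows
print(type(sel).__name__)

received = []


def on_readable(sock):
    # Callback attached to the socket at registration time.
    received.append(sock.recv(1024))


pairs = [socket.socketpair() for _ in range(3)]
for reader, _ in pairs:
    reader.setblocking(False)
    sel.register(reader, selectors.EVENT_READ, data=on_readable)

# Only the second pair gets data, so only its reader should be reported as ready.
pairs[1][1].send(b'ready')

for key, events in sel.select(timeout=1):
    callback = key.data
    callback(key.fileobj)

for reader, writer in pairs:
    sel.unregister(reader)
    reader.close()
    writer.close()
sel.close()
print(received)  # [b'ready']
```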
Schemes 2 and 3 achieve similar results but switch from different vantage points: coroutines switch from the inside (each task yields when it hits I/O), whereas an event-loop-based asynchronous non-blocking module schedules the work from the outside, with a "god's-eye view" over all the sockets.