aiohttp
Brief introduction
aiohttp implements concurrent IO operations in a single thread and can replace the synchronous requests module for sending requests. When sending a request you can add a UA, headers, and parameters; the methods are shown below.
Environment Installation
pip install aiohttp
Using aiohttp
1. Initiate a request
import asyncio
import aiohttp

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://www.baidu.com') as response:
            print(await response.text())

loop = asyncio.get_event_loop()
tasks = [fetch()]
loop.run_until_complete(asyncio.wait(tasks))
2. Adding request parameters:
import asyncio
import aiohttp

params = {'key': 'value', 'page': 10}

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://www.baidu.com/s', params=params) as response:
            print(response.url)

loop = asyncio.get_event_loop()
tasks = [fetch()]
loop.run_until_complete(asyncio.wait(tasks))
3. UA spoofing: adding a User-Agent:
import asyncio
import aiohttp

url = 'http://httpbin.org/user-agent'
headers = {'User-Agent': 'test_user_agent'}

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as response:
            print(await response.text())

loop = asyncio.get_event_loop()
tasks = [fetch()]
loop.run_until_complete(asyncio.wait(tasks))
4. Custom cookies:
import asyncio
import aiohttp

url = 'http://httpbin.org/cookies'
cookies = {'cookies_name': 'test_cookies'}

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(url, cookies=cookies) as response:
            print(await response.text())

loop = asyncio.get_event_loop()
tasks = [fetch()]
loop.run_until_complete(asyncio.wait(tasks))
5. POST request parameters
import asyncio
import aiohttp

url = 'http://httpbin.org'
payload = {'username': 'zhang', 'password': '123456'}

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.post(url, data=payload) as response:
            print(await response.text())

loop = asyncio.get_event_loop()
tasks = [fetch()]
loop.run_until_complete(asyncio.wait(tasks))
6. Setting a proxy
import asyncio
import aiohttp

url = 'http://python.org'

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(url, proxy='http://some.proxy.com') as response:
            print(response.status)

loop = asyncio.get_event_loop()
tasks = [fetch()]
loop.run_until_complete(asyncio.wait(tasks))
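The snippets above all use the loop.run_until_complete pattern. On Python 3.7+, asyncio.run can replace that boilerplate. A minimal, network-free sketch (not from the original; fake fetch coroutine and made-up names stand in for the aiohttp request so it runs anywhere):

```python
import asyncio

async def fetch(url):
    # In real code this would be an aiohttp request; here we just
    # simulate the IO wait so the example runs without a network.
    await asyncio.sleep(0.1)
    return f'fetched {url}'

async def main():
    # asyncio.run (Python 3.7+) creates and closes the event loop
    # for you, replacing get_event_loop()/run_until_complete().
    return await asyncio.gather(*(fetch(u) for u in ['a', 'b', 'c']))

results = asyncio.run(main())
print(results)
```

asyncio.gather preserves the order of its arguments, so the results come back in the same order the coroutines were passed in.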
Asynchronous IO processing
# Environment installation: pip install aiohttp
# Use the ClientSession class from this module
import asyncio
import time
import aiohttp

start = time.time()
urls = [
    'http://127.0.0.1:5000/tiger', 'http://127.0.0.1:5000/jay', 'http://127.0.0.1:5000/tom',
    'http://127.0.0.1:5000/tiger', 'http://127.0.0.1:5000/jay', 'http://127.0.0.1:5000/tom',
    'http://127.0.0.1:5000/tiger', 'http://127.0.0.1:5000/jay', 'http://127.0.0.1:5000/tom',
    'http://127.0.0.1:5000/tiger', 'http://127.0.0.1:5000/jay', 'http://127.0.0.1:5000/tom',
]

async def get_page(url):
    async with aiohttp.ClientSession() as session:
        # get()/post() accept headers, params/data, proxy='http://ip:port'
        async with await session.get(url) as response:
            # text() returns the response data as a string
            # read() returns the response data in binary form
            # json() returns a JSON object
            # Note: manually suspend with await before reading the response data
            page_text = await response.text()
            print(page_text)

tasks = []
for url in urls:
    c = get_page(url)
    task = asyncio.ensure_future(c)
    tasks.append(task)

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
end = time.time()
print('total time:', end - start)
# Use aiohttp instead of the requests module
import time
import asyncio
import aiohttp

async def get_page(url):
    async with aiohttp.ClientSession() as session:
        # Any time-consuming blocking operation must be suspended with await
        async with await session.get(url=url) as response:
            page_text = await response.text()  # binary: read() / JSON: json()
            print('response data', page_text)

start = time.time()
urls = [
    'http://127.0.0.1:5000/tiger',
    'http://127.0.0.1:5000/jay',
    'http://127.0.0.1:5000/tom',
]
loop = asyncio.get_event_loop()
tasks = []
for url in urls:
    c = get_page(url)
    task = asyncio.ensure_future(c)
    tasks.append(task)
loop.run_until_complete(asyncio.wait(tasks))
print('total time:', time.time() - start)
Here we replaced the requests library with aiohttp, sending each request with the get() method of aiohttp's ClientSession class. The results are as follows:
Hello tom
Hello jay
Hello tiger
Hello tiger
Hello jay
Hello tiger
Hello tom
Hello jay
Hello jay
Hello tom
Hello tom
Hello tiger
total time: 2.037203073501587
Success! Requests that previously took six seconds now finish in about two seconds, a third of the original time.
In the code, await is followed by the get() method. While the five coroutines are executing, whenever an await is encountered, the current coroutine is suspended and another coroutine runs instead, until that one is in turn suspended or finished; then the next coroutine executes.
When the program starts running, the event loop runs the first task. When the first task reaches the first await, followed by the get() method, it is suspended; but the first step of get() is non-blocking, so the task is woken up again immediately and continues, creating the ClientSession object. It then hits the second await when session.get() sends the request, and is suspended again. Because the request takes a long time, it is not woken up for a while. Now the first task is suspended, so what happens next? The event loop looks for a coroutine that is not currently suspended and moves on to the second task, repeating the same process, until the fifth task's session.get() has been executed and all tasks are suspended. With every task in a suspended state, there is nothing to do but wait. After three seconds, the requests all receive responses at almost the same time; the tasks are woken up one after another and print their results. The whole run takes only three seconds!
How about that? This is the convenience of asynchronous operation: when a blocking operation occurs, the task is suspended and the program moves on to other tasks instead of idly waiting. This makes full use of CPU time rather than wasting it waiting on IO.
As you can see, with asynchronous coroutines we can issue hundreds of network requests in almost the same time as one. Used in a web crawler, the speed-up is very impressive.
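The suspend-and-switch behavior described above can be demonstrated without a local test server. A minimal sketch (not from the original) in which asyncio.sleep(1) stands in for a 1-second network request:

```python
import asyncio
import time

async def fake_request(name):
    # Awaiting asyncio.sleep suspends this coroutine, exactly like
    # awaiting session.get(); the event loop runs the others meanwhile.
    await asyncio.sleep(1)
    return f'Hello {name}'

async def main():
    return await asyncio.gather(
        fake_request('tiger'), fake_request('jay'), fake_request('tom')
    )

start = time.time()
results = asyncio.run(main())
elapsed = time.time() - start
print(results)
print('total time:', elapsed)  # about 1 second, not 3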
How to parse the data: binding a callback to a task (the complete coroutine workflow)
import time
import asyncio
import aiohttp

# Callback: mainly used to parse the response data
def callback(task):
    print('this is callback')
    # Fetch the response data
    page_text = task.result()
    print('you can then parse the data in the callback function')

async def get_page(url):
    async with aiohttp.ClientSession() as session:
        # Any time-consuming blocking operation must be suspended with await
        async with await session.get(url=url) as response:
            page_text = await response.text()  # binary: read() / JSON: json()
            print('response data', page_text)
            return page_text

start = time.time()
urls = [
    'http://127.0.0.1:5000/tiger',
    'http://127.0.0.1:5000/jay',
    'http://127.0.0.1:5000/tom',
]

# Step 1: create the event loop object
loop = asyncio.get_event_loop()
# Task list
tasks = []
for url in urls:
    c = get_page(url)
    # Step 2: wrap the coroutine object in a task
    task = asyncio.ensure_future(c)
    # Bind a callback to the task, used to parse the response data
    task.add_done_callback(callback)
    # Step 3: add every task to the task list
    tasks.append(task)
# Step 4: run the event loop; asyncio.wait() registers the multiple
# tasks with the loop so they run automatically
loop.run_until_complete(asyncio.wait(tasks))
print('total time:', time.time() - start)
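The callback mechanism can also be tried without the local 127.0.0.1:5000 test server. A minimal sketch (not from the original; asyncio.sleep stands in for the aiohttp request, and the names are made up):

```python
import asyncio

results = []

# Callback: receives the finished task; task.result() holds the return value
def callback(task):
    results.append(task.result().upper())

async def get_page(name):
    await asyncio.sleep(0.1)  # stands in for the network request
    return f'hello {name}'

loop = asyncio.new_event_loop()
tasks = []
for name in ['tiger', 'jay', 'tom']:
    # Wrap the coroutine in a task and bind the parsing callback to it
    task = loop.create_task(get_page(name))
    task.add_done_callback(callback)
    tasks.append(task)
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
print(results)
```

Each callback fires as its task completes, so this is where response parsing can live without blocking the other coroutines.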