An Example of Chunked Downloading with Coroutines

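Reading about asyncio coroutines left me a bit in the fog. I understand the principle, but there ought to be a practical application. Coroutines have a natural advantage for I/O-bound work. I haven't looked at aiohttp yet, so this example is a first taste of coroutines in practice, and also a chance to learn how chunked downloading works.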

1. First, get the size of the file to download

The size of the file is usually given in the "Content-Length" response header, so start by reading the headers to get size:

resp = requests.head(url)
size = int(resp.headers["Content-Length"])
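
Not every server sends Content-Length, and not every server honors range requests. The sketch below shows one way to check both before splitting the file. The helper names file_size_from_headers and supports_ranges are hypothetical, and the sample header values are made up; a plain dict stands in for resp.headers.

```python
def file_size_from_headers(headers):
    """Return the size in bytes, or None if Content-Length is missing or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        size = int(value)
    except ValueError:
        return None
    return size if size >= 0 else None


def supports_ranges(headers):
    """True if the server advertises byte-range support via Accept-Ranges."""
    return headers.get("Accept-Ranges", "").strip().lower() == "bytes"


# Made-up sample values; in the article these would come from requests.head(url).headers
sample = {"Content-Length": "2075535", "Accept-Ranges": "bytes"}
print(file_size_from_headers(sample), supports_ranges(sample))  # 2075535 True
```

Some servers accept Range requests without advertising Accept-Ranges, so a False here is a hint rather than proof. Also note that requests.head() does not follow redirects by default; passing allow_redirects=True may be needed if the URL redirects.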

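2. Split the file into n chunks and record the chunk boundaries

The file usually cannot be divided evenly, so the last chunk is made a little larger: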

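spos = []  # start offset of each chunk
fpos = []  # end offset of each chunk (inclusive)
persize = size // n
for i in range(n):
    spos.append(i * persize)
    fpos.append((i + 1) * persize - 1)
# The last chunk absorbs the remainder. Range offsets are inclusive,
# so the final byte of the file is at size - 1.
fpos[-1] = size - 1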
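
The boundary calculation is easy to get wrong by one byte, so it can help to put it in a function and check it on a small input. This is a minimal sketch; split_ranges is a hypothetical helper name, and capping n at size is an added assumption to avoid empty chunks.

```python
def split_ranges(size, n):
    """Split bytes 0..size-1 into n inclusive (start, end) ranges.

    The last range absorbs the remainder when size is not a multiple of n.
    """
    if size <= 0 or n <= 0:
        raise ValueError("size and n must be positive")
    n = min(n, size)  # never create empty chunks
    persize = size // n
    ranges = []
    for i in range(n):
        start = i * persize
        end = size - 1 if i == n - 1 else start + persize - 1
        ranges.append((start, end))
    return ranges


print(split_ranges(25, 4))  # [(0, 5), (6, 11), (12, 17), (18, 24)]
```

The ranges are contiguous, do not overlap, and their lengths add up to size, which is what the download step relies on.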

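3. Download a specified byte range of the file

The Range header passed to requests.get() specifies which byte interval of the file to download. Both ends of the interval are inclusive. For example, to download bytes 10 through 20 of the file: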

header["Range"] = "bytes=10-20"
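
Because both ends of a byte range are inclusive, "bytes=10-20" covers 11 bytes, not 10. The sketch below builds the header and computes the expected length; range_header and expected_length are hypothetical helper names.

```python
def range_header(start, end):
    """Build a Range header for the inclusive byte interval [start, end]."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return {"Range": "bytes=%d-%d" % (start, end)}


def expected_length(start, end):
    """Number of bytes an inclusive range covers."""
    return end - start + 1


print(range_header(10, 20))     # {'Range': 'bytes=10-20'}
print(expected_length(10, 20))  # 11
```

A server that honors the request answers with status 206 (Partial Content). One that ignores the Range header answers 200 with the whole file, so comparing the status code or len(resp.content) against the expected length is a cheap sanity check.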

4. Define a coroutine function

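An ordinary blocking function cannot be awaited directly, so it has to be wrapped with loop.run_in_executor, which runs it in a thread pool and returns something awaitable. Passing arguments through loop.run_in_executor is a bit of a trap, though: it does not accept **kwargs, so the requests.get(url, headers=headers) call needs one more layer of wrapping: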

get = lambda:requests.get(url,headers=headers)
resp = await loop.run_in_executor(None, get)
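
An alternative to the lambda is functools.partial, which binds keyword arguments up front. The sketch below uses a hypothetical blocking_fetch function as a stand-in for requests.get so that it runs without network access; the URL and timeout values are placeholders. It assumes Python 3.7 or later for asyncio.run and asyncio.get_running_loop.

```python
import asyncio
import functools
import time


def blocking_fetch(url, headers=None, timeout=None):
    """Hypothetical stand-in for requests.get: blocks briefly, then echoes its arguments."""
    time.sleep(0.05)
    return {"url": url, "headers": headers, "timeout": timeout}


async def fetch(url, start, end):
    loop = asyncio.get_running_loop()
    # partial binds the keyword arguments that run_in_executor cannot forward
    call = functools.partial(
        blocking_fetch,
        url,
        headers={"Range": "bytes=%d-%d" % (start, end)},
        timeout=10,
    )
    # None selects the loop's default thread pool executor
    return await loop.run_in_executor(None, call)


async def main():
    return await asyncio.gather(
        fetch("http://example.com/file", 0, 9),
        fetch("http://example.com/file", 10, 19),
    )


results = asyncio.run(main())
print(results[1]["headers"])  # {'Range': 'bytes=10-19'}
```

Unlike a lambda defined in a loop, partial captures its argument values at creation time, which avoids late-binding surprises when many calls are built in a row.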

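In addition, when a downloaded chunk is written to the file, the write has to start at that chunk's starting offset, which is where f.seek(offset, whence) comes in.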
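
To see why seeking makes out-of-order completion harmless, here is a small sketch that writes made-up chunks to a temporary file in reverse order and still ends up with the correct content. The offsets and data are illustrative only.

```python
import os
import tempfile

# offset -> data, made-up sample chunks
chunks = {0: b"AAAA", 4: b"BBBB", 8: b"CC"}

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "rb+") as f:
    # Write in reverse order to mimic chunks finishing out of order
    for offset in sorted(chunks, reverse=True):
        f.seek(offset, 0)  # whence=0 means "from the start of the file"
        f.write(chunks[offset])

with open(path, "rb") as f:
    data = f.read()
os.remove(path)

print(data)  # b'AAAABBBBCC'
```

The file has to be opened in 'rb+' mode rather than 'ab', because append mode always writes at the end of the file regardless of seek.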

The complete code is as follows:

import asyncio
import time

import requests

url = "http://xia2.kekenet.com/Sound/2018/06/bbcdqmd175_3937944FiP.mp3"


async def download(spos, fpos, f, i):
    """Download bytes spos..fpos (inclusive) of url and write them into f at offset spos."""
    headers = {'Range': "bytes=%d-%d" % (spos, fpos)}
    try:
        # run_in_executor takes no keyword arguments, so wrap the call
        get = lambda: requests.get(url, headers=headers)
        print('part of %d is ready!' % i)
        # requests is blocking, so run it in the default thread pool
        resp = await loop.run_in_executor(None, get)
        # There is no await between seek and write, so chunks cannot interleave
        f.seek(spos, 0)
        f.write(resp.content)
        print('part of %d is completed!' % i)
    except Exception as e:
        print("download file error:", e)


if __name__ == '__main__':
    n = 10

    resp = requests.head(url)
    size = int(resp.headers["Content-Length"])
    spos = []
    fpos = []
    persize = size // n
    for i in range(n):
        spos.append(i * persize)
        fpos.append((i + 1) * persize - 1)
    # The last chunk takes the remainder; offsets are inclusive, so it ends at size - 1
    fpos[-1] = size - 1

    print(spos)
    print(fpos)

    # Create (or truncate) the file, then reopen it for random-access writes
    f = open("D:\\kekenet.mp3", 'wb')
    f.close()
    f = open("D:\\kekenet.mp3", 'rb+')

    start_time = time.time()

    loop = asyncio.new_event_loop()
    loop.run_until_complete(asyncio.gather(*[download(spos[i], fpos[i], f, i + 1) for i in range(n)]))
    loop.close()

    finish_time = time.time()

    f.close()

    # Average speed is size divided by elapsed time
    print('average speed is %0.2f KB/s' % (size / 1000.0 / (finish_time - start_time)))
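
The printed output is as follows:

As the output shows, the coroutines do not start in order, and they do not necessarily finish in the order they started either. No download waits for another one to finish, which is presumably why this approach is efficient.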

[0, 207553, 415106, 622659, 830212, 1037765, 1245318, 1452871, 1660424, 1867977, 2075530]
[207552, 415105, 622658, 830211, 1037764, 1245317, 1452870, 1660423, 1867976, 2075529, 2283082]
part of 4 is ready!
part of 10 is ready!
part of 5 is ready!
part of 2 is ready!
part of 6 is ready!
part of 1 is ready!
part of 7 is ready!
part of 3 is ready!
part of 8 is ready!
part of 9 is ready!
part of 4 is completed!
part of 5 is completed!
part of 8 is completed!
part of 3 is completed!
part of 2 is completed!
part of 9 is completed!
part of 7 is completed!
part of 1 is completed!
part of 10 is completed!
part of 6 is completed!
average speed is 6390.67 KB/s
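
The out-of-order completion can be reproduced without any network access. In the small simulation below, asyncio.sleep stands in for network latency and the delays are made up; each part finishes as soon as its own wait is over, regardless of the order in which the parts were started.

```python
import asyncio


async def part(i, delay, done):
    # asyncio.sleep stands in for waiting on the network
    await asyncio.sleep(delay)
    done.append(i)


async def main():
    done = []
    delays = {1: 0.15, 2: 0.05, 3: 0.10}  # made-up latencies in seconds
    await asyncio.gather(*(part(i, d, done) for i, d in delays.items()))
    return done


order = asyncio.run(main())
print(order)  # [2, 3, 1]
```

The total running time is roughly the longest single delay rather than the sum of all delays, which is the same effect the chunked download benefits from.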


Reposted from blog.csdn.net/weixin_36277945/article/details/80646084