Crawling the JavaScript videos from Qianfeng
import requests
from urllib.parse import quote
from lxml import etree

'''
URL
    http://video.mobiletrain.org/course/index/courseId/479
Request method
    GET
Request header
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36
'''

# Simulate a browser request and get the response
response = requests.get(
    url='http://video.mobiletrain.org/course/index/courseId/479',
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36'
    }
)
html = response.text

# Get the video addresses from the page
eroot = etree.HTML(html)
# The class value must match the <li> markup on the page exactly
hrefs = eroot.xpath("//li[@class='clearfix J-url-list']/a/@data-url")

for href in hrefs:
    print(href)
    # Build the file name: everything after the first ':' minus the 4-character extension
    start_index = href.find(':') + 1
    end_index = -4
    filename = href[start_index:end_index]
    # Take the Chinese part of href, which starts at the character "千"
    start_url = href.find("千")
    uri = href[start_url:end_index]
    # Build the real address of the video
    start_uri = 'http://7xtcwd.com1.z0.glb.clouddn.com/'
    # Percent-encode the Chinese characters
    end_uri = quote(uri)
    src = start_uri + end_uri + ".mp4"

    with open(filename + '.mp4', 'wb') as f:
        # Download the file with requests, streaming the body
        video_response = requests.get(
            url=src,
            stream=True
        )
        print("downloading:", src)
        # Write the file to disk 512 bytes at a time
        for chunk in video_response.iter_content(chunk_size=512):
            f.write(chunk)
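The URL-building step in the script can be pulled out into a small helper, which makes the slicing and percent-encoding easy to check without touching the network. This is only a sketch and not part of the original post: `build_video_url`, `CDN_BASE`, the `marker` parameter, and the sample `data-url` value are hypothetical names and data. Only the CDN base address and the use of `quote` come from the script.

```python
from urllib.parse import quote

# Base address copied from the script; everything else here is illustrative.
CDN_BASE = "http://7xtcwd.com1.z0.glb.clouddn.com/"


def build_video_url(data_url: str, marker: str = "千") -> str:
    """Turn a data-url attribute value into a percent-encoded video URL.

    Assumes the value ends in a 4-character extension such as ".mp4" and
    that the path to keep starts at the first occurrence of `marker`.
    """
    start = data_url.find(marker)
    if start == -1:
        # str.find returns -1 on a miss; slicing from -1 would silently
        # produce a wrong path, so fail loudly instead.
        raise ValueError(f"marker {marker!r} not found in {data_url!r}")
    path = data_url[start:-4]  # drop the trailing ".mp4"
    # quote() UTF-8 encodes non-ASCII characters and leaves "/" unescaped.
    return CDN_BASE + quote(path) + ".mp4"


# Hypothetical attribute value, made up for illustration only.
sample = "video:千锋/js/01.mp4"
print(build_video_url(sample))
```

Checking for `-1` is the one behavioral difference from the script: there, a `data-url` without the marker character would yield a one-character path rather than an error.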
Reposted from: https://www.cnblogs.com/chaunceyji/p/10995266.html