Python training, day three

# After-class summary
# Starting to climb the "crawler" mountain, and I don't understand it yet
# I. How crawlers work
# 1. What is the Internet?
#     A collection of network devices that links individual computers together into one platform; that whole is called the Internet.
# 2. Why was the Internet built?
#     To transfer and share data.
# 3. The whole process for a normal user:
#      open a browser -> send a request to the target site -> fetch the response data -> render it in the browser
# 4. The whole process for a crawler:
#      simulate a browser -> send a request to the target site -> fetch the response data -> extract the valuable data -> persist the data
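# "Simulate a browser" usually means sending the headers a browser would send, most importantly User-Agent.
# A minimal sketch using the standard library's urllib.request so it runs without extra packages; the
# User-Agent string and the helper names are illustrative assumptions. With requests, the same header
# would be passed as requests.get(url, headers={...}).

```python
import urllib.request

# Hypothetical browser-like User-Agent string; any real browser's string would do.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that identifies itself like a browser."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body (needs network access)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    req = build_request("https://www.baidu.com/")
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

# Note that urllib stores header names via str.capitalize(), so the header is looked up as "User-agent".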
# 5. What does a browser send when it makes a request?
#        An HTTP request.
#        - Client:
#          The browser is a piece of software -> it runs on the client, which has an IP and port
#        - Server:
#          https://www.jd.com/
#          www.jd.com (the domain name of JD.com, "Jingdong") -> DNS resolution -> IP and port of JD.com's server
#        The client's IP and port ---> send a request to the server's IP and port; once a connection is established, the client can obtain the corresponding data.
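# The client/server steps above can be traced in code: split the URL into the domain name and port,
# then ask DNS for the server's IP addresses. A sketch with the standard library; the helper names are
# illustrative, 80 and 443 are the standard default ports for http and https, and the DNS lookup
# needs network access.

```python
from __future__ import annotations

import socket
from urllib.parse import urlsplit

# Default ports for the two schemes a browser normally uses.
DEFAULT_PORTS = {"http": 80, "https": 443}


def server_address(url: str) -> tuple[str, int]:
    """Return the (domain name, port) a client would connect to for this URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


def resolve(host: str, port: int) -> list[str]:
    """Ask DNS for the server's IP addresses (needs network access)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host, port = server_address("https://www.jd.com/")
    print(host, port)           # www.jd.com 443
    print(resolve(host, port))  # the IPs depend on your DNS resolver
```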
# 6. The whole crawler process
#        - Send a request
#        - Fetch the response data (any request sent to a server gets response data back)
#        - Parse and extract the data (needs a parsing library: re, BeautifulSoup4, XPath, ...)
#        - Save it locally (file handling, a database, MongoDB, ...)
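# The four steps of the crawler process can be wired together as four small functions. This is a sketch,
# not the course's code: the function names are hypothetical, it uses urllib from the standard library
# instead of requests, "valuable data" is stood in for by the page title and link targets extracted with
# re, and "save locally" is a JSON file (a database or MongoDB would slot into save() the same way).

```python
import json
import re
import urllib.request
from pathlib import Path


def send_request(url: str) -> str:
    """Steps 1 and 2: send the request and fetch the response data (needs network)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def parse(html: str) -> dict:
    """Step 3: extract the data; here just the page title and the link targets."""
    title = re.search(r"<title>(.*?)</title>", html, re.S | re.I)
    links = re.findall(r'<a[^>]*\shref="([^"]+)"', html, re.I)
    return {"title": title.group(1).strip() if title else None, "links": links}


def save(data: dict, path: Path) -> None:
    """Step 4: persist the data locally, here as a UTF-8 JSON file."""
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")


def crawl(url: str, path: Path) -> dict:
    data = parse(send_request(url))
    save(data, path)
    return data


if __name__ == "__main__":
    print(crawl("http://www.baidu.com/", Path("baidu.json")))
```

# Keeping each step separate means parse() and save() can be tried on saved HTML without hitting the site again.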
# Example 1: save the Baidu home page as an HTML file
# import requests
# response = requests.get(url='http://www.baidu.com/')
# response.encoding = 'utf-8'    # decode response.text as UTF-8
# print(response.text)
# with open('baidu.html', 'w', encoding='utf-8') as f:
#     f.write(response.text)
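# Why set response.encoding: requests builds response.text by decoding the raw bytes in response.content
# with response.encoding, and for text/* responses that declare no charset it falls back to ISO-8859-1,
# which garbles Chinese text. The effect can be reproduced offline with plain bytes; the sample string
# is made up.

```python
# The raw bytes a server might send for a UTF-8 page fragment (sample data).
raw = "<title>百度一下</title>".encode("utf-8")

# Decoding with the right codec recovers the text.
good = raw.decode("utf-8")

# Decoding the same bytes as ISO-8859-1 does not raise an error; it silently
# produces mojibake, which is what a wrong response.encoding looks like.
bad = raw.decode("iso-8859-1")

print(good)
print(bad)
```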
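# Example 2: download a video; binary data comes from response.content and is written in 'wb' mode
# import requests
# response = requests.get('https://video.pearvideo.com/head/20190625/cont-1570107-14056273.mp4')
# print(response.content)
# with open('video.mp4', 'wb') as f:
#     f.write(response.content)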
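# response.content holds the whole video in memory at once. A sketch of writing it in chunks instead;
# save_binary is a hypothetical helper that accepts any iterable of bytes, so it can be tried without
# a network connection.

```python
from pathlib import Path
from typing import Iterable


def save_binary(chunks: Iterable[bytes], path: Path) -> int:
    """Write an iterable of byte chunks to `path` in binary mode; return bytes written."""
    written = 0
    with open(path, "wb") as f:  # 'wb': video data is bytes, not text
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    return written
```

# With requests, the chunks can come from a streamed response, e.g.
# with requests.get(url, stream=True) as r: save_binary(r.iter_content(chunk_size=65536), Path('video.mp4'))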
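import requests
import re

# Example 3: collect the video detail-page URLs from the Pear Video home page
response = requests.get('https://www.pearvideo.com/')
print(response.text)
# Each link looks like <a href="video_<id>">; capture the part after "video_"
res_list = re.findall('<a href="video_(.*?)"', response.text, re.S)
print(res_list)
for v_id in res_list:
    # The captured id excludes the "video_" prefix, so add it back
    detail_url = 'https://www.pearvideo.com/video_' + v_id
    print(detail_url)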
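# A page can link to the same video more than once, so the loop above may print duplicate URLs.
# A sketch that de-duplicates the ids while keeping page order and joins them onto the base URL with
# urllib.parse.urljoin. It assumes the ids are digits and that the markup still looks like
# <a href="video_<id>">, which may no longer hold for the live site; detail_urls is a hypothetical name.

```python
import re
from typing import List
from urllib.parse import urljoin

BASE_URL = "https://www.pearvideo.com/"


def detail_urls(html: str) -> List[str]:
    """Return detail-page URLs for every distinct video_<id> link, in page order."""
    ids = re.findall(r'<a href="video_(\d+)"', html)  # assumption: ids are numeric
    unique_ids = dict.fromkeys(ids)  # drops repeats but keeps first-seen order
    return [urljoin(BASE_URL, "video_" + v_id) for v_id in unique_ids]
```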

 


Origin www.cnblogs.com/ys88/p/11094735.html