Demand: If you want to achieve this function: the user enters the name of favorite movies, programs can be in the movie Paradise https://www.ygdy8.com crawling movie corresponding to the download link, download link and print out
Challenges: acquiring magnetic links include Chinese, after printing out garbage
Workaround: Manually specify the encoding:
if res.encoding == 'ISO-8859-1': encodings = requests.utils.get_encodings_from_content(res.text) if encodings: encoding = encodings[0] else: encoding = res.apparent_encoding else: encoding = res.encoding encode_content = res.content.decode(encoding, 'replace').encode('utf-8', 'replace')
# Want to achieve this function: the user enters the name of favorite movies, programs can be in the movie Paradise https://www.ygdy8.com crawling movie corresponding to the download link, download link and print out the Import Requests from BS4 Import the BeautifulSoup from the urllib.request Import pathname2url # of anti-climb avoidance mechanism, disguised browser request header headers = { ' the User-- Agent ' : ' the Mozilla / 5.0 (the Macintosh; the Intel the Mac the OS X-10_14_3) AppleWebKit / 537.36 (KHTML, like Gecko) Chrome / 78.0.3904.108 Safari / 537.36 OPR / 65.0.3467.78 (Baidu Edition) ' } # get film magnetic link DEF getMovieDownloadLink (filmlink): RES = requests.get (filmlink, headers = headers) IF res.status_code == 200 : # Content Chinese garbled approach after the request: # when the response code is 'ISO-8859-1', we should first look for the encoding response header set; if this coding does not exist, to see the return of Html encoding the header set IF res.encoding == ' the ISO-8859-1 ' : Encodings = requests.utils.get_encodings_from_content (res.text) IF Encodings: encoding = Encodings [0] the else : encoding = res.apparent_encoding the else : encoding = res.encoding encode_content= res.content.decode(encoding, 'replace').encode('utf-8', 'replace') soup = BeautifulSoup(encode_content, 'html.parser') Zoom = soup.select_one('#Zoom') fileurl = Zoom.find('table').find('a').text with open('./17-电影天堂磁力.txt','a', newline='') AS File: a file.write (fileurl + ' \ n- ' ) the else : Print ( ' Movie links: {} request failed! ' .Format (filmlink)) DEF main (): dyurl = ' HTTPS: //www.ygdy8 .com ' # movie = the iNPUT (' Please enter the name of the movie: ') movie = ' Maleficent ' movie = movie.encode ( ' GBK ' ) url = ' http://s.ygdy8.com/plus/s0. PHP? =. 1 the typeid & keyword = {0} ' .format(pathname2url(movie)) res = requests.get(url, headers=headers) if res.status_code == 200: htmltext = res.text soup = BeautifulSoup(htmltext, 'html.parser') co_content8 = soup.find('div', class_='co_content8') tables = co_content8.find('ul').find_all('table') if len(tables) <= 0: print('No resources found, the search can be site 0} { ' .format (dyurl)) the else : for Table in Tables: filmlink = dyurl table.find + ( ' A ' ) [ ' the href ' ] getMovieDownloadLink (filmlink) the else : Print ( ' request failed! ' ) main ()
result:
reference:
https://blog.csdn.net/guoxinian/article/details/82978067
http://blog.csdn.net/a491057947/article/details/47292923
http://docs.python-requests.org/en/latest/user/quickstart/#response-content