How to fix garbled Chinese characters when Python writes HTML to a file

Garbled Chinese when Python writes an HTML file

When you use open() to write HTML fetched by a crawler to a file, the Chinese text may display correctly when printed to the console and still be garbled in the file that was written.

Case analysis

Look at the following piece of code:

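    # Crawler that does not use cookies
    from urllib import request

    if __name__ == '__main__':
        url = "http://www.renren.com/967487029/profile"
        rsp = request.urlopen(url)
        # decode() with no argument decodes the response bytes as UTF-8
        html = rsp.read().decode()
        # No encoding is given here, so open() uses the platform default
        with open("rsp.html", "w") as f:
            # Save the crawled page
            print(html)
            f.write(html)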

The code looks fine, and the HTML printed to the console shows the Chinese text correctly, but the Chinese text in the created rsp.html file is garbled.

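The mismatch described above can be reproduced without a network request. The following is a minimal sketch, not the article's own code: the sample string and the temporary file are illustrative, and GBK is passed explicitly only to simulate a system whose default encoding is GBK (an assumption about the affected machine).

```python
import locale
import os
import tempfile

# Illustrative sample HTML containing Chinese text (not from the crawled page).
text = "<p>你好，世界</p>"

# The encoding open() falls back to in text mode when none is given.
# The value is platform dependent.
default_encoding = locale.getpreferredencoding(False)
print("default encoding:", default_encoding)

path = os.path.join(tempfile.mkdtemp(), "demo.html")

# Simulate a GBK default by passing it explicitly.
with open(path, "w", encoding="gbk") as f:
    f.write(text)

with open(path, "rb") as f:
    raw = f.read()

# A viewer that expects UTF-8 decodes the same bytes differently.
as_utf8 = raw.decode("utf-8", errors="replace")
as_gbk = raw.decode("gbk")

print("read back as GBK matches:", as_gbk == text)
print("read back as UTF-8 matches:", as_utf8 == text)
```

The bytes on disk are valid GBK, so nothing fails at write time; the garbling only appears when something later interprets the file as UTF-8.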

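Solution

Pass the encoding parameter to open() explicitly: encoding="utf-8". Without it, open() falls back to the platform's default encoding (for example GBK on Chinese-language Windows), so the bytes in the file may not match the UTF-8 that a browser or editor expects.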

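    # Crawler that does not use cookies
    from urllib import request

    if __name__ == '__main__':
        url = "http://www.renren.com/967487029/profile"
        rsp = request.urlopen(url)
        # decode() with no argument decodes the response bytes as UTF-8
        html = rsp.read().decode()
        # Write the file as UTF-8 instead of the platform default
        with open("rsp.html", "w", encoding="utf-8") as f:
            # Save the crawled page
            print(html)
            f.write(html)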
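The same fix can be wrapped in small helpers so that the encoding is never left to the platform. This is a sketch under assumptions: decode_body and save_html are hypothetical names introduced here, and the GBK-encoded sample bytes stand in for rsp.read(). With a real urllib response, the charset could be taken from rsp.headers.get_content_charset(), which returns None when the server does not declare one.

```python
import os
import tempfile
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Decode response bytes, falling back to UTF-8 when no charset is known."""
    return raw.decode(charset or "utf-8", errors="replace")


def save_html(html: str, path: str) -> None:
    """Write text as UTF-8 regardless of the platform default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


# Stand-in for rsp.read(): a page served as GBK (illustrative data).
raw = "<p>中文页面</p>".encode("gbk")
html = decode_body(raw, "gbk")

out_path = os.path.join(tempfile.mkdtemp(), "rsp.html")
save_html(html, out_path)

# Read the raw bytes back to check what was actually written.
with open(out_path, "rb") as f:
    saved = f.read()

print("file is valid UTF-8:", saved.decode("utf-8") == html)
```

Decoding with the charset the server declares and then always writing UTF-8 keeps the two steps independent, so a page served in another encoding does not change how the file is stored.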

Result

After this change, the Chinese text in rsp.html is displayed correctly.

Thanks for reading, and I hope you all benefit.

This article is reproduced from: https://blog.csdn.net/qq_40147863/article/details/81746445
