Reptile Traditional Chinese garbled solve the garbage problem

Work needs to crawl corresponding Taobao seller account, for simplicity directly with regular match the desired treasurer name. ps: older project, with python2.7

Encountered three problems:

1. Chinese distortion, starts directly Response.encoding = 'utf-8', Chinese results are garbled. Check the information can be seen by Response.apparent_encoding return to the page encoding formats GB2312

2. Traditional Chinese garbled simply no problem, but the traditional characters is garbled, and Response.apparent_encoding = 'GB2312', then check the information, see the Web page source code directly in the browser, find the <meta charset = "gbk">, change gbk problem solving

3. The regular characters not match, because Response.text is unicode format, the need to transfer to a utf-8 supported python

 

 

Guess you like

Origin www.cnblogs.com/yeteng/p/10954100.html