When developing web crawlers, we often run into all kinds of problems and bugs. Here is a summary of the errors I hit early on and how I solved them.
If I run into other problems in later study, I will update this post.
If you have anything to add, feel free to leave a note in the comments section ~
Problem:
My IP was blocked, or access was blocked because the request frequency was too high?
Solution one:
You can use a proxy IP.
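As a rough sketch of the proxy idea using only the standard library: the proxy address below is a made-up placeholder (an assumption, not from my own setup), and build_proxy_opener is just a helper name chosen for this example.

```python
import urllib.request

# Hypothetical proxy address; replace it with a proxy you are allowed to use.
PROXY = "http://127.0.0.1:8080"


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxy_opener(PROXY)
# opener.open("https://example.com/", timeout=10) would now go through the proxy.
```

Slowing down the requests (for example with time.sleep between pages) may also help when the block is caused by request frequency.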
Problem:
The XPath is written correctly, but there is no output?
Solution one:
XPath cannot extract code that has been commented out in the page source; you can use regular expressions instead.
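A small sketch of the regular-expression approach. The HTML snippet here is invented for illustration; the point is only that the links sit inside an HTML comment, so they have to be matched against the raw text.

```python
import re

# Hypothetical page source: the list is wrapped in an HTML comment,
# so an XPath query on the parsed tree does not see these <a> elements.
html = """
<div id="list">
<!--
<ul>
  <li><a href="/p/1">First thread</a></li>
  <li><a href="/p/2">Second thread</a></li>
</ul>
-->
</div>
"""

# Capture the href and the link text directly from the raw text.
pattern = re.compile(r'<a href="(.*?)">(.*?)</a>')
links = pattern.findall(html)
print(links)
```

Another option, which I have not shown here, is to strip the comment markers from the source first and then keep using XPath.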
Problem:
Easily caught by anti-scraping measures?
Solution one:
Put a User-Agent in the headers, and do not leave out the Cookie either.
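Roughly, attaching both headers to a request could look like the following. The header values and the example.com URL are placeholders (assumptions for illustration); copy the real values from your browser's developer tools.

```python
import urllib.request

# Placeholder values; take the real ones from the browser's network panel.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Cookie": "name=value",
}

request = urllib.request.Request("https://example.com/", headers=headers)
# urllib.request.urlopen(request, timeout=10) would send both headers.
```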
Error:
UTF-8 cannot decode a byte?
Solution one:
Add the Cookie to the headers; the HTML then comes back and prints normally.
Error:
'gbk' codec cannot encode '\xa0'?
Solution one:
with open('%s.html' % title, 'w', encoding='utf-8') as f:
    f.write(rep)
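To see why the explicit encoding matters, here is a tiny reproduction. The sample text is made up; the assumption is that the error appears because the file was being opened with GBK as the default encoding (as on a Chinese-locale Windows system).

```python
# \xa0 is a non-breaking space, which shows up often in scraped HTML.
text = "price:\xa0100"

try:
    text.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError:
    # GBK has no mapping for \xa0, so encoding fails here.
    gbk_ok = False

# UTF-8 can represent every Unicode character, so this always works.
utf8_bytes = text.encode("utf-8")
print(gbk_ok, utf8_bytes)
```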
Problem:
The output is of type bytes, and the JSON object does not display properly?
Solution one:
Use the json.loads method.
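A minimal sketch of this: the bytes below are a made-up response body standing in for what a request would return. In Python 3.6 and later, json.loads accepts bytes directly.

```python
import json

# Hypothetical response body as bytes, with the Chinese text \u-escaped.
raw = b'{"title": "\\u82f1\\u96c4\\u8054\\u76df", "page": 1}'

data = json.loads(raw)  # parse into a normal Python dict
print(data["title"])

# ensure_ascii=False keeps the Chinese readable when dumping back to text.
print(json.dumps(data, ensure_ascii=False))
```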
Problem:
url = 'https://tieba.baidu.com/f?kw=%E8%8B%B1%E9%9B%84%E8%81%94%E7%9B%9F&ie=utf-8&pn=0'
After copying the URL into the .py file, the Chinese keyword turns into percent-encoded "gibberish"?
Solution one:
Call urllib.parse.unquote to URL-decode it.
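Applied to the URL above, the decoding step is short:

```python
from urllib.parse import unquote

url = 'https://tieba.baidu.com/f?kw=%E8%8B%B1%E9%9B%84%E8%81%94%E7%9B%9F&ie=utf-8&pn=0'

# unquote turns each %XX sequence back into the original UTF-8 text.
decoded = unquote(url)
print(decoded)
```

The reverse direction is urllib.parse.quote, which is useful when building the kw value from a Chinese keyword.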
Problem:
The URL does not follow a regular pattern?
Solution one:
When analyzing URLs, start from the second page rather than the first, since the first page's URL often does not show the paging pattern.
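One way to spot the paging parameter is to compare the query strings of two later pages. The two URLs below are assumed examples for illustration (the pn values are guesses, not captured from a real session), and changed_params is a helper name made up for this sketch.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed URLs for page 2 and page 3; the pn values are illustrative only.
page2 = "https://tieba.baidu.com/f?kw=python&ie=utf-8&pn=50"
page3 = "https://tieba.baidu.com/f?kw=python&ie=utf-8&pn=100"


def changed_params(url_a, url_b):
    """Return the query parameters whose values differ between two URLs."""
    a = parse_qs(urlsplit(url_a).query)
    b = parse_qs(urlsplit(url_b).query)
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


diff = changed_params(page2, page3)
print(diff)
```

Whatever parameter shows up in the result is the one to vary when generating the page URLs.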
Problem:
Don't want the Cookie to carry your own account information?
Solution one:
Open the page in your browser's incognito window and copy the Cookie from there.
To be continued ~
For my beloved girl ~