Python crawler summary: common errors, problems and solutions


When developing crawlers, we often run into all kinds of bugs. Below is a summary of the errors I hit early on and how I solved them.
As I keep learning, I will update this post with any other problems I run into.
If you have anything to add, feel free to leave it in the comments ~~~



Problem:

The IP got banned, or access was blocked because requests were sent too frequently???

Solution one:

Send your requests through a proxy IP, so the target site does not see every request coming from your own address.
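A minimal sketch of routing requests through a proxy with the standard library's `urllib.request`. The proxy address `http://127.0.0.1:8888` is a hypothetical placeholder, and the helper names `build_proxy_opener` and `polite_get` are made up for illustration. The random delay is not part of the original tip; it is included as an assumption aimed at the "too frequent" half of the problem.

```python
import random
import time
import urllib.request

# Hypothetical proxy address: replace it with a proxy you actually have access to
PROXY = 'http://127.0.0.1:8888'


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends both HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({'http': proxy_url, 'https': proxy_url})
    return urllib.request.build_opener(handler)


def polite_get(opener: urllib.request.OpenerDirector, url: str,
               min_delay: float = 1.0, max_delay: float = 3.0) -> bytes:
    """Sleep a random interval before each request to lower the request rate."""
    time.sleep(random.uniform(min_delay, max_delay))
    with opener.open(url, timeout=10) as resp:
        return resp.read()


if __name__ == '__main__':
    opener = build_proxy_opener(PROXY)
    # html = polite_get(opener, 'https://example.com/')  # needs a working proxy
```

If you use the `requests` library instead, the same dictionary shape is passed as `requests.get(url, proxies={'http': ..., 'https': ...})`.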


Problem:

The XPath expression is correct, but there is no output???

Solution one:

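XPath can only extract content that is not commented out. If the data you want sits inside an HTML comment (<!-- ... -->), XPath will not find it; use regular expressions instead.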
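A minimal sketch of the regular-expression approach using Python's `re` module. The HTML fragment and the helper name `extract_commented_links` are hypothetical; the patterns assume simple `<a href="...">` markup and would need adjusting for a real page.

```python
import re
from typing import List, Tuple

# Hypothetical page fragment: the list is wrapped in an HTML comment,
# so an XPath query on the parsed page would return nothing for it
html = '''
<div id="content">
<!--
<ul>
  <li><a href="/p/101">First post</a></li>
  <li><a href="/p/102">Second post</a></li>
</ul>
-->
</div>
'''


def extract_commented_links(page: str) -> List[Tuple[str, str]]:
    """Return (href, text) pairs for <a> tags found inside HTML comments."""
    links = []
    # re.S lets '.' match newlines, so multi-line comments are captured whole
    for comment in re.findall(r'<!--(.*?)-->', page, re.S):
        links.extend(re.findall(r'<a href="(.*?)">(.*?)</a>', comment))
    return links


print(extract_commented_links(html))
# [('/p/101', 'First post'), ('/p/102', 'Second post')]
```

The non-greedy `.*?` keeps each match from running past the first closing `-->` or `</a>`.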


Problem:

Easily taken down by anti-crawling measures???

Solution one:

Put a User-Agent in the request headers, and include the Cookie as well.
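A minimal sketch of sending a browser User-Agent plus the Cookie with the standard library's `urllib.request`. The header values are placeholders (copy the real ones from your browser's developer tools), and `make_request` is a hypothetical helper name.

```python
import urllib.request

# Placeholder values: copy the real User-Agent and Cookie from your browser's
# developer tools (Network tab -> request headers)
HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/120.0 Safari/537.36'),
    'Cookie': 'sessionid=PLACEHOLDER; other=PLACEHOLDER',
}


def make_request(url: str) -> urllib.request.Request:
    """Build a request that carries a browser User-Agent and the Cookie."""
    return urllib.request.Request(url, headers=HEADERS)


if __name__ == '__main__':
    req = make_request('https://example.com/')
    # urllib stores header names via str.capitalize(), hence 'User-agent'
    print(req.get_header('User-agent'))
    # with urllib.request.urlopen(req, timeout=10) as resp:
    #     html = resp.read().decode('utf-8')
```

With the `requests` library the same dictionary is passed as `requests.get(url, headers=HEADERS)`.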


Error:

UTF-8 can't decode a byte (UnicodeDecodeError)???

Solution one:

After adding the Cookie to the request headers, the HTML was output normally.


Error:

'gbk' codec can't encode character '\xa0' when saving the page to a file (UnicodeEncodeError)???

Solution one:
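Open the output file with encoding='utf-8' explicitly. On Chinese-locale Windows the default file encoding is GBK, which cannot encode '\xa0' (the non-breaking space):

# title: page title, rep: page HTML text (both produced earlier in the crawler)
with open('%s.html' % title, 'w', encoding='utf-8') as f:
    f.write(rep)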
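A small self-contained sketch that reproduces the GBK limitation and shows that writing with `encoding='utf-8'` round-trips the character. The helper names `save_html` and `gbk_can_encode` are hypothetical, and the sample page string is made up.

```python
import os
import tempfile


def save_html(title: str, rep: str, directory: str) -> str:
    """Write page source to <directory>/<title>.html as UTF-8 and return the path."""
    path = os.path.join(directory, '%s.html' % title)
    # encoding='utf-8' overrides the platform default (GBK on Chinese Windows),
    # so characters such as the non-breaking space '\xa0' can be written
    with open(path, 'w', encoding='utf-8') as f:
        f.write(rep)
    return path


def gbk_can_encode(text: str) -> bool:
    """Return True if every character of text has a GBK encoding."""
    try:
        text.encode('gbk')
        return True
    except UnicodeEncodeError:
        return False


if __name__ == '__main__':
    page = '<p>price:\xa0100</p>'
    print(gbk_can_encode(page))  # False: GBK has no mapping for '\xa0'
    with tempfile.TemporaryDirectory() as d:
        p = save_html('demo', page, d)
        with open(p, encoding='utf-8') as f:
            print(f.read() == page)  # True
```

When reading the file back, pass the same `encoding='utf-8'`, otherwise the platform default is used again.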

Problem:

The response is of type bytes, so the JSON data is not displayed properly???

Solution one:

Parse it with the json.loads method.
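A minimal sketch with a made-up byte string standing in for a response body. `json.loads` accepts `bytes` directly (Python 3.6+) and turns `\uXXXX` escapes back into readable characters; `ensure_ascii=False` keeps them readable when dumping again.

```python
import json

# Stand-in for a response body: JSON arrives as bytes, with Chinese text
# escaped as \uXXXX sequences, so printing it raw is hard to read
raw = b'{"name": "\\u82f1\\u96c4\\u8054\\u76df", "page": 1}'

data = json.loads(raw)  # bytes in, Python dict out
print(type(data), data['name'])  # <class 'dict'> 英雄联盟

# To pretty-print while keeping Chinese characters readable:
print(json.dumps(data, ensure_ascii=False, indent=2))
```

If the response comes from the `requests` library, `response.json()` does this parsing step for you.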


Problem:

url = 'https://tieba.baidu.com/f?kw=%E8%8B%B1%E9%9B%84%E8%81%94%E7%9B%9F&ie=utf-8&pn=0'

After copying the URL into a .py file, it turns into "gibberish" (percent-encoded characters)???

Solution one:

Call urllib.parse.unquote to URL-decode it.
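A short sketch on the URL from the question, using only `urllib.parse` functions from the standard library. It also shows `parse_qs` for reading one decoded parameter and `quote` for the reverse direction.

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit

url = ('https://tieba.baidu.com/f?kw=%E8%8B%B1%E9%9B%84%E8%81%94%E7%9B%9F'
       '&ie=utf-8&pn=0')

print(unquote(url))
# https://tieba.baidu.com/f?kw=英雄联盟&ie=utf-8&pn=0

# Reading a single parameter: parse_qs also decodes percent-escapes
params = parse_qs(urlsplit(url).query)
print(params['kw'][0])  # 英雄联盟

# The reverse direction, when building a URL from a keyword
print(quote('英雄联盟'))  # %E8%8B%B1%E9%9B%84%E8%81%94%E7%9B%9F
```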


Problem:

The URL pattern looks irregular???

Solution one:

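When analyzing paginated URLs, start from the second page rather than the first. The first page's URL is often a special case (for example, it may leave out the paging parameter), so the pattern only becomes clear from page 2 onward.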
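Once pages 2 and 3 reveal how the paging parameter changes, every page URL can be generated from that rule. A minimal sketch using the Tieba URL from the previous problem; the step of 50 per page is an assumption for illustration (confirm it by comparing the real page 2 and page 3 URLs), and `page_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = 'https://tieba.baidu.com/f'


def page_url(keyword: str, page: int, step: int = 50) -> str:
    """Build the URL for 1-based page number `page`.

    Assumption: pn grows by `step` per page; 50 is an assumed value that
    should be checked against the site's own page 2 and page 3 URLs.
    """
    pn = (page - 1) * step
    # urlencode percent-encodes the keyword, matching the URL in the browser
    return BASE + '?' + urlencode({'kw': keyword, 'ie': 'utf-8', 'pn': pn})


for p in range(1, 4):
    print(page_url('英雄联盟', p))
```

Generating URLs from the rule this way is less error-prone than copying each page's address by hand.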


Problem:

Don't want the Cookie to carry your own account information???

Solution one:

Open the page in your browser's incognito (private) window and copy the Cookie from there, so it is not tied to your logged-in account.






To be continued ~~~~



For my beloved girl ~



Source: www.cnblogs.com/WoLykos/p/12095277.html