An example of a dictionary tree implementation method for Python data structures and algorithms
Sometimes, our well-written crawler code runs OK before, and suddenly an error is reported.
The error message is as follows:
Http 800 Internal internet error
This is because your target website has an anti-crawler program. If you use the existing crawler code, it will be rejected.
The normal crawler code before is as follows:
from urllib.request import urlopen
...
html = urlopen(scrapeUrl)
bsObj = BeautifulSoup(html.read(), "html.parser")
At this time, we need to do the following for our crawler code Disguise,
add a header to it pretending to be a request from a browser The
modified code is as follows:
import urllib.parse
import urllib.request
from bs4 import BeautifulSoup
...
req = urllib.request.Request(scrapeUrl)
req.add_header( 'User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')
response = urllib.request.urlopen(req)
html = response.read()
bsObj = BeautifulSoup(html, "html.parser"
Ok, everything is done, you can continue to climb again.
The above is the whole content of this article, I hope it will be helpful for your study