Python crawler 2: disguising the crawler as a browser - Code World

Others 2022-04-21 22:05:33

Some web pages return an error when you crawl them with urllib:

urllib.error.HTTPError: HTTP Error 403: Forbidden

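This happens because the server inspects the client making the request and rejects it when it does not look like a browser. By default urllib identifies itself with a User-Agent of the form Python-urllib/3.x, so you need to disguise the request as a browser by setting the User-Agent header yourself.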
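
To see what the server sees before any disguise, the default header can be inspected without sending a request. This is a minimal sketch; the helper name default_user_agent is an assumption made for illustration and is not part of the original post.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to every request.
    for name, value in opener.addheaders:
        if name.lower() == "user-agent":
            return value
    return None


if __name__ == "__main__":
    # Prints something like Python-urllib/3.x, depending on the interpreter.
    print(default_user_agent())
```

A value like this is easy for a site to recognise as a script rather than a browser, which is one plausible reason for the 403 response.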

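To find a real User-Agent string, open the page in a browser, press F12 to open the developer tools, switch to the Network tab, and refresh the page.

Then select any request in the list and look for User-Agent under the request headers: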

User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
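
The copied string can be attached to a request object and checked before anything is sent over the network. The sketch below assumes a helper named build_request and a constant BROWSER_UA, both hypothetical names introduced here for illustration.

```python
import urllib.request

# User-Agent string copied from the browser's developer tools (see above).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Create a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request("http://jandan.net/ooxx/")
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

Building the request is separate from sending it, so this runs offline; passing the object to urllib.request.urlopen is what actually performs the fetch.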

import urllib.request  # URL handling library


def openUrl(url):
    # Pretend to be desktop Chrome. The Host header is not set by hand:
    # urllib derives it from the URL, so other sites work as well.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
    }
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as response:  # send the request
        html = response.read()  # raw bytes of the page
    html = html.decode("utf-8")  # decode, assuming the page is UTF-8
    print(html)


if __name__ == "__main__":
    url = "http://jandan.net/ooxx/"  # or 'http://www.douban.com/'
    openUrl(url)
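
The effect of the header can be reproduced without touching a real site. The sketch below starts a throwaway local server that answers 403 to anything whose User-Agent does not start with "Mozilla/", then requests it twice. The server rule and the names PickyHandler, fetch_status and run_demo are assumptions made for this illustration; real sites may use entirely different checks.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical server: rejects clients that do not look like a browser."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("Mozilla/"):
            status, body = 200, b"hello browser"
        else:
            status, body = 403, b"forbidden"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_status(url, headers=None):
    """Return the HTTP status code, whether or not the request raises."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with opener.open(req, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 for "Forbidden"


def run_demo():
    """Request the local server without and with a browser User-Agent."""
    server = HTTPServer(("127.0.0.1", 0), PickyHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        plain = fetch_status(url)
        disguised = fetch_status(url, {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64)"})
    finally:
        server.shutdown()
        server.server_close()
    return plain, disguised


if __name__ == "__main__":
    print(run_demo())
```

Catching urllib.error.HTTPError, as fetch_status does, is also a reasonable way to report a 403 from a real site instead of letting the crawler stop with a traceback.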

Origin http://43.154.161.224:23101/article/api/json?id=324620412&siteId=291194637
