Analyzing a website with the requests library, step by step (a hands-on, reproducible beginner walkthrough)

The target website for this analysis:

Shaanxi Provincial Government Procurement Network

The home page looks like this:

What we want is the detail announcements in the bottom-right section.

After opening this page, right-click anywhere and select "Inspect".

Whichever browser you use, choose "Inspect" at this step.

Google Chrome is recommended here.

(a link to a Chrome installation guide will be inserted here)

You then get the following interface.

Select the Network tab, circled in red.

We can see that the panel circled in yellow is empty, so click the part circled in red to refresh the page.

Entries now appear in the yellow-circled panel.

For this website only one entry appears in that panel after refreshing; click it.

If many entries appear instead, see the link below:

(a link will be inserted here)

Click the new entry that appeared, and you will find this.

Check the URL of this entry:

The original page URL:
http://113.200.80.230/notice/list.do?noticetype=3&index=3&province=province
The URL of this entry:
http://113.200.80.230/notice/noticeaframe.do?noticetype=3&isgovertment=

Surprisingly, not only has the "shape" of the page changed, the URL has changed as well.

This is called asynchronous loading.

If, after the steps above, you find that the URL has not changed, that is called synchronous loading.

(Insert an explanation link for asynchronous loading and synchronous loading here)
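The two URLs above can also be compared programmatically; a minimal standard-library sketch shows that the asynchronously loaded content comes from a different path (and different query parameters) on the same host:

```python
from urllib.parse import urlparse, parse_qs

# The page URL we started from, and the request URL found in the Network tab
page_url = "http://113.200.80.230/notice/list.do?noticetype=3&index=3&province=province"
real_url = "http://113.200.80.230/notice/noticeaframe.do?noticetype=3&isgovertment="

for name, url in [("page", page_url), ("real", real_url)]:
    parts = urlparse(url)
    # Same host, but a different path and a different query string
    print(name, parts.netloc, parts.path,
          parse_qs(parts.query, keep_blank_values=True))
```

The host (`113.200.80.230`) is identical in both, which is typical: the page fires a second request to another endpoint on the same server to fill in the announcement list.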

Then, perhaps to our further surprise, even though the URL and the "shape" of the page have changed, the information we want to crawl is still on this page.

So once the real URL is found, we can request it with requests.

The real URL is the one we just dug out:

http://113.200.80.230/notice/noticeaframe.do?noticetype=3&isgovertment=

The request needs the following two libraries:

import requests          # basic Python crawling library
from lxml import etree   # parses the downloaded page into an element tree that can be queried

Copy these two lines of code and run them.

If they error, most likely the libraries are not installed, and you need to run these two commands:

pip install requests 
pip install lxml  

Then wait patiently, about five minutes (or less, depending on network speed, luck, and the computer's mood).

Then run these two lines of code

import requests # python基础爬虫库
from lxml import etree # 可以将网页转换为

basically succeeded
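A quick sanity check that both installs worked is to import each library and print its version (any output at all means the imports succeeded):

```python
# If either import fails here, the corresponding pip install above
# did not complete successfully.
import requests
from lxml import etree

print("requests", requests.__version__)
print("lxml", etree.__version__)
```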

If there is still an error, it is best to type the commands out by hand on your own machine, or click the two links below for details.

For requests installation errors, click this:

Installing the requests library correctly so that no error is reported [Windows 10/11 environment]

For lxml installation errors, click here:

Fixing errors when installing the lxml library in Python - Xu Xiaoqing's blog - CSDN blog

After installation, and before making the request, we need to camouflage our request behavior a little to fool the site.

The usual angles for fooling a website are three: the headers, the cookies, and the Referer.

The most basic of these is the headers.

(the role and meaning of headers)

For ordinary websites, the headers alone are usually enough to solve the problem, and this site is one of them:

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36'}

This single line of code solves it.

To find your own local User-Agent, see:

Crawler topics: the role and meaning of headers and how to find them http://t.csdn.cn/8DRLp

If you need a large number of different headers, you can solve this by building fake headers.

(insert a link)
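One simple do-it-yourself version of "fake headers" is to keep a small pool of User-Agent strings and pick one at random for each request. This is only a sketch: the strings below are ordinary example User-Agents, and a third-party library such as fake-useragent is another option.

```python
import random

# A small, hand-maintained pool of real browser User-Agent strings
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0',
]

def random_headers():
    # Each call returns a headers dict with a randomly chosen User-Agent
    return {'User-Agent': random.choice(USER_AGENTS)}

print(random_headers())
```

The returned dict can be passed straight to `requests.get(url, headers=random_headers())`, so consecutive requests no longer all carry the same User-Agent.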

Of course, it is also fine to directly use the header given in the code above.

url2 = ""  # paste the real URL here
response2 = requests.get(url=url2, headers=headers)
response2.encoding = 'utf-8'
wb_data_2 = response2.text
html = etree.HTML(wb_data_2)

Then enter

html

If it prints something (an object like <Element html at 0x...>), the request succeeded.
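Once `html` is an element tree, the actual data is normally pulled out with XPath. The real page's structure has to be read off from DevTools first, so the sketch below runs on a made-up snippet; the tag names and classes here are assumptions for illustration, not the real site's markup:

```python
from lxml import etree

# A made-up snippet standing in for the downloaded page; the real
# site's tags and classes must be inspected in DevTools instead.
sample = '''
<html><body>
  <ul class="list">
    <li><a href="/notice/1.do">Announcement one</a></li>
    <li><a href="/notice/2.do">Announcement two</a></li>
  </ul>
</body></html>
'''

html = etree.HTML(sample)

# Text of every link inside the list, and the matching href attributes
titles = html.xpath('//ul[@class="list"]/li/a/text()')
links = html.xpath('//ul[@class="list"]/li/a/@href')
print(titles)  # ['Announcement one', 'Announcement two']
print(links)   # ['/notice/1.do', '/notice/2.do']
```

Swapping `sample` for `wb_data_2` and adjusting the XPath expressions to the real page's structure gives the announcement titles and links from the procurement site.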


Origin blog.csdn.net/weixin_48572116/article/details/126370685