[Python crawler] The requests module

Overview

requests can simulate a browser and send network requests over HTTP or HTTPS, allowing you to obtain the source code of a web page.

The main methods for sending requests are get() and post(). get() sends a request to retrieve a web page, while post() transmits data to the server and is often used to simulate a user login.
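The difference can be sketched offline with requests' Request/PreparedRequest objects (the httpbin.org URLs below are placeholders, not part of the original example): get() encodes its parameters into the URL's query string, while post() sends its data in the request body.

```python
import requests

# get(): params are encoded into the query string of the URL
get_req = requests.Request('GET', 'https://httpbin.org/get',
                           params={'q': 'python'}).prepare()
print(get_req.url)   # https://httpbin.org/get?q=python

# post(): the data dict is form-encoded into the request body instead
post_req = requests.Request('POST', 'https://httpbin.org/post',
                            data={'user': 'alice', 'pwd': 'secret'}).prepare()
print(post_req.body)  # user=alice&pwd=secret
```

Nothing is sent over the network here; prepare() only builds the request, which makes the URL/body difference easy to inspect.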

1. Obtain the source code of the static web page

Open the Baidu webpage and print the source code of the webpage

import requests

rp = requests.get(url='https://www.baidu.com')
rp.encoding = rp.apparent_encoding  # guess the real encoding to avoid garbled Chinese text
print(rp.text)

Operation result: the HTML source of the Baidu homepage is printed.

2. Get dynamically loaded data

For a dynamic web page, the server returns a page template that is then filled with data via Ajax or similar requests; the data you need usually arrives as a JSON payload returned by the server.

Distinguishing dynamic from static: if the page loads more data as you scroll the browser down, it is dynamic.


import requests

# User-Agent header makes the request look like it comes from a real browser
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'}

url = 'https://movie.douban.com/j/chart/top_list'
params = {'type': '25', 'interval_id': '100:900', 'action': '', 'start': '0', 'limit': '1'}
rp = requests.get(url=url, headers=headers, params=params)

r = rp.json()  # parse the JSON payload into Python objects
print(r)
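Since rp.json() returns ordinary Python lists and dicts, the fields can be read with normal indexing. The sample below parses a hand-written JSON string shaped like such a response; the field names ('title', 'score') are illustrative assumptions, not the documented structure of the Douban endpoint.

```python
import json

# Illustrative stand-in for the JSON list an endpoint like this might return;
# rp.json() performs the same parsing step as json.loads() here.
sample = '[{"title": "Movie A", "score": "9.5"}]'

movies = json.loads(sample)
for m in movies:
    print(m['title'], m['score'])
```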


3. Get images

When obtaining source code, you first use get() to obtain the response object and then extract the page source from its text attribute. To get an image, you likewise use get() to obtain the response object, but you cannot use the text attribute, because an image is binary data, not text. Instead, use the content attribute, which returns the raw bytes of the response.

import requests

url = ''  # URL of the image to download

response = requests.get(url=url)
content = response.content  # raw bytes of the image

with open('image.jpg', 'wb') as fp:  # 'wb': open the file in binary write mode
    fp.write(content)
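A minimal offline sketch of the same pattern, using a few hand-written bytes as a stand-in for response.content: because content is a bytes object, the file must be opened in binary mode ('wb'); text mode would try to decode/encode the data and corrupt it.

```python
import os
import tempfile

# Stand-in for response.content: the PNG file signature bytes
fake_content = b'\x89PNG\r\n\x1a\n'

path = os.path.join(tempfile.gettempdir(), 'image.png')
with open(path, 'wb') as fp:   # 'wb' = write raw bytes, no text decoding
    fp.write(fake_content)

with open(path, 'rb') as fp:   # read back in binary mode to verify
    print(fp.read() == fake_content)  # True
```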


Origin blog.csdn.net/weixin_39407597/article/details/126572283