[Source Code] 10 Getting-Started Examples for Python Crawlers

Today I'll walk you through the basics of writing a Python crawler, using a few simple introductory examples.

Main topics covered:

  • How a client and a web server interact
  • Using the get and post functions of the requests library
  • Useful functions and properties of the response object
  • Opening and saving files in Python
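
The first topic, how a client and a web server interact, can be tried out without touching the internet. The sketch below is only an illustration: it starts a throwaway server on the local machine with Python's standard library, sends it one GET request, and prints the three parts of the reply that the examples in this post keep coming back to (status code, headers, body). The names `HelloHandler` and `fetch_once` are made up for this sketch.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = "<h1>hello crawler</h1>".encode("utf-8")
        self.send_response(200)  # status line
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()       # a blank line ends the headers
        self.wfile.write(body)   # response body

    def log_message(self, format, *args):
        pass  # keep the console quiet


def fetch_once():
    """Start a local server, send one GET request, return (status, content_type, text)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = pick any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        # Ignore any system proxy settings for this local request.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            status = response.status
            content_type = response.headers["Content-Type"]
            text = response.read().decode("utf-8")
        return status, content_type, text
    finally:
        server.shutdown()
        server.server_close()


status, content_type, text = fetch_once()
print(status)
print(content_type)
print(text)
```

The requests library wraps the client half of this exchange, which is why every example that follows reads a status code and a body from a response object.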

Every example is commented and can be run as is.

How to install the requests library (if Python is already installed you can follow these steps directly; if not, install a Python environment first)

The steps for Windows and Linux are almost the same.

On Windows, open cmd and enter the following command. If Python is installed on the C: drive, you may get an insufficient-permissions error; in that case, run the cmd window as administrator.

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests

Linux is similar (using Ubuntu as an example):

If you do not have sufficient permissions, add sudo before the command.

sudo pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests
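
To confirm that the installation worked, one option is to ask Python whether it can locate the package. This is just a small sketch: the helper name `is_installed` is made up here, while `importlib.util.find_spec` comes from the standard library.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("requests"):
    print("requests is installed")
else:
    print("requests was not found; run the pip command again")
```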

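1. Crawl the Baidu home page and print the page content

# Example 1: crawl the Baidu home page
import requests  # import the crawling library; without it none of its functions can be called
response = requests.get("http://www.baidu.com")  # send a GET request and get back a response object
response.encoding = response.apparent_encoding  # use the encoding detected from the page content
print("Status code: " + str(response.status_code))  # print the status code
print(response.text)  # print the crawled page content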
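
Why set response.encoding at all? As far as I know, when a server returns text without naming a charset, requests falls back to ISO-8859-1 when building response.text, and a Chinese page then comes out garbled. The sketch below reproduces that effect with plain bytes so it runs offline; the sample string is only a stand-in for real page content.

```python
# Bytes a server might send for a UTF-8 encoded page (sample text for illustration).
raw_bytes = "百度一下".encode("utf-8")

# Decoding with the wrong codec does not fail; it just produces unreadable characters.
garbled = raw_bytes.decode("iso-8859-1")

# Decoding with the codec the page was really written in restores the text.
readable = raw_bytes.decode("utf-8")

print(repr(garbled))
print(readable)
```

Assigning response.apparent_encoding, as Example 1 does, asks requests to guess the codec from the bytes themselves instead of trusting the fallback.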

2. A basic GET request (parameter-passing examples follow further down)

# Example 2: the get method
import requests  # import the crawling library first; without it none of its functions can be called
response = requests.get("http://httpbin.org/get")  # send a GET request
print(response.status_code)  # status code
print(response.text)
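
httpbin.org replies with JSON, so the printed text can be turned into a Python dictionary. With requests you would normally call response.json(); the sketch below uses the standard json module on a shortened, hand-written sample of the kind of body /get returns, so that it runs offline. The field values are placeholders, not captured output.

```python
import json

# Hand-written, shortened stand-in for the text returned by http://httpbin.org/get.
sample_text = """
{
  "args": {},
  "headers": {"Host": "httpbin.org", "User-Agent": "python-requests/x.y.z"},
  "url": "http://httpbin.org/get"
}
"""

data = json.loads(sample_text)         # a dict, the same kind of object response.json() returns
print(data["url"])                     # the URL the server saw
print(data["headers"]["User-Agent"])   # how the client identified itself
print(data["args"])                    # query parameters; empty because none were sent
```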

3. A basic POST request (parameter-passing examples follow further down)

# Example 3: the post method
import requests  # import the crawling library first; without it none of its functions can be called
response = requests.post("http://httpbin.org/post")  # send a POST request
print(response.status_code)  # status code
print(response.text)

4. A basic PUT request

# Example 4: the put method
import requests  # import the crawling library first; without it none of its functions can be called
response = requests.put("http://httpbin.org/put")  # send a PUT request
print(response.status_code)  # status code
print(response.text)

5. Passing parameters with GET (1)

To pass several parameters, join them with the & symbol, as follows:

# Example 5: GET with parameters written into the URL
import requests  # import the crawling library first; without it none of its functions can be called
response = requests.get("http://httpbin.org/get?name=hezhi&age=20")  # GET with query parameters
print(response.status_code)  # status code
print(response.text)

6. Passing parameters with GET (2)

You can also pass several parameters as a dictionary through the params argument:

# Example 6: GET with parameters passed as a dictionary
import requests  # import the crawling library first; without it none of its functions can be called
data = {
    "name": "hezhi",
    "age": 20
}
response = requests.get("http://httpbin.org/get", params=data)  # GET with query parameters
print(response.status_code)  # status code
print(response.text)
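
Examples 5 and 6 end up requesting the same URL. The sketch below shows the idea using the standard library's `urlencode`; I am assuming requests performs a comparable encoding step for the params argument rather than showing its actual internals.

```python
from urllib.parse import urlencode

base_url = "http://httpbin.org/get"
data = {"name": "hezhi", "age": 20}

query_string = urlencode(data)             # keys and values joined with = and &
full_url = base_url + "?" + query_string
print(full_url)

# Characters that are not URL-safe are percent-encoded automatically.
encoded = urlencode({"name": "he zhi", "city": "北京"})
print(encoded)
```

This is the main reason to prefer the dictionary form: spaces and non-ASCII values are escaped for you instead of being pasted into the URL by hand.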

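7. Passing parameters with POST. Looks a lot like the previous example, doesn't it?

# Example 7: POST with parameters
import requests  # import the crawling library first; without it none of its functions can be called
data = {
    "name": "hezhi",
    "age": 20
}
# params puts the values in the URL query string; pass data=data instead to send them as a form body
response = requests.post("http://httpbin.org/post", params=data)  # POST with parameters
print(response.status_code)  # status code
print(response.text)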
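
With POST it matters whether the dictionary goes in through params or data. Here is a small sketch that builds both requests without sending them, so nothing touches the network; `with_params` and `with_data` are just names chosen for this illustration.

```python
import requests

url = "http://httpbin.org/post"
data = {"name": "hezhi", "age": 20}

# Build the requests without sending them.
with_params = requests.Request("POST", url, params=data).prepare()
with_data = requests.Request("POST", url, data=data).prepare()

print(with_params.url)    # parameters are appended to the URL
print(with_params.body)   # no body is sent
print(with_data.url)      # the URL is unchanged
print(with_data.body)     # parameters travel in the request body as form data
print(with_data.headers["Content-Type"])
```

On httpbin.org the first form should show up under "args" in the reply and the second under "form".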

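8. Getting past a basic anti-crawling check, using Zhihu as an example

# Example 8: setting request headers
import requests  # import the crawling library first; without it none of its functions can be called
response = requests.get("http://www.zhihu.com")  # first visit to Zhihu, with no headers set
# status_code is an int, so convert it with str() before joining it to a string
print("First try, no headers set, status code: " + str(response.status_code))  # without headers the page cannot be crawled normally; the status code is not 200
# The difference that makes crawling work: the User-Agent field is changed
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
}  # set the header information to pose as a browser
response = requests.get("http://www.zhihu.com", headers=headers)  # GET request with the headers argument
print(response.status_code)  # 200 is the status code for a successful request
print(response.text)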
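
The first request is turned away because of how requests introduces itself by default. A quick offline sketch to see both values side by side; the shortened User-Agent string is only a sample, and the request is prepared but never sent.

```python
import requests

# What requests announces itself as when no headers are given.
default_agent = requests.utils.default_user_agent()
print(default_agent)

# Prepare (but do not send) a request with a browser-style User-Agent.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
prepared = requests.Request("GET", "http://www.zhihu.com", headers=headers).prepare()
print(prepared.headers["User-Agent"])
```

A site that filters on this header can spot the default value immediately, which is why swapping it for a browser string is often enough for a simple check. Sites with stricter protection may still refuse the request.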

9. Crawl a page and save it locally

The code needs an existing target directory, so I created a folder named 爬虫 ("crawler") on the D: drive and saved the output there.

Pay attention to the encoding setting when saving the file.

# Crawl an HTML page and save it
import requests
url = "http://www.baidu.com"
response = requests.get(url)
response.encoding = "utf-8"  # set the encoding used to decode the response
print("\nType of response: " + str(type(response)))
print("\nStatus code: " + str(response.status_code))
print("\nHeaders: " + str(response.headers))
print("\nResponse content:")
print(response.text)
# Save the file
file = open("D:\\爬虫\\baidu.html", "w", encoding="utf-8")  # open a file; "w" creates it if it does not exist, and "wb" is not needed because we are writing text, not binary data
file.write(response.text)
file.close()
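
If the target folder does not exist, open() raises an error. One way around that, sketched below under my own assumptions, is a small helper that creates the folder first and lets a with block close the file. The name `save_text` is made up, and the demo writes to a temporary folder so it runs on any machine.

```python
import tempfile
from pathlib import Path


def save_text(path, text):
    """Write text to path as UTF-8, creating missing parent folders first."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as file:  # closed automatically on exit
        file.write(text)
    return path


# Demonstration in a temporary folder; in Example 9 the text would be response.text.
demo_dir = tempfile.mkdtemp()
saved = save_text(Path(demo_dir) / "crawler" / "page.html", "<h1>百度一下</h1>")
print(saved.read_text(encoding="utf-8"))
```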

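10. Crawl an image and save it locally

# Save the Baidu logo image locally
import requests  # import the crawling library first; without it none of its functions can be called
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif")  # GET request that returns the image
file = open("D:\\爬虫\\baidu_logo.gif", "wb")  # open a file; "wb" opens it in binary mode for writing only
file.write(response.content)  # write the raw bytes to the file
file.close()  # close the file; afterwards, check the folder to see whether the image was saved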
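
A server can answer an image URL with an HTML error page, and the code above would happily save that under a .gif name. A possible safeguard, sketched here with made-up sample bytes instead of a real download, is to check the first few bytes before writing. The helper name `looks_like_gif` is hypothetical.

```python
def looks_like_gif(content):
    """Return True if the bytes start with one of the two GIF signatures."""
    return content[:6] in (b"GIF87a", b"GIF89a")


# Stand-ins for response.content: a fake GIF header and an HTML error page.
gif_bytes = b"GIF89a" + b"\x01\x00\x01\x00"
html_bytes = b"<html><body>Not Found</body></html>"

print(looks_like_gif(gif_bytes))
print(looks_like_gif(html_bytes))
```

In Example 10 the check would go just before file.write(response.content).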

Okay, that’s it for today’s sharing.


Origin blog.csdn.net/maiya_yaya/article/details/131780144