[Python tricks] Use pandas' read_html function to build a web crawler with only one line of code

Contents

1. Introduce the read_html() function

2. Analyze and crawl the target page

3. Code explanation

4. Synchronized video explanation


1. Introduce the read_html() function

Python fans, did you know that the pandas library is good for more than data analysis? It can also serve as a simple crawler. With only one line of core code, you can write a crawler and easily pull data from a web page!

The tool is the read_html() function of the pandas library, which makes writing a Python crawler very convenient.

Note that it can only crawl tabular data inside <table></table> tags on the web page.
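To make that restriction concrete, here is a minimal offline sketch. The HTML string is a made-up sample (not the real target page), and it assumes an HTML parser such as lxml is installed alongside pandas. Only the content of the <table> ends up in the result; the paragraph is ignored.

```python
import io

import pandas as pd

# Hypothetical sample page: one <table> plus a paragraph outside any table.
html = """
<html><body>
  <p>This paragraph is not inside a table, so read_html skips it.</p>
  <table>
    <tr><th>City</th><th>High</th></tr>
    <tr><td>Shanghai</td><td>25</td></tr>
    <tr><td>Beijing</td><td>21</td></tr>
  </table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> it finds.
# Wrapping the string in StringIO avoids passing literal HTML directly.
tables = pd.read_html(io.StringIO(html))
df = tables[0]

print(len(tables))
print(df)
```

Because the return value is always a list, even a page with a single table needs an index such as [0] to get the DataFrame.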

2. Analyze and crawl the target page

Here, the target page I crawled is the Shanghai weather forecast page at http://weather.sina.com.cn/china/shanghaishi/

As you can see, the page contains a data table. Press F12 to open the developer tools and view the source code of the page:

It is indeed tabular data inside <table> tags. That makes it easy, so let's start coding!
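If you would rather not hunt through the F12 panel by eye, the same check can be sketched in code. This is only an illustration using Python's standard html.parser module; the TableCounter class and count_tables helper are names I made up, and the sample strings stand in for a real page's source.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts opening <table> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TABLE> is counted as well.
        if tag == "table":
            self.count += 1


def count_tables(html_text):
    """Return how many <table> tags appear in html_text."""
    parser = TableCounter()
    parser.feed(html_text)
    return parser.count


# Hypothetical page sources used only for demonstration.
sample = "<html><body><table><tr><td>1</td></tr></table><div>no table here</div></body></html>"
print(count_tables(sample))
print(count_tables("<p>plain text</p>"))
```

A count of zero means read_html() has nothing to work with on that page.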

3. Code explanation

There are 3 lines of code in total, and the core code is only 1 line:

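import pandas as pd   # import the library
url = 'http://weather.sina.com.cn/china/shanghaishi/'  # target URL (the page contains a <table>)
df = pd.read_html(url)[1]  # crawl the page; read_html returns a list of tables, so take the one at index 1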

These 3 short lines are enough to crawl the data. Take a look at the result:

No problem, it matches the data on the original page exactly! Afterwards, you can save the data with df.to_excel().

Super simple and powerful!
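For the saving step, here is a small sketch. The DataFrame below is a hand-made stand-in for the crawled table (the values are invented), and the file name weather.csv is just an example. I write CSV here because it needs no extra package, while to_excel() assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the DataFrame returned by pd.read_html(url)[1].
df = pd.DataFrame({"Date": ["Mon", "Tue"], "High": [25, 27]})

out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "weather.csv")

# index=False leaves out the row numbers; utf-8-sig adds a BOM, which
# often helps Excel show Chinese text correctly when opening the CSV.
df.to_csv(csv_path, index=False, encoding="utf-8-sig")

# to_excel is also a DataFrame method, but it needs an engine such as openpyxl:
# df.to_excel(os.path.join(out_dir, "weather.xlsx"), index=False)

# Read the file back to confirm what was written.
restored = pd.read_csv(csv_path, encoding="utf-8-sig")
print(restored)
```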

Below is the official documentation's description of the read_html() parameters for your reference (I translated it into Chinese ^_^):
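As a rough illustration of two of those parameters, the sketch below picks the same table three ways: by list index (as in my 3-line code), by the match parameter, and by the attrs parameter. The HTML is an invented two-table sample, not the real weather page, and it assumes lxml is installed.

```python
import io

import pandas as pd

# Hypothetical page with two tables: a navigation table and a data table.
html = """
<html><body>
  <table id="nav"><tr><td>Home</td><td>About</td></tr></table>
  <table id="forecast">
    <tr><th>Date</th><th>High</th></tr>
    <tr><td>Mon</td><td>25</td></tr>
    <tr><td>Tue</td><td>27</td></tr>
  </table>
</body></html>
"""

# Without filters, every <table> comes back, in page order.
all_tables = pd.read_html(io.StringIO(html))
by_index = all_tables[1]

# match keeps only tables containing text that matches the string or regex.
by_text = pd.read_html(io.StringIO(html), match="High")[0]

# attrs keeps only tables whose HTML attributes match the given dict.
by_attr = pd.read_html(io.StringIO(html), attrs={"id": "forecast"})[0]

print(len(all_tables))
print(by_attr)
```

Selecting by match or attrs may hold up better than a fixed index if the page later adds or removes a table, although that depends on the page.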

Once again, it can only crawl table data with <table></table> tags on the web page.

If the page has no <table> tag, crawling it with this method raises an error saying "No tables found":

This screenshot was taken in IPython; other IDEs report the same error!
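If you want the script to keep running instead of stopping on that error, one option is to catch it. In pandas the error is a ValueError. The helper name read_tables_or_empty is my own, the HTML is a made-up page without tables, and I pin flavor="lxml" on the assumption that lxml is installed, so the behavior does not depend on which other parsers are present.

```python
import io

import pandas as pd


def read_tables_or_empty(html_text):
    """Return the list of DataFrames, or [] when read_html finds no <table>."""
    try:
        return pd.read_html(io.StringIO(html_text), flavor="lxml")
    except ValueError as err:
        # pandas raises ValueError when the page contains no tables.
        print(f"read_html failed: {err}")
        return []


# Hypothetical page with no <table> tag at all.
no_table = "<html><body><p>Just a paragraph.</p></body></html>"
result = read_tables_or_empty(no_table)
print(result)
```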

4. Synchronized video explanation

The video explains the code line by line:

[Crawler Artifact] 2-minute explanation to easily crawl web data with one line of python code

I usually share the Python source files, but this time there is no need: it is only 3 lines of code, so type it out yourself, my friend!

The same article on my public account:

[Python crawler tricks] Use the pandas library read_html function to get the crawler in one line of code!


I am Ma Ge, with tens of thousands of followers across all platforms. You are welcome to discuss Python with me.

Search for "Ma Ge python said" on any of these platforms: Zhihu, Bilibili, Xiaohongshu, Sina Weibo.

Origin blog.csdn.net/solo_msk/article/details/124225502