Contents
1. Introduce the read_html() function
2. Analyze and crawl the target page
3. Code explanation
4. Synchronized video explanation
1. Introduce the read_html() function
If you enjoy Python programming, you may not know that the pandas library, best known for data analysis, can also serve as a simple crawler. With only one line of core code, you can write a crawler and easily pull table data from a web page!
The tool for this is the read_html() function of the pandas library, which makes a Python crawler very convenient to write.
Note that it can only crawl table data wrapped in <table></table> tags on the web page.
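To make the <table> restriction concrete, here is a minimal sketch that runs read_html() on an inline HTML string instead of a live page. The HTML snippet and its values are made up for illustration, and it assumes an HTML parser that pandas can use (such as lxml) is installed.

```python
from io import StringIO

import pandas as pd

# Made-up HTML for illustration: one paragraph, one list, and one <table>.
html = """
<html><body>
  <p>Weather summary</p>
  <ul><li>not a table</li></ul>
  <table>
    <tr><th>city</th><th>high_c</th></tr>
    <tr><td>Shanghai</td><td>21</td></tr>
    <tr><td>Beijing</td><td>18</td></tr>
  </table>
</body></html>
"""

# read_html() returns a list with one DataFrame per <table> it finds;
# the paragraph and the list are ignored.
tables = pd.read_html(StringIO(html))
df = tables[0]
print(len(tables))
print(df)
```

Because the result is always a list, you pick the table you want by index, even when the page has only one.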
2. Analyze and crawl the target page
Here, the target page I crawl is the Shanghai weather forecast page.
As you can see, there is a data table on the page. Press F12 to open the developer tools and view the source code of the page:
It is indeed table data with <table> tags. That's easy, let's start coding!
3. Code explanation
There are 3 lines of code in total, and the core code is only 1 line:
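import pandas as pd  # import the library
url = 'http://weather.sina.com.cn/china/shanghaishi/'  # target URL (a page containing a <table>)
df = pd.read_html(url)[1]  # crawl the page; read_html() returns a list of tables, [1] takes the second one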
With just these 3 lines of code, the data is crawled. Take a look at the result:
No problem, it is exactly the same as the data on the original page! Afterwards, you can save the data with df.to_excel().
Super simple and powerful!
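For the saving step, here is a small sketch. The helper name save_table and the file names are hypothetical, and writing .xlsx files assumes the openpyxl package is installed; the CSV branch needs nothing beyond pandas.

```python
from pathlib import Path

import pandas as pd


def save_table(df: pd.DataFrame, path: str) -> Path:
    """Save a crawled table to disk (hypothetical helper, not part of pandas)."""
    out = Path(path)
    if out.suffix.lower() == ".xlsx":
        # DataFrame.to_excel() needs an Excel writer such as openpyxl.
        df.to_excel(out, index=False)
    else:
        # utf-8-sig keeps Chinese text readable when the CSV is opened in Excel.
        df.to_csv(out, index=False, encoding="utf-8-sig")
    return out
```

For example, save_table(df, "shanghai_weather.xlsx") would write the crawled table to an Excel file, while a .csv name falls back to CSV.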
Here is the description of the read_html() parameters from the official documentation, for your reference: (I have translated it into Chinese ^_^)
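As a quick illustration of two of those parameters, the sketch below uses match and attrs to pick a table by its content or its HTML attributes rather than by position. The HTML, the table ids, and the values are made up for this example.

```python
from io import StringIO

import pandas as pd

# Made-up page with two tables: a navigation table and a forecast table.
html = """
<table id="nav"><tr><td>Home</td><td>About</td></tr></table>
<table id="forecast">
  <tr><th>day</th><th>weather</th><th>high_c</th></tr>
  <tr><td>Mon</td><td>Sunny</td><td>24</td></tr>
  <tr><td>Tue</td><td>Rain</td><td>19</td></tr>
</table>
"""

# match: keep only tables whose text matches this string or regex.
by_text = pd.read_html(StringIO(html), match="weather")

# attrs: keep only tables whose HTML attributes match this dict.
by_attr = pd.read_html(StringIO(html), attrs={"id": "forecast"})

print(by_text[0])
```

Selecting by match or attrs can be more robust than a fixed index such as [1], since the index changes whenever the page adds or removes a table.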
Once again, it can only crawl table data with <table></table> tags on the web page.
If the page has no <table> tag and you crawl it with this method anyway, pandas raises a ValueError with the message "No tables found":
This is a screenshot I took in IPython; other IDEs report the same error!
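If you would rather get an empty result than an exception, one possible approach is sketched below. The helper name read_tables_or_empty is hypothetical; it does a cheap text check first and then catches the ValueError that pandas raises when no table is found.

```python
from io import StringIO

import pandas as pd


def read_tables_or_empty(html: str) -> list:
    """Return the tables found in html, or [] if there are none (hypothetical helper)."""
    # Cheap pre-check: without a <table> tag there is nothing for read_html() to parse.
    if "<table" not in html.lower():
        return []
    try:
        return pd.read_html(StringIO(html))
    except ValueError:
        # pandas raises ValueError ("No tables found") when no usable table exists.
        return []
```

Calling it on a page with only paragraphs returns an empty list, so the calling code can simply check whether the list is empty before going further.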
4. Synchronized video explanation
The video explains the code line by line:
[Crawler Artifact] 2-minute explanation to easily crawl web data with one line of python code
I usually share the Python source file, but this time there is no need: it is only 3 lines of code, so type it out yourself, my friend!
I am Ma Ge, with tens of thousands of followers across the web. You are welcome to discuss Python with me.
Search for "Ma Ge python said" on any of these platforms: Zhihu, Bilibili, Xiaohongshu, Sina Weibo.