Python Crawler in Practice: Data Visualization

This article introduces how to use a Python crawler to fetch data and visualize it. It covers four main steps:

  1. Data acquisition: use the requests library to send HTTP requests and fetch the target web page;
  2. Data parsing: use the BeautifulSoup library and string operations to extract the required fields;
  3. Data storage: use the pandas library to save the data to a local file;
  4. Data visualization: use the matplotlib and seaborn libraries to plot the data.

In the code examples, we crawl the Sina Finance quote API to obtain real-time stock data and visualize it.

1. Data Acquisition

Before acquiring data, the requests library needs to be installed, which we can do with pip:

pip install requests

The following is a code example for obtaining real-time stock data:

import requests

url = 'http://hq.sinajs.cn/list=sh000001'
# The quote API may reject requests that lack a Referer header, and it
# returns GBK-encoded text, so we set both explicitly.
headers = {'Referer': 'https://finance.sina.com.cn'}
response = requests.get(url, headers=headers)
response.encoding = 'gbk'
data = response.text
print(data)

First, we define the URL of the target endpoint and use the requests library to send an HTTP request. The returned response object carries the HTTP status code, response headers, and response body; response.text gives us the body as text, which is the real-time quote data we want.
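Quote APIs often require extra request headers. The Referer value below is an assumption based on common practice with this API, not something confirmed by the article; preparing the request lets us inspect exactly what would be sent without touching the network:

```python
import requests

url = 'http://hq.sinajs.cn/list=sh000001'

# Build but do not send the request, so we can inspect it offline.
req = requests.Request(
    'GET', url,
    headers={'Referer': 'https://finance.sina.com.cn'},  # assumed requirement
)
prepared = req.prepare()

print(prepared.method, prepared.url)
print(prepared.headers['Referer'])
# To actually send it: requests.Session().send(prepared)
```

This is a useful debugging pattern whenever a server rejects a request: compare the prepared headers against what a browser sends.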

2. Data Parsing

After obtaining the real-time quote data, the next step is to parse it and extract the fields we need. Note that this endpoint actually returns a small JavaScript snippet rather than an HTML document, so although the code below uses the BeautifulSoup library, plain string operations do most of the real work.

Let's first look at the raw text of the real-time stock data returned by the Sina Finance API:

var hq_str_sh000001="上证指数,3283.92,20.27,0.62,675021,8887585";

We can see that the data starts with var hq_str_sh000001=, is wrapped in double quotes, ends with a semicolon, and its fields are separated by commas. We can split the string into a list with the split() method.
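The structure described above can be parsed with plain string operations alone, since the payload is a JavaScript assignment rather than an HTML document. A minimal sketch using the sample line shown above:

```python
raw = 'var hq_str_sh000001="上证指数,3283.92,20.27,0.62,675021,8887585";'

# Keep only the part between the double quotes, then split on commas.
payload = raw.split('"')[1]
fields = payload.split(',')
print(fields)
# fields[0] is the index name, fields[1] the latest level, and so on.
```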

from bs4 import BeautifulSoup

# The payload is not HTML, so BeautifulSoup simply passes the text through.
soup = BeautifulSoup(data, 'html.parser')
items = soup.text.split(',')
name = items[0].split('"')[1]      # take the part after the opening quote
price = items[1]
change = items[2]
pchange = items[3]
volume = items[4]
amount = items[5].rstrip(';"\n')   # strip the trailing quote and semicolon
print(name, price, change, pchange, volume, amount)

After splitting the text into a list with split(), we read each field by its list index. Note that the first field still carries the opening double quote (removed by splitting on the quote) and that the last field, amount, carries the closing quote and semicolon, which we strip off with rstrip().

3. Data Storage

We save the real-time stock data to a local file for later visualization. In Python, we can use the pandas library to save the data as a CSV file.

import pandas as pd

data = [[name, price, change, pchange, volume, amount]]
df = pd.DataFrame(data, columns=['name', 'price', 'change', 'pchange', 'volume', 'amount'])
df.to_csv('data.csv', index=False)

We create a DataFrame with the pandas library and then save it as a CSV file. Note that index=False must be passed when writing the CSV, otherwise the row index is also written to the file as an extra column.
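The effect of index=False can be seen without touching the filesystem by writing to an in-memory buffer. This round-trip sketch reuses the sample values from earlier:

```python
import io
import pandas as pd

df = pd.DataFrame(
    [['上证指数', '3283.92', '20.27', '0.62', '675021', '8887585']],
    columns=['name', 'price', 'change', 'pchange', 'volume', 'amount'],
)

# With index=False the output contains only the header row and the data row;
# without it, an unnamed leading index column would appear.
buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue())

# Reading it back reproduces the same columns.
df2 = pd.read_csv(io.StringIO(buf.getvalue()))
print(list(df2.columns))
```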

4. Data Visualization

After saving the data to a local file, we can use the matplotlib and seaborn libraries to visualize the data. Here is a code example:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv('data.csv')
fig, ax = plt.subplots(figsize=(10, 6))
sns.barplot(x='name', y='pchange', data=df, ax=ax)
ax.set_xlabel('Stock name')
ax.set_ylabel('Change (%)')
ax.set_title('Real-time stock price change')

plt.show()

We read the data back from the CSV file with pandas and draw a bar chart with seaborn, using the stock name as the x-axis and the percentage change as the y-axis. Setting the figure size, axis labels, and title makes the chart easier to read.
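A single snapshot gives a one-bar chart. A natural extension is to collect several snapshots over time and plot the intraday trend as a line chart; the timestamps and index levels below are made-up illustration data, not real quotes. Using matplotlib's Agg backend lets the figure render without a display:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; no display required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical snapshots collected during a trading day (illustrative only).
df = pd.DataFrame({
    'time': ['09:30', '10:30', '11:30', '13:30', '14:30'],
    'price': [3270.1, 3281.5, 3283.9, 3279.4, 3283.92],
})

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(df['time'], df['price'], marker='o')
ax.set_xlabel('Time')
ax.set_ylabel('Index level')
ax.set_title('SSE Composite Index (illustrative data)')
fig.savefig('trend.png')
```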

Summary

As the results show, Python makes it very convenient to crawl data and visualize it. Presenting the data graphically lets us observe trends and changes more intuitively, which supports better data analysis and decision-making.


Origin blog.csdn.net/wq10_12/article/details/132212759