Data analysis using pandas

A simple introduction to Pandas

Pandas is a Python data analysis library whose data structures (mainly `DataFrame` and `Series`) and functions make data cleaning, transformation, and analysis more convenient. The following simple example assumes a CSV file containing student test scores.

First, import the Pandas library and read the data:

import pandas as pd

# Read the CSV file
df = pd.read_csv('student_scores.csv')

Assume the structure of the CSV file is as follows:

name,subject,score
Zhang San,Mathematics,85
Li Si,Chinese,90
Wang Wu,Mathematics,78
...
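To try the analysis before creating a file, a minimal sketch can read the same sample rows from an in-memory string with `io.StringIO`, which `pd.read_csv` accepts in place of a path. It assumes the header is `name,subject,score`; the variable name `csv_text` is only illustrative.

```python
import io

import pandas as pd

# Sample rows matching the assumed layout of student_scores.csv
csv_text = """name,subject,score
Zhang San,Mathematics,85
Li Si,Chinese,90
Wang Wu,Mathematics,78
"""

# read_csv accepts any file-like object, not only a path on disk
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

If the header has spaces after the commas (for example `Name, subject, score`), pandas keeps the leading space in the column names (` subject`); passing `skipinitialspace=True` to `read_csv` removes it.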

Next, perform basic data analysis:

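# Show the first few rows
print(df.head())

# Show column names, non-null counts and data types
# (df.info() prints its report itself and returns None, so it is not wrapped in print)
df.info()

# Descriptive statistics
print(df.describe())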
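To see exactly what `describe()` reports, a short sketch on three sample rows, assuming the columns are named `name`, `subject` and `score`:

```python
import pandas as pd

# Three sample rows; the column names are an assumption matching the example CSV
df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu"],
    "subject": ["Mathematics", "Chinese", "Mathematics"],
    "score": [85, 90, 78],
})

# On a numeric Series, describe() returns count, mean, std, min, quartiles and max
stats = df["score"].describe()
print(stats)

# By default text columns are skipped; include="all" also summarises them
# (count, unique, top, freq)
all_summary = df.describe(include="all")
print(all_summary)
```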

These calls show the first five rows, the column names with their non-null counts and data types, and summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the numeric columns. More complex analysis can follow, such as calculating the average score for each subject:

# Average score per subject
avg_scores_by_subject = df.groupby('subject')['score'].mean()
print(avg_scores_by_subject)
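The grouping step is easy to check by hand on the sample rows: Mathematics has the scores 85 and 78, so its mean is 81.5, and Chinese has a single score of 90. A sketch that also computes several statistics per subject with `agg`, assuming the column names `subject` and `score`:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu"],
    "subject": ["Mathematics", "Chinese", "Mathematics"],
    "score": [85, 90, 78],
})

# Mean score per subject; the result is a Series indexed by subject (sorted by default)
avg_scores_by_subject = df.groupby("subject")["score"].mean()
print(avg_scores_by_subject)

# Several statistics at once: one row per subject, one column per statistic
summary = df.groupby("subject")["score"].agg(["mean", "count", "max"])
print(summary)
```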

Finally, use Matplotlib to draw a chart, such as a bar chart of the average score of each subject:

import matplotlib.pyplot as plt

# Draw a bar chart
avg_scores_by_subject.plot(kind='bar', color='skyblue')
plt.title('Average score by subject')
plt.xlabel('Subject')
plt.ylabel('Average score')
plt.show()
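`plt.show()` opens a window, which is not useful on a server or in a script run without a display. A minimal alternative, assuming a headless environment, selects Matplotlib's non-interactive Agg backend and writes the chart to a PNG file instead; the filename `avg_scores.png` and the hard-coded averages are illustrative.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import pandas as pd

# Averages as computed from the sample rows (illustrative values)
avg_scores_by_subject = pd.Series({"Chinese": 90.0, "Mathematics": 81.5})

ax = avg_scores_by_subject.plot(kind="bar", color="skyblue")
ax.set_title("Average score by subject")
ax.set_xlabel("Subject")
ax.set_ylabel("Average score")
plt.tight_layout()  # keep the tick labels inside the saved image

out = Path("avg_scores.png")
plt.savefig(out, dpi=100)
plt.close()
```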

This example covers reading data, inspecting its structure, computing descriptive statistics, and simple visualization with Pandas. Real analyses usually add further processing steps, depending on the data and the questions being asked.

Complete code example

Suppose there is a CSV file called `student_scores.csv` that contains students' names, subjects, and scores. The complete script below combines all of the steps:

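import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file (replace the path with the location of your own file)
df = pd.read_csv(r'D:\untitled13\9.2\.vscode\student_scores.csv')

# Show the first few rows
print("First few rows:")
print(df.head())

# Show column names, non-null counts and data types
# (df.info() prints its report itself and returns None, so it is not wrapped in print)
print("\nData info:")
df.info()

# Descriptive statistics
print("\nDescriptive statistics:")
print(df.describe())

# Average score per subject
avg_scores_by_subject = df.groupby('subject')['score'].mean()
print("\nAverage score by subject:")
print(avg_scores_by_subject)

# Bar chart of the average score per subject
avg_scores_by_subject.plot(kind='bar', color='skyblue')
plt.title('Average score by subject')
plt.xlabel('Subject')
plt.ylabel('Average score')
plt.show()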

In this example, we first read a CSV file containing student scores, then display the first few rows, basic information about the columns, and descriptive statistics. Next, we calculate the average score per subject with the `groupby` method and draw a bar chart of those averages with Matplotlib.
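If the same summary is needed for several files, the reading and grouping steps can be wrapped in a function. A sketch under the same column-name assumptions (`subject`, `score`); the function name `average_scores_by_subject` is hypothetical, and the usage writes a temporary file so it runs without the original path:

```python
import tempfile
from pathlib import Path

import pandas as pd


def average_scores_by_subject(csv_path):
    """Read a scores CSV and return the mean score per subject (hypothetical helper)."""
    df = pd.read_csv(csv_path)
    return df.groupby("subject")["score"].mean()


# Usage: write the sample rows to a temporary file, then analyse it
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "student_scores.csv"
    path.write_text(
        "name,subject,score\n"
        "Zhang San,Mathematics,85\n"
        "Li Si,Chinese,90\n"
        "Wang Wu,Mathematics,78\n",
        encoding="utf-8",
    )
    result = average_scores_by_subject(path)

print(result)
```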

Make sure you have a CSV file named `student_scores.csv`, and adjust the file path and column names to match your data. This is only an introductory example; real applications may require more data cleaning, feature engineering, and more complex analysis.
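As one example of such cleaning, a sketch that handles two common problems: stray whitespace in text fields, and scores that are missing or not numeric. The messy input rows (including the extra name `Zhao Liu` and the text `absent`) are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical messy input: extra spaces, a missing score and a non-numeric score
csv_text = """name,subject,score
Zhang San, Mathematics ,85
Li Si,Chinese,
Wang Wu,Mathematics,absent
Zhao Liu,Chinese,92
"""

df = pd.read_csv(io.StringIO(csv_text))

# Strip leading/trailing whitespace from the text columns
for col in ["name", "subject"]:
    df[col] = df[col].str.strip()

# Convert scores to numbers; anything that cannot be parsed becomes NaN
df["score"] = pd.to_numeric(df["score"], errors="coerce")

# Drop rows without a usable score
clean = df.dropna(subset=["score"])
print(clean)
print(clean.groupby("subject")["score"].mean())
```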

File name: student_scores.csv

name,subject,score
Zhang San,Mathematics,85
Li Si,Chinese,90
Wang Wu,Mathematics,78

Replace the file path in the `pd.read_csv` call with the absolute path of your own `student_scores.csv`, for example:

df = pd.read_csv(r'D:\untitled13\9.2\.vscode\student_scores.csv')