Getting Started with Pandas Data Processing

Python’s Pandas library is a core tool for data scientists and analysts. This article walks through how to use Pandas for data processing: its data structures, importing data, exploring it, and basic cleaning and manipulation.

Meet Pandas

  • Briefly introduce the importance of Pandas
  • Install and import the Pandas library
# Install from the command line first: pip install pandas
import pandas as pd

Pandas data structures

  • Series is a one-dimensional labeled array; DataFrame is a two-dimensional labeled table whose columns can have different types
  • Examples of creating a Series and a DataFrame
# Series
s = pd.Series([1, 3, 5, None, 6, 8])

# DataFrame
df = pd.DataFrame({
    'A': range(1, 5),
    'B': pd.Timestamp('20230901'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': pd.Categorical(["test", "train", "test", "train"]),
    'E': 'foo',
})
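
To see what these two structures actually hold, a short sketch like the one below can help. It rebuilds the same `s` and `df` so that it runs on its own; `col_a` is just an illustrative name, and the attributes used (`dtype`, `dtypes`, `shape`) are standard pandas attributes.

```python
import pandas as pd

# Rebuild the objects from the example above so this block runs on its own.
s = pd.Series([1, 3, 5, None, 6, 8])
df = pd.DataFrame({
    'A': range(1, 5),
    'B': pd.Timestamp('20230901'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': pd.Categorical(["test", "train", "test", "train"]),
    'E': 'foo',
})

# None is stored as NaN, so the Series is promoted to a float dtype.
print(s.dtype)
print(s.isna().sum())   # number of missing values in the Series

# Each DataFrame column keeps its own dtype; scalars such as 'foo'
# are repeated for every row.
print(df.dtypes)
print(df.shape)         # (rows, columns)

# A single column of a DataFrame is itself a Series.
col_a = df['A']
print(type(col_a).__name__)
```

Checking `dtypes` right after building or loading a DataFrame is a cheap way to catch columns that did not get the type you expected.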

Data import

  • How to read CSV and Excel files
  • Sample code shows the data import process
# Read a CSV file
df_csv = pd.read_csv('example.csv')

# Read an Excel file
df_excel = pd.read_excel('example.xlsx')
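
Since `example.csv` may not exist on your machine, here is a self-contained sketch of the same import step. The CSV text and its column names (`name`, `score`, `date`) are made up for illustration; `io.StringIO` stands in for a real file, which works because `pd.read_csv` accepts file-like objects as well as paths. Note that reading `.xlsx` files with `pd.read_excel` additionally requires an Excel engine such as openpyxl to be installed.

```python
import io

import pandas as pd

# Hypothetical CSV content, used in place of a real file such as 'example.csv'.
csv_text = """name,score,date
alice,90,2023-09-01
bob,,2023-09-02
carol,75,2023-09-03
"""

# read_csv accepts a path or any file-like object.
df_csv = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['date'],   # parse this column as datetimes
)

print(df_csv.dtypes)
print(df_csv)               # the empty score for bob is read as NaN

# usecols limits the import to the columns that are needed.
df_small = pd.read_csv(io.StringIO(csv_text), usecols=['name', 'score'])
print(df_small.columns.tolist())
```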

Data exploration

  • View basic information about the data (for example shape, head, tail and describe)
  • Ways to select, filter and sort data
# View the first few rows
df.head()

# Descriptive statistics
df.describe()

# Filter rows where column 'A' is greater than 2
df_filtered = df[df['A'] > 2]

# Sort by column 'B'
df_sorted = df.sort_values(by='B')
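
The bullets above also mention `shape`, `tail` and column selection, which the snippet does not show. The following is a minimal sketch of those, using the same example DataFrame rebuilt so the block runs on its own. Names such as `subset`, `mask` and `a_of_test` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1, 5),
    'B': pd.Timestamp('20230901'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': pd.Categorical(["test", "train", "test", "train"]),
    'E': 'foo',
})

print(df.shape)     # (number of rows, number of columns)
print(df.tail(2))   # last two rows
df.info()           # column dtypes and non-null counts

# Select several columns by name.
subset = df[['A', 'D']]

# Combine conditions with & and |, wrapping each one in parentheses.
mask = (df['A'] > 1) & (df['D'] == 'train')
df_filtered = df[mask]

# Label-based selection: rows matching a condition, one chosen column.
a_of_test = df.loc[df['D'] == 'test', 'A']

# Sort in descending order of 'A'.
df_sorted = df.sort_values(by='A', ascending=False)
```

Filtering and sorting return new objects here, so the original `df` is left unchanged.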

Data cleaning

  • Handle missing data
  • Rename columns
  • Convert data types
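# Handle missing data; fillna returns a new DataFrame, so assign the result.
# A per-column dict limits the fill to the listed columns.
df = df.fillna(value={'C': 5})

# Rename a column
df.rename(columns={'A': 'a'}, inplace=True)

# Convert data types. 'D' holds category labels ("test"/"train"), which cannot
# be cast to int32 directly, so convert its integer category codes instead.
df['D'] = df['D'].cat.codes.astype('int32')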
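
The example DataFrame has no gaps, so the cleaning steps are easier to see on data that actually needs them. Below is a sketch on a small, made-up table (`raw`, with hypothetical columns `Name`, `Score` and `Age`); filling with the column mean is just one possible strategy, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps and a numeric column stored as text.
raw = pd.DataFrame({
    'Name': ['alice', 'bob', 'carol', 'dave'],
    'Score': [90.0, np.nan, 75.0, np.nan],
    'Age': ['31', '42', 'n/a', '27'],
})

# Count missing values per column before deciding how to handle them.
print(raw.isna().sum())

# Option 1: drop rows that have a missing 'Score'.
dropped = raw.dropna(subset=['Score'])

# Option 2: fill the gaps, here with the column mean.
filled = raw.fillna({'Score': raw['Score'].mean()})

# Rename columns; rename returns a new DataFrame unless inplace=True.
filled = filled.rename(columns={'Name': 'name', 'Score': 'score', 'Age': 'age'})

# Convert text to numbers; errors='coerce' turns unparseable entries into NaN.
filled['age'] = pd.to_numeric(filled['age'], errors='coerce')

# Cast to an integer type once the column has no missing values left.
# Casting float to int truncates the fractional part.
filled['score'] = filled['score'].astype('int32')

print(filled)
```

Casting to an integer dtype fails while a column still contains NaN, which is why the fill comes before the `astype` call.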

Data operations

  • Adding and deleting columns
  • Adding and deleting data rows
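# Add a column (column 'A' was renamed to 'a' above)
df['F'] = df['a'] + df['C']

# Delete a column
df.drop('F', axis=1, inplace=True)

# Add a row. DataFrame.append was removed in pandas 2.0; use pd.concat instead.
new_row = pd.DataFrame([{'a': 5, 'B': pd.Timestamp('20231001'), 'C': 2.0, 'D': 3, 'E': 'bar'}])
df = pd.concat([df, new_row], ignore_index=True)

# Delete rows by index label
df.drop([0, 1], inplace=True)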
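
If you prefer not to modify a DataFrame in place, the same operations can be written so that each step returns a new object. The sketch below uses a small stand-in DataFrame with hypothetical columns `a` and `C`; the intermediate names `df2` to `df5` are only there to make each step visible.

```python
import pandas as pd

# Small stand-in DataFrame so this block runs on its own.
df = pd.DataFrame({'a': [1, 2, 3], 'C': [1.0, 1.0, 1.0]})

# assign returns a new DataFrame with the extra column.
df2 = df.assign(F=df['a'] + df['C'])

# drop(columns=...) is equivalent to drop(..., axis=1).
df3 = df2.drop(columns=['F'])

# Add several rows at once by concatenating another DataFrame.
new_rows = pd.DataFrame([{'a': 4, 'C': 2.0}, {'a': 5, 'C': 3.0}])
df4 = pd.concat([df3, new_rows], ignore_index=True)

# Delete rows by condition: keep only the rows that satisfy it.
df5 = df4[df4['a'] > 2].reset_index(drop=True)

print(df5)
```

Building the new rows as one DataFrame and calling `pd.concat` once tends to be faster than adding rows one at a time in a loop.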

Conclusion

Pandas is a powerful data processing tool, and mastering its basic operations is crucial for efficient data analysis. With this introduction, you should be able to start using Pandas to work with your data.

Origin blog.csdn.net/ken1583096683/article/details/134726384