Creating a DataFrame, Its Basic Attributes, and Basic Operations

DataFrame

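A DataFrame is a tabular data structure in pandas. It holds an ordered collection of columns, and each column can have a different value type (numeric, string, boolean, and so on). A DataFrame has both a row index and a column index, and it can be viewed as a dictionary of Series.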
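
To make the "dictionary of Series" view concrete, here is a minimal sketch. The labels x, y, z and the variable names are illustrative assumptions, not taken from the article's own data.

```python
import pandas as pd

# Two Series that share the same index labels (illustrative values)
names = pd.Series(['A', 'B', 'C'], index=['x', 'y', 'z'])
prices = pd.Series([0, 1, 2], index=['x', 'y', 'z'])

# A dict of Series becomes a DataFrame:
# dict keys -> column labels, Series index -> row labels
frame = pd.DataFrame({'name': names, 'price': prices})

# Selecting one column gives back a Series
column = frame['price']
print(type(column).__name__)   # Series
print(frame.dtypes)            # each column keeps its own dtype
```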

Series

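A Series is a one-dimensional array-like object made up of a sequence of values (of any NumPy data type) and an associated sequence of data labels, called its index. A simple Series can also be created from a sequence of values alone.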
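
As a small sketch of both cases (values only, and values plus labels), assuming illustrative data and variable names:

```python
import pandas as pd

# Values only: pandas supplies a default integer index 0..n-1
s_default = pd.Series([10, 20, 30])

# Values plus explicit labels
s_labeled = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s_default.index)     # RangeIndex(start=0, stop=3, step=1)
print(s_labeled['b'])      # 20
print(s_labeled.values)    # the underlying NumPy array
```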

import pandas as pd
import numpy as np

Creating a DataFrame

  1. Create from a dictionary; each key-value pair is treated as one Series
data = {
    'name':['A','B','C','D','D'],
    'year':[2018,None,2020,2021,2022],
    'price':[0,1,2,3,4]
}
df = pd.DataFrame(data)
df
  name    year  price
0    A  2018.0      0
1    B     NaN      1
2    C  2020.0      2
3    D  2021.0      3
4    D  2022.0      4
# Specify the index values
df = pd.DataFrame(data, index=['one','two','three','four','five'])
df
      name    year  price
one      A  2018.0      0
two      B     NaN      1
three    C  2020.0      2
four     D  2021.0      3
five     D  2022.0      4
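
The year column prints as 2018.0 because None is stored as NaN, which makes the column float64. The sketch below checks that, and also shows the constructor's columns argument for choosing and ordering columns. The names df_check and df_subset are illustrative.

```python
import pandas as pd

data = {
    'name': ['A', 'B', 'C', 'D', 'D'],
    'year': [2018, None, 2020, 2021, 2022],
    'price': [0, 1, 2, 3, 4],
}

# None in a numeric list becomes NaN, so the whole column is float64
df_check = pd.DataFrame(data)
print(df_check['year'].dtype)      # float64

# columns= selects dictionary keys and fixes their order
df_subset = pd.DataFrame(data, columns=['price', 'name'])
print(list(df_subset.columns))     # ['price', 'name']
```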
  2. Create from a NumPy ndarray
df_1 = pd.DataFrame(
    np.array([['A',2018,0],['B',2019,1],['C',2020,2],['D',2021,3],['E',2022,4]]),
    columns=['name','year','price'],
)
df_1
  name  year  price
0    A  2018      0
1    B  2019      1
2    C  2020      2
3    D  2021      3
4    E  2022      4
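
One thing to watch with the ndarray route: a NumPy array has a single dtype, so an array that mixes text and numbers stores everything as strings. The sketch below uses a shortened three-row array and converts the numeric columns back with astype.

```python
import numpy as np
import pandas as pd

arr = np.array([['A', 2018, 0], ['B', 2019, 1], ['C', 2020, 2]])
print(arr.dtype.kind)      # 'U': every element, including the numbers, is text

df_1 = pd.DataFrame(arr, columns=['name', 'year', 'price'])

# year and price hold strings at this point, so convert them explicitly
df_1 = df_1.astype({'year': 'int64', 'price': 'int64'})
print(df_1.dtypes)
```

Building the table from a dictionary of lists avoids the conversion step, because each column is then typed on its own.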

Basic Attributes of a DataFrame

  1. df.index: returns the index of df, i.e. the row labels
df.index
Index(['one', 'two', 'three', 'four', 'five'], dtype='object')
# An Index can be iterated over directly
for label in df.index:
    print(label)
one
two
three
four
five
  2. df.columns: returns the column names of df, i.e. the column labels
df.columns
Index(['name', 'year', 'price'], dtype='object')
# The column labels can be iterated over in the same way
for column in df.columns:
    print(column)
name
year
price
  3. df.dtypes: returns the data type of each column of df
df.dtypes
name      object
year     float64
price      int64
dtype: object
  4. df.values: returns the values in df as a NumPy array
df.values
array([['A', 2018.0, 0],
       ['B', nan, 1],
       ['C', 2020.0, 2],
       ['D', 2021.0, 3],
       ['D', 2022.0, 4]], dtype=object)
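
The pandas documentation recommends DataFrame.to_numpy() over .values for new code. As a sketch on the same data: the full table gives an object array because the columns have mixed types, while the numeric columns alone give a float64 array.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['A', 'B', 'C', 'D', 'D'],
        'year': [2018, None, 2020, 2021, 2022],
        'price': [0, 1, 2, 3, 4],
    },
    index=['one', 'two', 'three', 'four', 'five'],
)

# Mixed column types: the common dtype is object
all_values = df.to_numpy()
print(all_values.dtype)

# Numeric columns only: int64 and float64 combine into float64
numeric_values = df[['year', 'price']].to_numpy()
print(numeric_values.dtype)
```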

DataFrame Operations

  1. df.astype: convert columns to the specified data type
df.astype({'price': 'int32'}).dtypes
name      object
year     float64
price      int32
dtype: object
  2. df.convert_dtypes: automatically convert columns to the best possible data types (requires pandas 1.0.0 or later)
df.convert_dtypes().dtypes
name     string
year      Int64
price     Int64
dtype: object
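
A caveat when using astype on this table: the year column contains NaN, and NaN cannot be represented in a plain NumPy integer dtype, so a cast to 'int64' raises an error. The sketch below shows that failure and the nullable 'Int64' dtype (capital I) as one way around it.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['A', 'B', 'C', 'D', 'D'],
        'year': [2018, None, 2020, 2021, 2022],
        'price': [0, 1, 2, 3, 4],
    },
    index=['one', 'two', 'three', 'four', 'five'],
)

# Casting a column with NaN to a NumPy integer dtype fails
cast_failed = False
try:
    df['year'].astype('int64')
except ValueError as exc:
    cast_failed = True
    print('cannot cast:', exc)

# The nullable integer dtype stores the gap as <NA> instead
year_nullable = df['year'].astype('Int64')
print(year_nullable)
```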
  3. df.isna/df.notna: detect missing and non-missing values
df.isna()
        name   year  price
one    False  False  False
two    False   True  False
three  False  False  False
four   False  False  False
five   False  False  False
df.notna()
       name   year  price
one    True   True   True
two    True  False   True
three  True   True   True
four   True   True   True
five   True   True   True
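
The boolean table from isna is usually a starting point rather than the end result. As a sketch, it can be summed to count gaps per column, and dropna or fillna can then deal with them. The fill value 0 is only a placeholder chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['A', 'B', 'C', 'D', 'D'],
        'year': [2018, None, 2020, 2021, 2022],
        'price': [0, 1, 2, 3, 4],
    },
    index=['one', 'two', 'three', 'four', 'five'],
)

# True counts as 1, so summing gives the number of missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Drop every row that has at least one missing value
complete_rows = df.dropna()

# Or fill the gaps in a chosen column (0 is an arbitrary placeholder)
filled = df.fillna({'year': 0})
```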
  4. df.head: get the first few rows of the table
df.head(3)
      name    year  price
one      A  2018.0      0
two      B     NaN      1
three    C  2020.0      2
  5. df.at: get a single value by row and column label
df.at['two','name']
'B'
  6. df.iat: get a single value by integer row and column position
df.iat[1,1]
nan
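# Assign a value by position. None stored in the float column year becomes NaN,
# so the table looks unchanged here.
df.iat[1, 1] = None
df
      name    year  price
one      A  2018.0      0
two      B     NaN      1
three    C  2020.0      2
four     D  2021.0      3
five     D  2022.0      4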
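
Both accessors also work for writing real values. A minimal sketch on a fresh copy of the table follows; the assigned values 2019 and 10 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['A', 'B', 'C', 'D', 'D'],
        'year': [2018, None, 2020, 2021, 2022],
        'price': [0, 1, 2, 3, 4],
    },
    index=['one', 'two', 'three', 'four', 'five'],
)

# Write by label: row 'two', column 'year'
df.at['two', 'year'] = 2019

# Write by position: first row, third column (price)
df.iat[0, 2] = 10

print(df.at['two', 'year'], df.iat[0, 2])

# Reading a label that does not exist raises KeyError
missing_label = False
try:
    df.at['six', 'name']
except KeyError:
    missing_label = True
```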
  7. df.loc: access a group of rows and columns by label or by a boolean array. It has many more features; see the official documentation
df.loc['one']
name        A
year     2018
price       0
Name: one, dtype: object
df.loc[['one','four']]
     name    year  price
one     A  2018.0      0
four    D  2021.0      3
df.loc['one','name']
'A'
df.loc['one':'three','name']
one      A
two      B
three    C
Name: name, dtype: object
# Pass one boolean per row to show only the rows marked True
df.loc[[True,True,True,False,True]]
      name    year  price
one      A  2018.0      0
two      B     NaN      1
three    C  2020.0      2
five     D  2022.0      4
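
In practice the boolean list is rarely typed by hand; it usually comes from a condition on a column. The sketch below assumes an illustrative threshold (price >= 2) and fill value (2019), and works on a copy so that the original table is left alone.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['A', 'B', 'C', 'D', 'D'],
        'year': [2018, None, 2020, 2021, 2022],
        'price': [0, 1, 2, 3, 4],
    },
    index=['one', 'two', 'three', 'four', 'five'],
)

# A comparison on a column yields one boolean per row
mask = df['price'] >= 2

# Rows chosen by the mask, columns chosen by label
expensive = df.loc[mask, ['name', 'price']]
print(expensive)

# The same row/column addressing works for assignment
fixed = df.copy()
fixed.loc[fixed['year'].isna(), 'year'] = 2019
```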
  8. df.iloc: select data by integer position
df.iloc[2]
name        C
year     2020
price       2
Name: three, dtype: object
df.iloc[2:4]
      name    year  price
three    C  2020.0      2
four     D  2021.0      3
df.iloc[[2,4]]
      name    year  price
three    C  2020.0      2
five     D  2022.0      4
df.iloc[[2,4],[2]]
       price
three      2
five       4
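
One difference between the two indexers is easy to miss in the outputs above: a label slice with loc includes its end label, while a position slice with iloc excludes its end position, as ordinary Python slicing does. A short sketch:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['A', 'B', 'C', 'D', 'D'],
        'year': [2018, None, 2020, 2021, 2022],
        'price': [0, 1, 2, 3, 4],
    },
    index=['one', 'two', 'three', 'four', 'five'],
)

by_label = df.loc['one':'three']   # includes the row labeled 'three'
by_position = df.iloc[0:3]         # positions 0, 1, 2; position 3 is excluded

print(by_label.equals(by_position))   # True

# Negative positions count from the end
last_row = df.iloc[-1]

# Row position and column position together give a single value
value = df.iloc[2, 2]
```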
  9. df.isin: whether each element of the DataFrame is contained in the given values
df.isin([2,3])
        name   year  price
one    False  False  False
two    False  False  False
three  False  False   True
four   False  False   True
five   False  False  False
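
isin is often applied to a single column to filter rows, and the DataFrame version also accepts a dictionary so that each column is tested against its own list. A sketch with illustrative lookup values:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['A', 'B', 'C', 'D', 'D'],
        'year': [2018, None, 2020, 2021, 2022],
        'price': [0, 1, 2, 3, 4],
    },
    index=['one', 'two', 'three', 'four', 'five'],
)

# Series.isin gives one boolean per row, which can be used as a row filter
selected = df[df['name'].isin(['A', 'D'])]
print(selected)

# With a dict, each column is checked against its own list;
# columns that are not listed come back as all False
per_column = df.isin({'price': [2, 3], 'name': ['A']})
print(per_column)
```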
  10. df.groupby: group the rows of a DataFrame
df.groupby(['name'])
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001F2168ACC48>
df.groupby(['name']).mean()
        year  price
name
A     2018.0    0.0
B        NaN    1.0
C     2020.0    2.0
D     2021.5    3.5
df.groupby(['name']).sum()
        year  price
name
A     2018.0      0
B        0.0      1
C     2020.0      2
D     4043.0      7
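
In the sum output, group B shows 0.0 for year even though its only year value is missing, because sum skips NaN by default. The sketch below uses min_count=1 to keep such a group as NaN, and named aggregation to apply a different function to each column. The output column names mean_year, total_price and rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['A', 'B', 'C', 'D', 'D'],
        'year': [2018, None, 2020, 2021, 2022],
        'price': [0, 1, 2, 3, 4],
    },
    index=['one', 'two', 'three', 'four', 'five'],
)

# Require at least one non-missing value, otherwise the group sum is NaN
year_sum = df.groupby('name')['year'].sum(min_count=1)
print(year_sum)

# Named aggregation: new_column=(source_column, function)
summary = df.groupby('name').agg(
    mean_year=('year', 'mean'),
    total_price=('price', 'sum'),
    rows=('price', 'count'),
)
print(summary)
```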

  11. df.drop: remove the specified labels from rows or columns

# Drop a column
df.drop(['name'],axis=1)
         year  price
one    2018.0      0
two       NaN      1
three  2020.0      2
four   2021.0      3
five   2022.0      4
df.drop(columns=['name'])
         year  price
one    2018.0      0
two       NaN      1
three  2020.0      2
four   2021.0      3
five   2022.0      4
# Drop a row by its index label
df.drop(['one'])
      name    year  price
two      B     NaN      1
three    C  2020.0      2
four     D  2021.0      3
five     D  2022.0      4
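
Note that each drop call above returns a new DataFrame and leaves df itself untouched, which is why the name column and the row one are still present in the later outputs. A short sketch of that behavior, plus the errors='ignore' option for labels that may not exist (the label 'missing' is hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['A', 'B', 'C', 'D', 'D'],
        'year': [2018, None, 2020, 2021, 2022],
        'price': [0, 1, 2, 3, 4],
    },
    index=['one', 'two', 'three', 'four', 'five'],
)

# drop returns a new object; keep the result if you need it
without_name = df.drop(columns=['name'])
print('name' in df.columns)            # True: the original still has the column

# Several rows can be dropped at once with index=
trimmed = df.drop(index=['one', 'two'])

# A label that is not present raises KeyError unless errors='ignore' is passed
safe = df.drop(columns=['missing'], errors='ignore')
```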
  12. Draw bar charts from the table
# Draw everything in one plot
ax = df.plot.bar(rot=0)

# Draw in multiple subplots
axes = df.plot.bar(rot=0, subplots=True)
axes[1].legend(loc=2)
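
Outside a notebook, the chart has to be shown or saved explicitly. The sketch below assumes matplotlib is installed, uses the non-interactive Agg backend so that it runs without a display, plots only the price column against name, and writes the image to a temporary directory. The file name price_by_name.png is an illustrative choice.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend; must be set before importing pyplot
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['A', 'B', 'C', 'D', 'D'],
        'year': [2018, None, 2020, 2021, 2022],
        'price': [0, 1, 2, 3, 4],
    },
    index=['one', 'two', 'three', 'four', 'five'],
)

# x= picks the column used for tick labels, y= the column drawn as bars
ax = df.plot.bar(x='name', y='price', rot=0)
ax.set_ylabel('price')

# Save the figure to a file (hypothetical file name)
out_path = os.path.join(tempfile.gettempdir(), 'price_by_name.png')
ax.figure.savefig(out_path)
plt.close(ax.figure)
```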

Reposted from blog.csdn.net/Smile_mingm/article/details/105734653