Learning pandas: DataFrame

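The previous section covered Series. That material is the foundation for pandas' other core data structure, the DataFrame.

A DataFrame is a two-dimensional, labeled data structure whose columns may have different types.

You can think of it as something like a SQL table, or as a dict of Series objects.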
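
As a minimal sketch of the "dict of Series" view (the column names and values below are made up for illustration), each column keeps its own dtype, and selecting a column gives back a Series:

```python
import pandas as pd

# Hypothetical data: three columns with different types
df = pd.DataFrame({
    'name': ['x', 'y', 'z'],
    'qty': [1, 2, 3],
    'score': [0.5, 1.5, 2.5],
})

print(df.dtypes)            # one dtype per column

col = df['qty']             # selecting a column returns a Series
print(type(col).__name__)
```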

Input types a DataFrame accepts

A DataFrame can be built from many kinds of input:

A dict of 1-D ndarrays, lists, dicts, or Series
A 2-D ndarray
A single Series
Another DataFrame
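
The dict-based inputs are covered in the sections that follow. The other three can be sketched roughly as below; the variable names and values are illustrative assumptions, not taken from the original examples:

```python
import numpy as np
import pandas as pd

# 2-D ndarray: each row of the array becomes a row of the DataFrame
arr = np.arange(6).reshape(3, 2)
df_arr = pd.DataFrame(arr, columns=['x', 'y'])

# A single Series becomes a one-column DataFrame;
# the Series name is used as the column label
s = pd.Series([1., 2., 3.], index=['a', 'b', 'c'], name='one')
df_s = pd.DataFrame(s)

# Another DataFrame can be passed straight to the constructor
df_copy = pd.DataFrame(df_arr)

print(df_arr)
print(df_s)
print(df_copy)
```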

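Building a DataFrame from a dict

The resulting index is the union of the indexes of the individual Series. Where a Series has no value for a label, the cell is filled with NaN. In the example below, column two is indexed from a through d, while column one only has a, b, and c, so column one is NaN at index d: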

import pandas as pd

d = {
    'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
}

df = pd.DataFrame(d)

print(df)

->
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0

You can also pass an index when building the DataFrame. Only the rows for those labels are kept, in the order given:

d = {
     'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])
    }

print(pd.DataFrame(d, index=['d', 'b', 'a']))

->

   one  two
d  NaN  4.0
b  2.0  2.0
a  1.0  1.0

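Building a DataFrame from lists or ndarrays

If no index is given, the default is an integer index that starts at 0 and ends at n - 1, where n is the length of the list or array: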

d = {'one': [1., 2., 3., 4.],
     'two': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df)

->
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0

# Pass an explicit index; its length must match the length of the lists
df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
print(df)

->
   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0
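
The examples above use lists. The same construction should work with 1-D ndarrays, as in this sketch (it assumes NumPy is installed; the unequal-length case is added only to show the constraint that all arrays must have the same length):

```python
import numpy as np
import pandas as pd

d = {'one': np.array([1., 2., 3., 4.]),
     'two': np.array([4., 3., 2., 1.])}
df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
print(df)

# Arrays of different lengths are rejected with a ValueError
raised = False
try:
    pd.DataFrame({'one': np.array([1., 2.]),
                  'two': np.array([1., 2., 3.])})
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```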

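Building a DataFrame from a list of dicts

The logic is much the same as for ndarrays, except that each dict becomes one row: the keys become column labels, and a key missing from a row yields NaN: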

data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
print(pd.DataFrame(data2))
print(pd.DataFrame(data2, index=['first', 'second']))

->

   a   b     c
0  1   2   NaN
1  5  10  20.0

        a   b     c
first   1   2   NaN
second  5  10  20.0

Inspecting the rows and columns of a DataFrame:

Row labels: df.index

Column labels: df.columns
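
A short sketch of both attributes, reusing the values from the list-based example earlier in this post:

```python
import pandas as pd

df = pd.DataFrame({'one': [1., 2., 3., 4.],
                   'two': [4., 3., 2., 1.]},
                  index=['a', 'b', 'c', 'd'])

print(df.index)      # the row labels, as an Index object
print(df.columns)    # the column labels, as an Index object

# Both can be converted to plain lists when needed
rows = list(df.index)
cols = list(df.columns)
print(rows, cols)
```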

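Alternate constructors:

1. DataFrame.from_dict

2. DataFrame.from_records

With these two constructors, dicts, lists of tuples, and structured ndarrays can be turned into DataFrames: from_dict takes a dict of dicts or of array-likes, and from_records takes a list of tuples or a structured ndarray.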
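
A rough sketch of both constructors follows. The data and the names by_row, from_tuples, and from_struct are illustrative assumptions, not part of the original post:

```python
import numpy as np
import pandas as pd

# from_dict with orient='index': the dict keys become row labels
by_row = pd.DataFrame.from_dict(
    {'a': [1, 2], 'b': [3, 4]},
    orient='index',
    columns=['one', 'two'],
)

# from_records with a list of tuples: one tuple per row
from_tuples = pd.DataFrame.from_records(
    [(1, 'x'), (2, 'y')],
    columns=['num', 'label'],
)

# from_records with a structured ndarray: field names become column labels
rec = np.array([(1, 2.0), (3, 4.0)], dtype=[('A', 'i4'), ('B', 'f8')])
from_struct = pd.DataFrame.from_records(rec)

print(by_row)
print(from_tuples)
print(from_struct)
```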

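Semantically, a DataFrame can be treated as a dict of like-indexed Series objects. Getting, setting, and deleting columns uses the same syntax as the corresponding dict operations.

A DataFrame supports selecting, adding, and deleting columns, among other operations: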

# Select a column
data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df2 = pd.DataFrame(data2)
print(df2)
print(df2['a'])

->
   a   b     c
0  1   2   NaN
1  5  10  20.0

0    1
1    5
Name: a, dtype: int64


# Add a column
df2['d'] = df2['a'] * df2['b']
print(df2)


->

   a   b     c   d
0  1   2   NaN   2
1  5  10  20.0  50

# Delete columns

del df2['d']
df2.pop('c')
print(df2)

->
   a   b
0  1   2
1  5  10
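
The two deletion forms differ in one respect: pop returns the removed column, while del returns nothing. A small sketch, rebuilding df2 from the same data and using a hypothetical variable named removed:

```python
import pandas as pd

df2 = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}])

removed = df2.pop('c')   # removes column 'c' in place and returns it as a Series
del df2['b']             # removes column 'b' in place; nothing is returned

print(removed)
print(list(df2.columns))
```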

A column can also be inserted at a specific position:

df2.insert(1, 'bar', df2['a'])
print(df2)

->

   a  bar   b
0  1    1   2
1  5    5  10
