6.5 Study Notes (Missing Values)

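import pandas as pd
import numpy as np
# Build a 5x3 frame of random values, then reindex to add rows b, d and g.
df = pd.DataFrame(np.random.randn(5,3),index=['a','c','e','f','h'],columns=['one','two','three'])
df = df.reindex(['a','b','c','d','e','f','g','h'])
# Rows introduced by reindex have no data, so isnull() reports True for them.
print(df['two'].isnull())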

Output:
a    False
b     True
c    False
d     True
e    False
f    False
g     True
h    False
Name: two, dtype: bool

notnull(): checks that a value is not null (the inverse of isnull())
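As a minimal sketch of how notnull() relates to isnull(), the snippet below uses a small hand-written Series (the values and the names `s`, `is_missing`, `is_present` are made up for illustration) and then uses the mask to keep only the non-missing entries.

```python
import numpy as np
import pandas as pd

# A small Series with one missing entry (illustrative values only).
s = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])

is_missing = s.isnull()    # True where the value is NaN
is_present = s.notnull()   # True where the value is not NaN

print(is_present)
# A boolean mask can be used to select only the non-missing rows.
print(s[is_present])
```

The two masks are element-wise opposites, so `s[s.notnull()]` keeps exactly the rows that `s.isnull()` flags as False.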

In aggregations such as sum(), NA values are skipped, which is equivalent to treating them as 0:

  print(df['one'].sum())
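Because the frame above holds random numbers, the effect is easier to see on fixed values. The following is a small sketch (the Series and variable names are invented for illustration) of how sum() behaves with and without skipping NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0])

# Default behaviour: NaN is skipped, so only 1.0 and 2.0 are added.
total_skipping_na = s.sum()

# With skipna=False the NaN propagates into the result.
total_with_na = s.sum(skipna=False)

# A Series containing only NaN sums to 0.0 under the default settings.
all_missing_total = pd.Series([np.nan, np.nan]).sum()

print(total_skipping_na, total_with_na, all_missing_total)
```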

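Handling missing values
i. Cleaning up / filling in missing data
ii. Pandas provides several methods for cleaning up missing values
iii. The fillna() function can "fill" NA values with non-null data in several ways
iv. Replacing NaN with a scalar value
Replacing NaN with 0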

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3,3),index=['a','c','e'],columns=['one','two','three'])
df= df.reindex(['a','b','c'])
print(df.fillna(0))

Output:
        one       two     three
a -1.282845 -0.455397  1.043505
b  0.000000  0.000000  0.000000
c -0.228125  0.679129  0.809223
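A scalar is not the only value fillna() accepts. As a hedged sketch, the example below fills a small hand-made frame (values and names chosen purely for illustration) first with 0 and then with a different replacement per column by passing a dict; using the column mean for 'one' is just one possible choice, not something the notes above prescribe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'one': [1.0, np.nan, 3.0], 'two': [np.nan, 5.0, 6.0]},
    index=['a', 'b', 'c'],
)

# Replace every NaN with the same scalar.
filled_zero = df.fillna(0)

# Replace NaN column by column: mean of 'one', a fixed value for 'two'.
filled_per_column = df.fillna({'one': df['one'].mean(), 'two': -1.0})

print(filled_zero)
print(filled_per_column)
```

Note that fillna() returns a new object here; the original `df` still contains its NaN values.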

Filling NA forward and backward: the fill methods introduced with reindexing can also be used to fill in missing values
Forward fill (pad/ffill)

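import pandas as pd
import numpy as np
# The last original label is 'g' so that, as in the output below, row g has its own data.
df = pd.DataFrame(np.random.randn(5,3),index=['a','c','e','f','g'],columns=['one','two','three'])
df = df.reindex(['a','b','c','d','e','f','g'])
# ffill() is the current spelling of fillna(method='pad'), which recent pandas releases deprecate.
print(df.ffill())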

Output:
        one       two     three
a  1.626482 -0.054672  2.649389
b  1.626482 -0.054672  2.649389
c  2.185441  0.499460 -0.219088
d  2.185441  0.499460 -0.219088
e  1.667473  0.750116 -0.927406
f  0.193735 -0.968799 -0.420697
g  0.776946 -0.644994  0.139583

Backward fill (bfill/backfill)

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,3),index=['a','c','e','f','h'],columns=['one','two','three'])
df = df.reindex(['a','b','c','d','e','f','g','h'])
# bfill() is the current spelling of fillna(method='bfill'), which recent pandas releases deprecate.
print(df.bfill())

Output:
        one       two     three
a  1.144226 -0.278943  0.711329
b -0.210107 -1.473591 -1.013294
c -0.210107 -1.473591 -1.013294
d  1.415249 -0.355017 -1.420598
e  1.415249 -0.355017 -1.420598
f -0.689457 -1.626022  0.900874
g -0.808437 -1.290974 -0.786823
h -0.808437 -1.290974 -0.786823
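To compare the two directions on fixed data, here is a small sketch using a hand-written Series (values and names are illustrative assumptions). It also shows an edge case the random examples do not: a gap at the very start cannot be forward-filled, and a gap at the very end cannot be backward-filled.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [np.nan, 1.0, np.nan, np.nan, 4.0, np.nan],
    index=list('abcdef'),
)

# Forward fill: each NaN takes the last valid value before it.
# 'a' stays NaN because nothing precedes it.
forward = s.ffill()

# Backward fill: each NaN takes the next valid value after it.
# 'f' stays NaN because nothing follows it.
backward = s.bfill()

print(pd.DataFrame({'original': s, 'ffill': forward, 'bfill': backward}))
```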

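Dropping missing values
If you simply want to exclude missing values, use the dropna function together with the axis argument. By default axis=0, so any row containing a missing value is dropped.
Dropping entire rows that contain missing values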

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,3),index=['a','c','e','f','g'],columns=['one','two','three'])
df= df.reindex(['a','b','c','d','e','f','g','h'])
print(df.dropna())

Output:
        one       two     three
a  0.895142  1.652897 -0.329553
c -1.846283 -1.107428  0.610238
e  0.309367 -0.485988  1.061889
f  0.488847 -0.043163  0.705120
g -0.001079  0.865002  0.925668
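The example above only exercises the default axis. As a minimal sketch of the axis argument, the snippet below builds a small frame by hand (illustrative values and names) and drops first the rows, then the columns, that contain any NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'one': [1.0, np.nan, 3.0],
        'two': [4.0, 5.0, 6.0],
        'three': [np.nan, np.nan, 9.0],
    },
    index=['a', 'b', 'c'],
)

# axis=0 (the default): drop every row that has at least one NaN.
rows_kept = df.dropna()

# axis=1: drop every column that has at least one NaN.
cols_kept = df.dropna(axis=1)

print(rows_kept)
print(cols_kept)
```

Here only row c and only column 'two' are complete, so each call keeps a single row or column respectively.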


Reposted from blog.csdn.net/weixin_43621813/article/details/90906551