[Data Analysis and Visualization] Talking About NaN

NaN stands for "Not a Number".

import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# Create a NaN
n = np.nan
# Its type
type(n)
float
# Any arithmetic between a number and NaN always yields NaN
m = 1
m + n
nan
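
Because NaN is a float that never compares equal to anything, including itself, an equality test cannot detect it. A minimal sketch of the checks that do work (the variable names are illustrative and not from the original post):

```python
import numpy as np
import pandas as pd

n = np.nan

# NaN is not equal to anything, not even itself
equal_to_itself = (n == n)

# Use a dedicated check instead of ==
is_nan_numpy = np.isnan(n)
is_nan_pandas = pd.isna(n)

# NaN propagates through arithmetic
result = 1 + n

print(equal_to_itself, is_nan_numpy, is_nan_pandas, result)
```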

NaN in a Series

# Create a Series that contains NaN
s1 = Series([1,2,np.nan,3,4],index=['A','B','C','D','E'])
s1
A    1.0
B    2.0
C    NaN
D    3.0
E    4.0
dtype: float64
# Check which entries are NaN
s1.isnull()
A    False
B    False
C     True
D    False
E    False
dtype: bool
s1.notnull()
A     True
B     True
C    False
D     True
E     True
dtype: bool
# Drop the NaN entries
s1.dropna()
A    1.0
B    2.0
D    3.0
E    4.0
dtype: float64
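
The boolean masks above can also be used to count missing entries or to filter by hand. The sketch below reuses the same Series; the names `nan_count`, `total`, `total_strict`, and `kept` are illustrative. It also shows that pandas aggregations skip NaN by default, unlike the plain arithmetic shown earlier:

```python
import numpy as np
from pandas import Series

s1 = Series([1, 2, np.nan, 3, 4], index=['A', 'B', 'C', 'D', 'E'])

nan_count = s1.isnull().sum()         # number of missing entries
total = s1.sum()                      # NaN is skipped by default (skipna=True)
total_strict = s1.sum(skipna=False)   # NaN propagates when skipna=False
kept = s1[s1.notnull()]               # boolean mask; same rows as s1.dropna()

print(nan_count, total, total_strict)
print(kept)
```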

NaN in a DataFrame

# Create a DataFrame that contains NaN
df1 = DataFrame(np.random.rand(25).reshape(5,5))
# .ix is deprecated (and removed in recent pandas); use .loc for label-based indexing
df1.loc[2,4] = np.nan
df1.loc[1,3] = np.nan
df1
          0         1         2         3         4
0  0.912220  0.932765  0.827517  0.031858  0.749619
1  0.957043  0.857664  0.616395       NaN  0.562609
2  0.686575  0.016802  0.030477  0.609545       NaN
3  0.543484  0.555226  0.138279  0.979043  0.460136
4  0.870316  0.141909  0.567168  0.116696  0.204007
# Check for NaN
df1.isnull()
       0      1      2      3      4
0  False  False  False  False  False
1  False  False  False   True  False
2  False  False  False  False   True
3  False  False  False  False  False
4  False  False  False  False  False
df1.notnull()
      0     1     2      3      4
0  True  True  True   True   True
1  True  True  True  False   True
2  True  True  True   True  False
3  True  True  True   True   True
4  True  True  True   True   True
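
Since `isnull()` returns a boolean frame, summing it gives NaN counts per column or per row. A small sketch under the assumption of a deterministic frame (built with `np.arange` instead of random data so the counts are reproducible; the variable names are illustrative):

```python
import numpy as np
from pandas import DataFrame

df = DataFrame(np.arange(25, dtype=float).reshape(5, 5))
df.loc[2, 4] = np.nan
df.loc[1, 3] = np.nan

nan_per_column = df.isnull().sum()          # sums down each column (axis=0)
nan_per_row = df.isnull().sum(axis=1)       # sums across each row
total_nan = int(df.isnull().sum().sum())    # total number of NaN cells
has_any_nan = bool(df.isnull().values.any())

print(nan_per_column.tolist(), nan_per_row.tolist(), total_nan, has_any_nan)
```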
# Dropping NaN (a DataFrame is two-dimensional, so it works slightly differently)
# axis=0: drop every row that contains a NaN
df2 = df1.dropna(axis=0)
df2
          0         1         2         3         4
0  0.912220  0.932765  0.827517  0.031858  0.749619
3  0.543484  0.555226  0.138279  0.979043  0.460136
4  0.870316  0.141909  0.567168  0.116696  0.204007
# axis=1: drop every column that contains a NaN
df2 = df1.dropna(axis=1)
df2
          0         1         2
0  0.912220  0.932765  0.827517
1  0.957043  0.857664  0.616395
2  0.686575  0.016802  0.030477
3  0.543484  0.555226  0.138279
4  0.870316  0.141909  0.567168
# How to drop: the how parameter
# any: drop the row or column if it contains at least one NaN
df2 = df1.dropna(axis=0,how='any')
df2
          0         1         2         3         4
0  0.912220  0.932765  0.827517  0.031858  0.749619
3  0.543484  0.555226  0.138279  0.979043  0.460136
4  0.870316  0.141909  0.567168  0.116696  0.204007
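# How to drop: the how parameter
# all: drop the row or column only if every value in it is NaN
# (df1 has no such row, so nothing is dropped here)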
df2 = df1.dropna(axis=0,how='all')
df2
          0         1         2         3         4
0  0.912220  0.932765  0.827517  0.031858  0.749619
1  0.957043  0.857664  0.616395       NaN  0.562609
2  0.686575  0.016802  0.030477  0.609545       NaN
3  0.543484  0.555226  0.138279  0.979043  0.460136
4  0.870316  0.141909  0.567168  0.116696  0.204007
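
The `how='all'` call above leaves the frame unchanged because no row is entirely NaN. To make the difference between the two modes visible, here is a minimal sketch on a small hypothetical frame that does contain an all-NaN row:

```python
import numpy as np
from pandas import DataFrame

df = DataFrame(np.arange(12, dtype=float).reshape(4, 3))
df.loc[1, :] = np.nan    # row 1 is entirely NaN
df.loc[2, 0] = np.nan    # row 2 has a single NaN

dropped_any = df.dropna(axis=0, how='any')   # removes rows 1 and 2
dropped_all = df.dropna(axis=0, how='all')   # removes only row 1

print(list(dropped_any.index), list(dropped_all.index))
```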
# Build new data to test the thresh parameter
df2 = DataFrame(np.random.rand(25).reshape(5,5))
# .ix is deprecated (and removed in recent pandas); use .loc for label-based indexing
df2.loc[2,:] = np.nan
df2.loc[1,3] = np.nan
df2.loc[3,3] = np.nan
df2.loc[3,4] = np.nan
df2
          0         1         2         3         4
0  0.371901  0.140453  0.576335  0.895684  0.233522
1  0.896337  0.719907  0.647172       NaN  0.698708
2       NaN       NaN       NaN       NaN       NaN
3  0.415230  0.601340  0.694270       NaN       NaN
4  0.926047  0.913255  0.586473  0.442759  0.238776
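# thresh is the minimum number of non-NaN values a row (or column) needs in order to be kept;
# with thresh=2, rows with fewer than 2 non-NaN values are dropped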
df3 = df2.dropna(thresh=2)
df3
          0         1         2         3         4
0  0.371901  0.140453  0.576335  0.895684  0.233522
1  0.896337  0.719907  0.647172       NaN  0.698708
3  0.415230  0.601340  0.694270       NaN       NaN
4  0.926047  0.913255  0.586473  0.442759  0.238776
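
Note that `thresh` counts non-NaN values, not NaN values: row 3 above has two NaN entries but three valid ones, so it survives `thresh=2`. A sketch that makes the rule explicit, assuming a deterministic frame with the same NaN pattern (names such as `kept_2` and `kept_4` are illustrative):

```python
import numpy as np
from pandas import DataFrame

df = DataFrame(np.arange(25, dtype=float).reshape(5, 5))
df.loc[2, :] = np.nan
df.loc[1, 3] = np.nan
df.loc[3, 3] = np.nan
df.loc[3, 4] = np.nan

# Number of valid (non-NaN) values in each row
non_nan_per_row = df.notnull().sum(axis=1)

kept_2 = df.dropna(thresh=2)   # keeps rows with at least 2 non-NaN values
kept_4 = df.dropna(thresh=4)   # keeps rows with at least 4 non-NaN values

print(non_nan_per_row.tolist())
print(list(kept_2.index), list(kept_4.index))
```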
# Fill every NaN with a single value
df2.fillna(value=1)
          0         1         2         3         4
0  0.371901  0.140453  0.576335  0.895684  0.233522
1  0.896337  0.719907  0.647172  1.000000  0.698708
2  1.000000  1.000000  1.000000  1.000000  1.000000
3  0.415230  0.601340  0.694270  1.000000  1.000000
4  0.926047  0.913255  0.586473  0.442759  0.238776
# Fill values can also be given per column (the dict keys are column labels)
df2.fillna(value={0:0,1:1,2:2,3:3,4:4})
          0         1         2         3         4
0  0.371901  0.140453  0.576335  0.895684  0.233522
1  0.896337  0.719907  0.647172  3.000000  0.698708
2  0.000000  1.000000  2.000000  3.000000  4.000000
3  0.415230  0.601340  0.694270  3.000000  4.000000
4  0.926047  0.913255  0.586473  0.442759  0.238776
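
The per-column form also accepts a Series indexed by column label, which makes it possible to fill each column with its own mean. This is a sketch on a small hypothetical frame with columns `a` and `b`, not the data from the post; it also shows that columns missing from the dict are left untouched:

```python
import numpy as np
from pandas import DataFrame

df = DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, 4.0, 6.0]})

# df.mean() skips NaN and returns one mean per column (a: 2.0, b: 5.0)
filled_mean = df.fillna(value=df.mean())

# Only column 'a' is listed, so NaN in column 'b' stays as it is
partial = df.fillna(value={'a': 0})

print(filled_mean)
print(partial)
```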

fillna and dropna do not modify the original object; they return a new one, so assign the result to a variable if you want to keep it.
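
A short sketch of that behaviour, using a hypothetical Series `s` (the names `s_filled` and `s_dropped` are illustrative):

```python
import numpy as np
from pandas import Series

s = Series([1.0, np.nan, 3.0])

s.fillna(0)                     # returns a new Series; the result is discarded here
unchanged = s.isnull().sum()    # s still holds its NaN

s_filled = s.fillna(0)          # keep the result by assigning it
s_dropped = s.dropna()

print(unchanged, s_filled.tolist(), s_dropped.tolist())
```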

Origin blog.csdn.net/weixin_43469680/article/details/105600712