11.dealing-with-nan

Handling Missing Values

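  • df_name.isnull().sum().sum()
    • isnull() returns a DataFrame of the same shape, holding True where a value is NaN and False otherwise
    • The first sum() counts the NaN values in each column and returns a Series
    • The second sum() adds up those per-column counts, giving the total number of NaN values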

  • Non-NaN values can be counted in a similar way, with count()
# We print the number of non-NaN values in our DataFrame
print()
print('Number of non-NaN values in the columns of our DataFrame:')
print(store_items.count())
Number of non-NaN values in the columns of our DataFrame:
bikes      3
glasses    2
pants      3
shirts     2
shoes      3
suits      2
watches    3
dtype: int64
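The counting calls above can be tried on a small frame. This is only a sketch: the `store_items` data below is an assumption, rebuilt from the output tables shown in these notes rather than taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Assumed data: rebuilt from the output tables in these notes.
store_items = pd.DataFrame(
    {
        "bikes": [20, 15, 20],
        "glasses": [np.nan, 50, 4],
        "pants": [30, 5, 30],
        "shirts": [15, 2, np.nan],
        "shoes": [8, 5, 10],
        "suits": [45, 7, np.nan],
        "watches": [35, 10, 35],
    },
    index=["store 1", "store 2", "store 3"],
)

nan_per_column = store_items.isnull().sum()  # Series: NaN count per column
total_nan = nan_per_column.sum()             # scalar: NaN count for the whole frame
non_nan_per_column = store_items.count()     # Series: non-NaN count per column

print(nan_per_column)
print("Total NaN values:", total_nan)
print(non_nan_per_column)
```

For every column, the NaN count and the non-NaN count add up to the number of rows.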
  • Handling missing values
    • dropna(axis)
      • axis = 0 drops every row that contains a NaN
      • axis = 1 drops every column that contains a NaN
# We drop any rows with NaN values
store_items.dropna(axis = 0)
         bikes  glasses  pants  shirts  shoes  suits  watches
store 2     15     50.0      5     2.0      5    7.0       10
# We drop any columns with NaN values
store_items.dropna(axis = 1)
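A minimal sketch of both dropna() directions follows. The `store_items` data is an assumption, rebuilt from the output tables in these notes. Note that dropna() returns a new DataFrame and leaves the original unchanged unless inplace=True is passed.

```python
import numpy as np
import pandas as pd

# Assumed data: rebuilt from the output tables in these notes.
store_items = pd.DataFrame(
    {
        "bikes": [20, 15, 20],
        "glasses": [np.nan, 50, 4],
        "pants": [30, 5, 30],
        "shirts": [15, 2, np.nan],
        "shoes": [8, 5, 10],
        "suits": [45, 7, np.nan],
        "watches": [35, 10, 35],
    },
    index=["store 1", "store 2", "store 3"],
)

rows_kept = store_items.dropna(axis=0)  # keep only rows without any NaN
cols_kept = store_items.dropna(axis=1)  # keep only columns without any NaN

print(rows_kept.index.tolist())    # labels of the surviving rows
print(cols_kept.columns.tolist())  # labels of the surviving columns
```

With this data only store 2 has no NaN, and only the bikes, pants, shoes and watches columns are complete.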
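    • fillna()
      • Fill each NaN with the previous value along the chosen axis (forward fill); method = 'backfill' uses the next value instead
        • df_name.fillna(method = 'ffill', axis)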
# We replace NaN values with the next value in the row
store_items.fillna(method = 'backfill', axis = 1)
         bikes  glasses  pants  shirts  shoes  suits  watches
store 1   20.0     30.0   30.0    15.0    8.0   45.0     35.0
store 2   15.0     50.0    5.0     2.0    5.0    7.0     10.0
store 3   20.0      4.0   30.0    10.0   10.0   35.0     35.0
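Recent pandas releases (2.1 and later) deprecate the method argument of fillna() in favour of the dedicated ffill() and bfill() methods. The sketch below uses those methods to show the same forward and backward fills. The `store_items` data is an assumption, rebuilt from the output tables in these notes.

```python
import numpy as np
import pandas as pd

# Assumed data: rebuilt from the output tables in these notes.
store_items = pd.DataFrame(
    {
        "bikes": [20, 15, 20],
        "glasses": [np.nan, 50, 4],
        "pants": [30, 5, 30],
        "shirts": [15, 2, np.nan],
        "shoes": [8, 5, 10],
        "suits": [45, 7, np.nan],
        "watches": [35, 10, 35],
    },
    index=["store 1", "store 2", "store 3"],
)

# Forward fill down each column: a NaN takes the value from the row above.
forward_by_column = store_items.ffill(axis=0)

# Backward fill along each row: a NaN takes the value from the column to its right.
backward_by_row = store_items.bfill(axis=1)

print(forward_by_column)
print(backward_by_row)
```

A forward fill cannot fill a NaN that has no earlier value, so the glasses entry for store 1 stays NaN in `forward_by_column`.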
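    • df_name.interpolate(method = 'linear', axis)
      • Draws a straight line between the known values on either side of a NaN and fills the gap with equally spaced points on that line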
# We replace NaN values by using linear interpolation using row values
store_items.interpolate(method = 'linear', axis = 1)
         bikes  glasses  pants  shirts  shoes  suits  watches
store 1   20.0     25.0   30.0    15.0    8.0   45.0     35.0
store 2   15.0     50.0    5.0     2.0    5.0    7.0     10.0
store 3   20.0      4.0   30.0    20.0   10.0   22.5     35.0
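With a single NaN between two known values, linear interpolation gives their midpoint, for example (20 + 30) / 2 = 25 for the glasses entry of store 1. A minimal sketch that checks this, assuming the `store_items` data rebuilt from the output tables in these notes:

```python
import numpy as np
import pandas as pd

# Assumed data: rebuilt from the output tables in these notes.
store_items = pd.DataFrame(
    {
        "bikes": [20, 15, 20],
        "glasses": [np.nan, 50, 4],
        "pants": [30, 5, 30],
        "shirts": [15, 2, np.nan],
        "shoes": [8, 5, 10],
        "suits": [45, 7, np.nan],
        "watches": [35, 10, 35],
    },
    index=["store 1", "store 2", "store 3"],
).astype(float)  # one common dtype for all columns before working along rows

# Interpolate along each row, treating neighbouring columns as equally spaced.
interpolated = store_items.interpolate(method="linear", axis=1)

# Midpoint of the two neighbours of the missing glasses value for store 1.
midpoint = (store_items.loc["store 1", "bikes"] + store_items.loc["store 1", "pants"]) / 2

print(interpolated)
print("Midpoint of bikes and pants for store 1:", midpoint)
```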
# Example: fill the missing book ratings with each user's mean rating
import pandas as pd
import numpy as np

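# Show one decimal place; the full option name avoids an ambiguous-key error in newer pandas
pd.set_option('display.precision', 1)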
books = pd.Series(data=[
    'Great Expectations', 'Of Mice and Men', 'Romeo and Juliet',
    'The Time Machine', 'Alice in Wonderland'
])
authors = pd.Series(data=[
    'Charles Dickens', 'John Steinbeck', 'William Shakespeare', 'H. G. Wells',
    'Lewis Carroll'
])
user_1 = pd.Series(data=[3.2, np.nan, 2.5])
user_2 = pd.Series(data=[5., 1.3, 4.0, 3.8])
user_3 = pd.Series(data=[2.0, 2.3, np.nan, 4])
user_4 = pd.Series(data=[4, 3.5, 4, 5, 4.2])

dat = {
    'Book Title': books,
    'Author': authors,
    'User 1': user_1,
    'User 2': user_2,
    'User 3': user_3,
    'User 4': user_4
}

book_ratings = pd.DataFrame(dat)
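# Replace each NaN with the mean of its column; numeric_only skips the text columns
book_ratings.fillna(book_ratings.mean(numeric_only=True), inplace=True)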
book_ratings
            Book Title               Author  User 1  User 2  User 3  User 4
0   Great Expectations      Charles Dickens     3.2     5.0     2.0     4.0
1      Of Mice and Men       John Steinbeck     2.9     1.3     2.3     3.5
2     Romeo and Juliet  William Shakespeare     2.5     4.0     2.8     4.0
3     The Time Machine          H. G. Wells     2.9     3.8     4.0     5.0
4  Alice in Wonderland        Lewis Carroll     2.9     3.5     2.8     4.2
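Every filled-in rating in the table above is the mean of that user's existing ratings. The sketch below makes those column means explicit. It is an illustration under assumptions: the explicit NaN padding stands in for the shorter Series of the example, and the names `ratings` and `column_means` are introduced here only for the sketch.

```python
import numpy as np
import pandas as pd

# Same ratings as in the example, with the missing entries written out as NaN.
ratings = pd.DataFrame(
    {
        "Book Title": [
            "Great Expectations", "Of Mice and Men", "Romeo and Juliet",
            "The Time Machine", "Alice in Wonderland",
        ],
        "User 1": [3.2, np.nan, 2.5, np.nan, np.nan],
        "User 2": [5.0, 1.3, 4.0, 3.8, np.nan],
        "User 3": [2.0, 2.3, np.nan, 4.0, np.nan],
        "User 4": [4.0, 3.5, 4.0, 5.0, 4.2],
    }
)

# Mean of each numeric column; NaN entries are skipped, text columns are ignored.
column_means = ratings.mean(numeric_only=True)

# fillna() with a Series matches its labels to column names, one fill value per column.
filled = ratings.fillna(column_means)

print(column_means)
print(filled)
```

Passing numeric_only=True matters on newer pandas versions, where taking the mean of a frame that still contains text columns raises an error.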

Reposted from blog.csdn.net/a245293206/article/details/89956269