Handling missing data
Use **df.isnull()** to detect missing data; it returns True wherever a value is missing.
import pandas as pd
import numpy as np
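df = pd.read_csv('my_csv_date.csv', encoding='gb2312',
                 na_values=['null', 'None'],  # also treat these strings as missing
                 dtype={'电话': str})
print(df[['数据', '重数', '价格']])
# Number of missing values in each row
print(np.sum(df[['数据', '重数', '价格']].isnull(), axis=1))
# Number of missing values in each column
print(np.sum(df[['数据', '重数', '价格']].isnull(), axis=0))
# Fraction of missing values in each column
print(df.apply(lambda x: sum(x.isnull()) / len(x), axis=0))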
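The same counts can be reproduced without the CSV file. The following is a minimal sketch on a small in-memory DataFrame; the column names `a`, `b`, `c` and the values are made up for illustration and are not taken from the data set above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data used above.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 6.0, 7.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# isnull() gives a boolean frame; summing it counts the True entries.
missing_per_row = df.isnull().sum(axis=1)   # one count per row
missing_per_col = df.isnull().sum(axis=0)   # one count per column

# The mean of a boolean column is the fraction of missing values in it.
missing_ratio = df.isnull().mean()

print(missing_per_row)
print(missing_per_col)
print(missing_ratio)
```

Calling `.sum()` and `.mean()` on the boolean frame is equivalent to the `np.sum(...)` and `apply(lambda ...)` forms used above, just shorter.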
Remove rows with missing values: df.dropna(how='', axis=)
Parameters:
axis: 0 drops rows that contain missing values; 1 drops columns that contain missing values
how: 'all' drops a row (or column) only when every value in it is missing; 'any' drops the row (or column) as soon as it contains at least one missing value
By default this function returns a new data set with the rows (or columns) removed and does not alter the original data.
inplace: whether to modify the original data in place. It only takes effect when called on the DataFrame itself: **print(df[['数据','重数','价格']].dropna())** below cannot use this parameter, because the column selection is a slice (a copy) and the change would not reach the whole data set, whereas **df.dropna()** can use it.
import pandas as pd
import numpy as np
df=pd.read_csv('my_csv_date.csv',encoding='gb2312',\
na_values=['null','None'],\
dtype={'电话':str,})
print (df[['数据','重数','价格']])
print (df[['数据','重数','价格']].dropna())
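To make the difference between `how='any'` and `how='all'` and between the two `axis` values concrete, here is a small sketch. The frame and its column names are hypothetical and chosen only so that each case drops something different.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "c" is entirely missing,
# rows 1 and 3 are entirely missing in columns "a" and "b".
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [2.0, np.nan, np.nan, np.nan],
    "c": [np.nan, np.nan, np.nan, np.nan],
})

# Drop rows with at least one missing value in the selected columns.
rows_any = df[["a", "b"]].dropna(how="any", axis=0)

# Drop rows only when every selected column is missing.
rows_all = df[["a", "b"]].dropna(how="all", axis=0)

# Drop columns in which every value is missing.
cols_all = df.dropna(how="all", axis=1)

# None of the calls above changed df itself.
shape_before = df.shape

# With inplace=True on the DataFrame itself, df is modified and None is returned.
df.dropna(how="all", axis=1, inplace=True)

print(rows_any, rows_all, cols_all, df, sep="\n\n")
```

Only the last call changes `df`; the earlier ones return new objects and leave the original untouched.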
Fill in missing data: df[''].fillna(value={}, inplace='', method='')
Parameters:
value: the fill value. A dictionary can be used to assign a different fill value to each of several columns, e.g. value={}
There are several ways to fill. A value can be written directly, as in the examples here, or it can be computed from the data, for instance with the mode (df['nonce'].fillna(df['nonce'].head(5).mode()[0], inplace=True)).
Use direct assignment
import pandas as pd
import numpy as np
df=pd.read_csv('my_csv_date.csv',encoding='gb2312', na_values=['null','None'], dtype={'电话':str,})
print(df['重量'].head(5))
# Assign the result back; inplace=True on a selected column may not update df in recent pandas
df['重量']=df['重量'].fillna(df['重量'].head(5).sum())
print(df['重量'].head(5))
Use a dictionary for value assignment
import pandas as pd
import numpy as np
df=pd.read_csv('my_csv_date.csv',encoding='gb2312',na_values=['null','None'],dtype={'电话':str,})
print(df[['重数','重量','价格']])
df.fillna(value={'重数':df['重数'].mode()[0],'重量':df['重量'].sum(),'价格':df['价格'].mode()[0]},inplace=True)
print(df[['重数','重量','价格']])
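The dictionary form can be tried on a small frame as follows. This is only a sketch: the column names `count` and `weight` are hypothetical, and using the mode for one column and the mean for the other is an illustrative choice, not the choice made in the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both columns.
df = pd.DataFrame({
    "count": [1.0, 1.0, np.nan, 2.0],
    "weight": [10.0, np.nan, 30.0, np.nan],
})

# One fill value per column: the most frequent value for "count",
# the average of the known values for "weight" (assumed choices).
fill_values = {
    "count": df["count"].mode()[0],
    "weight": df["weight"].mean(),
}

# fillna returns a new DataFrame; df itself is left unchanged.
filled = df.fillna(value=fill_values)

print(fill_values)
print(filled)
```

Columns that are not listed in the dictionary are left as they are, so the same call can be used on a frame with many more columns.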
Filling with the method parameter of fillna()
When method is 'ffill', each missing value is filled with the previous valid entry; when it is 'bfill', it is filled with the next valid entry.
import pandas as pd
df=pd.read_csv('my_csv_date.csv',encoding='gb2312', na_values=['null','None'],dtype={'电话':str,})
print(df[['重数','重量','价格']])
df.fillna(method='ffill',inplace=True)
print(df[['重数','重量','价格']])
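Recent pandas releases deprecate the `method` argument of `fillna()` in favour of the dedicated `ffill()` and `bfill()` methods. The sketch below shows both directions on a hypothetical single-column frame; the column name `x` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column with gaps at the start, in the middle and at the end.
df = pd.DataFrame({"x": [np.nan, 1.0, np.nan, np.nan, 4.0, np.nan]})

# Forward fill: each gap takes the previous valid value.
# The leading NaN stays, because nothing precedes it.
forward = df.ffill()

# Backward fill: each gap takes the next valid value.
# The trailing NaN stays, because nothing follows it.
backward = df.bfill()

print(forward)
print(backward)
```

Because forward fill cannot fill a leading gap and backward fill cannot fill a trailing one, it is worth checking `isnull()` again after filling.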