Cleaning missing data with pandas

Handling missing data

Use df.isnull() to detect missing data. It returns an object of the same shape that is True wherever a value is missing.

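import pandas as pd
import numpy as np

# na_values: extra strings to treat as missing; dtype keeps the phone column (电话) as text
df = pd.read_csv('my_csv_date.csv', encoding='gb2312',
                 na_values=['null', 'None'],
                 dtype={'电话': str})
cols = ['数据', '重数', '价格']
print(df[cols])
print(np.sum(df[cols].isnull(), axis=1))   # number of missing values in each row
print(np.sum(df[cols].isnull(), axis=0))   # number of missing values in each column
print(df.apply(lambda x: sum(x.isnull()) / len(x), axis=0))   # fraction missing per column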
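The code above needs my_csv_date.csv, so here is a self-contained sketch on a small hypothetical DataFrame (same column names, made-up values) that shows what the counts look like. It uses the method form .isnull().sum(axis=...) and .mean(), which give the same numbers as np.sum(..., axis=...) and the apply/lambda line.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for my_csv_date.csv
df = pd.DataFrame({
    "数据": [1.0, np.nan, 3.0, 4.0],
    "重数": [np.nan, np.nan, 2.0, 5.0],
    "价格": [9.5, 8.0, np.nan, 7.0],
})

mask = df.isnull()            # True where a value is missing
per_row = mask.sum(axis=1)    # missing values in each row
per_col = mask.sum(axis=0)    # missing values in each column
ratio = mask.mean()           # fraction missing per column (True counts as 1)

print(per_row.tolist())   # [1, 2, 1, 0]
print(per_col.tolist())   # [1, 2, 1]
print(ratio.tolist())     # [0.25, 0.5, 0.25]
```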


Drop rows or columns with missing values: df.dropna(axis=..., how=..., inplace=...)

Parameters:
axis: 0 (the default) drops every row that contains missing values; 1 drops every column that contains missing values.
how: 'all' drops a row (or column) only when all of its values are missing; 'any' (the default) drops it as soon as any one value is missing.
By default the function returns a new DataFrame with those rows or columns removed and does not alter the original data.
inplace: whether to modify the original object instead. It is meant for the DataFrame itself: df.dropna(inplace=True) changes df and returns None. In print(df[['数据','重数','价格']].dropna()) the parameter is of no use, because the column selection creates a new object, so dropping in place would never reach df.
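A minimal sketch of how axis and how interact, on a hypothetical three-column frame in which column "c" is entirely missing (the column names and values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: row 1 and column "c" are entirely missing
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, np.nan, np.nan],
    "c": [np.nan, np.nan, np.nan],
})

rows_any = df.dropna(axis=0, how="any")   # drop a row if ANY value is missing
rows_all = df.dropna(axis=0, how="all")   # drop a row only if ALL values are missing
cols_all = df.dropna(axis=1, how="all")   # drop a column only if ALL values are missing

print(len(rows_any))              # 0  (every row has a NaN in "c")
print(rows_all.index.tolist())    # [0, 2]
print(cols_all.columns.tolist())  # ['a', 'b']
print(df.shape)                   # (3, 3)  the original is unchanged
```

With how="any" a single fully empty column is enough to wipe out every row, which is why checking the per-column missing counts first is worthwhile.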

import pandas as pd
import numpy as np 

df=pd.read_csv('my_csv_date.csv',encoding='gb2312',\
	na_values=['null','None'],\
	dtype={'电话':str,})
print (df[['数据','重数','价格']])
print(df[['数据','重数','价格']].dropna())   # rows with any missing value in these columns are dropped; df itself is unchanged
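To make the inplace behaviour concrete, here is a small sketch on a hypothetical two-column frame: without inplace a new object is returned, and with inplace=True on the DataFrame itself, df is modified and the call returns None.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"数据": [1.0, np.nan, 3.0], "价格": [5.0, 6.0, np.nan]})

# Without inplace: a new DataFrame is returned and df is untouched
cleaned = df[["数据", "价格"]].dropna()
print(len(cleaned), len(df))   # 1 3

# With inplace on the whole DataFrame: df itself shrinks and None is returned
result = df.dropna(inplace=True)
print(result, len(df))         # None 1
```

Writing df = df.dropna() is the common alternative to inplace=True and behaves the same way for the whole DataFrame.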


Fill in missing data: df[''].fillna(value={}, inplace=..., method='')

Parameters:
value: a single fill value, or a dictionary that maps column names to fill values so that several columns are filled at once, e.g. value={...}.
There are many ways to choose the fill value. You can write it directly, as in the first example below, or compute it from the data, for example the mode of the first five rows: df['重数'] = df['重数'].fillna(df['重数'].head(5).mode()[0]).
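A self-contained sketch of the mode-based fill, using a hypothetical 重数 column. It assigns the result back instead of calling fillna(..., inplace=True) on df['重数'], because recent pandas versions warn that such chained in-place calls may not update df.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column
df = pd.DataFrame({"重数": [2.0, np.nan, 2.0, 3.0, np.nan, 7.0]})

# mode() ignores NaN and returns a Series; [0] takes the first (most frequent) value
fill_value = df["重数"].head(5).mode()[0]
df["重数"] = df["重数"].fillna(fill_value)

print(fill_value)           # 2.0
print(df["重数"].tolist())  # [2.0, 2.0, 2.0, 3.0, 2.0, 7.0]
```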

Filling with a value directly

import pandas as pd
import numpy as np 

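df = pd.read_csv('my_csv_date.csv', encoding='gb2312',
                 na_values=['null', 'None'], dtype={'电话': str})
print(df['重量'].head(5))
# Fill the gaps in 重量 with the sum of its first five values (sum() skips NaN).
# Assigning back avoids a chained inplace=True call, which newer pandas may not apply to df.
df['重量'] = df['重量'].fillna(df['重量'].head(5).sum())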
print(df['重量'].head(5))
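The same idea on a hypothetical in-memory 重量 column, showing that sum() adds up only the non-missing values before the result is used as the fill value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column
df = pd.DataFrame({"重量": [1.5, np.nan, 2.5, np.nan, 4.0, np.nan]})

# sum() skips NaN by default, so this adds up only 1.5, 2.5 and 4.0
fill_value = df["重量"].head(5).sum()
df["重量"] = df["重量"].fillna(fill_value)

print(fill_value)           # 8.0
print(df["重量"].tolist())  # [1.5, 8.0, 2.5, 8.0, 4.0, 8.0]
```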


Filling with a dictionary of per-column values

import pandas as pd
import numpy as np 

df = pd.read_csv('my_csv_date.csv', encoding='gb2312',
                 na_values=['null', 'None'], dtype={'电话': str})
print(df[['重数', '重量', '价格']])
# One fill value per column: the mode for 重数 and 价格, the column sum for 重量
df.fillna(value={'重数': df['重数'].mode()[0],
                 '重量': df['重量'].sum(),
                 '价格': df['价格'].mode()[0]}, inplace=True)
print(df[['重数','重量','价格']])
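A self-contained sketch of the dictionary form on a hypothetical frame with the same three columns; each key picks the column and each value is what fills that column's gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "重数": [1.0, 1.0, np.nan, 3.0],
    "重量": [2.0, np.nan, 3.0, np.nan],
    "价格": [np.nan, 9.9, 9.9, 5.0],
})

fill_values = {
    "重数": df["重数"].mode()[0],   # most frequent value -> 1.0
    "重量": df["重量"].sum(),       # NaN-skipping sum -> 5.0
    "价格": df["价格"].mode()[0],   # most frequent value -> 9.9
}
df = df.fillna(value=fill_values)  # columns missing from the dict would be left as they are

print(df.isnull().sum().sum())  # 0
print(df["重量"].tolist())       # [2.0, 5.0, 3.0, 5.0]
```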


Filling with the method parameter of fillna()

method='ffill' fills each missing value with the last valid value before it (forward fill); method='bfill' fills it with the next valid value after it (backward fill).

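import pandas as pd
df = pd.read_csv('my_csv_date.csv', encoding='gb2312',
                 na_values=['null', 'None'], dtype={'电话': str})
print(df[['重数','重量','价格']])
# Forward fill: each missing value takes the last valid value above it.
# Same as df.fillna(method='ffill', inplace=True), which pandas 2.1 deprecated.
df.ffill(inplace=True)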
print(df[['重数','重量','价格']])
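A small sketch on a hypothetical Series contrasting the two directions. Note that a gap at the very start cannot be forward-filled and a gap at the very end cannot be backward-filled, so those stay NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps at both ends and in the middle
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

forward = s.ffill()    # same result as s.fillna(method="ffill") in older pandas
backward = s.bfill()   # same result as s.fillna(method="bfill") in older pandas

print(forward.tolist())   # [nan, 1.0, 1.0, 1.0, 4.0, 4.0]
print(backward.tolist())  # [1.0, 1.0, 4.0, 4.0, 4.0, nan]
```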




Source: blog.csdn.net/weixin_43794311/article/details/104970207