Data cleaning with pandas: handling outliers

Outlier handling

First, compute upper and lower limits from the data. Then check whether each value falls inside that range. Values outside it can be flagged or replaced.
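A minimal sketch of this range check follows. It assumes the limits are the mean plus or minus k standard deviations, and k = 1 reproduces the limits used in the scripts in this post. The helper names outlier_limits and outlier_mask and the sample heights are hypothetical.

```python
import pandas as pd


def outlier_limits(values: pd.Series, k: float = 1.0) -> tuple[float, float]:
    """Hypothetical helper: return (lower, upper) = mean -/+ k * std."""
    mean = values.mean()
    std = values.std()  # pandas default: sample standard deviation (ddof=1)
    return mean - k * std, mean + k * std


def outlier_mask(values: pd.Series, k: float = 1.0) -> pd.Series:
    """Hypothetical helper: boolean Series, True where a value lies outside the limits."""
    lower, upper = outlier_limits(values, k)
    return (values < lower) | (values > upper)


heights = pd.Series([160, 165, 170, 175, 180, 220])  # made-up sample data
lower, upper = outlier_limits(heights)
print(round(lower, 2), round(upper, 2))  # 156.73 199.94
print(heights[outlier_mask(heights)].tolist())  # [220]
```

With k = 1, about 32% of normally distributed data would be flagged. For that reason, wider bands such as k = 2 or k = 3 are also common. Which k to use depends on the data.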

Common statistics functions:

Quantile: df.身高.quantile(0.5)  # the 0.5 quantile, i.e. the median
Median: df.身高.median()
Mean: df.身高.mean()
Standard deviation: df.身高.std()
Summary statistics: df.身高.describe()
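The following sketch shows what each function returns. It runs them on a small made-up DataFrame that uses the same column name as the CSV in this post. The values are illustrative only.

```python
import pandas as pd

# Made-up heights (cm) stored under the same column name as the article's CSV
df = pd.DataFrame({"身高": [160, 165, 170, 175, 180, 220]})

print(df.身高.quantile(0.5))  # 172.5, the 0.5 quantile
print(df.身高.median())       # 172.5, identical to the 0.5 quantile
print(df.身高.mean())         # arithmetic mean
print(df.身高.std())          # sample standard deviation (ddof=1)
print(df.身高.describe())     # count, mean, std, min, 25%, 50%, 75%, max
```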

Checking whether any outliers exist with any()

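import pandas as pd

# Read the GBK-encoded CSV; 身高 ("height") is the column checked for outliers
df = pd.read_csv('test_innom.csv', encoding='gbk')
print(df.身高)

# Limits: mean minus / plus one standard deviation
df_mean = df.身高.mean()
df_std = df.身高.std()
min_da = df_mean - df_std
max_da = df_mean + df_std

# True if at least one value lies outside [min_da, max_da]
print(((df.身高 < min_da) | (df.身高 > max_da)).any())

# Print each outlier (%s keeps decimals that %d would truncate)
for x in df.身高:
    if x < min_da or x > max_da:
        print("Outlier: %s" % x)

# The summary functions listed above
print(df_mean)
print(df_std)
print(df.身高.describe())
print(df.身高.median())
print(df.身高.quantile(0.5))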
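The loop above prints outliers one at a time. The same check can also be done with a single boolean mask, which can then be reused for counting and selecting. In this sketch a small made-up DataFrame stands in for test_innom.csv, so it runs without the file.

```python
import pandas as pd

# In-memory stand-in for test_innom.csv (made-up values)
df = pd.DataFrame({"身高": [160, 165, 170, 175, 180, 220]})

min_da = df.身高.mean() - df.身高.std()
max_da = df.身高.mean() + df.身高.std()

# One boolean mask replaces the per-row loop
is_outlier = (df.身高 < min_da) | (df.身高 > max_da)

print(is_outlier.any())                     # True: at least one outlier exists
print(df.loc[is_outlier, "身高"].tolist())  # [220]
print(is_outlier.sum())                     # number of outliers: 1
```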

Replacing outliers with the largest in-range value

Use .loc to select the positions that need to change, then assign the replacement value to them. The following example replaces every value above the upper limit with the largest value that does not exceed that limit.

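import pandas as pd

# Load the data and compute the same limits as before: mean minus / plus one standard deviation
df = pd.read_csv('test_innom.csv', encoding='gbk')
print(df.身高)
df_mean = df.身高.mean()
df_std = df.身高.std()
min_da = df_mean - df_std
max_da = df_mean + df_std
print(((df.身高 < min_da) | (df.身高 > max_da)).any())
for x in df.身高:
    if x < min_da or x > max_da:
        print("Outlier: %s" % x)

# Largest value that does not exceed the upper limit
rep_val_max = df.身高[df.身高 <= max_da].max()
# Overwrite every value above the upper limit with that replacement
df.loc[df.身高 > max_da, '身高'] = rep_val_max
print(df.身高)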
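The script above only caps values on the high side. The sketch below handles both sides the same way, replacing low outliers with the smallest in-range value. The data are made up, and the two-sided treatment is an extension of the original example rather than part of it.

```python
import pandas as pd

# Made-up heights with one value far below and one far above the rest
df = pd.DataFrame({"身高": [100, 165, 168, 170, 172, 175, 240]})

mean, std = df.身高.mean(), df.身高.std()
min_da, max_da = mean - std, mean + std

# between() is inclusive on both ends by default
in_range = df.身高.between(min_da, max_da)
rep_val_max = df.身高[in_range].max()  # largest value that is not an outlier
rep_val_min = df.身高[in_range].min()  # smallest value that is not an outlier

# Build both masks before modifying the column
too_high = df.身高 > max_da
too_low = df.身高 < min_da
df.loc[too_high, "身高"] = rep_val_max
df.loc[too_low, "身高"] = rep_val_min
print(df.身高.tolist())  # [165, 165, 168, 170, 172, 175, 175]
```

If capping at the limits themselves is acceptable, Series.clip(lower=min_da, upper=max_da) handles both sides in one call. However, it writes the limit values, which may be non-integer, rather than values actually observed in the data.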

Source: blog.csdn.net/weixin_43794311/article/details/104981892