Outlier handling
First compute an upper and a lower limit from the data, then check whether each value falls within that range. Values outside the range are treated as outliers and can be replaced or handled in some other way.
Common statistical functions (the column name 身高 means "height"):
Quantile: df.身高.quantile(0.5)  # the 0.5 quantile, which is the median
Median: df.身高.median()
Mean: df.身高.mean()
Standard deviation: df.身高.std()
Summary statistics: df.身高.describe()
Checking whether any outlier exists: any()
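The functions listed above can be tried on a small in-memory column. This is only a minimal sketch: the `height` column and its values are made up for illustration and do not come from `test_innom.csv`.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"height": [160, 165, 170, 175, 180]})

median_q = df["height"].quantile(0.5)  # the 0.5 quantile
median = df["height"].median()         # same value as the 0.5 quantile
mean = df["height"].mean()
std = df["height"].std()               # sample standard deviation (ddof=1)
summary = df["height"].describe()      # count, mean, std, min, quartiles, max

# Series.any() is True if at least one element of the boolean mask is True.
has_tall = (df["height"] > 178).any()

print(median_q, median, mean, std)
print(summary)
print(has_tall)
```

Note that `std()` in pandas divides by n - 1 by default, so the limits derived from it are based on the sample standard deviation.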
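import pandas as pd
import numpy as np

# Load the data; the 身高 column holds heights
df = pd.read_csv('test_innom.csv', encoding='gbk')
print(df.身高)

# Limits: one standard deviation either side of the mean
df_mean = df.身高.mean()
df_std = df.身高.std()
min_da = df_mean - df_std
max_da = df_mean + df_std

# True if at least one value lies outside the limits
print(any((df.身高 < min_da) | (df.身高 > max_da)))

# Print every value that lies outside the limits
for x in df.身高:
    if (x < min_da) or (x > max_da):
        print("Outlier: %d" % x)

print(df_mean)
print(df_std)
print(df.身高.describe())
print(df.身高.median())
print(df.身高.quantile(0.5))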
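The same check can also be written without an explicit loop by keeping the boolean mask and using it to select the outlying rows. The following is a sketch under stated assumptions: the `height` column and its values are hypothetical, and the limits are the mean plus or minus one standard deviation, as in the example above.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"height": [168, 170, 171, 169, 172, 170, 210]})

mean = df["height"].mean()
std = df["height"].std()
lower, upper = mean - std, mean + std

# Boolean mask: True where a value lies outside [lower, upper].
mask = (df["height"] < lower) | (df["height"] > upper)

# Select only the outlying values.
outliers = df.loc[mask, "height"]

print(mask.any())         # is there at least one outlier?
print(outliers.tolist())  # which values are they?
```

A band of one standard deviation is fairly narrow; wider bands such as two or three standard deviations are also commonly used, and only the computation of `lower` and `upper` would change.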
Replacing outliers with the largest in-range value
Use loc to select the positions that need to be modified and assign the replacement value to them. The following example replaces every value above the upper limit with the largest value in the data that does not exceed that limit.
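import pandas as pd
import numpy as np

df = pd.read_csv('test_innom.csv', encoding='gbk')
print(df.身高)

# Limits: one standard deviation either side of the mean
df_mean = df.身高.mean()
df_std = df.身高.std()
min_da = df_mean - df_std
max_da = df_mean + df_std
print(any((df.身高 < min_da) | (df.身高 > max_da)))

for x in df.身高:
    if (x < min_da) or (x > max_da):
        print("Outlier: %d" % x)

# Largest value that does not exceed the upper limit
rep_val_max = df.身高[df.身高 <= max_da].max()
# Overwrite values above the upper limit with it
df.loc[df.身高 > max_da, '身高'] = rep_val_max
print(df.身高)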
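The example above only handles values above the upper limit. A possible extension, sketched here with hypothetical data and a hypothetical `height` column, applies the same loc-based replacement on both sides: values below the lower limit get the smallest in-range value, and values above the upper limit get the largest in-range value.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"height": [120, 168, 170, 171, 169, 172, 170, 210]})

# Compute the limits once, before any value is modified.
mean = df["height"].mean()
std = df["height"].std()
lower, upper = mean - std, mean + std

# Values that lie inside the limits supply the replacement values.
in_range = df.loc[(df["height"] >= lower) & (df["height"] <= upper), "height"]
rep_val_min = in_range.min()
rep_val_max = in_range.max()

# Replace outliers on each side with the nearest in-range extreme.
df.loc[df["height"] < lower, "height"] = rep_val_min
df.loc[df["height"] > upper, "height"] = rep_val_max

print(df["height"].tolist())
```

The limits are computed before the assignments so that replacing one side does not shift the mean and standard deviation used for the other side.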