Part of the data
1. Convert the registration date (Boarding_time) to a datetime type
sec_cars.Boarding_time = pd.to_datetime(sec_cars.Boarding_time,format='%Y年%m月')
The first argument of pd.to_datetime is the date column to convert (here, the dates in their original string form). The second argument, format, describes the layout of those strings so that they can be parsed.
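As a minimal sketch of how the format string is matched, the snippet below parses two made-up values. The sample strings are an assumption about what the raw column looks like; only the column name and the format string come from the code above.

```python
import pandas as pd

# Hypothetical sample values in the "YYYY年MM月" layout assumed by the format string
demo = pd.DataFrame({"Boarding_time": ["2016年5月", "2012年11月"]})

# %Y matches the four-digit year, %m the month; 年 and 月 are matched literally
demo["Boarding_time"] = pd.to_datetime(demo["Boarding_time"], format="%Y年%m月")

print(demo.dtypes)
print(demo)
```

Since the strings carry no day, each parsed value falls on the first day of its month.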
2. Convert the new-car price (New_price) to float
sec_cars.New_price = sec_cars.New_price.str[:-1].astype('float')
NOTE: .str[:-1] drops the last character of each string (the unit suffix), and astype() casts the column to the given type.
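A small sketch of the same conversion on made-up data. The values and the "万" suffix are assumptions for illustration; the slice would drop whatever single trailing character the real column has.

```python
import pandas as pd

# Hypothetical prices with a one-character unit suffix (assumed here to be "万")
demo = pd.DataFrame({"New_price": ["12.50万", "8.98万"]})

# Slice off the last character of every string, then cast the result to float
demo["New_price"] = demo["New_price"].str[:-1].astype("float")

print(demo["New_price"].dtype)
print(demo)
```

If some rows lacked the suffix, the slice would cut off a digit instead, so it may be worth checking the raw values first.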
3. Compute the skewness and kurtosis of all numeric variables in one pass, to check whether the data show a "sharp peak, heavy tail" shape
# Select all numeric variables
num_variables = sec_cars.columns[sec_cars.dtypes!='object'][1:]
[sec_cars.dtypes != 'object'] keeps the columns whose dtype is not object and returns the names of the columns that satisfy the condition;
[1:] takes the second through the last of those names, which removes the date field Boarding_time.
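The selection can be traced on a tiny frame. This is a sketch: the column names other than Boarding_time and New_price are hypothetical, and the order of the columns is an assumption that the [1:] slice depends on.

```python
import pandas as pd

# Hypothetical frame: one string column, the date column, two numeric columns
demo = pd.DataFrame({
    "Brand": pd.Series(["A", "B"], dtype="object"),
    "Boarding_time": pd.to_datetime(["2016-05-01", "2012-11-01"]),
    "Km": [3.9, 8.0],
    "New_price": [12.5, 8.98],
})

# Step 1: keep every column whose dtype is not object (the date column survives)
non_object = demo.columns[demo.dtypes != "object"]

# Step 2: drop the first survivor, which is Boarding_time in this column order
num_variables = non_object[1:]

# Alternative that does not rely on column order: select numeric dtypes directly
by_dtype = demo.select_dtypes(include="number").columns

print(list(non_object), list(num_variables), list(by_dtype))
```

The slice only works while the date column is the first non-object column; select_dtypes avoids that dependency.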
Define a function to calculate skewness and kurtosis
def skew_kurt(x):
    skewness = x.skew()  # skewness of the column
    kurtosis = x.kurt()  # kurtosis of the column
    # Return both values as a labelled Series
    return pd.Series([skewness, kurtosis], index=['Skew', 'kurt'])

# Apply the function to every numeric column
print(sec_cars[num_variables].apply(func=skew_kurt, axis=0))
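To see what the result looks like, here is a self-contained sketch on made-up numbers (the column names and values are hypothetical). Note that pandas' kurt() reports excess kurtosis, so a normal distribution scores about 0 and clearly positive values point to heavier tails.

```python
import pandas as pd

def skew_kurt(x):
    # Return skewness and excess kurtosis of one column as a labelled Series
    return pd.Series([x.skew(), x.kurt()], index=["Skew", "kurt"])

# Hypothetical data: "Km" has one large outlier, "New_price" is symmetric
demo = pd.DataFrame({
    "Km": [1.0, 2.0, 3.0, 4.0, 20.0],
    "New_price": [5.0, 6.0, 7.0, 8.0, 9.0],
})

# axis=0 passes each column to the function; the result has one column per variable
result = demo.apply(func=skew_kurt, axis=0)
print(result)
```

The outlier column should show positive skewness, while the symmetric column should sit at about zero.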
4. Describe the character (string) variables
print(sec_cars.describe(include=['object']))  # summary statistics for the string columns
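For object columns, describe reports count, unique, top and freq rather than mean and quantiles. A minimal sketch on made-up data (all column names and values here are hypothetical):

```python
import pandas as pd

# Hypothetical frame with two string columns and one numeric column
demo = pd.DataFrame({
    "Brand": pd.Series(["A", "B", "A"], dtype="object"),
    "Gearbox": pd.Series(["manual", "auto", "auto"], dtype="object"),
    "Km": [3.9, 8.0, 5.2],
})

# include=['object'] restricts the summary to the object-dtype columns
summary = demo.describe(include=["object"])
print(summary)
```

Here top is the most frequent value in each column and freq is how often it occurs.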
Overall, I have been a little distracted lately with too many things going on, so the quality of my recent articles has not been good. I will go back and correct the mistakes.