Describing sampling distributions with Python

Using the case data provided by 木东居士 (Mudong Jushi), this post checks how the data are distributed. Reference link: https://www.jianshu.com/p/6522cd0f4278. Thanks to both of them.

Only the code is posted here; the result plots are not included.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Read the data
df = pd.read_excel('C://Users//zxy//Desktop//data.xlsx', usecols=[1, 2, 3])


1. Group the data by port of embarkation and compute summary statistics of age and fare for each port.
df1 = df.groupby(['Embarked'])
df1.describe()

or 
# Coefficient of variation = standard deviation / mean
def cv(data):
    return data.std() / data.mean()
df2 = df.groupby(['Embarked']).agg(['count', 'min', 'max', 'median', 'mean', 'var', 'std', cv])
df2 = df2.apply(lambda x: round(x, 2))
df2_age = df2['Age']
df2_fare = df2['Fare']
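The grouped summary can be tried on a tiny hypothetical DataFrame. The column names Embarked, Age and Fare are assumed to match the spreadsheet, and the values are made up. `agg` accepts a mix of built-in statistic names and a plain function such as `cv`, whose function name becomes the column label:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the spreadsheet (columns assumed: Embarked, Age, Fare)
df = pd.DataFrame({
    "Embarked": ["S", "S", "C", "C", "Q", "Q"],
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 21.08],
})

def cv(data):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return data.std() / data.mean()

# One row per port; the columns form a (variable, statistic) MultiIndex
summary = df.groupby("Embarked").agg(["count", "mean", "std", cv]).round(2)
fare_summary = summary["Fare"]  # statistics for the Fare column only
print(fare_summary)
```

`DataFrame.round(2)` does the same job as the `apply(lambda x: round(x, 2))` line above.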

# 2. Plot the fare distribution and check which distribution the data follow
# 2.1 Fare histogram:
plt.hist(df['Fare'], 20, density=True, alpha=0.75)
plt.title('Fare')
plt.show()

normaltest_test = stats.normaltest(df['Fare'], axis=0)
# All three tests above gave p < 5%, so the fare data are not normally distributed.
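The post mentions three normality tests, but only `stats.normaltest` is shown. Below is a minimal sketch that runs three common tests side by side. Choosing Shapiro-Wilk and Kolmogorov-Smirnov as the other two is an assumption, and the log-normal sample is a hypothetical stand-in for the fares:

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed sample standing in for the fares (assumption: log-normal shape)
rng = np.random.default_rng(0)
fare = rng.lognormal(mean=2.5, sigma=1.0, size=500)

results = {
    # D'Agostino-Pearson test, based on skewness and kurtosis
    "normaltest": stats.normaltest(fare).pvalue,
    # Shapiro-Wilk test
    "shapiro": stats.shapiro(fare).pvalue,
    # Kolmogorov-Smirnov test against a normal with the sample's own mean and std
    # (estimating the parameters from the data makes this p-value only approximate)
    "kstest": stats.kstest(fare, "norm", args=(fare.mean(), fare.std(ddof=1))).pvalue,
}
for name, p in results.items():
    verdict = "reject normality" if p < 0.05 else "cannot reject normality"
    print(f"{name}: p = {p:.3g} -> {verdict}")
```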


# Fit a normal distribution and plot the fitted curve:
fare = df['Fare']

plt.figure()
fare.plot(kind='kde')  # kernel density estimate of the raw data

M_S = stats.norm.fit(fare)  # fitted normal: mean (loc) and standard deviation (scale)
normalDistribution = stats.norm(M_S[0], M_S[1])  # frozen fitted normal, used for plotting
x = np.linspace(normalDistribution.ppf(0.01), normalDistribution.ppf(0.99), 100)
plt.plot(x, normalDistribution.pdf(x), c='orange')
plt.xlabel('Fare about Titanic')
plt.title('Titanic [Fare] on NormalDistribution', size=20)
plt.legend(['Origin', 'NormDistribution'])
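`stats.norm.fit` returns maximum-likelihood estimates. This means the fitted scale is the standard deviation computed with ddof=0, not the ddof=1 value used later for the sampling distribution. Here is a quick check on a few hypothetical fares:

```python
import numpy as np
from scipy import stats

fare = np.array([7.25, 71.28, 7.92, 53.10, 8.05, 8.46, 51.86, 21.08])  # hypothetical fares

loc, scale = stats.norm.fit(fare)
# norm.fit returns maximum-likelihood estimates: the sample mean and the
# standard deviation with ddof=0 (dividing by n, not n - 1)
print(loc, fare.mean())
print(scale, fare.std(ddof=0))

fitted = stats.norm(loc, scale)         # frozen distribution, as in the plotting code
x = np.linspace(fitted.ppf(0.01), fitted.ppf(0.99), 100)
density = fitted.pdf(x)                 # y-values for the orange curve
print(fitted.ppf(0.01))                 # lower plotting bound
```

For skewed data like these, `fitted.ppf(0.01)` is negative. The fitted normal therefore puts probability on fares below zero, which is one visible sign of a poor fit.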


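# Do the fares follow a t distribution?
T_S = stats.t.fit(fare)  # fitted degrees of freedom, loc, scale
df_t, loc_t, scale_t = T_S  # separate name so the DataFrame df is not overwritten
x2 = stats.t.rvs(df=df_t, loc=loc_t, scale=scale_t, size=len(fare))
D, P = stats.ks_2samp(fare, x2)
# p < alpha: reject the null hypothesis; the fare data do not follow a t distribution.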
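Comparing the fares with a single random draw from the fitted t distribution gives a p-value that changes from run to run. The sketch below runs that check next to a one-sample `stats.kstest` against the fitted CDF, using hypothetical log-normal data. The parameters are estimated from the same data, so the one-sample p-value is only approximate. It tends to be too large, which means a rejection is still meaningful:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
fare = rng.lognormal(mean=2.5, sigma=1.0, size=2000)  # hypothetical skewed "fares"

df_t, loc_t, scale_t = stats.t.fit(fare)  # degrees of freedom, location, scale

# Approach used in the post: compare the data with a random sample drawn from the fit
x2 = stats.t.rvs(df_t, loc=loc_t, scale=scale_t, size=len(fare), random_state=rng)
d_two, p_two = stats.ks_2samp(fare, x2)

# Alternative: compare the data directly with the fitted CDF (no extra randomness)
d_one, p_one = stats.kstest(fare, "t", args=(df_t, loc_t, scale_t))

alpha = 0.05
print(f"two-sample p = {p_two:.3g}, one-sample p = {p_one:.3g}")
print("reject t distribution:", p_two < alpha, p_one < alpha)
```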

# Fit a t distribution to the fare data:
plt.figure()
fare.plot(kind='kde')
TDistribution = stats.t(T_S[0], T_S[1], T_S[2])  # frozen fitted t distribution, used for plotting
x = np.linspace(TDistribution.ppf(0.01), TDistribution.ppf(0.99), 100)
plt.plot(x, TDistribution.pdf(x), c='orange')
plt.xlabel('Fare about Titanic')
plt.title('Titanic [Fare] on TDistribution', size=20)
plt.legend(['Origin', 'TDistribution'])

# Do the fares follow a chi-square distribution?
chi_S = stats.chi2.fit(fare)
df_chi = chi_S[0]
loc_chi = chi_S[1]
scale_chi = chi_S[2]
x2 = stats.chi2.rvs(df=df_chi, loc=loc_chi, scale=scale_chi, size=len(fare))
Dk, Pk = stats.ks_2samp(fare, x2)  # p < alpha: the fares do not follow a chi-square distribution either

# Fit a chi-square distribution to the fare data:
plt.figure()
fare.plot(kind='kde')
chiDistribution = stats.chi2(chi_S[0], chi_S[1], chi_S[2])  # frozen fitted chi-square distribution, used for plotting
x = np.linspace(chiDistribution.ppf(0.01), chiDistribution.ppf(0.99), 100)
plt.plot(x, chiDistribution.pdf(x), c='orange')
plt.xlabel('Fare about Titanic')
plt.title('Titanic [Fare] on Chi-square_Distribution', size=20)
plt.legend(['Origin', 'Chi-square_Distribution'])
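Instead of fitting and testing each family separately, the KS statistic D can rank several candidates in one loop. D is the largest gap between the empirical CDF and the fitted CDF. The sketch below uses hypothetical log-normal data. `lognorm` is an extra candidate that the post does not try, and it is fitted with its location fixed at 0:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
fare = rng.lognormal(mean=2.5, sigma=1.0, size=1000)  # hypothetical skewed "fares"

# Candidate families: the three tried in the post, plus log-normal as an extra guess
candidates = {
    "norm": stats.norm,
    "t": stats.t,
    "chi2": stats.chi2,
    "lognorm": stats.lognorm,
}

ks_stats = {}
for name, dist in candidates.items():
    if name == "lognorm":
        params = dist.fit(fare, floc=0)  # pin the location at 0: fares are positive
    else:
        params = dist.fit(fare)
    # One-sample KS statistic: largest gap between empirical and fitted CDFs
    ks_stats[name] = stats.kstest(fare, dist.cdf, args=params).statistic

for name, d in sorted(ks_stats.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} D = {d:.3f}")
```

A smaller D means a closer fit. This ranks the candidates, but it is not a formal test once the parameters have been estimated from the data.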

# Group by port and check whether the fare difference between ports S and Q follows a particular distribution
S_fare = df[df['Embarked'] == 'S']['Fare']
Q_fare = df[df['Embarked'] == 'Q']['Fare']
C_fare = df[df['Embarked'] == 'C']['Fare']
S_fare.describe()

# Grouped by port, the sample sizes are: port S, n1 = 554; port Q, n2 = 28; port C, n3 = 130.
# The population is not normally distributed. The sampling distribution of the difference between two sample means
# is therefore approximately normal only when the samples are fairly large (the usual rule of thumb is n >= 30).
# Port Q (X2) has only 28 observations in total, so no sample from it can reach n = 30. The difference between the
# sample means of ports S and Q, mean(X1) - mean(X2), therefore cannot be treated as normally distributed.
# For ports S and C, the sampling distribution of mean(X1) - mean(X3) can be approximated by a normal distribution.
# Because the two samples are independent, its mean and variance (D denotes variance) are
# E[mean(X1) - mean(X3)] = E[mean(X1)] - E[mean(X3)] = μ1 - μ3
# D[mean(X1) - mean(X3)] = D[mean(X1)] + D[mean(X3)] = σ1²/n1 + σ3²/n3
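The two formulas can be checked by simulation. This is a minimal sketch that uses two hypothetical log-normal populations in place of the S and C fares. Each sample is drawn with replacement, so the two samples are independent. The sample sizes 554 and 130 are the ones quoted above:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical skewed populations standing in for the S and C fares
pop_s = rng.lognormal(mean=2.5, sigma=1.0, size=100_000)
pop_c = rng.lognormal(mean=3.5, sigma=1.0, size=100_000)
n1, n3 = 554, 130   # sample sizes quoted in the post
reps = 5000         # number of simulated sample pairs

# Each replicate: difference between the means of two independent samples
diffs = np.array([
    rng.choice(pop_s, n1).mean() - rng.choice(pop_c, n3).mean()
    for _ in range(reps)
])

mu_theory = pop_s.mean() - pop_c.mean()             # μ1 - μ3
var_theory = pop_s.var() / n1 + pop_c.var() / n3    # σ1²/n1 + σ3²/n3
print(diffs.mean(), mu_theory)
print(diffs.var(ddof=1), var_theory)
```

If the formulas hold, each printed pair should agree closely. A histogram of `diffs` should also look close to normal, even though both populations are strongly skewed.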

miu = np.mean(S_fare) - np.mean(C_fare)  # estimate of μ1 - μ3
sig = np.sqrt(np.var(S_fare, ddof=1) / len(S_fare) + np.var(C_fare, ddof=1) / len(C_fare))  # estimated standard error

x = np.arange(-110, 50)
y = stats.norm.pdf(x, miu, sig)
plt.plot(x, y)
plt.xlabel("S_Fare - C_Fare")
plt.ylabel("Density")
plt.title ( '
plt.show ()

  


 

Original post: www.cnblogs.com/zym-yc/p/11444065.html