Using the data provided by Mudong Jushi (木东居士) as a case study, this post verifies which distribution the data follow. Reference link: https://www.jianshu.com/p/6522cd0f4278. Thanks to both of the above.
Only the code is posted here; the resulting plots are not included.
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from scipy import stats

    # Read the data
    df = pd.read_excel('C://Users//zxy//Desktop//data.xlsx', usecols=[1, 2, 3])

    # 1. Group by port of embarkation and compute descriptive statistics
    #    of age and fare for each port.
    df1 = df.groupby(['Embarked'])
    df1.describe()

    # or
    # coefficient of variation = standard deviation / mean
    def cv(data):
        return data.std() / data.mean()

    df2 = df.groupby(['Embarked']).agg(['count', 'min', 'max', 'median', 'mean', 'var', 'std', cv])
    df2 = df2.apply(lambda x: round(x, 2))
    df2_age = df2['Age']
    df2_fare = df2['Fare']

    # 2. Plot the fare distribution and check which distribution the data follow.
    # 2.1 Fare histogram:
    plt.hist(df['Fare'], 20, density=True, alpha=0.75)
    plt.title('Fare')
    # ...
    normaltest_test = stats.normaltest(df['Fare'], axis=0)
    # All three tests above give p < 5%, so the fares are not normally distributed.

    # Plot the fitted normal curve:
    fare = df['Fare']
    plt.figure()
    fare.plot(kind='kde')  # density of the raw data
    M_S = stats.norm.fit(fare)  # fit a normal distribution: mean loc, standard deviation scale
    normalDistribution = stats.norm(M_S[0], M_S[1])
    # plot the fitted normal distribution
    x = np.linspace(normalDistribution.ppf(0.01), normalDistribution.ppf(0.99), 100)
    plt.plot(x, normalDistribution.pdf(x), c='orange')
    plt.xlabel('Fare about Titanic')
    plt.title('Titanic[Fare] on NormalDistribution', size=20)
    plt.legend(['Origin', 'NormDistribution'])

    # Check whether the fares follow a t distribution.
    # The fitted parameters are named df_t, loc_t, scale_t so that the
    # DataFrame df is not overwritten.
    T_S = stats.t.fit(fare)
    df_t = T_S[0]
    loc_t = T_S[1]
    scale_t = T_S[2]
    x2 = stats.t.rvs(df=df_t, loc=loc_t, scale=scale_t, size=len(fare))
    D, p = stats.ks_2samp(fare, x2)
    # p < alpha: reject the null hypothesis; the fares do not follow a t distribution.

    # Plot the fitted t distribution against the fares:
    plt.figure()
    fare.plot(kind='kde')
    TDistribution = stats.t(T_S[0], T_S[1], T_S[2])
    # plot the fitted t distribution
    x = np.linspace(TDistribution.ppf(0.01), TDistribution.ppf(0.99), 100)
    plt.plot(x, TDistribution.pdf(x), c='orange')
    plt.xlabel('Fare about Titanic')
    plt.title('Titanic[Fare] on TDistribution', size=20)
    plt.legend(['Origin', 'TDistribution'])

    # Check whether the fares follow a chi-square distribution.
    chi_S = stats.chi2.fit(fare)
    df_chi = chi_S[0]
    loc_chi = chi_S[1]
    scale_chi = chi_S[2]
    x2 = stats.chi2.rvs(df=df_chi, loc=loc_chi, scale=scale_chi, size=len(fare))
    Dk, pk = stats.ks_2samp(fare, x2)
    # The fares do not follow a chi-square distribution either.

    # Plot the fitted chi-square distribution against the fares:
    plt.figure()
    fare.plot(kind='kde')
    chiDistribution = stats.chi2(chi_S[0], chi_S[1], chi_S[2])
    # plot the fitted chi-square distribution
    x = np.linspace(chiDistribution.ppf(0.01), chiDistribution.ppf(0.99), 100)
    plt.plot(x, chiDistribution.pdf(x), c='orange')
    plt.xlabel('Fare about Titanic')
    plt.title('Titanic[Fare] on Chi-square_Distribution', size=20)
    plt.legend(['Origin', 'Chi-square_Distribution'])

    # Split by port and check whether the difference in fares between
    # ports S and Q follows some distribution.
    S_fare = df[df['Embarked'] == 'S']['Fare']
    Q_fare = df[df['Embarked'] == 'Q']['Fare']
    C_fare = df[df['Embarked'] == 'C']['Fare']
    S_fare.describe()
    # By port: S has <= 554 samples, Q has <= 28 samples, C has <= 130 samples.
    # The populations are not normally distributed, so n must be fairly large
    # (generally n >= 30) before the sampling distribution of the difference
    # between two sample means can be approximated by a normal distribution.
    # The Q population (X2) holds only 28 values, so its sample size cannot
    # reach 30; the difference between the S and Q sample means
    # (E(X1) - E(X2)) is therefore not approximately normal.
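The n >= 30 rule of thumb stated above can be illustrated with a small simulation. This is only a sketch on synthetic data: the exponential populations stand in for right-skewed, fare-like values and are an assumption, not the Titanic fares, and the helper name `diff_of_means` and its parameters are hypothetical.

```python
import numpy as np
from scipy import stats

def diff_of_means(rng, n1, n2, reps=5000):
    """Simulate `reps` differences of two sample means from skewed populations."""
    # Exponential populations are an illustrative assumption (right-skewed data).
    a = rng.exponential(scale=30.0, size=(reps, n1)).mean(axis=1)
    b = rng.exponential(scale=60.0, size=(reps, n2)).mean(axis=1)
    return a - b

rng = np.random.default_rng(0)
small = diff_of_means(rng, 5, 5)      # both samples far below n = 30
large = diff_of_means(rng, 500, 130)  # both samples well above n = 30

# Skewness near 0 is one sign that the distribution is close to normal.
skew_small = abs(stats.skew(small))
skew_large = abs(stats.skew(large))
print(skew_small, skew_large)
```

With small samples the difference of means stays visibly skewed, while with large samples the skewness shrinks toward zero, which is what the normal approximation relies on.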
    # The difference between the S and C sample means (E(X1) - E(X3)) has an
    # approximately normal sampling distribution, with mean and variance
    #   E(E(X1) - E(X3)) = E(E(X1)) - E(E(X3)) = μ1 - μ3
    #   D(E(X1) - E(X3)) = D(E(X1)) + D(E(X3)) = σ1²/n1 + σ3²/n3
    miu = np.mean(S_fare) - np.mean(C_fare)
    sig = np.sqrt(np.var(S_fare, ddof=1) / len(S_fare) + np.var(C_fare, ddof=1) / len(C_fare))
    x = np.arange(-110, 50)
    y = stats.norm.pdf(x, miu, sig)
    plt.plot(x, y)
    plt.xlabel("S_Fare - C_Fare")
    plt.ylabel("Density")
    plt.title('...')
    plt.show()
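The mean and standard deviation of the difference between two sample means can be wrapped in a small helper. This is a minimal sketch: the function name `mean_diff_distribution` is hypothetical, and the two tiny arrays are made-up numbers chosen only so the arithmetic is easy to check by hand. They are far too small for the normal approximation itself to be trusted.

```python
import numpy as np
from scipy import stats

def mean_diff_distribution(x, y):
    """Return (mu, sigma) of the approximate normal law of mean(x) - mean(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = x.mean() - y.mean()
    # Variances add for independent samples, even though the means are subtracted.
    sigma = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return mu, sigma

# Made-up values, not Titanic fares.
x = np.array([10.0, 12.0, 14.0, 16.0])  # mean 13, sample variance 20/3
y = np.array([20.0, 24.0, 28.0])        # mean 24, sample variance 16

mu, sigma = mean_diff_distribution(x, y)
# 95% interval of the approximating normal distribution.
low, high = stats.norm.interval(0.95, loc=mu, scale=sigma)
print(mu, sigma, low, high)
```

On real data the same call would take the two fare Series, for example `mean_diff_distribution(S_fare, C_fare)`, provided both groups are large enough for the approximation to hold.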