Histogram curve fitting

Given a set of continuous values, divide their range into several small intervals (bins), count how many values fall into each bin, and then draw the histogram together with a fitted curve.
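
The binning step described above can be sketched in plain Python. This is only an illustration of the idea, not how seaborn does it internally; the helper name `histogram_counts` and the sample values are made up for this example.

```python
def histogram_counts(values, num_bins):
    """Split [min, max] into num_bins equal-width bins and count values per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        if width == 0:
            idx = 0  # all values are identical, so everything goes into the first bin
        else:
            # The maximum value would compute to index num_bins, so clamp it into the last bin.
            idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    # Bin edges: num_bins + 1 boundaries from lo to hi.
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Made-up sample: ten continuous values between 0 and 10.
values = [0.0, 0.5, 1.9, 2.5, 3.0, 5.5, 7.1, 9.9, 10.0, 4.4]
edges, counts = histogram_counts(values, 5)
print(edges)
print(counts)
```

The bar heights of the histogram are the entries of `counts`, and each bar spans two neighbouring entries of `edges`.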

Method 1: this can be done quickly with the seaborn package. By default the fitted curve is not a normal curve but a kernel density estimate (KDE), which follows the data distribution more closely; a normal curve can be fitted instead by setting the fit parameter.

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
sns.set(style="ticks")
from sklearn import datasets
from scipy.stats import norm

iris = datasets.load_iris()    # load the iris data set
x = iris.data[:, 0]            # take the first column as an ndarray
sns.set_palette("hls")         # use the hls color space for all plot colors
# Note: distplot is deprecated since seaborn 0.11; newer versions offer histplot and displot instead.
# sns.distplot(x, color="r", bins=100, kde=True)  # hist=False
# hist and kde both default to True; they control whether the histogram and the fitted curve are drawn
# To fit a normal distribution, pass fit=norm, with norm imported via: from scipy.stats import norm
sns.distplot(x, bins=30, hist=True, kde_kws={'color': 'green', 'lw': 0.3, 'label': 'X'}, hist_kws={'color': 'red', 'alpha': 0.2})
plt.show()
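
To make the difference between the two kinds of fitted curve concrete, here is a minimal standard-library sketch. It is not seaborn's implementation: the helper `gaussian_kde`, the sample values, and the bandwidth rule (a common normal-reference rule of thumb, 1.06 * sigma * n^(-1/5)) are all assumptions chosen for illustration. The normal fit uses the mean and the population standard deviation, which are the maximum-likelihood estimates.

```python
from statistics import NormalDist, mean, pstdev


def gaussian_kde(sample, bandwidth):
    """Return a density function built as the average of one Gaussian bump per data point."""
    kernels = [NormalDist(mu=v, sigma=bandwidth) for v in sample]

    def density(x):
        return sum(k.pdf(x) for k in kernels) / len(kernels)

    return density


# Made-up sample values (hypothetical, for illustration only).
sample = [4.6, 4.9, 5.0, 5.1, 5.4, 5.8, 6.1, 6.3, 6.7, 7.2]

# Normal fit: one bell curve described by just two numbers.
normal_fit = NormalDist(mu=mean(sample), sigma=pstdev(sample))

# KDE: bandwidth from a rule of thumb (an assumption; a library's default may differ).
bandwidth = 1.06 * pstdev(sample) * len(sample) ** (-1 / 5)
kde = gaussian_kde(sample, bandwidth)

for x in (5.0, 6.0, 7.0):
    print(f"x={x}: normal={normal_fit.pdf(x):.3f}, kde={kde(x):.3f}")
```

The normal fit always has a single symmetric peak, while the KDE can bend to follow skew or several peaks in the data, which is why it usually tracks the histogram more closely.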

Official website tutorial: http://seaborn.pydata.org/generated/seaborn.distplot.html?highlight=distplot#seaborn.distplot

Reference: https://www.jianshu.com/p/65395b00adbc

Method 2: round the values to one or two decimal places with the round() function, then group them with groupby and plot the counts. The result, however, is not nearly as good as with the first method.

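# f_train is assumed to be an existing pandas DataFrame with a continuous column VAR00007
f_train['VAR00007'] = f_train['VAR00007'].apply(lambda x: round(x, 1))   # keep one decimal place
f_train = f_train.groupby(['VAR00007'])['VAR00007'].agg(['count']).reset_index()   # count rows per rounded value
f_train = f_train.sort_values(['VAR00007'])   # sort_values returns a new frame, so assign it back
values = f_train['VAR00007'].tolist()   # x axis: the rounded values
counts = f_train['count'].tolist()      # y axis: number of rows per rounded value
plt.scatter(values, counts)
plt.show()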
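
The round-then-count idea can also be shown without pandas. The following is a small sketch with made-up data, using `collections.Counter` in place of groupby; the variable names are hypothetical.

```python
from collections import Counter

# Made-up continuous values (hypothetical data for illustration).
values = [5.12, 5.14, 5.08, 5.31, 5.29, 6.02, 5.97, 6.04, 6.01]

# Round to one decimal place, then count how many values share each rounded key.
counts = Counter(round(v, 1) for v in values)

# Sort by key so the points run from left to right along the x axis.
xs = sorted(counts)
ys = [counts[k] for k in xs]
print(list(zip(xs, ys)))
```

Rounding to one decimal place amounts to using bins of width 0.1, so the scatter plot shows the same counts as a histogram but without bars or a smoothed curve, and rounded values that never occur simply leave gaps.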

 


Origin www.cnblogs.com/xxswkl/p/11184267.html