Given a set of continuous values, divide their range into several small intervals (bins), count the number of data points that fall into each interval, and draw the histogram together with a fitted curve.
Method 1: this can be done quickly with the seaborn package. By default the fitted curve is not a normal curve but a kernel density estimate (KDE), which follows the data distribution more closely; a normal curve can be fitted instead by setting the fit parameter.
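As a minimal sketch of the binning and counting step described above, the snippet below uses numpy's histogram function on a handful of made-up values (the data and the choice of 4 bins are assumptions for illustration only):

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
data = np.array([4.0, 4.4, 4.9, 5.1, 5.5, 5.8, 6.2, 6.6, 7.3, 8.0])

# Split the range [min, max] into 4 equal-width intervals and count
# how many values fall into each one. Every interval is half-open
# except the last, which also includes its right edge.
counts, edges = np.histogram(data, bins=4)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {n} value(s)")
```

The counts array holds the bar heights of the histogram, and edges holds one more entry than counts because it stores both ends of every interval.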
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from scipy.stats import norm
sns.set(style="ticks")
iris = datasets.load_iris()  # load the iris data set
x = iris.data[:, 0]  # take the first column as an ndarray
sns.set_palette("hls")  # draw all charts with colors from the hls color space
# sns.distplot(x, color="r", bins=100, kde=True)  # hist=False
# hist and kde both default to True; they control whether the histogram and the fitted KDE curve are shown
# fit=norm fits a normal distribution instead; it requires "from scipy.stats import norm"
# Note: distplot is deprecated since seaborn 0.11 (histplot and displot are its replacements)
sns.distplot(x, bins=30, hist=True, kde_kws={'color': 'green', 'lw': 3, 'label': 'x'}, hist_kws={'color': 'red', 'alpha': 0.2})
plt.show()
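To make the normal-curve option more concrete, here is a rough sketch of what fitting a normal distribution to binned data involves, done by hand with scipy.stats.norm rather than through seaborn. The sample is synthetic (an assumption, not the iris column), and this is only an approximation of what a fit=norm overlay represents, not seaborn's internal code:

```python
import numpy as np
from scipy.stats import norm

# Synthetic sample for illustration only (not the iris column)
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.8, scale=0.8, size=500)

# Maximum-likelihood estimates of the normal parameters
mu, sigma = norm.fit(sample)

# Density-normalized histogram, so bar heights are comparable to a PDF
density, edges = np.histogram(sample, bins=30, density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Value of the fitted normal curve at each bin center
fitted = norm.pdf(centers, loc=mu, scale=sigma)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
```

Plotting fitted against centers on top of the density bars gives the normal-curve overlay; the KDE curve differs in that it does not assume any particular distribution shape.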
Official website tutorial: http://seaborn.pydata.org/generated/seaborn.distplot.html?highlight=distplot#seaborn.distplot
Reference: https://www.jianshu.com/p/65395b00adbc
Method 2: use the round() function to keep one or two decimal places, then group the rounded values with groupby and plot the counts. The result, however, is far less good than with the first method.
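# f_train is an existing pandas DataFrame; VAR00007 is a continuous column
f_train['VAR00007'] = f_train['VAR00007'].apply(lambda x: round(x, 1))  # keep one decimal place
# count how many rows share each rounded value
f_train = f_train.groupby(['VAR00007'])['VAR00007'].agg(['count']).reset_index()
f_train = f_train.sort_values(['VAR00007'])  # sort_values returns a new frame, so assign the result
values = f_train['VAR00007'].tolist()  # rounded values, used on the x axis
counts = f_train['count'].tolist()  # number of rows per rounded value, used on the y axis
plt.scatter(values, counts)
plt.show()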
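Because f_train is not defined in this post, the following self-contained sketch reproduces the round-then-count idea on a small made-up column (the DataFrame df and its values are hypothetical stand-ins, and value_counts is used here as a shorter alternative to groupby):

```python
import pandas as pd

# Hypothetical stand-in for f_train; the values are made up
df = pd.DataFrame(
    {"VAR00007": [1.04, 1.12, 1.14, 1.26, 1.31, 1.33, 1.34, 1.48]}
)

# Round to one decimal place, which acts like bins of width 0.1
rounded = df["VAR00007"].round(1)

# Count the rows sharing each rounded value, ordered by value
counts = rounded.value_counts().sort_index()
print(counts)
```

Rounding fixes the bin width through the number of decimals, which is one reason this approach is coarser than choosing bins explicitly as in the first method.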