Statistical Analysis: Frequency Analysis

Frequency analysis: classify an array using some grouping method, count the number of samples in each group, and use charts to describe the distribution of the array in a more intuitive way.

Business significance: in real data analysis work, it is often necessary to segment the data along some dimension and compute metrics for each segment in order to find and solve problems.

 

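Example: a class has 40 students, and their test scores are as follows (stored in the list score, which the code in this post uses):

score = [73, 87, 88, 65, 73, 76, 80, 95, 83, 69, 55, 67, 70, 94, 86, 81, 87, 95, 84, 92, 92, 76, 69, 97, 72, 90, 72, 85, 80, 83, 97, 95, 62, 92, 67, 73, 91, 95, 86, 77]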

Central tendency

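import numpy as np
from scipy import stats

# np.atleast_1d copes with SciPy versions that return the mode as an array or as a scalar
print('Mean:', round(np.mean(score), 1), 'Median:', np.median(score), 'Mode:', np.atleast_1d(stats.mode(score).mode)[0])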

Mean: 81.3 Median: 83.0 Mode: 95
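
As a cross-check that does not depend on NumPy or SciPy, the same three measures can be computed with Python's standard library. This is only a sketch; the variable names are illustrative, and collections.Counter is used here to show that the mode is simply the most frequent value.

```python
import statistics
from collections import Counter

score = [73, 87, 88, 65, 73, 76, 80, 95, 83, 69,
         55, 67, 70, 94, 86, 81, 87, 95, 84, 92,
         92, 76, 69, 97, 72, 90, 72, 85, 80, 83,
         97, 95, 62, 92, 67, 73, 91, 95, 86, 77]

mean_score = statistics.mean(score)      # arithmetic mean
median_score = statistics.median(score)  # middle value of the sorted scores
# most_common(1) returns the most frequent score together with its count
mode_score, mode_count = Counter(score).most_common(1)[0]

print('Mean:', round(mean_score, 1), 'Median:', median_score,
      'Mode:', mode_score, 'appears', mode_count, 'times')
```

Counting occurrences also shows how strong the mode is: here 95 appears 4 times out of 40 scores.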

Degree of dispersion

print('Highest score:', max(score), 'Lowest score:', min(score), 'Range:', max(score) - min(score),
      'Interquartile range:', np.quantile(score, 0.75) - np.quantile(score, 0.25),
      'Variance:', round(np.var(score), 1), 'Standard deviation:', round(np.std(score), 1))

Highest score: 97 Lowest score: 55 Range: 42 Interquartile range: 18.5 Variance: 118.1 Standard deviation: 10.9
  • The highest score is 97 and the mode is 95, which suggests that scores are concentrated in the high range and that the test was not difficult.
  • The lowest score is 81.3 - 55 = 26.3 points below the mean, which is a large deviation. That student needs extra attention.
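
To put the second point in numbers, the gap can be expressed in standard deviations. The sketch below uses the standard library's statistics module; pvariance and pstdev divide by n, which matches the defaults of np.var and np.std, while stdev divides by n - 1. The name z_lowest is only illustrative.

```python
import statistics

score = [73, 87, 88, 65, 73, 76, 80, 95, 83, 69,
         55, 67, 70, 94, 86, 81, 87, 95, 84, 92,
         92, 76, 69, 97, 72, 90, 72, 85, 80, 83,
         97, 95, 62, 92, 67, 73, 91, 95, 86, 77]

mean_score = statistics.mean(score)
pop_var = statistics.pvariance(score)  # divides by n, like np.var(score)
pop_std = statistics.pstdev(score)     # divides by n, like np.std(score)
sample_std = statistics.stdev(score)   # divides by n - 1

# distance of the lowest score from the mean, in population standard deviations
z_lowest = (min(score) - mean_score) / pop_std

print('Population variance:', round(pop_var, 1), 'Population std:', round(pop_std, 1))
print('Sample std:', round(sample_std, 1))
print('Lowest score z-score:', round(z_lowest, 2))
```

For these scores the lowest mark sits roughly 2.4 population standard deviations below the mean.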

Beyond the observations above, it is difficult to interpret the data comprehensively from basic summary indicators alone.

 

Data frequency table: segment the array along some dimension of the data and compute statistics for each segment.

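import pandas as pd

df = pd.Series(score)
bins = np.arange(55, 101, 5)  # score bin edges: 55, 60, ..., 100
groups = pd.cut(df, bins, include_lowest=True, right=False)  # left-closed intervals such as [55, 60)
bins_score = df.groupby(groups, observed=False)
bins_score.count()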

[55, 60)     1
[60, 65)     1
[65, 70)     5
[70, 75)     6
[75, 80)     3
[80, 85)     6
[85, 90)     6
[90, 95)     6
[95, 100)    6
dtype: int64

The table makes it easy to see that students are spread fairly evenly across the segments, which suggests the exam did not open up a wide gap or polarize the class into two groups. A few students scored low and need additional attention.
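
A frequency table is often easier to read with relative and cumulative shares next to the raw counts. The following is a minimal standard-library sketch, assuming the same 5-point, left-closed bins as the table; frequency_table is a hypothetical helper name, not part of pandas.

```python
from collections import Counter

score = [73, 87, 88, 65, 73, 76, 80, 95, 83, 69,
         55, 67, 70, 94, 86, 81, 87, 95, 84, 92,
         92, 76, 69, 97, 72, 90, 72, 85, 80, 83,
         97, 95, 62, 92, 67, 73, 91, 95, 86, 77]

def frequency_table(values, start=55, stop=100, width=5):
    """Return rows of (label, count, share, cumulative share) for left-closed bins."""
    # integer division maps each value to the index of its bin
    # [start + i * width, start + (i + 1) * width)
    counts = Counter((v - start) // width for v in values if start <= v < stop)
    rows, running = [], 0
    for i in range((stop - start) // width):
        lo = start + i * width
        count = counts.get(i, 0)
        running += count
        rows.append((f'[{lo}, {lo + width})', count,
                     count / len(values), running / len(values)))
    return rows

table = frequency_table(score)
for label, count, share, cum_share in table:
    print(f'{label:<10} {count:>2} {share:>6.1%} {cum_share:>7.1%}')
```

With these scores, 24 of the 40 students (60%) scored 80 or above, while only 2 scored below 65.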

 

Frequency histogram

import matplotlib.pyplot as plt

plt.hist(score, bins=9)  # 9 equal-width bins between the lowest and highest score
plt.show()
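
Note that passing 9 as bins asks matplotlib for nine equal-width bins between the lowest and highest score (55 and 97), so the bars do not line up with the 5-point segments of the frequency table. A sketch that passes the table's bin edges explicitly, so that each bar corresponds to one table row, might look like this (the axis labels are illustrative additions):

```python
import matplotlib.pyplot as plt

score = [73, 87, 88, 65, 73, 76, 80, 95, 83, 69,
         55, 67, 70, 94, 86, 81, 87, 95, 84, 92,
         92, 76, 69, 97, 72, 90, 72, 85, 80, 83,
         97, 95, 62, 92, 67, 73, 91, 95, 86, 77]

edges = list(range(55, 101, 5))  # 55, 60, ..., 100: same edges as the frequency table
# hist returns the per-bin counts, the bin edges, and the drawn bar patches
counts, _, _ = plt.hist(score, bins=edges, edgecolor='black')
plt.xticks(edges)
plt.xlabel('Score')
plt.ylabel('Number of students')
plt.show()
```

The returned counts can be compared directly with the frequency table to confirm that the chart and the table describe the same segmentation.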

 

Box plot

plt.boxplot(score)
plt.show()
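
By default, plt.boxplot extends the whiskers to the most extreme data points within 1.5 times the interquartile range of the box and marks anything beyond that as an outlier. The sketch below reproduces that rule with the standard library to check whether any score would be flagged. It assumes statistics.quantiles with method='inclusive', which gives the same quartiles as np.quantile for this data; the variable names are illustrative.

```python
import statistics

score = [73, 87, 88, 65, 73, 76, 80, 95, 83, 69,
         55, 67, 70, 94, 86, 81, 87, 95, 84, 92,
         92, 76, 69, 97, 72, 90, 72, 85, 80, 83,
         97, 95, 62, 92, 67, 73, 91, 95, 86, 77]

# method='inclusive' treats the data as the whole population (min and max included)
q1, q2, q3 = statistics.quantiles(score, n=4, method='inclusive')
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in score if s < lower_fence or s > upper_fence]

print('Q1:', q1, 'Q3:', q3, 'IQR:', iqr)
print('Fences:', lower_fence, upper_fence, 'Outliers:', outliers)
```

For these scores the fences are 45 and 119, so no score is flagged: even the lowest mark of 55 lies inside the whiskers.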

 

Summary

In data analysis, what matters most is not the frequency analysis method itself but the idea of classification behind it: segment the data along some dimension, compute statistics for each segment, and observe the data more intuitively in order to find problems.

 

2020-04-15 01:54

 

Origin www.cnblogs.com/fuyusheng/p/12709980.html