Central tendency in practice: identifying "seed players" from measures of central tendency

Introduction:

Hello everyone, and welcome to "a little bit of analysis every day". This issue belongs to the basic data analysis series, which covers the principles, concepts and applications of descriptive statistics: median, mode, mean, variance, standard deviation, coefficient of variation, skewness, kurtosis, outliers and so on. Using an athlete case study, we discuss which athlete to select in different scenarios and justify each choice with the calculated measures of central tendency. The article is aimed at data analysis beginners: it explains the ideas in simple terms, and the case is close to real situations. The next issue will cover measures of dispersion, so stay tuned.

Concept introduction:

The concept of central tendency:

The central tendency of data refers to the degree to which the data cluster around a central value, and it reflects the location of the center of a data set. It is used to describe the general level of the data. Commonly used measures include the mean, median, quantiles and mode. The everyday phrases "most people" and "in most cases" are informal descriptions of central tendency.

The principle of the average:

The average (mean) is the result of adding up a set of data and dividing by the number of data points. It is the most important measure of central tendency. It is mainly suitable for numerical data, not for categorical or ordinal data, and it is the basis of statistical analysis and statistical inference. From a statistical point of view, the mean is the center of gravity of a data set and is what remains when the errors in the data cancel each other out. Common variants include the simple, weighted, geometric and harmonic means.
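One way to make the "center of gravity" statement concrete is the standard identity that the deviations from the mean sum to zero. The notation here ($x_i$ for the data, $n$ for the number of values, $\bar{x}$ for the mean) is an assumption introduced for illustration:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = 0,
\qquad \text{where } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```

Positive and negative deviations balance exactly, which is the sense in which the errors "cancel each other out".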

The principle of median:

The median is the value in the middle position of a data set after sorting. It divides the data into two equal parts, each containing 50% of the values. It is not suitable for categorical data. The median is a positional representative value and is not affected by extreme values in the data.

Principle of Mode:

The mode is the value that appears most frequently in a data set. It is generally meaningful only when the amount of data is large. Like the median, the mode is a positional representative value and is not affected by extreme values. A data set may have no mode, or it may have several.

Special Note:

The mean is easily affected by extreme values: when an extreme value appears in the data set, the mean can deviate substantially from the typical level. The median is not affected by extreme values, which also means it is insensitive to them. There may be more than one mode. The mode can be used for both numerical and non-numerical data and is not affected by extreme values.
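These points can be checked with a small experiment. The sketch below uses Python's standard `statistics` module (Python 3.8+ for `multimode`); the income figures and color labels are hypothetical values made up for illustration, not data from this article.

```python
from statistics import mean, median, multimode

# Hypothetical incomes; 100 is added as an extreme value.
incomes = [4, 5, 5, 6, 7]
with_outlier = incomes + [100]

# The mean jumps when the extreme value is added; the median barely moves.
print(mean(incomes), median(incomes), multimode(incomes))
print(mean(with_outlier), median(with_outlier), multimode(with_outlier))

# The mode also works on non-numerical data, and there can be more than one.
colors = ["red", "blue", "red", "green", "blue"]
print(multimode(colors))
```

`multimode` returns a list, so ties for the most frequent value are all reported.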

Calculation and application method:

Calculation and application of average:

1. Simple average: the sum of the data divided by the number of data points.

Calculation formula:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Calculation example: Suppose the ages of the students in a class are 10, 11, 12, 13, 14, 15, 16, 17. The average age of the class is mean = (10+11+12+13+14+15+16+17)/8 = 13.5.
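The same calculation can be reproduced in a few lines of Python. This is a minimal sketch; the variable names are chosen here for illustration.

```python
from statistics import mean

ages = [10, 11, 12, 13, 14, 15, 16, 17]

# Simple average: sum of the values divided by how many there are.
simple_mean = sum(ages) / len(ages)
print(simple_mean)   # 13.5

# The standard library gives the same result.
print(mean(ages))    # 13.5
```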

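2. Weighted average: each value is multiplied by its weight, and the sum of these products is divided by the sum of the weights.

Calculation formula:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$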

Calculation example: According to statistics, the ages of the students in our school are 10, 11, 12, 13, 14, 15, 16, 17, and the corresponding numbers of students are 5, 6, 7, 8, 9, 10, 11, 12. Find the average age of all students. weight_mean = (10*5+11*6+12*7+13*8+14*9+15*10+16*11+17*12)/(5+6+7+8+9+10+11+12) = 960/68 ≈ 14.12
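A short Python sketch of this weighted average, using the head counts as weights. The variable names are illustrative assumptions.

```python
ages = [10, 11, 12, 13, 14, 15, 16, 17]
counts = [5, 6, 7, 8, 9, 10, 11, 12]   # number of students at each age

# Multiply each age by its weight, then divide by the total weight.
weighted_sum = sum(age * count for age, count in zip(ages, counts))
total_weight = sum(counts)
weighted_mean = weighted_sum / total_weight

print(weighted_sum, total_weight)   # 960 68
print(round(weighted_mean, 2))      # 14.12
```

Note that the divisor is the sum of the weights (68 students), not the number of distinct ages (8).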

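3. Geometric mean: multiply the n values together, then take the nth root of the product.

Calculation formula:

$$G = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}$$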

Calculation example: Suppose a series of stock returns is 10, 11, 12, 13, 14, 15, 16, 17. The geometric mean is geometric_mean = (10*11*12*13*14*15*16*17)^(1/8) ≈ 13.3
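The geometric mean of these eight values can be checked as follows. This sketch assumes Python 3.8 or later, where both `math.prod` and `statistics.geometric_mean` are available.

```python
import math
from statistics import geometric_mean

values = [10, 11, 12, 13, 14, 15, 16, 17]

# Multiply all n values, then take the nth root of the product.
product = math.prod(values)
nth_root = product ** (1 / len(values))
print(round(nth_root, 2))

# The standard library function should agree up to floating-point error.
print(round(geometric_mean(values), 2))
```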

4. Specific applications of the average

The average is usually used to measure the overall level of a kind of thing: for example, judging overall prosperity from national average income, comparing regional development through regional average income, or measuring the overall level of a class through its average score. Specific applications are introduced in the accompanying video.

The calculation and application of the mode:

1. Calculation of Mode

To find the mode, count how often each value occurs; the value with the highest frequency is the mode.

Calculation example: The ages in a class are as follows: 17, 11, 15, 13, 13, 13, 14, 12, 12, 12, 12, 10, 16. Counting shows that 12 appears 4 times and 13 appears 3 times, so the mode of the data is 12. Had 13 also appeared 4 times, the data would have had two modes, 12 and 13.
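The counting procedure can be sketched with `collections.Counter`. The helper `modes` and the list `sample_ages` are hypothetical, introduced only to show how ties for the highest frequency produce more than one mode.

```python
from collections import Counter

def modes(values):
    """Return every value that ties for the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []          # no data, so no mode
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical ages, not the class data from the article.
sample_ages = [10, 11, 11, 12, 12, 13]
print(Counter(sample_ages).most_common())  # frequency of each value
print(modes(sample_ages))                  # [11, 12]
```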

2. Specific applications of the mode

The mode is usually used to measure the most typical level of something, which makes it a useful guide for decisions. For example, the largest number of diners arrive around 12 o'clock, so a restaurant should prepare more ingredients for that time; the subway is most crowded between 7 and 8 o'clock, so travelers can avoid the morning peak; people over 50 make up the largest share of Japan's population, which indicates serious population aging; a top student's mathematics scores usually fall between 135 and 145, so a score of around 140 on this test is expected. For further applications, take a look at our short video; we hope it gives you new insights.

Calculation and application of median:

1. Calculation of median

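The median is calculated in two cases. First sort the data in ascending order, $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, then take the value in the middle.

When n is odd: $$M = x_{\left(\frac{n+1}{2}\right)}$$

When n is even: $$M = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$$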

Calculation example: The ages in a class are as follows: 17, 11, 15, 13, 14, 12, 10, 16. Sort them first to get 10, 11, 12, 13, 14, 15, 16, 17. There are 8 values, an even count, so the median is the average of the fourth and fifth values: median = (13+14)/2 = 13.5.
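Both cases can be written out by hand in Python. The function name `median_by_hand` is an illustrative assumption, and the sketch assumes a non-empty list; in practice `statistics.median` does the same job.

```python
from statistics import median

def median_by_hand(values):
    """Median of a non-empty list: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

ages = [17, 11, 15, 13, 14, 12, 10, 16]
print(median_by_hand(ages))          # 13.5 (even count)
print(median_by_hand(ages + [18]))   # 14 (odd count)
print(median(ages))                  # 13.5, from the standard library
```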

2. Specific application of median

The median and its derivatives are widely used. First, the median divides the data into two equal parts; this idea extends to quantiles, which split the data into several groups for classifying and selecting data, as is done in practice when assigning awards and grades. Second, the median carries ranking information, so it shows where an individual stands within the whole. Finally, for some distributions, such as the normal distribution, the median represents the central tendency of the data. More examples are given in the video.
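As a rough illustration of extending the median to quantiles, the sketch below splits data into four groups with `statistics.quantiles` (Python 3.8+, default "exclusive" method). The helper `quartile_group` and the reuse of the ages as stand-in values are assumptions made for this example.

```python
from bisect import bisect_right
from statistics import quantiles

values = [10, 11, 12, 13, 14, 15, 16, 17]   # stand-in data for illustration

# Three cut points divide the data into four groups; the middle cut is the median.
cuts = quantiles(values, n=4)

def quartile_group(value):
    """Return 1, 2, 3 or 4: the quarter of the data that the value falls into."""
    return bisect_right(cuts, value) + 1

print(cuts)
print([quartile_group(v) for v in values])
```

A grading scheme could map group 4 to the top award, group 3 to the next, and so on.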

Comprehensive application scenarios:

Athletes A and B are both at an intermediate level, and each takes 8 consecutive shots at a target. Which athlete has the higher overall level?

If there is a provincial competition and the opponents are weak, would you send A or B?

If there is a national competition and the opponents are stronger, would you send A or B?

Players A and B each took 8 consecutive shots, recorded in order as follows:

Player A: [8,7,8,9,9,8,7,8].

Player B: [5,6,6,7,7,10,10,10].
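The measures of central tendency for both players can be computed directly from these score lists. This is a minimal sketch using the standard `statistics` module; the dictionary names `shots` and `summary` are illustrative.

```python
from statistics import mean, median, multimode

shots = {
    "A": [8, 7, 8, 9, 9, 8, 7, 8],
    "B": [5, 6, 6, 7, 7, 10, 10, 10],
}

# Mean, median and mode(s) for each player.
summary = {
    name: {
        "mean": mean(scores),
        "median": median(scores),
        "mode": multimode(scores),
    }
    for name, scores in shots.items()
}

for name, stats in summary.items():
    print(name, stats)
```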

The calculated measures of central tendency are shown in the following table:

Player    Mean     Median   Mode
A         8        8        8
B         7.625    7        10

1. Which athlete has the higher overall level?

Since the distribution of the data is unknown, the median and mode do not necessarily represent the overall level. Judging by the mean, A averages 8 and B averages 7.625, so A's overall level is higher.

2. If there is a provincial competition and the opponents are weak, would you send A or B?

In a provincial competition the opponents are weaker, and both A and B are more likely to win than the other athletes. The priority is therefore a steady result, so choose A, whose overall level is higher.

3. If there is a national competition and the opponents are stronger, would you send A or B?

In a national competition the opponents are strong and all perform above the level of A and B, so there is little point in sending A just because A's overall level is higher. Now look at B's data: the median (7) and the mean (7.625) are both lower than A's (8 and 8), but B's mode of 10 is well above A's mode of 8. If B performs at his best, he has a chance of placing. In addition, B's scores, recorded in order, rise from 5 at the start to 10 at the end, which suggests considerable room for improvement; with training, he may achieve even better results.

Of course, choosing A is also reasonable. B's scores vary widely, which suggests weaker mental composure; he may become more nervous in competition and underperform. With A, even a loss would not be too ugly.

 

Did you enjoy today's issue? We have also prepared Python code examples for central tendency as a small gift; for more content, follow the sea data public account.


Origin blog.csdn.net/qq_40433634/article/details/108803593