Data Analysis Basics: Average Analysis and Cross Analysis

This is the third article in an introductory series on data analysis methods, and it covers average analysis and cross analysis. To review the first two methods, see "Group Analysis Method" and "Comparative Analysis Method".


The previous two articles covered comparison and grouping. Both come up constantly in practical work, but many of us never summarize them systematically. Average analysis and cross analysis, the two methods covered today, are just as common at work. I hope this article helps you learn them and use them well.

1. Average Analysis

As the name implies, average analysis uses averages to describe the level of the data for a given characteristic. It is usually combined with comparative analysis to measure differences across time and space and to uncover trends and patterns.

01 The averages you need to know

An average describes the central tendency of a set of data. The measures commonly grouped under "average" are the arithmetic mean, the geometric mean, the median and the mode.

  • arithmetic mean

The arithmetic mean is the most commonly used average and is also called the mean or average value. It uses the familiar formula: add up all the values, then divide by the number of values.

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The arithmetic mean is strongly affected by extreme values. When a dataset contains them, the result can be heavily skewed. Consider the average income of a company's employees: the boss's very high income pulls the mean up, so the overall average ends up higher than what most employees actually earn. This is why people joke that their salary has been "averaged up".

 

In Excel, use the AVERAGE() function to calculate the arithmetic mean.
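The salary example above can be reproduced in a few lines. This is a minimal sketch that uses Python's standard `statistics.mean` in place of Excel's AVERAGE(). The income figures are hypothetical and were chosen only to show how a single extreme value distorts the mean.

```python
from statistics import mean

# Hypothetical monthly incomes: nine employees plus one boss (illustrative only)
incomes = [3000, 3200, 3500, 3800, 4000, 4200, 4500, 4800, 5000, 60000]

avg_all = mean(incomes)             # sum(incomes) / len(incomes), like Excel AVERAGE()
avg_staff = mean(incomes[:-1])      # same calculation without the boss's income

print(f"mean including the boss: {avg_all:.0f}")    # 9600
print(f"mean excluding the boss: {avg_staff:.0f}")  # 4000
```

With the boss included, the mean (9600) is higher than every other employee's income. This is the distortion described above.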

 

  • geometric mean

The geometric mean is widely used to average growth rates, rates of return and other ratios and indices. It is less affected by large extreme values than the arithmetic mean. The geometric mean of n values is the nth root of their product, so every value must be positive; zero and negative numbers are not allowed. The formula is:

$$G = \sqrt[n]{x_1 x_2 \cdots x_n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$

Use the GEOMEAN() function to calculate the geometric mean in Excel
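A typical use is averaging growth rates. This sketch assumes three hypothetical yearly growth rates and uses Python's `statistics.geometric_mean` in place of Excel's GEOMEAN(). The rates are first converted to growth factors (a rate of -10% becomes 0.9), because the geometric mean only accepts positive values.

```python
import math
from statistics import geometric_mean

# Hypothetical year-over-year growth rates: +10%, +20%, -10%
growth_rates = [0.10, 0.20, -0.10]

# Convert rates to growth factors so every value is positive (-10% -> 0.9)
factors = [1 + r for r in growth_rates]

g = geometric_mean(factors)     # nth root of the product of the factors
avg_rate = g - 1                # average compound growth rate per year

# Cross-check against the definition: (x1 * x2 * ... * xn) ** (1/n)
assert math.isclose(g, math.prod(factors) ** (1 / len(factors)))

arith_rate = sum(growth_rates) / len(growth_rates)
print(f"geometric average growth rate:  {avg_rate:.2%}")
print(f"arithmetic mean of the rates:   {arith_rate:.2%}")
```

If you compound the geometric average rate over three years, you get back the actual total growth factor (1.1 × 1.2 × 0.9 = 1.188). The arithmetic mean of the rates overstates it.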

 

  • median

In the arithmetic mean section above, we saw incomes being "averaged up". If outliers make the arithmetic mean unsuitable for describing the data, what should we use instead? The answer is the median and the mode.

 

The median is the value in the middle when the data are sorted in ascending order.

How to find the median:

When the number of values is odd, the median is the middle value. When it is even, the median is the average of the two middle values. The median is not affected by extreme values, which means it is also insensitive to them.

 

Use the MEDIAN() function to calculate the median in Excel
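The odd/even rule above can be written out directly. In this sketch, `median_by_hand` is a hypothetical helper that follows the rule step by step. Its results are checked against Python's standard `statistics.median`, which plays the role of Excel's MEDIAN().

```python
from statistics import median

def median_by_hand(values):
    """Sort, then take the middle value (odd count) or the
    average of the two middle values (even count)."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [7, 1, 5, 3, 9]              # sorted: 1 3 5 7 9 -> middle value is 5
even = [3000, 3200, 3500, 60000]   # middle two: 3200 and 3500 -> 3350.0

print(median_by_hand(odd), median(odd))    # 5 5
print(median_by_hand(even), median(even))  # 3350.0 3350.0
```

Increasing the outlier from 60000 to 600000 leaves the median at 3350.0. This shows the insensitivity to extreme values described above.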

 

  • mode

The mode is the value that occurs most often in the data, that is, the value with the highest frequency. A dataset can have more than one mode. The mode works for non-numeric data as well as numbers, and it is not affected by extreme values. It is usually used to describe the typical level of a dataset, such as the score range where most students cluster in an exam, or the typical living standard of urban residents.

 

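In Excel, use the MODE() function to calculate the mode. MODE() returns a single value; when there is more than one mode, MODE.MULT() returns all of them.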
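Both properties mentioned above, multiple modes and non-numeric data, can be seen with Python's standard `statistics.multimode`. It returns every most-frequent value in the order it was first seen. The scores and city names are hypothetical.

```python
from statistics import multimode

# Hypothetical exam scores: 85 and 90 each appear twice
scores = [85, 90, 90, 72, 85, 60]
# Non-numeric data works too
cities = ["Beijing", "Shanghai", "Beijing", "Shenzhen"]

print(multimode(scores))   # [85, 90]
print(multimode(cities))   # ['Beijing']
```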

 

02 Application of average analysis method

Comparing the same average metric across competing products in the same industry shows how each performs overall. The figure below compares the average daily number of uses and the per-capita daily usage time of Taobao, Pinduoduo and Vipshop in 2018.

(The data source is shown in the lower right corner of the figure. It will be removed on request if it infringes any rights.)

 

The next example looks at the average salary of employees at a company.

 

A histogram of the income distribution shows that most employees earn between 2000 and 4000, far below the average of 8203. The gap is caused by a few outliers earning more than 20000, so the mean cannot describe typical pay here.
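The histogram check can be sketched without a charting tool by counting salaries in fixed-width bins. This sketch assumes a hypothetical list of salaries and a bin width of 2000; it is not the article's dataset. It then compares the mean with the median.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical monthly salaries (illustrative; not the article's dataset)
salaries = [2500, 2800, 3000, 3200, 3500, 3600, 3900,
            4500, 5200, 6000, 25000, 32000]

BIN_WIDTH = 2000
# Map each salary to the lower edge of its bin, e.g. 3500 -> 2000
bins = Counter((s // BIN_WIDTH) * BIN_WIDTH for s in salaries)

# Text histogram: one '#' per employee in each range
for lower in sorted(bins):
    print(f"{lower:>6}-{lower + BIN_WIDTH:<6} {'#' * bins[lower]}")

busiest_bin, count = bins.most_common(1)[0]
print(f"most common range: {busiest_bin}-{busiest_bin + BIN_WIDTH} ({count} people)")
print(f"mean: {mean(salaries):.0f}, median: {median(salaries):.0f}")
```

In this made-up data, most people fall in the 2000 to 4000 range. The two outliers drag the mean to more than twice the median, which mirrors the pattern described above.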

 

 

2. Cross Analysis

01 What cross analysis means

Cross analysis combines two or more indicators to find relationships between variables and reveal characteristics of the data. The figure below shows sales data from a chain store. The original table has five dimensions: year, month, sales region, sales volume and selling price. Pairing them gives cross relationships such as Year & Sales, Year & Selling Price, Region & Sales and Region & Selling Price. Crossing every pair of fields yields 10 cross relationships. Note that a cross relationship must have practical meaning. Crossing year with month, for example, reveals nothing worth analyzing.
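The count of 10 is the number of unordered pairs from five fields: C(5, 2) = 5 × 4 / 2 = 10. The sketch below enumerates the pairs with `itertools.combinations` and then drops the year & month pair mentioned above. The field names are assumed English labels for the five columns. Rejecting a pair because both fields are time fields is a hypothetical rule for illustration only.

```python
from itertools import combinations

fields = ["year", "month", "region", "quantity", "price"]

pairs = list(combinations(fields, 2))   # every unordered pair of fields
print(len(pairs))                        # 10

# Hypothetical filter: a pair made only of time fields is not worth crossing
time_fields = {"year", "month"}
meaningful = [p for p in pairs if not set(p) <= time_fields]
for a, b in meaningful:
    print(f"{a} & {b}")
```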

 

【Year & Sales】

Crossing year with sales volume shows that sales in 2010 were higher than in 2009.

 

【Region & Sales】

Crossing region with sales volume shows that Shenyang has the highest sales volume and Shanghai the lowest.
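Both crosses above amount to grouping rows by one dimension and totaling the sales volume. This sketch uses a hypothetical helper, `cross_sum`, and made-up rows with placeholder regions A, B and C; it does not use the article's chain-store data.

```python
from collections import defaultdict

# Hypothetical sales rows (placeholders, not the article's data)
rows = [
    {"year": 2009, "region": "A", "quantity": 120},
    {"year": 2009, "region": "B", "quantity": 80},
    {"year": 2010, "region": "A", "quantity": 150},
    {"year": 2010, "region": "B", "quantity": 90},
    {"year": 2010, "region": "C", "quantity": 60},
]

def cross_sum(rows, dimension, measure):
    """Total `measure` for each distinct value of `dimension`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)

by_year = cross_sum(rows, "year", "quantity")
by_region = cross_sum(rows, "region", "quantity")
print(by_year)     # {2009: 200, 2010: 300}
print(by_region)   # {'A': 270, 'B': 170, 'C': 60}

best = max(by_region, key=by_region.get)
worst = min(by_region, key=by_region.get)
print(f"best region: {best}, worst region: {worst}")
```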

 

【Three-Dimensional Cross】

Beyond pairwise crosses, you can cross more than two dimensions at once, for example region & sales volume & year.
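A three-way cross such as region & sales volume & year is what a pivot table produces: regions as rows, years as columns and summed sales volume in the cells. This is a minimal stdlib sketch of that layout. The rows and the helper name `pivot_sum` are hypothetical. A region with no sales in a given year shows 0.

```python
from collections import defaultdict

# Hypothetical (region, year, quantity) rows, not the article's data
rows = [
    ("A", 2009, 120), ("B", 2009, 80),
    ("A", 2010, 150), ("B", 2010, 90), ("C", 2010, 60),
]

def pivot_sum(rows):
    """Sum quantity for every (region, year) cell, like a two-field pivot table."""
    cells = defaultdict(int)
    for region, year, qty in rows:
        cells[(region, year)] += qty
    regions = sorted({r for r, _, _ in rows})
    years = sorted({y for _, y, _ in rows})
    # Dense table: region/year combinations with no sales become 0
    table = {r: {y: cells.get((r, y), 0) for y in years} for r in regions}
    return table, years

table, years = pivot_sum(rows)
print("region", *years, sep="\t")
for region, by_year in table.items():
    print(region, *(by_year[y] for y in years), sep="\t")
```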

 

【Multi-Dimensional Cross】

Crossing four dimensions: region, sales volume, year and selling price.

 

02 Application of cross analysis

The figure below shows Pinduoduo's monthly active users over different time periods, along with the share of its users who also use Taobao and JD.com. Crossing these dimensions lets us compare different periods and competing products.

(The data source is shown in the lower right corner of the figure. It will be removed on request if it infringes any rights.)

 

Summary

  • The pivot table is the most commonly used tool for cross analysis.

  • Learn to tell meaningful crosses from meaningless ones.

  • Find the right points at which to cross the data.

This series covers entry-level analysis methods. Most people use much the same tools; what sets individuals apart is how they think. I hope everyone grasps the ideas behind these methods.


This blog provides a learning roadmap for entry-level data analysis and shares practical content ranging from Excel to statistics. Data analysis is a skill, and we hope everyone can learn to analyze data.



Origin blog.csdn.net/data_cola/article/details/116026111