How to Do a Pareto Analysis


[Interview question] There is a student grades table (学生成绩表) with three fields: student number (学号), course (课程), and grade (成绩).

[Image: sample rows of the student grades table]

Question: For each course, find the Class A and Class B students. Classification is based on the cumulative proportion of grades: 0-60% is Class A, and above 60% up to 85% is Class B.

[Problem-solving approach]

What is the 80/20 rule?

The 80/20 rule says that in any set of things, the most important items make up only a small part, about 20%. For example, in a store, only about 20% of the products sell well.

What is ABC classification?

ABC classification is a method derived from the 80/20 rule. Because it divides objects into three categories (A, B, and C), it is called ABC classification; it is also known as Pareto analysis.

ABC classification calculation steps:

1) Sort the objects being analyzed from largest to smallest

2) For each object, calculate the cumulative proportion: the sum of that object and all objects before it, divided by the total

3) Label a cumulative proportion of 0-60% as Class A, above 60% up to 85% as Class B, and above 85% as Class C
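The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `abc_classify` and the sample values are hypothetical, and the input is assumed to be non-empty with a positive total.

```python
from itertools import accumulate

def abc_classify(values, a_cut=0.60, b_cut=0.85):
    """Return (value, cumulative_share, label) tuples, largest value first.

    Assumes a non-empty list whose total is positive.
    """
    ordered = sorted(values, reverse=True)        # step 1: sort descending
    total = sum(ordered)
    result = []
    for value, running in zip(ordered, accumulate(ordered)):
        share = running / total                   # step 2: cumulative proportion
        if share <= a_cut:                        # step 3: assign the class
            label = "A"
        elif share <= b_cut:
            label = "B"
        else:
            label = "C"
        result.append((value, round(share, 4), label))
    return result

# Hypothetical sample values, not taken from the article
rows = abc_classify([45, 25, 12, 10, 8])
for row in rows:
    print(row)
```

With these sample values the cumulative shares are 45%, 70%, 82%, 92%, and 100%, so the labels come out as A, B, B, C, C.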

1. Problem-solving approach

The task is to find the Class A and Class B students of each course, where Class A is a cumulative proportion of 0-60% and Class B is above 60% up to 85%.

So the core problem is calculating the cumulative proportion.

What is the cumulative proportion?

Cumulative proportion for a course = cumulative course grade / total course grade

The "total course grade" is simple: the sum of the grades of all students in the course.

The "cumulative course grade" is defined as follows:

1) Sort the students' grades within each course from highest to lowest;

2) For each student, add that student's grade to the grades of all students ranked before them in the same course.

For example, for the math course in the table below, the grades in descending order are 96, 65, and 55. The cumulative math grade for student S002 is 96, the cumulative math grade for student S001 is 96+65=161, and so on.

[Image: table of math course grades]
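The arithmetic in this example can be checked with a short Python sketch. It uses only the three math grades quoted above; the variable names are illustrative.

```python
from itertools import accumulate

grades = [96, 65, 55]                   # math grades, already sorted descending
cumulative = list(accumulate(grades))   # running totals: 96, 96+65, 96+65+55
total = sum(grades)                     # total course grade
proportions = [c / total for c in cumulative]

for g, c, p in zip(grades, cumulative, proportions):
    print(g, c, f"{p:.1%}")
```

The running totals are 96, 161, and 216, which gives cumulative proportions of roughly 44.4%, 74.5%, and 100%. Under the 60% and 85% cut-offs, the first student falls in Class A, the second in Class B, and the third in Class C.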

2. Cumulative course grades

A running total like this is best computed with a window function.

select *,
       sum(成绩) over (partition by 课程 
                       order by 成绩 desc 
                       rows between unbounded preceding and current row) as 课程累计成绩
from 学生成绩表;

Query result:

[Image: query result]

Name the query result of this SQL query as subquery t1.

This query uses the rows between ... and ... clause of a window function, which sums field1 over the rows from the start row to the end row of the frame:

sum(field1) over (partition by field2 
                 order by field3 
                 rows between start_row and end_row)

This question asks for "each student's grade plus the grades of all students ranked before them", so the start row is the first row of each window (unbounded preceding) and the end row is the current row (current row).

[Image: window frame running from the first row to the current row]
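To see the frame in action, the running-total query can be tried against an in-memory SQLite database from Python. This is a sketch under assumptions: the course names, student S003, and the English grades are made up for illustration (only the math grades 96, 65, 55 come from the example above), and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table 学生成绩表 (学号 text, 课程 text, 成绩 integer)")
conn.executemany(
    "insert into 学生成绩表 values (?, ?, ?)",
    [  # hypothetical sample rows
        ("S001", "math", 65), ("S002", "math", 96), ("S003", "math", 55),
        ("S001", "english", 80), ("S002", "english", 70),
    ],
)

# Same window expression as in the article; the outer order by only makes
# the printed output deterministic.
running_totals = conn.execute("""
    select 学号, 课程, 成绩,
           sum(成绩) over (partition by 课程
                           order by 成绩 desc
                           rows between unbounded preceding and current row) as 课程累计成绩
    from 学生成绩表
    order by 课程, 成绩 desc
""").fetchall()

for row in running_totals:
    print(row)
```

Each course restarts its own running total because of partition by. One thing to keep in mind: if two students in a course have the same grade, a rows frame still gives each of them a separate running total, in whichever order the engine happens to place them. Leaving the frame clause out falls back to the default range frame, which gives tied rows the same total.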

3. Total course grade

By definition: cumulative proportion for a course = cumulative course grade / total course grade.

The previous step produced the numerator: the cumulative grade within each course.

We still need the denominator: the total grade of each course.

The word "each" points to aggregation: group by course (group by), then sum the grades (sum).

select 课程,sum(成绩) as 课程总成绩
from 学生成绩表
group by 课程;

Query result:

[Image: query result]

Name the query result of this SQL query as subquery t2.

4. Cumulative proportion

By definition: cumulative proportion for a course = cumulative course grade / total course grade.

To make the calculation easy, the results of the previous two queries need to be combined into one table.

Treat the cumulative-grade query result as table t1 and the total-grade query result as table t2, then join the two tables on course.

[Image: joining tables t1 and t2]

select t1.学号,
       t1.课程,
       t1.成绩,
       t1.课程累计成绩,
       t2.课程总成绩,
       t1.课程累计成绩/t2.课程总成绩 as 累计成绩占比
from  t1
left join t2 
on t1.课程 = t2.课程;

Substituting the subqueries t1 and t2 defined earlier into the SQL statement above gives:

select t1.学号,
       t1.课程,
       t1.成绩,
       t1.课程累计成绩,
       t2.课程总成绩,
       t1.课程累计成绩/t2.课程总成绩 as 累计成绩占比
from (
select *,
       sum(成绩) over (partition by 课程 
                       order by 成绩 DESC 
                       rows between unbounded preceding and current row) as 课程累计成绩
from 学生成绩表
) as t1
left join (
select 课程,sum(成绩) as 课程总成绩
from 学生成绩表
group by 课程
) as t2 
on t1.课程 = t2.课程;

Query result:

[Image: query result]

Name the result of this SQL query subquery t3.
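A note on the division: the article does not say which database it targets. In MySQL the `/` operator returns a fractional result, so the query works as written. In engines such as PostgreSQL, SQLite, or SQL Server, dividing one integer by another truncates, and the proportion would come out as 0 or 1 if 成绩 is stored as an integer. The sketch below shows the effect in SQLite through Python's sqlite3 module, using the numbers 161 and 216 from the math example; multiplying by 1.0 is one common workaround, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer truncates in SQLite; forcing one operand to a real
# number keeps the fractional part.
truncated, exact = conn.execute(
    "select 161 / 216, 161 * 1.0 / 216"
).fetchone()

print(truncated)  # truncated integer division
print(exact)      # fractional result
```

On such an engine, writing t1.课程累计成绩 * 1.0 / t2.课程总成绩 keeps the cumulative proportion as a fraction.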

5. Classification

The task asks for the Class A and Class B students of each course: a cumulative proportion of 0-60% is Class A, and above 60% up to 85% is Class B. Class C rows (above 85%) are filtered out with a where clause.

select t3.学号,
       t3.课程,
       t3.成绩,
       case when t3.累计成绩占比 > 0 and t3.累计成绩占比 <= 0.6 then 'A'
            when t3.累计成绩占比 > 0.6 and t3.累计成绩占比 <= 0.85 then 'B'
            end as 类别
from t3
where t3.累计成绩占比 <= 0.85;

Substituting subquery t3 defined earlier into the SQL statement above gives:

select t3.学号,
       t3.课程,
       t3.成绩,
       case when t3.累计成绩占比 > 0 and t3.累计成绩占比 <= 0.6 then 'A'
            when t3.累计成绩占比 > 0.6 and t3.累计成绩占比 <= 0.85 then 'B'
            end as 类别
from (
select t1.学号,
       t1.课程,
       t1.成绩,
       t1.课程累计成绩,
       t2.课程总成绩,
       t1.课程累计成绩/t2.课程总成绩 as 累计成绩占比
from (
select *,
       sum(成绩) over (partition by 课程 
                       order by 成绩 DESC 
                       rows between unbounded preceding and current row) as 课程累计成绩
from 学生成绩表
) as t1
left join (
select 课程,sum(成绩) as 课程总成绩
from 学生成绩表
group by 课程
) as t2 
on t1.课程 = t2.课程
) as t3
where t3.累计成绩占比 <= 0.85;

[Image: query result]

[What this question tests]

1. Understanding of the Pareto analysis approach;

2. Understanding of window functions and the ability to apply them flexibly to business problems;

3. Understanding of multi-table joins.
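For readers who want to run the whole pipeline, here is a minimal end-to-end sketch using Python's sqlite3 module. It is a restatement under assumptions, not the article's exact query: the nested subqueries t1, t2, and t3 are written as common table expressions for readability, `* 1.0` is added because SQLite truncates integer division, and the sample rows (including student S003) are hypothetical apart from the math grades 96, 65, and 55. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table 学生成绩表 (学号 text, 课程 text, 成绩 integer)")
conn.executemany(
    "insert into 学生成绩表 values (?, ?, ?)",
    [("S001", "math", 65), ("S002", "math", 96), ("S003", "math", 55)],
)

query = """
with t1 as (   -- running total per course, highest grade first
    select *,
           sum(成绩) over (partition by 课程
                           order by 成绩 desc
                           rows between unbounded preceding and current row) as 课程累计成绩
    from 学生成绩表
),
t2 as (        -- total grade per course
    select 课程, sum(成绩) as 课程总成绩
    from 学生成绩表
    group by 课程
),
t3 as (        -- cumulative proportion; * 1.0 avoids integer division in SQLite
    select t1.学号, t1.课程, t1.成绩,
           t1.课程累计成绩 * 1.0 / t2.课程总成绩 as 累计成绩占比
    from t1
    left join t2 on t1.课程 = t2.课程
)
select 学号, 课程, 成绩,
       case when 累计成绩占比 > 0 and 累计成绩占比 <= 0.6 then 'A'
            when 累计成绩占比 > 0.6 and 累计成绩占比 <= 0.85 then 'B'
       end as 类别
from t3
where 累计成绩占比 <= 0.85
order by 课程, 成绩 desc
"""
classified = conn.execute(query).fetchall()

for row in classified:
    print(row)
```

With these sample rows, S002 (cumulative proportion about 44.4%) is labeled A, S001 (about 74.5%) is labeled B, and S003 (100%) is filtered out as Class C.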


Origin blog.csdn.net/zhongyangzhong/article/details/130234562