Counting annual employee information for each company in Oracle: an alternative "twice grouping"

1. Introduction

Yesterday I ran into a statistics problem. The data, about 140,000 rows, was in an Excel file. I originally planned to use it to practice the Python I had learned, but time was tight, so I imported the data directly into an Oracle database and computed the statistics there. The process of solving it was interesting, so I am recording it here.

2. Requirements

The data holds employee information for more than 3,000 companies from 2008 to 2016, about 140,000 rows in total, stored in Excel files.

To protect privacy, the data shown here is simulated.

Required statistics:
1. The number of employees in each company per year
2. The number of male and female employees in each company per year
3. The average age of employees in each company per year

The SQL statements are at the end; if you need them urgently, you can skip ahead.

3. Preliminary knowledge

3.1 The simple case statement

  • A case statement selects the first branch whose condition is met: if the conditions of both the first and the second branch are true, only the first branch is used.
  • The syntax of case varies depending on where it is used. As a PL/SQL statement it takes the form below; inside a SQL query it is an expression whose branches are values, written without semicolons and closed by END instead of END CASE.
CASE    SELECTOR
    WHEN   EXPRESSION_1 THEN STATEMENT_1;
    [WHEN EXPRESSION_2 THEN STATEMENT_2;]
    [...]
    [ELSE STATEMENT_N+1 ;]
END CASE;
  • Note that a semicolon is required after the statement that follows each THEN, and that the whole statement ends with END CASE;
CASE v_element
    WHEN  xx  THEN yy;
    WHEN  xxx THEN  yyy;
    ELSE  yyyy;
END CASE;

When v_element equals xx, the statement yy is executed. If the branch is long, it can be wrapped in begin and end. The condition being tested is v_element = xx, where xx is a specific value.

3.2 The searched case statement

CASE 
    WHEN SEARCH_CONDITION_1 THEN STATEMENT_1;
    [WHEN SEARCH_CONDITION_2 THEN STATEMENT_2;]
    [...]
    [ELSE STATEMENT_N+1 ;]
END CASE;

CASE 
    WHEN  v_element=xx  THEN yy;
    WHEN  v_element=xxx THEN  yyy;
    ELSE  yyyy;
END CASE;
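
Inside a SQL query, case appears as an expression rather than a statement, and that is the form the queries later in this article use. The sketch below illustrates the simple form, the searched form, and the first-match rule. It runs against an in-memory SQLite database through Python's sqlite3 module only so that it is self-contained; SQLite stands in for Oracle here, and the table `t` with its three ages is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (nl integer)")  # nl: age; hypothetical sample table
conn.executemany("insert into t values (?)", [(25,), (35,), (45,)])

rows = conn.execute("""
    select nl,
           -- simple form: one selector compared against specific values
           case nl when 25 then 'twenty-five' else 'other' end,
           -- searched form: every WHEN carries its own condition;
           -- 45 satisfies both conditions, but only the first match is returned
           case when nl > 30 then 'over 30'
                when nl > 40 then 'over 40'
                else 'young'
           end
    from t
    order by nl
""").fetchall()

for row in rows:
    print(row)
```

The second WHEN of the searched form is never reached for any age above 30, which is the first-match behaviour described in 3.1.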

4. Problem Analysis

The difficulty of the problem lies in computing every statistic for each company for each year.

4.1 Normal "twice grouping"

The more intuitive solution is to group by company and then by year, adding conditions inside each group to do the counting.

The result contains one row per company per year.

4.2 Roundabout "twice grouping"

Suppose instead that all of a company's information must be reported as a single row.

In that case we can group by company only. Within each company group, case expressions are then used to achieve the effect of grouping by year in a roundabout way.
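
The roundabout approach works because aggregate functions such as count and avg ignore NULL: a case expression that yields NULL for rows outside a given year removes those rows from that column's aggregate. The following is a minimal sketch on four made-up rows, comparing the two result shapes. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for Oracle; the column types are simplified and the employee names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table gsygxx (gsdm text, tjsj text, xm text, xb text, nl integer)")
conn.executemany(
    "insert into gsygxx values (?, ?, ?, ?, ?)",
    [  # invented sample rows
        ("A", "2008-12-31", "e1", "男", 30),
        ("A", "2008-12-31", "e2", "女", 40),
        ("A", "2009-12-31", "e1", "男", 31),
        ("B", "2008-12-31", "e3", "女", 28),
    ],
)

# Normal grouping: one row per (company, year).
per_year = conn.execute("""
    select gsdm, tjsj, count(*), avg(nl)
    from gsygxx
    group by gsdm, tjsj
    order by gsdm, tjsj
""").fetchall()

# Roundabout grouping: one row per company; each column filters its own year.
# A case expression without ELSE yields NULL, which count and avg skip.
per_company = conn.execute("""
    select gsdm,
           count(case when tjsj = '2008-12-31' then 1  end),
           avg  (case when tjsj = '2008-12-31' then nl end),
           count(case when tjsj = '2009-12-31' then 1  end),
           avg  (case when tjsj = '2009-12-31' then nl end)
    from gsygxx
    group by gsdm
    order by gsdm
""").fetchall()

print(per_year)
print(per_company)
```

Company B has no rows for 2009, so its 2009 count is 0 and its 2009 average is NULL rather than 0.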

5. Problem solving

5.1 Environment introduction

1. Oracle 11g, 64-bit
2. PLSQL 11.0.3.1700
3. Microsoft Office 2013

5.2 Create table

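-- gsygxx: employee information per company (simulated data)
create table gsygxx
(
  gsdm VARCHAR2(16),  -- company code
  tjsj VARCHAR2(64),  -- statistics date, e.g. '2008-12-31'
  xm   VARCHAR2(32),  -- employee name
  xb   VARCHAR2(8),   -- sex: '男' (male) or '女' (female)
  nl   NUMBER         -- age
);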

5.3 Excel data import

5.4 Normal "twice grouping" source code

First group by company and then by year. Adding conditions directly to the employee rows in each group then yields the statistics for each company for each year.

select 
gsdm 公司代码,
tjsj 统计时间,

count(*) 总人数,
avg  (nl) 平均年龄,

count(CASE WHEN xb='女' THEN 1    ELSE NULL   END) 女,
count(CASE WHEN xb='男' THEN 1    ELSE NULL   END) 男

from gsygxx
group by gsdm,tjsj
order by gsdm,tjsj;

5.5 Roundabout "twice grouping" source code

1. Grouping by company puts all of a company's employee rows, for every year, into one group, so each condition needs an additional restriction on the year.
2. To report each company's information as a single row, the data for the different years has to be computed in one SQL statement.

select 
gsdm 公司代码,
count(*) 总人数,

avg   (CASE WHEN tjsj = '2008-12-31' THEN  nl  ELSE NULL   END) 平均年龄_2008,
count (CASE WHEN tjsj = '2008-12-31' THEN 1    ELSE NULL   END) 总人数_2008,
count (CASE WHEN tjsj = '2008-12-31' and xb='男' THEN 1    ELSE NULL   END) 总人数_2008_男,
count (CASE WHEN tjsj = '2008-12-31' and xb='女' THEN 1    ELSE NULL   END) 总人数_2008_女,

avg   (CASE WHEN tjsj = '2009-12-31' THEN  nl  ELSE NULL   END) 平均年龄_2009,
count (CASE WHEN tjsj = '2009-12-31' THEN 1    ELSE NULL   END) 总人数_2009,
count (CASE WHEN tjsj = '2009-12-31' and xb='男' THEN 1    ELSE NULL   END) 总人数_2009_男,
count (CASE WHEN tjsj = '2009-12-31' and xb='女' THEN 1    ELSE NULL   END) 总人数_2009_女,


avg   (CASE WHEN tjsj = '2010-12-31' THEN  nl  ELSE NULL   END) 平均年龄_2010,
count (CASE WHEN tjsj = '2010-12-31' THEN 1    ELSE NULL   END) 总人数_2010,
count (CASE WHEN tjsj = '2010-12-31' and xb='男' THEN 1    ELSE NULL   END) 总人数_2010_男,
count (CASE WHEN tjsj = '2010-12-31' and xb='女' THEN 1    ELSE NULL   END) 总人数_2010_女,

avg   (CASE WHEN tjsj = '2011-12-31' THEN  nl  ELSE NULL   END) 平均年龄_2011,
count (CASE WHEN tjsj = '2011-12-31' THEN 1    ELSE NULL   END) 总人数_2011,
count (CASE WHEN tjsj = '2011-12-31' and xb='男' THEN 1    ELSE NULL   END) 总人数_2011_男,
count (CASE WHEN tjsj = '2011-12-31' and xb='女' THEN 1    ELSE NULL   END) 总人数_2011_女,

avg   (CASE WHEN tjsj = '2012-12-31' THEN  nl  ELSE NULL   END) 平均年龄_2012,
count (CASE WHEN tjsj = '2012-12-31' THEN 1    ELSE NULL   END) 总人数_2012,
count (CASE WHEN tjsj = '2012-12-31' and xb='男' THEN 1    ELSE NULL   END) 总人数_2012_男,
count (CASE WHEN tjsj = '2012-12-31' and xb='女' THEN 1    ELSE NULL   END) 总人数_2012_女,

avg   (CASE WHEN tjsj = '2013-12-31' THEN  nl  ELSE NULL   END) 平均年龄_2013,
count (CASE WHEN tjsj = '2013-12-31' THEN 1    ELSE NULL   END) 总人数_2013,
count (CASE WHEN tjsj = '2013-12-31' and xb='男' THEN 1    ELSE NULL   END) 总人数_2013_男,
count (CASE WHEN tjsj = '2013-12-31' and xb='女' THEN 1    ELSE NULL   END) 总人数_2013_女,

avg   (CASE WHEN tjsj = '2014-12-31' THEN  nl  ELSE NULL   END) 平均年龄_2014,
count (CASE WHEN tjsj = '2014-12-31' THEN 1    ELSE NULL   END) 总人数_2014,
count (CASE WHEN tjsj = '2014-12-31' and xb='男' THEN 1    ELSE NULL   END) 总人数_2014_男,
count (CASE WHEN tjsj = '2014-12-31' and xb='女' THEN 1    ELSE NULL   END) 总人数_2014_女,

avg   (CASE WHEN tjsj = '2015-12-31' THEN  nl  ELSE NULL   END) 平均年龄_2015,
count (CASE WHEN tjsj = '2015-12-31' THEN 1    ELSE NULL   END) 总人数_2015,
count (CASE WHEN tjsj = '2015-12-31' and xb='男' THEN 1    ELSE NULL   END) 总人数_2015_男,
count (CASE WHEN tjsj = '2015-12-31' and xb='女' THEN 1    ELSE NULL   END) 总人数_2015_女,

avg   (CASE WHEN tjsj = '2016-12-31' THEN  nl  ELSE NULL   END) 平均年龄_2016,
count (CASE WHEN tjsj = '2016-12-31' THEN 1    ELSE NULL   END) 总人数_2016,
count (CASE WHEN tjsj = '2016-12-31' and xb='男' THEN 1    ELSE NULL   END) 总人数_2016_男,
count (CASE WHEN tjsj = '2016-12-31' and xb='女' THEN 1    ELSE NULL   END) 总人数_2016_女


from gsygxx
group by gsdm 
order by gsdm;
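
The nine year blocks above differ only in the year, so the column list can also be generated rather than typed by hand. The following is one possible sketch in Python, not part of the original solution; the helper names `year_columns` and `build_query` are hypothetical. ELSE NULL is left out because a case expression without ELSE already yields NULL.

```python
def year_columns(year):
    """Return the four select-list entries for one statistics year."""
    day = f"{year}-12-31"
    return [
        f"avg  (CASE WHEN tjsj = '{day}' THEN nl END) 平均年龄_{year}",
        f"count(CASE WHEN tjsj = '{day}' THEN 1 END) 总人数_{year}",
        f"count(CASE WHEN tjsj = '{day}' AND xb = '男' THEN 1 END) 总人数_{year}_男",
        f"count(CASE WHEN tjsj = '{day}' AND xb = '女' THEN 1 END) 总人数_{year}_女",
    ]


def build_query(first_year, last_year):
    """Assemble the one-row-per-company query for an inclusive range of years."""
    columns = ["gsdm 公司代码", "count(*) 总人数"]
    for year in range(first_year, last_year + 1):
        columns.extend(year_columns(year))
    select_list = ",\n".join(columns)
    return f"select\n{select_list}\nfrom gsygxx\ngroup by gsdm\norder by gsdm"


sql = build_query(2008, 2016)
print(sql)
```

The years are interpolated directly into the SQL text, which is acceptable here only because they come from a fixed range in the script and not from user input.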

5.6 Extensions

Based on the analysis and source code above, it is not difficult to extend the approach a little and compute:

1. The number of male employees in a certain age group (e.g. 30-40 years old) per company per year

2. Average age of male and female employees per company per year

3. The ratio of male and female employees in each company per year

4. The number of employees added by each company each year (not counted in the first year)
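
Each of these extensions is either one more condition inside the case expression or arithmetic on two conditional aggregates. The sketch below shows all four for a single year on made-up rows, again using Python's sqlite3 module with an in-memory SQLite database as a stand-in for Oracle. The column aliases are hypothetical, and item 4 is interpreted here as the net change in headcount between two consecutive years, which is an assumption about what "added" means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table gsygxx (gsdm text, tjsj text, xm text, xb text, nl integer)")
conn.executemany(
    "insert into gsygxx values (?, ?, ?, ?, ?)",
    [  # invented sample rows for one company
        ("A", "2008-12-31", "e1", "男", 35),
        ("A", "2008-12-31", "e2", "女", 40),
        ("A", "2009-12-31", "e1", "男", 36),
        ("A", "2009-12-31", "e2", "女", 41),
        ("A", "2009-12-31", "e3", "男", 25),
        ("A", "2009-12-31", "e4", "男", 47),
    ],
)

row = conn.execute("""
    select gsdm,
           -- 1. men aged 30 to 40 (inclusive) in 2009
           count(case when tjsj = '2009-12-31' and xb = '男'
                       and nl between 30 and 40 then 1 end) as men_30_40_2009,
           -- 2. average age of men and of women in 2009
           avg(case when tjsj = '2009-12-31' and xb = '男' then nl end) as avg_age_men_2009,
           avg(case when tjsj = '2009-12-31' and xb = '女' then nl end) as avg_age_women_2009,
           -- 3. male-to-female ratio in 2009; * 1.0 avoids integer division in SQLite,
           --    nullif turns a zero denominator into NULL instead of an error
           count(case when tjsj = '2009-12-31' and xb = '男' then 1 end) * 1.0
             / nullif(count(case when tjsj = '2009-12-31' and xb = '女' then 1 end), 0)
             as ratio_2009,
           -- 4. net headcount change from 2008 to 2009
           count(case when tjsj = '2009-12-31' then 1 end)
             - count(case when tjsj = '2008-12-31' then 1 end) as added_2009
    from gsygxx
    group by gsdm
""").fetchone()

print(row)
```

Extending this to every year means repeating the same columns with a different date, exactly as in the full query of 5.5.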
