1. Introduction
Yesterday I ran into a statistics problem. The data, about 140,000 rows (14w), was stored in an Excel file. I originally planned to practice the Python I had learned earlier, but time was tight, so I imported the data directly into an Oracle database and computed the statistics there. The solution process was quite interesting, so I am recording it here.
2. Requirements
The data covers employees of more than 3,000 companies from 2008 to 2016: about 140,000 rows in total, stored in an Excel file.
Required statistics:
1. The number of employees in each company per year
2. The number of male and female employees in each company per year
3. The average age of employees in each company per year
The SQL statements are at the end; if you need them urgently, you can skip ahead.
3. Preliminary knowledge
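3.1 The simple CASE statement
- A CASE statement selects the first branch whose condition is satisfied: if the conditions of both the first and the second WHEN are true, only the first branch is executed.
- The syntax of CASE depends on where it is used. As a PL/SQL statement (shown here), each branch ends with a semicolon and the block closes with END CASE; as an expression inside a SQL query (as in section 5), the branches have no semicolons and the expression closes with END.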
CASE SELECTOR
WHEN EXPRESSION_1 THEN STATEMENT_1;
[WHEN EXPRESSION_2 THEN STATEMENT_2;]
[...]
[ELSE STATEMENT_N+1 ;]
END CASE;
- Note that the statement after each THEN must end with a semicolon, and the whole block ends with END CASE;
CASE v_element
WHEN xx THEN yy;
WHEN xxx THEN yyy;
ELSE yyyy;
END CASE;
When v_element equals xx, the statement yy is executed. If yy is long, you can wrap it in BEGIN and END. The test performed is v_element = xx, where xx is a specific value.
3.2 The searched CASE statement
CASE
WHEN SEARCH_CONDITION_1 THEN STATEMENT_1;
[WHEN SEARCH_CONDITION_2 THEN STATEMENT_2;]
[...]
[ELSE STATEMENT_N+1 ;]
END CASE;
CASE
WHEN v_element=xx THEN yy;
WHEN v_element=xxx THEN yyy;
ELSE yyyy;
END CASE;
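The two forms can be tried out as CASE expressions inside a query. The following is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for Oracle; the function names simple_case and searched_case are hypothetical. Note that this is the expression form, so the branches carry no semicolons and the expression closes with END rather than END CASE.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")

def simple_case(v):
    # Simple CASE: the selector is compared for equality with each WHEN value.
    sql = "SELECT CASE ? WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE 'other' END"
    return conn.execute(sql, (v,)).fetchone()[0]

def searched_case(v):
    # Searched CASE: each WHEN carries its own condition.
    # For v = 50 both conditions are true, but only the first branch is returned.
    sql = ("SELECT CASE WHEN ? > 10 THEN 'big' "
           "WHEN ? > 5 THEN 'medium' ELSE 'small' END")
    return conn.execute(sql, (v, v)).fetchone()[0]

print(simple_case(2), simple_case(3))
print(searched_case(50), searched_case(7), searched_case(1))
```

The call searched_case(50) illustrates the "first match wins" rule: 50 satisfies both conditions, yet the result comes from the first branch.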
4. Problem Analysis
The difficulty of the problem lies in computing every statistic "per company, per year".
4.1 Normal "twice grouping"
The more intuitive solution is to group first by company and then by year, adding conditions inside the aggregates to do the counting. The result contains one row per company per year.
4.2 Indirect "twice grouping"
If each company's information must be reported as a single row, we can group by company only. Within each group, CASE expressions are then used to achieve the effect of grouping by year indirectly.
5. Problem solving
5.1 Environment introduction
1. Oracle 11g 64-bit
2. PLSQL 11.0.3.1700
3. Microsoft Office 2013
5.2 Create table
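create table gsygxx
(
gsdm VARCHAR2(16), -- company code
tjsj VARCHAR2(64), -- statistics date, e.g. '2008-12-31'
xm VARCHAR2(32), -- employee name
xb VARCHAR2(8), -- gender: '男' (male) or '女' (female)
nl NUMBER -- age
);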
5.3 Excel data import
5.4 Normal "twice grouping" source code
Group first by company and then by year. On the employee rows of each resulting group, add the conditions directly, and you get the statistics per company per year.
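select
gsdm 公司代码, -- company code
tjsj 统计时间, -- statistics date
count(*) 总人数, -- total number of employees
avg (nl) 平均年龄, -- average age
count(CASE WHEN xb='女' THEN 1 ELSE NULL END) 女, -- female employees
count(CASE WHEN xb='男' THEN 1 ELSE NULL END) 男 -- male employees
from gsygxx
group by gsdm,tjsj
order by gsdm,tjsj;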
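To see what this query returns, it can be run on a few made-up rows. The sketch below assumes SQLite (Python's standard sqlite3 module) in place of Oracle; the sample rows and the English column aliases are invented for illustration. COUNT ignores NULL, so a CASE that yields 1 only for matching rows counts exactly those rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the Oracle table
conn.execute("CREATE TABLE gsygxx (gsdm TEXT, tjsj TEXT, xm TEXT, xb TEXT, nl REAL)")
conn.executemany("INSERT INTO gsygxx VALUES (?, ?, ?, ?, ?)", [
    # Invented sample rows: company, date, name, gender, age
    ("C001", "2008-12-31", "A", "男", 30),
    ("C001", "2008-12-31", "B", "女", 40),
    ("C001", "2009-12-31", "A", "男", 31),
    ("C002", "2008-12-31", "C", "女", 25),
])

per_company_year = conn.execute("""
    SELECT gsdm, tjsj,
           COUNT(*) AS total,
           AVG(nl) AS avg_age,
           COUNT(CASE WHEN xb = '女' THEN 1 END) AS female,  -- NULLs are not counted
           COUNT(CASE WHEN xb = '男' THEN 1 END) AS male
    FROM gsygxx
    GROUP BY gsdm, tjsj
    ORDER BY gsdm, tjsj
""").fetchall()

for row in per_company_year:
    print(row)
```

Each output row corresponds to one (company, year) pair, which is the "long" shape of the result.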
5.5 Indirect "twice grouping" source code
1. Grouping by company yields, for each company, the employee rows of all years, so every condition must additionally restrict the year.
2. To meet the requirement that each company's information is reported as a single row, the data for the different years must be computed in one SQL statement.
select
gsdm 公司代码,
count(*) 总人数,
avg (CASE WHEN tjsj = '2008-12-31' THEN nl ELSE NULL END) 平均年龄_2008,
count (CASE WHEN tjsj = '2008-12-31' THEN 1 ELSE NULL END) 总人数_2008,
count (CASE WHEN tjsj = '2008-12-31' and xb='男' THEN 1 ELSE NULL END) 总人数_2008_男,
count (CASE WHEN tjsj = '2008-12-31' and xb='女' THEN 1 ELSE NULL END) 总人数_2008_女,
avg (CASE WHEN tjsj = '2009-12-31' THEN nl ELSE NULL END) 平均年龄_2009,
count (CASE WHEN tjsj = '2009-12-31' THEN 1 ELSE NULL END) 总人数_2009,
count (CASE WHEN tjsj = '2009-12-31' and xb='男' THEN 1 ELSE NULL END) 总人数_2009_男,
count (CASE WHEN tjsj = '2009-12-31' and xb='女' THEN 1 ELSE NULL END) 总人数_2009_女,
avg (CASE WHEN tjsj = '2010-12-31' THEN nl ELSE NULL END) 平均年龄_2010,
count (CASE WHEN tjsj = '2010-12-31' THEN 1 ELSE NULL END) 总人数_2010,
count (CASE WHEN tjsj = '2010-12-31' and xb='男' THEN 1 ELSE NULL END) 总人数_2010_男,
count (CASE WHEN tjsj = '2010-12-31' and xb='女' THEN 1 ELSE NULL END) 总人数_2010_女,
avg (CASE WHEN tjsj = '2011-12-31' THEN nl ELSE NULL END) 平均年龄_2011,
count (CASE WHEN tjsj = '2011-12-31' THEN 1 ELSE NULL END) 总人数_2011,
count (CASE WHEN tjsj = '2011-12-31' and xb='男' THEN 1 ELSE NULL END) 总人数_2011_男,
count (CASE WHEN tjsj = '2011-12-31' and xb='女' THEN 1 ELSE NULL END) 总人数_2011_女,
avg (CASE WHEN tjsj = '2012-12-31' THEN nl ELSE NULL END) 平均年龄_2012,
count (CASE WHEN tjsj = '2012-12-31' THEN 1 ELSE NULL END) 总人数_2012,
count (CASE WHEN tjsj = '2012-12-31' and xb='男' THEN 1 ELSE NULL END) 总人数_2012_男,
count (CASE WHEN tjsj = '2012-12-31' and xb='女' THEN 1 ELSE NULL END) 总人数_2012_女,
avg (CASE WHEN tjsj = '2013-12-31' THEN nl ELSE NULL END) 平均年龄_2013,
count (CASE WHEN tjsj = '2013-12-31' THEN 1 ELSE NULL END) 总人数_2013,
count (CASE WHEN tjsj = '2013-12-31' and xb='男' THEN 1 ELSE NULL END) 总人数_2013_男,
count (CASE WHEN tjsj = '2013-12-31' and xb='女' THEN 1 ELSE NULL END) 总人数_2013_女,
avg (CASE WHEN tjsj = '2014-12-31' THEN nl ELSE NULL END) 平均年龄_2014,
count (CASE WHEN tjsj = '2014-12-31' THEN 1 ELSE NULL END) 总人数_2014,
count (CASE WHEN tjsj = '2014-12-31' and xb='男' THEN 1 ELSE NULL END) 总人数_2014_男,
count (CASE WHEN tjsj = '2014-12-31' and xb='女' THEN 1 ELSE NULL END) 总人数_2014_女,
avg (CASE WHEN tjsj = '2015-12-31' THEN nl ELSE NULL END) 平均年龄_2015,
count (CASE WHEN tjsj = '2015-12-31' THEN 1 ELSE NULL END) 总人数_2015,
count (CASE WHEN tjsj = '2015-12-31' and xb='男' THEN 1 ELSE NULL END) 总人数_2015_男,
count (CASE WHEN tjsj = '2015-12-31' and xb='女' THEN 1 ELSE NULL END) 总人数_2015_女,
avg (CASE WHEN tjsj = '2016-12-31' THEN nl ELSE NULL END) 平均年龄_2016,
count (CASE WHEN tjsj = '2016-12-31' THEN 1 ELSE NULL END) 总人数_2016,
count (CASE WHEN tjsj = '2016-12-31' and xb='男' THEN 1 ELSE NULL END) 总人数_2016_男,
count (CASE WHEN tjsj = '2016-12-31' and xb='女' THEN 1 ELSE NULL END) 总人数_2016_女
from gsygxx
group by gsdm
order by gsdm;
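The statement above repeats the same four columns for every year, so its text can also be generated rather than typed. The following is a rough sketch of that idea, not the method used in this post: build_pivot_sql is a hypothetical helper, the English aliases replace the Chinese ones, and SQLite with invented rows stands in for Oracle to check the result.

```python
import sqlite3

def build_pivot_sql(years):
    """Build the one-row-per-company query with four columns per year."""
    cols = ["gsdm", "COUNT(*) AS total"]
    for year in years:
        y = int(year)  # only a number is ever spliced into the SQL text
        cond = f"tjsj = '{y}-12-31'"
        cols += [
            f"AVG(CASE WHEN {cond} THEN nl END) AS avg_age_{y}",
            f"COUNT(CASE WHEN {cond} THEN 1 END) AS total_{y}",
            f"COUNT(CASE WHEN {cond} AND xb = '男' THEN 1 END) AS male_{y}",
            f"COUNT(CASE WHEN {cond} AND xb = '女' THEN 1 END) AS female_{y}",
        ]
    return ("SELECT " + ",\n       ".join(cols)
            + "\nFROM gsygxx\nGROUP BY gsdm\nORDER BY gsdm")

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the Oracle table
conn.execute("CREATE TABLE gsygxx (gsdm TEXT, tjsj TEXT, xm TEXT, xb TEXT, nl REAL)")
conn.executemany("INSERT INTO gsygxx VALUES (?, ?, ?, ?, ?)", [
    # Invented sample rows: company, date, name, gender, age
    ("C001", "2008-12-31", "A", "男", 30),
    ("C001", "2008-12-31", "B", "女", 40),
    ("C001", "2009-12-31", "A", "男", 31),
    ("C002", "2008-12-31", "C", "女", 25),
])

# Two years are enough for the toy data; range(2008, 2017) would cover 2008-2016.
pivot = conn.execute(build_pivot_sql(range(2008, 2010))).fetchall()
for row in pivot:
    print(row)
```

A company with no rows in a given year gets a count of 0 and an average of NULL (None in Python) for that year, because AVG over only NULL values is NULL.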
5.6 Extensions
Building on the analysis and source code above, it is easy to extend the approach a little and compute:
1. The number of male employees in a certain age group (e.g. 30-40 years old) per company per year
2. Average age of male and female employees per company per year
3. The ratio of male and female employees in each company per year
4. The number of employees added by each company each year (not counted in the first year)
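As a minimal sketch of these four extensions, again assuming SQLite (Python's sqlite3) instead of Oracle and invented sample rows: items 1 to 3 only need further CASE conditions, while item 4 is interpreted here as the net change in headcount compared with the previous year, which is an assumption about what "added" means. The column aliases are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the Oracle table
conn.execute("CREATE TABLE gsygxx (gsdm TEXT, tjsj TEXT, xm TEXT, xb TEXT, nl REAL)")
conn.executemany("INSERT INTO gsygxx VALUES (?, ?, ?, ?, ?)", [
    # Invented sample rows: company, date, name, gender, age
    ("C001", "2008-12-31", "A", "男", 30),
    ("C001", "2008-12-31", "B", "女", 40),
    ("C001", "2009-12-31", "A", "男", 31),
    ("C001", "2009-12-31", "B", "女", 41),
    ("C001", "2009-12-31", "D", "男", 50),
])

# Items 1-3: extra conditions inside CASE; NULLIF avoids division by zero.
extended = conn.execute("""
    SELECT gsdm, tjsj,
           COUNT(CASE WHEN xb = '男' AND nl BETWEEN 30 AND 40 THEN 1 END) AS male_30_40,
           AVG(CASE WHEN xb = '男' THEN nl END) AS avg_age_male,
           AVG(CASE WHEN xb = '女' THEN nl END) AS avg_age_female,
           COUNT(CASE WHEN xb = '男' THEN 1 END) * 1.0
             / NULLIF(COUNT(CASE WHEN xb = '女' THEN 1 END), 0) AS male_female_ratio
    FROM gsygxx
    GROUP BY gsdm, tjsj
    ORDER BY gsdm, tjsj
""").fetchall()

# Item 4: join each year's headcount to the previous year's headcount.
# The inner join drops the first year, which has no previous year.
growth = conn.execute("""
    SELECT cur.gsdm, cur.tjsj, cur.n - prev.n AS added
    FROM (SELECT gsdm, tjsj, COUNT(*) AS n FROM gsygxx GROUP BY gsdm, tjsj) cur
    JOIN (SELECT gsdm, tjsj, COUNT(*) AS n FROM gsygxx GROUP BY gsdm, tjsj) prev
      ON prev.gsdm = cur.gsdm
     AND CAST(substr(prev.tjsj, 1, 4) AS INTEGER)
         = CAST(substr(cur.tjsj, 1, 4) AS INTEGER) - 1
    ORDER BY cur.gsdm, cur.tjsj
""").fetchall()

print(extended)
print(growth)
```

If "added" should instead mean newly hired people, the second query would have to compare employee names between consecutive years rather than headcounts.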