Optimizing Hive multi-granularity calculations with the column-to-row lateral view function

Today I came across a Hive optimization idea for reducing the number of jobs that is worth learning from.

The test table and data are as follows:

create table sale_order as 
select '河南省' as province,'洛阳市' as city,'汝阳县' as country, 100 as sales
union all 
select '河南省' as province,'洛阳市' as city,'伊川县' as country, 200 as sales
union all 
select '河南省' as province,'洛阳市' as city,'汝阳县' as country, 300 as sales
union all 
select '河南省' as province,'许昌市' as city,'许昌县' as country, 500 as sales 

The sales amount needs to be calculated at the province, city, and district/county granularities.
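For reference, hand-computing the totals from the four sample rows gives: province 河南省 = 100 + 200 + 300 + 500 = 1100; cities 洛阳市 = 100 + 200 + 300 = 600 and 许昌市 = 500; counties 汝阳县 = 100 + 300 = 400, 伊川县 = 200, and 许昌县 = 500. Both approaches below should return exactly these six rows.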

The first, conventional approach: aggregate each granularity separately and then UNION ALL the results.

-- basic approach: 4 jobs
select '1' as id,province,sum(sales) as sales from sale_order group by province
union all 
select '2' as id,city,sum(sales) as sales from sale_order group by city
union all 
select '3' as id,country,sum(sales) as sales from sale_order group by country  

The log looks like this:

hive> select '1' as id,province,sum(sales) as sales from sale_order group by province
    > union all 
    > select '2' as id,city,sum(sales) as sales from sale_order group by city
    > union all 
    > select '3' as id,country,sum(sales) as sales from sale_order group by country;
Query ID = root_20210319133916_04d285a4-0e45-46dd-9313-aaef32a8a946
Total jobs = 4
Launching Job 1 out of 4
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1616110053329_0007, Tracking URL = http://hadoop:8088/proxy/application_1616110053329_0007/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1616110053329_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-03-19 13:40:02,368 Stage-1 map = 0%,  reduce = 0%
2021-03-19 13:40:51,683 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 21.16 sec
2021-03-19 13:41:18,388 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 24.51 sec
2021-03-19 13:41:19,676 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 25.79 sec
MapReduce Total cumulative CPU time: 25 seconds 790 msec
Ended Job = job_1616110053329_0007
Launching Job 2 out of 4
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1616110053329_0008, Tracking URL = http://hadoop:8088/proxy/application_1616110053329_0008/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1616110053329_0008
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2021-03-19 13:41:51,501 Stage-3 map = 0%,  reduce = 0%
2021-03-19 13:42:08,663 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 4.0 sec
2021-03-19 13:42:23,783 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.46 sec
MapReduce Total cumulative CPU time: 6 seconds 460 msec
Ended Job = job_1616110053329_0008
Launching Job 3 out of 4
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1616110053329_0009, Tracking URL = http://hadoop:8088/proxy/application_1616110053329_0009/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1616110053329_0009
Hadoop job information for Stage-4: number of mappers: 1; number of reducers: 1
2021-03-19 13:42:50,714 Stage-4 map = 0%,  reduce = 0%
2021-03-19 13:43:07,800 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 2.69 sec
2021-03-19 13:43:22,628 Stage-4 map = 100%,  reduce = 100%, Cumulative CPU 5.25 sec
MapReduce Total cumulative CPU time: 5 seconds 250 msec
Ended Job = job_1616110053329_0009
Launching Job 4 out of 4
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1616110053329_0010, Tracking URL = http://hadoop:8088/proxy/application_1616110053329_0010/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1616110053329_0010
Hadoop job information for Stage-2: number of mappers: 3; number of reducers: 0
2021-03-19 13:43:50,093 Stage-2 map = 0%,  reduce = 0%
2021-03-19 13:44:51,565 Stage-2 map = 0%,  reduce = 0%
2021-03-19 13:45:07,140 Stage-2 map = 67%,  reduce = 0%, Cumulative CPU 18.43 sec
2021-03-19 13:45:10,262 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 19.89 sec
MapReduce Total cumulative CPU time: 19 seconds 890 msec
Ended Job = job_1616110053329_0010
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 25.79 sec   HDFS Read: 7128 HDFS Write: 128 SUCCESS
Stage-Stage-3: Map: 1  Reduce: 1   Cumulative CPU: 6.46 sec   HDFS Read: 7106 HDFS Write: 160 SUCCESS
Stage-Stage-4: Map: 1  Reduce: 1   Cumulative CPU: 5.25 sec   HDFS Read: 7102 HDFS Write: 191 SUCCESS
Stage-Stage-2: Map: 3   Cumulative CPU: 19.89 sec   HDFS Read: 9650 HDFS Write: 151 SUCCESS
Total MapReduce CPU Time Spent: 57 seconds 390 msec
OK
3	伊川县	200
3	汝阳县	400
3	许昌县	500
2	洛阳市	600
2	许昌市	500
1	河南省	1100
Time taken: 356.641 seconds, Fetched: 6 row(s)

As you can see, the 4 jobs (one per GROUP BY plus a final map-only job that combines the three result sets) took a total of 356.641 seconds.

The second approach uses the column-to-row lateral view explode function to turn the three columns to be aggregated into a single column, and then aggregates just once.

-- approach 2: 1 job
select type, code, sum(sales)
from (
  select type, code, sales
  from sale_order
  lateral view explode(
    str_to_map(concat('1=', province, '&2=', city, '&3=', country), '&', '=')
  ) adtable as type, code
) df
group by type, code
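To see what the lateral view actually produces, you can run the inner query on its own (a minimal sketch derived from the query above; explode on a map emits one row per key/value pair, and the exact row order may vary):

-- inner query only: each source row is exploded into three (type, code) rows
select type, code, sales
from sale_order
lateral view explode(
  str_to_map(concat('1=', province, '&2=', city, '&3=', country), '&', '=')
) adtable as type, code;

-- for the source row (河南省, 洛阳市, 汝阳县, 100) this emits:
-- 1    河南省    100
-- 2    洛阳市    100
-- 3    汝阳县    100

The outer GROUP BY on (type, code) then sums every granularity in the same pass, which is why a single MapReduce job is enough.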

The log looks like this:

hive> select type,code,sum(sales) from (select type,code,sales from sale_order lateral view explode(str_to_map(concat('1=',province,'&2=',city,'&3=',country),'&','=')) adtable as type,code ) df group by type,code;
Query ID = root_20210319134854_767b4eea-86f5-4763-88f8-a197dcbb4fa8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1616110053329_0011, Tracking URL = http://hadoop:8088/proxy/application_1616110053329_0011/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1616110053329_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-03-19 13:49:18,665 Stage-1 map = 0%,  reduce = 0%
2021-03-19 13:49:43,007 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.93 sec
2021-03-19 13:50:04,839 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 9.49 sec
2021-03-19 13:50:07,697 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 11.07 sec
MapReduce Total cumulative CPU time: 11 seconds 70 msec
Ended Job = job_1616110053329_0011
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 11.07 sec   HDFS Read: 11114 HDFS Write: 151 SUCCESS
Total MapReduce CPU Time Spent: 11 seconds 70 msec
OK
1	河南省	1100
2	洛阳市	600
2	许昌市	500
3	伊川县	200
3	汝阳县	400
3	许昌县	500
Time taken: 75.882 seconds, Fetched: 6 row(s)

As you can see, this time there is only 1 job, taking 75.882 seconds in total, which is a clear efficiency improvement.
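If you want to confirm how the query is compiled, Hive's EXPLAIN command prints the execution plan (a minimal sketch; the exact stage layout depends on the Hive version and execution engine):

-- show the plan for the single-job query
explain
select type, code, sum(sales)
from (
  select type, code, sales
  from sale_order
  lateral view explode(
    str_to_map(concat('1=', province, '&2=', city, '&3=', country), '&', '=')
  ) adtable as type, code
) df
group by type, code;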

If there are any problems, corrections are welcome.



Source: blog.csdn.net/zhaoxiangchong/article/details/115012963