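Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission. https://blog.csdn.net/BabyFish13/article/details/82992419
1. Original files
1.1 File listing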
[hadoop@emr-worker-10 result]$ ll room54000-htm_2018-*
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 18:59 room54000-htm_2018-09-30-00.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 18:59 room54000-htm_2018-09-30-01.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:00 room54000-htm_2018-09-30-02.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 18:59 room54000-htm_2018-09-30-03.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:00 room54000-htm_2018-09-30-04.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:02 room54000-htm_2018-09-30-05.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:02 room54000-htm_2018-09-30-06.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:03 room54000-htm_2018-09-30-07.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:03 room54000-htm_2018-09-30-08.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:05 room54000-htm_2018-09-30-09.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:06 room54000-htm_2018-09-30-10.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:06 room54000-htm_2018-09-30-11.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:06 room54000-htm_2018-09-30-12.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:07 room54000-htm_2018-09-30-13.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:09 room54000-htm_2018-09-30-14.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:09 room54000-htm_2018-09-30-15.txt
-rw-rw-r-- 1 hadoop hadoop 2 Oct 9 19:10 room54000-htm_2018-09-30-16.txt
1.2 File contents
[hadoop@emr-worker-10 result]$ cat room54000-htm_2018-*
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
7
0
0
0
0
0
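2. Summing a single numeric column
# Sum column 1 after dropping lines that are exactly 0 (-x matches whole lines only).
# Note: plain grep -v 0, as used in the session below, also drops any value containing the digit 0 (such as 10 or 105); that is harmless only while every value is a single digit.
cat room54000-htm_2018-*|grep -vx 0|awk '{sum +=$1};END {print sum}'
# Sum column 1 over all lines. Zeros do not change the total, so the filter above is optional.
cat room54000-htm_2018-*|awk '{sum +=$1};END {print sum}'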
[hadoop@emr-worker-10 result]$ cat room54000-htm_2018-*|awk '{sum +=$1};END {print sum}'
516
[hadoop@emr-worker-10 result]$ cat room54000-htm_2018-*|grep -v 0|awk '{sum +=$1};END {print sum}'
516
[hadoop@emr-worker-10 result]$
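As a minimal sketch, the same sum can be made a little more defensive. awk can read the files directly, so `cat` is not required, and printing `sum + 0` forces numeric output, so an empty input yields 0 instead of a blank line. The function name `sum_first_column` is hypothetical and does not appear in the original session.

```shell
# sum_first_column is a hypothetical helper name (an assumption, not from the post).
# It sums column 1 of every file passed as an argument.
sum_first_column() {
    # awk opens the files itself; "sum + 0" prints 0 rather than an
    # empty line when no input lines were read.
    awk '{ sum += $1 } END { print sum + 0 }' "$@"
}

# Usage, with the file pattern from the listing above:
# sum_first_column room54000-htm_2018-*
```

Because no zero filter is applied, values such as 10 or 105 are counted in full.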
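3. Notes
This approach is useful after a batch job has run, when the summary result written by each batch needs to be aggregated again into a single total.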
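When re-aggregating per-batch summaries, it can also help to see which files contributed to the total. The sketch below assumes each file holds one numeric value in column 1, as in the listing above. The function name `report_totals` and the tab-separated output layout are illustrative assumptions, not part of the original post.

```shell
# report_totals is a hypothetical helper (an assumption for illustration).
# It prints every file whose value is nonzero, followed by the grand total.
report_totals() {
    awk '
        # List only the partial results that are nonzero.
        $1 + 0 != 0 { printf "%s\t%s\n", FILENAME, $1 }
        # Accumulate every value, zero or not.
        { total += $1 }
        # total + 0 prints 0 even when no lines were read.
        END { printf "TOTAL\t%s\n", total + 0 }
    ' "$@"
}

# Usage, with the file pattern from the listing above:
# report_totals room54000-htm_2018-*
```

The comparison is numeric (`$1 + 0 != 0`) rather than a text match, so a value such as 10 is still reported.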