Pig
42. Install Pig Clients on the master node, open a Linux shell, and start Pig's Grunt shell in MapReduce mode. The startup command and its output are shown below.
Short form:
[root@master ~]# pig
WARNING: Use "yarn jar" to launch YARN applications.
17/05/07 07:58:29 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/05/07 07:58:29 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/05/07 07:58:29 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-05-07 07:58:29,081 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0.2.4.3.0-227 (rexported) compiled Sep 10 2016, 00:14:52
2017-05-07 07:58:29,081 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1494143909080.log
2017-05-07 07:58:29,104 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2017-05-07 07:58:29,507 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020
2017-05-07 07:58:30,427 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-a45e5d92-ef27-4629-8326-66cbf6605e8e
2017-05-07 07:58:30,870 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 07:58:30,977 [main] INFO org.apache.pig.backend.hadoop.ATSService - Created ATS Hook
grunt>
Full form:
[root@master ~]# pig -x mapreduce
WARNING: Use "yarn jar" to launch YARN applications.
17/05/07 08:00:45 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/05/07 08:00:46 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/05/07 08:00:46 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-05-07 08:00:46,060 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0.2.4.3.0-227 (rexported) compiled Sep 10 2016, 00:14:52
2017-05-07 08:00:46,060 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1494144046058.log
2017-05-07 08:00:46,086 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2017-05-07 08:00:46,540 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020
2017-05-07 08:00:47,459 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-8abb37de-5d81-487e-99f0-d1fb8eceac03
2017-05-07 08:00:47,913 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:00:48,019 [main] INFO org.apache.pig.backend.hadoop.ATSService - Created ATS Hook
grunt>
43. Install Pig Clients on the master node, open a Linux shell, and start Pig's Grunt shell in Local mode. The startup command and its output are shown below.
[root@master ~]# pig -x local
WARNING: Use "yarn jar" to launch YARN applications.
17/05/07 08:00:21 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/05/07 08:00:21 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2017-05-07 08:00:21,794 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0.2.4.3.0-227 (rexported) compiled Sep 10 2016, 00:14:52
2017-05-07 08:00:21,794 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1494144021792.log
2017-05-07 08:00:21,814 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2017-05-07 08:00:21,974 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2017-05-07 08:00:22,194 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-edcec26c-6956-4050-97c9-1c1806d1c853
2017-05-07 08:00:22,622 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:00:23,221 [main] INFO org.apache.pig.backend.hadoop.ATSService - Created ATS Hook
grunt>
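44. Use Pig in Local mode to count the hits per IP in the system log access_log.txt. Group the records by IP with a GROUP BY statement, iterate over the relation's columns with the FOREACH operator to count the rows in each group, and finally display the result with a DUMP statement. The commands and their output are shown below.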
[root@master ~]# pig
WARNING: Use "yarn jar" to launch YARN applications.
17/05/07 08:03:59 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/05/07 08:03:59 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/05/07 08:03:59 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-05-07 08:03:59,198 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0.2.4.3.0-227 (rexported) compiled Sep 10 2016, 00:14:52
2017-05-07 08:03:59,198 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1494144239196.log
2017-05-07 08:03:59,220 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2017-05-07 08:03:59,618 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020
2017-05-07 08:04:00,528 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-bdf6e873-56a4-479d-b13f-3244ce895852
2017-05-07 08:04:00,991 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:04:01,098 [main] INFO org.apache.pig.backend.hadoop.ATSService - Created ATS Hook
grunt> copyFromLocal /root/access_log.txt /user/root/input/log1.txt
grunt> A = LOAD '/user/root/input/log1.txt' USING PigStorage(' ') AS (ip, others);
grunt> group_ip = group A by ip;
grunt> result = foreach group_ip generate group, COUNT(A);
grunt> dump result;
2017-05-07 08:09:16,681 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2017-05-07 08:09:16,718 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-05-07 08:09:16,756 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-05-07 08:09:16,896 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-05-07 08:09:16,917 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2017-05-07 08:09:16,949 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-05-07 08:09:16,949 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-05-07 08:09:17,135 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:09:17,143 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:09:17,432 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-05-07 08:09:17,438 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-05-07 08:09:17,440 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2017-05-07 08:09:17,441 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2017-05-07 08:09:17,448 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=3045
2017-05-07 08:09:17,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2017-05-07 08:09:17,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2017-05-07 08:09:17,962 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/pig-0.15.0.2.4.3.0-227-core-h2.jar to DistributedCache through /tmp/temp-2081003050/tmp-1335603865/pig-0.15.0.2.4.3.0-227-core-h2.jar
2017-05-07 08:09:18,350 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-2081003050/tmp1196315125/automaton-1.11-8.jar
2017-05-07 08:09:18,646 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-2081003050/tmp1640187430/antlr-runtime-3.4.jar
2017-05-07 08:09:18,929 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/hadoop-mapreduce/joda-time-2.9.4.jar to DistributedCache through /tmp/temp-2081003050/tmp247039283/joda-time-2.9.4.jar
2017-05-07 08:09:18,962 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-05-07 08:09:18,968 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2017-05-07 08:09:18,968 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2017-05-07 08:09:18,968 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-05-07 08:09:19,046 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-05-07 08:09:19,138 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:09:19,138 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:09:19,540 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2017-05-07 08:09:19,600 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-05-07 08:09:19,600 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-05-07 08:09:19,619 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2017-05-07 08:09:20,198 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2017-05-07 08:09:20,553 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1494143770260_0002
2017-05-07 08:09:20,687 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2017-05-07 08:09:21,386 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1494143770260_0002
2017-05-07 08:09:21,446 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://slaver1:8088/proxy/application_1494143770260_0002/
2017-05-07 08:09:21,447 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1494143770260_0002
2017-05-07 08:09:21,447 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,group_ip,result
2017-05-07 08:09:21,447 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,3],result[3,8],group_ip[2,10] C: result[3,8],group_ip[2,10] R: result[3,8]
2017-05-07 08:09:21,464 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2017-05-07 08:09:21,464 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494143770260_0002]
2017-05-07 08:10:28,914 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2017-05-07 08:10:28,914 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494143770260_0002]
2017-05-07 08:10:38,929 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494143770260_0002]
2017-05-07 08:10:42,044 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:10:42,045 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:10:42,056 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:10:43,592 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:10:43,593 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:10:43,600 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:10:43,744 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:10:43,745 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:10:43,750 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:10:43,818 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-05-07 08:10:43,820 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.7.1.2.4.3.0-227 0.15.0.2.4.3.0-227 root 2017-05-07 08:09:17 2017-05-07 08:10:43 GROUP_BY
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1494143770260_0002 1 1 3 3 3 3 7 7 7 7 A,group_ip,result GROUP_BY,COMBINER hdfs://master:8020/tmp/temp-2081003050/tmp270373100,
Input(s):
Successfully read 10 records (3415 bytes) from: "/user/root/input/log1.txt"
Output(s):
Successfully stored 4 records (85 bytes) in: "hdfs://master:8020/tmp/temp-2081003050/tmp270373100"
Counters:
Total records written : 4
Total bytes written : 85
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1494143770260_0002
2017-05-07 08:10:43,893 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:10:43,893 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:10:43,898 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:10:44,022 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:10:44,022 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:10:44,027 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:10:44,130 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:10:44,131 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:10:44,137 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:10:44,175 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2017-05-07 08:10:44,179 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-05-07 08:10:44,190 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-05-07 08:10:44,190 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(112.97.24.243,7)
(208.115.113.82,1)
(220.181.94.221,1)
(220.181.108.151,1)
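The GROUP BY and COUNT steps above can be cross-checked on a small sample outside Pig. The following is a minimal Python sketch, not part of the original exercise. It assumes a space-delimited log whose first field is the client IP; the function name count_hits_by_ip and the sample lines are hypothetical and are not taken from access_log.txt.

```python
from collections import Counter


def count_hits_by_ip(lines):
    """Count log lines per IP, taking the first space-delimited field as the IP."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        ip = line.split(" ", 1)[0]  # same idea as PigStorage(' ') with ip as field 0
        counts[ip] += 1
    return counts


# Hypothetical sample lines using documentation-range IPs (assumption, not real data).
sample = [
    '192.0.2.10 - - "GET /a HTTP/1.1" 200',
    '192.0.2.10 - - "GET /b HTTP/1.1" 200',
    '198.51.100.7 - - "GET /robots.txt HTTP/1.1" 404',
]

hits = count_hits_by_ip(sample)
for ip, n in sorted(hits.items()):
    print((ip, n))  # one (ip, count) pair per group, like DUMP
```

Each printed pair plays the role of one tuple in the DUMP output: the group key followed by the number of rows in that group.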
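45. Use Pig to compute the highest temperature of each year in the weather dataset temperature.txt. Group the records by year with a GROUP BY statement, iterate over the relation's columns with the FOREACH operator to compute the maximum of each group, and finally display the result with a DUMP statement. The commands and their output are shown below.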
[root@master ~]# pig
WARNING: Use "yarn jar" to launch YARN applications.
17/05/07 07:58:29 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/05/07 07:58:29 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/05/07 07:58:29 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-05-07 07:58:29,081 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0.2.4.3.0-227 (rexported) compiled Sep 10 2016, 00:14:52
2017-05-07 07:58:29,081 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1494143909080.log
2017-05-07 07:58:29,104 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2017-05-07 07:58:29,507 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020
2017-05-07 07:58:30,427 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-a45e5d92-ef27-4629-8326-66cbf6605e8e
2017-05-07 07:58:30,870 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 07:58:30,977 [main] INFO org.apache.pig.backend.hadoop.ATSService - Created ATS Hook
grunt> copyFromLocal /root/temp.txt /user/root/temp.txt
grunt> A = LOAD '/user/root/temp.txt' USING PigStorage(' ') AS (year:int, temperature:int);
grunt> B = GROUP A BY year;
grunt> C = FOREACH B GENERATE group, MAX(A.temperature);
grunt> dump C;
2017-05-07 08:23:04,298 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2017-05-07 08:23:04,326 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-05-07 08:23:04,326 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-05-07 08:23:04,335 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-05-07 08:23:04,337 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2017-05-07 08:23:04,339 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-05-07 08:23:04,339 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-05-07 08:23:04,425 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:23:04,425 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:23:04,432 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-05-07 08:23:04,433 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-05-07 08:23:04,433 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2017-05-07 08:23:04,433 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2017-05-07 08:23:04,435 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=55
2017-05-07 08:23:04,435 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2017-05-07 08:23:04,435 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2017-05-07 08:23:04,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/pig-0.15.0.2.4.3.0-227-core-h2.jar to DistributedCache through /tmp/temp-2081003050/tmp455800525/pig-0.15.0.2.4.3.0-227-core-h2.jar
2017-05-07 08:23:04,491 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-2081003050/tmp1148421651/automaton-1.11-8.jar
2017-05-07 08:23:04,505 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-2081003050/tmp280400162/antlr-runtime-3.4.jar
2017-05-07 08:23:04,930 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/hadoop-mapreduce/joda-time-2.9.4.jar to DistributedCache through /tmp/temp-2081003050/tmp93917590/joda-time-2.9.4.jar
2017-05-07 08:23:04,949 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-05-07 08:23:04,950 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2017-05-07 08:23:04,950 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2017-05-07 08:23:04,950 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-05-07 08:23:04,977 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-05-07 08:23:05,050 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:23:05,050 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:23:05,077 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2017-05-07 08:23:05,114 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-05-07 08:23:05,114 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-05-07 08:23:05,116 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2017-05-07 08:23:05,155 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2017-05-07 08:23:05,193 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1494143770260_0003
2017-05-07 08:23:05,196 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2017-05-07 08:23:05,444 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1494143770260_0003
2017-05-07 08:23:05,449 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://slaver1:8088/proxy/application_1494143770260_0003/
2017-05-07 08:23:05,479 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1494143770260_0003
2017-05-07 08:23:05,479 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C
2017-05-07 08:23:05,479 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[4,4],A[-1,-1],C[6,4],B[5,4] C: C[6,4],B[5,4] R: C[6,4]
2017-05-07 08:23:05,488 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2017-05-07 08:23:05,488 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494143770260_0003]
2017-05-07 08:27:59,719 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2017-05-07 08:27:59,719 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494143770260_0003]
2017-05-07 08:30:37,027 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494143770260_0003]
2017-05-07 08:30:43,127 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:30:43,128 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:30:43,191 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:30:44,506 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:30:44,506 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:30:44,520 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:30:44,663 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:30:44,664 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:30:44,669 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:30:44,721 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-05-07 08:30:44,721 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.7.1.2.4.3.0-227 0.15.0.2.4.3.0-227 root 2017-05-07 08:23:04 2017-05-07 08:30:44 GROUP_BY
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1494143770260_0003 1 1 284 284 284 284 149 149 149 149 A,B,C GROUP_BY,COMBINER hdfs://master:8020/tmp/temp-2081003050/tmp-111966269,
Input(s):
Successfully read 7 records (419 bytes) from: "/user/root/temp.txt"
Output(s):
Successfully stored 2 records (18 bytes) in: "hdfs://master:8020/tmp/temp-2081003050/tmp-111966269"
Counters:
Total records written : 2
Total bytes written : 18
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1494143770260_0003
2017-05-07 08:30:44,784 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:30:44,784 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:30:44,789 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:30:44,892 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:30:44,892 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:30:44,897 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:30:44,990 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:30:44,991 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:30:44,996 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:30:45,029 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2017-05-07 08:30:45,031 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-05-07 08:30:45,034 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-05-07 08:30:45,034 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1990,34)
(1991,35)
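The per-year maximum computed by GROUP ... BY year followed by MAX(A.temperature) can be sanity-checked on a few records in plain Python. This is a minimal sketch under stated assumptions, not part of the original exercise: each line is assumed to hold an integer year and an integer temperature, the function name max_temperature_by_year is hypothetical, and the sample records are made up rather than copied from temp.txt. Note that Python's split() accepts any run of whitespace, whereas PigStorage(' ') splits on single spaces.

```python
def max_temperature_by_year(lines):
    """Return {year: highest temperature} from 'year temperature' lines."""
    best = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        year, temp = int(parts[0]), int(parts[1])
        if year not in best or temp > best[year]:
            best[year] = temp  # keep the running maximum for this year
    return best


# Hypothetical records (assumption, not the contents of temp.txt).
sample = ["2001 12", "2001 30", "2002 -5", "2002 8", "2001 21"]

result = max_temperature_by_year(sample)
for year in sorted(result):
    print((year, result[year]))  # one (year, max) pair per group, like DUMP
```

Keeping a running maximum per key mirrors what the combiner does in the MapReduce job: partial maxima can be merged in any order without changing the result.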
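46. Use Pig to count the number of IP addresses for each country in the dataset ip_to_country. Group the records by country with a GROUP BY statement, iterate over the relation's columns with the FOREACH operator to count the IP addresses in each group, then store the result in the /data/pig/output directory and view the stored data. The commands and their output are shown below.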
[root@master ~]# pig
WARNING: Use "yarn jar" to launch YARN applications.
17/05/07 07:58:29 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/05/07 07:58:29 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/05/07 07:58:29 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-05-07 07:58:29,081 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0.2.4.3.0-227 (rexported) compiled Sep 10 2016, 00:14:52
2017-05-07 07:58:29,081 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1494143909080.log
2017-05-07 07:58:29,104 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2017-05-07 07:58:29,507 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020
2017-05-07 07:58:30,427 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-a45e5d92-ef27-4629-8326-66cbf6605e8e
2017-05-07 07:58:30,870 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 07:58:30,977 [main] INFO org.apache.pig.backend.hadoop.ATSService - Created ATS Hook
grunt> copyFromLocal /root/ip_to_country.txt /user/root/ip_to_country.txt
grunt> ip_countries = LOAD '/user/root/ip_to_country.txt' AS (ip: chararray, country: chararray);
grunt> country_grpd = GROUP ip_countries BY country;
grunt> country_counts = FOREACH country_grpd GENERATE FLATTEN(group), COUNT(ip_countries) as counts;
grunt> STORE country_counts INTO '/data/pig/output';
2017-05-07 08:36:28,897 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2017-05-07 08:36:28,921 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-05-07 08:36:28,922 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-05-07 08:36:28,925 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-05-07 08:36:28,926 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2017-05-07 08:36:28,927 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-05-07 08:36:28,927 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-05-07 08:36:29,009 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:36:29,009 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:36:29,013 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-05-07 08:36:29,014 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-05-07 08:36:29,014 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2017-05-07 08:36:29,014 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2017-05-07 08:36:29,015 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=75728
2017-05-07 08:36:29,015 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2017-05-07 08:36:29,015 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2017-05-07 08:36:29,057 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/pig-0.15.0.2.4.3.0-227-core-h2.jar to DistributedCache through /tmp/temp-2081003050/tmp1402100776/pig-0.15.0.2.4.3.0-227-core-h2.jar
2017-05-07 08:36:29,118 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-2081003050/tmp1729569612/automaton-1.11-8.jar
2017-05-07 08:36:29,143 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-2081003050/tmp658892872/antlr-runtime-3.4.jar
2017-05-07 08:36:29,200 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/hadoop-mapreduce/joda-time-2.9.4.jar to DistributedCache through /tmp/temp-2081003050/tmp68205245/joda-time-2.9.4.jar
2017-05-07 08:36:29,210 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-05-07 08:36:29,211 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2017-05-07 08:36:29,211 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2017-05-07 08:36:29,211 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-05-07 08:36:29,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-05-07 08:36:29,300 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:36:29,301 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:36:29,354 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2017-05-07 08:36:29,387 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-05-07 08:36:29,387 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-05-07 08:36:29,389 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2017-05-07 08:36:29,443 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2017-05-07 08:36:29,491 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1494145664092_0001
2017-05-07 08:36:29,493 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2017-05-07 08:36:29,769 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1494145664092_0001
2017-05-07 08:36:29,774 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://slaver1:8088/proxy/application_1494145664092_0001/
2017-05-07 08:36:29,774 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1494145664092_0001
2017-05-07 08:36:29,774 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases country_counts,country_grpd,ip_countries
2017-05-07 08:36:29,774 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: ip_countries[7,15],ip_countries[-1,-1],country_counts[9,17],country_grpd[8,15] C: country_counts[9,17],country_grpd[8,15] R: country_counts[9,17]
2017-05-07 08:36:29,781 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2017-05-07 08:36:29,781 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494145664092_0001]
2017-05-07 08:36:41,876 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2017-05-07 08:36:41,876 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494145664092_0001]
2017-05-07 08:36:49,387 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494145664092_0001]
2017-05-07 08:36:49,969 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:36:49,969 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:36:49,976 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:36:50,266 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:36:50,266 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:36:50,272 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:36:50,377 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:36:50,377 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:36:50,382 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:36:50,416 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-05-07 08:36:50,417 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.7.1.2.4.3.0-227 0.15.0.2.4.3.0-227 root 2017-05-07 08:36:29 2017-05-07 08:36:50 GROUP_BY
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1494145664092_0001 1 1 3 3 3 3 3 3 3 3 country_counts,country_grpd,ip_countries GROUP_BY,COMBINER /data/pig/output,
Input(s):
Successfully read 3000 records (76101 bytes) from: "/user/root/ip_to_country.txt"
Output(s):
Successfully stored 98 records (1207 bytes) in: "/data/pig/output"
Counters:
Total records written : 98
Total bytes written : 1207
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1494145664092_0001
2017-05-07 08:36:50,485 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:36:50,485 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:36:50,490 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:36:50,589 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:36:50,590 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:36:50,595 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:36:50,694 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/
2017-05-07 08:36:50,694 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050
2017-05-07 08:36:50,699 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-05-07 08:36:50,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
grunt> cat /data/pig/output/part-r-00000
Iraq 1
Oman 1
Peru 3
Chile 7
China 252
Egypt 6
Gabon 1
India 30
Italy 43
Japan 177
Macau 1
Nepal 1
Qatar 1
Spain 21
Yemen 2
Angola 2
Brazil 38
Canada 75
Europe 34
France 58
Greece 6
Israel 6
Kuwait 5
Latvia 1
Mexico 23
Norway 18
Poland 15
Serbia 1
Sweden 17
Taiwan 26
Turkey 16
Albania 1
Algeria 2
Austria 14
Bahrain 1
Belarus 1
Belgium 14
Croatia 2
Denmark 11
Ecuador 3
Estonia 2
Finland 13
Germany 89
Hungary 2
Iceland 1
Ireland 5
Morocco 19
Nigeria 1
Romania 13
Senegal 1
Tunisia 3
Ukraine 10
Uruguay 2
Vietnam 13
Barbados 1
Botswana 1
Bulgaria 6
Colombia 21
Malaysia 8
Pakistan 4
Portugal 3
Slovenia 2
Thailand 10
Argentina 13
Australia 68
Guatemala 1
Hong Kong 8
Indonesia 29
Lithuania 6
Macedonia 1
Mauritius 10
Singapore 5
Venezuela 4
Azerbaijan 1
Costa Rica 2
Kazakhstan 3
Martinique 1
Uzbekistan 1
Netherlands 28
New Zealand 9
Philippines 7
Switzerland 15
Saudi Arabia 4
South Africa 20
United States 1379
Czech Republic 7
United Kingdom 93
Anonymous Proxy 1
Dominican Republic 1
Korea, Republic of 70
Russian Federation 36
Satellite Provider 2
Moldova, Republic of 1
Syrian Arab Republic 1
United Arab Emirates 2
Bosnia and Herzegovina 1
Iran, Islamic Republic of 2
Tanzania, United Republic of 1
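Note: the grouping and counting that the Pig statements in question 46 perform can be sketched locally in Python, which may help when checking a result by hand. This is only an illustration, not part of the Pig session. It assumes the input is tab-delimited (tab is the default delimiter of PigStorage, which LOAD uses when no USING clause is given) with the IP address in the first field and the country in the second. The function name count_ips_by_country and the sample rows are hypothetical, and lines with fewer than two fields are simply skipped, which may differ from how Pig handles them.

```python
from collections import Counter


def count_ips_by_country(lines, delimiter="\t"):
    """Count records per country, similar in spirit to
    GROUP ... BY country followed by COUNT in Pig.

    Assumption: each line is "<ip><TAB><country>". Lines with fewer
    than two fields are skipped (Pig may treat them differently).
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split(delimiter)
        if len(fields) < 2:
            continue  # malformed line, skipped in this sketch
        counts[fields[1]] += 1
    return counts


# Hypothetical sample rows, not taken from ip_to_country.txt.
sample = [
    "1.0.0.1\tChina\n",
    "1.0.0.2\tJapan\n",
    "1.0.0.3\tChina\n",
]

result = count_ips_by_country(sample)
for country, n in sorted(result.items()):
    print(f"{country}\t{n}")
```

To apply it to a local copy of the data, pass an open file object instead of the sample list, for example count_ips_by_country(open("/root/ip_to_country.txt")).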