Big Data Platform Operations: Pig

Pig

42. Install Pig Clients on the master node, open a Linux shell, and start Pig's Grunt shell in MapReduce mode. The start command and its output are shown below.

Short form:

[root@master ~]# pig

WARNING: Use "yarn jar" to launch YARN applications.

17/05/07 07:58:29 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL

17/05/07 07:58:29 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE

17/05/07 07:58:29 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType

2017-05-07 07:58:29,081 [main] INFO  org.apache.pig.Main - Apache Pig version 0.15.0.2.4.3.0-227 (rexported) compiled Sep 10 2016, 00:14:52

2017-05-07 07:58:29,081 [main] INFO  org.apache.pig.Main - Logging error messages to: /root/pig_1494143909080.log

2017-05-07 07:58:29,104 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found

2017-05-07 07:58:29,507 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020

2017-05-07 07:58:30,427 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-a45e5d92-ef27-4629-8326-66cbf6605e8e

2017-05-07 07:58:30,870 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 07:58:30,977 [main] INFO  org.apache.pig.backend.hadoop.ATSService - Created ATS Hook

grunt>

 

Full form:

[root@master ~]# pig -x mapreduce

WARNING: Use "yarn jar" to launch YARN applications.

17/05/07 08:00:45 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL

17/05/07 08:00:46 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE

17/05/07 08:00:46 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType

2017-05-07 08:00:46,060 [main] INFO  org.apache.pig.Main - Apache Pig version 0.15.0.2.4.3.0-227 (rexported) compiled Sep 10 2016, 00:14:52

2017-05-07 08:00:46,060 [main] INFO  org.apache.pig.Main - Logging error messages to: /root/pig_1494144046058.log

2017-05-07 08:00:46,086 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found

2017-05-07 08:00:46,540 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020

2017-05-07 08:00:47,459 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-8abb37de-5d81-487e-99f0-d1fb8eceac03

2017-05-07 08:00:47,913 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:00:48,019 [main] INFO  org.apache.pig.backend.hadoop.ATSService - Created ATS Hook

grunt>

 

43. Install Pig Clients on the master node, open a Linux shell, and start Pig's Grunt shell in Local mode. The start command and its output are shown below.

[root@master ~]# pig -x local

WARNING: Use "yarn jar" to launch YARN applications.

17/05/07 08:00:21 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL

17/05/07 08:00:21 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType

2017-05-07 08:00:21,794 [main] INFO  org.apache.pig.Main - Apache Pig version 0.15.0.2.4.3.0-227 (rexported) compiled Sep 10 2016, 00:14:52

2017-05-07 08:00:21,794 [main] INFO  org.apache.pig.Main - Logging error messages to: /root/pig_1494144021792.log

2017-05-07 08:00:21,814 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found

2017-05-07 08:00:21,974 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///

2017-05-07 08:00:22,194 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-edcec26c-6956-4050-97c9-1c1806d1c853

2017-05-07 08:00:22,622 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:00:23,221 [main] INFO  org.apache.pig.backend.hadoop.ATSService - Created ATS Hook

grunt>

 

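44. Use Pig in Local mode to count the hits per IP address in the system log access_log.txt. Use a GROUP BY statement to group the records by IP, iterate over the columns of the relation with the FOREACH operator to count the rows in each group, and finally use a DUMP statement to display the result. The commands and their output are shown below.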

[root@master ~]# pig

WARNING: Use "yarn jar" to launch YARN applications.

17/05/07 08:03:59 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL

17/05/07 08:03:59 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE

17/05/07 08:03:59 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType

2017-05-07 08:03:59,198 [main] INFO  org.apache.pig.Main - Apache Pig version 0.15.0.2.4.3.0-227 (rexported) compiled Sep 10 2016, 00:14:52

2017-05-07 08:03:59,198 [main] INFO  org.apache.pig.Main - Logging error messages to: /root/pig_1494144239196.log

2017-05-07 08:03:59,220 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found

2017-05-07 08:03:59,618 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020

2017-05-07 08:04:00,528 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-bdf6e873-56a4-479d-b13f-3244ce895852

2017-05-07 08:04:00,991 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:04:01,098 [main] INFO  org.apache.pig.backend.hadoop.ATSService - Created ATS Hook

grunt> copyFromLocal /root/access_log.txt /user/root/input/log1.txt

grunt> A = LOAD '/user/root/input/log1.txt' USING PigStorage(' ') AS (ip, others);

grunt> group_ip = group A by ip;

grunt> result = foreach group_ip generate group, COUNT(A);

grunt> dump result;

2017-05-07 08:09:16,681 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY

2017-05-07 08:09:16,718 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.

2017-05-07 08:09:16,756 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}

2017-05-07 08:09:16,896 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false

2017-05-07 08:09:16,917 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner

2017-05-07 08:09:16,949 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1

2017-05-07 08:09:16,949 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1

2017-05-07 08:09:17,135 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:09:17,143 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:09:17,432 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job

2017-05-07 08:09:17,438 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3

2017-05-07 08:09:17,440 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.

2017-05-07 08:09:17,441 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator

2017-05-07 08:09:17,448 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=3045

2017-05-07 08:09:17,449 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1

2017-05-07 08:09:17,449 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process

2017-05-07 08:09:17,962 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/pig-0.15.0.2.4.3.0-227-core-h2.jar to DistributedCache through /tmp/temp-2081003050/tmp-1335603865/pig-0.15.0.2.4.3.0-227-core-h2.jar

2017-05-07 08:09:18,350 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-2081003050/tmp1196315125/automaton-1.11-8.jar

2017-05-07 08:09:18,646 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-2081003050/tmp1640187430/antlr-runtime-3.4.jar

2017-05-07 08:09:18,929 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/hadoop-mapreduce/joda-time-2.9.4.jar to DistributedCache through /tmp/temp-2081003050/tmp247039283/joda-time-2.9.4.jar

2017-05-07 08:09:18,962 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job

2017-05-07 08:09:18,968 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.

2017-05-07 08:09:18,968 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche

2017-05-07 08:09:18,968 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []

2017-05-07 08:09:19,046 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.

2017-05-07 08:09:19,138 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:09:19,138 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:09:19,540 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).

2017-05-07 08:09:19,600 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1

2017-05-07 08:09:19,600 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

2017-05-07 08:09:19,619 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1

2017-05-07 08:09:20,198 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1

2017-05-07 08:09:20,553 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1494143770260_0002

2017-05-07 08:09:20,687 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.

2017-05-07 08:09:21,386 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1494143770260_0002

2017-05-07 08:09:21,446 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://slaver1:8088/proxy/application_1494143770260_0002/

2017-05-07 08:09:21,447 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1494143770260_0002

2017-05-07 08:09:21,447 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,group_ip,result

2017-05-07 08:09:21,447 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,3],result[3,8],group_ip[2,10] C: result[3,8],group_ip[2,10] R: result[3,8]

2017-05-07 08:09:21,464 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

2017-05-07 08:09:21,464 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494143770260_0002]

2017-05-07 08:10:28,914 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete

2017-05-07 08:10:28,914 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494143770260_0002]

2017-05-07 08:10:38,929 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494143770260_0002]

2017-05-07 08:10:42,044 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:10:42,045 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:10:42,056 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:10:43,592 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:10:43,593 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:10:43,600 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:10:43,744 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:10:43,745 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:10:43,750 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:10:43,818 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete

2017-05-07 08:10:43,820 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

 

HadoopVersion  PigVersion      UserId  StartedAt       FinishedAt      Features

2.7.1.2.4.3.0-227      0.15.0.2.4.3.0-227      root    2017-05-07 08:09:17     2017-05-07 08:10:43     GROUP_BY

 

Success!

 

Job Stats (time in seconds):

JobId  Maps  Reduces  MaxMapTime  MinMapTime  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReducetime  Alias  Feature  Outputs

job_1494143770260_0002  1  1  3  3  3  3  7  7  7  7  A,group_ip,result  GROUP_BY,COMBINER  hdfs://master:8020/tmp/temp-2081003050/tmp270373100,

 

Input(s):

Successfully read 10 records (3415 bytes) from: "/user/root/input/log1.txt"

 

Output(s):

Successfully stored 4 records (85 bytes) in: "hdfs://master:8020/tmp/temp-2081003050/tmp270373100"

 

Counters:

Total records written : 4

Total bytes written : 85

Spillable Memory Manager spill count : 0

Total bags proactively spilled: 0

Total records proactively spilled: 0

 

Job DAG:

job_1494143770260_0002

 

 

2017-05-07 08:10:43,893 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:10:43,893 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:10:43,898 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:10:44,022 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:10:44,022 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:10:44,027 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:10:44,130 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:10:44,131 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:10:44,137 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:10:44,175 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

2017-05-07 08:10:44,179 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.

2017-05-07 08:10:44,190 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1

2017-05-07 08:10:44,190 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

(112.97.24.243,7)

(208.115.113.82,1)

(220.181.94.221,1)

(220.181.108.151,1)
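
The grouping and counting performed by the Grunt commands above can be cross-checked with a small plain-Python sketch. It is not part of the original answer. The sample lines are invented: only the IP addresses and their frequencies follow the DUMP output above, and the text after each IP is a placeholder. The function name `count_hits_by_ip` is likewise hypothetical.

```python
from collections import Counter

# Invented sample lines: only the IPs and their frequencies follow the
# DUMP output above; the text after each IP is a placeholder.
SAMPLE_LOG = (
    ["112.97.24.243 GET /index.html"] * 7
    + [
        "208.115.113.82 GET /robots.txt",
        "220.181.94.221 GET /",
        "220.181.108.151 GET /",
    ]
)


def count_hits_by_ip(lines):
    """Group lines by their first space-separated field and count each group."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # simplification: blank lines are ignored in this sketch
        ip = line.split(" ", 1)[0]  # same delimiter as PigStorage(' ')
        counts[ip] += 1
    return dict(counts)


if __name__ == "__main__":
    for ip, hits in count_hits_by_ip(SAMPLE_LOG).items():
        print(f"({ip},{hits})")
```

Taking the first space-separated field corresponds to the `ip` column of the LOAD statement, and the per-key counter plays the role of `group ... by ip` followed by `COUNT(A)`.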

 

 

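45. Use Pig to compute the highest temperature of each year in the weather data set temperature.txt. Use a GROUP BY statement to group the records by year, iterate over the columns of the relation with the FOREACH operator to find the maximum of each group, and finally use a DUMP statement to display the result. The commands and their output are shown below.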

[root@master ~]# pig

WARNING: Use "yarn jar" to launch YARN applications.

17/05/07 07:58:29 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL

17/05/07 07:58:29 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE

17/05/07 07:58:29 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType

2017-05-07 07:58:29,081 [main] INFO  org.apache.pig.Main - Apache Pig version 0.15.0.2.4.3.0-227 (rexported) compiled Sep 10 2016, 00:14:52

2017-05-07 07:58:29,081 [main] INFO  org.apache.pig.Main - Logging error messages to: /root/pig_1494143909080.log

2017-05-07 07:58:29,104 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found

2017-05-07 07:58:29,507 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020

2017-05-07 07:58:30,427 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-a45e5d92-ef27-4629-8326-66cbf6605e8e

2017-05-07 07:58:30,870 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 07:58:30,977 [main] INFO  org.apache.pig.backend.hadoop.ATSService - Created ATS Hook

 

grunt> copyFromLocal /root/temp.txt /user/root/temp.txt

grunt> A = LOAD '/user/root/temp.txt' USING PigStorage(' ') AS (year:int, temperature:int);

grunt> B = GROUP A BY year;

grunt> C = FOREACH B GENERATE group, MAX(A.temperature);

grunt> dump C;

2017-05-07 08:23:04,298 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY

2017-05-07 08:23:04,326 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.

2017-05-07 08:23:04,326 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}

2017-05-07 08:23:04,335 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false

2017-05-07 08:23:04,337 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner

2017-05-07 08:23:04,339 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1

2017-05-07 08:23:04,339 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1

2017-05-07 08:23:04,425 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:23:04,425 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:23:04,432 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job

2017-05-07 08:23:04,433 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3

2017-05-07 08:23:04,433 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.

2017-05-07 08:23:04,433 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator

2017-05-07 08:23:04,435 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=55

2017-05-07 08:23:04,435 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1

2017-05-07 08:23:04,435 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process

2017-05-07 08:23:04,476 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/pig-0.15.0.2.4.3.0-227-core-h2.jar to DistributedCache through /tmp/temp-2081003050/tmp455800525/pig-0.15.0.2.4.3.0-227-core-h2.jar

2017-05-07 08:23:04,491 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-2081003050/tmp1148421651/automaton-1.11-8.jar

2017-05-07 08:23:04,505 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-2081003050/tmp280400162/antlr-runtime-3.4.jar

2017-05-07 08:23:04,930 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/hadoop-mapreduce/joda-time-2.9.4.jar to DistributedCache through /tmp/temp-2081003050/tmp93917590/joda-time-2.9.4.jar

2017-05-07 08:23:04,949 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job

2017-05-07 08:23:04,950 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.

2017-05-07 08:23:04,950 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche

2017-05-07 08:23:04,950 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []

2017-05-07 08:23:04,977 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.

2017-05-07 08:23:05,050 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:23:05,050 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:23:05,077 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).

2017-05-07 08:23:05,114 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1

2017-05-07 08:23:05,114 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

2017-05-07 08:23:05,116 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1

2017-05-07 08:23:05,155 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1

2017-05-07 08:23:05,193 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1494143770260_0003

2017-05-07 08:23:05,196 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.

2017-05-07 08:23:05,444 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1494143770260_0003

2017-05-07 08:23:05,449 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://slaver1:8088/proxy/application_1494143770260_0003/

2017-05-07 08:23:05,479 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1494143770260_0003

2017-05-07 08:23:05,479 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C

2017-05-07 08:23:05,479 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[4,4],A[-1,-1],C[6,4],B[5,4] C: C[6,4],B[5,4] R: C[6,4]

2017-05-07 08:23:05,488 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

2017-05-07 08:23:05,488 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494143770260_0003]

2017-05-07 08:27:59,719 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete

2017-05-07 08:27:59,719 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494143770260_0003]

2017-05-07 08:30:37,027 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494143770260_0003]

2017-05-07 08:30:43,127 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:30:43,128 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:30:43,191 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:30:44,506 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:30:44,506 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:30:44,520 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:30:44,663 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:30:44,664 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:30:44,669 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:30:44,721 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete

2017-05-07 08:30:44,721 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

 

HadoopVersion  PigVersion      UserId  StartedAt       FinishedAt      Features

2.7.1.2.4.3.0-227      0.15.0.2.4.3.0-227      root    2017-05-07 08:23:04     2017-05-07 08:30:44     GROUP_BY

 

Success!

 

Job Stats (time in seconds):

JobId  Maps  Reduces  MaxMapTime  MinMapTime  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReducetime  Alias  Feature  Outputs

job_1494143770260_0003  1  1  284  284  284  284  149  149  149  149  A,B,C  GROUP_BY,COMBINER  hdfs://master:8020/tmp/temp-2081003050/tmp-111966269,

 

Input(s):

Successfully read 7 records (419 bytes) from: "/user/root/temp.txt"

 

Output(s):

Successfully stored 2 records (18 bytes) in: "hdfs://master:8020/tmp/temp-2081003050/tmp-111966269"

 

Counters:

Total records written : 2

Total bytes written : 18

Spillable Memory Manager spill count : 0

Total bags proactively spilled: 0

Total records proactively spilled: 0

 

Job DAG:

job_1494143770260_0003

 

 

2017-05-07 08:30:44,784 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:30:44,784 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:30:44,789 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:30:44,892 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:30:44,892 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:30:44,897 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:30:44,990 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:30:44,991 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:30:44,996 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:30:45,029 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

2017-05-07 08:30:45,031 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.

2017-05-07 08:30:45,034 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1

2017-05-07 08:30:45,034 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

(1990,34)

(1991,35)
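
As a rough cross-check of the GROUP / MAX pipeline above, the same per-year maximum can be sketched in plain Python. The seven sample records are invented (the real contents of the input file are not shown in this document) and were chosen only so that the maxima agree with the DUMP output. The helper name `max_temperature_by_year` is hypothetical, and the sketch assumes well-formed integer fields.

```python
# Invented records in "year temperature" form, space-separated like the
# PigStorage(' ') load above; chosen so the maxima match the DUMP output.
SAMPLE_TEMPERATURES = [
    "1990 21",
    "1990 18",
    "1990 34",
    "1990 30",
    "1991 35",
    "1991 28",
    "1991 19",
]


def max_temperature_by_year(lines, sep=" "):
    """Return sorted (year, max temperature) pairs, one per distinct year."""
    best = {}
    for line in lines:
        fields = line.strip().split(sep)
        if len(fields) < 2:
            continue  # simplification: skip lines without both fields
        year, temperature = int(fields[0]), int(fields[1])
        if year not in best or temperature > best[year]:
            best[year] = temperature
    return sorted(best.items())


if __name__ == "__main__":
    for year, temperature in max_temperature_by_year(SAMPLE_TEMPERATURES):
        print(f"({year},{temperature})")
```

Keeping one running maximum per key is the same idea that lets Pig push MAX into the combiner, as the "Choosing to move algebraic foreach to combiner" log line indicates.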

 

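46. Use Pig to count the number of IP addresses for each country in the data set ip_to_country. Use a GROUP BY statement to group the records by country, iterate over the columns of the relation with the FOREACH operator to count the IP addresses in each group, then store the result in the /data/pig/output directory and view the data. The commands and their output are shown below.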
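
Before turning to the cluster session, the per-country count this task asks for can be sketched in plain Python. The rows below are invented and the function names are hypothetical. The sketch assumes tab-separated input, which is the default delimiter PigStorage uses when LOAD is given no USING clause, and it formats the result as tab-separated lines in the way a default STORE would write them. The sorting is only there to make the sketch deterministic; Pig itself gives no ordering guarantee here.

```python
from collections import Counter

# Invented rows in "ip<TAB>country" form; not taken from the real data set.
SAMPLE_ROWS = [
    "1.0.0.1\tAustralia",
    "1.0.1.1\tChina",
    "1.0.2.1\tChina",
    "5.1.0.1\tGermany",
]


def count_ips_by_country(lines):
    """Group rows by the second tab-separated field and count each group."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # simplification: skip rows without a country field
        counts[fields[1]] += 1
    return dict(counts)


def to_pigstorage_lines(counts):
    """Format counts as "country<TAB>count" lines, sorted for determinism."""
    return [f"{country}\t{n}" for country, n in sorted(counts.items())]


if __name__ == "__main__":
    for row in to_pigstorage_lines(count_ips_by_country(SAMPLE_ROWS)):
        print(row)
```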

[root@master ~]# pig

WARNING: Use "yarn jar" to launch YARN applications.

17/05/07 07:58:29 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL

17/05/07 07:58:29 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE

17/05/07 07:58:29 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType

2017-05-07 07:58:29,081 [main] INFO  org.apache.pig.Main - Apache Pig version 0.15.0.2.4.3.0-227 (rexported) compiled Sep 10 2016, 00:14:52

2017-05-07 07:58:29,081 [main] INFO  org.apache.pig.Main - Logging error messages to: /root/pig_1494143909080.log

2017-05-07 07:58:29,104 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found

2017-05-07 07:58:29,507 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020

2017-05-07 07:58:30,427 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-a45e5d92-ef27-4629-8326-66cbf6605e8e

2017-05-07 07:58:30,870 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 07:58:30,977 [main] INFO  org.apache.pig.backend.hadoop.ATSService - Created ATS Hook

 

grunt> copyFromLocal /root/ip_to_country.txt /user/root/ip_to_country.txt

grunt> ip_countries = LOAD '/user/root/ip_to_country.txt' AS (ip: chararray, country: chararray);

grunt> country_grpd = GROUP ip_countries BY country;

grunt> country_counts = FOREACH country_grpd GENERATE FLATTEN(group), COUNT(ip_countries) AS counts;

grunt> STORE country_counts INTO '/data/pig/output';

2017-05-07 08:36:28,897 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY

2017-05-07 08:36:28,921 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.

2017-05-07 08:36:28,922 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}

2017-05-07 08:36:28,925 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false

2017-05-07 08:36:28,926 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner

2017-05-07 08:36:28,927 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1

2017-05-07 08:36:28,927 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1

2017-05-07 08:36:29,009 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:36:29,009 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:36:29,013 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job

2017-05-07 08:36:29,014 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3

2017-05-07 08:36:29,014 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.

2017-05-07 08:36:29,014 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator

2017-05-07 08:36:29,015 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=75728

2017-05-07 08:36:29,015 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1

2017-05-07 08:36:29,015 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process

2017-05-07 08:36:29,057 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/pig-0.15.0.2.4.3.0-227-core-h2.jar to DistributedCache through /tmp/temp-2081003050/tmp1402100776/pig-0.15.0.2.4.3.0-227-core-h2.jar

2017-05-07 08:36:29,118 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-2081003050/tmp1729569612/automaton-1.11-8.jar

2017-05-07 08:36:29,143 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-2081003050/tmp658892872/antlr-runtime-3.4.jar

2017-05-07 08:36:29,200 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.4.3.0-227/hadoop-mapreduce/joda-time-2.9.4.jar to DistributedCache through /tmp/temp-2081003050/tmp68205245/joda-time-2.9.4.jar

2017-05-07 08:36:29,210 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job

2017-05-07 08:36:29,211 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.

2017-05-07 08:36:29,211 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche

2017-05-07 08:36:29,211 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []

2017-05-07 08:36:29,229 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.

2017-05-07 08:36:29,300 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:36:29,301 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:36:29,354 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).

2017-05-07 08:36:29,387 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1

2017-05-07 08:36:29,387 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

2017-05-07 08:36:29,389 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1

2017-05-07 08:36:29,443 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1

2017-05-07 08:36:29,491 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1494145664092_0001

2017-05-07 08:36:29,493 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.

2017-05-07 08:36:29,769 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1494145664092_0001

2017-05-07 08:36:29,774 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://slaver1:8088/proxy/application_1494145664092_0001/

2017-05-07 08:36:29,774 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1494145664092_0001

2017-05-07 08:36:29,774 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases country_counts,country_grpd,ip_countries

2017-05-07 08:36:29,774 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: ip_countries[7,15],ip_countries[-1,-1],country_counts[9,17],country_grpd[8,15] C: country_counts[9,17],country_grpd[8,15] R: country_counts[9,17]

2017-05-07 08:36:29,781 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

2017-05-07 08:36:29,781 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494145664092_0001]

2017-05-07 08:36:41,876 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete

2017-05-07 08:36:41,876 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494145664092_0001]

2017-05-07 08:36:49,387 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1494145664092_0001]

2017-05-07 08:36:49,969 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:36:49,969 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:36:49,976 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:36:50,266 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:36:50,266 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:36:50,272 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:36:50,377 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:36:50,377 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:36:50,382 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:36:50,416 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete

2017-05-07 08:36:50,417 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

 

HadoopVersion  PigVersion      UserId  StartedAt      FinishedAt      Features

2.7.1.2.4.3.0-227      0.15.0.2.4.3.0-227      root    2017-05-07 08:36:29     2017-05-07 08:36:50     GROUP_BY

 

Success!

 

Job Stats (time in seconds):

JobId   Maps    Reduces MaxMapTime      MinMapTime      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime        Alias   Feature Outputs

job_1494145664092_0001  1       1       3       3       3       3       3       3       3       3       country_counts,country_grpd,ip_countries        GROUP_BY,COMBINER       /data/pig/output,

 

Input(s):

Successfully read 3000 records (76101 bytes) from: "/user/root/ip_to_country.txt"

 

Output(s):

Successfully stored 98 records (1207 bytes) in: "/data/pig/output"

 

Counters:

Total records written : 98

Total bytes written : 1207

Spillable Memory Manager spill count : 0

Total bags proactively spilled: 0

Total records proactively spilled: 0

 

Job DAG:

job_1494145664092_0001

 

 

2017-05-07 08:36:50,485 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:36:50,485 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:36:50,490 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:36:50,589 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:36:50,590 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:36:50,595 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:36:50,694 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://slaver1:8188/ws/v1/timeline/

2017-05-07 08:36:50,694 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at slaver1/10.0.0.15:8050

2017-05-07 08:36:50,699 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

2017-05-07 08:36:50,734 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

grunt> cat /data/pig/output/part-r-00000

Iraq    1

Oman    1

Peru    3

Chile   7

China   252

Egypt   6

Gabon   1

India   30

Italy   43

Japan   177

Macau   1

Nepal   1

Qatar   1

Spain   21

Yemen   2

Angola  2

Brazil  38

Canada  75

Europe  34

France  58

Greece  6

Israel  6

Kuwait  5

Latvia  1

Mexico  23

Norway  18

Poland  15

Serbia  1

Sweden  17

Taiwan  26

Turkey  16

Albania 1

Algeria 2

Austria 14

Bahrain 1

Belarus 1

Belgium 14

Croatia 2

Denmark 11

Ecuador 3

Estonia 2

Finland 13

Germany 89

Hungary 2

Iceland 1

Ireland 5

Morocco 19

Nigeria 1

Romania 13

Senegal 1

Tunisia 3

Ukraine 10

Uruguay 2

Vietnam 13

Barbados       1

Botswana       1

Bulgaria       6

Colombia       21

Malaysia       8

Pakistan       4

Portugal       3

Slovenia       2

Thailand       10

Argentina      13

Australia      68

Guatemala      1

Hong Kong      8

Indonesia      29

Lithuania      6

Macedonia      1

Mauritius      10

Singapore      5

Venezuela      4

Azerbaijan     1

Costa Rica     2

Kazakhstan     3

Martinique     1

Uzbekistan     1

Netherlands    28

New Zealand    9

Philippines    7

Switzerland    15

Saudi Arabia   4

South Africa   20

United States  1379

Czech Republic 7

United Kingdom 93

Anonymous Proxy 1

Dominican Republic     1

Korea, Republic of     70

Russian Federation     36

Satellite Provider     2

Moldova, Republic of   1

Syrian Arab Republic   1

United Arab Emirates   2

Bosnia and Herzegovina 1

Iran, Islamic Republic of       2

Tanzania, United Republic of    1
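A quick way to sanity-check a result like this is to add up the per-country counts and compare the total with the number of input records. The listing above has 98 lines whose counts sum to 3000, which matches the 98 records stored and the 3000 records read that the job statistics report. The sketch below shows one way to compute that sum in shell. The helper name sum_counts and the sample rows are hypothetical, and it assumes the output is tab-separated with the count in the second field, so that country names containing spaces or commas stay in one field.

```shell
# Hypothetical helper: sums the second tab-separated field of every line on stdin.
# Splitting on tabs keeps names such as "Hong Kong" or "Korea, Republic of" intact.
sum_counts() {
  awk -F'\t' '{ total += $2 } END { print total + 0 }'
}

# Toy input using three rows in the same layout as the listing above: 1 + 252 + 8.
printf 'Iraq\t1\nChina\t252\nHong Kong\t8\n' | sum_counts

# Against the real result on a cluster, one would pipe the stored part file instead:
#   hdfs dfs -cat /data/pig/output/part-r-00000 | sum_counts
```

The toy invocation prints 261. Run against the full part file, the same helper should print the total record count, provided no input line had an empty country field.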


Reposted from blog.csdn.net/kamroselee/article/details/80279448