Introduction to Hadoop -- using Apache Pig to compute the total miles flown

This example is based on Hadoop 2.7.3 running as a pseudo-distributed cluster.


1. Upload the data file to the /user/root directory in HDFS

hdfs dfs -copyFromLocal 2008.csv /user/root
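
The upload step above can be wrapped in a small helper that refuses to run when the local file is missing and lists the uploaded file afterwards. This is only a minimal sketch: the function name upload_to_hdfs is hypothetical, and it assumes the hdfs client is on the PATH and the cluster is running.

```shell
# Hypothetical helper: copy a local file into an HDFS directory and list it.
# Assumes the `hdfs` client is installed and the cluster is reachable.
upload_to_hdfs() {
    local_file=$1
    hdfs_dir=$2

    # Fail early if the local file does not exist.
    if [ ! -f "$local_file" ]; then
        echo "local file not found: $local_file" >&2
        return 1
    fi

    # Create the target directory if needed, upload, then list the result.
    hdfs dfs -mkdir -p "$hdfs_dir" &&
    hdfs dfs -copyFromLocal "$local_file" "$hdfs_dir" &&
    hdfs dfs -ls "$hdfs_dir/$(basename "$local_file")"
}
```

For this walkthrough the call would be: upload_to_hdfs 2008.csv /user/root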

2. Write the totalmiles.pig script

records = LOAD '2008.csv' USING PigStorage(',') AS
(Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance:int,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay);
milage_recs = GROUP records ALL;
tot_miles = FOREACH milage_recs GENERATE SUM(records.Distance);
STORE tot_miles INTO '/user/root/totalmiles';
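  • LOAD: Reads a file, or all files in a directory, from HDFS. The relative path '2008.csv' resolves against the user's HDFS home directory, here /user/root.

  • USING: By default, Pig splits each line on tab characters. PigStorage(',') tells Pig to split on commas instead.

  • AS: Declares the schema. HDFS stores raw data, so Pig must be told how to map each field to a name (and optionally a type, as in Distance:int) in its own data model. Names are assigned by position; fields without a declared type default to bytearray.

  • GROUP … ALL: Puts every record into a single group, so that an aggregate can be computed over the whole data set.

  • FOREACH A GENERATE B: Evaluates the expression B for each tuple in relation A. Here A contains a single group, so SUM(records.Distance) reduces it to one value.

  • STORE INTO: Writes the result to a directory in HDFS.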
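
Because the AS clause assigns field names by position, Distance has to be the 19th comma-separated column for the script to sum the right values. A quick local check before running the job could look like the sketch below. The helper name column_index is hypothetical, and the sketch assumes the first line of the CSV is a header row.

```shell
# Hypothetical helper: print the 1-based position of a column name
# in the header (first line) of a comma-separated file.
column_index() {
    csv_file=$1
    column_name=$2

    # Split the header into one name per line, number the lines,
    # and keep only the number of the exact match.
    head -n 1 "$csv_file" | tr ',' '\n' | grep -n -x -F -- "$column_name" | cut -d: -f1
}
```

If the header of 2008.csv matches the schema in the script, column_index 2008.csv Distance should print 19.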

3. Execute the totalmiles.pig script from the command line

pig -x mapreduce totalmiles.pig

Tip: the -x option selects the execution mode (for example local, mapreduce, tez, or spark). Which one to choose depends on the computing framework the cluster uses.
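
To make the choice of execution mode explicit, the pig call can be wrapped in a function that rejects unknown modes. This is a sketch under assumptions: run_pig is a hypothetical name, and the accepted list below is illustrative, since the modes actually available depend on the Pig version and on what is installed on the cluster.

```shell
# Hypothetical wrapper: run a Pig script in an explicitly chosen mode.
# The accepted modes are an assumption; adjust them to your Pig version.
run_pig() {
    mode=$1
    script=$2

    case "$mode" in
        local|mapreduce|tez|spark)
            pig -x "$mode" "$script"
            ;;
        *)
            echo "unsupported execution mode: $mode" >&2
            return 1
            ;;
    esac
}
```

For example, run_pig mapreduce totalmiles.pig is equivalent to the command in step 3. In local mode Pig reads and writes the local file system instead of HDFS, so the paths in the script would have to point at local files.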

Output:

Details at logfile: /usr/test/code/pig_1516001376428.log
2018-01-14 23:29:39,112 [main] INFO  org.apache.pig.Main - Pig script completed in 3 seconds and 128 milliseconds (3128 ms)

4. View the results

hdfs dfs -cat /user/root/totalmiles/part-r-00000

Result:

[root@slave1 code]# hdfs dfs -cat /user/root/totalmiles/part-r-00000
5091775499
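
The total can be cross-checked without Hadoop by summing the same column of the local copy with awk. This is a minimal sketch, not part of the original walkthrough: total_distance is a hypothetical helper, and it assumes Distance is field 19, that the first line is a header, and that non-numeric values (such as a missing-value marker) should be skipped, which mirrors how SUM ignores values that fail the int cast.

```shell
# Hypothetical helper: sum the Distance column (field 19) of a local CSV.
# Skips the header line and any value that is not a plain non-negative integer.
total_distance() {
    # %.0f prints the full value even when it exceeds the 32-bit integer range.
    awk -F',' 'NR > 1 && $19 ~ /^[0-9]+$/ { sum += $19 } END { printf "%.0f\n", sum }' "$1"
}
```

Running total_distance 2008.csv in the directory that holds the local file should print the same number as the Pig job if these assumptions hold.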

References:
1. Hadoop For Dummies
2. Apache Pig Getting Started
