Hadoop Study Notes, Part 7: Combining Hadoop with MongoDB

MongoDB is a very popular non-relational database in the NoSQL field. It offers powerful sharded storage and query capabilities and is well suited to storing and querying historical data such as logs. It also provides its own MapReduce feature. In practice, though, MongoDB users do not always use sharding; a replica set is often the more likely choice, for example when there are not many machines. Hadoop, for its part, provides HDFS and distributed computation. We can therefore use Hadoop's MapReduce in place of MongoDB's MapReduce, and a MongoDB replica set in place of Hadoop's HDFS. The connector (adapter) between Hadoop and MongoDB is the mongo-hadoop project (mongo-hadoop-master), currently available for download on GitHub.

 

      One: Download address: https://github.com/mongodb/mongo-hadoop

      Two: Unzip after downloading:

       

[root@bigdata2 software]# cd mongo-hadoop-master
[root@bigdata2 mongo-hadoop-master]# ll
total 140
drwxr-xr-x 3 root root  4096 Oct 15 11:53 bin
-rw-r--r-- 1 root root  5848 Oct 15 11:53 BSON_README.md
drwxr-xr-x 4 root root  4096 Nov 30 13:06 build
-rwxr-xr-x 1 root root   168 Oct 15 11:53 build-all.sh
-rw-r--r-- 1 root root 12731 Oct 15 11:53 build.gradle
drwxr-xr-x 2 root root  4096 Oct 15 11:53 clusterConfigs
drwxr-xr-x 2 root root  4096 Oct 15 11:53 config
-rw-r--r-- 1 root root  7458 Oct 15 11:53 CONFIG.md
drwxr-xr-x 4 root root  4096 Nov 30 13:06 core
drwxr-xr-x 6 root root  4096 Oct 15 11:53 docs
drwxr-xr-x 7 root root  4096 Oct 15 11:53 examples
drwxr-xr-x 3 root root  4096 Oct 15 11:53 flume
drwxr-xr-x 3 root root  4096 Oct 15 11:53 gradle
-rwxr-xr-x 1 root root  5080 Oct 15 11:53 gradlew
-rw-r--r-- 1 root root  2314 Oct 15 11:53 gradlew.bat
-rw-r--r-- 1 root root  1862 Oct 15 11:53 History.md
drwxr-xr-x 3 root root  4096 Oct 15 11:53 hive
drwxr-xr-x 3 root root  4096 Oct 15 11:53 integration-tests
-rw-r--r-- 1 root root  6764 Oct 15 11:53 mongo-defaults.xml
-rw------- 1 root root  4843 Nov 30 13:12 nohup.out
drwxr-xr-x 3 root root  4096 Oct 15 11:53 pig
-rw-r--r-- 1 root root  5106 Oct 15 11:53 README.md
-rw-r--r-- 1 root root   137 Oct 15 11:53 settings.gradle
drwxr-xr-x 5 root root  4096 Oct 15 11:53 streaming
-rwxr-xr-x 1 root root   682 Oct 15 11:53 test.sh
drwxr-xr-x 2 root root  4096 Oct 15 11:53 tools
[root@bigdata2 mongo-hadoop-master]#

 

 

    The examples directory contains the built-in test cases. Here I use the JSON data under src/main/resources/ of the mongo-hadoop-master/examples/treasury_yield example.

   

Some of the data:
{ "_id" : { "$date" : 631238400000 }, "dayOfWeek" : "TUESDAY", "bc3Year" : 7.9, "bc5Year" : 7.87, "bc10Year" : 7.94, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.87, "bc3Month" : 7.83, "bc30Year" : 8, "bc1Year" : 7.81, "bc7Year" : 7.98, "bc6Month" : 7.89 }
{ "_id" : { "$date" : 631324800000 }, "dayOfWeek" : "WEDNESDAY", "bc3Year" : 7.96, "bc5Year" : 7.92, "bc10Year" : 7.99, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.94, "bc3Month" : 7.89, "bc30Year" : 8.039999999999999, "bc1Year" : 7.85, "bc7Year" : 8.039999999999999, "bc6Month" : 7.94 }
{ "_id" : { "$date" : 631411200000 }, "dayOfWeek" : "THURSDAY", "bc3Year" : 7.93, "bc5Year" : 7.91, "bc10Year" : 7.98, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.92, "bc3Month" : 7.84, "bc30Year" : 8.039999999999999, "bc1Year" : 7.82, "bc7Year" : 8.02, "bc6Month" : 7.9 }
{ "_id" : { "$date" : 631497600000 }, "dayOfWeek" : "FRIDAY", "bc3Year" : 7.94, "bc5Year" : 7.92, "bc10Year" : 7.99, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.9, "bc3Month" : 7.79, "bc30Year" : 8.06, "bc1Year" : 7.79, "bc7Year" : 8.029999999999999, "bc6Month" : 7.85 }
{ "_id" : { "$date" : 631756800000 }, "dayOfWeek" : "MONDAY", "bc3Year" : 7.95, "bc5Year" : 7.92, "bc10Year" : 8.02, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.9, "bc3Month" : 7.79, "bc30Year" : 8.09, "bc1Year" : 7.81, "bc7Year" : 8.050000000000001, "bc6Month" : 7.88 }
{ "_id" : { "$date" : 631843200000 }, "dayOfWeek" : "TUESDAY", "bc3Year" : 7.94, "bc5Year" : 7.92, "bc10Year" : 8.02, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.91, "bc3Month" : 7.8, "bc30Year" : 8.1, "bc1Year" : 7.78, "bc7Year" : 8.050000000000001, "bc6Month" : 7.82 }
{ "_id" : { "$date" : 631929600000 }, "dayOfWeek" : "WEDNESDAY", "bc3Year" : 7.95, "bc5Year" : 7.92, "bc10Year" : 8.029999999999999, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.91, "bc3Month" : 7.75, "bc30Year" : 8.109999999999999, "bc1Year" : 7.77, "bc7Year" : 8, "bc6Month" : 7.78 }
{ "_id" : { "$date" : 632016000000 }, "dayOfWeek" : "THURSDAY", "bc3Year" : 7.95, "bc5Year" : 7.94, "bc10Year" : 8.039999999999999, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.91, "bc3Month" : 7.8, "bc30Year" : 8.109999999999999, "bc1Year" : 7.77, "bc7Year" : 8.01, "bc6Month" : 7.8 }
{ "_id" : { "$date" : 632102400000 }, "dayOfWeek" : "FRIDAY", "bc3Year" : 7.98, "bc5Year" : 7.99, "bc10Year" : 8.1, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.93, "bc3Month" : 7.74, "bc30Year" : 8.17, "bc1Year" : 7.76, "bc7Year" : 8.07, "bc6Month" : 7.81 }
{ "_id" : { "$date" : 632448000000 }, "dayOfWeek" : "TUESDAY", "bc3Year" : 8.130000000000001, "bc5Year" : 8.109999999999999, "bc10Year" : 8.199999999999999, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.1, "bc3Month" : 7.89, "bc30Year" : 8.25, "bc1Year" : 7.92, "bc7Year" : 8.18, "bc6Month" : 7.99 }
{ "_id" : { "$date" : 632534400000 }, "dayOfWeek" : "WEDNESDAY", "bc3Year" : 8.109999999999999, "bc5Year" : 8.109999999999999, "bc10Year" : 8.19, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.09, "bc3Month" : 7.97, "bc30Year" : 8.25, "bc1Year" : 7.91, "bc7Year" : 8.17, "bc6Month" : 7.97 }
{ "_id" : { "$date" : 632620800000 }, "dayOfWeek" : "THURSDAY", "bc3Year" : 8.279999999999999, "bc5Year" : 8.27, "bc10Year" : 8.32, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.25, "bc3Month" : 8.039999999999999, "bc30Year" : 8.35, "bc1Year" : 8.050000000000001, "bc7Year" : 8.31, "bc6Month" : 8.08 }
{ "_id" : { "$date" : 632707200000 }, "dayOfWeek" : "FRIDAY", "bc3Year" : 8.23, "bc5Year" : 8.199999999999999, "bc10Year" : 8.26, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.199999999999999, "bc3Month" : 8, "bc30Year" : 8.289999999999999, "bc1Year" : 8, "bc7Year" : 8.24, "bc6Month" : 8.01 }
{ "_id" : { "$date" : 632966400000 }, "dayOfWeek" : "MONDAY", "bc3Year" : 8.199999999999999, "bc5Year" : 8.19, "bc10Year" : 8.27, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.18, "bc3Month" : 7.99, "bc30Year" : 8.31, "bc1Year" : 7.98, "bc7Year" : 8.25, "bc6Month" : 7.99 }
{ "_id" : { "$date" : 633052800000 }, "dayOfWeek" : "TUESDAY", "bc3Year" : 8.199999999999999, "bc5Year" : 8.18, "bc10Year" : 8.26, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.18, "bc3Month" : 7.93, "bc30Year" : 8.289999999999999, "bc1Year" : 7.97, "bc7Year" : 8.23, "bc6Month" : 7.97 }
{ "_id" : { "$date" : 633139200000 }, "dayOfWeek" : "WEDNESDAY", "bc3Year" : 8.289999999999999, "bc5Year" : 8.279999999999999, "bc10Year" : 8.380000000000001, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.199999999999999, "bc3Month" : 7.93, "bc30Year" : 8.41, "bc1Year" : 8, "bc7Year" : 8.34, "bc6Month" : 7.99 }
{ "_id" : { "$date" : 633225600000 }, "dayOfWeek" : "THURSDAY", "bc3Year" : 8.32, "bc5Year" : 8.31, "bc10Year" : 8.42, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.24, "bc3Month" : 7.95, "bc30Year" : 8.460000000000001, "bc1Year" : 8.029999999999999, "bc7Year" : 8.390000000000001, "bc6Month" : 8.01 }
{ "_id" : { "$date" : 633312000000 }, "dayOfWeek" : "FRIDAY", "bc3Year" : 8.380000000000001, "bc5Year" : 8.380000000000001, "bc10Year" : 8.49, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.279999999999999, "bc3Month" : 7.93, "bc30Year" : 8.550000000000001, "bc1Year" : 8.07, "bc7Year" : 8.449999999999999, "bc6Month" : 8.039999999999999 }
{ "_id" : { "$date" : 633571200000 }, "dayOfWeek" : "MONDAY", "bc3Year" : 8.390000000000001, "bc5Year" : 8.390000000000001, "bc10Year" : 8.5, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.300000000000001, "bc3Month" : 8, "bc30Year" : 8.539999999999999, "bc1Year" : 8.08, "bc7Year" : 8.449999999999999, "bc6Month" : 8.09 }
{ "_id" : { "$date" : 633657600000 }, "dayOfWeek" : "TUESDAY", "bc3Year" : 8.390000000000001, "bc5Year" : 8.43, "bc10Year" : 8.51, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.300000000000001, "bc3Month" : 8, "bc30Year" : 8.550000000000001, "bc1Year" : 8.09, "bc7Year" : 8.470000000000001, "bc6Month" : 8.140000000000001 }
{ "_id" : { "$date" : 633744000000 }, "dayOfWeek" : "WEDNESDAY", "bc3Year" : 8.359999999999999, "bc5Year" : 8.35, "bc10Year" : 8.43, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.279999999999999, "bc3Month" : 8, "bc30Year" : 8.460000000000001, "bc1Year" : 8.08, "bc7Year" : 8.390000000000001, "bc6Month" : 8.130000000000001 }
{ "_id" : { "$date" : 633830400000 }, "dayOfWeek" : "THURSDAY", "bc3Year" : 8.35, "bc5Year" : 8.35, "bc10Year" : 8.42, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.279999999999999, "bc3Month" : 8.02, "bc30Year" : 8.44, "bc1Year" : 8.09, "bc7Year" : 8.380000000000001, "bc6Month" : 8.130000000000001 }
{ "_id" : { "$date" : 633916800000 }, "dayOfWeek" : "FRIDAY", "bc3Year" : 8.43, "bc5Year" : 8.42, "bc10Year" : 8.5, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.369999999999999, "bc3Month" : 8.07, "bc30Year" : 8.51, "bc1Year" : 8.130000000000001, "bc7Year" : 8.460000000000001, "bc6Month" : 8.17 }
{ "_id" : { "$date" : 634176000000 }, "dayOfWeek" : "MONDAY", "bc3Year" : 8.43, "bc5Year" : 8.44, "bc10Year" : 8.529999999999999, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.369999999999999, "bc3Month" : 8.08, "bc30Year" : 8.529999999999999, "bc1Year" : 8.15, "bc7Year" : 8.48, "bc6Month" : 8.18 }
{ "_id" : { "$date" : 634262400000 }, "dayOfWeek" : "TUESDAY", "bc3Year" : 8.43, "bc5Year" : 8.49, "bc10Year" : 8.57, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.42, "bc3Month" : 8.09, "bc30Year" : 8.58, "bc1Year" : 8.15, "bc7Year" : 8.52, "bc6Month" : 8.17 }
{ "_id" : { "$date" : 634348800000 }, "dayOfWeek" : "WEDNESDAY", "bc3Year" : 8.43, "bc5Year" : 8.51, "bc10Year" : 8.52, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.42, "bc3Month" : 8.08, "bc30Year" : 8.57, "bc1Year" : 8.17, "bc7Year" : 8.529999999999999, "bc6Month" : 8.19 }
{ "_id" : { "$date" : 634435200000 }, "dayOfWeek" : "THURSDAY", "bc3Year" : 8.390000000000001, "bc5Year" : 8.449999999999999, "bc10Year" : 8.49, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.369999999999999, "bc3Month" : 8.08, "bc30Year" : 8.5, "bc1Year" : 8.130000000000001, "bc7Year" : 8.48, "bc6Month" : 8.18 }
{ "_id" : { "$date" : 634521600000 }, "dayOfWeek" : "FRIDAY", "bc3Year" : 8.24, "bc5Year" : 8.289999999999999, "bc10Year" : 8.31, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.25, "bc3Month" : 8.02, "bc30Year" : 8.359999999999999, "bc1Year" : 8.029999999999999, "bc7Year" : 8.34, "bc6Month" : 8.09 }

 

   

  Three: Looking at its README.md, we can see that the project has to be compiled:

   

## Building

The mongo-hadoop connector currently supports the following versions of hadoop:  0.23, 1.0, 1.1, 2.2, 2.3, 2.4,
and CDH 4 and 5.  The default build version will build against the last Apache Hadoop (currently 2.4).  If you would like to build
against a specific version of Hadoop you simply need to pass `-PclusterVersion=<your version>` to gradlew when building.

Run `./gradlew jar` to build the jars.  The jars will be placed in to `build/libs` for each module.  e.g. for the core module,
it will be generated in the `core/build/libs` directory.

After successfully building, you must copy the jars to the lib directory on each node in your hadoop cluster. This is usually one of the
following locations, depending on which Hadoop release you are using:

* `$HADOOP_HOME/lib/`
* `$HADOOP_HOME/share/hadoop/mapreduce/`
* `$HADOOP_HOME/share/hadoop/lib/`

## Supported Distributions of Hadoop

| Hadoop Version                       | Build Parameter         |
| :----------------------------------: | :---------------------: |
| Apache Hadoop 0.23                   | -PclusterVersion='0.23' |
| Apache Hadoop 1.0                    | -PclusterVersion='1.0'  |
| Apache Hadoop 1.1                    | -PclusterVersion='1.1'  |
| Apache Hadoop 2.2                    | -PclusterVersion='2.2'  |
| Apache Hadoop 2.3                    | -PclusterVersion='2.3'  |
| Apache Hadoop 2.4                    | -PclusterVersion='2.4'  |

    We build it with the following command:

 

  

./gradlew jar

 

 

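   The build is fairly slow. One of the larger downloads is Amazon's S3 library, at more than 250 MB. When the build finishes, the jar mongo-hadoop-core-1.4.0-SNAPSHOT.jar is generated in the core/build/libs directory (the main prize of the whole exercise). Copy it, together with the MongoDB Java driver, into $HADOOP_HOME/lib. Alternatively, the jars can be loaded at run time: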

   

DistributedCache.addFileToClassPath(new Path("/root/software/mongo-java-driver-2.11.1.jar"), conf);
DistributedCache.addFileToClassPath(new Path("/root/software/mongo-hadoop-core-1.4.0-SNAPSHOT.jar"), conf);

 

 

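    With the compiled connector in hand, we can use it to connect to MongoDB.

       Four: First prepare the data by importing the sample data shown earlier into MongoDB: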

   

mongoimport --host 127.0.0.1 --port 27017 -d testmr -c example --file ./yield_historical_in.json

 

 

      Check the data:

    

> show collections
example
mongotest
system.indexes
> db.example.find().limit(2);
{ "_id" : ISODate("1990-01-02T00:00:00Z"), "dayOfWeek" : "TUESDAY", "bc3Year" : 7.9, "bc5Year" : 7.87, "bc10Year" : 7.94, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.87, "bc3Month" : 7.83, "bc30Year" : 8, "bc1Year" : 7.81, "bc7Year" : 7.98, "bc6Month" : 7.89 }
{ "_id" : ISODate("1990-01-03T00:00:00Z"), "dayOfWeek" : "WEDNESDAY", "bc3Year" : 7.96, "bc5Year" : 7.92, "bc10Year" : 7.99, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.94, "bc3Month" : 7.89, "bc30Year" : 8.04, "bc1Year" : 7.85, "bc7Year" : 8.04, "bc6Month" : 7.94 }
>
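
The `$date` numbers in the JSON file and the ISODate values printed by the shell are the same instants, written as milliseconds since the Unix epoch and as ISO-8601 UTC strings respectively. The following is a small sketch of that conversion using only the JDK; the class and method names are hypothetical and are not part of mongo-hadoop or the MongoDB driver.

```java
import java.time.Instant;

// Hypothetical helper for illustration; not part of mongo-hadoop or the MongoDB driver.
final class EpochDateDemo {

    // Converts a "$date" value (milliseconds since the Unix epoch) into the
    // ISO-8601 UTC string that the shell displays inside ISODate(...).
    static String toIsoString(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        // The first two "_id" values from the sample data.
        System.out.println(toIsoString(631238400000L)); // 1990-01-02T00:00:00Z
        System.out.println(toIsoString(631324800000L)); // 1990-01-03T00:00:00Z
    }
}
```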

     Five: Create a new MapReduce project

   

import java.io.IOException;
import java.util.Date;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.bson.BSONObject;

public class MongoTestMapper extends Mapper<Object, BSONObject, IntWritable, DoubleWritable> {

    @Override
    public void map(final Object pkey, final BSONObject pvalue, final Context context) {
        // Note: Date.getYear() returns the year minus 1900, so adding 1990 yields keys
        // 90 years ahead of the calendar year (1990 becomes 2080), as the output further
        // down shows. Use + 1900 to get the calendar year.
        final int year = ((Date) pvalue.get("_id")).getYear() + 1990;
        // The 10-year yield is the value that gets averaged per key.
        double bdyear = ((Number) pvalue.get("bc10Year")).doubleValue();
        try {
            context.write(new IntWritable(year), new DoubleWritable(bdyear));
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
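
The year arithmetic in the mapper is easy to misread, because the deprecated Date.getYear() returns the calendar year minus 1900. The sketch below, which uses only the JDK and hypothetical class and method names, compares the mapper's expression with a java.time-based alternative for the first sample document.

```java
import java.time.ZoneOffset;
import java.util.Date;

// Hypothetical demo class; it only illustrates the year arithmetic.
final class YearKeyDemo {

    // Same expression as the mapper: getYear() is (calendar year - 1900).
    @SuppressWarnings("deprecation")
    static int mapperStyleYear(Date date) {
        return date.getYear() + 1990;
    }

    // Calendar year in UTC via java.time, avoiding the deprecated method.
    static int calendarYear(Date date) {
        return date.toInstant().atZone(ZoneOffset.UTC).getYear();
    }

    public static void main(String[] args) {
        Date first = new Date(631238400000L); // 1990-01-02T00:00:00Z
        System.out.println(mapperStyleYear(first)); // 2080
        System.out.println(calendarYear(first));    // 1990
    }
}
```

Whether the job should key on the shifted value or on the calendar year is a choice for the job author; the sketch only makes the 90-year offset visible.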

  

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BasicBSONObject;

import com.mongodb.hadoop.io.BSONWritable;

public class MongoTestReducer extends Reducer<IntWritable, DoubleWritable, IntWritable, BSONWritable> {

    @Override
    public void reduce(final IntWritable pKey,
                       final Iterable<DoubleWritable> pValues,
                       final Context pContext) throws IOException, InterruptedException {
        // Sum and count every value emitted for this key.
        int count = 0;
        double sum = 0.0;
        for (final DoubleWritable value : pValues) {
            sum += value.get();
            count++;
        }

        final double avg = sum / count;

        // Wrap the average in a BSON document so it can be written to MongoDB.
        BasicBSONObject out = new BasicBSONObject();
        out.put("avg", avg);
        pContext.write(pKey, new BSONWritable(out));
    }
}
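
To see what the mapper and reducer compute together without a cluster, the grouping and averaging can be imitated in memory. This is only a rough sketch using JDK collections: the class name, method names, and the keys 1 and 2 are made up for illustration, while the three values are the bc10Year figures from the first three sample documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the job; no Hadoop or MongoDB involved.
final class AverageByKeyDemo {

    // "Shuffle" step: collect every value emitted for the same key.
    static Map<Integer, List<Double>> groupByKey(int[] keys, double[] values) {
        Map<Integer, List<Double>> grouped = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            grouped.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }
        return grouped;
    }

    // "Reduce" step: the same sum / count loop as in the reducer.
    static double average(Iterable<Double> values) {
        int count = 0;
        double sum = 0.0;
        for (double value : values) {
            sum += value;
            count++;
        }
        return sum / count;
    }

    public static void main(String[] args) {
        int[] keys = {1, 1, 2};               // made-up keys
        double[] values = {7.94, 7.99, 7.98}; // bc10Year of the first three documents
        groupByKey(keys, values).forEach(
                (key, vals) -> System.out.println(key + " -> " + average(vals)));
    }
}
```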

 

 

This is part of the code for an example that computes averages. After running it in a Hadoop environment, you can see the results written to MongoDB:

 

 

> db.mongotest.find();
{ "_id" : 2080, "avg" : 8.552400000000002 }
{ "_id" : 2081, "avg" : 7.8623600000000025 }
{ "_id" : 2082, "avg" : 7.008844621513946 }
{ "_id" : 2083, "avg" : 5.866279999999999 }
{ "_id" : 2084, "avg" : 7.085180722891565 }
{ "_id" : 2085, "avg" : 6.573920000000002 }
{ "_id" : 2086, "avg" : 6.443531746031742 }
{ "_id" : 2087, "avg" : 6.353959999999992 }
{ "_id" : 2088, "avg" : 5.262879999999994 }
{ "_id" : 2089, "avg" : 5.646135458167332 }
{ "_id" : 2090, "avg" : 6.030278884462145 }
{ "_id" : 2091, "avg" : 5.02068548387097 }
{ "_id" : 2092, "avg" : 4.61308 }
{ "_id" : 2093, "avg" : 4.013879999999999 }
{ "_id" : 2094, "avg" : 4.271320000000004 }
{ "_id" : 2095, "avg" : 4.288880000000001 }
{ "_id" : 2096, "avg" : 4.7949999999999955 }
{ "_id" : 2097, "avg" : 4.634661354581674 }
{ "_id" : 2098, "avg" : 3.6642629482071714 }
{ "_id" : 2099, "avg" : 3.2641200000000037 }
Type "it" for more
