[Ms. Zhao Qiang] Use MapReduce to calculate aggregation in MongoDB

MapReduce can express very complex aggregation logic and is very flexible. However, it is slow and should not be used for real-time data analysis. MapReduce can run in parallel on multiple servers: each server completes only part of the workload, and the partial results are then sent to the master server, which merges them, calculates the final result set, and returns it to the client.
The basic idea of MapReduce is shown in the following figure:

[Figure: the basic idea of MapReduce]

This example computes a sum. First comes the Map phase, which splits a large task into several small tasks; each small task runs on a different node, which is what makes distributed computing possible (shown in the blue box in the figure). The outputs of the small tasks are then combined in a second round of computation, which finally yields the result 55. This phase is called Reduce (shown in the red box).
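The split-then-combine idea can be sketched in plain JavaScript (runnable under Node.js, no database needed). The input 1..10 and the chunk size of 3 are assumptions made for illustration; the text only states that the final result is 55.

```javascript
// Assumed input: the integers 1..10, which add up to 55.
const numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Split the large task into small tasks (the chunk size is arbitrary).
function splitIntoChunks(values, size) {
  const chunks = [];
  for (let i = 0; i < values.length; i += size) {
    chunks.push(values.slice(i, i + size));
  }
  return chunks;
}

// Add up an array of numbers.
const sum = (values) => values.reduce((acc, v) => acc + v, 0);

// First round: each small task sums only its own chunk.
// On a cluster, each chunk could be handled by a different node.
const partialSums = splitIntoChunks(numbers, 3).map(sum);

// Second round: combine the partial results into the final answer.
const total = sum(partialSums);

console.log(partialSums, total);
```

Because addition is associative, the total does not depend on how the numbers are split into chunks.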

Calculating an aggregation with MapReduce is divided into three steps: Map, Shuffle (grouping) and Reduce. Map and Reduce must be defined explicitly; Shuffle is performed by MongoDB.

  • Map: apply the operation to each document and generate a Key and a Value
  • Shuffle: group by Key, and combine the Values that share the same Key into an array
  • Reduce: reduce the Value array to a single value
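The three steps can be imitated in memory with a few lines of plain JavaScript. This is only a sketch: runMapReduce is a hypothetical helper, not a MongoDB API, and it passes emit to the map function as an argument, whereas in MongoDB emit is available globally inside the map function.

```javascript
// Hypothetical in-memory helper that mimics Map, Shuffle and Reduce.
function runMapReduce(docs, mapFn, reduceFn) {
  // Shuffle storage: key -> array of the values emitted for that key.
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Map: run mapFn once per document, with `this` bound to the document.
  for (const doc of docs) {
    mapFn.call(doc, emit);
  }

  // Reduce: collapse each value array to a single value.
  // Like MongoDB, skip the reduce call when a key has only one value.
  const result = {};
  for (const [key, values] of groups) {
    result[key] = values.length === 1 ? values[0] : reduceFn(key, values);
  }
  return result;
}

// Made-up sample documents for the demonstration.
const sampleDocs = [
  { ename: "SMITH", job: "CLERK" },
  { ename: "ALLEN", job: "SALESMAN" },
  { ename: "ADAMS", job: "CLERK" },
];

const jobCounts = runMapReduce(
  sampleDocs,
  function (emit) { emit(this.job, 1); },
  (key, values) => values.reduce((a, b) => a + b, 0)
);

console.log(jobCounts); // { CLERK: 2, SALESMAN: 1 }
```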

The following test data (employee records) is used in the examples below.

db.emp.insertMany(
[
{_id:7369,ename:'SMITH' ,job:'CLERK'    ,mgr:7902,hiredate:'17-12-80',sal:800,comm:0,deptno:20},
{_id:7499,ename:'ALLEN' ,job:'SALESMAN' ,mgr:7698,hiredate:'20-02-81',sal:1600,comm:300 ,deptno:30},
{_id:7521,ename:'WARD'  ,job:'SALESMAN' ,mgr:7698,hiredate:'22-02-81',sal:1250,comm:500 ,deptno:30},
{_id:7566,ename:'JONES' ,job:'MANAGER'  ,mgr:7839,hiredate:'02-04-81',sal:2975,comm:0,deptno:20},
{_id:7654,ename:'MARTIN',job:'SALESMAN' ,mgr:7698,hiredate:'28-09-81',sal:1250,comm:1400,deptno:30},
{_id:7698,ename:'BLAKE' ,job:'MANAGER'  ,mgr:7839,hiredate:'01-05-81',sal:2850,comm:0,deptno:30},
{_id:7782,ename:'CLARK' ,job:'MANAGER'  ,mgr:7839,hiredate:'09-06-81',sal:2450,comm:0,deptno:10},
{_id:7788,ename:'SCOTT' ,job:'ANALYST'  ,mgr:7566,hiredate:'19-04-87',sal:3000,comm:0,deptno:20},
{_id:7839,ename:'KING'  ,job:'PRESIDENT',mgr:0,hiredate:'17-11-81',sal:5000,comm:0,deptno:10},
{_id:7844,ename:'TURNER',job:'SALESMAN' ,mgr:7698,hiredate:'08-09-81',sal:1500,comm:0,deptno:30},
{_id:7876,ename:'ADAMS' ,job:'CLERK'    ,mgr:7788,hiredate:'23-05-87',sal:1100,comm:0,deptno:20},
{_id:7900,ename:'JAMES' ,job:'CLERK'    ,mgr:7698,hiredate:'03-12-81',sal:950,comm:0,deptno:30},
{_id:7902,ename:'FORD'  ,job:'ANALYST'  ,mgr:7566,hiredate:'03-12-81',sal:3000,comm:0,deptno:20},
{_id:7934,ename:'MILLER',job:'CLERK'    ,mgr:7782,hiredate:'23-01-82',sal:1300,comm:0,deptno:10}
]
);

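(Case 1) Count the employees in each job in the emp collection

// Map: for each employee, emit the job as the key and 1 as the value
var map1 = function() { emit(this.job, 1); };
// Reduce: add up the 1s collected for each job
var reduce1 = function(job, counts) { return Array.sum(counts); };
// Run the job and write the result to the mrdemo1 collection
db.emp.mapReduce(map1, reduce1, {out: "mrdemo1"});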

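(Case 2) Find the total salary of each department in the emp collection

// Map: for each employee, emit the department number as the key and the salary as the value
var map2 = function() { emit(this.deptno, this.sal); };
// Reduce: add up the salaries collected for each department
var reduce2 = function(deptno, salaries) { return Array.sum(salaries); };
// Run the job and write the result to the mrdemo2 collection
db.emp.mapReduce(map2, reduce2, {out: "mrdemo2"});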
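To know what the two jobs should produce, the same numbers can be cross-checked in plain JavaScript without a database. The sketch below copies only the job, deptno and sal fields of the 14 test documents; groupSum is a hypothetical helper introduced here for illustration.

```javascript
// [job, deptno, sal] for each of the 14 employees in the test data.
const rows = [
  ["CLERK", 20, 800], ["SALESMAN", 30, 1600], ["SALESMAN", 30, 1250],
  ["MANAGER", 20, 2975], ["SALESMAN", 30, 1250], ["MANAGER", 30, 2850],
  ["MANAGER", 10, 2450], ["ANALYST", 20, 3000], ["PRESIDENT", 10, 5000],
  ["SALESMAN", 30, 1500], ["CLERK", 20, 1100], ["CLERK", 30, 950],
  ["ANALYST", 20, 3000], ["CLERK", 10, 1300],
];

// Hypothetical helper: group the rows by a key and sum a value per group.
function groupSum(rows, keyOf, valueOf) {
  const totals = {};
  for (const row of rows) {
    const key = keyOf(row);
    totals[key] = (totals[key] || 0) + valueOf(row);
  }
  return totals;
}

// Case 1: number of employees per job.
const countByJob = groupSum(rows, (r) => r[0], () => 1);
// Case 2: total salary per department.
const salaryByDept = groupSum(rows, (r) => r[1], (r) => r[2]);

console.log(countByJob);
// { CLERK: 4, SALESMAN: 4, MANAGER: 3, ANALYST: 2, PRESIDENT: 1 }
console.log(salaryByDept);
// { '10': 8750, '20': 10875, '30': 9400 }
```

In MongoDB, each output collection holds one document per key, with the key in _id and the reduced result in value, so these figures can be compared with what db.mrdemo1.find() and db.mrdemo2.find() return.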

(Case 3) Troubleshoot the Map Function

Define your own emit function:
var emit = function(key, value) {
    print("emit");
    print("key: " + key + "  value: " + tojson(value));
}

Test with a single document:
var emp7839 = db.emp.findOne({_id: 7839})
map2.apply(emp7839)
The output is as follows:
emit
key: 10  value: 5000

Test with multiple documents:
var myCursor=db.emp.find()
while (myCursor.hasNext()) {
    var doc = myCursor.next();
    print ("document _id= " + tojson(doc._id));
    map2.apply(doc);
    print();
}
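A variation on the same technique, sketched here in plain JavaScript so that it runs without a database, is to have the replacement emit function collect the pairs in an array instead of printing them. The emitted pairs can then be checked in code. The array name emitted and the two trimmed-down documents are assumptions made for this illustration.

```javascript
// Replacement emit: record each key/value pair instead of printing it.
var emitted = [];
var emit = function (key, value) {
  emitted.push({ key: key, value: value });
};

// Same map function as in Case 2.
var map2 = function () { emit(this.deptno, this.sal); };

// Two documents from the test data, reduced to the fields map2 reads.
var docs = [
  { _id: 7839, deptno: 10, sal: 5000 },
  { _id: 7369, deptno: 20, sal: 800 },
];

// Call the map function with `this` bound to each document in turn.
docs.forEach(function (doc) { map2.apply(doc); });

console.log(JSON.stringify(emitted));
// [{"key":10,"value":5000},{"key":20,"value":800}]
```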

(Case 4) Troubleshoot the Reduce Function

A simple test case:
var myTestValues = [ 5, 5, 10 ];
var reduce1 = function(key, values) { return Array.sum(values); }
// Call reduce directly with a test key and test values; returns 20
reduce1("mykey", myTestValues)

Test: each value passed to Reduce contains multiple fields
Test data (salary and commission):
var myTestObjects = [
                      { sal: 1000, comm: 5 },
                      { sal: 2000, comm: 10 },
                      { sal: 3000, comm: 15 }
                    ];
Write the reduce function:
var reduce2 = function(key, values) {
   // Declare with var so that the accumulator does not leak into the global scope
   var reducedValue = { sal: 0, comm: 0 };
   for (var i = 0; i < values.length; i++) {
     reducedValue.sal += values[i].sal;
     reducedValue.comm += values[i].comm;
   }
   return reducedValue;
}

Test:
reduce2("aa", myTestObjects)
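MongoDB may call the reduce function more than once for the same key and feed the output of one call back in as an input of the next, so a reduce function should give the same answer however its inputs are grouped or ordered. The sketch below checks this for reduce2 in plain JavaScript; the variable names direct, rereduced and reversed are introduced here for illustration.

```javascript
// Same reduce function and test data as above.
var reduce2 = function (key, values) {
  var reducedValue = { sal: 0, comm: 0 };
  for (var i = 0; i < values.length; i++) {
    reducedValue.sal += values[i].sal;
    reducedValue.comm += values[i].comm;
  }
  return reducedValue;
};

var myTestObjects = [
  { sal: 1000, comm: 5 },
  { sal: 2000, comm: 10 },
  { sal: 3000, comm: 15 },
];

// Reduce all three values in one call.
var direct = reduce2("aa", myTestObjects);

// Re-reduce: reduce the first two values, then feed that result back in
// together with the third value.
var partial = reduce2("aa", myTestObjects.slice(0, 2));
var rereduced = reduce2("aa", [partial, myTestObjects[2]]);

// Order: reduce the same values in reverse order.
var reversed = reduce2("aa", myTestObjects.slice().reverse());

console.log(JSON.stringify(direct));    // {"sal":6000,"comm":30}
console.log(JSON.stringify(rereduced)); // {"sal":6000,"comm":30}
console.log(JSON.stringify(reversed));  // {"sal":6000,"comm":30}
```

Note that reduce2 returns an object of the same shape as its input values ({ sal, comm }), which is what makes feeding a partial result back in possible.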


Origin blog.51cto.com/collen7788/2532733