An Explanation of "Come, Let Me Show You a Magical MongoDB mapReduce Operation!"

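Hello everyone. Before reading this post, please read the previous one first, "Come, Let Me Show You a Magical MongoDB mapReduce Operation!": http://gong1208.iteye.com/blog/1830576

This post explains what went wrong in that one.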

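The strange error I reported in the previous post when running mapReduce on MongoDB turned out to be my own mistake. The cause lies in the requirements the MongoDB documentation places on the reduce function: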

Requirements for the reduce Function

The reduce function has the following prototype:

function(key, values) {
    ...
    return result;
}

The reduce function exhibits the following behaviors:

  • The reduce function should not access the database, even to perform read operations.
  • The reduce function should not affect the outside system.
  • MongoDB will not call the reduce function for a key that has only a single value.
  • The reduce function can access the variables defined in the scope parameter.

Because it is possible to invoke the reduce function more than once for the same key, the following properties need to be true:

  • the type of the return object must be identical to the type of the value emitted by the map function to ensure that the following operation is true:

      reduce( key, [ C, reduce( key, [ A, B ] ) ] ) == reduce( key, [ C, A, B ] )

  • the reduce function must be idempotent. Ensure that the following statement is true:

      reduce( key, [ reduce( key, valuesArray ) ] ) == reduce( key, valuesArray )

  • the order of the elements in the valuesArray should not affect the output of the reduce function, so that the following statement is true:

      reduce( key, [ A, B ] ) == reduce( key, [ B, A ] )

Reference: http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#db.collection.mapReduce

 

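What this means is that, within a single job, MongoDB may call the reduce function several times for the same key, rather than exactly once as we would expect from an ordinary function, and the result of one call can be passed back in as one of the values of a later call. The reduce function must therefore be idempotent. Put simply, the values the reduce function receives and the value it returns must have the same form.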
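To make the repeated calls concrete, here is a minimal sketch in plain JavaScript that needs no MongoDB at all. The helper `simulateReReduce`, the function `sumReduce`, the batch size and the sample key "10.0.0.1" are all made up for illustration; how MongoDB actually batches values is internal and is not what this code reproduces. It only shows that a reduce whose return value has the same form as its inputs gives the same answer whether it runs once or in several passes.

```javascript
// Hypothetical helper: reduce the values in batches, then reduce the
// partial results again. This only imitates the idea of re-reduce;
// MongoDB's real batching is internal.
function simulateReReduce(reduce, key, values, batchSize) {
  var partials = [];
  for (var i = 0; i < values.length; i += batchSize) {
    partials.push(reduce(key, values.slice(i, i + batchSize)));
  }
  // The partial results are fed back in as ordinary values.
  return reduce(key, partials);
}

// A reduce whose return value has the same form as the emitted values.
function sumReduce(key, values) {
  var total = { value: 0 };
  values.forEach(function (v) {
    total.value += v.value;
  });
  return total;
}

// Five emitted values for one (assumed) key.
var emitted = [{ value: 1 }, { value: 1 }, { value: 1 }, { value: 1 }, { value: 1 }];

var oneShot = sumReduce("10.0.0.1", emitted);
var batched = simulateReReduce(sumReduce, "10.0.0.1", emitted, 2);

console.log(oneShot.value, batched.value); // both are 5
```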

Let me use the example from my previous post again.

This is what I originally wrote:

 

printjson("job start");
var map = function() {
  // Emits a value of the form {value: number} for each document.
  emit(this.ip, {value: 1});
}

var reduce = function(key, values) {
  var count = 0;
  values.forEach(function(v) {
    count += v['value'];
  });
  // Problem: returns {count: number}, not the {value: number} form emitted by map.
  return {count: count };
}

var res = db.runCommand({mapreduce:"RegistRecord",map:map, reduce:reduce, out:"log_results"});
printjson("job end")

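As you can see, the second argument of emit has the form {value: number}, so each element of the values array passed to reduce has the form {value: number}. The return value of reduce must therefore also have the form {value: number}, because MongoDB may pass that return value back in as an input to a later reduce call for the same key.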
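The failure can be reproduced outside MongoDB with two hand-made calls. This is only a sketch: `buggyReduce` is a copy of my original reduce, and the key "10.0.0.1" and the way the second call mixes a partial result with a fresh value are assumptions chosen to imitate a re-reduce.

```javascript
// Copy of the original reduce: it reads v.value but returns { count: ... }.
function buggyReduce(key, values) {
  var count = 0;
  values.forEach(function (v) {
    count += v['value'];
  });
  return { count: count };
}

// First pass over two emitted values: works as expected.
var partial = buggyReduce("10.0.0.1", [{ value: 1 }, { value: 1 }]);

// Second pass: the partial result is mixed with a newly emitted value.
// partial has no 'value' field, so count becomes 0 + undefined, which is NaN.
var finalResult = buggyReduce("10.0.0.1", [partial, { value: 1 }]);

console.log(partial.count);     // 2
console.log(finalResult.count); // NaN
```

Once a single NaN enters the sum it never recovers, which would explain why only keys with many values looked wrong while keys reduced in one pass looked fine.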

Changing it as follows fixes the problem:

 

var reduce = function(key, values) {
  var count = {value: 0};
  values.forEach(function(v) {
    count.value += v['value'];
  });
  return count;
}
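As a quick sanity check, the three properties quoted from the documentation can be tested against the corrected reduce in plain JavaScript. This is a sketch under assumptions: `fixedReduce` is a copy of the function above, `same` is a hypothetical comparison helper based on JSON.stringify, and A, B, C and the key are sample inputs rather than real data from the RegistRecord collection.

```javascript
// Copy of the corrected reduce: input and output share the form { value: number }.
function fixedReduce(key, values) {
  var count = { value: 0 };
  values.forEach(function (v) {
    count.value += v['value'];
  });
  return count;
}

// Hypothetical helper: structural equality via JSON serialization.
function same(x, y) {
  return JSON.stringify(x) === JSON.stringify(y);
}

var key = "10.0.0.1"; // assumed sample key
var A = { value: 1 }, B = { value: 2 }, C = { value: 3 };

// reduce(key, [C, reduce(key, [A, B])]) == reduce(key, [C, A, B])
var reReducible = same(
  fixedReduce(key, [C, fixedReduce(key, [A, B])]),
  fixedReduce(key, [C, A, B])
);

// reduce(key, [reduce(key, values)]) == reduce(key, values)
var idempotent = same(
  fixedReduce(key, [fixedReduce(key, [A, B, C])]),
  fixedReduce(key, [A, B, C])
);

// reduce(key, [A, B]) == reduce(key, [B, A])
var orderIndependent = same(fixedReduce(key, [A, B]), fixedReduce(key, [B, A]));

console.log(reReducible, idempotent, orderIndependent); // true true true
```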

P.S. Special thanks to Kay.Kim of the MongoDB community. I sent an email to the MongoDB community asking about this problem and, to my surprise, received a helpful reply that resolved it for me.


Reposted from gong1208.iteye.com/blog/1841640