MapReduce is a programming model, and an associated implementation, for processing and generating large data sets. The user first writes a Map function that processes a key/value pair and produces a set of intermediate key/value pairs, then writes a Reduce function that merges all intermediate values sharing the same intermediate key.
Simulating a simple MapReduce program
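To make the key/value description concrete, here is a minimal single-process sketch. The names map_fn, reduce_fn and map_reduce are hypothetical, chosen for illustration only and not taken from any MapReduce library. The intermediate key is assumed to be a number's remainder modulo 7, and the reduce step sums the numbers that share a remainder.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: turn one input pair into intermediate (key, value) pairs.

    The input key (the position in the list) is ignored; the intermediate
    key is the remainder of the number modulo 7.
    """
    yield value % 7, value


def reduce_fn(key, values):
    """Reduce: merge every value that shares one intermediate key."""
    return key, sum(values)


def map_reduce(records, map_fn, reduce_fn):
    # Shuffle step: group intermediate values by their intermediate key.
    groups = defaultdict(list)
    for key, value in records:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    # Reduce step: one call per distinct intermediate key.
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


numbers = [134, 43, 49, 34, 1]
result = map_reduce(enumerate(numbers), map_fn, reduce_fn)
print(result)
```

The grouping dictionary plays the role of the shuffle phase that a real MapReduce framework performs between the map and reduce stages.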
#### Implements a simple map-reduce program.
#### The input is a series of numbers. Take each number modulo 7, then add up the remainders.
import time

mylist = [134, 43, 49, 34, 1, 34, 89, 133, 13434, 379, 134, 4343, 13434, 34454, 343, 134]

def surplus(myNum):
    a = myNum % 7
    print(a)
    ### sleep added so the effect is easier to observe
    time.sleep(0.1)
    return a

def plus_all(mylist):
    mySum = 0
    # map step: surplus() is applied to every number, one at a time
    for onesurplus in map(surplus, mylist):
        # reduce step: accumulate the remainders
        mySum = mySum + onesurplus
    return mySum

if __name__ == '__main__':
    print(plus_all(mylist))
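The code above implements the simplest possible MapReduce programming model, but the map task still runs in a single thread. To parallelize it, replace the call to map with a concurrent one. The version below calls map() across 4 concurrent worker processes (futures.ProcessPoolExecutor starts processes, not threads). By default, futures.ProcessPoolExecutor() starts as many workers as the machine has logical CPUs.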
#### Implements a simple map-reduce program.
#### The input is a series of numbers. Take each number modulo 7, then add up the remainders.
import time
from concurrent import futures

mylist = [134, 43, 49, 34, 1, 34, 89, 133, 13434, 379, 134, 4343, 13434, 34454, 343, 134]

def surplus(myNum):
    a = myNum % 7
    # print(a)
    ### sleep added so the effect is easier to observe
    time.sleep(0.1)
    return a

def plus_all(mylist):
    mySum = 0
    # map step: 4 worker processes run surplus() concurrently
    with futures.ProcessPoolExecutor(4) as pool:
        for onesurplus in pool.map(surplus, mylist):
            # reduce step: accumulate the remainders in input order
            mySum = mySum + onesurplus
    return mySum

if __name__ == '__main__':
    print(plus_all(mylist))
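Since surplus() spends almost all of its time in time.sleep, a thread pool is a plausible alternative to the process pool here. The sketch below is a variant, not part of the original program: plus_all_threads and its workers parameter are names introduced for illustration, and it assumes the same input list and map function as above.

```python
import time
from concurrent import futures

mylist = [134, 43, 49, 34, 1, 34, 89, 133, 13434, 379, 134, 4343, 13434, 34454, 343, 134]


def surplus(myNum):
    # Simulated slow work; sleeping releases the GIL, so threads overlap.
    time.sleep(0.1)
    return myNum % 7


def plus_all_threads(numbers, workers=4):
    # Hypothetical variant: same map/reduce shape, threads instead of processes.
    with futures.ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order; sum() is the reduce step.
        return sum(pool.map(surplus, numbers))


if __name__ == '__main__':
    start = time.perf_counter()
    total = plus_all_threads(mylist)
    elapsed = time.perf_counter() - start
    print(total, round(elapsed, 2))
```

For a CPU-bound map function, threads would gain little in CPython because of the global interpreter lock, which is one reason to prefer ProcessPoolExecutor in that case.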
To sum up:
1. In the MapReduce programming model, first split the task into multiple parts that all go through the same processing flow. Write a map function whose return value always has the same fixed form.
2. Invoke the map tasks concurrently, then process (reduce) all the results that map returns.
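The two steps in this summary can be sketched with an explicit split and an explicit reduce. This is only an illustration under stated assumptions: split, map_chunk, map_reduce_sum, chunk_size and mapper are hypothetical names, and the built-in map is used by default so the sketch runs without a pool. Passing the map method of a concurrent.futures executor as mapper should make the map step concurrent.

```python
from functools import reduce
from operator import add


def split(numbers, chunk_size):
    """Step 1a: split the task into parts that share one processing flow."""
    return [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]


def map_chunk(chunk):
    """Step 1b: map task with a fixed-form result, one partial sum per chunk."""
    return sum(n % 7 for n in chunk)


def map_reduce_sum(numbers, chunk_size=4, mapper=map):
    chunks = split(numbers, chunk_size)
    # Step 2a: run the map task over every chunk (sequentially by default).
    partial_sums = mapper(map_chunk, chunks)
    # Step 2b: reduce all results returned by the map tasks into one value.
    return reduce(add, partial_sums, 0)


mylist = [134, 43, 49, 34, 1, 34, 89, 133, 13434, 379, 134, 4343, 13434, 34454, 343, 134]
total = map_reduce_sum(mylist)
print(total)
```

Working on chunks instead of single numbers means each worker receives fewer, larger tasks, which usually lowers the overhead of handing work to a pool.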