Understanding MapReduce

1.

A. Write the map function and the reduce function

mapper.py:

#!/usr/bin/env python
"""mapper.py: emit a (word, 1) pair for every word read from stdin."""
import sys

for line in sys.stdin:
    # Split each input line into words
    words = line.strip().split()
    for word in words:
        # Emit tab-separated key/value pairs for Hadoop Streaming
        print('%s\t%s' % (word, 1))
reducer.py:

#!/usr/bin/env python
"""reducer.py: sum the counts for each word; input must be sorted by key."""
import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    # Parse the tab-separated (word, count) pair emitted by the mapper
    word, count = line.strip().split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # Skip lines whose count is not a number
        continue
    # Hadoop sorts mapper output by key, so equal words arrive consecutively
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Flush the final word
if current_word == word:
    print('%s\t%s' % (current_word, current_count))

B. Make the scripts executable

chmod a+x /home/hadoop/wc/mapper.py
chmod a+x /home/hadoop/wc/reducer.py

C. Test the code locally on this machine
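Before submitting to the cluster, the whole pipeline can be simulated with a shell pipe. This is a sketch assuming the scripts live at the paths used in section B; the `sort` step stands in for Hadoop's shuffle phase, which sorts mapper output by key before it reaches the reducer.

```shell
# Feed a sample line through mapper -> sort -> reducer
echo "foo foo quux labs foo bar quux" \
  | /home/hadoop/wc/mapper.py \
  | sort -k1,1 \
  | /home/hadoop/wc/reducer.py
# Expected output:
# bar	1
# foo	3
# labs	1
# quux	2
```

If the local output looks wrong here, it will be wrong on the cluster too, so this check is worth doing first.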

D. Run the job on HDFS

         a. Upload the previously crawled text file to HDFS

         b. Submit tasks with the Hadoop Streaming command
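The two steps above can be sketched as follows. The HDFS paths, the input file name `input.txt`, and the streaming jar location are assumptions; the jar path in particular varies by Hadoop version and distribution, so adjust it to your installation.

```shell
# a. Upload the input file to HDFS (paths are illustrative)
hdfs dfs -mkdir -p /user/hadoop/wc/input
hdfs dfs -put /home/hadoop/wc/input.txt /user/hadoop/wc/input

# b. Submit the job with Hadoop Streaming, shipping both scripts
#    to the cluster with -file
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -file /home/hadoop/wc/mapper.py  -mapper  /home/hadoop/wc/mapper.py \
  -file /home/hadoop/wc/reducer.py -reducer /home/hadoop/wc/reducer.py \
  -input  /user/hadoop/wc/input/* \
  -output /user/hadoop/wc/output

# Inspect the result once the job finishes
hdfs dfs -cat /user/hadoop/wc/output/part-00000
```

Note that the `-output` directory must not already exist, or the job submission will fail.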
