An Illustrated Walkthrough of the Spark WordCount Program

First, here is the Scala implementation:

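package cn.spark.study.core

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
 * @author Administrator
 */
object WordCount {

  def main(args: Array[String]): Unit = {
    // No master is set here, so it must be supplied at launch (e.g. spark-submit --master).
    val conf = new SparkConf()
      .setAppName("WordCount")
    val sc = new SparkContext(conf)

    // Read the file from HDFS as an RDD of lines, with at least 1 partition.
    val lines = sc.textFile("hdfs://spark1:9000/spark.txt", 1)
    // Split each line on spaces into individual words.
    val words = lines.flatMap { line => line.split(" ") }
    // Pair each word with an initial count of 1.
    val pairs = words.map { word => (word, 1) }
    // Sum the counts for each distinct word.
    val wordCounts = pairs.reduceByKey { _ + _ }

    // foreach runs on the executors, so on a cluster the output goes to executor stdout.
    wordCounts.foreach(wordCount => println(wordCount._1 + " appeared " + wordCount._2 + " times."))

    sc.stop()
  }

}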
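
To make the three transformations easier to follow, here is a minimal sketch of the same flatMap / map / sum-by-key logic on an ordinary Scala collection, with no Spark involved. The object name `LocalWordCount`, the helper `countWords`, and the sample input are all assumptions made for illustration, and `groupBy` plus `sum` merely stands in for `reduceByKey`; it is not how Spark implements it.

```scala
object LocalWordCount {
  // Same three steps as the Spark job, applied to an in-memory Seq.
  def countWords(lines: Seq[String]): Map[String, Int] = {
    val words = lines.flatMap(line => line.split(" ")) // one element per word
    val pairs = words.map(word => (word, 1))           // (word, 1) pairs
    // Stand-in for reduceByKey(_ + _): group the pairs by word, then sum the 1s.
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("hello spark", "hello scala") // hypothetical input
    countWords(sample).foreach { case (word, count) =>
      println(word + " appeared " + count + " times.")
    }
  }
}
```

For the sample input, "hello" is counted twice and "spark" and "scala" once each; the order in which the lines are printed is not guaranteed, because a Map has no defined ordering.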

The diagram below walks through the code above step by step and shows how it is executed under the hood.

Follow the arrows in the diagram to trace the overall flow.

The image is taken from the internet.
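
As a rough illustration of what happens beneath `reduceByKey`, the sketch below simulates its two phases on plain Scala collections: each partition first sums its own counts locally, and the partial counts for the same word are then brought together and added up. The object `ReduceByKeySketch`, its helper names, and the idea of passing partitions as nested `Seq`s are assumptions made for this example; the job above reads its file with a minimum of 1 partition, and real Spark moves the partial results between executors in a shuffle rather than folding them in one process.

```scala
object ReduceByKeySketch {
  // Merge two word -> count maps by adding the counts of shared keys.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (word, n)) =>
      acc.updated(word, acc.getOrElse(word, 0) + n)
    }

  // Phase 1: a single partition splits its lines and counts its own words.
  def combineLocally(partition: Seq[String]): Map[String, Int] =
    partition.flatMap(_.split(" ")).foldLeft(Map.empty[String, Int]) { (acc, w) =>
      acc.updated(w, acc.getOrElse(w, 0) + 1)
    }

  // Phase 2: partial counts from every partition are merged per word.
  def wordCount(partitions: Seq[Seq[String]]): Map[String, Int] =
    partitions.map(combineLocally).foldLeft(Map.empty[String, Int])(merge)
}
```

Calling `ReduceByKeySketch.wordCount(Seq(Seq("hello spark"), Seq("hello scala")))` with these two hypothetical partitions gives a count of 2 for "hello" and 1 each for "spark" and "scala". Combining locally first means only one partial count per word per partition has to be exchanged, rather than every individual (word, 1) pair.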


Reposted from blog.csdn.net/weixin_41244495/article/details/81159884