Spark word count algorithm

A Spark word count can run in local (single-machine) mode or in cluster mode. The code below is set up for cluster mode; the lines for local mode are left in as comments.
import org.apache.spark.{SparkConf, SparkContext}
object WordCount {
  def main(args: Array[String]): Unit={
    val conf = new SparkConf()
    conf.setAppName("WordCountScala")
    // conf.setMaster("local[3]") // local mode: uncomment to run on one machine with 3 threads
    // Create the SparkContext
    val sc = new SparkContext(conf)
    // load file
    // val rdd1 = sc.textFile("file:///G:/downloads/bigdata/wc.txt", 5) // local mode
    val rdd1 = sc.textFile(args(0), 3) // cluster mode: path comes from the first argument
    // split each line into words
    val rdd2 = rdd1.flatMap(_.split(" "))
    // map each word to a (word, 1) pair
    val rdd3 = rdd2.map((_, 1))
    // aggregate: sum the counts for each word
    val rdd4 = rdd3.reduceByKey(_ + _)
    // bring the results back to the driver and print them
    val arr = rdd4.collect()
    arr.foreach(println)
  }
}
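
To see what each step of the pipeline produces without starting Spark, the same three steps can be sketched on ordinary Scala collections. This is only an illustration: the object name WordCountLocalSketch and the helper countWords are hypothetical, and groupBy followed by a sum stands in for reduceByKey, which exists only on Spark pair RDDs.

```scala
// Plain-Scala analogue of the Spark pipeline above; no Spark dependency needed.
// WordCountLocalSketch and countWords are hypothetical names used for illustration.
object WordCountLocalSketch {
  def countWords(lines: Seq[String]): Map[String, Int] = {
    // Split each line on single spaces and flatten, like rdd1.flatMap(_.split(" "))
    val words = lines.flatMap(_.split(" "))
    // Pair every word with 1, like rdd2.map((_, 1))
    val pairs = words.map((_, 1))
    // Group the pairs by word and sum the 1s; this stands in for reduceByKey(_ + _)
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("hello spark", "hello world")
    // Prints one (word, count) pair per line; the order is not guaranteed
    countWords(sample).foreach(println)
  }
}
```

For the two sample lines the result contains hello with a count of 2, and spark and world with a count of 1 each.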

In cluster mode, the class files need to be packaged into a jar.
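
One possible way to build that jar is with sbt. The build.sbt below is a minimal sketch and not the build used for this post: the choice of sbt, the project name and both version numbers are assumptions and should be matched to the Spark and Scala versions installed on the cluster.

```scala
// build.sbt (sketch; every value here is an assumption, adjust to your cluster)
name := "myspark"
version := "0.1.0"
scalaVersion := "2.12.18"

// "provided": the cluster already supplies Spark, so it is left out of the jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.1" % "provided"
```

Running sbt package then writes the jar under target/scala-2.12/; it can be renamed to myspark.jar before uploading.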

a) Upload the jar to HDFS
    hdfs dfs -put myspark.jar /user/hadoop/data
b) Run it
    spark-submit --master spark://s101:7077 --class WordCount --deploy-mode cluster hdfs://mycluster/user/hadoop/data/myspark.jar /user/hadoop/data/wc.txt
 

Origin blog.csdn.net/nengyu/article/details/92076340