Spark word count can run in local (standalone) mode or in cluster mode.

import org.apache.spark.{SparkConf, SparkContext}

object WordCountScala {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setAppName("WordCountScala")
    // conf.setMaster("local[3]")  // local mode
    // create the SparkContext
    val sc = new SparkContext(conf)
    // load the file
    // val rdd1 = sc.textFile("file:///G:/downloads/bigdata/wc.txt", 5) // local mode
    val rdd1 = sc.textFile(args(0), 3) // cluster mode: input path comes from the command line
    // split each line into words
    val rdd2 = rdd1.flatMap(_.split(" "))
    // map each word to a (word, 1) pair
    val rdd3 = rdd2.map((_, 1))
    // aggregate the counts per word
    val rdd4 = rdd3.reduceByKey(_ + _)
    val arr = rdd4.collect()
    arr.foreach(println)
  }
}
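Before packaging, the transformation pipeline itself can be sanity-checked on a plain Scala collection, with groupBy plus a sum standing in for reduceByKey (the input lines below are made up for illustration):

```scala
// Same pipeline as the RDD version, run on a local Seq for a quick check.
val lines = Seq("hello world", "hello spark")
val counts = lines
  .flatMap(_.split(" "))                            // like rdd1.flatMap(_.split(" "))
  .map((_, 1))                                      // like rdd2.map((_, 1))
  .groupBy(_._1)                                    // the "shuffle" step
  .map { case (w, pairs) => (w, pairs.map(_._2).sum) } // like reduceByKey(_ + _)
counts.toSeq.sortBy(_._1).foreach(println)
// prints (hello,2) (spark,1) (world,1)
```

This verifies the flatMap/map/reduce logic without needing a Spark installation.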
In cluster mode, the compiled class files must be packaged into a jar.
a) Upload the jar to HDFS
hdfs dfs -put myspark.jar /user/hadoop/data
b) Submit the job
spark-submit --master spark://s101:7077 --class WordCountScala --deploy-mode cluster hdfs://mycluster/user/hadoop/data/myspark.jar /user/hadoop/data/wc.txt
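Note that with --deploy-mode cluster the driver runs on a worker node, so the println output appears in that driver's log (browsable from the Spark master web UI), not in the submitting terminal. For quick debugging, the same jar can be submitted in the default client deploy mode, where the driver runs in the submitting shell and the counts print directly to the terminal (hosts and paths below reuse those from step a)):

    spark-submit --master spark://s101:7077 --class WordCountScala hdfs://mycluster/user/hadoop/data/myspark.jar /user/hadoop/data/wc.txt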