Code
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Scala02_WordCountOnYarn {
  def main(args: Array[String]): Unit = {
    // The master is not set here; spark-submit supplies it (--master yarn).
    val conf: SparkConf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)

    // Read args(0), split each line into words, pair each word with 1,
    // and sum the counts per word.
    val resRDD: RDD[(String, Int)] = sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)

    resRDD.saveAsTextFile(args(1)) // write the results to args(1)
    sc.stop()
  }
}
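The RDD pipeline above can be mimicked on a plain Scala collection, which is a handy way to check the word-count logic without a cluster. This is only an illustrative sketch (the object name is hypothetical, not part of the job); groupBy plus a sum plays the role that reduceByKey plays on an RDD:

```scala
object LocalWordCountSketch {
  // Mirrors the RDD pipeline: flatMap(_.split(" ")) -> map((_, 1)) -> reduceByKey(_ + _)
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split each line into words
      .map((_, 1))             // pair each word with a count of 1
      .groupBy(_._1)           // groupBy + sum stands in for reduceByKey
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    // The map contains a -> 2, b -> 2, c -> 1 (iteration order may vary)
    println(wordCount(Seq("a b a", "b c")))
  }
}
```

The difference on a cluster is that reduceByKey combines counts locally on each partition before shuffling, whereas this local groupBy materializes every (word, 1) pair first.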
Submit
bin/spark-submit \
--class com.aura.spark.day01.Scala02_WordCountOnYarn \
--master yarn \
--deploy-mode cluster \
--executor-memory 2G \
--num-executors 4 \
--executor-cores 2 \
/home/hadoop/jar/WordCount.jar \
/word_in /word_out
Parameter explanation of submit
- \ continues the command on the next line.
- --class is the fully qualified name of the main class.
- --master yarn runs the application on YARN.
- --deploy-mode chooses where the driver runs: inside the cluster (cluster) or on the submitting machine (client).
- --executor-memory sets the memory available to each executor.
- --num-executors sets how many executors to launch, and --executor-cores sets the CPU cores per executor. (--total-executor-cores is honored only in standalone and Mesos modes, not on YARN.)
- /home/hadoop/jar/WordCount.jar is the path of the jar package on the local filesystem.
- /word_in is the input path on HDFS.
- /word_out is the output path on HDFS; it must not exist before the job runs, or saveAsTextFile will fail.
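After the job finishes, each part file under /word_out contains one (word, count) pair per line, written with the tuple's toString form, e.g. (hello,3). A small sketch of turning such a line back into a pair (a hypothetical helper for inspection, not part of the job above):

```scala
object WordCountOutputParser {
  // saveAsTextFile stores each (word, count) pair as its toString, e.g. "(hello,3)".
  def parseLine(line: String): (String, Int) = {
    val inner = line.stripPrefix("(").stripSuffix(")")
    val cut = inner.lastIndexOf(',') // the count follows the last comma
    (inner.substring(0, cut), inner.substring(cut + 1).toInt)
  }

  def main(args: Array[String]): Unit = {
    println(parseLine("(hello,3)")) // prints (hello,3)
  }
}
```

Splitting at the last comma keeps the parse correct even if a "word" itself contains a comma.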