Submitting WordCount to run on Spark on YARN

Code

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Scala02_WordCountOnYarn {

  def main(args: Array[String]): Unit = {

    // Do not hard-code the master here; spark-submit supplies it via --master yarn
    val conf: SparkConf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)

    // args(0): input path on HDFS, args(1): output path on HDFS
    val resRDD: RDD[(String, Int)] = sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)

    resRDD.saveAsTextFile(args(1))
    sc.stop()
  }
}
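
Before it can be submitted, the class has to be packaged into a jar and placed at the path used in the command below. A minimal sketch, assuming a Maven project whose build produces WordCount.jar; the build tool, artifact name, and target directory are assumptions, not taken from the original post:

# Package the application (use `sbt package` instead for an sbt project)
mvn clean package

# Copy the resulting jar to the location referenced by spark-submit
cp target/WordCount.jar /home/hadoop/jar/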

Submit

bin/spark-submit \
--class com.aura.spark.day01.Scala02_WordCountOnYarn \
--master yarn \
--executor-memory 2G \
--total-executor-cores 8 \
--deploy-mode cluster \
/home/hadoop/jar/WordCount.jar \
/word_in /word_out

Explanation of the spark-submit parameters

  • \ continues the command on the next line.
  • --class is the fully qualified name of the main class.
  • --master yarn runs the application on YARN.
  • --executor-memory sets the memory available to each executor.
  • --total-executor-cores sets the total number of CPU cores across all executors; note that this flag only applies to standalone and Mesos clusters, while on YARN the executor count and per-executor cores are set with --num-executors and --executor-cores, as shown in the sketch after this list.
  • --deploy-mode chooses between cluster and client mode.
  • /home/hadoop/jar/WordCount.jar is the path of the local jar package.
  • /word_in is the input path on HDFS.
  • /word_out is the output path on HDFS; it must not already exist, or saveAsTextFile will fail.
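
For reference, here is a variant of the same submission using the YARN-specific sizing flags mentioned above; the executor count and core values are illustrative, not taken from the original post. Once the job finishes, the result files can be read back from HDFS:

bin/spark-submit \
--class com.aura.spark.day01.Scala02_WordCountOnYarn \
--master yarn \
--deploy-mode cluster \
--executor-memory 2G \
--num-executors 4 \
--executor-cores 2 \
/home/hadoop/jar/WordCount.jar \
/word_in /word_out

# Inspect the word counts written by saveAsTextFile
hdfs dfs -cat /word_out/part-*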
