05 Using Spark for word frequency statistics [Scala sbt]

In the previous section we completed word frequency statistics with Spark in the interactive shell on the command line. This section explains how to complete the same word count with Scala code in an sbt project in IDEA.

1 Systems, software, and prerequisites

  • CentOS 7 64-bit workstation; the machine's IP is 192.168.100.200 and its hostname is danji. Readers should adjust these to their own environment
  • Word frequency statistics in the Scala interactive mode on Linux has been completed
    https://www.jianshu.com/p/92257e814e59
  • The file whose words are to be counted has been uploaded to HDFS under the name /word
  • The first Scala test program in IDEA has been completed
    https://www.jianshu.com/p/ec64c70e6bb6
  • IDEA 2018.2
  • To remove the influence of permissions, all operations are performed as root

2 Operation

  • 1 Create an sbt project in IDEA
    Select File -> New -> Project -> Scala -> sbt -> Next.
    Creating the sbt project will take some time.
  • 2 Configure the dependencies:
    Add the following to build.sbt (a fuller sketch of the file follows below):
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
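
    For reference, a minimal build.sbt sketch is shown below; the project name and the Scala version are assumptions (Spark 2.1.0 is typically paired with Scala 2.11), so adjust them to your own environment.

// A minimal build.sbt sketch; the name and versions are assumptions, adjust as needed
name := "spark-wordcount"

version := "0.1"

// Assumption: a Scala 2.11.x release, the series Spark 2.1.0 is built against by default
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"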
  • 3 Create WordCount.scala under src/main/scala with the following contents
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
 
object ScalaWordCount {
  def main(args: Array[String]): Unit = {
    // When running on Windows, the local Hadoop installation path must be set; if the program is packaged into a jar and uploaded to Linux, this setting is not needed
    System.setProperty("hadoop.home.dir", "C:\\hadoop2.7.2")
    val conf: SparkConf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
    // Create the SparkContext
    val sc: SparkContext = new SparkContext(conf)
    // Read the input file from HDFS, split each line into words, map every word to (word, 1),
    // sum the counts per word, and write the result back to HDFS
    sc.textFile("hdfs://192.168.100.200:9000/word")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs://192.168.100.200:9000/outputscala")

    // Release resources
    sc.stop()
  }
}
  • 4 Run the program, then view the /outputscala directory in HDFS to see the results (a verification sketch follows after this list).
    The above is the process of word frequency statistics with Scala in Spark.
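
    To verify the output, a minimal sketch such as the following can be run in the spark-shell (the path matches the example above and `sc` is the SparkContext that the spark-shell provides; adjust to your environment). Each line in the output files is a tuple of the form (word, count).

// A verification sketch, assuming the job above has finished and the spark-shell is connected to the same HDFS
val result = sc.textFile("hdfs://192.168.100.200:9000/outputscala")
result.collect().foreach(println)   // prints lines such as (hello,3)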
