Implementing wordcount in Scala

object wordCount {
  def main(args: Array[String]): Unit = {
    val str = List("hadoop hive hadoop", "hive hello mysql pig hello hadoop")

    // 1. Split each line into words on spaces
    val res1 = str.flatMap((s: String) => s.split(" "))
    // res1 = List(hadoop, hive, hadoop, hive, hello, mysql, pig, hello, hadoop)

    // 2. Turn each word into a (key, value) pair of the form (word, 1)
    val res2 = res1.map((x: String) => (x, 1))
    // res2 = List((hadoop,1), (hive,1), (hadoop,1), (hive,1), (hello,1), (mysql,1), (pig,1), (hello,1), (hadoop,1))

    // 3. Group the pairs by word; x._1 is the first element of the tuple
    val res3 = res2.groupBy((x: (String, Int)) => x._1)
    // res3 = Map(hadoop -> List((hadoop,1), (hadoop,1), (hadoop,1)), hive -> List((hive,1), (hive,1)), mysql -> List((mysql,1)), hello -> List((hello,1), (hello,1)), pig -> List((pig,1)))

    // 4. Count the occurrences of each word: the size of each group is that word's count.
    //    toList turns the Map into a List of (word, count) pairs; calling map on the Map directly would also work.
    val res4 = res3.toList.map((x: (String, List[(String, Int)])) => (x._1, x._2.size))
    // res4 = List((hadoop,3), (hive,2), (mysql,1), (hello,2), (pig,1))

    /* Simplified version of the steps above
    val res2 = res1.map((_, 1))
    val res3 = res2.groupBy(_._1)
    val res4 = res3.toList.map(x => (x._1, x._2.size))
     */

    /* Final, fully simplified version of the wordcount program
    val res5 = str.flatMap(_.split(" ")).map((_, 1)).groupBy(_._1).toList.map(x => (x._1, x._2.size))
    println("res5= " + res5)
     */
    // Print each (word, count) pair on its own line
    for (item <- res4) {
      println(item)
    }

  }
}
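
Step 4 does not strictly need the conversion to a List. As a minimal sketch (the object name WordCountOnMap and the val names are made up for illustration, not part of the original program), the grouped Map can be counted directly, since a Scala Map has both size and map:

```scala
object WordCountOnMap {
  // Same input lines as in the program above
  val lines = List("hadoop hive hadoop", "hive hello mysql pig hello hadoop")

  // Group identical words together: Map(word -> List of that word's occurrences)
  val grouped: Map[String, List[String]] =
    lines.flatMap(_.split(" ")).groupBy(identity)

  // Map over the grouped Map directly; the size of each group is the word's count
  val counts: Map[String, Int] =
    grouped.map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // Map.size gives the number of distinct words
    println(grouped.size)
    // Sort by word only to make the printed order predictable
    counts.toList.sortBy(_._1).foreach(println)
  }
}
```

The result here is a Map[String, Int] rather than a List of pairs, which is convenient when counts are looked up by word afterwards.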


Result: the program prints each (word, count) pair in res4 on its own line.

Here is a brief note on the parameter type inference and shorthand used in the Scala code above:
1. When a parameter's type can be inferred, the type annotation can be omitted.
2. When the function being passed in takes only a single parameter, the parentheses around that parameter can be omitted.
3. If a parameter appears only once on the right side of =>, it can be replaced with _.
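
The three rules can be seen side by side in a small sketch. The object name ShorthandDemo and the val names are hypothetical, chosen only for this illustration; each form is one step shorter than the one before it and builds the same list:

```scala
object ShorthandDemo {
  val words = List("hadoop", "hive", "hadoop")

  // Fully explicit: the parameter type is written out, with parentheses around the parameter
  val explicit = words.map((w: String) => (w, 1))

  // Rule 1: String is inferred from List[String], so the type can be omitted
  val inferred = words.map((w) => (w, 1))

  // Rule 2: a single parameter with an inferred type does not need parentheses
  val noParens = words.map(w => (w, 1))

  // Rule 3: w appears only once on the right side of =>, so _ can stand in for it
  val placeholder = words.map((_, 1))

  def main(args: Array[String]): Unit = {
    // Check that all four forms produce equal lists
    println(explicit == inferred && inferred == noParens && noParens == placeholder)
    println(placeholder)
  }
}
```

Note that rule 2 only applies once the type has been dropped: a parameter with an explicit type, such as (w: String), still needs its parentheses.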

If the higher-order functions used in the code above (flatMap, map, groupBy) are unfamiliar, see this blogger's article:
https://blog.csdn.net/m0_38109926/article/details/108695731

Origin blog.csdn.net/weixin_44080445/article/details/109501445