Implementing WordCount in Scala

object wordCount {
  def main(args: Array[String]): Unit = {
    val str = List("hadoop hive hadoop","hive hello mysql pig hello hadoop")
    val res1 = str.flatMap((s:String)=>s.split(" ")) // 1. Split each line into words on spaces
    //res1= List(hadoop, hive, hadoop, hive, hello, mysql, pig, hello, hadoop)

    val res2 = res1.map((x:String)=>(x,1)) // 2. Turn each word into a pair (a two-element tuple) of the form (K,V)
    //res2= List((hadoop,1), (hive,1), (hadoop,1), (hive,1), (hello,1), (mysql,1), (pig,1), (hello,1), (hadoop,1))

    val res3 = res2.groupBy((x:(String,Int))=>(x._1)) // 3. Group the pairs by word; x._1 is the first element of the tuple
    //res3= Map(hadoop -> List((hadoop,1), (hadoop,1), (hadoop,1)), hive -> List((hive,1), (hive,1)), mysql -> List((mysql,1)), hello -> List((hello,1), (hello,1)), pig -> List((pig,1)))

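    val res4 = res3.toList.map((x:(String,List[(String,Int)]))=>(x._1,x._2.size)) // 4. Count each word: the size of its group is the number of occurrences. toList turns the result into a List of (word, count) pairs; it is not required for counting, since Map also supports size and map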
    //res4= List((hadoop,3), (hive,2), (mysql,1), (hello,2), (pig,1))

    /*  Simplified version of the code above
    val res2 = res1.map((_,1))
    val res3 = res2.groupBy(_._1)
    val res4 = res3.toList.map((x)=>(x._1,x._2.size))
    val res5 =str.flatMap(_.split(" ")).map((_,1)).groupBy(_._1).toList.map((x)=>(x._1,x._2.size))
     */

    /*  Final simplified version of the wordcount program
    val res5 =str.flatMap(_.split(" ")).map((_,1)).groupBy(_._1).toList.map((x)=>(x._1,x._2.size))
    println("res5= "+res5)
     */
    for(item <- res4){
      println(item)
    }

  }
}
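As a variation on step 4, a Map can also be transformed directly, without converting it to a List first. The following is a minimal sketch under that assumption; the names `WordCountSketch` and `countWords` are hypothetical and do not appear in the original post.

```scala
object WordCountSketch {

  // Hypothetical helper (not part of the original post):
  // counts how often each word occurs in a list of lines.
  def countWords(lines: List[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split each line into words on single spaces
      .groupBy(identity)       // Map(word -> List of that word's occurrences)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val str = List("hadoop hive hadoop", "hive hello mysql pig hello hadoop")
    // Sort by word so the printed order does not depend on Map iteration order
    countWords(str).toList.sortBy(_._1).foreach(println)
  }
}
```

Grouping by `identity` skips the intermediate `(word, 1)` pairs, because the size of each group already gives the count. The result stays a `Map[String, Int]`, which is convenient for lookups such as `countWords(str)("hadoop")`.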


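Output (the order follows res4 above; Map iteration order is not guaranteed and may differ between Scala versions):

(hadoop,3)
(hive,2)
(mysql,1)
(hello,2)
(pig,1)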

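Here is a brief explanation of the parameter type inference and shorthand used in the Scala code above:
1. When a parameter's type can be inferred, the type annotation can be omitted.
2. When the function being passed takes a single parameter, the parentheses around that parameter can be omitted.
3. If a parameter appears exactly once on the right-hand side of =>, it can be replaced with _.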
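The three rules can be seen side by side in a small sketch. It applies them one at a time to the same `map` call; the object name `ShorthandSketch` and the value names are hypothetical, chosen only for illustration.

```scala
object ShorthandSketch {
  val words = List("hadoop", "hive", "hadoop")

  // Full form: explicit parameter type, parentheses around the parameter
  val full = words.map((x: String) => (x, 1))

  // Rule 1: the type String is inferred from List[String], so it can be dropped
  val inferred = words.map((x) => (x, 1))

  // Rule 2: a single parameter does not need parentheses
  val noParens = words.map(x => (x, 1))

  // Rule 3: x appears exactly once on the right of =>, so _ can replace it
  val placeholder = words.map((_, 1))

  def main(args: Array[String]): Unit = {
    // All four forms build the same list of (word, 1) pairs
    println(full == inferred && inferred == noParens && noParens == placeholder)
  }
}
```

Rule 3 applies only when the parameter is used once: `x => x + x` cannot be written as `_ + _`, because each `_` stands for a separate parameter.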

If any of the higher-order functions used in the code above are unclear, this blogger's article may help:
https://blog.csdn.net/m0_38109926/article/details/108695731


Reposted from blog.csdn.net/weixin_44080445/article/details/109501445