Scala for the Impatient: Implementing WordCount with Actor Concurrency

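Before doing a wordcount with Scala's multithreading, you should at least know how to do it on a single machine, so we start with a single-machine word count in the Scala REPL (for a detailed explanation, see the separate post on word count).
On drive D there are two files, words.txt and words.log, both with the following content: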

hello tom
hello jerry
hello tom
hello jerry
hello tom
hello tom

Now run the wordcount on the content of words.txt, building the pipeline one step at a time:

scala> import scala.io.Source
import scala.io.Source

scala> Source.fromFile("d://words.txt").getLines().toList
res5: List[String] = List(hello tom, hello jerry, hello tom, hello jerry, hello tom, hello tom)

scala> Source.fromFile("d://words.txt").getLines().toList.map(_.split(" "))
res6: List[Array[String]] = List(Array(hello, tom), Array(hello, jerry), Array(hello, tom), Array(hello, jerry), Array(hello, tom), Array(hello, tom))

scala> Source.fromFile("d://words.txt").getLines().toList.flatMap(_.split(" "))
res7: List[String] = List(hello, tom, hello, jerry, hello, tom, hello, jerry, hello, tom, hello, tom)

scala> Source.fromFile("d://words.txt").getLines().toList.flatMap(_.split(" ")).map((_,1))
res8: List[(String, Int)] = List((hello,1), (tom,1), (hello,1), (jerry,1), (hello,1), (tom,1), (hello,1), (jerry,1), (hello,1), (tom,1), (hello,1), (tom,1))

scala> Source.fromFile("d://words.txt").getLines().toList.flatMap(_.split(" ")).map((_,1)).groupBy(_._1)
res9: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(tom -> List((tom,1), (tom,1), (tom,1), (tom,1)), jerry -> List((jerry,1), (jerry,1)), hello -> List((hello,1), (hello,1), (hello,1), (hello,1), (hello,1), (hello,1)))

scala> Source.fromFile("d://words.txt").getLines().toList.flatMap(_.split(" ")).map((_,1)).groupBy(_._1).mapValues(_.size)
res10: scala.collection.immutable.Map[String,Int] = Map(tom -> 4, jerry -> 2, hello -> 6)
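The REPL steps above can be collected into one small function. The following is only a minimal sketch, not the original author's code: the names `WordCountSketch`, `wordCount`, `wordCountFile`, and `sampleLines` are assumptions made for illustration. It uses `groupBy(identity)` followed by `map` rather than `mapValues`, and the file variant closes the `Source`, which the one-liners above do not do.

```scala
import scala.io.Source

object WordCountSketch {
  // Hypothetical helper: counts how often each word occurs in the given lines.
  def wordCount(lines: List[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split every line into words
      .filter(_.nonEmpty)      // skip empty tokens produced by repeated spaces
      .groupBy(identity)       // Map(word -> List(word, word, ...))
      .map { case (word, occurrences) => word -> occurrences.size }

  // Hypothetical helper: same count for a file, closing the Source afterwards.
  def wordCountFile(path: String): Map[String, Int] = {
    val source = Source.fromFile(path)
    try wordCount(source.getLines().toList)
    finally source.close()
  }

  // Same content as words.txt in this post, kept in memory so no file is needed.
  val sampleLines: List[String] = List(
    "hello tom", "hello jerry", "hello tom",
    "hello jerry", "hello tom", "hello tom")

  def main(args: Array[String]): Unit =
    println(wordCount(sampleLines))
}
```

Calling `WordCountSketch.wordCount(WordCountSketch.sampleLines)` yields the same three counts as `res10` above (tom 4, jerry 2, hello 6).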

With the single-machine version done, let's do the same thing with Scala's Actor.

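import scala.actors.{Actor, Future}
import scala.collection.mutable
import scala.collection.mutable.ListBuffer
import scala.io.Source

case class WordCountTask(filename : String)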
case class ResultTask(map : Map[String,Int])
case object StopTask

class WordCountActor extends Actor{
  override def act(): Unit = {
    loop{
      react {
        case WordCountTask(filename) => {
          // The result is a Map, e.g. Map(tom -> 4, jerry -> 2, hello -> 6)
          val wcResultMap = Source.fromFile(filename).getLines().toList.flatMap(_.split(" ")).map((_,1)).groupBy(_._1).mapValues(_.size)
          // Wrap the result in a ResultTask and send it back to the sender
          sender ! ResultTask(wcResultMap)
        }
        // Exit the actor
        case StopTask => {
          exit()
        }
      }
    }
  }
}

object WordCountActor {
  def main(args: Array[String]): Unit = {

    val responseSet = new mutable.HashSet[Future[Any]]()
    val resultList = new ListBuffer[ResultTask]

    // Files to run the word count on
    val files = Array ("d://words.txt","d://words.log")
    // Start one actor per file
    for (file <- files) {
      val actor = new WordCountActor
      // Start the actor and send an asynchronous message; !! returns a Future for the reply
      val response = actor.start() !! WordCountTask(file)
      // Store the Future in the Set
      responseSet += response
    }
    while (responseSet.size > 0){
      // Collect the Futures that have already received a reply into filterSet.
      // responseSet holds Future references, but a Future may not contain a value yet.
      val filterSet = responseSet.filter(_.isSet)
      for (ele <- filterSet) {
        // Extract the reply (ResultTask(wcResultMap)); apply() returns the data inside the Future
        val result = ele.apply().asInstanceOf[ResultTask]
        //ListBuffer(ResultTask(Map(tom -> 4, jerry -> 2, hello -> 6),...))
        resultList += result
        // Remove it from the Set
        responseSet -= ele
      }
      // Sleep for a while to give the remaining replies time to arrive
      Thread.sleep(300)
    }
    // What follows aggregates the partial results, like the reduce step in MapReduce

    //ListBuffer((tom,4),(jerry,2),(hello,6)...)
    val r1 = resultList.flatMap(_.map)
    //Map((tom,ListBuffer((tom,4),(tom,4))),(..),(...))
    val r2 = r1.groupBy(_._1)
    val r3 = r2.mapValues(_.foldLeft(0)(_+_._2))
    println(r3)

  }
}
Output:
Map(tom -> 8, jerry -> 4, hello -> 12)
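The aggregation at the end of `main` (the `r1`, `r2`, `r3` steps) can be tried on its own without any actors. Below is a minimal sketch under that assumption; `MergeCountsSketch` and `mergeCounts` are hypothetical names, and the two partial maps stand in for the results the two actors would send back.

```scala
object MergeCountsSketch {
  // Hypothetical helper: sums the counts of every word across partial results.
  def mergeCounts(partials: Seq[Map[String, Int]]): Map[String, Int] =
    partials
      .flatMap(_.toList)   // all (word, count) pairs in one sequence
      .groupBy(_._1)       // Map(word -> Seq((word, n1), (word, n2), ...))
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    // One partial result per file; both files have the same content in this post.
    val perFile = Map("tom" -> 4, "jerry" -> 2, "hello" -> 6)
    println(mergeCounts(Seq(perFile, perFile)))
  }
}
```

With two identical partial maps, the merged counts are tom 8, jerry 4, and hello 12, matching the output shown above.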

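Note: the WordCountActor code above comes from study material, and some parts of it seem less than ideal. For example, the sleep does not seem to take into account that Future's apply() blocks until the result is available, so the same thing can be done without filtering. Changing the while block to the following works fine: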

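    while (responseSet.size > 0){
      // Iterate over a snapshot (toList) so that removing from responseSet inside the loop is safe
      for (ele <- responseSet.toList) {
        // Extract the reply (ResultTask(wcResultMap)); apply() blocks until the Future has its data
        val result = ele.apply().asInstanceOf[ResultTask]
        //ListBuffer(ResultTask(Map(tom -> 4, jerry -> 2, hello -> 6),...))
        resultList += result
        // Remove it from the Set
        responseSet -= ele
      }
    }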
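The idea in the note, blocking on each future instead of polling and sleeping, can also be expressed with `scala.concurrent.Future` from the standard library. This is a swapped-in technique and not the `scala.actors` API used in this post (that library is deprecated in later Scala versions). The sketch below is only an illustration: `FutureWordCountSketch`, `countWords`, and `countAll` are hypothetical names, the inputs are in-memory lines rather than files, and the 10 second timeout is an arbitrary assumption.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureWordCountSketch {
  // Hypothetical helper: word count for one input (one "file").
  def countWords(lines: List[String]): Map[String, Int] =
    lines.flatMap(_.split(" ")).filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  // Hypothetical helper: counts every input concurrently, then merges the results.
  def countAll(inputs: List[List[String]]): Map[String, Int] = {
    // One Future per input, analogous to one actor per file.
    val futures: List[Future[Map[String, Int]]] =
      inputs.map(lines => Future(countWords(lines)))
    // Future.sequence turns List[Future[A]] into Future[List[A]].
    val all: Future[List[Map[String, Int]]] = Future.sequence(futures)
    // Await.result blocks until all partial results are ready or the timeout expires.
    val partials = Await.result(all, 10.seconds)
    // Reduce step: sum the counts per word.
    partials.flatMap(_.toList).groupBy(_._1)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }
  }

  def main(args: Array[String]): Unit = {
    val lines = List("hello tom", "hello jerry", "hello tom",
                     "hello jerry", "hello tom", "hello tom")
    println(countAll(List(lines, lines)))
  }
}
```

Because `Await.result` waits for every future, no `isSet` filtering or `Thread.sleep` is needed here, which is the same observation the note makes about `apply()`.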

Reposted from blog.csdn.net/qq_37334135/article/details/78655441