Scala: count strings from a list that occur in a file

Amy Makenzie :

I am new to SO but have spent days going through related questions. The closest one I found is "How to compare each word of a line in a file with a list element in scala?", but that dates back to 2014, so I thought there might be different solutions now.

Also, in the post quoted above, the best answer uses a mutable data structure, which I am trying to avoid. The last answer, by Dima, looked more functional but did not work :(

I am trying to create a similar program in Scala, except that the output should also contain an overall count for each keyword, and all keywords should be output even if no matches were found, in which case the count would be zero.

The keywords to check against are hard-coded into a list, but I also want to add the option of a second, user-supplied argument that contains the keywords. So far I have come up with the following, but got stuck:

object FileAnalyser extends App {

val hardcodedkeywords = List("foo", "bar", "hello")

if (args.length > 1) {
  val keywords = args(1).toList
  try {
    val rdd = Source.fromFile(args(0)).getLines.toList.zipWithIndex.flatMap {
      case(line, index) => line.split("\\W+").map { (_, index+1) }
    } //.filter(keywords.contains(_)).groupBy { _._1 }.mapValues(_._2)
  } catch {
    case ioe: IOException => println(ioe)
    case fnf: FileNotFoundException => println(fnf)
    case _: Throwable => println("Unknown error occurred")
  }
} else 
  try {
    val rdd = Source.fromFile(args(0)).getLines.toList.zipWithIndex.flatMap {
      case(line, index) => line.split("\\W+").map { (_, index+1) }
    } //filter(hardcodedkeywords.contains(_))
      //.groupBy { _._1 }.mapValues(_._2)
  } catch {
    case ioe: IOException => println(ioe)
    case fnf: FileNotFoundException => println(fnf)
    case _: Throwable => println("Unknown error occurred")
  }
}

So far I have managed to read the file named by args(0) and map it to a list that pairs each word with its line number, index+1 (line numbers start from 1, but the index starts from 0). The program must be as functional as possible, so fewer mutables and state changes and more higher-order functions and list recursion.

Thanks. Example output would be:

//alphabetical      //No duplicates
//order             //Increasing in no. 
keyword              lines                count
bar                  [1,2..]                6
foo                  [3,5]                  2
hello                []                     0

jwvh :

Here's a basic outline of how it might be done.

val keywords = List(/*key words here*/)

val resMap = io.Source
  .fromFile(/*file to read*/)
  .getLines()
  .zipWithIndex
  .foldLeft(Map.empty[String,Seq[Int]].withDefaultValue(Seq.empty[Int])){
    case (m, (line, idx)) =>
      val subMap = line.split("\\W+").toSeq  //separate the words
        .filter(keywords.contains)           //keep only key words
        .groupBy(identity)                   //make a Map w/ keyword as key
        .map { case (kw, hits) => kw -> hits.map(_ => idx+1) } //line numbers as value (mapValues returns a lazy view in 2.13+)
        .withDefaultValue(Seq.empty[Int])
      keywords.map(kw => (kw, m(kw) ++ subMap(kw))).toMap
  }

//formatted results (needs work)
println("keyword\t\tlines\t\tcount")
keywords.sorted.foreach{kw =>
  println(kw + "\t\t" +
          resMap(kw).distinct.mkString("[",",","]") + "\t\t" +
          resMap(kw).length
         )
}
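
As for the "needs work" part, here is a minimal sketch of fixed-width columns using format strings instead of tabs. The object name ReportSketch, the helper render, and the column widths are made up for illustration; it assumes the result has the same shape as resMap above, i.e. a Map[String, Seq[Int]] that already contains every keyword.

```scala
object ReportSketch {
  // Hypothetical helper: one header line plus one fixed-width row per keyword.
  // Assumes resMap already holds an entry (possibly empty) for every keyword.
  def render(resMap: Map[String, Seq[Int]]): List[String] = {
    val header = "%-12s %-12s %5s".format("keyword", "lines", "count")
    val rows = resMap.keys.toList.sorted.map { kw =>
      val hits  = resMap(kw)                                  // one entry per occurrence
      val lines = hits.distinct.sorted.mkString("[", ",", "]") // no duplicates, increasing
      "%-12s %-12s %5d".format(kw, lines, hits.length)
    }
    header :: rows
  }

  def main(args: Array[String]): Unit =
    render(Map("foo" -> Seq(3, 5), "bar" -> Seq(1, 1, 2), "hello" -> Seq.empty[Int]))
      .foreach(println)
}
```

The widths are arbitrary; long line lists will still push the count column to the right.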

Some explanation:

  • io.Source is the library (actually an object) that offers some basic input/output methods, including fromFile(), which opens a file for reading.
  • getLines() reads from the file one line at a time.
  • zipWithIndex attaches an index value to each line read.
  • foldLeft() reads all the lines of the file, one at a time, and (in this case) builds a Map of all the key words and their line locations.
  • resMap and subMap are just the names I chose to give to the variables I'm building. resMap (result Map) is what's created after the whole file has been processed. subMap is an intermediate Map built from just one line of text from the file.
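
To make the fold easier to experiment with, here is a small self-contained variation that works on in-memory lines instead of a file. It is only a sketch: the names KeywordIndexSketch and index are invented for this example, and it uses a nested foldLeft over the words of each line in place of the groupBy step, starting from a Map that already holds every keyword so that unmatched keywords keep an empty list.

```scala
object KeywordIndexSketch {
  // Hypothetical helper: maps every keyword to the 1-based line numbers it
  // occurs on, one entry per occurrence (so a line can appear more than once).
  def index(lines: Iterator[String], keywords: List[String]): Map[String, Seq[Int]] = {
    // Start with every keyword present, so keywords without matches are kept.
    val empty: Map[String, Seq[Int]] = keywords.map(kw => kw -> Seq.empty[Int]).toMap
    lines.zipWithIndex.foldLeft(empty) { case (acc, (line, idx)) =>
      // Inner fold: walk the words of this line, recording hits for keywords only.
      line.split("\\W+").foldLeft(acc) { (m, word) =>
        if (m.contains(word)) m.updated(word, m(word) :+ (idx + 1)) else m
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val sample = Iterator("foo bar bar", "nothing here", "bar foo")
    println(index(sample, List("foo", "bar", "hello")))
  }
}
```

With the sample above, "bar" collects line 1 twice, which is why the printing code applies distinct to the lines but takes the count from the full list.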

If you want the option of passing in a collection of key words, I'd do it like this:

val keywords = if (args.length > 1) args.tail.toList else hardcodedkeywords
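
The same choice can be wrapped in a small function, which is a little easier to test. This is a sketch only: KeywordArgsSketch and chooseKeywords are hypothetical names, and it assumes args(0) is the file and every later argument is one keyword. Note that args(1).toList, as in the question, would turn a single String into a List[Char] rather than a list of keywords.

```scala
object KeywordArgsSketch {
  val hardcodedkeywords = List("foo", "bar", "hello")

  // Hypothetical helper: args(0) is the file to read; every argument after it
  // is treated as one keyword. Falls back to the hard-coded list if none given.
  def chooseKeywords(args: Array[String]): List[String] =
    args.drop(1).toList match {
      case Nil  => hardcodedkeywords
      case user => user.distinct // ignore keywords supplied more than once
    }

  def main(args: Array[String]): Unit =
    println(chooseKeywords(args))
}
```

Using drop(1) rather than tail means an empty argument array also falls back to the defaults instead of throwing.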
