Count frequency of each word from list of Strings using Java8

user2296988 :

I have two lists of Strings and need to create a map that counts, for each string in the first list, how many strings in the second list contain it. If a string appears more than once within a single string of the second list, it should still be counted as one occurrence.

For example:

String[] listA = {"the", "you", "how"};
String[] listB = {"the dog ate the food", "how is the weather", "how are you"};

The Map<String, Integer> map will take its keys from listA, and each value is the number of occurrences. So the map will have the key-value pairs ("the", 2), ("you", 1), ("how", 2).

Note: Though "the" appears twice in "the dog ate the food", it is counted as only one occurrence because both are in the same string.

How do I write this using Java 8? I tried this approach, but it does not work:

Set<String> sentenceSet = Stream.of(listB).collect(Collectors.toSet());

Map<String, Long> frequency1 =  Stream.of(listA)
    .filter(e -> sentenceSet.contains(e))
    .collect(Collectors.groupingBy(t -> t, Collectors.counting()));

Nikolas :

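You need to extract all the words from listB and keep only those that are also listed in listA. Taking the distinct words of each sentence first ensures that a word repeated within one sentence is counted only once. Then you simply collect the word -> count pairs into a Map<String, Long>: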

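import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

String[] listA = {"the", "you", "how"};
String[] listB = {"the dog ate the food", "how is the weather", "how are you"};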

Set<String> qualified = new HashSet<>(Arrays.asList(listA));   // make searching easier

Map<String, Long> map = Arrays.stream(listB)   // stream the sentences
    .map(sentence -> sentence.split("\\s+"))   // split by words to Stream<String[]>
    .flatMap(words -> Arrays.stream(words)     // flatmap to Stream<String>
                            .distinct())       // ... as distinct words by sentence
    .filter(qualified::contains)               // keep only the qualified words
    .collect(Collectors.groupingBy(            // collect to the Map
        Function.identity(),                   // ... the key is the word itself
        Collectors.counting()));               // ... the value is its frequency

Output:

{the=2, how=2, you=1}
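
The attempt in the question finds nothing because sentenceSet.contains(e) checks whether a whole sentence equals the word, not whether the sentence contains it. If you specifically want the Map<String, Integer> from the question, with every word of listA present as a key (even with a count of 0), one possible variant is sketched below. The class name WordSentenceCount and the helper countSentencesContaining are hypothetical names chosen for illustration, and the sketch assumes that words are separated by whitespace and matched case-sensitively.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordSentenceCount {

    // Hypothetical helper: for each word, counts how many sentences
    // contain it at least once (whitespace-separated, case-sensitive).
    public static Map<String, Integer> countSentencesContaining(String[] words, String[] sentences) {
        return Arrays.stream(words)
            .distinct()                                  // avoid duplicate keys
            .collect(Collectors.toMap(
                Function.identity(),                     // key: the word itself
                word -> (int) Arrays.stream(sentences)   // value: number of sentences ...
                    .filter(s -> Arrays.asList(s.split("\\s+")).contains(word)) // ... containing the word
                    .count(),
                (a, b) -> a,                             // merge function, unused after distinct()
                LinkedHashMap::new));                    // keep the order of the input words
    }

    public static void main(String[] args) {
        String[] listA = {"the", "you", "how"};
        String[] listB = {"the dog ate the food", "how is the weather", "how are you"};
        System.out.println(countSentencesContaining(listA, listB));
        // prints {the=2, you=1, how=2}
    }
}
```

This re-splits every sentence for every word, so it is only reasonable for small inputs. For larger lists, the groupingBy approach above does a single pass over the sentences.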
