Anoop :
I have a Stream of Streams of words (this format is not set by me and cannot be changed). For example:
Stream<String> doc1 = Stream.of("how", "are", "you", "doing", "doing", "doing");
Stream<String> doc2 = Stream.of("what", "what", "you", "upto");
Stream<String> doc3 = Stream.of("how", "are", "what", "how");
Stream<Stream<String>> docs = Stream.of(doc1, doc2, doc3);
I'm trying to get this into a Map<String, Multiset<Integer>> (or its corresponding stream, as I want to process it further), where the key String is the word itself and the Multiset<Integer> holds the number of times that word appears in each document (0's should be excluded). Multiset is a Google Guava class (not from java.util).
For example:
how -> {1, 2} // because it appears once in doc1, twice in doc3 and not at all in doc2 (so doc2's count should not be included)
are -> {1, 1} // once in doc1 and once in doc3
you -> {1, 1} // once in doc1 and once in doc2
doing -> {3} // thrice in doc1, none in others
what -> {2, 1} // twice in doc2 and once in doc3
upto -> {1}
What is a good way to do this in Java 8?
I tried using flatMap, but the inner Stream greatly limits the options I have.
Eugene :
Map<String, List<Long>> map = docs.flatMap(
        inner -> inner.collect(
                Collectors.groupingBy(Function.identity(), Collectors.counting()))
            .entrySet()
            .stream())
    .collect(Collectors.groupingBy(
        Entry::getKey,
        Collectors.mapping(Entry::getValue, Collectors.toList())));
System.out.println(map);
// {upto=[1], how=[1, 2], doing=[3], what=[2, 1], are=[1, 1], you=[1, 1]}
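For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The class name WordCounts and the helper countsPerDocument are made up for illustration. It also makes two changes that are my own assumptions rather than part of the answer above: counts are collected as Integer (via summingInt) instead of Long, and the outer map is a TreeMap so the printed order is predictable.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCounts {

    // Hypothetical helper: maps each word to its count in every document that contains it.
    static Map<String, List<Integer>> countsPerDocument(Stream<Stream<String>> docs) {
        return docs
            // Consume each inner stream exactly once, producing word -> count for that document.
            .flatMap(doc -> doc
                .collect(Collectors.groupingBy(Function.identity(), Collectors.summingInt(w -> 1)))
                .entrySet()
                .stream())
            // Regroup the per-document entries by word; documents without the word
            // contribute no entry, so zero counts never appear.
            .collect(Collectors.groupingBy(
                Map.Entry::getKey,
                TreeMap::new, // assumption: sorted keys, only to make the output stable
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        Stream<String> doc1 = Stream.of("how", "are", "you", "doing", "doing", "doing");
        Stream<String> doc2 = Stream.of("what", "what", "you", "upto");
        Stream<String> doc3 = Stream.of("how", "are", "what", "how");
        Stream<Stream<String>> docs = Stream.of(doc1, doc2, doc3);

        System.out.println(countsPerDocument(docs));
        // {are=[1, 1], doing=[3], how=[1, 2], upto=[1], what=[2, 1], you=[1, 1]}
    }
}
```

If Guava is on the classpath, swapping the inner Collectors.toList() for Collectors.toCollection(HashMultiset::create) should give the Multiset<Integer> values the question asks for. That variant is not exercised here, so treat it as a suggestion.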