Apache Beam Flatten Iterable<String>

Sudharsan :

After the GroupByKey in the code below, I get a PCollection<KV<String, Iterable<String>>>. How do I flatten the Iterable in the value before sending it to FileIO?

     .apply(GroupByKey.<String, String>create())
     .apply("Write file to output",FileIO.< String, KV<String,String>>writeDynamic()
                .by(KV::getKey)
                .withDestinationCoder(StringUtf8Coder.of())
                .via(Contextful.fn(KV::getValue), TextIO.sink())
                .to("Out")
                .withNaming(key -> FileIO.Write.defaultNaming("file-" + key, ".txt")));

Thanks for the kind help.

Jayadeep Jayaraman :

You need to use a ParDo to flatten the Iterable portion of the PCollection, as shown below:

 PCollection<KV<String, Doc>> urlDocPairs = ...;
 PCollection<KV<String, Iterable<Doc>>> urlToDocs =
     urlDocPairs.apply(GroupByKey.<String, Doc>create());

 // Emit one KV per grouped value so downstream steps see flat pairs again.
 PCollection<KV<String, Doc>> results =
     urlToDocs.apply(ParDo.of(new DoFn<KV<String, Iterable<Doc>>, KV<String, Doc>>() {
       @ProcessElement
       public void processElement(ProcessContext c) {
         String url = c.element().getKey();
         for (Doc docWithThatUrl : c.element().getValue()) {
           c.output(KV.of(url, docWithThatUrl));
         }
       }}));
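
If you want to check the loop on its own without a pipeline runner, here is a minimal plain-Java sketch of the same per-element logic. It does not use Beam at all: `FlattenSketch` and `flatten` are hypothetical names made up for illustration, and `Map.Entry` stands in for Beam's `KV`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the loop inside processElement; no Beam types are used.
class FlattenSketch {

  // Hypothetical helper: turns one grouped element (key -> many values)
  // into one (key, value) pair per value, preserving iteration order.
  static List<Map.Entry<String, String>> flatten(String key, Iterable<String> values) {
    List<Map.Entry<String, String>> out = new ArrayList<>();
    for (String value : values) {
      out.add(new SimpleEntry<>(key, value));
    }
    return out;
  }

  public static void main(String[] args) {
    // One grouped element with key "a" and two values becomes two pairs.
    List<Map.Entry<String, String>> pairs = flatten("a", Arrays.asList("x", "y"));
    for (Map.Entry<String, String> pair : pairs) {
      System.out.println(pair.getKey() + " -> " + pair.getValue());
    }
  }
}
```

In the real pipeline, each `out.add(...)` corresponds to a `c.output(KV.of(key, value))` call, so the output elements have the KV<String, String> shape that the writeDynamic step in the question is declared with.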
