Apache Beam 2.28.0 released: a unified programming model for batch and stream data processing

Apache Beam 2.28.0 has been released. Beam is a unified programming model for defining and executing data processing pipelines, including ETL, batch processing, and stream processing. The project concentrates on the programming paradigm and interface definitions for data processing rather than on implementing a particular execution engine. Ideally, a data processing program written with Beam can run on any distributed computing engine.
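To make the separation between pipeline definition and execution engine concrete, here is a minimal pure-Python sketch. It is a toy model and not the Beam API: ToyPipeline, flat_map, keep, BatchRunner and StreamingRunner are all hypothetical names invented for this illustration. The same pipeline description is handed to two different "engines", which are expected to produce the same result.

```python
from typing import Callable, Iterable, List

# Hypothetical toy model, NOT the Beam API: a pipeline is only an ordered
# list of steps, described independently of how it will be executed.
Step = Callable[[Iterable], Iterable]


class ToyPipeline:
    def __init__(self) -> None:
        self.steps: List[Step] = []

    def apply(self, step: Step) -> "ToyPipeline":
        self.steps.append(step)
        return self  # returning self allows chained calls


def flat_map(fn: Callable) -> Step:
    """Step that expands each element into zero or more outputs."""
    return lambda items: (out for item in items for out in fn(item))


def keep(pred: Callable) -> Step:
    """Step that keeps only the elements matching the predicate."""
    return lambda items: (item for item in items if pred(item))


class BatchRunner:
    """Toy engine: runs each step over the whole bounded input at once."""

    def run(self, pipeline: ToyPipeline, source: Iterable) -> list:
        data = list(source)
        for step in pipeline.steps:
            data = list(step(data))
        return data


class StreamingRunner:
    """Toy engine: pushes elements through all steps one at a time."""

    def run(self, pipeline: ToyPipeline, source: Iterable) -> list:
        results = []
        for element in source:
            data = [element]
            for step in pipeline.steps:
                data = list(step(data))
            results.extend(data)
        return results


# The pipeline is defined once, without naming any engine.
pipeline = (
    ToyPipeline()
    .apply(flat_map(str.split))         # split lines into words
    .apply(keep(lambda w: len(w) > 3))  # drop short words
)

lines = ["apache beam unified model", "batch and stream processing"]
batch_result = BatchRunner().run(pipeline, lines)
stream_result = StreamingRunner().run(pipeline, iter(lines))
```

In this sketch both runners return the same list of words because every step is element-wise. Real engines differ in far more than iteration order (windowing, state, fault tolerance), so this only illustrates the idea of writing the pipeline once and choosing the engine later.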

Update highlights

I/Os

  • SpannerIO supports BigDecimal for Numeric fields (BEAM-11643)
  • Add Beam schema support to ParquetIO (BEAM-11526)
  • Support ParquetTable Writer (BEAM-8202)
  • GCP BigQuery sink (streaming inserts) can use runner-determined sharding (BEAM-11408)
  • PubSub supports the types TIMESTAMP, DATE, TIME, DATETIME (BEAM-11533)

New features/improvements

  • ParquetIO adds the readGenericRecords and readFilesGenericRecords methods for reading files with an unknown schema. For details, see PR-13554 and (BEAM-11460)
  • Add Thrift support to KafkaTableProvider (BEAM-11482)
  • HadoopFormatIO can skip key/value cloning (BEAM-11457)
  • Support conversion to GenericRecords in the Convert.to transform (BEAM-11571)
  • Support reading Parquet files with an unknown schema (BEAM-11460)

Source: www.oschina.net/news/131188/beam-2-28-0-released