Apache Hudi 0.12.2 released

Long-Term Support Version

Our goal is to maintain the 0.12 release line for a longer period and provide a stable version, through the latest 0.12.x release, for users to migrate to. This release (0.12.2) is the latest 0.12 version.

Migration Guide

This release (0.12.2) does not introduce a new table version, so if you are already on 0.12.0, there is no need to migrate.
If you are migrating from an older version, please review the migration guidance in the previous release notes, specifically the upgrade instructions in 0.6.0, 0.9.0, 0.10.0, 0.11.0, and 0.12.0.
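Before upgrading, one quick sanity check is to read the `hoodie.table.version` key that Hudi records in the table's `.hoodie/hoodie.properties` file. Below is a minimal sketch in Python, assuming a table on a local filesystem; the base path is hypothetical, and the expectation that 0.12.x tables record version 5 is an assumption based on the table versions introduced by the releases listed above.

```python
# Minimal sketch: read hoodie.table.version from a Hudi table's
# .hoodie/hoodie.properties to see whether a migration is needed.
# The base path below is hypothetical; point it at your own table.
from pathlib import Path

base_path = Path("/data/hudi/my_table")  # hypothetical table base path
props_file = base_path / ".hoodie" / "hoodie.properties"

table_version = None
for line in props_file.read_text().splitlines():
    line = line.strip()
    # hoodie.properties is a Java-style properties file of key=value
    # pairs; '#' comment lines simply won't match the prefix check.
    if line.startswith("hoodie.table.version"):
        table_version = line.split("=", 1)[1].strip()
        break

print(f"hoodie.table.version = {table_version}")
# 0.12.x tables are expected to report version 5; an older number means
# you should walk through the upgrade notes referenced above first.
```

For tables on object stores such as S3 or HDFS, the same key can be read through the corresponding filesystem client instead of a local path.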

Bug Fixes

Version 0.12.2 is mainly a bug-fix and stability release. The fixes span many components, including:

  • DeltaStreamer
  • Data type/schema-related bug fixes
  • Table services
  • Metadata table
  • Spark SQL
  • Presto stability/performance fixes
  • Trino stability/performance fixes
  • Meta sync
  • Flink engine
  • Unit, functional, and integration tests and CI

Release Notes

Sub-task

  • [HUDI-5244] - Fix bugs in schema evolution client with lost operation field and not found schema

Bug

  • [HUDI-3453] - Metadata table throws NPE when scheduling compaction plan
  • [HUDI-3661] - Flink async compaction is not thread safe when use watermark
  • [HUDI-4281] - Using hudi to build a large number of tables in spark on hive causes OOM
  • [HUDI-4588] - Ingestion failing if source column is dropped
  • [HUDI-4855] - Bootstrap table from Deltastreamer cannot be read in Spark
  • [HUDI-4893] - More than 1 split is created for a single log file for MOR table
  • [HUDI-4898] - For MOR tables, Presto/Hive should respect the payload class when merging parquet and log files
  • [HUDI-4901] - Add avro version to Flink profiles
  • [HUDI-4946] - MERGE INTO with no preCombineField produces duplicate rows in insert-only mode
  • [HUDI-4952] - Reading from metadata table could fail when there are no completed commits
  • [HUDI-4966] - Meta sync throws exception if TimestampBasedKeyGenerator is used to generate partition path containing slashes
  • [HUDI-4971] - aws bundle causes class loading issue
  • [HUDI-4975] - datahub sync bundle causes class loading issue
  • [HUDI-4998] - Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work
  • [HUDI-5003] - InLineFileSystem will throw NumberFormatException because startOffset is an int and goes out of bounds
  • [HUDI-5007] - Prevent Hudi from reading the entire timeline when performing a LATEST streaming read
  • [HUDI-5008] - Avoid unset HoodieROTablePathFilter in IncrementalRelation
  • [HUDI-5025] - Rollback failed with log file not found when rollOver in rollback process
  • [HUDI-5041] - Lock metric register conflict error
  • [HUDI-5057] - Fix msck repair hudi table
  • [HUDI-5058] - Flink throws "the primary key cannot be empty" error when reading a hudi table
  • [HUDI-5061] - Bulk insert operation doesn't throw exceptions other than IOException
  • [HUDI-5063] - totalScanTime and other runtime stats missing from commit metadata
  • [HUDI-5070] - Fix flaky TestCleaner test: testInsertAndCleanByCommits
  • [HUDI-5076] - Non serializable path used with engineContext with metadata table initialization
  • [HUDI-5087] - Max value read from metadata table incorrect
  • [HUDI-5088] - Failed to synchronize the hive metadata of the Flink table
  • [HUDI-5092] - Querying Hudi table throws NoSuchMethodError in Databricks runtime
  • [HUDI-5096] - boolean param is broken in HiveSyncTool
  • [HUDI-5097] - Read 0 records from partitioned table without partition fields in table configs
  • [HUDI-5151] - Flink data skipping doesn't work with ClassNotFoundException of InLineFileSystem
  • [HUDI-5157] - Duplicate partition path for chained hudi tables
  • [HUDI-5163] - Failure handling w/ spark ds write failures
  • [HUDI-5176] - Incremental source may miss commits if there are inflight commits before completed commits
  • [HUDI-5185] - Compaction run fails with --hoodieConfigs
  • [HUDI-5203] - Debezium payload does not handle null-field cases
  • [HUDI-5228] - Flink table service job fs view conf overwrites the one of writing job
  • [HUDI-5242] - Do not fail Meta sync in Deltastreamer when inline table service fails
  • [HUDI-5251] - Unexpected avro dependency in flink 1.15 bundle
  • [HUDI-5253] - HoodieMergeOnReadTableInputFormat could have duplicate records issue if it contains delta files while still splittable
  • [HUDI-5260] - Insert into sql with strict insert mode and no preCombineField should not overwrite existing records
  • [HUDI-5277] - RunClusteringProcedure can't exit correctly
  • [HUDI-5286] - UnsupportedOperationException throws when enabling filesystem retry
  • [HUDI-5291] - NPE in column stats for null values
  • [HUDI-5320] - Spark SQL CTAS does not propagate Table properties to actual SparkSqlWriter
  • [HUDI-5325] - Fix Create Table to propagate properly Metadata Table enabling config
  • [HUDI-5336] - Fix log file parsing to consider "." at the beginning
  • [HUDI-5346] - Fixing performance traps in CTAS
  • [HUDI-5347] - Fix Merge Into performance traps
  • [HUDI-5350] - OOM causes compaction event to be lost
  • [HUDI-5351] - Handle meta fields being disabled in Bulk Insert Partitioners
  • [HUDI-5373] - Different fileids are assigned to the same bucket
  • [HUDI-5375] - Fix re-using of file readers w/ metadata table in FileIndex
  • [HUDI-5393] - Remove the reuse of metadata table writer for flink write client
  • [HUDI-5403] - Input Format class has metadata table enabled for file listing unexpectedly by default
  • [HUDI-5409] - Avoid file index and use fs view cache in COW input format
  • [HUDI-5412] - Send the bootstrap event if the JM also rebooted

Improvement

  • [HUDI-4526] - Improve spillableMapBasePath when disk directory is full
  • [HUDI-4799] - Improve analyzer exception tip when an expression cannot be resolved
  • [HUDI-4960] - Upgrade Jetty version for Timeline server
  • [HUDI-4980] - Make avg record size calculated based on commit instant only
  • [HUDI-4995] - Dependency conflicts on apache http with other projects
  • [HUDI-4997] - Use jackson-v2 to replace jackson-v1 imports
  • [HUDI-5002] - Remove deprecated API usage in SparkHoodieHBaseIndex#generateStatement
  • [HUDI-5027] - Replace hardcoded hbase config keys with HbaseConstants
  • [HUDI-5045] - Add tests to integ test to test bulk_insert followed by upsert
  • [HUDI-5066] - Support hoodie source metaclient cache for flink planner
  • [HUDI-5102] - Source operator (monitor and reader) supports user uid
  • [HUDI-5104] - Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter
  • [HUDI-5111] - Add metadata on read support to integ tests
  • [HUDI-5184] - Remove export PYSPARK_SUBMIT_ARGS="--master local[*]" from HoodiePySparkQuickstart.py
  • [HUDI-5247] - Clean up java client tests
  • [HUDI-5296] - Support disabling schema on read if not required
  • [HUDI-5338] - Adjust coalesce behavior within "NONE" sort mode for bulk insert
  • [HUDI-5344] - Upgrade com.google.protobuf:protobuf-java
  • [HUDI-5345] - Avoid fs.exists calls for metadata table in HFileBootstrapIndex
  • [HUDI-5348] - Cache file slices within MDT reader
  • [HUDI-5357] - Optimize release artifacts' deployment
  • [HUDI-5370] - Properly close file handles for Metadata writer

Task

  • [HUDI-3287] - Remove unnecessary deps in hudi-kafka-connect
  • [HUDI-5081] - Resources clean-up in hudi-utilities tests
  • [HUDI-5221] - Make the decision for flink sql bucket index case-insensitive
  • [HUDI-5223] - Partial failover for flink
  • [HUDI-5227] - Upgrade Jetty to 9.4.48
