Flume series: A large batch of Debezium data consumed at once, containing large DML statements, causes data to accumulate in the channel

1. Background

  • Debezium captures a large volume of change data; the records contain DML statements and are individually large.
  • Flume consumes a large backlog from the Kafka topic, and the events pile up in the channel, so the sink to HDFS runs well behind.
  • As a result, the channel-fill metric org_apache_flume_channel_channel1_channelfillpercentage reads high, exceeding 80%.
  • The accumulated data must be drained quickly to restore data timeliness; a tuning sketch follows this list.
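
The usual levers for draining a backlogged channel are larger source and sink batch sizes plus additional sinks on the same channel, since each Flume sink runs in its own thread and sinks attached to one channel consume from it in parallel. The agent configuration below is a minimal sketch under assumed names (agent a1, topic debezium-topic, broker address, and HDFS paths are all hypothetical), not the configuration from the original incident:

```properties
# Sketch: drain a backlogged channel faster by raising batch sizes and
# attaching a second HDFS sink to the same channel. All names/paths are
# hypothetical placeholders.
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1 k2

a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = kafka1:9092
a1.sources.r1.kafka.topics = debezium-topic
a1.sources.r1.batchSize = 5000                 # pull larger batches per poll
a1.sources.r1.channels = c1

a1.channels.c1.type = file
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 10000     # must cover source/sink batch sizes

# Two HDFS sinks on one channel drain it in parallel; distinct file
# prefixes keep their output files from colliding in the same directory.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://nameservice1/warehouse/ods/%Y%m%d
a1.sinks.k1.hdfs.filePrefix = sink1
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.batchSize = 10000             # write more events per flush
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.channel = c1

a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = hdfs://nameservice1/warehouse/ods/%Y%m%d
a1.sinks.k2.hdfs.filePrefix = sink2
a1.sinks.k2.hdfs.fileType = DataStream
a1.sinks.k2.hdfs.batchSize = 10000
a1.sinks.k2.hdfs.rollInterval = 300
a1.sinks.k2.hdfs.rollSize = 134217728
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k2.channel = c1
```

If the agent is started with Flume's HTTP monitoring enabled (-Dflume.monitoring.type=http -Dflume.monitoring.port=34545), `curl http://<agent-host>:34545/metrics` returns JSON including CHANNEL.c1.ChannelFillPercentage, the same fill percentage the metric in the background section reports; once the sinks keep up, it should fall back from 80+ toward zero.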

2. Related technical blogs


Reprinted from: blog.csdn.net/zhengzaifeidelushang/article/details/132549716