Practice data lake iceberg Lesson 39 Analysis of data file changes before and after cleaning snapshots

Series Article Directory

Practice Data Lake iceberg Lesson 1 Getting Started
Practice Data Lake iceberg Lesson 2 Iceberg is based on hadoop’s underlying data format
Practice data lake iceberg In sqlclient, use SQL to read data from Kafka to iceberg (upgrade the version to flink1.12.7)
practice data lake iceberg Lesson 5 hive catalog features
practice data lake iceberg Lesson 6 write from kafka to iceberg failure problem solving
practice data lake iceberg Lesson 7 Write to iceberg in real time
practice data lake iceberg Lesson 8 hive and iceberg integration
practice data lake iceberg Lesson 9 merge small files
practice data lake iceberg Lesson 10 snapshot delete
practice data lake iceberg Lesson 11 Test the complete process for a partitioned table (generating data, creating tables, merging, and deleting snapshots)
Practice data lake iceberg Lesson 12 What is a catalog
Practice data lake iceberg Lesson 13 Metadata is many times larger than data files
Practice data lake iceberg Lesson 14 Data merging (to solve the problem of metadata expansion over time)
practice data lake iceberg Lesson 15 spark installation and integration iceberg (jersey package conflict)
practice data lake iceberg Lesson 16 Open the door to understanding iceberg through spark3
Practice data lake iceberg Lesson 17 Hadoop2.7, spark3 on yarn run iceberg configuration
Practice data lake iceberg Lesson 18 Start commands for multiple clients interacting with iceberg (commonly used commands)
Practice data lake iceberg Lesson 19 flink count on iceberg returns no result
practice data lake iceberg Lesson 20 flink + iceberg CDC scenario (version problem, test failed)
practice data lake iceberg Lesson 21 flink1.13.5 + iceberg0.131 CDC (test successful INSERT, change operation failed)
Practice data lake iceberg Lesson 22 flink1.13.5 + iceberg0.131 CDC (CRUD test successful)
practice data lake iceberg Lesson 23 flink-sql restart from checkpoint
practice data lake iceberg Lesson 24 iceberg metadata details analysis
practice data lake iceberg Lesson 25 Running flink sql in the background: the effect of inserts, deletes and updates
Practice data lake iceberg Lesson 26 checkpoint setting method
Practice data lake iceberg Lesson 27 Flink cdc test program failure restart: it can resume from the last checkpoint and continue working
practice data lake iceberg Lesson 28 Deploy packages that do not exist in the public warehouse to local warehouse
practice data lake iceberg Lesson 29 how to obtain flink jobId elegantly and efficiently
practice data lake iceberg Lesson 30 mysql -> iceberg, different clients sometimes have time zone issues
Practice data lake iceberg Lesson 31 use github's flink-streaming-platform-web tool to manage flink task flow, test cdc restart scenario
Practice data lake iceberg Lesson 32 DDL statement practice through hive catalog persistence method
Practice data lake iceberg Lesson 33 Upgrade flink to 1.14, with built-in functions to support json functions
Practice data lake iceberg Lesson 34 Based on data lake iceberg's stream-batch integrated architecture: stream architecture test
Practice data lake iceberg Lesson 35 Based on data lake iceberg's stream-batch integrated architecture: test whether incremental reading is full or only incremental
Practice data lake iceberg Lesson 36 Based on data lake iceberg's stream-batch integrated architecture: test whether update mysql select from iceberg syntax is an incremental update
Practice data lake iceberg Lesson 37 kafka writes to the iceberg table, enforced and not enforced test
Practice data lake iceberg Lesson 38 spark sql, Procedures syntax for data governance (small file merging, cleaning snapshots)
practice data lake iceberg Lesson 39 Analysis of data file changes before and after cleaning snapshots
practice data lake iceberg more content directory



Foreword

This lesson analyzes how the underlying files of the data lake table change before and after running the hive_iceberg_catalog.system.expire_snapshots() procedure.


1. Clean up snapshots

1.1 State before the cleanup

This continues the example from the previous lesson. Two records were written last time, and a checkpoint is taken every minute.

spark-sql (default)> select * from icebergtest7_xxzh;
data    dt
1       20220801
2       20220802
Time taken: 0.147 seconds, Fetched 2 row(s)

Number of files for the table:

[root@hadoop103 conf]# hadoop fs -count  hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
           3            2               1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
           1         1553           13349847 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
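
For reference, each line printed by `hadoop fs -count` has four columns: directory count, file count, content size in bytes, and path. The sketch below parses the two lines above so the numbers can be compared programmatically later. The names `CountEntry` and `parse_count_line` are made up for this illustration and are not part of any Hadoop tooling.

```python
from typing import NamedTuple


class CountEntry(NamedTuple):
    dirs: int        # DIR_COUNT column
    files: int       # FILE_COUNT column
    size_bytes: int  # CONTENT_SIZE column
    path: str        # PATHNAME column


def parse_count_line(line: str) -> CountEntry:
    """Parse one line of `hadoop fs -count` output into its four columns."""
    dirs, files, size_bytes, path = line.split(None, 3)
    return CountEntry(int(dirs), int(files), int(size_bytes), path.strip())


# The two lines printed above for the table directory.
output = """
           3            2               1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
           1         1553           13349847 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
"""

entries = [parse_count_line(line) for line in output.splitlines() if line.strip()]
for entry in entries:
    name = entry.path.rsplit("/", 1)[-1]
    print(f"{name}: {entry.files} files, {entry.size_bytes} bytes")
```

Read this way, the table holds only 2 data files (1345 bytes) but 1553 metadata files (about 13 MB), which is what the cleanup is meant to address.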

1.2 Execute the expire command

Pass a timestamp one year in the future as the cutoff (today is 20220805), so that every existing snapshot is older than it, and keep the last 10 snapshots:

spark-sql (default)>  CALL spark_catalog.system.expire_snapshots('ods_base.IcebergTest7_XXZH', TIMESTAMP '2023-08-06 00:00:00.000', 10);
22/08/05 15:20:45 WARN HiveConf: HiveConf of name hive.metastore.event.db.notification.api.auth does not exist
deleted_data_files_count        deleted_position_delete_files_count     deleted_equality_delete_files_count     deleted_manifest_files_count    deleted_manifest_lists_count
0       0       0       0       1536
Time taken: 25.491 seconds, Fetched 1 row(s)
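
The three positional arguments in the call above are the table name, the cutoff timestamp, and the number of snapshots to keep. Iceberg's Spark procedures also accept named arguments (`table`, `older_than`, `retain_last`), which can be easier to read. The sketch below only builds the statement text; the helper `build_expire_snapshots_call` is hypothetical, and the positive-value check is this sketch's own sanity check rather than documented procedure behavior.

```python
def build_expire_snapshots_call(catalog: str, table: str,
                                older_than: str, retain_last: int) -> str:
    """Build a CALL statement for expire_snapshots using named arguments."""
    if retain_last < 1:
        # Sanity check chosen for this sketch: keep at least one snapshot.
        raise ValueError("retain_last must be at least 1")
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', "
        f"older_than => TIMESTAMP '{older_than}', "
        f"retain_last => {retain_last})"
    )


# Same values as the positional call above.
sql = build_expire_snapshots_call(
    "spark_catalog", "ods_base.IcebergTest7_XXZH", "2023-08-06 00:00:00.000", 10
)
print(sql)
```

The printed statement can be pasted into spark-sql in place of the positional form.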

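The procedure reports 1536 deleted manifest lists (deleted_manifest_lists_count); no data files, delete files or manifest files were removed.

To watch the files disappear, run the count again every few seconds: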

[root@hadoop103 conf]# hadoop fs -count  hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
           3            2               1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
           1         1553           13349847 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
[root@hadoop103 conf]# hadoop fs -count  hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
           3            2               1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
           1         1553           13349847 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
[root@hadoop103 conf]# hadoop fs -count  hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
           3            2               1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
           1         1553           13349847 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
[root@hadoop103 conf]# hadoop fs -count  hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
           3            2               1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
           1         1226           11934332 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
[root@hadoop103 conf]# hadoop fs -count  hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
           3            2               1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
           1           17            6700166 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
[root@hadoop103 conf]# hadoop fs -count  hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
           3            2               1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
           1           17            6700166 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
[root@hadoop103 conf]# hadoop fs -count  hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
           3            2               1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
           1           17            6700166 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
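
As a quick check on the numbers above, the drop in the metadata file count can be compared with what the procedure reported. This is plain arithmetic on values copied from the output; nothing here talks to HDFS.

```python
# Values copied from the `hadoop fs -count` lines above (metadata directory).
files_before, bytes_before = 1553, 13349847
files_after, bytes_after = 17, 6700166

# Value reported by expire_snapshots in the deleted_manifest_lists_count column.
deleted_manifest_lists = 1536

files_removed = files_before - files_after
bytes_freed = bytes_before - bytes_after

print(f"files removed: {files_removed}")
print(f"bytes freed:   {bytes_freed} ({bytes_freed / 1024 / 1024:.1f} MiB)")
print(f"matches procedure output: {files_removed == deleted_manifest_lists}")
```

The file count falls by exactly 1536, the same number the procedure reported, while the data directory stays at 2 files and 1345 bytes throughout.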

Listing of the metadata directory after the cleanup:

Found 18 items
-rw-r--r--   2 root supergroup    1326958 2022-08-05 15:18 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01662-a3b98f34-6350-4aaf-97c8-5bf5bc322cbb.metadata.json
-rw-r--r--   2 root supergroup    1327818 2022-08-05 15:18 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01663-918cf80c-eee0-404e-ba63-7e4ff7dbcb1a.metadata.json
-rw-r--r--   2 root supergroup    1328678 2022-08-05 15:19 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01664-1dbe7398-ee20-4016-85e0-3f020f868a36.metadata.json
-rw-r--r--   2 root supergroup    1329538 2022-08-05 15:20 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01665-ddbdd7a1-ce62-469c-9082-955eb82288d5.metadata.json
-rw-r--r--   2 root supergroup      10978 2022-08-05 15:20 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01666-4541c2c0-0479-45ca-98f0-fa047047f7d5.metadata.json
-rw-r--r--   2 root supergroup      11833 2022-08-05 15:21 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01667-d99a28ca-8564-43d3-97a6-9d6ffaa65ba5.metadata.json
-rw-r--r--   2 root supergroup       6798 2022-08-04 17:11 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:13 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-1667142442712269329-1-94a3be8f-1fbb-48c7-87f8-43548cc16a61.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:16 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-1963762773888773433-1-d49babf6-122b-4af9-a43c-efd41c252666.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:13 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-2153830365703656208-1-48ca082d-1e41-4bfd-b8f3-db8cd1b450b4.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:14 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-2187623164859521720-1-f35e216f-f68a-472c-a90d-d70ab84aa7d3.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:17 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-3609130078797535708-1-d2d4674d-1670-4aa0-aca0-7eb51a56a783.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:18 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-3863382103427831766-1-8a43e521-f75c-48cf-99a5-af05695e2237.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:19 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-4767088375083646307-1-e7c384f5-73dc-4644-9042-837d46fae36d.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:15 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-6684373263600938900-1-74780fe1-2e66-4f53-8ce7-b797e223a6c9.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:20 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-7833168760795469341-1-d06fbdfa-b5ca-4eff-a66f-97b053039b3c.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:18 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-8339965443495738233-1-61613f35-7ecb-456a-9e25-cd7be6dfe091.avro
-rw-r--r--   2 root supergroup       4329 2022-08-05 15:21 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-935317679288657184-1-e5a00540-1c34-47f3-9e1b-a847bc334051.avro
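
The listing mixes three kinds of files, which can be told apart by name: `*.metadata.json` files are table metadata, `snap-*.avro` files are manifest lists (one per snapshot), and `*-m0.avro` files are manifests. A small sketch that groups file names this way follows; the function name and the regular expressions are assumptions based only on the names visible in this listing.

```python
import re
from collections import Counter


def classify_metadata_file(name: str) -> str:
    """Classify a file from the table's metadata directory by its name."""
    if name.endswith(".metadata.json"):
        return "table metadata"
    if re.fullmatch(r"snap-\d+-\d+-[0-9a-f-]+\.avro", name):
        return "manifest list"
    if re.fullmatch(r"[0-9a-f-]+-m\d+\.avro", name):
        return "manifest"
    return "other"


# A few file names taken from the listing above.
names = [
    "01666-4541c2c0-0479-45ca-98f0-fa047047f7d5.metadata.json",
    "01667-d99a28ca-8564-43d3-97a6-9d6ffaa65ba5.metadata.json",
    "fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro",
    "snap-1667142442712269329-1-94a3be8f-1fbb-48c7-87f8-43548cc16a61.avro",
    "snap-935317679288657184-1-e5a00540-1c34-47f3-9e1b-a847bc334051.avro",
]

counts = Counter(classify_metadata_file(n) for n in names)
for kind, n in sorted(counts.items()):
    print(f"{kind}: {n}")
```

Applied to the full listing of 18 items, this grouping gives 6 table metadata files, 1 manifest and 11 manifest lists.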

1.3 Keep only one snapshot

Only one manifest file (the m0 file) is left. Next, expire snapshots until only one snapshot is retained:

spark-sql (default)>  CALL spark_catalog.system.expire_snapshots('ods_base.IcebergTest7_XXZH', TIMESTAMP '2823-08-06 00:00:00.000', 1);
deleted_data_files_count        deleted_position_delete_files_count     deleted_equality_delete_files_count     deleted_manifest_files_count    deleted_manifest_lists_count
0       0       0       0       11
Time taken: 3.878 seconds, Fetched 1 row(s)

The result is as follows:

Found 9 items
-rw-r--r--   2 root supergroup    1329538 2022-08-05 15:20 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01665-ddbdd7a1-ce62-469c-9082-955eb82288d5.metadata.json
-rw-r--r--   2 root supergroup      10978 2022-08-05 15:20 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01666-4541c2c0-0479-45ca-98f0-fa047047f7d5.metadata.json
-rw-r--r--   2 root supergroup      11833 2022-08-05 15:21 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01667-d99a28ca-8564-43d3-97a6-9d6ffaa65ba5.metadata.json
-rw-r--r--   2 root supergroup      12694 2022-08-05 15:22 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01668-8e04d8c8-38cc-4b1b-81bc-eb0f9fbcfa5f.metadata.json
-rw-r--r--   2 root supergroup       3237 2022-08-05 15:22 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01669-6206ddc5-830a-4962-bc1a-209c991d6ac7.metadata.json
-rw-r--r--   2 root supergroup       4097 2022-08-05 15:23 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01670-92917778-5f30-4ead-942d-0f05915cb398.metadata.json
-rw-r--r--   2 root supergroup       6798 2022-08-04 17:11 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro
-rw-r--r--   2 root supergroup       4329 2022-08-05 15:22 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-1911393555202333427-1-fe0fb043-eef8-4755-a34c-91e3d8d94f9a.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:23 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-3064118549879781557-1-5e763c87-757a-46fa-8646-afff2476a51e.avro

2. If there are multiple manifests, how many snapshots can be deleted?

2.1 Add data

Write several more records:

[root@hadoop101 ~]#  kafka-console-producer.sh --broker-list  hadoop101:9092,hadoop102:9092,hadoop103:9092  --topic test2_xxzh
>22,20220802
>3,20220803
>4,20220804
>5,20220805
>6,20220806
>7,20220807

The number of snapshots keeps growing:

[root@hadoop103 conf]#  hadoop fs -ls -R  hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/
drwxrwxrwx   - root supergroup          0 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
drwxrwxrwx   - root supergroup          0 2022-08-04 16:12 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220801
-rw-r--r--   2 root supergroup        672 2022-08-04 16:12 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220801/00001-0-989c0c01-b69d-4c66-8c74-7a1a4be08f71-00001.parquet
drwxrwxrwx   - root supergroup          0 2022-08-05 15:38 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220802
-rw-r--r--   2 root supergroup        680 2022-08-05 15:38 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220802/00001-0-52c0f221-3908-447d-9441-ed0be045c3ca-00001.parquet
-rw-r--r--   2 root supergroup        673 2022-08-04 16:46 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220802/00001-0-989c0c01-b69d-4c66-8c74-7a1a4be08f71-00002.parquet
drwxrwxrwx   - root supergroup          0 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220803
-rw-r--r--   2 root supergroup        673 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220803/00000-0-0b872a75-1956-49e4-9093-e4e418eace05-00001.parquet
drwxrwxrwx   - root supergroup          0 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220804
-rw-r--r--   2 root supergroup        673 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220804/00001-0-52c0f221-3908-447d-9441-ed0be045c3ca-00002.parquet
drwxrwxrwx   - root supergroup          0 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220805
-rw-r--r--   2 root supergroup        672 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220805/00001-0-52c0f221-3908-447d-9441-ed0be045c3ca-00003.parquet
drwxrwxrwx   - root supergroup          0 2022-08-05 15:43 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220806
-rw-r--r--   2 root supergroup        673 2022-08-05 15:43 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220806/00001-0-52c0f221-3908-447d-9441-ed0be045c3ca-00004.parquet
drwxrwxrwx   - root supergroup          0 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220807
-rw-r--r--   2 root supergroup        673 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220807/00000-0-0b872a75-1956-49e4-9093-e4e418eace05-00002.parquet
drwxrwxrwx   - root supergroup          0 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
-rw-r--r--   2 root supergroup      25977 2022-08-05 15:41 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01695-ff9dfa85-4a33-4557-bd1d-cdc230fa605f.metadata.json
-rw-r--r--   2 root supergroup      26837 2022-08-05 15:42 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01696-f26649f4-877d-4656-a90f-bc72a0f1735d.metadata.json
-rw-r--r--   2 root supergroup      27697 2022-08-05 15:43 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01697-3bad18ac-7b26-4496-9ac0-114cfa49cbaa.metadata.json
-rw-r--r--   2 root supergroup      28652 2022-08-05 15:43 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01698-ef6f5e15-a2ec-44cc-ab4c-9297c0ca3321.metadata.json
-rw-r--r--   2 root supergroup      29512 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01699-cfb0ddd6-5c85-499c-866c-9d200eafe965.metadata.json
-rw-r--r--   2 root supergroup      30467 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01700-ed75dcf1-c58b-444e-b8b5-b6085370c535.metadata.json
-rw-r--r--   2 root supergroup       6754 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/1df9c902-caf9-4703-8c22-1d6a9f7de154-m0.avro
-rw-r--r--   2 root supergroup       6756 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/765c23a8-266b-4545-94c1-a0f446a5775e-m0.avro
-rw-r--r--   2 root supergroup       6755 2022-08-05 15:38 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7a047525-4ecd-46d9-a4ec-9b7321323cfc-m0.avro
-rw-r--r--   2 root supergroup       6755 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7b90e539-a1d9-4f47-83df-76fdb723de45-m0.avro
-rw-r--r--   2 root supergroup       6755 2022-08-05 15:43 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/91501f41-1db7-4307-a2c0-c8041bb936eb-m0.avro
-rw-r--r--   2 root supergroup       6753 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro
-rw-r--r--   2 root supergroup       6798 2022-08-04 17:11 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:27 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-1035563423599163544-1-2b706b4b-0b08-4f51-9216-1eb64e888188.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:32 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-1737338957084956478-1-be16b42a-2eb1-4bd3-8728-de6f83e67d66.avro
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:28 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-1782562210015627548-1-0a830678-95ed-4cf9-979f-f8dd2f965776.avro
-rw-r--r--   2 root supergroup       4329 2022-08-05 15:22 4521-b5b9-9b854dce0664.avro
... (many lines in the middle omitted) ...
-rw-r--r--   2 root supergroup       4330 2022-08-05 15:33 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-8912083785901975773-1-f6a58ef7-03c5-4792-a58c-7bbbffad7b79.avro

2.2 Execute the delete command

Expire snapshots until only 1 snapshot is retained:

spark-sql (default)> 
                   >   CALL spark_catalog.system.expire_snapshots('ods_base.IcebergTest7_XXZH', TIMESTAMP '2823-08-06 00:00:00.000', 1);
22/08/05 15:47:24 WARN HiveConf: HiveConf of name hive.metastore.event.db.notification.api.auth does not exist
deleted_data_files_count        deleted_position_delete_files_count     deleted_equality_delete_files_count     deleted_manifest_files_count    deleted_manifest_lists_count
0       0       0       0       34
Time taken: 17.441 seconds, Fetched 1 row(s)

The result is as follows. Only 2 snap files (manifest lists) are left, while all 7 m0 files (manifests) are kept: one from the earlier writes plus one for each of the 6 new writes.

Found 15 items
-rw-r--r--   2 root supergroup      30467 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01700-ed75dcf1-c58b-444e-b8b5-b6085370c535.metadata.json
-rw-r--r--   2 root supergroup      31327 2022-08-05 15:45 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01701-117cd2ed-3480-4842-a7dd-4c90eaab83ab.metadata.json
-rw-r--r--   2 root supergroup      32182 2022-08-05 15:46 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01702-0ef6b60a-43aa-4f3a-a636-34298644ebce.metadata.json
-rw-r--r--   2 root supergroup      33043 2022-08-05 15:47 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01703-00fb5e1d-29af-4731-adbe-7f1805c20cd1.metadata.json
-rw-r--r--   2 root supergroup       3237 2022-08-05 15:47 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01704-fb9cc6cb-1492-45b6-8dcb-218c3d56d08b.metadata.json
-rw-r--r--   2 root supergroup       4097 2022-08-05 15:47 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01705-c6d98e57-d748-460b-93a5-fdfb0d557d67.metadata.json
-rw-r--r--   2 root supergroup       6754 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/1df9c902-caf9-4703-8c22-1d6a9f7de154-m0.avro
-rw-r--r--   2 root supergroup       6756 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/765c23a8-266b-4545-94c1-a0f446a5775e-m0.avro
-rw-r--r--   2 root supergroup       6755 2022-08-05 15:38 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7a047525-4ecd-46d9-a4ec-9b7321323cfc-m0.avro
-rw-r--r--   2 root supergroup       6755 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7b90e539-a1d9-4f47-83df-76fdb723de45-m0.avro
-rw-r--r--   2 root supergroup       6755 2022-08-05 15:43 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/91501f41-1db7-4307-a2c0-c8041bb936eb-m0.avro
-rw-r--r--   2 root supergroup       6753 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro
-rw-r--r--   2 root supergroup       6798 2022-08-04 17:11 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro
-rw-r--r--   2 root supergroup       4649 2022-08-05 15:47 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-3511522569100173178-1-59ecc32e-2aa7-48c6-bbb9-f5aa4b31b442.avro
-rw-r--r--   2 root supergroup       4650 2022-08-05 15:47 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-4956000582044241255-1-4fa34632-810d-48e7-a470-d5f9c6735f7d.avro

After expiring the snapshots, the data is intact:

spark-sql (default)> select * from ods_base.IcebergTest7_XXZH;
data    dt
1       20220801
2       20220802
4       20220804
7       20220807
6       20220806
3       20220803
5       20220805
22      20220802
Time taken: 1.649 seconds, Fetched 8 row(s)

3. Metadata characteristics (view snap content)

3.1 Content of the first snap file, which references 7 m0 files

[root@hadoop103 snap]# java -jar /opt/software/avro-tools-1.11.0.jar  tojson --pretty  snap-3511522569100173178-1-59ecc32e-2aa7-48c6-bbb9-f5aa4b31b442.avro 
22/08/05 15:58:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro",
  "manifest_length" : 6753,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1695,
  "min_sequence_number" : 1695,
  "added_snapshot_id" : 2845024222990467689,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220807"
      },
      "upper_bound" : {
        "bytes" : "20220807"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/91501f41-1db7-4307-a2c0-c8041bb936eb-m0.avro",
  "manifest_length" : 6755,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1693,
  "min_sequence_number" : 1693,
  "added_snapshot_id" : 3476529947294323623,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220806"
      },
      "upper_bound" : {
        "bytes" : "20220806"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/765c23a8-266b-4545-94c1-a0f446a5775e-m0.avro",
  "manifest_length" : 6756,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1689,
  "min_sequence_number" : 1689,
  "added_snapshot_id" : 5462232017147497616,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220805"
      },
      "upper_bound" : {
        "bytes" : "20220805"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7b90e539-a1d9-4f47-83df-76fdb723de45-m0.avro",
  "manifest_length" : 6755,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1688,
  "min_sequence_number" : 1688,
  "added_snapshot_id" : 3246455649213713509,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220804"
      },
      "upper_bound" : {
        "bytes" : "20220804"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/1df9c902-caf9-4703-8c22-1d6a9f7de154-m0.avro",
  "manifest_length" : 6754,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1687,
  "min_sequence_number" : 1687,
  "added_snapshot_id" : 4917712002051492927,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220803"
      },
      "upper_bound" : {
        "bytes" : "20220803"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7a047525-4ecd-46d9-a4ec-9b7321323cfc-m0.avro",
  "manifest_length" : 6755,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1684,
  "min_sequence_number" : 1684,
  "added_snapshot_id" : 3096920793835932503,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220802"
      },
      "upper_bound" : {
        "bytes" : "20220802"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro",
  "manifest_length" : 6798,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 84,
  "min_sequence_number" : 12,
  "added_snapshot_id" : 8562765270417336551,
  "added_data_files_count" : 0,
  "existing_data_files_count" : 2,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 0,
  "existing_rows_count" : 2,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220801"
      },
      "upper_bound" : {
        "bytes" : "20220802"
      }
    } ]
  }
}

The second snap also lists 7 m0 (manifest) files

[root@hadoop103 snap]# java -jar /opt/software/avro-tools-1.11.0.jar tojson --pretty snap-4956000582044241255-1-4fa34632-810d-48e7-a470-d5f9c6735f7d.avro
22/08/05 16:15:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro",
  "manifest_length" : 6753,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1695,
  "min_sequence_number" : 1695,
  "added_snapshot_id" : 2845024222990467689,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220807"
      },
      "upper_bound" : {
        "bytes" : "20220807"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/91501f41-1db7-4307-a2c0-c8041bb936eb-m0.avro",
  "manifest_length" : 6755,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1693,
  "min_sequence_number" : 1693,
  "added_snapshot_id" : 3476529947294323623,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220806"
      },
      "upper_bound" : {
        "bytes" : "20220806"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/765c23a8-266b-4545-94c1-a0f446a5775e-m0.avro",
  "manifest_length" : 6756,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1689,
  "min_sequence_number" : 1689,
  "added_snapshot_id" : 5462232017147497616,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220805"
      },
      "upper_bound" : {
        "bytes" : "20220805"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7b90e539-a1d9-4f47-83df-76fdb723de45-m0.avro",
  "manifest_length" : 6755,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1688,
  "min_sequence_number" : 1688,
  "added_snapshot_id" : 3246455649213713509,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220804"
      },
      "upper_bound" : {
        "bytes" : "20220804"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/1df9c902-caf9-4703-8c22-1d6a9f7de154-m0.avro",
  "manifest_length" : 6754,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1687,
  "min_sequence_number" : 1687,
  "added_snapshot_id" : 4917712002051492927,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220803"
      },
      "upper_bound" : {
        "bytes" : "20220803"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7a047525-4ecd-46d9-a4ec-9b7321323cfc-m0.avro",
  "manifest_length" : 6755,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1684,
  "min_sequence_number" : 1684,
  "added_snapshot_id" : 3096920793835932503,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220802"
      },
      "upper_bound" : {
        "bytes" : "20220802"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro",
  "manifest_length" : 6798,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 84,
  "min_sequence_number" : 12,
  "added_snapshot_id" : 8562765270417336551,
  "added_data_files_count" : 0,
  "existing_data_files_count" : 2,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 0,
  "existing_rows_count" : 2,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220801"
      },
      "upper_bound" : {
        "bytes" : "20220802"
      }
    } ]
  }
}

The third snap (generated automatically after one minute)

[root@hadoop103 snap]# java -jar /opt/software/avro-tools-1.11.0.jar tojson --pretty snap-5188266964869746455-1-abd5575f-12fc-43fd-bc9e-5cb69a0e03df.avro
22/08/05 16:19:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro",
  "manifest_length" : 6753,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1695,
  "min_sequence_number" : 1695,
  "added_snapshot_id" : 2845024222990467689,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220807"
      },
      "upper_bound" : {
        "bytes" : "20220807"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/91501f41-1db7-4307-a2c0-c8041bb936eb-m0.avro",
  "manifest_length" : 6755,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1693,
  "min_sequence_number" : 1693,
  "added_snapshot_id" : 3476529947294323623,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220806"
      },
      "upper_bound" : {
        "bytes" : "20220806"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/765c23a8-266b-4545-94c1-a0f446a5775e-m0.avro",
  "manifest_length" : 6756,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1689,
  "min_sequence_number" : 1689,
  "added_snapshot_id" : 5462232017147497616,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220805"
      },
      "upper_bound" : {
        "bytes" : "20220805"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7b90e539-a1d9-4f47-83df-76fdb723de45-m0.avro",
  "manifest_length" : 6755,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1688,
  "min_sequence_number" : 1688,
  "added_snapshot_id" : 3246455649213713509,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220804"
      },
      "upper_bound" : {
        "bytes" : "20220804"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/1df9c902-caf9-4703-8c22-1d6a9f7de154-m0.avro",
  "manifest_length" : 6754,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1687,
  "min_sequence_number" : 1687,
  "added_snapshot_id" : 4917712002051492927,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220803"
      },
      "upper_bound" : {
        "bytes" : "20220803"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7a047525-4ecd-46d9-a4ec-9b7321323cfc-m0.avro",
  "manifest_length" : 6755,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 1684,
  "min_sequence_number" : 1684,
  "added_snapshot_id" : 3096920793835932503,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 1,
  "existing_rows_count" : 0,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220802"
      },
      "upper_bound" : {
        "bytes" : "20220802"
      }
    } ]
  }
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro",
  "manifest_length" : 6798,
  "partition_spec_id" : 0,
  "content" : 0,
  "sequence_number" : 84,
  "min_sequence_number" : 12,
  "added_snapshot_id" : 8562765270417336551,
  "added_data_files_count" : 0,
  "existing_data_files_count" : 2,
  "deleted_data_files_count" : 0,
  "added_rows_count" : 0,
  "existing_rows_count" : 2,
  "deleted_rows_count" : 0,
  "partitions" : {
    "array" : [ {
      "contains_null" : false,
      "contains_nan" : {
        "boolean" : false
      },
      "lower_bound" : {
        "bytes" : "20220801"
      },
      "upper_bound" : {
        "bytes" : "20220802"
      }
    } ]
  }
}
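
Comparing two snap dumps by eye is error-prone, so the comparison can be scripted. The following is a minimal Python sketch, not part of the original experiment: it assumes each dump is the text printed by avro-tools tojson --pretty (a run of JSON objects, possibly preceded by log lines that contain no "{"), and the function names manifest_paths and diff_snapshots as well as the short file names in the demo are made up for illustration.

```python
import json


def manifest_paths(dump_text):
    """Collect the manifest_path values from avro-tools `tojson --pretty` output."""
    decoder = json.JSONDecoder()
    paths = set()
    pos = dump_text.find("{")  # skip any log lines before the first object
    while pos != -1:
        # raw_decode parses one JSON object and returns the index where it ends
        entry, end = decoder.raw_decode(dump_text, pos)
        paths.add(entry["manifest_path"])
        pos = dump_text.find("{", end)  # next top-level object, if any
    return paths


def diff_snapshots(old_dump, new_dump):
    """Split manifest paths into shared / only in old / only in new."""
    old, new = manifest_paths(old_dump), manifest_paths(new_dump)
    return {"shared": old & new, "only_old": old - new, "only_new": new - old}


if __name__ == "__main__":
    # Hypothetical, shortened dumps used only to demonstrate the functions.
    old_dump = '{ "manifest_path" : "a-m0.avro" }\n{ "manifest_path" : "b-m0.avro" }'
    new_dump = '{ "manifest_path" : "b-m0.avro" }\n{ "manifest_path" : "c-m0.avro" }'
    for name, paths in diff_snapshots(old_dump, new_dump).items():
        print(name, sorted(paths))
```

The second and third dumps above list the same seven manifest paths, so for that pair both difference sets would be empty.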

3.2 Summary: what the snap file points to

A snap file (the manifest list) points to multiple m0 manifest files, and each m0 file in turn points to data files. Following these pointers yields the complete set of files that belong to the snapshot.
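
The per-manifest counters in a snap dump are enough to tally how many data files a snapshot reaches without opening each m0 file. The sketch below is an illustration under assumptions, not the author's tooling: it reads avro-tools tojson --pretty text, treats added plus existing files as the files each manifest still references, and uses made-up function names (read_entries, summarize) and a trimmed sample with shortened file names.

```python
import json


def read_entries(dump_text):
    """Parse back-to-back JSON objects from avro-tools `tojson --pretty` output."""
    decoder = json.JSONDecoder()
    entries = []
    pos = dump_text.find("{")  # skip log lines before the first object
    while pos != -1:
        entry, end = decoder.raw_decode(dump_text, pos)
        entries.append(entry)
        pos = dump_text.find("{", end)
    return entries


def summarize(entries):
    """Total the manifests, data files and rows referenced by one snap file."""
    return {
        "manifests": len(entries),
        # Assumption: added + existing are the files a manifest still references.
        "data_files": sum(e["added_data_files_count"] + e["existing_data_files_count"]
                          for e in entries),
        "rows": sum(e["added_rows_count"] + e["existing_rows_count"] for e in entries),
    }


# Trimmed sample: counter values copied from two entries of the dumps,
# file names shortened for illustration.
SAMPLE = """
{ "manifest_path" : "7a047525-m0.avro",
  "added_data_files_count" : 1, "existing_data_files_count" : 0,
  "added_rows_count" : 1, "existing_rows_count" : 0 }
{ "manifest_path" : "fe4c8846-m0.avro",
  "added_data_files_count" : 0, "existing_data_files_count" : 2,
  "added_rows_count" : 0, "existing_rows_count" : 2 }
"""

if __name__ == "__main__":
    print(summarize(read_entries(SAMPLE)))
```

Applied to the seven entries of the second or third snap, the same arithmetic gives 6 × 1 + 2 = 8 data files and 8 rows.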

4. After m0 files are merged, do subsequent m0 files start from the merged file?

To be continued in the next lesson.



Origin blog.csdn.net/spark_dev/article/details/126178747