Series Article Directory
Practice data lake iceberg Lesson 1 Getting started
Practice data lake iceberg Lesson 2 Iceberg is based on hadoop's underlying data format
Practice data lake iceberg In sqlclient, use SQL to read data from Kafka to iceberg (upgrade the version to flink1.12.7)
Practice data lake iceberg Lesson 5 hive catalog features
Practice data lake iceberg Lesson 6 Solving failures when writing from kafka to iceberg
Practice data lake iceberg Lesson 7 Write to iceberg in real time
Practice data lake iceberg Lesson 8 hive and iceberg integration
Practice data lake iceberg Lesson 9 Merge small files
Practice data lake iceberg Lesson 10 Snapshot deletion
Practice data lake iceberg Lesson 11 Test the complete partition-table workflow (generating data, creating tables, merging, and deleting snapshots)
Practice data lake iceberg Lesson 12 What is a catalog
Practice data lake iceberg Lesson 13 Metadata is many times larger than data files
Practice data lake iceberg Lesson 14 Data merging (to solve the problem of metadata expansion over time)
Practice data lake iceberg Lesson 15 spark installation and iceberg integration (jersey package conflict)
Practice data lake iceberg Lesson 16 Opening the door to understanding iceberg through spark3
Practice data lake iceberg Lesson 17 Hadoop2.7, spark3 on yarn run iceberg configuration
Practice data lake iceberg Lesson 18 Startup commands for multiple clients interacting with iceberg (commonly used commands)
Practice data lake iceberg Lesson 19 flink count on iceberg returns no result
Practice data lake iceberg Lesson 20 flink + iceberg CDC scenario (version problem, test failed)
Practice data lake iceberg Lesson 21 flink1.13.5 + iceberg0.131 CDC (INSERT test succeeded, change operations failed)
Practice data lake iceberg Lesson 22 flink1.13.5 + iceberg0.131 CDC (CRUD test successful)
Practice data lake iceberg Lesson 23 flink-sql restart from checkpoint
Practice data lake iceberg Lesson 24 Analysis of iceberg metadata details
Practice data lake iceberg Lesson 25 Running flink sql in the background: the effect of inserts, deletes, and updates
Practice data lake iceberg Lesson 26 checkpoint setting method
Practice data lake iceberg Lesson 27 Flink cdc test program failure restart: can restart from the last checkpoint and continue working
Practice data lake iceberg Lesson 28 Deploy packages that do not exist in the public repository to the local repository
Practice data lake iceberg Lesson 29 How to obtain the flink jobId elegantly and efficiently
Practice data lake iceberg Lesson 30 mysql -> iceberg, different clients sometimes have time zone issues
Practice data lake iceberg Lesson 31 Use github's flink-streaming-platform-web tool to manage flink task flow, test cdc restart scenario
Practice data lake iceberg Lesson 32 DDL statements persisted through the hive catalog
Practice data lake iceberg Lesson 33 Upgrade flink to 1.14, with built-in functions that support json
Practice data lake iceberg Lesson 34 Stream-batch integrated architecture based on data lake iceberg: stream architecture test
Practice data lake iceberg Lesson 35 Stream-batch integrated architecture based on data lake iceberg: test whether incremental reading is full or only incremental
Practice data lake iceberg Lesson 36 Stream-batch integrated architecture based on data lake iceberg: test whether the update mysql select from iceberg syntax is an incremental update
Practice data lake iceberg Lesson 37 Kafka writes to an iceberg table, enforced and not enforced
Practice data lake iceberg Lesson 38 spark sql Procedures syntax for data governance (small file merging, cleaning snapshots)
Practice data lake iceberg Lesson 39 Analysis of data file changes before and after cleaning up snapshots
Practice data lake iceberg more content directory
Foreword
This lesson analyzes how the underlying files of the data lake table change before and after running the hive_iceberg_catalog.system.expire_snapshots() command.
1. Clean up snapshots
1.1 State before the cleanup
Continuing the example from the previous lesson: two rows were written last time, and a checkpoint is taken every minute.
spark-sql (default)> select * from icebergtest7_xxzh;
data dt
1 20220801
2 20220802
Time taken: 0.147 seconds, Fetched 2 row(s)
Number of files for the table (the columns of hadoop fs -count are directory count, file count, content size in bytes, and path):
[root@hadoop103 conf]# hadoop fs -count hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
3 2 1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
1 1553 13349847 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
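To compare these counts later without reading the columns by eye, the output can be parsed into named fields. The following is a minimal sketch in Python; the names `CountRow` and `parse_count_line` are made up for this illustration, and it assumes the plain four-column output shown above (no `-q` or `-v` options).

```python
from typing import NamedTuple


class CountRow(NamedTuple):
    """One line of plain `hadoop fs -count` output (hypothetical helper type)."""
    dir_count: int
    file_count: int
    content_size: int  # bytes
    path: str


def parse_count_line(line: str) -> CountRow:
    """Split a line into DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME."""
    dirs, files, size, path = line.split(None, 3)
    return CountRow(int(dirs), int(files), int(size), path.strip())


if __name__ == "__main__":
    base = "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH"
    # The two lines printed above, before any cleanup.
    for line in (f"3 2 1345 {base}/data", f"1 1553 13349847 {base}/metadata"):
        row = parse_count_line(line)
        print(row.path.rsplit("/", 1)[-1], row.file_count, row.content_size)
```

Read this way, the table holds 2 data files (1345 bytes) but 1553 metadata files (about 13 MB), which is the imbalance the cleanup is meant to fix.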
1.2 Execute the cleanup command
Pass a timestamp about a year in the future (today is 20220805) as the cutoff, so that every existing snapshot is older than it, and keep the last 10 snapshots:
spark-sql (default)> CALL spark_catalog.system.expire_snapshots('ods_base.IcebergTest7_XXZH', TIMESTAMP '2023-08-06 00:00:00.000', 10);
22/08/05 15:20:45 WARN HiveConf: HiveConf of name hive.metastore.event.db.notification.api.auth does not exist
deleted_data_files_count deleted_position_delete_files_count deleted_equality_delete_files_count deleted_manifest_files_count deleted_manifest_lists_count
0 0 0 0 1536
Time taken: 25.491 seconds, Fetched 1 row(s)
The output reports that 1536 manifest list files were cleaned up (deleted_manifest_lists_count = 1536).
To watch the cleanup take effect, run the count every few seconds:
[root@hadoop103 conf]# hadoop fs -count hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
3 2 1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
1 1553 13349847 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
[root@hadoop103 conf]# hadoop fs -count hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
3 2 1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
1 1553 13349847 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
[root@hadoop103 conf]# hadoop fs -count hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
3 2 1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
1 1553 13349847 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
[root@hadoop103 conf]# hadoop fs -count hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
3 2 1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
1 1226 11934332 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
[root@hadoop103 conf]# hadoop fs -count hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
3 2 1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
1 17 6700166 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
[root@hadoop103 conf]# hadoop fs -count hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
3 2 1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
1 17 6700166 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
[root@hadoop103 conf]# hadoop fs -count hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/*
3 2 1345 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
1 17 6700166 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
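The observations above can be checked against the number reported by the procedure. This is only an arithmetic sketch in Python over the values copied from this run; the variable and function names are made up, and on a table that is still receiving checkpoints the two numbers need not match exactly, because new metadata files arrive while old ones are removed.

```python
# File counts of the metadata directory, copied from the seven
# `hadoop fs -count` observations above (second column).
observed_file_counts = [1553, 1553, 1553, 1226, 17, 17, 17]

# deleted_manifest_lists_count as printed by the CALL above.
reported_deleted_manifest_lists = 1536


def files_removed(counts):
    """Difference between the first and the last observed file count."""
    return counts[0] - counts[-1]


if __name__ == "__main__":
    removed = files_removed(observed_file_counts)
    # Prints: 1536 True
    print(removed, removed == reported_deleted_manifest_lists)
```

In this run the drop from 1553 to 17 files equals the 1536 manifest lists that the procedure reported as deleted.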
Contents of the metadata directory after the cleanup:
Found 18 items
-rw-r--r-- 2 root supergroup 1326958 2022-08-05 15:18 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01662-a3b98f34-6350-4aaf-97c8-5bf5bc322cbb.metadata.json
-rw-r--r-- 2 root supergroup 1327818 2022-08-05 15:18 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01663-918cf80c-eee0-404e-ba63-7e4ff7dbcb1a.metadata.json
-rw-r--r-- 2 root supergroup 1328678 2022-08-05 15:19 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01664-1dbe7398-ee20-4016-85e0-3f020f868a36.metadata.json
-rw-r--r-- 2 root supergroup 1329538 2022-08-05 15:20 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01665-ddbdd7a1-ce62-469c-9082-955eb82288d5.metadata.json
-rw-r--r-- 2 root supergroup 10978 2022-08-05 15:20 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01666-4541c2c0-0479-45ca-98f0-fa047047f7d5.metadata.json
-rw-r--r-- 2 root supergroup 11833 2022-08-05 15:21 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01667-d99a28ca-8564-43d3-97a6-9d6ffaa65ba5.metadata.json
-rw-r--r-- 2 root supergroup 6798 2022-08-04 17:11 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:13 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-1667142442712269329-1-94a3be8f-1fbb-48c7-87f8-43548cc16a61.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:16 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-1963762773888773433-1-d49babf6-122b-4af9-a43c-efd41c252666.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:13 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-2153830365703656208-1-48ca082d-1e41-4bfd-b8f3-db8cd1b450b4.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:14 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-2187623164859521720-1-f35e216f-f68a-472c-a90d-d70ab84aa7d3.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:17 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-3609130078797535708-1-d2d4674d-1670-4aa0-aca0-7eb51a56a783.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:18 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-3863382103427831766-1-8a43e521-f75c-48cf-99a5-af05695e2237.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:19 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-4767088375083646307-1-e7c384f5-73dc-4644-9042-837d46fae36d.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:15 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-6684373263600938900-1-74780fe1-2e66-4f53-8ce7-b797e223a6c9.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:20 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-7833168760795469341-1-d06fbdfa-b5ca-4eff-a66f-97b053039b3c.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:18 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-8339965443495738233-1-61613f35-7ecb-456a-9e25-cd7be6dfe091.avro
-rw-r--r-- 2 root supergroup 4329 2022-08-05 15:21 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-935317679288657184-1-e5a00540-1c34-47f3-9e1b-a847bc334051.avro
1.3 Keep only one snapshot
There is only one manifest file (the -m0.avro file), so expire again, this time keeping only one snapshot:
spark-sql (default)> CALL spark_catalog.system.expire_snapshots('ods_base.IcebergTest7_XXZH', TIMESTAMP '2823-08-06 00:00:00.000', 1);
deleted_data_files_count deleted_position_delete_files_count deleted_equality_delete_files_count deleted_manifest_files_count deleted_manifest_lists_count
0 0 0 0 11
Time taken: 3.878 seconds, Fetched 1 row(s)
The result is as follows:
Found 9 items
-rw-r--r-- 2 root supergroup 1329538 2022-08-05 15:20 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01665-ddbdd7a1-ce62-469c-9082-955eb82288d5.metadata.json
-rw-r--r-- 2 root supergroup 10978 2022-08-05 15:20 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01666-4541c2c0-0479-45ca-98f0-fa047047f7d5.metadata.json
-rw-r--r-- 2 root supergroup 11833 2022-08-05 15:21 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01667-d99a28ca-8564-43d3-97a6-9d6ffaa65ba5.metadata.json
-rw-r--r-- 2 root supergroup 12694 2022-08-05 15:22 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01668-8e04d8c8-38cc-4b1b-81bc-eb0f9fbcfa5f.metadata.json
-rw-r--r-- 2 root supergroup 3237 2022-08-05 15:22 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01669-6206ddc5-830a-4962-bc1a-209c991d6ac7.metadata.json
-rw-r--r-- 2 root supergroup 4097 2022-08-05 15:23 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01670-92917778-5f30-4ead-942d-0f05915cb398.metadata.json
-rw-r--r-- 2 root supergroup 6798 2022-08-04 17:11 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro
-rw-r--r-- 2 root supergroup 4329 2022-08-05 15:22 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-1911393555202333427-1-fe0fb043-eef8-4755-a34c-91e3d8d94f9a.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:23 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-3064118549879781557-1-5e763c87-757a-46fa-8646-afff2476a51e.avro
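The two calls in this section differ only in the last argument. As I understand the documented behavior of expire_snapshots, a snapshot is removed only if it is older than the timestamp argument and is not among the most recent snapshots protected by the retain count. The Python sketch below models just that rule on a linear history; it is not Iceberg's implementation, it ignores branches, tags and table properties, and the snapshot ids and start time are synthetic.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Snapshot = Tuple[int, datetime]  # (snapshot id, commit time); ids here are synthetic


def select_expired(snapshots: List[Snapshot], older_than: datetime, retain_last: int) -> List[int]:
    """Ids of snapshots older than `older_than` that are not among the newest `retain_last`."""
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    protected = {sid for sid, _ in newest_first[:retain_last]}
    return [sid for sid, ts in snapshots if ts < older_than and sid not in protected]


# Twelve synthetic snapshots, one per minute, mimicking a checkpoint every minute.
start = datetime(2022, 8, 5, 15, 11)
snapshots = [(i, start + timedelta(minutes=i)) for i in range(12)]

if __name__ == "__main__":
    far_future = datetime(2023, 8, 6)
    print(len(select_expired(snapshots, far_future, 10)))  # 2
    print(len(select_expired(snapshots, far_future, 1)))   # 11
```

With a cutoff far in the future, the retain count alone decides how many snapshots survive, which is why the timestamp in these calls can be any date later than the newest snapshot.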
2. If there are multiple manifests, how many snapshots can be deleted?
2.1 Add data
Write several more rows through Kafka:
[root@hadoop101 ~]# kafka-console-producer.sh --broker-list hadoop101:9092,hadoop102:9092,hadoop103:9092 --topic test2_xxzh
>22,20220802
>3,20220803
>4,20220804
>5,20220805
>6,20220806
>7,20220807
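Each message is a comma-separated pair whose second field matches the dt column in the query results, and the data directory is laid out as dt=... partition folders. A small Python sketch, assuming that "data,dt" message layout, shows which partition directories these six messages should land in; `group_by_partition` is a name invented for this example.

```python
from collections import defaultdict

# The six messages typed into the console producer above.
messages = [
    "22,20220802",
    "3,20220803",
    "4,20220804",
    "5,20220805",
    "6,20220806",
    "7,20220807",
]


def group_by_partition(lines):
    """Group the `data` values by the partition directory name dt=<dt>."""
    partitions = defaultdict(list)
    for line in lines:
        data, dt = line.split(",", 1)
        partitions[f"dt={dt}"].append(data)
    return dict(partitions)


if __name__ == "__main__":
    for partition, values in sorted(group_by_partition(messages).items()):
        print(partition, values)
```

The row 22 shares dt=20220802 with a row from the previous lesson, so that partition is expected to end up with more than one data file.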
The number of snapshots keeps growing:
[root@hadoop103 conf]# hadoop fs -ls -R hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/
drwxrwxrwx - root supergroup 0 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data
drwxrwxrwx - root supergroup 0 2022-08-04 16:12 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220801
-rw-r--r-- 2 root supergroup 672 2022-08-04 16:12 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220801/00001-0-989c0c01-b69d-4c66-8c74-7a1a4be08f71-00001.parquet
drwxrwxrwx - root supergroup 0 2022-08-05 15:38 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220802
-rw-r--r-- 2 root supergroup 680 2022-08-05 15:38 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220802/00001-0-52c0f221-3908-447d-9441-ed0be045c3ca-00001.parquet
-rw-r--r-- 2 root supergroup 673 2022-08-04 16:46 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220802/00001-0-989c0c01-b69d-4c66-8c74-7a1a4be08f71-00002.parquet
drwxrwxrwx - root supergroup 0 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220803
-rw-r--r-- 2 root supergroup 673 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220803/00000-0-0b872a75-1956-49e4-9093-e4e418eace05-00001.parquet
drwxrwxrwx - root supergroup 0 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220804
-rw-r--r-- 2 root supergroup 673 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220804/00001-0-52c0f221-3908-447d-9441-ed0be045c3ca-00002.parquet
drwxrwxrwx - root supergroup 0 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220805
-rw-r--r-- 2 root supergroup 672 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220805/00001-0-52c0f221-3908-447d-9441-ed0be045c3ca-00003.parquet
drwxrwxrwx - root supergroup 0 2022-08-05 15:43 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220806
-rw-r--r-- 2 root supergroup 673 2022-08-05 15:43 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220806/00001-0-52c0f221-3908-447d-9441-ed0be045c3ca-00004.parquet
drwxrwxrwx - root supergroup 0 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220807
-rw-r--r-- 2 root supergroup 673 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/data/dt=20220807/00000-0-0b872a75-1956-49e4-9093-e4e418eace05-00002.parquet
drwxrwxrwx - root supergroup 0 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata
-rw-r--r-- 2 root supergroup 25977 2022-08-05 15:41 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01695-ff9dfa85-4a33-4557-bd1d-cdc230fa605f.metadata.json
-rw-r--r-- 2 root supergroup 26837 2022-08-05 15:42 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01696-f26649f4-877d-4656-a90f-bc72a0f1735d.metadata.json
-rw-r--r-- 2 root supergroup 27697 2022-08-05 15:43 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01697-3bad18ac-7b26-4496-9ac0-114cfa49cbaa.metadata.json
-rw-r--r-- 2 root supergroup 28652 2022-08-05 15:43 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01698-ef6f5e15-a2ec-44cc-ab4c-9297c0ca3321.metadata.json
-rw-r--r-- 2 root supergroup 29512 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01699-cfb0ddd6-5c85-499c-866c-9d200eafe965.metadata.json
-rw-r--r-- 2 root supergroup 30467 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01700-ed75dcf1-c58b-444e-b8b5-b6085370c535.metadata.json
-rw-r--r-- 2 root supergroup 6754 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/1df9c902-caf9-4703-8c22-1d6a9f7de154-m0.avro
-rw-r--r-- 2 root supergroup 6756 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/765c23a8-266b-4545-94c1-a0f446a5775e-m0.avro
-rw-r--r-- 2 root supergroup 6755 2022-08-05 15:38 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7a047525-4ecd-46d9-a4ec-9b7321323cfc-m0.avro
-rw-r--r-- 2 root supergroup 6755 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7b90e539-a1d9-4f47-83df-76fdb723de45-m0.avro
-rw-r--r-- 2 root supergroup 6755 2022-08-05 15:43 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/91501f41-1db7-4307-a2c0-c8041bb936eb-m0.avro
-rw-r--r-- 2 root supergroup 6753 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro
-rw-r--r-- 2 root supergroup 6798 2022-08-04 17:11 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:27 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-1035563423599163544-1-2b706b4b-0b08-4f51-9216-1eb64e888188.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:32 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-1737338957084956478-1-be16b42a-2eb1-4bd3-8728-de6f83e67d66.avro
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:28 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-1782562210015627548-1-0a830678-95ed-4cf9-979f-f8dd2f965776.avro
-rw-r--r-- 2 root supergroup 4329 2022-08-05 15:22 4521-b5b9-9b854dce0664.avro
... (many lines in the middle omitted) ...
-rw-r--r-- 2 root supergroup 4330 2022-08-05 15:33 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-8912083785901975773-1-f6a58ef7-03c5-4792-a58c-7bbbffad7b79.avro
2.2 Execute the delete command
Expire snapshots until only 1 snapshot is kept:
spark-sql (default)>
> CALL spark_catalog.system.expire_snapshots('ods_base.IcebergTest7_XXZH', TIMESTAMP '2823-08-06 00:00:00.000', 1);
22/08/05 15:47:24 WARN HiveConf: HiveConf of name hive.metastore.event.db.notification.api.auth does not exist
deleted_data_files_count deleted_position_delete_files_count deleted_equality_delete_files_count deleted_manifest_files_count deleted_manifest_lists_count
0 0 0 0 34
Time taken: 17.441 seconds, Fetched 1 row(s)
The result is as follows. Only 2 snap files are left, while the m0 files all remain, one for each write.
Found 15 items
-rw-r--r-- 2 root supergroup 30467 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01700-ed75dcf1-c58b-444e-b8b5-b6085370c535.metadata.json
-rw-r--r-- 2 root supergroup 31327 2022-08-05 15:45 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01701-117cd2ed-3480-4842-a7dd-4c90eaab83ab.metadata.json
-rw-r--r-- 2 root supergroup 32182 2022-08-05 15:46 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01702-0ef6b60a-43aa-4f3a-a636-34298644ebce.metadata.json
-rw-r--r-- 2 root supergroup 33043 2022-08-05 15:47 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01703-00fb5e1d-29af-4731-adbe-7f1805c20cd1.metadata.json
-rw-r--r-- 2 root supergroup 3237 2022-08-05 15:47 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01704-fb9cc6cb-1492-45b6-8dcb-218c3d56d08b.metadata.json
-rw-r--r-- 2 root supergroup 4097 2022-08-05 15:47 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/01705-c6d98e57-d748-460b-93a5-fdfb0d557d67.metadata.json
-rw-r--r-- 2 root supergroup 6754 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/1df9c902-caf9-4703-8c22-1d6a9f7de154-m0.avro
-rw-r--r-- 2 root supergroup 6756 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/765c23a8-266b-4545-94c1-a0f446a5775e-m0.avro
-rw-r--r-- 2 root supergroup 6755 2022-08-05 15:38 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7a047525-4ecd-46d9-a4ec-9b7321323cfc-m0.avro
-rw-r--r-- 2 root supergroup 6755 2022-08-05 15:40 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7b90e539-a1d9-4f47-83df-76fdb723de45-m0.avro
-rw-r--r-- 2 root supergroup 6755 2022-08-05 15:43 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/91501f41-1db7-4307-a2c0-c8041bb936eb-m0.avro
-rw-r--r-- 2 root supergroup 6753 2022-08-05 15:44 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro
-rw-r--r-- 2 root supergroup 6798 2022-08-04 17:11 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro
-rw-r--r-- 2 root supergroup 4649 2022-08-05 15:47 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-3511522569100173178-1-59ecc32e-2aa7-48c6-bbb9-f5aa4b31b442.avro
-rw-r--r-- 2 root supergroup 4650 2022-08-05 15:47 hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/snap-4956000582044241255-1-4fa34632-810d-48e7-a470-d5f9c6735f7d.avro
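To tally the listing by file type, the names can be classified by the patterns visible in this directory: *.metadata.json for table metadata, snap-*.avro for manifest lists, and *-m0.avro for manifests. The Python sketch below relies only on those observed naming patterns, which are an assumption about this table rather than a guarantee; `classify` is a made-up helper.

```python
import re
from collections import Counter


def classify(path: str) -> str:
    """Classify a file by its name, using the patterns seen in the listing above."""
    name = path.rsplit("/", 1)[-1]
    if name.endswith(".metadata.json"):
        return "table metadata"
    if re.fullmatch(r"snap-\d+-\d+-[0-9a-f-]+\.avro", name):
        return "manifest list"
    if re.fullmatch(r"[0-9a-f-]+-m\d+\.avro", name):
        return "manifest"
    if name.endswith(".parquet"):
        return "data file"
    return "other"


# File names from the 15-item listing above (directory prefix dropped).
names = [
    "01700-ed75dcf1-c58b-444e-b8b5-b6085370c535.metadata.json",
    "01701-117cd2ed-3480-4842-a7dd-4c90eaab83ab.metadata.json",
    "01702-0ef6b60a-43aa-4f3a-a636-34298644ebce.metadata.json",
    "01703-00fb5e1d-29af-4731-adbe-7f1805c20cd1.metadata.json",
    "01704-fb9cc6cb-1492-45b6-8dcb-218c3d56d08b.metadata.json",
    "01705-c6d98e57-d748-460b-93a5-fdfb0d557d67.metadata.json",
    "1df9c902-caf9-4703-8c22-1d6a9f7de154-m0.avro",
    "765c23a8-266b-4545-94c1-a0f446a5775e-m0.avro",
    "7a047525-4ecd-46d9-a4ec-9b7321323cfc-m0.avro",
    "7b90e539-a1d9-4f47-83df-76fdb723de45-m0.avro",
    "91501f41-1db7-4307-a2c0-c8041bb936eb-m0.avro",
    "ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro",
    "fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro",
    "snap-3511522569100173178-1-59ecc32e-2aa7-48c6-bbb9-f5aa4b31b442.avro",
    "snap-4956000582044241255-1-4fa34632-810d-48e7-a470-d5f9c6735f7d.avro",
]

if __name__ == "__main__":
    for kind, count in sorted(Counter(classify(n) for n in names).items()):
        print(kind, count)
```

For this listing the tally is 6 table metadata files, 7 manifests and 2 manifest lists, which adds up to the 15 items shown.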
After the snapshots are deleted, the data is intact:
spark-sql (default)> select * from ods_base.IcebergTest7_XXZH;
data dt
1 20220801
2 20220802
4 20220804
7 20220807
6 20220806
3 20220803
5 20220805
22 20220802
Time taken: 1.649 seconds, Fetched 8 row(s)
3. Metadata characteristics (view snap content)
3.1 The content of the first snap file, which references 7 m0 files
[root@hadoop103 snap]# java -jar /opt/software/avro-tools-1.11.0.jar tojson --pretty snap-3511522569100173178-1-59ecc32e-2aa7-48c6-bbb9-f5aa4b31b442.avro
22/08/05 15:58:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro",
"manifest_length" : 6753,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1695,
"min_sequence_number" : 1695,
"added_snapshot_id" : 2845024222990467689,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220807"
},
"upper_bound" : {
"bytes" : "20220807"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/91501f41-1db7-4307-a2c0-c8041bb936eb-m0.avro",
"manifest_length" : 6755,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1693,
"min_sequence_number" : 1693,
"added_snapshot_id" : 3476529947294323623,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220806"
},
"upper_bound" : {
"bytes" : "20220806"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/765c23a8-266b-4545-94c1-a0f446a5775e-m0.avro",
"manifest_length" : 6756,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1689,
"min_sequence_number" : 1689,
"added_snapshot_id" : 5462232017147497616,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220805"
},
"upper_bound" : {
"bytes" : "20220805"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7b90e539-a1d9-4f47-83df-76fdb723de45-m0.avro",
"manifest_length" : 6755,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1688,
"min_sequence_number" : 1688,
"added_snapshot_id" : 3246455649213713509,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220804"
},
"upper_bound" : {
"bytes" : "20220804"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/1df9c902-caf9-4703-8c22-1d6a9f7de154-m0.avro",
"manifest_length" : 6754,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1687,
"min_sequence_number" : 1687,
"added_snapshot_id" : 4917712002051492927,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220803"
},
"upper_bound" : {
"bytes" : "20220803"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7a047525-4ecd-46d9-a4ec-9b7321323cfc-m0.avro",
"manifest_length" : 6755,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1684,
"min_sequence_number" : 1684,
"added_snapshot_id" : 3096920793835932503,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220802"
},
"upper_bound" : {
"bytes" : "20220802"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro",
"manifest_length" : 6798,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 84,
"min_sequence_number" : 12,
"added_snapshot_id" : 8562765270417336551,
"added_data_files_count" : 0,
"existing_data_files_count" : 2,
"deleted_data_files_count" : 0,
"added_rows_count" : 0,
"existing_rows_count" : 2,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220801"
},
"upper_bound" : {
"bytes" : "20220802"
}
} ]
}
}
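The tojson output is a sequence of JSON objects, one per manifest, with nothing separating them. A minimal Python sketch for reading such output and adding up the live rows (added plus existing) per manifest follows. It assumes the WARN log line is not part of the input, and the sample text is a trimmed copy of two of the records above with only the file name kept in manifest_path; `iter_json_records` and `live_rows` are made-up names.

```python
import json


def iter_json_records(text):
    """Yield each top-level JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        record, pos = decoder.raw_decode(text, pos)
        yield record


def live_rows(record):
    """Rows still referenced by a manifest: added plus existing."""
    return record["added_rows_count"] + record["existing_rows_count"]


# Trimmed sample: two of the seven records, with most fields removed.
sample = """
{
  "manifest_path" : "ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro",
  "added_rows_count" : 1,
  "existing_rows_count" : 0
}
{
  "manifest_path" : "fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro",
  "added_rows_count" : 0,
  "existing_rows_count" : 2
}
"""

if __name__ == "__main__":
    for rec in iter_json_records(sample):
        print(rec["manifest_path"], live_rows(rec))
```

Applied to all seven records above, the sum is 6 x 1 + 2 = 8 rows, which agrees with the 8 rows returned by the select in section 2.2.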
The second snap file also references 7 m0 files:
[root@hadoop103 snap]# java -jar /opt/software/avro-tools-1.11.0.jar tojson --pretty snap-4956000582044241255-1-4fa34632-810d-48e7-a470-d5f9c6735f7d.avro
22/08/05 16:15:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro",
"manifest_length" : 6753,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1695,
"min_sequence_number" : 1695,
"added_snapshot_id" : 2845024222990467689,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220807"
},
"upper_bound" : {
"bytes" : "20220807"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/91501f41-1db7-4307-a2c0-c8041bb936eb-m0.avro",
"manifest_length" : 6755,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1693,
"min_sequence_number" : 1693,
"added_snapshot_id" : 3476529947294323623,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220806"
},
"upper_bound" : {
"bytes" : "20220806"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/765c23a8-266b-4545-94c1-a0f446a5775e-m0.avro",
"manifest_length" : 6756,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1689,
"min_sequence_number" : 1689,
"added_snapshot_id" : 5462232017147497616,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220805"
},
"upper_bound" : {
"bytes" : "20220805"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7b90e539-a1d9-4f47-83df-76fdb723de45-m0.avro",
"manifest_length" : 6755,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1688,
"min_sequence_number" : 1688,
"added_snapshot_id" : 3246455649213713509,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220804"
},
"upper_bound" : {
"bytes" : "20220804"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/1df9c902-caf9-4703-8c22-1d6a9f7de154-m0.avro",
"manifest_length" : 6754,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1687,
"min_sequence_number" : 1687,
"added_snapshot_id" : 4917712002051492927,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220803"
},
"upper_bound" : {
"bytes" : "20220803"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7a047525-4ecd-46d9-a4ec-9b7321323cfc-m0.avro",
"manifest_length" : 6755,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1684,
"min_sequence_number" : 1684,
"added_snapshot_id" : 3096920793835932503,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220802"
},
"upper_bound" : {
"bytes" : "20220802"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro",
"manifest_length" : 6798,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 84,
"min_sequence_number" : 12,
"added_snapshot_id" : 8562765270417336551,
"added_data_files_count" : 0,
"existing_data_files_count" : 2,
"deleted_data_files_count" : 0,
"added_rows_count" : 0,
"existing_rows_count" : 2,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220801"
},
"upper_bound" : {
"bytes" : "20220802"
}
} ]
}
}
The third snap file (generated automatically one minute later)
[root@hadoop103 snap]# java -jar /opt/software/avro-tools-1.11.0.jar tojson --pretty snap-5188266964869746455-1-abd5575f-12fc-43fd-bc9e-5cb69a0e03df.avro
22/08/05 16:19:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro",
"manifest_length" : 6753,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1695,
"min_sequence_number" : 1695,
"added_snapshot_id" : 2845024222990467689,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220807"
},
"upper_bound" : {
"bytes" : "20220807"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/91501f41-1db7-4307-a2c0-c8041bb936eb-m0.avro",
"manifest_length" : 6755,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1693,
"min_sequence_number" : 1693,
"added_snapshot_id" : 3476529947294323623,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220806"
},
"upper_bound" : {
"bytes" : "20220806"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/765c23a8-266b-4545-94c1-a0f446a5775e-m0.avro",
"manifest_length" : 6756,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1689,
"min_sequence_number" : 1689,
"added_snapshot_id" : 5462232017147497616,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220805"
},
"upper_bound" : {
"bytes" : "20220805"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7b90e539-a1d9-4f47-83df-76fdb723de45-m0.avro",
"manifest_length" : 6755,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1688,
"min_sequence_number" : 1688,
"added_snapshot_id" : 3246455649213713509,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220804"
},
"upper_bound" : {
"bytes" : "20220804"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/1df9c902-caf9-4703-8c22-1d6a9f7de154-m0.avro",
"manifest_length" : 6754,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1687,
"min_sequence_number" : 1687,
"added_snapshot_id" : 4917712002051492927,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220803"
},
"upper_bound" : {
"bytes" : "20220803"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/7a047525-4ecd-46d9-a4ec-9b7321323cfc-m0.avro",
"manifest_length" : 6755,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 1684,
"min_sequence_number" : 1684,
"added_snapshot_id" : 3096920793835932503,
"added_data_files_count" : 1,
"existing_data_files_count" : 0,
"deleted_data_files_count" : 0,
"added_rows_count" : 1,
"existing_rows_count" : 0,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220802"
},
"upper_bound" : {
"bytes" : "20220802"
}
} ]
}
}
{
"manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro",
"manifest_length" : 6798,
"partition_spec_id" : 0,
"content" : 0,
"sequence_number" : 84,
"min_sequence_number" : 12,
"added_snapshot_id" : 8562765270417336551,
"added_data_files_count" : 0,
"existing_data_files_count" : 2,
"deleted_data_files_count" : 0,
"added_rows_count" : 0,
"existing_rows_count" : 2,
"deleted_rows_count" : 0,
"partitions" : {
"array" : [ {
"contains_null" : false,
"contains_nan" : {
"boolean" : false
},
"lower_bound" : {
"bytes" : "20220801"
},
"upper_bound" : {
"bytes" : "20220802"
}
} ]
}
}
3.2 Summary: what the snap file points to
A snap file (the manifest list) points to multiple m0 files (manifests), and each m0 file points to data files, so all of the table's file information can be reached by starting from the snap file.
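To make the snap -> m0 relationship easier to read than the raw dump, the output of `avro-tools tojson --pretty` can be condensed into one line per m0 file. The following is only a minimal Python sketch, not part of the original test: the function names `iter_json_objects` and `summarize` are hypothetical, the sample text is two records copied from the dump above and trimmed to the fields used, and counting "added + existing" as the live data files of a manifest is my reading of these fields.

```python
import json

# Two records copied from the snap dump above, trimmed to the fields used here.
SNAP_DUMP = """
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/ee7f5d4c-2c3f-4d87-9dab-a89b703dd2e1-m0.avro",
  "sequence_number" : 1695,
  "added_data_files_count" : 1,
  "existing_data_files_count" : 0,
  "deleted_data_files_count" : 0
}
{
  "manifest_path" : "hdfs://ns/user/hive/warehouse/hive_iceberg_catalog/ods_base.db/IcebergTest7_XXZH/metadata/fe4c8846-b07c-42e4-98c2-68aed69fbfd0-m0.avro",
  "sequence_number" : 84,
  "added_data_files_count" : 0,
  "existing_data_files_count" : 2,
  "deleted_data_files_count" : 0
}
"""


def iter_json_objects(text):
    """Yield each JSON object from text that holds several objects back to back.

    avro-tools prints one object per record rather than a JSON array,
    so json.loads on the whole text would fail.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def summarize(text):
    """Return one small dict per manifest (m0 file) listed in the snap dump."""
    rows = []
    for entry in iter_json_objects(text):
        rows.append({
            # Keep only the file name; the directory is the same for every entry.
            "manifest": entry["manifest_path"].rsplit("/", 1)[-1],
            "sequence_number": entry["sequence_number"],
            # Assumption: added + existing entries are the live data files.
            "data_files": entry["added_data_files_count"]
                          + entry["existing_data_files_count"],
        })
    return rows


MANIFESTS = summarize(SNAP_DUMP)
for row in MANIFESTS:
    print(row)
```

In practice the text would come from redirecting the avro-tools output to a file (without the WARN log line) and reading that file instead of the inline sample.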
4. After the m0 file is merged, does the subsequent m0 file start from the merged file?
To be continued in the next lesson.