Iceberg from Beginner to Proficient, Part 10: Inserting Data into Iceberg Tables with Flink SQL and Querying in Batch and Streaming Modes

1. INSERT INTO

CREATE TABLE `stu` (id int, name string, age int)
PARTITIONED BY (age);

insert into stu values (3,'杀sheng',16), (4,'鸣人',19);
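Besides VALUES, INSERT INTO also accepts a query, which is the usual way to copy rows from another table. The following is a minimal sketch. It assumes a hypothetical source table `stu_src` with the same columns and illustrative rows. Because `stu` is partitioned by `age`, each selected row lands in the partition that matches its `age` value.

```sql
-- Hypothetical source table, used only for this illustration
CREATE TABLE stu_src (id int, name string, age int);

INSERT INTO stu_src VALUES (5, 'Sasuke', 19), (6, 'Sakura', 18);

-- Append the selected rows to stu; each row goes to the partition of its age
INSERT INTO stu SELECT id, name, age FROM stu_src WHERE age >= 18;
```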

2. INSERT OVERWRITE

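INSERT OVERWRITE is supported only in Flink's batch mode, so switch the runtime mode first. When a PARTITION clause is given, as in the second statement, only that static partition is replaced. The SELECT then supplies the values for the remaining columns.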

SET execution.runtime-mode = batch;

INSERT OVERWRITE sample VALUES (1,'a');

INSERT OVERWRITE `hive_catalog`.`default`.`sample` PARTITION(data='a') SELECT 6;
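Without a PARTITION clause, the result depends on the table:

- An unpartitioned table is overwritten entirely.
- A partitioned table gets a dynamic overwrite. Only the partitions that receive rows from the query are replaced, and every other partition is left untouched.

The sketch below runs against the `stu` table from section 1, which is partitioned by `age`. The row values are illustrative.

```sql
SET execution.runtime-mode = batch;

-- Dynamic overwrite: only partition age=19 is replaced,
-- rows in the age=16 partition are kept
INSERT OVERWRITE stu VALUES (7, 'Gaara', 19);
```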

3. UPSERT

Iceberg supports primary-key-based UPSERT when writing to format-version 2 tables. There are two ways to enable it.

The first is to set it as a table property when creating the table:

CREATE TABLE `hive_catalog`.`test`.`sample5`(
`id` INT UNIQUE COMMENT 'unique id',
`data` STRING NOT NULL,
PRIMARY KEY(`id`) NOT ENFORCED
) with (
'format-version'='2',
'write.upsert.enabled'='true'
);
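The second way is to enable upsert for a single write with the `upsert-enabled` write option, passed as a SQL hint. The target must still be a format-version 2 table with a primary key. The table name `sample6` below is hypothetical. Note that OVERWRITE and UPSERT cannot be used together.

```sql
-- v2 table with a primary key, but without 'write.upsert.enabled'
CREATE TABLE `hive_catalog`.`test`.`sample6` (
  `id` INT COMMENT 'unique id',
  `data` STRING NOT NULL,
  PRIMARY KEY (`id`) NOT ENFORCED
) WITH ('format-version'='2');

SET table.dynamic-table-options.enabled=true;

-- Upsert is enabled only for this INSERT statement
INSERT INTO `hive_catalog`.`test`.`sample6` /*+ OPTIONS('upsert-enabled'='true') */
VALUES (1, 'a');
```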

In UPSERT mode, if the table is partitioned, the partition fields must be included in the primary key.
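For example, a partitioned upsert table is valid when its partition column is also part of the primary key. The sketch below uses a hypothetical table `sample_by_day` that is partitioned by a `dt` column.

```sql
CREATE TABLE `hive_catalog`.`test`.`sample_by_day` (
  `id` INT,
  `dt` STRING,
  `data` STRING,
  -- dt is the partition column, so it must be part of the primary key
  PRIMARY KEY (`id`, `dt`) NOT ENFORCED
) PARTITIONED BY (`dt`)
WITH ('format-version'='2', 'write.upsert.enabled'='true');
```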

insert into sample5 values(1,'a');
insert into sample5 values(2,'b');
SET sql-client.execution.result-mode=tableau;
select * from sample5;
insert into sample5 values(2,'c');
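To confirm the upsert, query the table again after the last insert. Because `id` is the primary key, the row (2,'c') replaces (2,'b') instead of being added next to it. The result should therefore contain only two rows.

```sql
SET execution.runtime-mode = batch;
select * from sample5;
-- expected rows: (1, 'a') and (2, 'c')
```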

4. Query in Batch mode

Batch mode:

SET execution.runtime-mode = batch;
select * from sample;
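A batch query can also read an older version of the table through Iceberg's read options, passed as SQL hints. The sketch below uses placeholder values that you should replace with values from your own table:

- The snapshot ID is the one that appears later in this post.
- The timestamp is an arbitrary epoch value in milliseconds.

```sql
SET execution.runtime-mode = batch;
SET table.dynamic-table-options.enabled=true;

-- Read the table as of a given snapshot (placeholder ID)
SELECT * FROM sample /*+ OPTIONS('snapshot-id'='384023852058202')*/;

-- Read the most recent snapshot as of a time in milliseconds (placeholder value)
SELECT * FROM sample /*+ OPTIONS('as-of-timestamp'='1688000000000')*/;
```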

5. Query in Streaming mode

Streaming mode:

SET execution.runtime-mode = streaming;
SET table.dynamic-table-options.enabled=true;
SET sql-client.execution.result-mode=tableau;

Read all records in the current snapshot, then continuously read the incremental data committed after that snapshot:

SELECT * FROM sample /*+ OPTIONS('streaming'='true','monitor-interval'='1s')*/;

Read only the incremental data committed after the specified snapshot ID (data from that snapshot itself is excluded):

SELECT * FROM sample /*+ OPTIONS('streaming'='true','monitor-interval'='1s','start-snapshot-id'='384023852058202')*/;
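The snapshot IDs used with `start-snapshot-id` can be looked up in the table's `snapshots` metadata table. This sketch assumes an Iceberg release whose Flink integration exposes metadata tables through the `table$snapshots` suffix. Older releases may not support this.

```sql
SET execution.runtime-mode = batch;

-- One row per snapshot, oldest first; snapshot_id is the value for 'start-snapshot-id'
SELECT committed_at, snapshot_id, parent_id, operation
FROM `sample$snapshots`
ORDER BY committed_at;
```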

6. Read the Kafka stream and insert it into the Iceberg table

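Download the flink-connector-kafka JAR matching your Flink version (1.17.1 here) and put it in Flink's lib directory. This thin JAR also needs the kafka-clients JAR on the classpath. The bundled flink-sql-connector-kafka JAR avoids that extra step.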

https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka/1.17.1

Create the Iceberg table:

CREATE TABLE `hive_catalog`.`test1`.`sample5`(
`id` INT UNIQUE COMMENT 'unique id',
`data` STRING NOT NULL,
PRIMARY KEY(`id`) NOT ENFORCED
);

Create a table mapped to the Kafka topic:

create table default_catalog.default_database.kafka(
id int,
data string
) with(
'connector' = 'kafka',
'topic' = 'testKafkaTopic',
'properties.zookeeper.connect'='hadoop1:2101',
'properties.bootstrap.servers' = 'hadoop1:9092',
'format' = 'json',
'properties.group.id'='iceberg',
'scan.startup.mode'='earliest-offset'
);

Switch the SQL client to streaming mode:

SET sql-client.execution.result-mode=tableau;
SET execution.runtime-mode = streaming;
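Iceberg's Flink sink commits the data files it has written only when a Flink checkpoint completes. Without checkpointing, the streaming INSERT keeps running but never creates a new snapshot, so nothing becomes visible to queries. The setting below is a minimal sketch that assumes a 10-second interval is acceptable for testing. The interval also bounds how stale the table can be.

```sql
-- Each completed checkpoint commits a new Iceberg snapshot
SET 'execution.checkpointing.interval' = '10s';
```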

Insert the Kafka data into the Iceberg table:

insert into hive_catalog.test1.sample5 select * from default_catalog.default_database.kafka;

Query the table as a stream:

SELECT * FROM hive_catalog.test1.sample5 /*+ OPTIONS('streaming'='true','monitor-interval'='1s')*/;

As new messages arrive on the topic, they are written to the Iceberg table, and the streaming query keeps returning the newly committed rows.


Source: blog.csdn.net/zhengzaifeidelushang/article/details/131484428