Flink 1.10 SQL: Reading and Writing Kafka

Because of the epidemic I slacked off for quite a while; now I am finally getting back to Flink SQL.


The Flink project on my machine has already been upgraded to 1.10, and I have been reading the new official documentation lately, so I took advantage of the weekend to try out the new version of the SQL API (and step on some pits).

I started directly from the earlier Flink SQL example by 云邪 (the pom had already been sorted out).

To recap the simple example: receive user behavior data from Kafka, group it by time, compute PV and UV, and write the results to MySQL.

First, the dependencies to add:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table</artifactId>
    <version>${flink.version}</version>
    <type>pom</type>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java-bridge_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<!-- or... -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-scala-bridge_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-common</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-scala_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner-blink_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-jdbc_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-csv</artifactId>
    <version>${flink.version}</version>
</dependency>

These are the Table-related dependencies; note the several new ones, such as flink-jdbc_2.11-1.10.0.jar.
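
Not listed above, but the Kafka connector and the json format used below also need their dependencies on the classpath. Assuming the standard Flink 1.10 artifacts, that would look roughly like this:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-json</artifactId>
    <version>${flink.version}</version>
</dependency>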

Next, the corresponding SQL file:

--sourceTable
CREATE TABLE user_log (
    user_id VARCHAR,
    item_id VARCHAR,
    category_id VARCHAR,
    behavior VARCHAR,
    ts TIMESTAMP(3)
) WITH (
    'connector.type' = 'kafka',
    'connector.version' = 'universal',
    'connector.topic' = 'user_behavior',
    'connector.startup-mode' = 'earliest-offset',
    'connector.properties.0.key' = 'zookeeper.connect',
    'connector.properties.0.value' = 'venn:2181',
    'connector.properties.1.key' = 'bootstrap.servers',
    'connector.properties.1.value' = 'venn:9092',
    'update-mode' = 'append',
    'format.type' = 'json',
    'format.derive-schema' = 'true'
);

--sinkTable
CREATE TABLE pvuv_sink (
    dt VARCHAR,
    pv BIGINT,
    uv BIGINT
) WITH (
    'connector.type' = 'jdbc',
    'connector.url' = 'jdbc:mysql://venn:3306/venn',
    'connector.table' = 'pvuv_sink',
    'connector.username' = 'root',
    'connector.password' = '123456',
    'connector.write.flush.max-rows' = '1'
);
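
One thing to keep in mind: the JDBC connector writes into an existing MySQL table, so a matching table has to be created in the venn database beforehand. A minimal sketch of that DDL (the column length is my own assumption):

-- MySQL side (not Flink SQL): target table for the JDBC sink
CREATE TABLE pvuv_sink (
    dt VARCHAR(32),
    pv BIGINT,
    uv BIGINT
);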

--insert
INSERT INTO pvuv_sink(dt, pv, uv)
SELECT
  DATE_FORMAT(ts, 'yyyy-MM-dd HH:00') dt,
  COUNT(*) AS pv,
  COUNT(DISTINCT user_id) AS uv
FROM user_log
GROUP BY DATE_FORMAT(ts, 'yyyy-MM-dd HH:00');

Run it.

The first problem encountered is: "Type TIMESTAMP(6) of table field 'ts' does not match with the physical type TIMESTAMP(3) of the 'ts' field of the TableSource return type"

It turns out TIMESTAMP defaults to TIMESTAMP(6), which does not match the physical type derived from the source data ("ts": "2017-11-26T01:00:01Z"), so I changed the column definition to ts TIMESTAMP(3) and that fixed it.
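
For reference, a record in the source topic looks roughly like this (the field values are only illustrative; the ts format is the one shown above):

{"user_id": "1001", "item_id": "2001", "category_id": "3001", "behavior": "pv", "ts": "2017-11-26T01:00:01Z"}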

If there are no other pits, it can be executed directly and the data is written to MySQL.

 

 

After getting started from the SQL connector, let's look at Kafka next. In Flink 1.10 SQL, the Kafka connector only supports three formats: csv, json and avro. (I tried json and csv.)

There are two SQL programs, covering reading and writing json and csv.

First, modify the sink table SQL above directly to write to Kafka:

--sinkTable
CREATE TABLE user_log_sink (
    dt VARCHAR,
    pv BIGINT,
    uv BIGINT
) WITH (
    'connector.type' = 'kafka',
    'connector.version' = 'universal',
    'connector.topic' = 'user_behavior_sink',
    'connector.properties.zookeeper.connect' = 'venn:2181',
    'connector.properties.bootstrap.servers' = 'venn:9092',
    'update-mode' = 'append',
    'format.type' = 'json'
);

However, it cannot be executed.

It reported the following error:

AppendStreamTableSink requires that Table has only insert changes.

WTF, 'update-mode' above is clearly set to 'append'.

Then I fumbled around for a while without getting anywhere: re-reading the official documentation, modifying the SQL, tweaking the configuration...

I spent a lot of time here.

In the end, on a whim, I output the contents of the source directly, without any transformation:

--insert
INSERT INTO user_log_sink(user_id, item_id, category_id, behavior, ts)
SELECT user_id, item_id, category_id, behavior, ts
FROM user_log;

The sink table is modified accordingly:

--sinkTable
CREATE TABLE user_log_sink (
    user_id VARCHAR,
    item_id VARCHAR,
    category_id VARCHAR,
    behavior VARCHAR,
    ts TIMESTAMP(3)
) WITH (
    'connector.type' = 'kafka',
    'connector.version' = 'universal',
    'connector.topic' = 'user_behavior_sink_1',
    'connector.properties.zookeeper.connect' = 'venn:2181',
    'connector.properties.bootstrap.servers' = 'venn:9092',
    'update-mode' = 'append',
    'format.type' = 'json'
);

And that works.

After reading the official documentation more carefully the reason should become clear; presumably the GROUP BY aggregation produces an updating result rather than an append-only stream, while the Kafka sink is an AppendStreamTableSink that only accepts insert changes. (Note: to be confirmed.)

Then came the final pit.

When writing csv I hit the last pit: in previous versions I had been using "flink-shaded-jackson" version "2.7.9-3.0", but it does not contain CsvSchema, so this error appeared:

Caused by: java.lang.ClassNotFoundException: org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema$Builder

Replacing the flink-shaded-jackson version with "2.9.8-7.0", the version used by the Flink source code, fixed it.
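
In the pom that corresponds to something like this (I am assuming the usual groupId for the flink-shaded artifacts):

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-shaded-jackson</artifactId>
    <version>2.9.8-7.0</version>
</dependency>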

With that, writing both json and csv to the Kafka connector completes smoothly.

Finally, the full SQL:

--sourceTable
CREATE TABLE user_log(
    user_id VARCHAR,
    item_id VARCHAR,
    category_id VARCHAR,
    behavior VARCHAR,
    ts TIMESTAMP(3)
) WITH (
    'connector.type' = 'kafka',
    'connector.version' = 'universal',
    'connector.topic' = 'user_behavior',
    'connector.properties.zookeeper.connect' = 'venn:2181',
    'connector.properties.bootstrap.servers' = 'venn:9092',
    'connector.startup-mode' = 'earliest-offset',
    'format.type' = 'json'
--  'format.type' = 'csv'
);

--sinkTable
CREATE TABLE user_log_sink (
    user_id VARCHAR,
    item_id VARCHAR,
    category_id VARCHAR,
    behavior VARCHAR,
    ts TIMESTAMP(3)
) WITH (
    'connector.type' = 'kafka',
    'connector.version' = 'universal',
    'connector.topic' = 'user_behavior_sink',
    'connector.properties.zookeeper.connect' = 'venn:2181',
    'connector.properties.bootstrap.servers' = 'venn:9092',
    'update-mode' = 'append',
--  'format.type' = 'json'
    'format.type' = 'csv'
);

--insert
INSERT INTO user_log_sink(user_id, item_id, category_id, behavior, ts)
SELECT user_id, item_id, category_id, behavior, ts
FROM user_log;
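
For completeness, the SQL file is submitted from a small driver program. A minimal sketch of what that can look like with the 1.10 API (the class name and the file-splitting logic are my own, not from the original project):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class KafkaSqlJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Flink 1.10: Blink planner in streaming mode
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);

        // read the SQL file (path passed as the first argument) and run each statement
        String sql = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String statement : sql.split(";")) {
            if (statement.trim().isEmpty()) {
                continue;
            }
            tableEnv.sqlUpdate(statement);
        }

        // in 1.10 the INSERT job is triggered through the TableEnvironment
        tableEnv.execute("Flink 1.10 SQL kafka demo");
    }
}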

The SQL files have been uploaded to GitHub: Flink-rookic; the dependencies in pom.xml have also been updated there.

 

It has been a while since I wrote anything. Next I plan to simply try out the SQL connectors for kafka / mysql / hbase / es / file / hdfs, and then explore more of SQL.

 

Origin: www.cnblogs.com/Springmoon-venn/p/12498883.html