1. Overview
Repost: A custom Flink ClickHouseSink – writing data into ClickHouse
The functionality described here is available from Flink 1.11 onwards.
Faced with the need to write Kafka data into ClickHouse, this article shows how to do it with the Flink JDBC Connector.
Flink JDBC Connector
From the Flink JDBC source:
/**
 * Default JDBC dialects.
 */
public final class JdbcDialects {

    private static final List<JdbcDialect> DIALECTS = Arrays.asList(
        new DerbyDialect(),
        new MySQLDialect(),
        new PostgresDialect()
    );

It ships with three dialects (Derby, MySQL and PostgreSQL), but no ClickHouse support.
We will therefore implement a ClickHouse sink ourselves.
一、Download the Flink source and add a ClickHouseDialect file
Below is the code of the ClickHouseDialect file.
Note: ClickHouse does not support row-level deletes, so the getDeleteStatement and getUpdateStatement methods in this file both fall back to getInsertIntoStatement, i.e. every change is written as an insert. If you need real delete and update semantics, you can implement them yourself (one approach is to add a sign column to each row: 1 for an insert, -1 for a delete, and an update written as a -1 row followed by a +1 row).
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.connector.jdbc.dialect;

import org.apache.flink.connector.jdbc.internal.converter.ClickHouseRowConverter;
import org.apache.flink.connector.jdbc.internal.converter.JdbcRowConverter;
import org.apache.flink.table.types.logical.LogicalTypeRoot;
import org.apache.flink.table.types.logical.RowType;

import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/**
 * JDBC dialect for ClickHouse.
 */
public class ClickHouseDialect extends AbstractDialect {

    private static final long serialVersionUID = 1L;

    @Override
    public String dialectName() {
        return "ClickHouse";
    }

    @Override
    public boolean canHandle(String url) {
        return url.startsWith("jdbc:clickhouse:");
    }

    @Override
    public JdbcRowConverter getRowConverter(RowType rowType) {
        return new ClickHouseRowConverter(rowType);
    }

    @Override
    public Optional<String> defaultDriverName() {
        return Optional.of("ru.yandex.clickhouse.ClickHouseDriver");
    }

    @Override
    public String quoteIdentifier(String identifier) {
        return "`" + identifier + "`";
    }

    @Override
    public Optional<String> getUpsertStatement(String tableName, String[] fieldNames, String[] uniqueKeyFields) {
        return Optional.of(getInsertIntoStatement(tableName, fieldNames));
    }

    @Override
    public String getRowExistsStatement(String tableName, String[] conditionFields) {
        return null;
    }

    // @Override
    // public String getInsertIntoStatement(String tableName, String[] fieldNames) {
    //
    // }

    @Override
    public String getUpdateStatement(String tableName, String[] fieldNames, String[] conditionFields) {
        return getInsertIntoStatement(tableName, fieldNames);
    }

    @Override
    public String getDeleteStatement(String tableName, String[] fieldNames) {
        return getInsertIntoStatement(tableName, fieldNames);
    }

    @Override
    public String getSelectFromStatement(String tableName, String[] selectFields, String[] conditionFields) {
        return null;
    }

    @Override
    public int maxDecimalPrecision() {
        return 0;
    }

    @Override
    public int minDecimalPrecision() {
        return 0;
    }

    @Override
    public int maxTimestampPrecision() {
        return 0;
    }

    @Override
    public int minTimestampPrecision() {
        return 0;
    }

    @Override
    public List<LogicalTypeRoot> unsupportedTypes() {
        // The data types used in MySQL are listed at:
        // https://dev.mysql.com/doc/refman/8.0/en/data-types.html
        // TODO: We can't convert BINARY data type to
        // PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO in LegacyTypeInfoDataTypeConverter.
        return Arrays.asList(
            LogicalTypeRoot.BINARY,
            LogicalTypeRoot.TIMESTAMP_WITH_LOCAL_TIME_ZONE,
            LogicalTypeRoot.TIMESTAMP_WITH_TIME_ZONE,
            LogicalTypeRoot.INTERVAL_YEAR_MONTH,
            LogicalTypeRoot.INTERVAL_DAY_TIME,
            LogicalTypeRoot.ARRAY,
            LogicalTypeRoot.MULTISET,
            LogicalTypeRoot.MAP,
            LogicalTypeRoot.ROW,
            LogicalTypeRoot.DISTINCT_TYPE,
            LogicalTypeRoot.STRUCTURED_TYPE,
            LogicalTypeRoot.NULL,
            LogicalTypeRoot.RAW,
            LogicalTypeRoot.SYMBOL,
            LogicalTypeRoot.UNRESOLVED
        );
    }
}
Once that is done, register the new ClickHouseDialect in JdbcDialects.
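With the dialect registered, the DIALECTS list in JdbcDialects becomes:

```java
private static final List<JdbcDialect> DIALECTS = Arrays.asList(
    new DerbyDialect(),
    new MySQLDialect(),
    new PostgresDialect(),
    new ClickHouseDialect() // the newly added dialect
);
```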
二、Add a ClickHouseRowConverter
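The repost does not include the converter's code. In the Flink 1.11 source, the built-in dialects' converters all extend AbstractJdbcRowConverter, which already handles the standard JDBC type mappings, so a minimal sketch (placed alongside the other converters) could look like this:

```java
package org.apache.flink.connector.jdbc.internal.converter;

import org.apache.flink.table.types.logical.RowType;

/**
 * Runtime converter between ClickHouse JDBC rows and Flink's internal row format.
 * AbstractJdbcRowConverter covers the common JDBC types, so only the
 * constructor and the converter name need to be supplied here.
 */
public class ClickHouseRowConverter extends AbstractJdbcRowConverter {

    private static final long serialVersionUID = 1L;

    public ClickHouseRowConverter(RowType rowType) {
        super(rowType);
    }

    @Override
    public String converterName() {
        return "ClickHouse";
    }
}
```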
三、Build and upload
Package the modified source and put the resulting jar into the lib directory of your Flink installation. It is also best to download the ClickHouse JDBC driver and the Kafka connector jars in advance, to avoid class-not-found errors at runtime.
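Rebuilding the whole Flink project is slow; rebuilding just the JDBC connector module is usually enough. The module path and artifact name below match the Flink 1.11 source layout (Scala 2.11 build); adjust them if your tree differs:

```shell
# From the root of the Flink source tree, rebuild only the JDBC connector module
mvn clean package -DskipTests -pl flink-connectors/flink-connector-jdbc

# Copy the rebuilt connector jar into the Flink distribution
cp flink-connectors/flink-connector-jdbc/target/flink-connector-jdbc_2.11-*.jar $FLINK_HOME/lib/
```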
四、Test
Start a Flink shell for a quick test:
bin/pyflink-shell.sh local
Before running it, create the target database and table in ClickHouse.
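For example, a table matching the sink schema used below; the database name, MergeTree engine and ordering key here are assumptions, adapt them to your setup:

```sql
CREATE DATABASE IF NOT EXISTS demo;

CREATE TABLE demo.test_lin
(
    `class_id`   String,
    `task_id`    String,
    `history_id` String,
    `id`         String
)
ENGINE = MergeTree()
ORDER BY id;
```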
The following code writes the Kafka data into ClickHouse:
st_env.sql_update("""CREATE TABLE t(
`class_id` string,
`task_id` string,
`history_id` string,
`id` string
) WITH (
'connector.type' = 'kafka',
'connector.version' = 'universal',
'connector.topic' = 'smallcourse__learn_record',
'connector.properties.zookeeper.connect' = 'localhost:2181',
'connector.properties.bootstrap.servers' = 'localhost:9092',
'connector.properties.group.id' = 't2',
'connector.startup-mode' = 'latest-offset',
'format.type' = 'json',
'update-mode' = 'append'
)""")
st_env.sql_update("""CREATE TABLE test_lin (
`class_id` string,
`task_id` string,
`history_id` string,
`id` string
) WITH (
'connector.type' = 'jdbc',
'connector.url' = 'jdbc:clickhouse://localhost:8123/demo',
'connector.table' = 'test_lin',
'connector.driver' = 'ru.yandex.clickhouse.ClickHouseDriver',
'connector.username' = '',
'connector.password' = '',
'connector.write.flush.max-rows' = '1'
)""")
d=st_env.sql_query("select class_id,task_id,history_id,id from t")
d.insert_into("test_lin")
st_env.execute("d")
Finally, check the result in ClickHouse:
select count() from test_lin;
Result: