Corresponding page in the official documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/connect.html#elasticsearch-connector
The Flink SQL Elasticsearch connector supports streaming mode only, and can only be used as a sink:

Sink: Streaming Append Mode
Sink: Streaming Upsert Mode
Format: JSON-only
Note: Flink only provides these modes; anything else would have to be implemented yourself.
In upsert mode, the Elasticsearch connector uses the key defined by the query to exchange UPSERT/DELETE messages with the external system. For append-only queries, the connector can also operate in append mode, exchanging only INSERT messages with the external system. If the query defines no key, Elasticsearch generates a document ID automatically.
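To make the two modes concrete, here is a minimal sketch (an illustration, not Flink's actual implementation) of how a sink could translate these messages into Elasticsearch bulk actions: in upsert mode, an UPSERT becomes an `index` action keyed by the query key and a DELETE becomes a `delete` action; in append mode every row is an INSERT with no ID, so Elasticsearch auto-generates one.

```python
def to_bulk_action(message, mode="upsert"):
    """Turn a changelog message into a hypothetical Elasticsearch bulk action.

    In upsert mode, `message` is a (change_flag, key, row) tuple;
    in append mode it is just the row itself.
    """
    if mode == "append":
        # Append mode: INSERT only, no id -> Elasticsearch generates one
        return {"action": "index", "id": None, "source": message}
    flag, key, row = message
    if flag == "UPSERT":
        # UPSERT: index (create or overwrite) the document under the query key
        return {"action": "index", "id": key, "source": row}
    if flag == "DELETE":
        # DELETE: remove the document with that key
        return {"action": "delete", "id": key, "source": None}
    raise ValueError(f"unknown change flag: {flag}")
```

For example, `to_bulk_action(("UPSERT", "u1", {"pv": 3}))` yields an `index` action with id `"u1"`, which is why an upsert sink can update previously written rows in place.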
The DDL is defined as follows:
```sql
CREATE TABLE MyUserTable (
  ...
) WITH (
  'connector.type' = 'elasticsearch', -- required: specify this table type is elasticsearch
  'connector.version' = '6',          -- required: valid connector versions are "6"
  'connector.hosts' = 'http://host_name:9092;http://host_name:9093', -- required: one or more Elasticsearch
                                                                     --   hosts to connect to
  'connector.index' = 'MyUsers',      -- required: Elasticsearch index
  'connector.document-type' = 'user', -- required: Elasticsearch document type
  'update-mode' = 'append',           -- optional: update mode when used as table sink.

  'connector.key-delimiter' = '$',      -- optional: delimiter for composite keys ("_" by default)
                                        --   e.g., "$" would result in IDs "KEY1$KEY2$KEY3"
  'connector.key-null-literal' = 'n/a', -- optional: representation for null fields in keys ("null" by default)

  'connector.failure-handler' = '...',  -- optional: failure handling strategy in case a request to
                                        --   Elasticsearch fails ("fail" by default).
                                        --   valid strategies are
                                        --   "fail" (throws an exception if a request fails and
                                        --   thus causes a job failure),
                                        --   "ignore" (ignores failures and drops the request),
                                        --   "retry-rejected" (re-adds requests that have failed due
                                        --   to queue capacity saturation),
                                        --   or "custom" for failure handling with a
                                        --   ActionRequestFailureHandler subclass

  -- optional: configure how to buffer elements before sending them in bulk to the cluster for efficiency
  'connector.flush-on-checkpoint' = 'true',  -- optional: disables flushing on checkpoint (see notes below!)
                                             --   ("true" by default)
  'connector.bulk-flush.max-actions' = '42', -- optional: maximum number of actions to buffer
                                             --   for each bulk request
  'connector.bulk-flush.max-size' = '42 mb', -- optional: maximum size of buffered actions in bytes
                                             --   per bulk request
                                             --   (only MB granularity is supported)
  'connector.bulk-flush.interval' = '60000', -- optional: bulk flush interval (in milliseconds)
  'connector.bulk-flush.back-off.type' = '...',      -- optional: backoff strategy ("disabled" by default)
                                                     --   valid strategies are "disabled", "constant",
                                                     --   or "exponential"
  'connector.bulk-flush.back-off.max-retries' = '3', -- optional: maximum number of retries
  'connector.bulk-flush.back-off.delay' = '30000',   -- optional: delay between each backoff attempt
                                                     --   (in milliseconds)

  -- optional: connection properties to be used during REST communication to Elasticsearch
  'connector.connection-max-retry-timeout' = '3', -- optional: maximum timeout (in milliseconds)
                                                  --   between retries
  'connector.connection-path-prefix' = '/v1',     -- optional: prefix string to be added to every
                                                  --   REST communication

  'format.type' = '...', -- required: Elasticsearch connector requires to specify a format,
  ...                    -- currently only 'json' format is supported.
                         -- Please refer to Table Formats section for more details.
)
```
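The `bulk-flush.back-off.*` properties control how a failed bulk request is retried. As a rough sketch of what the three strategies mean (the property names are real, but this delay schedule is an illustration, not the connector's source code):

```python
def backoff_delays(strategy, delay_ms, max_retries):
    """Return the hypothetical wait (in ms) before each retry attempt."""
    if strategy == "disabled":
        # no retries at all
        return []
    if strategy == "constant":
        # same delay before every retry
        return [delay_ms] * max_retries
    if strategy == "exponential":
        # delay doubles on each successive retry
        return [delay_ms * (2 ** i) for i in range(max_retries)]
    raise ValueError(f"unknown backoff strategy: {strategy}")
```

With the values from the DDL above (`delay = 30000`, `max-retries = 3`), a constant strategy waits 30 s before each of the three retries, while an exponential one waits 30 s, 60 s, then 120 s.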
Flink automatically extracts a valid key from the query. For example, the query SELECT a, b, c FROM t GROUP BY a, b defines a composite key made of the fields a and b. The Elasticsearch connector generates a document ID for each row by concatenating all key fields, in the order defined in the query, using the key delimiter. A custom representation for null key fields can also be defined.
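The ID-generation behavior described above can be sketched in a few lines (an illustration of the documented behavior of `connector.key-delimiter` and `connector.key-null-literal`, not the connector's actual source):

```python
def document_id(key_fields, delimiter="_", null_literal="null"):
    """Build a composite Elasticsearch document id from the query's key fields.

    `delimiter` mirrors 'connector.key-delimiter' ("_" by default);
    `null_literal` mirrors 'connector.key-null-literal' ("null" by default).
    """
    return delimiter.join(
        null_literal if field is None else str(field) for field in key_fields
    )
```

For instance, with delimiter `"$"` the key fields `KEY1`, `KEY2`, `KEY3` produce the ID `KEY1$KEY2$KEY3`, matching the example in the DDL comments.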
The DDL definition above comes from the official documentation, yet I found that adding the following two parameters triggers a "could not find a suitable TableSinkFactory" error:
```sql
'connector.bulk-flush.back-off.max-retries' = '3',
'connector.bulk-flush.back-off.delay' = '10000'
```
The exception is as follows:
```
Exception in thread "main" org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.TableSinkFactory' in the classpath.

Reason: No factory supports all properties.

The matching candidates:
org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7UpsertTableSinkFactory
Unsupported property keys:
connector.bulk-flush.back-off.max-retries
connector.bulk-flush.back-off.delay
```
This error comes up frequently when working with Flink SQL. In my experience it usually has one of two causes:

1. The corresponding connector jar has not been added.
2. A configuration property is wrong.
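The second cause is exactly what the stack trace above shows: each table factory declares the set of property keys it supports, and any key in the DDL outside that set makes the match fail. A simplified sketch of that matching logic (the supported-key set below is illustrative and deliberately incomplete, not the factory's real list):

```python
# Hypothetical subset of keys a factory might declare as supported
SUPPORTED_KEYS = {
    "connector.type", "connector.version", "connector.hosts",
    "connector.index", "connector.document-type", "update-mode",
    "connector.bulk-flush.interval", "format.type",
}

def unsupported_properties(ddl_properties):
    """Return the DDL property keys the factory does not support, sorted."""
    return sorted(k for k in ddl_properties if k not in SUPPORTED_KEYS)
```

Running this against a property map containing `connector.bulk-flush.back-off.delay` would flag that key as unsupported, which is precisely how the `Unsupported property keys` list in the exception is produced.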
Now let's walk through a working example.
First, add the corresponding dependency:
```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-elasticsearch7_${scala.binary.version}</artifactId>
  <version>${flink.version}</version>
</dependency>
```
The SQL is as follows:
```sql
-- read json, write csv
---sourceTable
CREATE TABLE user_log (
  user_id VARCHAR,
  item_id VARCHAR,
  category_id VARCHAR,
  behavior VARCHAR,
  ts TIMESTAMP(3)
) WITH (
  'connector.type' = 'kafka',
  'connector.version' = 'universal',
  'connector.topic' = 'user_behavior',
  'connector.properties.zookeeper.connect' = 'venn:2181',
  'connector.properties.bootstrap.servers' = 'venn:9092',
  'connector.startup-mode' = 'earliest-offset',
  'format.type' = 'json'
);
---sinkTable
CREATE TABLE user_log_sink (
  user_id VARCHAR,
  item_id VARCHAR,
  category_id VARCHAR,
  behavior VARCHAR,
  ts VARCHAR
  --ts TIMESTAMP(3)
) WITH (
  'connector.type' = 'elasticsearch',
  'connector.version' = '7',
  'connector.hosts' = 'http://venn:9200',
  'connector.index' = 'user_behavior',
  'connector.document-type' = 'user',
  'connector.bulk-flush.interval' = '6000',
  'connector.connection-max-retry-timeout' = '3',
  'connector.bulk-flush.back-off.max-retries' = '3',
  'connector.bulk-flush.back-off.delay' = '10000',
  --'connector.connection-path-prefix' = '/v1',
  'update-mode' = 'upsert',
  'format.type' = 'json'
);
-- es sink is upsert, can update, use group key as es id ... this SQL was thrown together quickly..
---insert
INSERT INTO user_log_sink
--SELECT user_id, item_id, category_id, behavior, ts
--FROM user_log;
SELECT
  cast(COUNT(*) as VARCHAR) dt,
  cast(COUNT(*) as VARCHAR) AS pv,
  cast(COUNT(DISTINCT user_id) as VARCHAR) AS uv,
  MAX(behavior),
  DATE_FORMAT(ts, 'yyyy-MM-dd HH:mm:s0')
FROM user_log
GROUP BY DATE_FORMAT(ts, 'yyyy-MM-dd HH:mm:s0');
```
Now look at the data written to Elasticsearch:
Welcome to follow the "Flink Rookie" WeChat official account, which occasionally publishes posts on Flink development.