Flink 1.10 SQL: Writing to Elasticsearch

The corresponding page on the official website: https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/connect.html#elasticsearch-connector

The Flink SQL Elasticsearch connector works in streaming mode and can only be used as a sink:

  Sink: Streaming Append Mode
  Sink: Streaming Upsert Mode
  Format: JSON-only

Note: these are the only modes Flink provides out of the box.

In upsert mode, the Elasticsearch connector can exchange UPSERT/DELETE messages with the external system, using a key defined by the query.

For append-only queries, the connector can also operate in append mode, exchanging only INSERT messages with the external system. If the query defines no key, Elasticsearch generates a document ID automatically.
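
A minimal sketch of the difference (assuming a hypothetical source table user_log): an append-only query versus a keyed aggregation:

-- Append-only query: no key is defined, so Elasticsearch
-- auto-generates a document ID for every row.
SELECT user_id, behavior FROM user_log;

-- Upsert query: the GROUP BY field (user_id) becomes the document key,
-- so a new result for the same user_id updates the existing document.
SELECT user_id, CAST(COUNT(*) AS VARCHAR) AS cnt
FROM user_log
GROUP BY user_id;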

The DDL is defined as follows:

CREATE TABLE MyUserTable (
  ...
) WITH (
  'connector.type' = 'elasticsearch', -- required: specify this table type is elasticsearch
  
  'connector.version' = '6',          -- required: valid connector versions are "6", "7"
  
  'connector.hosts' = 'http://host_name_1:9200;http://host_name_2:9200',  -- required: one or more Elasticsearch hosts to connect to

  'connector.index' = 'MyUsers',       -- required: Elasticsearch index

  'connector.document-type' = 'user',  -- required: Elasticsearch document type

  'update-mode' = 'append',            -- optional: update mode when used as table sink ("append" or "upsert")

  'connector.key-delimiter' = '$',     -- optional: delimiter for composite keys ("_" by default)
                                       -- e.g., "$" would result in IDs "KEY1$KEY2$KEY3"

  'connector.key-null-literal' = 'n/a',  -- optional: representation for null fields in keys ("null" by default)

  'connector.failure-handler' = '...',   -- optional: failure handling strategy in case a request to 
                                         -- Elasticsearch fails ("fail" by default).
                                         -- valid strategies are 
                                         -- "fail" (throws an exception if a request fails and
                                         -- thus causes a job failure), 
                                         -- "ignore" (ignores failures and drops the request),
                                         -- "retry-rejected" (re-adds requests that have failed due 
                                         -- to queue capacity saturation), 
                                         -- or "custom" for failure handling with a
                                         -- ActionRequestFailureHandler subclass

  -- optional: configure how to buffer elements before sending them in bulk to the cluster for efficiency
  'connector.flush-on-checkpoint' = 'true',   -- optional: whether to flush buffered actions on
                                              -- checkpoint ("true" by default); disabling this can
                                              -- lead to data loss on failures
  'connector.bulk-flush.max-actions' = '42',  -- optional: maximum number of actions to buffer 
                                              -- for each bulk request
  'connector.bulk-flush.max-size' = '42 mb',  -- optional: maximum size of buffered actions in bytes
                                              -- per bulk request
                                              -- (only MB granularity is supported)
  'connector.bulk-flush.interval' = '60000',  -- optional: bulk flush interval (in milliseconds)
  'connector.bulk-flush.back-off.type' = '...',       -- optional: backoff strategy ("disabled" by default)
                                                      -- valid strategies are "disabled", "constant",
                                                      -- or "exponential"
  'connector.bulk-flush.back-off.max-retries' = '3',  -- optional: maximum number of retries
  'connector.bulk-flush.back-off.delay' = '30000',    -- optional: delay between each backoff attempt
                                                      -- (in milliseconds)

  -- optional: connection properties to be used during REST communication to Elasticsearch
  'connector.connection-max-retry-timeout' = '3',     -- optional: maximum timeout (in milliseconds)
                                                      -- between retries
  'connector.connection-path-prefix' = '/v1',         -- optional: prefix string to be added to every
                                                      -- REST communication
                                                      
  'format.type' = '...',   -- required: Elasticsearch connector requires to specify a format,
  ...                      -- currently only 'json' format is supported.
                           -- Please refer to Table Formats section for more details.
)

Key extraction: Flink automatically extracts valid keys from a query. For example, the query SELECT a, b, c FROM t GROUP BY a, b defines a composite key from the fields a and b. The Elasticsearch connector generates a document ID for every row by concatenating all key fields, in the order defined in the query, with the key delimiter. A custom representation of null literals in key fields can also be defined.
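
For example (a hypothetical table t, combined with the '$' delimiter from the DDL above), the following query defines the composite key (a, b), so a row with a = 'u1' and b = 'buy' is written with document ID 'u1$buy':

-- composite key (a, b); with 'connector.key-delimiter' = '$'
-- the document IDs look like 'u1$buy'
SELECT a, b, COUNT(c) FROM t GROUP BY a, b;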

The DDL definition above is what the official website provides, but I found that adding the following parameters causes a "could not find a suitable TableSinkFactory" error:

'connector.bulk-flush.back-off.max-retries' = '3',
'connector.bulk-flush.back-off.delay' = '10000'

The exception is as follows:

Exception in thread "main" org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.TableSinkFactory' in
the classpath.

Reason: No factory supports all properties.

The matching candidates:
org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7UpsertTableSinkFactory
Unsupported property keys:
connector.bulk-flush.back-off.max-retries
connector.bulk-flush.back-off.delay

I have to say, this error comes up a lot when working with Flink SQL. In my experience it usually has one of two causes:

  1. The corresponding connector jar has not been added to the classpath

  2. The WITH configuration is wrong

Flink SQL automatically infers which TableSinkFactory to use from the schema and properties in the DDL and from what is on the classpath.
 
If the DDL is wrong, or the matching TableSinkFactory is not on the classpath, this error is thrown.
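
For example, a single misspelled property key is enough: this hypothetical DDL fails with the same exception, because no factory supports the unknown key (apart from the typo, it follows the example below):

CREATE TABLE bad_sink (
    user_id VARCHAR
) WITH (
    'connector.type' = 'elasticsearch',
    'connector.version' = '7',
    'connector.hostss' = 'http://venn:9200',   -- typo: should be 'connector.hosts'
    'connector.index' = 'user_behavior',
    'connector.document-type' = 'user',
    'update-mode' = 'append',
    'format.type' = 'json'
);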
 

Now, let's look at an example.

First, add the corresponding dependency:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-elasticsearch7_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>    
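
Since both tables below use 'format.type' = 'json', the JSON format jar must also be on the classpath (a sketch, assuming the same ${flink.version} property):

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-json</artifactId>
    <version>${flink.version}</version>
</dependency>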

The SQL is as follows:

-- read json from Kafka, write to Elasticsearch
---sourceTable
CREATE TABLE user_log(
    user_id VARCHAR,
    item_id VARCHAR,
    category_id VARCHAR,
    behavior VARCHAR,
    ts TIMESTAMP(3)
) WITH (
    'connector.type' = 'kafka',
    'connector.version' = 'universal',
    'connector.topic' = 'user_behavior',
    'connector.properties.zookeeper.connect' = 'venn:2181',
    'connector.properties.bootstrap.servers' = 'venn:9092',
    'connector.startup-mode' = 'earliest-offset',
    'format.type' = 'json'
);

---sinkTable
CREATE TABLE user_log_sink (
    user_id VARCHAR,
    item_id VARCHAR,
    category_id VARCHAR,
    behavior VARCHAR,
    ts  VARCHAR
    --ts TIMESTAMP(3)
) WITH (
    'connector.type' = 'elasticsearch',
    'connector.version' = '7',
    'connector.hosts' = 'http://venn:9200',
    'connector.index' = 'user_behavior',
    'connector.document-type' = 'user',
    'connector.bulk-flush.interval' = '6000',
    'connector.connection-max-retry-timeout' = '3',
    --'connector.bulk-flush.back-off.max-retries' = '3',  -- not supported by the ES7 factory,
    --'connector.bulk-flush.back-off.delay' = '10000',    -- causes the exception shown above
    --'connector.connection-path-prefix' = '/v1',
    'update-mode' = 'upsert',
    'format.type' = 'json'
);
-- the es sink is upsert, so it can update; the group key is used as the es document id ... (this aggregation SQL is just thrown together for demonstration)
---insert
INSERT INTO user_log_sink
--SELECT user_id, item_id, category_id, behavior, ts
--FROM user_log;
SELECT
  cast(COUNT(*) as VARCHAR ) dt,
  cast(COUNT(*) as VARCHAR ) AS pv,
  cast(COUNT(DISTINCT user_id)as VARCHAR ) AS uv,
  MAX(behavior),
  DATE_FORMAT(ts, 'yyyy-MM-dd HH:mm:s0')
FROM user_log
GROUP BY DATE_FORMAT(ts, 'yyyy-MM-dd HH:mm:s0');

Take a look at the data written to ES:

[screenshot of the Elasticsearch documents]

Done!

Welcome to follow my WeChat public account, where I occasionally post Flink (development) related articles.

 


Origin: www.cnblogs.com/Springmoon-venn/p/12547260.html