Tips for using the data synchronization tool chunjun (flinkx) 1.12.7

Table of contents


1. It is recommended to use an online JSON editor when writing JSON

2. For tasks such as MySQL <-> MySQL that require a jdbcUrl, note that the jdbcUrl types in the reader and writer differ

3. Taking Kafka -> MySQL as an example to explain the mapping relationship between fields

4. Kafka -> MySQL: when dirty data enters Kafka, blank rows are written to MySQL

5. kafkareader: group-offsets mode cannot read data from the committed offset


This article records tips from using chunjun and notes the places where its behavior does not match the official documentation, in order to reduce the learning cost.

The easiest and quickest way to get started is to familiarize yourself with the connector parameters in the official documentation (ChunJun, dtstack.github.io).

1. It is recommended to use an online JSON editor when writing JSON

Editor | JSON Crack (https://jsoncrack.com/editor). This tool checks whether the JSON format is correct, automatically normalizes the formatting, and generates a tree diagram so you can visually inspect the structure, which avoids tasks failing to run because of JSON format problems.

An error caused by a JSON format problem looks similar to this:

Caused by: com.google.gson.stream.MalformedJsonException: Unterminated array at line 24 column 16 path $.job.content[0].reader.parameter.[1]

Checking the JSON format first avoids this.
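For example, a fragment like the following (deliberately malformed for illustration: the column array opened with '[' is closed with '}' instead of ']') is easy to miss by eye but is caught immediately by a JSON validator:

    {
      "reader": {
        "parameter": {
          "column": [
            {"name": "id", "type": "varchar"},
            {"name": "name", "type": "varchar"}
          }
        }
      }
    }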

2. For tasks such as MySQL <-> MySQL that require a jdbcUrl, note that the jdbcUrl types in the reader and writer differ

The jdbcUrl in the reader is an Array, while the jdbcUrl in the writer is a String.

 

This is inconsistent with the description in the official documentation.

If the job is written the way the documentation describes, a type mismatch error occurs:

Caused by: java.lang.IllegalStateException: Expected STRING but was BEGIN_ARRAY at path $.jdbcUrl
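A minimal sketch of the difference, showing only the jdbcUrl fields (the surrounding structure such as connection blocks, table lists, and credentials is omitted and may vary by connector version; the host names are hypothetical):

    {
      "reader": {
        "parameter": {
          "jdbcUrl": ["jdbc:mysql://source-host:3306/source_db"]
        }
      },
      "writer": {
        "parameter": {
          "jdbcUrl": "jdbc:mysql://target-host:3306/target_db"
        }
      }
    }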

3. Taking Kafka -> MySQL as an example to explain the mapping relationship between fields

There are two kinds of data in the Kafka topic:

{"id":"1","name":"a1","A1":"0.001","A2":"0.005","A3":"100","A4":"abadc","A5":"eqerd"}
{"id":"2","name":"a2","A1":"0.001","A2":"0.005","A3":"5","A4":"abadc","A5":"eqerd"}
{"id":"3","name":"a3","A1":"0.1","A2":"0.3","A3":"20","A4":"","A5":"qerda"}
{"id":"4","name":"a4","A1":"0.00070","A2":"12.2","A3":"10","A4":null,"A5":"weaef"}
{"id":"5","name":"a5","A1":"0.1","A2":"0.3","A3":"20","A4":"adfsa","A5":"qerda"}
{"id":"6","name":"a1","A1":null,"A2":null,"A3":"100","A4":"abadc","A5":"eqerd"}
{"id":"1","name":"a1","B1":"0.1","B2":"5","B3":"GKLGU"}
{"id":"2","name":"a2","B1":"1.425","B2":"10","B3":"HJFV"}
{"id":"3","name":"a3","B1":"54.12","B2":"4325","B3":"FDGAD"}
{"id":"4","name":"a4","B1":"10.0","B2":"1","B3":null}
{"id":"5","name":"a5","B1":null,"B2":"11","B3":"SDF"}
{"id":"6","name":"a7","B1":null,"B2":null,"B3":null}

The first type contains the fields id, name, A1, A2, A3, A4, A5.

The second type contains the fields id, name, B1, B2, B3.

The target table to be written contains the fields id, name, A1, A2, A3, A4, A5, B1, B2, B3.

① Experiment 1:

kafkareader: name, A1, A2, A3, A4, A5, B1, B2, B3

mysqlwriter: name, A1, A2, A3, A4, A5, B1, B2, B3

Result: the data is written normally.
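For reference, the shorthand above corresponds roughly to the following column lists in the job JSON (abridged to a few fields; the column syntax as objects with name and type, and the varchar types, are assumptions, so check the connector documentation for your version):

    {
      "reader": {
        "name": "kafkareader",
        "parameter": {
          "column": [
            {"name": "name", "type": "varchar"},
            {"name": "A1", "type": "varchar"},
            {"name": "A2", "type": "varchar"},
            {"name": "B1", "type": "varchar"}
          ]
        }
      },
      "writer": {
        "name": "mysqlwriter",
        "parameter": {
          "column": [
            {"name": "name", "type": "varchar"},
            {"name": "A1", "type": "varchar"},
            {"name": "A2", "type": "varchar"},
            {"name": "B1", "type": "varchar"}
          ]
        }
      }
    }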

② Experiment 2

kafkareader: name, A1, A2, A3, A4, A5, B1, B2, B3

mysqlwriter: name, A2, A1, A3, A5, A4, B1, B2, B3

In the result table, columns A1 and A2 are swapped, and columns A4 and A5 are swapped

③ Experiment 3

kafkareader: name, A2, A1, A3, A4, A5, B1, B2, B3

mysqlwriter: name, A1, A2, A3, A4, A5, B1, B2, B3

Columns A1 and A2 in the result table are swapped

④ Experiment 4 (forgot to take a screenshot)

kafkareader: name, a1, a2, a3, A4, A5, b1, B2, B3

mysqlwriter: name, A1, A2, A3, A4, A5, B1, B2, B3

Result: columns A1, A2, A3, and B1 in the target table are null (the lowercase reader fields a1, a2, a3, b1 do not match any field name in the Kafka messages, so they are read as null).

⑤ Experiment 5 (forgot to take a screenshot)

kafkareader: name, A1, A2, A3, A4, A5, B1, B2, B3

mysqlwriter: name, a1, a2, A3, A4, A5, B1, B2, B3

Result: the task failed because columns a1 and a2 could not be found in MySQL.

Conclusions:

① When the task reaches the reader, chunjun matches data in the topic by the field names defined in kafkareader; the field order is not constrained by the order of the fields in Kafka. If a field defined in the reader does not appear in the Kafka message, it is assigned null.

② Reader and writer fields correspond by position: if the reader defines M, L, N and the writer defines l, m, n, then M->l, L->m, N->n. The official documentation's description of this is not accurate. As long as you understand how the data in Kafka corresponds to the table fields in MySQL, you can write the job JSON quickly.

③ When the task reaches the writer, chunjun writes to the MySQL table using the field names defined in mysqlwriter; the field order is not constrained by the column order in MySQL. If a field defined in the writer does not exist in MySQL, an error is reported.

④ The field correspondence can be one-to-many. For example, with A1, A1, A1 in the reader and A1, A2, A3 in the writer, the A1, A2, and A3 columns of the MySQL table all receive the value of A1.
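A sketch of the one-to-many case in conclusion ④ (again assuming the name/type column syntax):

    {
      "reader": {
        "name": "kafkareader",
        "parameter": {
          "column": [
            {"name": "A1", "type": "varchar"},
            {"name": "A1", "type": "varchar"},
            {"name": "A1", "type": "varchar"}
          ]
        }
      },
      "writer": {
        "name": "mysqlwriter",
        "parameter": {
          "column": [
            {"name": "A1", "type": "varchar"},
            {"name": "A2", "type": "varchar"},
            {"name": "A3", "type": "varchar"}
          ]
        }
      }
    }

Here the single Kafka field A1 is read three times and, by position, written into the A1, A2, and A3 columns.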

4. Kafka -> MySQL: when dirty data enters Kafka, blank rows are written to MySQL

If a column in the target table is defined as NOT NULL, the task fails immediately, regardless of whether errorLimit is configured in the chunjun job.
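For context, dirty-data tolerance is normally configured in the job's setting block, roughly as below (the exact keys are an assumption based on the flinkx-style job format; check your version's documentation). The point above is that a NOT NULL constraint on the target table still fails the task even with such a tolerance configured:

    {
      "job": {
        "setting": {
          "errorLimit": {
            "record": 100
          }
        }
      }
    }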

Pay attention to the validity of the data written into Kafka.

5. kafkareader: group-offsets mode cannot read data from the committed offset

There are five modes in chunjun kafkareader:

  • group-offsets: start consuming from the offsets committed by the consumer group in ZK / Kafka brokers
  • earliest-offset: start at the earliest offset (if possible)
  • latest-offset: start at the latest offset (if possible)
  • timestamp: start at the specified timestamp for each partition
  • specific-offsets: start at specified specific offsets for each partition

In group-offsets mode, if the consumer group has not yet consumed any data, the default offset is the sentinel value -915623761773L.

In actual testing, this mode then skips the existing data and jumps straight to the latest offset, which is equivalent to the latest-offset mode. Investigating the cause shows the following:

This part of the chunjun code inherits the Flink Kafka API: in group-offsets mode, if the group's offset does not exist or is invalid, the initial offset is determined by the "auto.offset.reset" property, which defaults to largest.

If you want to consume the existing data, you can manually set "auto.offset.reset" to earliest, for example as sketched below.
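A minimal sketch of a kafkareader doing this, assuming the reader passes Kafka consumer properties through a consumerSettings map (that parameter name, along with topic, mode, and groupId, is an assumption to check against the kafka connector documentation; the topic, group id, and broker address are hypothetical):

    {
      "reader": {
        "name": "kafkareader",
        "parameter": {
          "topic": "demo_topic",
          "mode": "group-offsets",
          "groupId": "demo_group",
          "consumerSettings": {
            "bootstrap.servers": "kafka-host:9092",
            "auto.offset.reset": "earliest"
          }
        }
      }
    }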

With auto.offset.reset set to earliest:

  • When a partition has a committed offset, consumption starts from that offset
  • When a partition has no committed offset, consumption starts from the beginning of the partition

This covers most scenarios.


Origin blog.csdn.net/weixin_44382736/article/details/129622257