The countless pitfalls of using the official ClickHouse JDBC driver

Foreword

I recently ran into a production problem with ClickHouse:
Code: 242, e.displayText() = DB::Exception: Table is in readonly mode(zookeeper path:/clickhouse/tables/02/xxx) (version 21.12.4.1) (official build)

I looked up the cause online: when ZooKeeper is under excessive pressure, the table becomes read-only, so ClickHouse fails to insert data.

There are two specific reasons:

  1. The frequency of writing data is too high.
  2. A node in the ZooKeeper cluster has gone down.

The reason for this problem in our project is the first: the frequency of writing data is too high.

But while searching for information online, I found another problem: our project uses the JDBC driver with the Maven groupId ru.yandex.clickhouse, which ClickHouse no longer officially recommends.

So I went straight to ClickHouse's official website, and from there to the driver's GitHub repository: https://github.com/ClickHouse/clickhouse-jdbc.

It confirmed that the ru.yandex.clickhouse driver is not recommended.

It should be replaced with the com.clickhouse driver, and version 0.3.2 or above is recommended.

So the upgrade journey began, and it took up the next few days. I stepped into a lot of pitfalls along the way, and I am sharing them here in the hope that they help you.

ClickHouse JDBC driver

1. First upgrade

The JDBC driver recommended on ClickHouse's official GitHub page is version 0.3.2 or above.

So I promptly changed the groupId in the project's pom.xml to com.clickhouse and the version to 0.3.2.

I refreshed Maven, started the project locally, and it ran normally.

Then I tested the business functions locally, and I was able to read and write data from ClickHouse normally.

I couldn't help thinking: this upgrade is too easy.

2. The second upgrade

Later, colleagues in the project team suggested switching to the latest version, saying that there are more new features and performance has been greatly improved.

When I heard the words "performance has been greatly improved", I decided to upgrade again and give it a try.

So, the version was upgraded to 0.3.2-patch11.

I tested again locally, and all business functions were normal.

Then I deployed the project to the test environment.

3. Found the problem

The next day I received two Sentry alert emails, both at the warn level.

The first email reported: This driver is DEPRECATED. Please use [com.clickhouse.jdbc.ClickHouseDriver] instead.

It means that the ru.yandex.clickhouse driver has been deprecated and that com.clickhouse.jdbc.ClickHouseDriver should be used instead.

The second email reported: Also everything in package [ru.yandex.clickhouse] will be removed starting from 0.4.0.

It means that the ru.yandex.clickhouse package will be removed starting from version 0.4.0.

When I saw these two emails, I was a little confused. Wasn't the project already using the com.clickhouse driver package? Where did ru.yandex.clickhouse come from?

So I searched the whole project for the keyword ru.yandex.clickhouse, but found nothing.

This confused me even more.

Next, I opened the clickhouse-jdbc-0.3.2-patch11-all.jar file and saw an unexpected result: the jar contains two packages, com.clickhouse and ru.yandex.clickhouse. In other words, it ships both the new driver and the old one.

There are also two ClickHouseDriver classes, one for each driver.

I had a hundred thousand whys in my mind at this point: why not simply delete the code in the ru.yandex.clickhouse package, instead of printing warnings in the log?

This is a real trap.

In other words, after upgrading the driver, the project was still running the old driver's code, and all my testing had been for nothing...
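A quick way to confirm that both drivers are present, without unzipping the jar, might look like the sketch below. It is only an illustration: the helper names are made up, and the old driver's fully qualified class name (ru.yandex.clickhouse.ClickHouseDriver) is my assumption based on the package name in the warning. The classes are looked up without being initialized, so the check itself should not trigger driver registration.

```java
import java.util.ArrayList;
import java.util.List;

final class DriverClasspathCheck {
    // Assumed class name of the old driver (not spelled out in the emails).
    static final String OLD_DRIVER = "ru.yandex.clickhouse.ClickHouseDriver";
    // Class name of the new driver, taken from the deprecation warning.
    static final String NEW_DRIVER = "com.clickhouse.jdbc.ClickHouseDriver";

    /** Returns true if the named class can be found, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DriverClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the subset of candidate class names that are present on the classpath. */
    static List<String> presentDrivers(String... candidates) {
        List<String> found = new ArrayList<>();
        for (String name : candidates) {
            if (isOnClasspath(name)) {
                found.add(name);
            }
        }
        return found;
    }
}
```

Calling DriverClasspathCheck.presentDrivers(DriverClasspathCheck.OLD_DRIVER, DriverClasspathCheck.NEW_DRIVER) in a project that uses the 0.3.2-patch11 "all" jar should, if the assumption about the class name holds, list both entries.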

4. How to use the new driver?

Next, my inner monologue went: since the official ClickHouse driver package contains both the new and the old driver, there must be a switch that controls which JDBC driver is used.

From what I had seen so far, if that switch is left alone, the official ClickHouse driver package uses the old JDBC driver by default.

So the most important question to figure out was: how do I use the new driver?

Soon I found that it can be done with the following configuration property:

spring.datasource.clickhouse.drive-class-name=com.clickhouse.jdbc.ClickHouseDriver

It tells Spring which JDBC driver to use.

Sure enough, after adding this line to the data source configuration in application.properties and restarting the project, Spring used the new ClickHouse JDBC driver.

The two warnings from the emails no longer appeared in the log.

At this point I was secretly pleased: I was finally using the JDBC driver officially recommended by ClickHouse.

The project ran normally, so I quickly tested whether the business functions still worked.

5. Two new problems arise

I was immediately proven wrong.

When testing the batch insert scenario, two exceptions appeared in the application log:

Exception 1:
Code: 6. DB::Exception: Cannot parse string '2022-11-22 14:42:37.025' as DateTime: syntax error at position 19...
Judging from the message, the time 2022-11-22 14:42:37.025 cannot be converted to the DateTime type.

Exception 2:
Please consider to use one and only one values expression, for example: use 'values(?)' instead of 'values(?),(?)'.
Judging from the message, batch insert is not supported.

Oh no...

The ClickHouse JDBC driver upgrade had run into trouble.

The latest official ClickHouse JDBC driver does not support batch insert, which is the more serious of the two problems.

I quickly searched for a solution.
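For reference, the second message appears to ask for a statement with exactly one values(...) group, executed as a JDBC batch. The sketch below shows that shape with plain JDBC. It rests on assumptions: I take it that the failing insert used a multi-row values list, the class and method names are invented, and the table name test and columns code and time are borrowed from later in this article. I did not go this way in the project, so whether it removes the message there is unverified.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

final class SingleValuesBatchInsert {
    /** Builds "INSERT INTO <table> (<cols>) VALUES (?, ...)" with exactly one VALUES group. */
    static String buildInsertSql(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    /** Binds every row to the same single VALUES group and sends them as one JDBC batch. */
    static int[] insertAll(Connection conn, String table, List<String> columns,
                           List<Object[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(buildInsertSql(table, columns))) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameter indexes are 1-based
                }
                ps.addBatch();
            }
            return ps.executeBatch();
        }
    }
}
```

The point of the design is that the row count is carried by addBatch() calls rather than by repeating (?) groups in the SQL text.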

6. Rollback version

Soon I found a similar problem in the clickhouse-jdbc issues: https://github.com/ClickHouse/clickhouse-jdbc/issues/1106.

Someone had answered there that the old version does not produce this warning.

It was like waking from a dream.

Don't be obsessed with the latest version; find the clickhouse-jdbc version that suits your setup best.

So I checked the ClickHouse server versions of the dev, st and ga environments: dev uses 20.12.8.5, while st and ga use 21.12.4.1.

To stay compatible with the dev environment, I took ClickHouse server 20+ as the baseline and then looked at which clickhouse-jdbc versions could be used with it.

The releases page quickly showed that clickhouse-jdbc 0.3.2 can be used, and that the highest usable version is 0.3.2-patch1, because 0.3.2-patch2 and above require ClickHouse server 21+.

So I could only roll clickhouse-jdbc back to 0.3.2-patch1.
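The decision above can be written down as a tiny check. This is just a sketch of the rule as I read it from the releases page (0.3.2-patch2 and above need server 21+); the class and method names are hypothetical.

```java
final class DriverVersionRule {
    /** Extracts the major number from a server version such as "20.12.8.5". */
    static int majorVersion(String serverVersion) {
        int dot = serverVersion.indexOf('.');
        String major = dot < 0 ? serverVersion : serverVersion.substring(0, dot);
        return Integer.parseInt(major.trim());
    }

    /** True only if every listed server is on major version 21 or higher. */
    static boolean canUsePatch2OrAbove(String... serverVersions) {
        if (serverVersions.length == 0) {
            throw new IllegalArgumentException("at least one server version is required");
        }
        for (String version : serverVersions) {
            if (majorVersion(version) < 21) {
                return false; // the oldest environment decides
            }
        }
        return true;
    }
}
```

With the three environments described above, the dev server on 20.12.8.5 is the one that forces the answer to false.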

Sure enough, after the rollback, the batch insert problem was solved.

That left one more problem.

7. DateTime

Let's review the problem:
Code: 6. DB::Exception: Cannot parse string '2022-11-22 14:42:37.025' as DateTime: syntax error at position 19...
Judging from the message, the time 2022-11-22 14:42:37.025 cannot be converted to the DateTime type.

The format of DateTime is yyyy-MM-dd HH:mm:ss. The problem is caused by the milliseconds in 2022-11-22 14:42:37.025, which cannot be converted directly to 2022-11-22 14:42:37.

I checked the code and the table structure. In the code, the time field of the entity is defined as the Date type.

In the table, the time field is defined as the DateTime type.

The official ClickHouse driver cannot directly convert a Java Date value into a DateTime value.
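The "position 19" in the error lines up with where the milliseconds begin. The sketch below reproduces the two text forms with java.time (used here instead of java.util.Date only so the result does not depend on the machine's time zone). The class and method names are made up, and how the driver actually serializes the value is not something I verified; this only shows the string shapes involved.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class DateTimeTextDemo {
    // Text form that includes milliseconds, like the value in the error message.
    static final DateTimeFormatter WITH_MILLIS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
    // Text form that a plain DateTime column expects.
    static final DateTimeFormatter SECONDS_ONLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String withMillis(LocalDateTime time) {
        return time.format(WITH_MILLIS);
    }

    static String secondsOnly(LocalDateTime time) {
        return time.format(SECONDS_ONLY);
    }
}
```

For LocalDateTime.of(2022, 11, 22, 14, 42, 37, 25_000_000), withMillis gives 2022-11-22 14:42:37.025 and the character at zero-based index 19 is the dot, while secondsOnly gives the 19-character string 2022-11-22 14:42:37.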

How to solve this problem?

Answer: it is enough to change the field type in the table from DateTime to DateTime64, which supports milliseconds.

I tested it myself: a DateTime64 column can receive a Java Date value and parse it normally.

That table has three DateTime fields: create_time, edit_time, and time.

The types of the first two fields were changed without any trouble.

But changing the time field raised an exception:
Code: 524,e.displayText() = DB::Exception: Alter of key column time from type DateTime to type DateTime64(3) must be metadata-only (20.12.8.5)

The message indicates that a field used as a key cannot be modified this way.

Why?

8. order by

This time I went straight to the table's creation statement:

show create table test;

It turned out that, in addition to the primary key and ordinary indexes, the table also has a special order by index.

For example: order by (code, time).

Seeing this, I quickly understood: the time field is part of the order by index, so no wonder it cannot be modified casually.

So I went to a DBA to discuss countermeasures.

The DBA said that in ClickHouse the only way to change the type of an index field is to rebuild the table and then synchronize the data.

Obviously this solution is too troublesome.

I wondered whether there was a simpler solution.

9. date_time_input_format parameter

At this point I thought: isn't this just a time conversion problem?

Wouldn't it solve the problem if ClickHouse converted the time format automatically when saving data?

Searching online, I found a parameter called date_time_input_format.

This parameter allows choosing a parser for textual representations of dates and times.

Its possible values:

  • ‘best_effort’ — Enables extended parsing.

ClickHouse can parse basic YYYY-MM-DD HH:MM:SS format and all ISO 8601 date and time formats. For example, '2018-06-08T01:02:03.000Z'.

  • ‘basic’ — Use basic parser.

ClickHouse can only parse the basic YYYY-MM-DD HH:MM:SS format. For example, '2019-08-20 10:18:56'.

Default: 'basic'.

It turns out that this is the root cause of the time conversion failure. If we set date_time_input_format to best_effort, the problem should be solved.

To avoid affecting everything else, I wanted to change date_time_input_format only for those three tables.

But saving the setting reported an error.

It turned out that the date_time_input_format parameter is only allowed on the MergeTree storage engine, while our tables use ReplacingMergeTree.

Sigh...

I could only look for another way.

10. parseDateTimeBestEffortOrNull

Wouldn't it work to convert the value manually with a function at the point where the data is inserted?

Of course, changing the Date type in the Java entity to String would also work, but after reviewing the code I found that this change is fairly large and touches many places.

The smallest change is at the mapper layer, because each mapper has at most one insert.

I also reviewed all the ClickHouse tables: only 3 tables use the DateTime type, and the others use DateTime64.

At first I found the formatDateTime function in the official ClickHouse documentation, but testing showed that it was not suitable.

Later I found the parseDateTimeBestEffort family of functions and decided to use parseDateTimeBestEffortOrNull.

In the insert statement of mapper.xml, simply wrap the value as parseDateTimeBestEffortOrNull(#{item.time}).

After testing, the time conversion problem was solved.

Later, the same exception appeared again in a select statement.

At first I thought it was caused by the toDate(time) function, but it turned out that the time field was used as a query condition in the where clause of the select, which caused the problem.

Wrapping the parameter in parseDateTimeBestEffortOrNull solved the problem there as well.

With that, the ClickHouse JDBC driver upgrade was complete, with no further problems.
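Outside MyBatis, the same wrapping can be expressed in plain JDBC-style SQL text. The sketch below is only an illustration: the class name is invented, the table test and the columns code and time come from the order by example above, and the >= comparison in the select is a made-up query condition rather than the one from the project.

```java
final class BestEffortSql {
    /** Wraps a SQL expression or placeholder in ClickHouse's parseDateTimeBestEffortOrNull. */
    static String bestEffort(String expr) {
        return "parseDateTimeBestEffortOrNull(" + expr + ")";
    }

    // Insert: the bound time value is parsed by ClickHouse on the server side.
    static final String INSERT_SQL =
            "INSERT INTO test (code, time) VALUES (?, " + bestEffort("?") + ")";

    // Select: the same wrapping applied to a parameter compared against the time column.
    static final String SELECT_SQL =
            "SELECT code, time FROM test WHERE time >= " + bestEffort("?");
}
```

One thing to keep in mind is that the OrNull variant returns NULL for input it cannot parse instead of raising an error, so bad values fail quietly.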

One thing needs special attention: from now on, any time field in a newly created table or a newly added column must be defined as the DateTime64 type.

In fact, we ran into many other pitfalls while using ClickHouse. The problem at the beginning of this article is just one of them. I will share the rest in a dedicated article later, so stay tuned.


Origin blog.csdn.net/lisu061714112/article/details/128088578