23. Logstash-input-jdbc synchronization principle and related problems (syncing Elasticsearch with a relational database)

Foreword:

Compared with other plugins, logstash-input-jdbc stands out for its stability, ease of use, and versioning that keeps pace with ES synchronization updates, so the research below focuses on it.
Several common difficult problems with logstash-input-jdbc have been discussed at length on GitHub and Stack Overflow. Unified verification and answers are given below.

1: What is the synchronization principle of logstash-input-jdbc?

(1) Basis for full synchronization

The SQL statement in the configuration file (e.g. jdbc.sql) defines what is synchronized.

(2) Basis for incremental real-time synchronization

1) Set the scheduling strategy.

For example, schedule => "* * * * *" triggers a synchronization every minute. One minute is currently the minimum interval; verification found that sub-minute (second-level) updates within 60 s are not supported.

2) Set the SQL statement.

For example, jdbc.sql determines what content to synchronize and the conditions for synchronized updates. A document synchronized into ES looks like this:

{"id":10,"name":"10test","@version":"1","@timestamp":"2016-06-29T03:18:00.177Z","type":"132c_type"}
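The scheduling and SQL settings above fit together in a Logstash pipeline configuration. The following is a minimal sketch; the JDBC driver path, connection string, credentials, and index name are assumptions to be adapted to your environment:

```conf
input {
  jdbc {
    # driver path, connection string, and credentials are placeholders
    jdbc_driver_library => "/etc/logstash/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
    jdbc_user => "root"
    jdbc_password => "password"
    schedule => "* * * * *"                        # cron syntax: every minute (the minimum interval)
    statement_filepath => "/etc/logstash/jdbc.sql" # the SQL that defines what to synchronize
    type => "132c_type"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test_index"
    document_id => "%{id}"   # reuse the table's primary key so re-synced rows overwrite the same document
  }
}
```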

2: Does logstash-input-jdbc only support time-based synchronization?

Verification shows: in addition to time-based synchronization, incremental updates can also be driven by changes in an auto-incrementing column (such as an auto-increment ID).

The previous example synchronizes based on time changes, with the condition set as follows:

[root@5b9dbaaa148a logstash_jdbc_test]# cat jdbc.sql_bak

select
        *
from
        cc
where   cc.modified_at > :sql_last_value

In fact, further research found a use_column_value option in the configuration file that determines whether to record the value of a specific column. If record_last_run is true, you can customize the column name to track (via tracking_column); in that case use_column_value must also be set to true. Otherwise, the timestamp value is tracked by default.
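These tracking options go inside the jdbc input block. A minimal sketch, matching the jdbc_xm.sql example below (file paths are illustrative):

```conf
jdbc {
  # connection settings (driver, URL, credentials) omitted for brevity
  use_column_value => true       # track a column's value instead of the last run time
  tracking_column => "id"        # the auto-increment column to track
  record_last_run => true        # persist the last seen value between runs
  last_run_metadata_path => "/etc/logstash/run_metadata.d/my_info"
  statement_filepath => "/etc/logstash/jdbc_xm.sql"
  schedule => "* * * * *"
}
```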

Example: the following sets a change in id as the synchronization condition.

[root@5b9dbaaa148a logstash_jdbc_test]# cat jdbc_xm.sql
select
        *
from
        cc
where   cc.id >= :sql_last_value

We can specify a file to record the value of the tracking_column field from the last execution. For example, if there were 12 records in the database last time, the file will contain the number 12 after the query completes, and the next SQL query can start from record 13.

We only need to write WHERE MY_ID > :sql_last_value in the SQL statement, where :sql_last_value is the value recorded in the file (12).

last_run_metadata_path => "/etc/logstash/run_metadata.d/my_info"

For example:

[root@5b9 run_metadata.d]# cat /etc/logstash/run_metadata.d/my_info

--- 12

A search of the plugin's entire source code found no trigger-related processing; synchronization relies solely on the scheduled SQL queries described above.

3: MySQL and ES are deployed on two different servers with inconsistent clocks. Can synchronization still work?

(1) For incremental synchronization that uses time as the judgment condition, synchronization is performed with the recorded time as the reference point.

Verification found:

The displayed @timestamp is the UTC time value on ES (regardless of the ES machine's time zone, times are converted to UTC before being stored in ES), and the displayed modified_at value is the synchronized MySQL time converted to UTC.

The prerequisite for an update is cc.modified_at >= :sql_last_value. That is, if a MySQL row's time is modified to a value less than sql_last_value, it will not be synchronized.

For example:
[elasticsearch@5b9dbaaa148a run_metadata.d]$ cat my_info

--- 2016-06-29 02:19:00.182000000 Z

(2) When a selected column (such as an auto-increment ID) is used as the judgment condition, synchronization works even though the clocks of MySQL and ES are inconsistent.

Verification found:

In testing, the MySQL server's clock was set one day earlier or one day later than the ES server's, and synchronized updates still worked.

4: How to support real-time synchronization of MySQL delete operations to ES?

The logstash-input-jdbc plugin does not support synchronizing physical deletions. See:

http://stackoverflow.com/questions/35813923/sync-postgresql-data-with-elasticsearch/35823497#35823497

https://github.com/logstash-plugins/logstash-input-jdbc/issues/145

Solution:

Change the synchronized delete operation into a synchronized update operation.

Step 1: Use soft deletion, not physical deletion.

Instead of physically deleting the record, mark it as deleted: add a flag column that indicates whether the record has been deleted (default false; true or "deleted" means deleted — a common practice in the industry). This way, through the existing synchronization mechanism, the flagged rows are synchronized to Elasticsearch as ordinary updates.
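A minimal soft-delete sketch in SQL; the flag column name and values are assumptions, and the table follows the earlier cc example:

```sql
-- One-time schema change: add a deletion flag to the table
ALTER TABLE cc ADD COLUMN flag VARCHAR(16) NOT NULL DEFAULT 'false';

-- Instead of DELETE, mark the row and bump modified_at so the
-- existing incremental sync picks the change up
UPDATE cc
SET flag = 'deleted',
    modified_at = NOW()
WHERE id = 10;
```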

Step 2: In ES, retrieve the documents whose flag is true or "deleted".

In ES, a simple term query is enough to retrieve the soft-deleted data.
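Assuming the flag column from step 1 is indexed as flag and the index name from the earlier sketch, such a term query might look like:

```json
POST /test_index/_search
{
  "query": {
    "term": {
      "flag": "deleted"
    }
  }
}
```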

Step 3: Scheduled physical deletion.

Set up a scheduled job that, at intervals, physically deletes the records whose flag field is marked true or "deleted" in both MySQL and ES, thereby completing the physical deletion.
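On the MySQL side, the scheduled cleanup can be as simple as the following statement (the cc table and flag column follow the earlier sketch, which is an assumption):

```sql
-- Physically remove rows that were soft-deleted and already synchronized to ES
DELETE FROM cc WHERE flag = 'deleted';
```

On the ES side, the same term condition from step 2 can drive a delete-by-query (available as a plugin in ES 2.x and as the built-in _delete_by_query API from ES 5.0).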
