Writing Logstash data to Elasticsearch in order

Applicable scenario

The business needs to run several operations against Elasticsearch (ES) on the same document in quick succession, such as an insert followed by an update, or a delete followed by an insert. Logstash submits events in batches and Elasticsearch processes requests on an asynchronous thread pool, so in normal use there is no guarantee that the operations are applied in the order they were issued, and no transactional behavior across them.

Elasticsearch's optimistic locking mechanism

Elasticsearch keeps a version number for every stored document, and every create, update, or delete increments that version by 1. An operation can carry a version number: Elasticsearch checks it first, and if the document's current version is not the expected one, the operation is rejected with a version conflict. With external versioning (version_type set to external), the caller supplies the version and the write is accepted only if that version is higher than the stored one.
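The version check can be pictured with a small in-memory model. This is a sketch under stated assumptions, not Elasticsearch client code: the class ExternalVersionedStore and the exception VersionConflict are hypothetical names, and the rule modeled is the external-versioning one, where a write is accepted only when its version is strictly higher than the stored version.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version the store will not accept."""


class ExternalVersionedStore:
    """Toy in-memory model of external versioning (hypothetical, not an ES client)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, version):
        current = self._docs.get(doc_id)
        # Accept the write only if the supplied version is strictly higher.
        if current is not None and version <= current[0]:
            raise VersionConflict(
                f"{doc_id}: stored version {current[0]} >= supplied {version}"
            )
        self._docs[doc_id] = (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]


store = ExternalVersionedStore()
# The newer event happens to arrive first.
store.index("1", {"status": "updated"}, version=200)
try:
    # The older event arrives late and must not overwrite the newer one.
    store.index("1", {"status": "created"}, version=100)
    stale_write_applied = True
except VersionConflict:
    stale_write_applied = False
```

Whatever the arrival order, the document ends up holding the data with the highest version, which is the property the Logstash configuration in this article relies on.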

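After a document is deleted, Elasticsearch keeps its version information for 1 minute by default (the index.gc_deletes setting), so a delayed operation that carries an older version is still rejected during that window.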
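Why the retention window matters can be sketched with another toy model. Everything here is an assumption made for illustration: TombstoneStore is a hypothetical class, time is passed in explicitly as seconds, and the 60-second constant simply mirrors the default stated above. It is a simplification, not a description of Elasticsearch internals.

```python
GC_DELETES_SECONDS = 60  # assumed retention window, mirroring the default above


class TombstoneStore:
    """Toy model: deleted documents keep their version for a limited time."""

    def __init__(self):
        self._live = {}        # doc_id -> version
        self._tombstones = {}  # doc_id -> (version, deleted_at)

    def _current_version(self, doc_id, now):
        if doc_id in self._live:
            return self._live[doc_id]
        tomb = self._tombstones.get(doc_id)
        # A tombstone only counts while it is younger than the window.
        if tomb is not None and now - tomb[1] < GC_DELETES_SECONDS:
            return tomb[0]
        return None

    def index(self, doc_id, version, now):
        current = self._current_version(doc_id, now)
        if current is not None and version <= current:
            return False  # version conflict, write rejected
        self._tombstones.pop(doc_id, None)
        self._live[doc_id] = version
        return True

    def delete(self, doc_id, version, now):
        current = self._current_version(doc_id, now)
        if current is not None and version <= current:
            return False  # version conflict, delete rejected
        self._live.pop(doc_id, None)
        self._tombstones[doc_id] = (version, now)
        return True


store = TombstoneStore()
created = store.index("1", version=100, now=0)
deleted = store.delete("1", version=300, now=1)
# A stale index (version 200) arriving 4 seconds after the delete is rejected.
stale_within_window = store.index("1", version=200, now=5)
# The same stale index arriving after the window would be accepted again.
stale_after_window = store.index("1", version=200, now=120)
```

In this model the delete protects against late, older writes only while the tombstone is retained, so the approach suits operations that are separated by seconds rather than minutes.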

Logstash configuration example

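Add a dataVersion field to each of the operation requests that are issued very close together. The field can be a timestamp. Because external versioning accepts a write only when its version is higher than the stored one, the later operation must carry a larger dataVersion than the earlier one; two operations with the same value would conflict.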
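On the producer side, the events might be built roughly as follows. This is a minimal sketch, not the author's code: VersionClock and build_event are hypothetical helpers, and the field names id, action, and dataVersion are taken from the Logstash configuration in this article. With version_type set to external, Elasticsearch accepts a write only when its version is higher than the stored one, so the sketch assumes the later operation must get the larger value and bumps the timestamp when two events fall in the same millisecond.

```python
import json
import time


class VersionClock:
    """Hands out strictly increasing millisecond timestamps (hypothetical helper)."""

    def __init__(self):
        self._last = 0

    def next_version(self):
        now_ms = int(time.time() * 1000)
        # Never repeat or go backwards, even within the same millisecond.
        self._last = max(now_ms, self._last + 1)
        return self._last


def build_event(clock, doc_id, action, **fields):
    """Serialize one event in the JSON shape the Kafka input expects."""
    event = {"id": doc_id, "action": action, "dataVersion": clock.next_version()}
    event.update(fields)
    return json.dumps(event)


clock = VersionClock()
first = build_event(clock, "42", "index", name="draft")
second = build_event(clock, "42", "delete")
```

The resulting JSON strings would then be published to the Kafka topic that Logstash consumes; the publishing step is left out because it depends on the client library in use.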

input {
  kafka {
    bootstrap_servers => "192.168.x.x:9092,192.168.x.x:9092,192.168.x.x:9092"
    topics => "synclogs"
    group_id => "logstash-sync"
    consumer_threads => "1"
    max_partition_fetch_bytes => "5242880"
    codec => "json"
  }
}

output {
  if [dataVersion] {
    if [action] and [action] == "delete" {
      elasticsearch {
        hosts => ["192.168.x.x:9200","192.168.x.x:9200"]
        index => "sync_test"
        document_type => "test"
        action => "delete"
        codec => "json"
        document_id => "%{id}"
        version => "%{dataVersion}"
        version_type => "external"
        retry_on_conflict => 4
      }
    } else {
      elasticsearch {
        hosts => ["192.168.x.x:9200","192.168.x.x:9200"]
        index => "sync_test"
        document_type => "test"
        action => "index"
        codec => "json"
        document_id => "%{id}"
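        version => "%{dataVersion}"
        # Use external versioning here as well, so dataVersion is compared the same way as in the delete branch
        version_type => "external"
        retry_on_conflict => 4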
      }
    }
  } else {
    elasticsearch {
      hosts => ["192.168.x.x:9200","192.168.x.x:9200"]
      index => "sync_test"
      document_type => "test"
      action => "index"
      codec => "json"
      document_id => "%{id}"
    }
  }
}
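The branching in the output section can be summarized with a small function. This is only a reading aid written in Python: route and is_truthy are hypothetical names, and the truthiness rule assumes that a Logstash field condition holds when the field exists and is neither null nor false.

```python
def is_truthy(value):
    """Approximate Logstash `if [field]`: present, not null, not false."""
    return value is not None and value is not False


def route(event):
    """Return (action, uses_version) for an event, mirroring the three branches."""
    if is_truthy(event.get("dataVersion")):
        if event.get("action") == "delete":
            return ("delete", True)   # versioned delete
        return ("index", True)        # versioned index
    return ("index", False)           # plain index, no version check


versioned_delete = route({"id": "1", "action": "delete", "dataVersion": 1700000000000})
versioned_index = route({"id": "1", "action": "update", "dataVersion": 1700000000001})
plain_index = route({"id": "2", "action": "delete"})
```

Note that an event without dataVersion is always indexed, even when its action is delete, because the delete branch sits inside the dataVersion check.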

Origin blog.csdn.net/yml_try/article/details/108648546