ES consistency problem

Internal factors

The consistency of ES mainly involves two aspects:

  • The refresh problem caused by the Lucene indexing mechanism

  • Replica consistency problems with sharding and replication (consistency: one, all, quorum)

External factors

As for external factors, if a synchronization mechanism between the DB and ES is used, that synchronization introduces a certain delay, and inconsistencies can also arise from abnormal conditions such as transaction rollback.
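
As an illustration, here is a minimal sketch of one way to narrow this window, assuming Spring transaction management: register the ES write as an after-commit callback so that a rolled-back DB transaction never reaches the index. The Product, ProductJpaRepository and ProductSearchRepository types are hypothetical placeholders; the synchronization hook is Spring's TransactionSynchronizationAdapter.

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.support.TransactionSynchronizationAdapter;
import org.springframework.transaction.support.TransactionSynchronizationManager;

@Service
public class ProductService {

    private final ProductJpaRepository db;        // hypothetical JPA repository
    private final ProductSearchRepository search; // hypothetical Spring Data ES repository

    public ProductService(ProductJpaRepository db, ProductSearchRepository search) {
        this.db = db;
        this.search = search;
    }

    @Transactional
    public void updateProduct(final Product product) {
        db.save(product);
        // Push the change to ES only after the DB transaction commits,
        // so a rollback never leaves a phantom document in the index.
        // Note: the ES index still lags the DB slightly (synchronization delay).
        TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronizationAdapter() {
            @Override
            public void afterCommit() {
                search.save(product);
            }
        });
    }
}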

Refresh after update operation

org.springframework.data.elasticsearch.repository.support.AbstractElasticsearchRepository

@Override
public <S extends T> S save(S entity) {
    Assert.notNull(entity, "Cannot save 'null' entity.");
    // index the entity, then refresh the index so the change is immediately searchable
    elasticsearchOperations.index(createIndexQuery(entity));
    elasticsearchOperations.refresh(entityInformation.getIndexName());
    return entity;
}

public <S extends T> List<S> save(List<S> entities) {
    Assert.notNull(entities, "Cannot insert 'null' as a List.");
    Assert.notEmpty(entities, "Cannot insert empty List.");
    List<IndexQuery> queries = new ArrayList<IndexQuery>();
    for (S s : entities) {
        queries.add(createIndexQuery(s));
    }
    // bulk index the whole batch, then refresh once
    elasticsearchOperations.bulkIndex(queries);
    elasticsearchOperations.refresh(entityInformation.getIndexName());
    return entities;
}

The refresh pushes changes to the filesystem cache as soon as they are written, so they become searchable immediately.
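
The refresh can also be triggered directly on the underlying client, for example when writes bypass the repository. A minimal sketch, assuming an ES 2.x-era TransportClient (the index name is just a placeholder):

import org.elasticsearch.client.Client;

public class ManualRefresh {

    // Force a refresh so that everything indexed so far becomes visible to search.
    public static void refreshIndex(Client client, String indexName) {
        client.admin()
              .indices()
              .prepareRefresh(indexName)
              .get();
    }
}

This is essentially what elasticsearchOperations.refresh(indexName) delegates to.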

Replica Consistency Problem

But there is another problem: once there are multiple replicas, consistency between them comes into play.

  • If consistency is one, writes are fast, but readers are not guaranteed to see the latest changes;

  • If it is quorum, it is a compromise: when writing, W > N/2 must hold, i.e. the number of shard copies participating in the write, W, must exceed half of the total number of copies N. With a quorum write strategy, to guarantee read consistency you must also use a read quorum (read from W of the N copies and arbitrate to obtain the latest data), or explicitly read from the primary.
    Related classes:

org/elasticsearch/action/WriteConsistencyLevel.java
org/elasticsearch/action/RealtimeRequest.java
  • realtime request
    ES provides realtime requests, which read from the translog and are therefore guaranteed to return the latest data.

public class GetRequest extends SingleShardRequest<GetRequest> implements RealtimeRequest {
  //......
}

Note that get is therefore up to date, but other operations such as search are not (if search results also need to be up to date, a refresh is required; this refreshes that shard rather than the whole index, so if a read request is routed to a replica shard it may still return stale data, in which case you need to specify preference=_primary; see the sketch after this list).

  • The all strategy is a strongly consistent strategy
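
To make these options concrete, here is a minimal sketch against the ES 2.x-era TransportClient API (WriteConsistencyLevel and setConsistencyLevel were removed in later versions; the index, type and field names are placeholders): a quorum write, a realtime get, and a search pinned to the primary shard.

import org.elasticsearch.action.WriteConsistencyLevel;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

public class ConsistencyExamples {

    public static void demo(Client client) {
        // Quorum write: more than half of the shard copies must take part in the write.
        client.prepareIndex("myindex", "mytype", "1")
              .setSource("title", "hello")
              .setConsistencyLevel(WriteConsistencyLevel.QUORUM)
              .get();

        // Realtime get: served from the translog, so it sees the write above
        // even before any refresh has happened.
        GetResponse get = client.prepareGet("myindex", "mytype", "1")
                                .setRealtime(true)
                                .get();
        System.out.println(get.getSourceAsString());

        // Search is not realtime; to avoid reading a stale replica shard,
        // pin the request to the primary with preference=_primary.
        SearchResponse search = client.prepareSearch("myindex")
                                      .setQuery(QueryBuilders.termQuery("title", "hello"))
                                      .setPreference("_primary")
                                      .get();
        System.out.println(search.getHits().getTotalHits());
    }
}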

Summary

If you want to ensure strong read consistency:

  • When write consistency is not all, you need to direct reads to the primary shard (preference=_primary).

  • When write consistency is all, and replication is in sync mode (the default), no extra setting is needed when reading. If replication is in async mode, reads still need to go to the primary shard.

The write consistency setting can be checked and changed through the index settings API:

curl -XGET 192.168.99.100:9200/myindex/_settings

curl -XPUT '192.168.99.100:9200/myindex/_settings' -d '
{
    "index" : {
        "action.write_consistency" : "all"
    }
}'

In both cases, however, an explicit refresh is still needed after updates.

For applications that read far more than they write (especially when there are not many replicas), you can set write consistency to all, so that replica shards can safely serve reads and improve the overall read performance of ES.

References

https://segmentfault.com/a/1190000005844120
