ElasticSearch Study Notes (7)--Rebuild Index

Reindexing does not copy the settings of the source index. Before running _reindex, set up the target index yourself, including its mappings, number of shards, number of replicas, and so on.

first example

POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test-copy"
  }
}
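
Because the settings are not copied, the target index is usually created explicitly before the request above is run. The following is a minimal sketch in Python that only builds the create-index request and prints it in console form; it makes no HTTP call. The helper name, the field names title and date, and the shard and replica counts are illustrative assumptions, and the mapping type "doc" is assumed from the 6.x responses shown later in these notes.

```python
import json

def build_create_index_request(index, shards, replicas, properties, doc_type="doc"):
    """Build (method, path, body) for creating the target index before _reindex."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        # 6.x mappings are keyed by a mapping type; "doc" is an assumption here.
        "mappings": {doc_type: {"properties": properties}},
    }
    return "PUT", index, body

method, path, body = build_create_index_request(
    "test-copy",
    shards=3,
    replicas=1,
    # Hypothetical fields, only for illustration.
    properties={"title": {"type": "text"}, "date": {"type": "date"}},
)
# Print in the same console form used in these notes.
print(method, path)
print(json.dumps(body, indent=2))
```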

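_reindex takes a snapshot of the source index and copies from it; because the target must be a different index, version conflicts are unlikely. To control how they are handled, set version_type in dest to "internal" or "external". With "internal" (also the behavior when the attribute is omitted), documents are written into the target blindly, overwriting any document that has the same type and ID. With "external", the version of the source document is preserved: missing documents are created, and an existing document is updated only if its version in the target is older than in the source.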
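
As a way to reason about the two version_type options, here is a toy model in Python of what happens to a single document. It is a simplification based on the behavior described in the official documentation, not Elasticsearch's actual implementation; the function name and the outcome labels are made up for illustration.

```python
def reindex_outcome(version_type, source_version, dest_version):
    """Toy model for one document; dest_version is None if it is absent in the target."""
    if dest_version is None:
        return "created"
    if version_type == "internal":
        # Written blindly: an existing document is overwritten regardless of version.
        return "updated"
    if version_type == "external":
        # External versioning only accepts a strictly newer source version.
        return "updated" if source_version > dest_version else "version_conflict"
    raise ValueError("unsupported version_type: %r" % version_type)

# The target already holds a newer version (3) than the source (1).
for vt in ("internal", "external"):
    print(vt, reindex_outcome(vt, source_version=1, dest_version=3))
```

In this model "internal" overwrites the newer target document, while "external" reports a version conflict and leaves it alone.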

Setting op_type to "create" in dest makes _reindex create only the documents that are missing from the target index. Every document that already exists causes a version conflict. By default, version conflicts abort the _reindex process and are reported under failures; with conflicts set to "proceed" in the request body, they are only counted and execution continues. The two requests that follow show the difference.

request parameters

POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test-copy",
    "op_type": "create"
  }
}

The response result is as follows

{
  "took": 2,
  "timed_out": false,
  "total": 2,
  "updated": 0,
  "created": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 2,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "test-copy",
      "type": "doc",
      "id": "2",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[doc][2]: version conflict, document already exists (current version [1])",
        "index_uuid": "8b78uPjKRmuH_2cqSiPKIA",
        "shard": "2",
        "index": "test-copy"
      },
      "status": 409
    },
    {
      "index": "test-copy",
      "type": "doc",
      "id": "1",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[doc][1]: version conflict, document already exists (current version [1])",
        "index_uuid": "8b78uPjKRmuH_2cqSiPKIA",
        "shard": "3",
        "index": "test-copy"
      },
      "status": 409
    }
  ]
}

request parameters

POST _reindex
{
  "conflicts": "proceed", 
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test-copy",
    "op_type": "create"
  }
}

response result

{
  "took": 5,
  "timed_out": false,
  "total": 3,
  "updated": 0,
  "created": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 3,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}
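
To compare the two responses programmatically, a small helper can condense them. This is a sketch in Python under the assumption that the response has already been parsed into a dict; the helper name and the shape of its result are illustrative, and the sample data is trimmed from the two responses above. Treating a non-empty failures array as "aborted" follows the description of that field in the official documentation.

```python
def summarize_reindex(response):
    """Condense a parsed _reindex response into the fields relevant to conflicts."""
    failures = response.get("failures", [])
    return {
        "created": response.get("created", 0),
        "updated": response.get("updated", 0),
        "version_conflicts": response.get("version_conflicts", 0),
        "failed_ids": sorted(f["id"] for f in failures),
        # A non-empty failures array means the request aborted because of them.
        "aborted": len(failures) > 0,
    }

# Trimmed copies of the two responses shown in these notes.
default_response = {
    "total": 2, "created": 0, "updated": 0, "version_conflicts": 2,
    "failures": [{"index": "test-copy", "id": "2", "status": 409},
                 {"index": "test-copy", "id": "1", "status": 409}],
}
proceed_response = {
    "total": 3, "created": 0, "updated": 0, "version_conflicts": 3,
    "failures": [],
}

print(summarize_reindex(default_response))
print(summarize_reindex(proceed_response))
```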

Multiple source indexes can be specified, for example "index": ["source_index_1", "source_index_2"]. The number of documents processed can be limited with the top-level size; query and sort can be used in source to select and order the documents, and _source can restrict which fields are copied.

POST _reindex
{
    "size":1,
    "source":{
        "index": "test",
        "sort": {
            "date": "desc"
        },
        "query": {
          "match": {
            "test": "data"
          }
        },
        "_source": ["field1", "field2"]
    },
    "dest":{...}
}

_reindex supports scripts to modify documents.

If the source document has a field named "flag" and you want to change it to "tag" in the target document, you can execute the following statement

POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test2"
  },
  "script": {
    "source": "ctx._source.tag = ctx._source.remove(\"flag\")"
  }
}
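
To make the effect of the script concrete, the same transformation can be mirrored on a plain Python dict. This is only an analogy: the function name and the sample documents are invented, and the None case assumes that remove on the Painless map returns null when the key is absent, so a document without "flag" would end up with a null "tag".

```python
def rename_flag_to_tag(source):
    """Mirror ctx._source.tag = ctx._source.remove("flag") on a plain dict."""
    doc = dict(source)  # work on a copy, leave the input untouched
    # pop() removes the key and returns its value, or None if it was missing.
    doc["tag"] = doc.pop("flag", None)
    return doc

print(rename_flag_to_tag({"flag": "blue", "user": "kim"}))  # hypothetical document
print(rename_flag_to_tag({"user": "kim"}))                  # document without "flag"
```

If documents without "flag" should not receive an empty "tag", the script would need a guard for that case.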

Reindex from a remote Elasticsearch cluster

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}

Remote hosts have to be explicitly allowed with the reindex.remote.whitelist setting in elasticsearch.yml:

reindex.remote.whitelist: ["first-host:9200", "second-host:9200"]

Reindexing from a remote cluster uses an on-heap buffer with a default maximum size of 100mb. If the documents in the remote index are very large, use a smaller batch size by setting size inside source. This is not the top-level size shown earlier, which limits the total number of documents processed.
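
The two places where size can appear are easy to mix up: inside source it sets the number of documents per batch, while at the top level it limits the total number of documents processed. The sketch below builds a remote reindex body in Python to make the distinction visible; the helper and its parameter names are hypothetical, the batch size of 10 is just an example value, and nothing is sent over the network.

```python
import json

def build_remote_reindex(host, source_index, dest_index, batch_size=None, total_limit=None):
    """Build a _reindex body for a remote source (a sketch, no request is sent)."""
    source = {"remote": {"host": host}, "index": source_index}
    if batch_size is not None:
        # size inside source: documents fetched per batch.
        source["size"] = batch_size
    body = {"source": source, "dest": {"index": dest_index}}
    if total_limit is not None:
        # Top-level size: total number of documents to process.
        body["size"] = total_limit
    return body

body = build_remote_reindex("http://otherhost:9200", "source", "dest", batch_size=10)
print("POST _reindex")
print(json.dumps(body, indent=2))
```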

You can specify socket_timeout and connect_timeout in remote; if not specified, both default to 30 seconds.

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "socket_timeout": "1m",
      "connect_timeout": "10s"
    },
    "index": "source"
  },
  "dest": {
    "index": "dest"
  }
}

For more functions, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/docs-reindex.html
