Indexing data in Elasticsearch

For a full-text search tool, indexing is the critical step: only after data has been analyzed and stored by an index operation, which builds the inverted index, can users query it.

This article walks through the ES index operation and the options related to it.

Index Operations

The simplest index operation specifies the index, the type, and the ID. (Note that "index" is both a verb, the operation, and a noun, the index being written to.) For example:

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'

This stores the document with ID 1 under the tweet type in the twitter index.

The index operation returns:

{
    "_shards" : {
        "total" : 10,
        "failed" : 0,
        "successful" : 10
    },
    "_index" : "twitter",
    "_type" : "tweet",
    "_id" : "1",
    "_version" : 1,
    "created" : true
}

The _shards section reports shard information: the operation targeted 10 shard copies in total (5 primary shards and 5 replica shards) and succeeded on all of them. The remaining fields give the index, type, ID, and version of the document.

Automatically create an index

If the twitter index does not exist when the operation above runs, ES creates it by default, and the type and its fields are created automatically as well. In other words, unlike a traditional database, ES does not require a table structure to be defined in advance.

Each type in an index has a mapping. The mapping is generated dynamically, so when new fields are added, their mapping settings are added automatically.
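
As a rough illustration of dynamic mapping, the Python sketch below guesses a type for each new field from its JSON value and leaves fields that are already mapped untouched. The helper names infer_field_type and extend_mapping are hypothetical, the type names follow ES 2.x conventions, and datetime.fromisoformat is only a stand-in for ES date detection; the real rules are richer than this.

```python
from datetime import datetime


def infer_field_type(value):
    """Guess an ES 2.x-style field type from a JSON value (simplified assumption)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # stand-in for ES date detection
            return "date"
        except ValueError:
            return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"value not handled by this sketch: {value!r}")


def extend_mapping(mapping, document):
    """Add a type for every field missing from `mapping`; mapped fields are kept."""
    for field, value in document.items():
        if field not in mapping:
            mapping[field] = infer_field_type(value)
    return mapping


tweet = {
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "trying out Elasticsearch",
}
mapping = extend_mapping({}, tweet)
# A later document with a new field only adds that field.
extend_mapping(mapping, {"user": "kimchy", "retweets": 3})
print(mapping)
```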

Setting action.auto_create_index to false in the configuration file turns automatic index creation off.

Automatic index creation can also be restricted with a whitelist or blacklist of index name patterns, for example:

Setting action.auto_create_index to +aaa*,-bbb* allows indices whose names start with aaa to be created automatically (the '+' sign) and forbids automatic creation of indices whose names start with bbb (the '-' sign).
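
The pattern list can be read as an ordered allow/deny list. The following Python sketch models that reading; may_auto_create is a hypothetical helper, not an ES API, and it assumes that the first matching pattern decides and that names matching no pattern are rejected.

```python
from fnmatch import fnmatchcase


def may_auto_create(index_name, setting):
    """Return True if `index_name` may be auto-created under `setting`.

    `setting` is a comma-separated pattern list such as "+aaa*,-bbb*".
    Assumption: the first matching pattern wins; no match means "deny".
    """
    for raw in setting.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        allow = not pattern.startswith("-")
        if pattern[0] in "+-":
            pattern = pattern[1:]  # drop the sign, keep the wildcard pattern
        if fnmatchcase(index_name, pattern):
            return allow
    return False


print(may_auto_create("aaa-logs", "+aaa*,-bbb*"))
print(may_auto_create("bbb-logs", "+aaa*,-bbb*"))
```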

About version numbers

The version number tracks the state of a document: an operation is only applied against the highest (latest) version of the document.

The version number does not have to be managed inside ES; it can also be maintained by an external system. See the official documentation for details.
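
To make the version check concrete, here is a minimal in-memory sketch in Python. TinyStore and VersionConflict are hypothetical names, not part of any ES client. It assumes the usual rules: with internal versioning, a request that carries a version must match the stored one; with external versioning, the supplied version must be greater than the stored one.

```python
class VersionConflict(Exception):
    """Raised when a write does not satisfy the version check."""


class TinyStore:
    """Toy document store that mimics version checks (illustration only)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, version=None, external=False):
        current = self._docs[doc_id][0] if doc_id in self._docs else None
        if external:
            if version is None:
                raise ValueError("external versioning needs an explicit version")
            # External: the caller owns the counter; it must move forward.
            if current is not None and version <= current:
                raise VersionConflict(f"stored {current}, got {version}")
            new_version = version
        else:
            # Internal: an explicit version must equal the stored version.
            if version is not None and version != current:
                raise VersionConflict(f"stored {current}, expected {version}")
            new_version = (current or 0) + 1
        self._docs[doc_id] = (new_version, source)
        return new_version


store = TinyStore()
print(store.index("1", {"message": "first"}))             # new document
print(store.index("1", {"message": "second"}, version=1))  # matches stored version
```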

Operation type: op_type

Through the op_type parameter, ES offers "put-if-absent" behavior: with op_type=create, the document is indexed if ES does not already hold it, and an error is returned if it does.

For example, if a document with ID 1 already exists, a request to /twitter/tweet/1?op_type=create reports an error. Calling the _create endpoint directly (/twitter/tweet/1/_create) has the same effect.
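
The two request forms can be sketched in Python with the standard library. build_create_request is a hypothetical helper that only constructs the request; it assumes a node at localhost:9200 and the twitter/tweet example used earlier, and nothing is sent.

```python
import json
from urllib.request import Request


def build_create_request(doc_id, source, use_create_endpoint=False,
                         base="http://localhost:9200/twitter/tweet"):
    """Build (but do not send) a create-only index request."""
    if use_create_endpoint:
        url = f"{base}/{doc_id}/_create"          # dedicated _create endpoint
    else:
        url = f"{base}/{doc_id}?op_type=create"   # op_type query parameter
    body = json.dumps(source).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


doc = {"user": "kimchy", "message": "trying out Elasticsearch"}
for request in (build_create_request(1, doc),
                build_create_request(1, doc, use_create_endpoint=True)):
    print(request.get_method(), request.full_url)
```

Passing either request to urllib.request.urlopen would send it; against a node that already holds document 1, the call is expected to fail with an HTTP error.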

Automatically generated IDs

In the examples above, ES uses the ID we specify as the document ID. If no ID is specified (a POST request to /twitter/tweet/ without an ID), ES generates one automatically.

Routing

ES uses routing to locate data. A query usually goes through the following steps:

1. A node receives the request and broadcasts it to every shard.

2. Each shard receives the request, computes its result, and returns it.

3. The node merges the results and returns them.

If we set routing information, we are effectively telling ES which shard to query. This skips the broadcast and merge steps and makes the query more efficient. For example, at index time:

$ curl -XPOST 'http://localhost:9200/twitter/tweet?routing=kimchy' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'

Routing is implemented with a hash. If a routing value is specified at index time, the hash is computed from that value and used to pick the shard; if none is specified, the document ID is used instead. Because IDs are normally generated randomly, the default behavior spreads data evenly across shards. If specific content needs to live on a particular shard, routing can be used to direct it there. As the data volume grows, however, doing so can put excessive load on a single shard.
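
A minimal Python sketch of this hash-based placement, and of why a routed query can skip the broadcast. The function names are hypothetical, and zlib.crc32 is only a stand-in: ES uses its own hash function, so the shard numbers here will not match a real cluster.

```python
import zlib


def shard_for(doc_id, num_primary_shards, routing=None):
    """Pick a shard from the routing value, falling back to the document ID."""
    key = routing if routing is not None else doc_id
    # crc32 is an illustrative stand-in for the hash ES really uses.
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards


def shards_to_search(num_primary_shards, routing=None):
    """Without routing every shard is queried; with routing only one is."""
    if routing is None:
        return list(range(num_primary_shards))
    return [shard_for("", num_primary_shards, routing=routing)]


print(shards_to_search(5))                    # broadcast to all five shards
print(shards_to_search(5, routing="kimchy"))  # a single target shard
```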

Routing can also be configured in the mapping definition. If a document of that type is indexed without a routing value, the routing defined in the mapping is used by default.

Parent-child relationships

Documents in ES can be related as parent and child. The parent parameter sets this relationship:

$ curl -XPUT 'localhost:9200/blogs/blog_tag/1122?parent=1111' -d '{
    "tag" : "something"
}'

Timestamps: _timestamp

A timestamp can be specified with the index operation:

$ curl -XPUT 'localhost:9200/twitter/tweet/1?timestamp=2009-11-15T14%3A12%3A12' -d '{
    "user" : "kimchy",
    "message" : "trying out Elasticsearch"
}'
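
The %3A sequences in the URL above are URL-encoded colons. A small Python sketch of producing that query string with the standard library (the host, index, and ID are the ones from the example):

```python
from urllib.parse import urlencode

# urlencode percent-encodes reserved characters such as ":" in the value.
params = urlencode({"timestamp": "2009-11-15T14:12:12"})
url = "http://localhost:9200/twitter/tweet/1?" + params
print(url)
```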

If no timestamp is specified manually and _source does not contain one, it is set to the time at which the document was indexed. This requires _timestamp to be enabled in the mapping:

PUT my_index
{
  "mappings": {
    "my_type": {
      "_timestamp": { 
        "enabled": true
      }
    }
  }
}

Document expiration: ttl

ES can also expire documents automatically. A document is given a time to live, measured from its _timestamp; once that time has passed, the document is deleted automatically.

With the ttl given in milliseconds:

curl -XPUT 'http://localhost:9200/twitter/tweet/1?ttl=86400000' -d '{
    "user": "kimchy",
    "message": "Trying out elasticsearch, so far so good?"
}'

With the ttl given as a time value with a unit:

curl -XPUT 'http://localhost:9200/twitter/tweet/1?ttl=1d' -d '{
    "user": "kimchy",
    "message": "Trying out elasticsearch, so far so good?"
}'
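
The two requests above ask for the same lifetime: 86400000 milliseconds is one day. The Python sketch below shows the arithmetic; ttl_to_millis and is_expired are hypothetical helpers, and the set of unit suffixes is an assumption that covers only the common ones.

```python
import re

# Milliseconds per unit suffix (assumed subset of the units ES accepts).
_UNIT_MILLIS = {
    "ms": 1,
    "s": 1000,
    "m": 60 * 1000,
    "h": 60 * 60 * 1000,
    "d": 24 * 60 * 60 * 1000,
    "w": 7 * 24 * 60 * 60 * 1000,
}


def ttl_to_millis(ttl):
    """Convert a ttl such as "1d" or "86400000" to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d|w)?", str(ttl).strip())
    if match is None:
        raise ValueError(f"unrecognized ttl: {ttl!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_MILLIS[unit or "ms"]  # a bare number means ms


def is_expired(timestamp_ms, ttl, now_ms):
    """A document expires once its timestamp plus ttl has passed."""
    return now_ms >= timestamp_ms + ttl_to_millis(ttl)


print(ttl_to_millis("1d"), ttl_to_millis("86400000"))
```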

The ttl can also be specified as a field in the JSON body:

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "_ttl": "1d",
    "user": "kimchy",
    "message": "Trying out elasticsearch, so far so good?"
}'

Manual refresh

Because ES is a near-real-time search engine rather than a real-time one, a newly indexed document only becomes searchable after a short delay (1 second by default). "Search" here means the search API. The get API, by contrast, is truly real-time. The difference is that a search may also involve analysis, relevance scoring, sorting, and other work.

To make a document searchable immediately after an index operation, trigger a refresh manually by appending refresh=true to the API call.

This is recommended only in exceptional cases: with a large number of operations, refreshing after each one is very expensive.
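
The difference between get and search can be pictured with a toy model in Python. TinyShard is a hypothetical class, not how ES is implemented: it simply keeps newly indexed documents in a buffer that get can read, and makes them visible to search only after a refresh.

```python
class TinyShard:
    """Toy model of near-real-time search (illustration only)."""

    def __init__(self):
        self._buffer = {}      # indexed, but not yet visible to search
        self._searchable = {}  # visible to search

    def index(self, doc_id, source, refresh=False):
        self._buffer[doc_id] = source
        if refresh:            # models appending refresh=true to the request
            self.refresh()

    def refresh(self):
        """Make everything indexed so far visible to search."""
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def get(self, doc_id):
        """Real-time lookup by ID: sees unrefreshed documents too."""
        if doc_id in self._buffer:
            return self._buffer[doc_id]
        return self._searchable.get(doc_id)

    def search(self, field, value):
        """Only refreshed documents can match."""
        return [doc_id for doc_id, source in self._searchable.items()
                if source.get(field) == value]


shard = TinyShard()
shard.index("1", {"user": "kimchy"})
print(shard.get("1"), shard.search("user", "kimchy"))  # found by get, not by search
shard.refresh()
print(shard.search("user", "kimchy"))                  # now found by search
```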

Timeout

The shard may not be available when an index operation arrives, for example while it is recovering, and in that case the operation cannot proceed. The request therefore waits for the shard to become available. If the shard is still unavailable when the wait time runs out, an error is returned. The wait time can be set with the timeout parameter:

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?timeout=5m' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'

That covers the basics of index operations. There are more advanced topics, such as the detailed use of shards and version numbers, but since my understanding of ES is not yet thorough, I will not go into them here, to avoid introducing too many errors.

If you spot any mistakes, please correct me.

Reposted from: https://my.oschina.net/u/204616/blog/545440
