Getting started with Elasticsearch

Meaning of GET, POST, PUT, DELETE, and HEAD in the ES RESTful API:
1) GET: Retrieve the current state of the requested object.
2) POST: Change the current state of the object.
3) PUT: Create an object.
4) DELETE: Destroy the object.
5) HEAD: Retrieve only basic information about the object (the response headers, without a body).

[Figure: MySQL and Elasticsearch core concept comparison diagram]

1. Insert

1. PUT: insert with a specified ID

PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

2. POST: insert with an auto-generated ID

POST /megacorp/employee
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}
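The two insertion styles differ only in the HTTP method and in whether the ID appears in the path. A minimal Python sketch of that rule follows; the helper name index_request is hypothetical, and it only builds the request parts without sending anything:

```python
import json

def index_request(index, doc_type, doc, doc_id=None):
    """Return (method, path, body) for indexing a document.

    With an explicit ID, PUT addresses the document directly; without one,
    POST to the type lets Elasticsearch generate the ID.
    """
    body = json.dumps(doc)
    if doc_id is None:
        return "POST", f"/{index}/{doc_type}", body
    return "PUT", f"/{index}/{doc_type}/{doc_id}", body

employee = {"first_name": "John", "last_name": "Smith", "age": 25}
with_id = index_request("megacorp", "employee", employee, doc_id=1)
auto_id = index_request("megacorp", "employee", employee)
print(with_id[:2])
print(auto_id[:2])
```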

3. Bulk insert

curl -XPOST localhost:9200/_bulk --data-binary @data.json

Here data.json contains one action line followed by one document line per record, and the file must end with a newline:

{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:03:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:04:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:05:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:06:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:07:00"}
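A file like this is easier to generate than to write by hand. The Python sketch below shows one way to build the newline-delimited body from a list of records; the function name build_bulk_body is hypothetical, and the records are sample data in the shape used above:

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Serialize docs as the newline-delimited body expected by _bulk."""
    lines = []
    for doc in docs:
        # Each record needs an action line followed by the document itself.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

readings = [
    {"Mfid": 1, "TData": 172170, "TMoney": 209, "HTime": "2016-05-17T08:03:00"},
    {"Mfid": 1, "TData": 172170, "TMoney": 209, "HTime": "2016-05-17T08:04:00"},
]
body = build_bulk_body("meterdata", "autoData", readings)
print(body, end="")
```

Writing body to data.json would give a file that can be posted with the curl command above.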

4. Upsert

When the document exists, the script is executed; when the document does not exist, the content of upsert is inserted as a new document.

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : {
        "inline": "ctx._source.counter += count",
        "params" : {
            "count" : 4
        }
    },
    "upsert" : {
        "counter" : 1
    }
}'
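The upsert behaviour can be modelled in a few lines of Python. This is only a toy in-memory model of the semantics described above (the function name update_with_upsert and the dict-based store are assumptions, not Elasticsearch code):

```python
def update_with_upsert(store, doc_id, count, upsert):
    """Toy model of _update with a script and an upsert section."""
    if doc_id in store:
        # Document exists: run the "script" (counter += count).
        store[doc_id]["counter"] += count
    else:
        # Document missing: insert the upsert content as it is.
        store[doc_id] = dict(upsert)
    return store[doc_id]

store = {}
update_with_upsert(store, "1", count=4, upsert={"counter": 1})
first = dict(store["1"])
update_with_upsert(store, "1", count=4, upsert={"counter": 1})
second = dict(store["1"])
print(first, second)
```

The first call creates the document from upsert; only the second call runs the script.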

2. Update

An update can use script to modify a document with custom logic, doc to update part of a document, and upsert to add the document if it does not exist yet.

1. Full update

curl -XPUT localhost:9200/test/type1/1 -d '{
    "counter" : 1,
    "tags" : ["red"]
}'

2. Partial update

curl -XPOST "localhost:9200/gengxin/update/1/_update?pretty" -d '
{
   "doc": {"job": "奋斗者"}
}'

3. Script update

(1). Update some fields

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : {
        "inline": "ctx._source.counter += count",
        "params" : {
            "count" : 4
        }
    }
}'

(2). New field

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : "ctx._source.name_of_new_field = \"value_of_new_field\""
}'

(3). Remove fields

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : "ctx._source.remove(\"name_of_field\")"
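}'

A script can also change the operation that is performed. The following deletes the whole document if its tags field contains blue, and otherwise does nothing:

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{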
    "script" : {
        "inline": "ctx._source.tags.contains(tag) ? ctx.op = \"delete\" : ctx.op = \"none\"",
        "params" : {
            "tag" : "blue"
        }
    }
}'

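What's the difference between a full update and a partial update?

A full update marks the old document as deleted and then indexes the new document in its place.

A partial update only requires sending the fields to change. Internally, ES still retrieves the old document, applies the change, marks the old document as deleted, and indexes the result.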
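The difference is easiest to see on the stored document. The following Python sketch is a toy model under the assumption that a document is a dict with a _version counter; full_update and partial_update are hypothetical names:

```python
def full_update(store, doc_id, new_doc):
    """Replace the whole document and bump its version."""
    if doc_id in store:
        version = store[doc_id]["_version"] + 1
    else:
        version = 1
    store[doc_id] = {"_version": version, "_source": dict(new_doc)}

def partial_update(store, doc_id, partial_doc):
    """Merge partial_doc into the existing source and bump the version."""
    current = store[doc_id]
    merged = {**current["_source"], **partial_doc}
    store[doc_id] = {"_version": current["_version"] + 1, "_source": merged}

store = {}
full_update(store, "1", {"counter": 1, "tags": ["red"]})
partial_update(store, "1", {"job": "engineer"})
after_partial = dict(store["1"])
full_update(store, "1", {"counter": 2})
after_full = dict(store["1"])
print(after_partial)
print(after_full)
```

After the partial update the old fields are still present; after the second full update only counter remains.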

References:

http://www.cnblogs.com/shihuc/p/5978078.html

http://www.cnblogs.com/xing901022/p/5330778.html

3. Delete

curl -XDELETE 'http://localhost:9200/twitter/tweet/1'

Routing

If a routing value was provided when the document was indexed, the same routing value must be specified when deleting it:

$ curl -XDELETE 'http://localhost:9200/twitter/tweet/1?routing=kimchy'

In the above example, the document with ID 1 is looked up on the shard selected by the routing value kimchy. If the routing value is incorrect, the document may not be found and is not deleted. In versions before 2.0, when the _routing mapping is marked as required and no routing value is given, the delete request is broadcast to every shard; later versions reject such a request instead.
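Why the routing value matters can be sketched as follows: the shard is derived from a hash of the routing value, so a different value can point to a different shard. The Python below is an illustration only; zlib.crc32 is a stand-in hash chosen for this sketch (Elasticsearch uses its own hash function), and shard_for is a hypothetical name:

```python
import zlib

def shard_for(routing, number_of_primary_shards):
    """Pick a shard from a routing value.

    zlib.crc32 is a stand-in hash for illustration only; Elasticsearch
    uses its own hash function internally.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

# Indexed with routing=kimchy, so the delete must use the same value.
indexed_on = shard_for("kimchy", 5)
deleted_on = shard_for("kimchy", 5)
# Without a routing parameter, the document ID is used as the routing value,
# which may point to a different shard.
default_shard = shard_for("1", 5)
print(indexed_on, deleted_on, default_shard)
```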

ES deletion summary

If the document exists, ES returns the status code 200 OK, the found attribute is true, and the _version attribute is incremented by 1.

If the document does not exist, ES returns the status code 404 Not Found and the found attribute is false, but the _version attribute is still incremented by 1. This is part of the internal bookkeeping, which ensures that operations are marked in the correct order across multiple nodes.

Like an update, a delete does not take effect on disk immediately. The document is only marked as deleted, and ES physically removes it later.

In other words, deletions accumulate: rather than removing each document as it is deleted, ES removes the marked documents in batches in the background when segments are merged, which saves disk I/O.
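The mark-then-clean-up behaviour can be sketched with a toy segment class in Python. This is a simplified model for illustration (the class Segment and its methods are hypothetical), not how Lucene stores data:

```python
class Segment:
    """Toy append-only segment: deletes only mark documents."""

    def __init__(self):
        self.docs = {}        # doc_id -> source
        self.deleted = set()  # IDs marked as deleted

    def index(self, doc_id, source):
        self.docs[doc_id] = source
        self.deleted.discard(doc_id)

    def delete(self, doc_id):
        # Nothing is removed yet; the document is only marked.
        if doc_id in self.docs:
            self.deleted.add(doc_id)

    def search_ids(self):
        # Marked documents are hidden from search results.
        return sorted(d for d in self.docs if d not in self.deleted)

    def merge(self):
        # A later merge physically drops all marked documents at once.
        for doc_id in self.deleted:
            del self.docs[doc_id]
        self.deleted.clear()

seg = Segment()
for doc_id in ("1", "2", "3"):
    seg.index(doc_id, {"user": "kimchy"})
seg.delete("1")
seg.delete("2")
stored_before_merge = len(seg.docs)
visible = seg.search_ids()
seg.merge()
stored_after_merge = len(seg.docs)
print(stored_before_merge, visible, stored_after_merge)
```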

References:

http://www.cnblogs.com/zlslch/p/6421648.html

http://www.cnblogs.com/xing901022/archive/2016/03/26/5321659.html

4. Query

1. query and filter

(1) Query context: the clause not only decides whether a document matches, but also calculates a score that indicates how relevant the document is.
(2) Filter context: the clause only decides whether a document matches. No score is calculated, and the result can be cached.
Reference: http://www.cnblogs.com/xing901022/p/4975931.html

Lightweight search with a query string

GET /megacorp/employee/_search?q=last_name:Smith

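2. Filter DSL

(1) term

Represents an exact match: the search term is not analyzed, and the document must contain exactly that term. Note that on an analyzed field the standard analyzer lowercases the indexed terms, so a term search for Smith only matches if the field is not analyzed; otherwise search for smith. For Chinese text, the default analyzer indexes each character as a separate term, so only single characters can be matched.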

POST /megacorp/employee/_search
{
  "query": {
    "term": {
      "last_name": "Smith"
    }
  }
}

(2) terms filtering

terms is similar to term, but allows multiple values to be specified. A document matches if the field contains any of the given values:

POST /megacorp/employee/_search
{
  "query": {
    "terms": {
      "last_name": ["Bob","Smith"]
    }
  }
}

(3) range filtering

Finds documents whose field value falls within a specified range:

POST /megacorp/employee/_search
{
  "query": {
    "range": {
      "age": {
        "gt": 18
      }
    }
  }
}

(4) exists and missing filtering

These find documents that have a value for a specified field (exists) or that have no value for it (missing), similar to the IS NOT NULL and IS NULL conditions in a SQL statement.

POST /megacorp/employee/_search
{
  "query": {
    "exists":   {
        "field":    "title"
    }
  }
}

(5) Bool Combining Queries

Use bool filters to combine multiple filters and implement and, or, and not logic. All must clauses need to match, and none of the must_not clauses may match. Matching should clauses increase the degree of matching. By default, none of the should clauses are required to match, with one exception: if the query has no must clauses, at least one should clause must match. The minimum_should_match parameter controls how many should clauses need to match; it can be an absolute number or a percentage.

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "quick" }},
      "must_not": { "match": { "title": "lazy"  }},
      "should": [
                  { "match": { "title": "brown" }},
                  { "match": { "title": "dog"   }}
      ]
    }
  }
}
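The matching rules for must, must_not, and should can be written down as a small Python function. This is a toy model that treats a document as a set of title terms and each clause as a single term; bool_matches is a hypothetical name and scoring is ignored:

```python
def bool_matches(terms, must=(), must_not=(), should=(), minimum_should_match=None):
    """Toy bool matching over a set of title terms."""
    if any(t not in terms for t in must):
        return False
    if any(t in terms for t in must_not):
        return False
    matched_should = sum(1 for t in should if t in terms)
    if minimum_should_match is None:
        # Default: should is optional unless there is no must clause.
        minimum_should_match = 1 if should and not must else 0
    return matched_should >= minimum_should_match

quick_fox = {"quick", "fox"}
quick_lazy = {"quick", "lazy", "dog"}
print(bool_matches(quick_fox, must=["quick"], must_not=["lazy"], should=["brown", "dog"]))
print(bool_matches(quick_lazy, must=["quick"], must_not=["lazy"], should=["brown", "dog"]))
print(bool_matches(quick_fox, should=["brown", "dog"]))
```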

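(6) filter

Filters achieve the effect of WHERE in SQL. For example, to search for employees named Smith who are older than 30, you can use the filtered query below. Note that the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, where a bool query with a filter clause takes its place.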

POST /megacorp/employee/_search
{
  "query" : {
      "filtered" : {
          "filter" : {
              "range" : {
                  "age" : { "gt" : 30 } 
              }
          },
          "query" : {
              "match" : {
                  "last_name" : "Smith" 
              }
          }
      }
  }
}
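As a sketch of the bool form of this search, the Python below builds the request body with must for the scored match and filter for the unscored range. The helper name smith_older_than is hypothetical, and the body is only printed, not sent to a cluster:

```python
import json

def smith_older_than(age):
    """Build a bool query: scored match on last_name, unscored range on age."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"last_name": "Smith"}},
                "filter": {"range": {"age": {"gt": age}}},
            }
        }
    }

body = smith_older_than(30)
print(json.dumps(body, indent=2))
```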

(7) Aggregations

It allows you to generate complex analytical statistics on the data, similar to group by in sql

GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}

Aggregations also allow hierarchical summaries. For example, let's compute the average age of the employees who share each interest:

GET /megacorp/employee/_search
{
  "aggs" : {
      "all_interests" : {
          "terms" : { "field" : "interests" },
          "aggs" : {
              "avg_age" : {
                  "avg" : { "field" : "age" }
              }
          }
      }
  }
}
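What this nested aggregation computes can be reproduced on a small list in Python. The sketch below is a toy equivalent (the function avg_age_by_interest and the three sample employees are made up for illustration):

```python
from collections import defaultdict

def avg_age_by_interest(employees):
    """Toy version of a terms aggregation with an avg sub-aggregation."""
    ages = defaultdict(list)
    for emp in employees:
        # Each value of the interests array puts the employee in one bucket.
        for interest in emp["interests"]:
            ages[interest].append(emp["age"])
    return {
        interest: {"doc_count": len(vals), "avg_age": sum(vals) / len(vals)}
        for interest, vals in ages.items()
    }

employees = [
    {"age": 25, "interests": ["sports", "music"]},
    {"age": 32, "interests": ["music"]},
    {"age": 35, "interests": ["forestry"]},
]
buckets = avg_age_by_interest(employees)
print(buckets)
```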

3. Query DSL

(1) match_all query

Matches all documents. It is the default query when no query is specified.

POST /index/doc/_search
{
	"query" : {
		"match_all": {}
	}
}

(2) match query

The standard query: use it for almost any field, whether you need a full-text query or an exact-value query.

POST /index/doc/_search
{
  "query" : {
      "match" : {
          "title" : "中国杭州"
      }
  }
}

The match query accepts an operator parameter whose default value is "or". It can be changed to "and" to require all terms to match, which improves search precision.

POST /index/doc/_search
{
    "query": {
        "match": {
            "title": {      
                "query": "Hangzhou, China",
                "operator": "and"
            }
        }
    }
}

Controlling precision: the minimum_should_match parameter specifies how many terms must match. For a query with 3 terms, 75% is rounded down to 66.6%, that is, 2 of the 3 terms, so a document is included in the final results only if at least 2 of the 3 terms match.

GET /index/doc/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Hangzhou, China",
        "minimum_should_match":   "75%"
      }
    }
  }
}
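The rounding rule for a positive percentage can be checked with a one-line calculation. The Python sketch below covers only that case (required_matches is a hypothetical helper; negative values and combined forms of minimum_should_match are not handled):

```python
def required_matches(term_count, percentage):
    """Number of terms that must match for a positive percentage, rounded down."""
    return (term_count * percentage) // 100

for n in (2, 3, 4):
    print(n, "terms ->", required_matches(n, 75), "must match")
```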

Score calculation

The bool query calculates a relevance _score for each document by adding together the _score of all matching must and should clauses, then dividing by the total number of must and should clauses. must_not clauses do not affect the score; their only purpose is to exclude unwanted documents.
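Taken literally, the description above gives a simple formula: sum the scores of the matching must and should clauses and divide by the total number of must and should clauses. The Python sketch below implements just that simplified description, not the actual scoring code; bool_score and the clause scores are hypothetical:

```python
def bool_score(clause_scores):
    """Simplified bool score following the description above.

    clause_scores maps each must/should clause name to its _score,
    or to None when the clause did not match.
    """
    if not clause_scores:
        return 0.0
    matched = [s for s in clause_scores.values() if s is not None]
    return sum(matched) / len(clause_scores)

score = bool_score({
    "must: title=quick": 0.5,
    "should: title=brown": 0.25,
    "should: title=dog": None,   # this should clause did not match
})
print(score)
```

A document that matches more should clauses divides a larger sum by the same total, so it scores higher.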

(3) multi_match query

Builds on the match query and lets you run the same query against multiple fields at once:

POST /index/doc/_search
{
  "query" : {
  	"multi_match": {
		"query":	"中国",
		"fields":	[ "content", "title" ]
	}
  }
}

(4) match_phrase (phrase search)

The difference between match_phrase and match is that match_phrase only hits documents in which "rock" and "climbing" appear next to each other and in order, while match also hits text such as "rock balabala climbing". With match_phrase, the slop parameter controls how far apart the terms are allowed to be.

GET /megacorp/employee/_search
{
  "query" : {
      "match_phrase" : {
          "about" : {
              "query" : "rock climbing",
              "slop" : 1
          }
      }
  }
}
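The effect of slop can be illustrated with term positions. The Python sketch below is a simplified model, assuming that a phrase matches when the positions, shifted by each term's offset in the phrase, differ by at most slop; phrase_matches is a hypothetical helper and the real scorer handles more cases:

```python
from itertools import product

def phrase_matches(doc_tokens, phrase_tokens, slop=0):
    """Toy sloppy phrase match based on term positions."""
    positions = [
        [i for i, tok in enumerate(doc_tokens) if tok == term]
        for term in phrase_tokens
    ]
    if any(not p for p in positions):
        return False  # every phrase term must be present
    for combo in product(*positions):
        # Shift each position by the term's offset in the phrase; an exact
        # phrase makes all shifted positions equal.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False

phrase = ["rock", "climbing"]
exact = "i love rock climbing".split()
one_gap = "rock wall climbing".split()
print(phrase_matches(exact, phrase))
print(phrase_matches(one_gap, phrase), phrase_matches(one_gap, phrase, slop=1))
```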

(5) bool query

Similar to the bool filter, it combines multiple query clauses. The difference is that a bool filter directly answers whether a document matches, while a bool query also calculates the _score (relevance score) of each query clause.
    must: the clause must match for the document to be included.
    must_not: the clause must not match for the document to be included.
    should: if the clause matches, the relevance score of the document increases.

(6) wildcard query

Uses standard shell-style wildcards: ? matches any single character and * matches zero or more characters.

POST /index/doc/_search
{
  "query": {
    "wildcard": {
      "content": "中*"
    }
  }
}

(7) regexp query

A regexp query lets you write more complex patterns (for Chinese text indexed with the default analyzer, each term is a single character, so the pattern is matched against single characters).

POST /index/doc/_search
{
  "query": {
    "regexp": {
      "content": "中.*"
    }
  }
}

(8) prefix query

To find terms that start with given characters, a prefix query is simpler:

POST /index/doc/_search
{
  "query": {
    "prefix": {
      "content": "中"
    }
  }
}
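All three queries are matched against individual indexed terms, not against the original text. The Python sketch below approximates them with standard-library functions on a hypothetical term list; fnmatch and re are stand-ins here and their pattern syntax is not identical to that of Elasticsearch:

```python
import fnmatch
import re

terms = ["中", "中国", "国", "杭州"]  # hypothetical indexed terms

# wildcard "中*": shell-style pattern applied to each whole term
wildcard_hits = [t for t in terms if fnmatch.fnmatchcase(t, "中*")]
# regexp "中.*": the pattern has to match the whole term
regexp_hits = [t for t in terms if re.fullmatch("中.*", t)]
# prefix "中": the term only has to start with the given characters
prefix_hits = [t for t in terms if t.startswith("中")]

print(wildcard_hits, regexp_hits, prefix_hits)
```

With these patterns all three approaches select the same terms, which is why prefix is the simplest choice for this case.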

Reference:
http://blog.csdn.net/dm_vincent/article/details/41720193
http://www.cnblogs.com/ghj1976/p/5293250.html

4.Mapping

What is mapping

A mapping in ES is very similar to a data type in a statically typed language: once a variable is declared as int, it can only store int data. Likewise, a field mapped as a number type can only store numeric data.

Compared with a language's data types, a mapping carries some additional meaning. It not only tells ES what type of value a field holds, but also how to index the data and whether the data can be searched.

Anatomy of a mapping

A mapping consists of one or more analyzers, and an analyzer consists of one or more filters. When ES indexes a document, it passes the content of each field to the corresponding analyzer, which in turn passes it through its filters.

A filter is easy to understand: it is a function that transforms data. It takes a string as input and returns another string; a function that converts a string to lowercase is a good example of a filter.

An analyzer consists of an ordered sequence of filters. Analysis calls the filters one by one in sequence, and ES stores and indexes the final result.

To sum up, the role of a mapping is to execute a series of instructions that convert input data into searchable index terms.

Default analyzer

For example, if ES guesses that a description field is of type string, it creates a string mapping by default, which uses the default global analyzer. The default analyzer is the standard analyzer. The standard analyzer has three filters: the standard token filter, the lowercase token filter, and the stop token filter.
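The idea of an analyzer as a chain of filters can be sketched in Python. This is a toy pipeline for illustration: the regex tokenizer, the short stop word list, and all function names are assumptions and do not reproduce the standard analyzer exactly:

```python
import re

STOP_WORDS = {"a", "an", "the", "and", "of"}  # small hypothetical list

def tokenize(text):
    """Split text into word tokens (a stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

def stop_filter(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text, filters=(lowercase_filter, stop_filter)):
    """Run the tokenizer, then each filter in order."""
    tokens = tokenize(text)
    for f in filters:
        tokens = f(tokens)
    return tokens

print(analyze("The Quick Brown Fox"))
```

The order of the filters matters: the stop filter only removes "The" because the lowercase filter ran first.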

(1) Create

PUT  /libray
{
    "settings" : {
        "number_of_shards" : 2,
        "number_of_replicas" : 1
    },
    "mappings" : {
        "books" : {
            "properties" : {
                "name" : {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "year" : {
                    "type" : "integer"
                },
                "detail" : {
                    "type" : "string"
                }
            }
        }
    }
}

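(2) Delete all mappings in the index

DELETE  /libray/_mapping

(3) Delete the mapping of a specified type

DELETE  /libray/_mapping/books

Deleting a mapping is only possible before Elasticsearch 2.0. From 2.0 on, delete the index and recreate it with the new mappings instead.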

References:

http://m.blog.csdn.net/lilongsheng1125/article/details/53862629

5. Query Supplement

(1) _source filtering restricts the returned fields

Set _source to false to turn off retrieval of the _source field. To return only some fields, pass one or more wildcard patterns:

GET /_search
{
    "_source": [ "obj1.*", "obj2.*" ],
    "query" : {
        "match_all" : {}
    }
}

For complete control, specify both includes and excludes patterns:

GET /_search
{
    "_source": {
        "includes": [ "obj1.*", "obj2.*" ],
        "excludes": [ "*.description" ]
    },
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

(2) sort

POST /bank/_search
{
    "query": {
        "match_all" : {} 
    },
    "sort" : [
        {
            "age" : "asc"
        }
    ]
}

Sort mode option

Elasticsearch supports sorting by array or multi-valued fields. The mode option controls which array value is picked for sorting the document it belongs to. The mode option can have the following values:

min: Pick the lowest value.
max: Pick the highest value.
sum: Use the sum of all values as the sort value. Only applicable to number-based array fields.
avg: Use the average of all values as the sort value. Only applicable to number-based array fields.
median: Use the median of all values as the sort value. Only applicable to number-based array fields.
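How a mode turns a multi-valued field into a single sort value can be sketched in Python. The helper sort_value and the sample prices are hypothetical, and the standard-library functions are used only to illustrate the five modes:

```python
import statistics

def sort_value(values, mode):
    """Pick the sort value for a multi-valued numeric field."""
    pick = {
        "min": min,
        "max": max,
        "sum": sum,
        "avg": statistics.mean,
        "median": statistics.median,
    }
    return pick[mode](values)

prices = [20, 4, 9]
for mode in ("min", "max", "sum", "avg", "median"):
    print(mode, sort_value(prices, mode))
```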

(3) post_filter

post_filter is a top-level element of the search request. It is applied to the search hits only, after the aggregations have been calculated, so it narrows the returned hits without affecting the aggregation results.

GET /cars/transactions/_search
{
    "query": {
        "match": {
            "make": "ford"
        }
    },
    "post_filter": {    
        "term" : {
            "color" : "green"
        }
    },
    "aggs" : {
        "all_colors": {
            "terms" : { "field" : "color" }
        }
    }
}
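The order of the three steps (query, aggregation, post_filter) can be mimicked on a small list in Python. This is a toy model with made-up car data; search_with_post_filter is a hypothetical name:

```python
from collections import Counter

def search_with_post_filter(docs, make, color):
    """Toy model: aggregate over the query results, then filter the hits."""
    matched = [d for d in docs if d["make"] == make]      # query
    all_colors = Counter(d["color"] for d in matched)      # aggs see every match
    hits = [d for d in matched if d["color"] == color]     # post_filter
    return hits, dict(all_colors)

cars = [
    {"make": "ford", "color": "green"},
    {"make": "ford", "color": "blue"},
    {"make": "ford", "color": "green"},
    {"make": "honda", "color": "red"},
]
hits, all_colors = search_with_post_filter(cars, "ford", "green")
print(len(hits), all_colors)
```

The hits contain only green Fords, while the color buckets still count every Ford.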

(4) explain

Setting explain to true returns an explanation of how the score of each hit was computed.

GET /bank/_search
{
    "explain" : true,
    "query": {
        "bool" : {
            "filter" : {
                "term" : {
                    "age" : 39
                }
            }
        }
    }
}

(5)version

Returns a version for each search hit.

GET /bank/_search
{
    "version": true,
    "query": {
    	"bool" : {
    		"filter" : {
    			"term" : {
    				"age" : 39
    			}
    		}
    	}
    }
}

(6) min_score

Excludes documents whose _score is lower than the minimum specified in min_score.

GET /_search
{
    "min_score": 0.5,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

(7) inner_hits

A has_child query returns the matching parent documents; with inner_hits it also returns the child documents that matched the has_child condition, which is equivalent to a join between parent and child.

Example: suppose a parent document stores the content of an email, and child documents store information about each owner of the email and the status of the email for that user. When searching the mailing list of an account, we want both the mail content and the mail status. Without inner_hits we would have to query twice, because the mail content and the mail status are stored in the parent document and the child document respectively. With inner_hits, one query is enough.

curl -XGET  'http://localhost:9200/hermes/email/_search/?pretty=true' -d  '{
 "query": {
    "has_child": {
      "type": "email_owner",
      "query": {
        "bool": {
          "must": [
            { "term": { "owner": "[email protected]" } },
            {"term": {"labelId": "1"} }
          ]}
      },
       "inner_hits": {}
    }
  }
}'

(8) mget batch query

To retrieve multiple documents at one time, use the batch API (mget) to reduce the number of network round trips as much as possible; this can improve performance several times over, or even by dozens of times.

POST /bank/_mget
{
    "docs" : [
        {
            "_type" : "account",
            "_id" : 1
        },
        {
            "_type" : "account",
            "_id" : 2
        }
    ]
}

5. Supplement

Highly recommended:

Elasticsearch5.2 Core Knowledge  http://www.jianshu.com/nb/13767185

Elasticsearch5.2 master advanced article  http://www.jianshu.com/nb/14337815

Tokenizer

How the default ES tokenizer works: Chinese text is split into single characters, and English text is split on spaces and punctuation.
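This splitting rule can be imitated with a regular expression. The Python sketch below is only an approximation for illustration (the character range, the lowercasing step, and the name simple_tokenize are assumptions, not the real tokenizer):

```python
import re

# CJK Unified Ideographs are emitted one character at a time;
# ASCII letters and digits are grouped into words.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")

def simple_tokenize(text):
    """Split text into single Chinese characters and lowercased words."""
    return [t.lower() for t in TOKEN.findall(text)]

print(simple_tokenize("中国杭州 Rock-climbing!"))
```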

match and term: http://blog.csdn.net/yangwenbo214/article/details/54142786

Inverted index

Please refer to  http://blog.csdn.net/wang_zhenwei/article/details/52831992

http://www.jianshu.com/p/ed7e1ebb2fb7

http://www.infoq.com/cn/articles/database-timestamp-02?utm_source=infoq&utm_medium=related_content_link&utm_campaign=relatedContent_articles_clk

filters feature: http://www.cnblogs.com/bmaker/p/5480006.html

Filter query and aggregation  http://blog.csdn.net/dm_vincent/article/details/42757519

_all http://blog.csdn.net/jiao_fuyou/article/details/49800969

Elasticsearch field data type:

http://www.jianshu.com/p/ab99d2bcd63d

http://blog.csdn.net/ntc10095/article/details/73730772 (recommended)

Introduction to the principles of ES: https://www.cnblogs.com/valor-xh/p/6095894.html

 
