Meaning of GET, POST, PUT, DELETE and HEAD in the ES RESTful API:
1) GET: retrieve the current state of the requested object.
2) POST: change the current state of the object (or create one with an auto-generated ID).
3) PUT: create (or fully replace) an object.
4) DELETE: delete the object.
5) HEAD: fetch only basic information about the object, such as whether it exists, without the body.
MySQL vs. Elasticsearch core concepts (comparison diagram not reproduced)
1. Insert
1. PUT: insert with a specified ID
PUT /megacorp/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
2. POST: insert with an auto-generated ID
POST /megacorp/employee
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
3. Bulk insert
curl -XPOST localhost:9200/_bulk --data-binary @data.json
Contents of data.json (each document takes two lines, an action line followed by the document source, and the file must end with a newline):
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:03:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:04:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:05:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:06:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:07:00"}
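Writing data.json by hand is error-prone, so here is a minimal sketch of generating it. The helper name `build_bulk_body` is hypothetical; it only reproduces the line pairing and trailing newline described above, using the index and type names from the example.

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Build an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        # Action/metadata line; no _id is given, so ES generates one
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document serialized on a single line
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline
    return "\n".join(lines) + "\n"

docs = [
    {"Mfid": 1, "TData": 172170, "TMoney": 209, "HTime": "2016-05-17T08:03:00"},
    {"Mfid": 1, "TData": 172170, "TMoney": 209, "HTime": "2016-05-17T08:04:00"},
]
body = build_bulk_body("meterdata", "autoData", docs)
# Write `body` to data.json, then send it with the curl command above.
```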
4. Upsert
If the document already exists, the script is executed; if it does not exist, the contents of upsert are inserted as a new document.
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"script" : {
"inline": "ctx._source.counter += params.count",
"params" : {
"count" : 4
}
},
"upsert" : {
"counter" : 1
}
}'
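To make the two branches concrete, here is a small local model of the request above. It is not an Elasticsearch client: `simulate_update_with_upsert` is a hypothetical helper, and the dict `store` stands in for the index. Note that on the insert branch the script is not run, so the counter starts at the upsert value.

```python
def simulate_update_with_upsert(store, doc_id, count, upsert_doc):
    """Local model of _update with a script plus an upsert body."""
    if doc_id in store:
        # Document exists: run the script ctx._source.counter += params.count
        store[doc_id]["counter"] += count
    else:
        # Document missing: insert the upsert body as-is (the script is not run)
        store[doc_id] = dict(upsert_doc)
    return store[doc_id]

store = {}  # hypothetical stand-in for the test index
first = dict(simulate_update_with_upsert(store, "1", 4, {"counter": 1}))   # {'counter': 1}
second = dict(simulate_update_with_upsert(store, "1", 4, {"counter": 1}))  # {'counter': 5}
```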
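2. Update
A document can be replaced entirely with PUT. With the _update API, doc merges a partial document into the existing one, script modifies the document with a script, and upsert supplies the document to insert when it does not exist yet.
1. Full update (replace the whole document)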
curl -XPUT localhost:9200/test/type1/1 -d '{
"counter" : 1,
"tags" : ["red"]
}'
2. Partial update
curl -XPOST "localhost:9200/gengxin/update/1/_update?pretty" -d ' { "doc": {"job": "striver"} }'
3. Script update
(1) Modify an existing field
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"script" : {
"inline": "ctx._source.counter += params.count",
"params" : {
"count" : 4
}
}
}'
(2) Add a new field
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"script" : "ctx._source.name_of_new_field = \"value_of_new_field\""
}'
(3) Remove a field
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"script" : "ctx._source.remove(\"name_of_field\")"
}'
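(4) Delete the whole document if a condition holds (here: the tags array contains "blue"); setting ctx.op to "none" leaves the document untouched
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"script" : {
"inline": "if (ctx._source.tags.contains(params.tag)) { ctx.op = \"delete\" } else { ctx.op = \"none\" }",
"params" : {
"tag" : "blue"
}
}
}'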
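What is the difference between a full update and a partial update?
A full update replaces the document: ES marks the old version as deleted and indexes the new one.
A partial update sends only the fields to change. Internally ES still fetches the existing document, merges the changes into it, marks the old version as deleted and indexes the result, but all of this happens inside the shard, which saves the client a separate read and write round trip.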
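The difference in the resulting document can be sketched locally. `full_update` and `partial_update` are hypothetical helpers and the sample document is made up; the sketch assumes that `doc` merges nested objects recursively and overwrites every other value, including arrays.

```python
import copy

def full_update(old, new):
    # PUT replaces the whole document; `old` is ignored, so missing fields are gone
    return copy.deepcopy(new)

def partial_update(old, doc):
    # _update with "doc": merge `doc` into a copy of the existing source
    merged = copy.deepcopy(old)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field
            merged[key] = partial_update(merged[key], value)
        else:
            # Scalars and arrays are simply overwritten
            merged[key] = copy.deepcopy(value)
    return merged

old = {"name": "Tom", "job": "engineer", "tags": ["red"]}  # hypothetical document
replaced = full_update(old, {"job": "striver"})   # {'job': 'striver'}
merged = partial_update(old, {"job": "striver"})  # {'name': 'Tom', 'job': 'striver', 'tags': ['red']}
```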
References:
http://www.cnblogs.com/shihuc/p/5978078.html
http://www.cnblogs.com/xing901022/p/5330778.html
3. Delete
curl -XDELETE 'http://localhost:9200/twitter/tweet/1'
Routing
If a routing value was supplied when the document was indexed, the same routing value must be supplied when deleting it:
curl -XDELETE 'http://localhost:9200/twitter/tweet/1?routing=kimchy'
In the example above, the document with id 1 is located through the given routing value; if the routing value is wrong, the document may not be found. In some cases the _routing mapping is marked as required but no routing value is supplied; the delete request is then broadcast to every shard.
Deletion summary
If the document exists, ES returns status 200 OK, the found attribute is true, and _version is incremented by 1.
If the document does not exist, ES returns 404 Not Found and found is false, but _version is still incremented by 1. This is part of ES's internal bookkeeping: it ensures that the order of operations is recorded correctly across multiple nodes.
Like an update, a delete does not take effect on disk immediately: the document is only marked as deleted, and ES physically removes it later.
Deletions accumulate; once enough of them have built up (for example, after dozens of documents have been deleted), ES removes them in one pass during segment merging, which saves disk I/O.
References:
http://www.cnblogs.com/zlslch/p/6421648.html
http://www.cnblogs.com/xing901022/archive/2016/03/26/5321659.html
4. Query
1. Query context and filter context
(1) Query context: besides checking whether a document matches, the query computes a score that expresses its relevance.
(2) Filter context: the clause only decides whether a document matches; no score is computed, and the results can be cached.
Reference: http://www.cnblogs.com/xing901022/p/4975931.html
Lightweight search with a query string:
GET /megacorp/employee/_search?q=last_name:Smith
2. Filter DSL
(1) term
Exact match: the search value is not analyzed, so the field must contain exactly that term in the index. Because the standard analyzer lowercases text at index time, search for "smith" rather than "Smith" on an analyzed field. (With the default analyzer, Chinese text is indexed one character per term, so a term query on such a field can only match a single character.)
POST /megacorp/employee/_search
{
"query": {
"term": {
"last_name": "smith"
}
}
}
(2) terms filtering
terms is like term but accepts several values; a document matches if the field contains any of the listed values:
POST /megacorp/employee/_search
{
"query": {
"terms": {
"last_name": ["bob","smith"]
}
}
}
(3) range filtering
Finds documents whose field value lies within a range; gt, gte, lt and lte mean >, >=, < and <=.
POST /megacorp/employee/_search
{
"query": {
"range": {
"age": {
"gt": 18
}
}
}
}
(4) exists and missing filtering
exists finds documents that have a value in the specified field (like IS NOT NULL in SQL); missing finds documents that lack it (like IS NULL). The missing query was removed in ES 5.0; wrap exists in a bool must_not clause instead.
POST /megacorp/employee/_search
{
"query": {
"exists": {
"field": "title"
}
}
}
(5) Bool combining queries
Use the bool filter to combine several filters with AND, OR and NOT logic:
must: all of these clauses must match (AND).
must_not: none of these clauses may match (NOT).
should: these clauses are optional (OR); the more of them match, the higher the degree of match.
By default none of the should clauses is required to match, with one exception: if the query has no must clauses, at least one should clause must match. The minimum_should_match parameter controls how many should clauses must match; it can be an absolute number or a percentage.
GET /my_index/my_type/_search
{
"query": {
"bool": {
"must": { "match": { "title": "quick" }},
"must_not": { "match": { "title": "lazy" }},
"should": [
{ "match": { "title": "brown" }},
{ "match": { "title": "dog" }}
]
}
}
}
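The matching rules above can be sketched as a small function. This is a simplified local model, not ES code: the hypothetical `bool_matches` treats each clause as a single term checked for membership in the document's set of terms, and ignores scoring.

```python
def bool_matches(doc_terms, must=(), must_not=(), should=(), minimum_should_match=None):
    """Simplified bool semantics: each clause is one term tested for membership."""
    # must: every clause has to match (AND)
    if not all(t in doc_terms for t in must):
        return False
    # must_not: no clause may match (NOT)
    if any(t in doc_terms for t in must_not):
        return False
    should_hits = sum(1 for t in should if t in doc_terms)
    if minimum_should_match is None:
        # Default: should is optional, unless there are no must clauses
        minimum_should_match = 0 if must else (1 if should else 0)
    return should_hits >= minimum_should_match

# The example query: must quick, must_not lazy, should brown / dog
query = dict(must=["quick"], must_not=["lazy"], should=["brown", "dog"])
bool_matches({"quick", "brown", "fox"}, **query)  # True
bool_matches({"quick", "lazy"}, **query)          # False (must_not matched)
```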
(6) filter
Achieves the effect of a WHERE clause in SQL. For example, to find employees named Smith who are older than 30 (the older filtered query was removed in ES 5.0; a bool query with a filter clause does the same job):
POST /megacorp/employee/_search
{
"query" : {
"bool" : {
"must" : {
"match" : {
"last_name" : "Smith"
}
},
"filter" : {
"range" : {
"age" : { "gt" : 30 }
}
}
}
}
}
(7) Aggregations
Aggregations compute analytical statistics over the data, similar to GROUP BY in SQL:
GET /megacorp/employee/_search
{
"aggs": {
"all_interests": {
"terms": { "field": "interests" }
}
}
}
Aggregations can also be nested. For example, to compute the average age of the employees for each interest:
GET /megacorp/employee/_search
{
"aggs" : {
"all_interests" : {
"terms" : { "field" : "interests" },
"aggs" : {
"avg_age" : {
"avg" : { "field" : "age" }
}
}
}
}
}
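To see what the nested aggregation computes, here is a local equivalent in plain Python. The sample employees are hypothetical, and `avg_age_per_interest` is an illustrative helper that mirrors a terms bucket per interest with an avg sub-aggregation inside each bucket; it does not reproduce ES's response format.

```python
from collections import defaultdict

def avg_age_per_interest(employees):
    # Outer "terms" agg: one bucket per distinct interest value
    ages_by_interest = defaultdict(list)
    for emp in employees:
        for interest in emp["interests"]:
            ages_by_interest[interest].append(emp["age"])
    # Buckets ordered by doc count, descending (ties broken alphabetically here)
    ordered = sorted(ages_by_interest.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    # Inner "avg" sub-agg: computed separately inside each bucket
    return {
        interest: {"doc_count": len(ages), "avg_age": sum(ages) / len(ages)}
        for interest, ages in ordered
    }

employees = [  # hypothetical sample data
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]
result = avg_age_per_interest(employees)
# music: 2 docs, avg 28.5; forestry: 1 doc, avg 35.0; sports: 1 doc, avg 25.0
```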
3. Query DSL
(1) match_all query
Matches all documents; it is the default when no query is given.
POST /index/doc/_search
{
"query" : {
"match_all": {}
}
}
(2) match query
The standard query: use it for both full-text and exact-value searches.
POST /index/doc/_search
{
"query" : {
"match" : {
"title" : "中国杭州"
}
}
}
The match query accepts an operator parameter whose default value is "or". Changing it to "and" requires all terms to match, which improves precision.
POST /index/doc/_search
{
"query": {
"match": {
"title": {
"query": "Hangzhou, China",
"operator": "and"
}
}
}
}
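Controlling precision: minimum_should_match sets how many of the query terms must match, and percentages are rounded down. For example, if the query text is analyzed into three terms, 75% of 3 is 2.25, which is rounded down to 2 (66.6% of the terms). Whatever text is entered, a document is then included in the results only if at least 2 of the 3 terms match.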
GET /index/doc/_search
{
"query": {
"match": {
"title": {
"query": "Hangzhou, China",
"minimum_should_match": "75%"
}
}
}
}
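The rounding rule can be checked with a few lines of arithmetic. `required_matches` is a hypothetical helper that handles only absolute integers and positive percentages; negative values and combination syntax, which minimum_should_match also accepts, are deliberately left out.

```python
import math

def required_matches(num_terms, minimum_should_match):
    """Number of terms that must match for an absolute or positive-percentage value."""
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = float(minimum_should_match[:-1])
        # Percentages are rounded down to a whole number of terms
        return math.floor(num_terms * percent / 100)
    # Absolute value: use it as-is
    return int(minimum_should_match)

required_matches(3, "75%")  # 2  (2.25 rounded down)
required_matches(4, "75%")  # 3
```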
Score calculation
The bool query computes the relevance score by adding up the _score of every matching must and should clause and dividing by the total number of must and should clauses. must_not clauses do not affect the score; their only purpose is to exclude unwanted documents.
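Here is the formula as stated above, written out as code. It is only a sketch of that simplified description; Lucene's actual scoring is more involved and differs between versions. `bool_score` and the clause scores are hypothetical.

```python
def bool_score(clauses):
    """clauses: list of (occur, score); score is None when the clause did not match."""
    # Only must and should clauses take part in scoring; must_not is ignored
    scoring = [score for occur, score in clauses if occur in ("must", "should")]
    if not scoring:
        return 0.0
    # Sum the scores of the clauses that matched ...
    matched_total = sum(s for s in scoring if s is not None)
    # ... and divide by the total number of must + should clauses
    return matched_total / len(scoring)

clauses = [
    ("must", 1.0),       # e.g. title:quick matched
    ("should", 0.5),     # title:brown matched
    ("should", None),    # title:dog did not match
    ("must_not", None),  # does not affect the score
]
bool_score(clauses)  # 1.5 / 3 = 0.5
```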
(3) multi_match query
Runs the same match query against several fields at once:
POST /index/doc/_search
{
"query" : {
"multi_match": {
"query": "中国",
"fields": [ "content", "title" ]
}
}
}
(4) match_phrase phrase search
match_phrase only hits documents in which "rock" and "climbing" appear in that order and next to each other, whereas match also hits documents such as "rock ... climbing" where the terms are far apart or in a different order. The slop parameter lets match_phrase tolerate terms being that many positions apart; it goes inside the field object together with query:
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : {
"query" : "rock climbing",
"slop" : 1
}
}
}
}
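Slop can be pictured as a positional check on the token stream. This is a deliberately simplified, in-order-only model using a hypothetical `phrase_matches` helper; Lucene's real sloppy-phrase matching also allows reordered terms given a large enough slop, which this sketch does not attempt.

```python
def phrase_matches(tokens, phrase, slop=0):
    """In-order phrase check: phrase terms must appear in order, and the number of
    extra positions between the first and last term may not exceed slop."""
    n = len(phrase)
    for start, tok in enumerate(tokens):
        if tok != phrase[0]:
            continue
        pos, ok = start, True
        for term in phrase[1:]:
            try:
                # Earliest occurrence of the next term after the previous one
                pos = tokens.index(term, pos + 1)
            except ValueError:
                ok = False
                break
        # Extra positions = span covered minus the span of an exact phrase
        if ok and (pos - start) - (n - 1) <= slop:
            return True
    return False

phrase_matches(["rock", "wall", "climbing"], ["rock", "climbing"])          # False
phrase_matches(["rock", "wall", "climbing"], ["rock", "climbing"], slop=1)  # True
```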
(5) bool query
Like the bool filter, it combines multiple query clauses. The difference is that the bool filter only decides whether a document matches, while the bool query also computes the _score (relevance score) of each clause.
must: the document must match these clauses.
must_not: the document must not match these clauses.
should: if the document matches these clauses, its relevance score is increased.
(6) wildcard query
Matches terms against a shell-style wildcard pattern: * matches any sequence of characters and ? matches a single character.
POST /index/doc/_search
{
"query": {
"wildcard": {
"content": "中*"
}
}
}
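The pattern semantics can be imitated by translating the wildcard pattern into an anchored regular expression. `wildcard_match` is a hypothetical helper. Note that, like the ES query, it is applied to one indexed term at a time rather than to the whole field text.

```python
import re

def wildcard_to_regex(pattern):
    """Translate * (any sequence) and ? (one character); everything else is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # \Z anchors at the end; re.match already anchors at the start
    return re.compile("".join(parts) + r"\Z", re.DOTALL)

def wildcard_match(term, pattern):
    return wildcard_to_regex(pattern).match(term) is not None

wildcard_match("中国", "中*")        # True
wildcard_match("kimchy", "k?mchy")  # True
```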
(7) regexp query
A regexp query lets you write more complex patterns. (With the default analyzer each Chinese character is a separate term, so for Chinese the pattern can only match a single character.)
POST /index/doc/_search
{
"query": {
"regexp": {
"content": "中.*"
}
}
}
(8) prefix query
To match terms that start with given characters, prefix is the simpler choice:
POST /index/doc/_search
{
"query": {
"prefix": {
"content": "中"
}
}
}
Reference:
http://blog.csdn.net/dm_vincent/article/details/41720193
http://www.cnblogs.com/ghj1976/p/5293250.html
4. Mapping
What is a mapping?
An ES mapping is much like a data type in a statically typed language: once a variable is declared as int, it can only store int values. Likewise, a field mapped as a number can only store numbers.
Compared with a language's data types, a mapping carries more meaning: it tells ES not only what type of value a field holds, but also how to index the data and whether it can be searched.
Anatomy of a mapping
A mapping consists of one or more analyzers, and an analyzer consists of one or more filters. When ES indexes a document, it passes each field's content to the corresponding analyzer, which in turn passes it through its filters.
A filter is easy to understand: it is a function that transforms data, taking a string and returning another string. A function that lowercases a string is a good example.
An analyzer is an ordered chain of filters. Analysis calls the filters one after another, each receiving the previous one's output, and ES stores and indexes the final result.
In short, a mapping defines a series of steps that turn input data into searchable index terms.
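The chain described above can be sketched as plain functions. This is a toy model, not ES's analyzer: the regex tokenizer, the helper names and the tiny `STOP_WORDS` set are all hypothetical choices made for illustration.

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "to", "of"}  # hypothetical small stop list

def tokenize(text):
    # Split on any run of non-word characters (whitespace, punctuation)
    return [t for t in re.split(r"\W+", text) if t]

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

def stop_filter(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text, filters=(lowercase_filter, stop_filter)):
    tokens = tokenize(text)
    for f in filters:
        # Each filter receives the previous filter's output
        tokens = f(tokens)
    return tokens

analyze("I love to go rock climbing")  # ['i', 'love', 'go', 'rock', 'climbing']
```

The order matters: the lowercase filter runs before the stop filter here, so a capitalized "The" is still removed as a stop word.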
Default analyzer
For example, when ES guesses that a description field holds strings, it creates a string mapping by default that uses the global default analyzer, the standard analyzer. The standard analyzer is made up of the standard tokenizer, the lowercase token filter and the stop token filter.
(1) Create
PUT /libray
{
"settings" : {
"number_of_shards" : 2,
"number_of_replicas" : 1
},
"mappings" : {
"books" : {
"properties" : {
"name" : {
"type": "string",
"index": "not_analyzed"
},
"year" : {
"type" : "integer"
},
"detail" : {
"type" : "string"
}
}
}
}
}
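(2) Delete all mappings in the index
DELETE /libray/_mapping
(3) Delete the specified mapping
DELETE /libray/_mapping/books
Note: the delete mapping API only exists in ES 1.x. Since ES 2.0 a mapping cannot be deleted; delete the index, or reindex into a new one, instead.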
References:
http://m.blog.csdn.net/lilongsheng1125/article/details/53862629
5. Query supplement
(1) _source filtering: restrict the returned fields
Setting _source to false turns off _source retrieval; a wildcard pattern or an array of patterns returns only the matching fields:
GET /_search
{
"_source": [ "obj1.*", "obj2.*" ],
"query" : {
"match_all" : {}
}
}
For complete control, use includes and excludes:
GET /_search
{
"_source": {
"includes": [ "obj1.*", "obj2.*" ],
"excludes": [ "*.description" ]
},
"query" : {
"term" : { "user" : "kimchy" }
}
}
(2) sort
POST /bank/_search
{
"query": { "match_all" : {} },
"sort" : [
{ "age" : "asc" }
]
}
Sort mode option
Elasticsearch supports sorting by array or multi-valued fields. The mode option controls which array value is picked to sort the document it belongs to. It can take the following values:
min: pick the lowest value.
max: pick the highest value.
sum: use the sum of all values as the sort value. Only applies to numeric array fields.
avg: use the average of all values as the sort value. Only applies to numeric array fields.
median: use the median of all values as the sort value. Only applies to numeric array fields.
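Here is a local sketch of how each mode reduces an array to a single sort key. `sort_value` and the two sample documents are hypothetical. The output shows that the chosen mode can change the order of the same documents.

```python
import statistics

def sort_value(values, mode):
    """Reduce a multi-valued field to the single value used for sorting."""
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    if mode == "sum":
        return sum(values)
    if mode == "avg":
        return sum(values) / len(values)
    if mode == "median":
        return statistics.median(values)
    raise ValueError(f"unknown mode: {mode}")

docs = {"doc1": [3, 10, 2], "doc2": [5, 6]}  # hypothetical array fields
by_min = sorted(docs, key=lambda d: sort_value(docs[d], "min"))  # ['doc1', 'doc2']
by_max = sorted(docs, key=lambda d: sort_value(docs[d], "max"))  # ['doc2', 'doc1']
```

When no mode is given, ES uses min for ascending and max for descending sorts.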
(3) post_filter
post_filter is a top-level element that filters only the search hits, after the aggregations have been computed; the aggregations themselves are not affected by it.
GET /cars/transactions/_search
{
"query": {
"match": {
"make": "ford"
}
},
"post_filter": {
"term" : {
"color" : "green"
}
},
"aggs" : {
"all_colors": {
"terms" : { "field" : "color" }
}
}
}
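The order of operations can be modeled locally. `search_with_post_filter` is a hypothetical helper and the cars are made-up sample data. It shows that the color aggregation still counts the red Ford, while the hits only contain the green one.

```python
from collections import Counter

def search_with_post_filter(docs, query, post_filter):
    """Aggregations see every doc matching `query`; post_filter trims only the hits."""
    matched = [d for d in docs if query(d)]
    aggs = Counter(d["color"] for d in matched)   # like a terms agg on color
    hits = [d for d in matched if post_filter(d)]  # post_filter applied last
    return hits, aggs

cars = [  # hypothetical sample data
    {"make": "ford", "color": "green"},
    {"make": "ford", "color": "red"},
    {"make": "bmw", "color": "green"},
]
hits, aggs = search_with_post_filter(
    cars,
    query=lambda d: d["make"] == "ford",
    post_filter=lambda d: d["color"] == "green",
)
# hits: only the green Ford; aggs: green 1, red 1
```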
(4) explain
Returns an explanation of how the score of each hit was computed.
GET /bank/_search
{
"explain" : true,
"query": {
"bool" : {
"filter" : {
"term" : {
"age" : 39
}
}
}
}
}
(5) version
Returns the version of each search hit.
GET /bank/_search
{
"version": true,
"query": {
"bool" : {
"filter" : {
"term" : {
"age" : 39
}
}
}
}
}
(6) min_score
Excludes documents whose _score is lower than the value given in min_score:
GET /_search
{
"min_score": 0.5,
"query" : {
"term" : { "user" : "kimchy" }
}
}
(7) inner_hits
Returns the parent documents together with the child documents that matched the has_child condition, similar to a join between parent and child.
Example: suppose parent documents store the email content, and child documents store each email owner and the status of the email for that user. When searching an account's mailbox we want both the email content and its status. Without inner_hits we would have to run two queries, because the content and the status live in the parent and child documents respectively; with inner_hits a single query is enough.
curl -XGET 'http://localhost:9200/hermes/email/_search/?pretty=true' -d '{
"query": {
"has_child": {
"type": "email_owner",
"query": {
"bool": {
"must": [
{ "term": { "owner": "[email protected]" } },
{"term": {"labelId": "1"} }
]}
},
// note: inner_hits is requested here
"inner_hits": {}
}
}
}'
(8) mget batch query
To fetch several documents at once, use the multi-get API. Reducing the number of network round trips in this way may improve performance several-fold, or even by tens of times.
POST http://localhost:9200/bank/_mget
{
"docs" : [
{
"_type" : "account",
"_id" : 1
},{
"_type" : "account",
"_id" : 2
}]
}
5. Supplement
Highly recommended:
Elasticsearch5.2 Core Knowledge http://www.jianshu.com/nb/13767185
Elasticsearch5.2 advanced topics http://www.jianshu.com/nb/14337815
Tokenizer
How the default ES tokenizer works: Chinese text is split into single characters, and English text is split on whitespace and punctuation.
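Here is a rough approximation of that behaviour. `default_like_tokenize` is a hypothetical helper. It treats only the CJK Unified Ideographs block U+4E00 to U+9FFF as Chinese, so it is a sketch under that assumption and not a reimplementation of the standard tokenizer's Unicode rules.

```python
import re

# One CJK ideograph per token, or a run of word characters from other scripts
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[^\W\u4e00-\u9fff]+")

def default_like_tokenize(text):
    """Each Chinese character becomes its own token; other text is split on
    whitespace and punctuation and lowercased."""
    return [t.lower() for t in TOKEN_RE.findall(text)]

default_like_tokenize("中国杭州 Hello, World!")
# ['中', '国', '杭', '州', 'hello', 'world']
```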
match vs. term http://blog.csdn.net/yangwenbo214/article/details/54142786
Inverted index
Please refer to http://blog.csdn.net/wang_zhenwei/article/details/52831992
http://www.jianshu.com/p/ed7e1ebb2fb7
Filter features http://www.cnblogs.com/bmaker/p/5480006.html
Filter query and aggregation http://blog.csdn.net/dm_vincent/article/details/42757519
_all http://blog.csdn.net/jiao_fuyou/article/details/49800969
Elasticsearch field data type:
http://www.jianshu.com/p/ab99d2bcd63d
http://blog.csdn.net/ntc10095/article/details/73730772 (recommended)
Introduction to the principles of ES: https://www.cnblogs.com/valor-xh/p/6095894.html