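This article describes the Elasticsearch Search API
1 Search API Overview
Elasticsearch (es) queries and analyzes the data it stores through the Search API, whose endpoint is _search.
There are two main forms of query: URI Search, which passes the query as URL parameters, and Request Body Search, which sends a Query DSL query in the HTTP request body.
2 URI Search explained and demonstrated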
URI Search performs the search through URL query parameters. The common parameters are:
- q: the query statement, written in Query String Syntax
- df: the default field to search when q does not name a field; if it is not specified, es searches all fields
- sort: sorting
- timeout: the search timeout; by default there is no timeout
- from, size: used for pagination
The Query String Syntax covers the following:
- term and phrase
  - alfred way is equivalent to alfred OR way
  - "alfred way" is a phrase query; the words must appear in that order
- Unqualified query
  - alfred matches the term in all fields
- Specified field
  - name:alfred
- Grouping: parentheses group terms under one matching rule
  - (quick OR brown) AND fox
  - status:(active OR pending) title:(full text search)
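The parameters above can be assembled into a request URL programmatically. The following is a minimal sketch in Python using only the standard library; the helper name build_uri_search and the sample values are assumptions made for illustration, and the sketch only builds the URL without sending it.

```python
from urllib.parse import urlencode, quote


def build_uri_search(index, q, df=None, sort=None, timeout=None, from_=None, size=None):
    """Return the path and query string for a URI Search request (hypothetical helper)."""
    params = {"q": q}
    optional = {"df": df, "sort": sort, "timeout": timeout, "from": from_, "size": size}
    # Only send the parameters that were actually supplied.
    params.update({k: v for k, v in optional.items() if v is not None})
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"/{index}/_search?" + urlencode(params, quote_via=quote)


url = build_uri_search("my_index_search", "alfred way", df="username",
                       sort="age:desc", from_=0, size=2)
print(url)
```

Here df makes username the default field for the terms alfred and way, and from/size return the first page of two hits.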
Create an index and generate test documents:
PUT my_index_search
{
"settings":
{
"number_of_shards": "5",
"number_of_replicas": "0"
}
}
POST my_index_search/doc/_bulk
{"index":{"_id": "1"}}
{"username": "alfred way","job": "java engineer","age": 18,"birth": "1990-01-02","isMarried":false}
{"index":{"_id": "2"}}
{"username": "alfred","job": "java senior and java specialist","age": 28,"birth": "1980-05-07","isMarried":true}
{"index":{"_id": "3"}}
{"username": "lee","job": "java and ruby engineer","age": 22,"birth": "1985-08-07","isMarried":false}
{"index":{"_id": "4"}}
{"username": "alfred junior way","job": "ruby engineer","age": 23,"birth": "1989-08-02","isMarried":false}
# Query documents that contain alfred in any field
GET my_index_search/_search?q=alfred
# Set profile to see the actual query that is executed
GET my_index_search/_search?q=alfred
{
"profile": true
}
GET my_index_search/_search?q=username:alfred
GET my_index_search/_search?q=username:alfred
{
"profile": true
}
# username:alfred and way are in an OR relationship
GET my_index_search/_search?q=username:alfred way
{
"profile": true
}
# PhraseQuery: phrase query
GET my_index_search/_search?q=username:"alfred way"
{
"profile": true
}
# The profile output for the query below shows "description": "username:alfred username:way"
GET my_index_search/_search?q=username:(alfred way)
{
"profile": true
}
- Boolean operators
  - AND(&&), OR(||), NOT(!)
    - name:(tom NOT lee)
    - Note: the operators must be uppercase; lowercase is not recognized
  - + and - correspond to must and must_not respectively
    - name:(tom +lee -alfred) or name:((lee && !alfred) || (tom && lee && !alfred))
    - + in a URL is decoded as a space, so it must be URL-encoded as %2B before use
GET my_index_search/_search?q=username:alfred AND way
{
"profile": true
}
GET my_index_search/_search?q=username:(alfred AND way)
{
"profile": true
}
GET my_index_search/_search?q=username:(alfred NOT way)
{
"profile": true
}
GET my_index_search/_search?q=username:(alfred +way)
{
"profile": true
}
GET my_index_search/_search?q=username:(alfred %2Bway)
{
"profile": true
}
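The last two requests differ only in how + is written. As a small illustration of why the encoding matters, the sketch below uses Python's standard urllib.parse; the variable names are chosen for this example, and unquote_plus stands in for form-style URL decoding in general, not for the exact code path inside es.

```python
from urllib.parse import quote, unquote_plus

raw_query = "username:(alfred +way)"

# Form-style URL decoding turns an unencoded '+' into a space,
# so the must operator is lost before the query is parsed.
decoded_unencoded = unquote_plus(raw_query)

# Percent-encode the query; ':' and parentheses are left readable.
encoded = quote(raw_query, safe=":()")

# After encoding, decoding restores the original '+'.
round_trip = unquote_plus(encoded)

print(decoded_unencoded)
print(encoded)
print(round_trip)
```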
- Range queries, supporting numeric values and dates
  - Interval notation: [] is a closed interval, {} is an open interval
    - age:[1 TO 10] means 1 <= age <= 10
    - age:[1 TO 10} means 1 <= age < 10
    - age:[1 TO *] means age >= 1
    - age:{* TO 10] means age <= 10
  - Comparison operator notation
    - age:>=1
    - age:(>=1 && <=10) or age:(+>=1 +<=10)
GET my_index_search/_search?q=username:alfred age:>20
GET my_index_search/_search?q=username:alfred AND age:>20
GET my_index_search/_search?q=birth:(>1980 AND <1990)
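To make the bracket semantics concrete, here is a minimal sketch in Python that evaluates a number against an interval written in the notation above. The function in_range is a hypothetical helper for illustration; it handles numeric bounds only and is not how es parses ranges.

```python
def in_range(value, spec):
    """Check a number against an interval such as '[1 TO 10}' or '{* TO 10]'."""
    lower_closed = spec[0] == "["    # '[' includes the lower bound, '{' excludes it
    upper_closed = spec[-1] == "]"   # ']' includes the upper bound, '}' excludes it
    low, high = spec[1:-1].split(" TO ")
    if low != "*":                   # '*' means unbounded on that side
        low = float(low)
        if value < low or (value == low and not lower_closed):
            return False
    if high != "*":
        high = float(high)
        if value > high or (value == high and not upper_closed):
            return False
    return True


print(in_range(10, "[1 TO 10]"))   # upper bound included
print(in_range(10, "[1 TO 10}"))   # upper bound excluded
print(in_range(100, "[1 TO *]"))   # no upper bound
```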
- Wildcard queries
  - ? matches exactly one character; * matches zero or more characters
    - name:t?m
    - name:tom*
    - name:t*m
  - Wildcard matching is inefficient and uses a lot of memory, so it is not recommended
  - Unless there is a special need, do not put ? or * at the start of a term
GET my_index_search/_search?q=username:alf*
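The ? and * semantics can be tried locally with Python's standard fnmatch module, which uses the same two wildcards. This is only an analogy for illustration: fnmatch also gives [ ] a special meaning, and es matches wildcards against the indexed terms, which are assumed here to be the lowercase words of the sample usernames.

```python
from fnmatch import fnmatchcase

# Indexed terms of the username field, assuming whitespace tokenization and lowercasing.
tokens = ["alfred", "way", "lee", "junior"]

# '*' matches zero or more characters.
alf_matches = [t for t in tokens if fnmatchcase(t, "alf*")]

print(alf_matches)
print(fnmatchcase("tom", "t?m"))    # '?' matches exactly one character
print(fnmatchcase("team", "t?m"))   # two characters between t and m: no match
print(fnmatchcase("tm", "t*m"))     # '*' may match nothing at all
```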
- Regular expression match
GET my_index_search/_search?q=username:/[a]?l.*/
- Fuzzy matching (fuzzy query)
  - name:roam~1
  - Matches words that differ from roam by one character, such as foam, roams
- Proximity search (proximity query)
  - "fox quick"~5
  - The distance is measured in terms (words); for example, "quick fox" and "quick brown fox" will both be matched
GET my_index_search/_search?q=username:alfed~1
GET my_index_search/_search?q=job:"java engineer"~2
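The number after ~ in a fuzzy query is an edit distance. A minimal sketch of the plain Levenshtein distance in Python shows why alfed~1 finds alfred. Note the assumption: es counts a transposition of two adjacent characters as a single edit, which this simplified version does not.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


print(edit_distance("roam", "foam"))     # one substitution
print(edit_distance("roam", "roams"))    # one insertion
print(edit_distance("alfed", "alfred"))  # one insertion
```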
3 Query DSL Overview
The query is sent to es in the HTTP request body, which contains the following main parameters:
- query: a query written in Query DSL syntax
- from, size
- timeout
- sort
- …
Query DSL is a JSON-based query language that mainly includes the following two types:
- Field-level queries
  - Such as term, match, range, etc.; each queries against a single field only
- Compound queries
  - Such as the bool query, which combines one or more field-level queries or other compound queries
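As a rough illustration of how these parameters fit together, the sketch below builds a request body in Python and prints it as JSON. The helper search_body and the sample values are assumptions for this example; the bool query wraps a match query and a range query in its must clause.

```python
import json


def search_body(query, from_=0, size=10, sort=None, timeout=None):
    """Assemble a request body for _search (hypothetical helper)."""
    body = {"query": query, "from": from_, "size": size}
    if sort is not None:
        body["sort"] = sort
    if timeout is not None:
        body["timeout"] = timeout
    return body


# A field-level query: it targets a single field.
field_query = {"match": {"username": "alfred"}}

# A compound query: bool combines field-level queries.
compound_query = {"bool": {"must": [field_query, {"range": {"age": {"gte": 20}}}]}}

body = search_body(compound_query, size=2, sort=[{"age": "desc"}], timeout="1s")
print(json.dumps(body, indent=2))
```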
4 Field-level queries and match-query
Field-level queries fall into the following categories:
- Full-text matching
  - Performs full-text search on text fields; the query string is analyzed (split into words) first. Examples: match, match_phrase and other query types
- Term-level matching
  - Does not analyze the query string; it is matched directly against the field's inverted index. Examples: term, terms, range and other query types
The match query performs full-text search on a field and is the most basic and commonly used query type. API examples are as follows:
GET my_index_search/_search
{
"query": {
"match": {
"username": "alfred way"
}
}
}
# View the executed query
GET my_index_search/_search
{
"profile": true,
"query": {
"match": {
"username": "alfred way"
}
}
}
- The operator parameter controls how the matched words are combined; the options are or and and
GET my_index_search/_search
{
"profile": true,
"query": {
"match": {
"username": {
"query": "alfred way",
"operator": "and"
}
}
}
}
- The minimum_should_match parameter controls how many of the words must match
GET my_index_search/_search
{
"profile": true,
"query": {
"match": {
"job": {
"query": "java ruby engineer",
"minimum_should_match": 2
}
}
}
}
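The effect of operator and minimum_should_match can be simulated on the test documents with a toy matcher. The sketch below is written in Python under loud assumptions: analysis is reduced to lowercasing and splitting on whitespace, no scoring is done, and the function match is a stand-in for illustration, not the es implementation.

```python
DOCS = {
    1: {"username": "alfred way", "job": "java engineer"},
    2: {"username": "alfred", "job": "java senior and java specialist"},
    3: {"username": "lee", "job": "java and ruby engineer"},
    4: {"username": "alfred junior way", "job": "ruby engineer"},
}


def match(field, query, operator="or", minimum_should_match=None):
    """Return ids of documents whose field contains enough of the query words."""
    terms = query.lower().split()            # the query string is split into words first
    if minimum_should_match is not None:
        required = minimum_should_match      # explicit number of words that must match
    elif operator == "and":
        required = len(terms)                # every word must match
    else:
        required = 1                         # any single word is enough
    hits = []
    for doc_id, doc in DOCS.items():
        tokens = set(doc[field].lower().split())
        matched = sum(1 for term in terms if term in tokens)
        if matched >= required:
            hits.append(doc_id)
    return hits


print(match("username", "alfred way"))
print(match("username", "alfred way", operator="and"))
print(match("job", "java ruby engineer", minimum_should_match=2))
```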
5 Relevance scoring
- Relevance scoring measures how relevant a document is to the query (relevance)
  - The inverted index yields the list of documents that match the query
  - It is essentially a ranking problem; documents are ordered by their relevance score
- Several important concepts in scoring are as follows:
  - Term Frequency (TF): the number of times the word occurs in the document. The higher the term frequency, the higher the relevance
  - Document Frequency (DF): the number of documents in which the word appears
  - Inverse Document Frequency (IDF): the inverse of document frequency, which can be simply understood as 1 / DF. That is, the fewer documents the word occurs in, the higher the relevance
  - Field-length Norm: the shorter the field, the higher the relevance
- es currently has two relevance scoring models:
  - TF/IDF model
  - BM25 model, the default model since 5.x
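To see how TF, IDF and field length interact, the following sketch scores one term with the classic textbook BM25 formula in Python. It is an illustration under stated assumptions: tokens come from whitespace splitting of the job field of the test documents, k1 = 1.2 and b = 0.75 are the usual defaults, and scores returned by a real cluster also depend on shard-level statistics, so the numbers will not match exactly.

```python
import math

JOBS = {
    1: "java engineer",
    2: "java senior and java specialist",
    3: "java and ruby engineer",
    4: "ruby engineer",
}
CORPUS = {doc_id: text.split() for doc_id, text in JOBS.items()}


def bm25_score(term, doc_id, k1=1.2, b=0.75):
    """Score one term against one document with the classic BM25 formula."""
    tokens = CORPUS[doc_id]
    n_docs = len(CORPUS)
    df = sum(1 for t in CORPUS.values() if term in t)   # document frequency
    tf = tokens.count(term)                             # term frequency
    avg_len = sum(len(t) for t in CORPUS.values()) / n_docs
    # Rarer terms (small df) get a larger idf.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Longer fields are penalised through len(tokens) / avg_len.
    length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


for doc_id in CORPUS:
    print(doc_id, round(bm25_score("ruby", doc_id), 4))
```

Documents 3 and 4 both contain ruby once, but document 4 has the shorter field, so it scores higher; documents without the term score 0.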
6 match-phrase-query
- Searches a field for the query words as a phrase, so the word order matters. API examples are as follows:
GET my_index_search/_search
{
"profile": true,
"query": {
"match_phrase": {
"job": {
"query": "java engineer"
}
}
}
}
GET my_index_search/_search
{
"profile": true,
"query": {
"match_phrase": {
"job": {
"query": "engineer java"
}
}
}
}
GET my_index_search/_search
{
"profile": true,
"query": {
"match_phrase": {
"job": {
"query": "java engineer",
"slop": 1
}
}
}
}
GET my_index_search/_search
{
"profile": true,
"query": {
"match_phrase": {
"job": {
"query": "java engineer",
"slop": 2
}
}
}
}
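The four requests above differ only in word order and slop. A simplified model of slop in Python is sketched below: each query word's position in the document is shifted back by its position in the query, and the phrase matches when the spread of the shifted positions is at most slop. This is an assumption-laden approximation (whitespace tokens, no repeated query words), not the actual es implementation.

```python
from itertools import product


def phrase_match(query, text, slop=0):
    """Return True if the query words occur in text as a phrase within slop moves."""
    terms = query.lower().split()
    tokens = text.lower().split()
    # All positions at which each query word occurs in the text.
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    if any(not p for p in positions):
        return False                      # a query word is missing entirely
    for combo in product(*positions):
        # For an exact phrase every shifted position is identical.
        shifted = [pos - idx for idx, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


print(phrase_match("java engineer", "java engineer"))                    # exact phrase
print(phrase_match("engineer java", "java engineer"))                    # wrong order
print(phrase_match("java engineer", "java and ruby engineer", slop=1))   # two words apart
print(phrase_match("java engineer", "java and ruby engineer", slop=2))
```

In this model, swapping two adjacent words costs 2, which is why "engineer java" does not match "java engineer" without slop.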
7 query-string-query
GET my_index_search/_search
{
"query": {
"query_string": {
"default_field": "username",
"query": "alfred AND way"
}
}
}
GET my_index_search/_search
{
"profile": true,
"query": {
"query_string": {
"fields": [
"username",
"job"
],
"query": "alfred OR (java AND ruby)"
}
}
}
8 simple-query-string-query
- Similar to Query String, but it ignores query syntax errors and supports only part of the query syntax
- Its common logical symbols are as follows; keywords such as AND, OR, NOT cannot be used:
  - + stands for AND
  - | stands for OR
  - - stands for NOT
# Must contain way; may contain alfred
GET my_index_search/_search
{
"profile": true,
"query": {
"simple_query_string": {
"query": "alfred +way",
"fields": ["username"]
}
}
}
GET my_index_search/_search
{
"profile": true,
"query": {
"simple_query_string": {
"query": "alfred +way AND java",
"fields": ["username"]
}
}
}
Comparison of query_string and simple_query_string on a query that contains a syntax error (an unclosed quotation mark):
GET my_index_search/_search
{
"profile": true,
"query": {
"query_string": {
"fields": ["username"],
"query": "alfred OR (\"java AND ruby)"
}
}
}
GET my_index_search/_search
{
"profile": true,
"query": {
"simple_query_string": {
"query": "alfred +way AND \"java",
"fields": ["username"]
}
}
}
9 term/terms-query
- term-query treats the query string as a single term; that is, the query string is not analyzed
- terms-query passes several terms in one query
# term query
GET my_index_search/_search
{
"profile": true,
"query": {
"term": {
"username": "alfred"
}
}
}
GET my_index_search/_search
{
"profile": true,
"query": {
"term": {
"username": "alfred way"
}
}
}
# terms query
GET my_index_search/_search
{
"profile": true,
"query": {
"terms": {
"username": [
"alfred",
"way"
]
}
}
}
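Why the term query for "alfred way" finds nothing becomes clearer with a toy inverted index. The sketch below, in Python, assumes the username field was indexed by lowercasing and splitting on whitespace; term_query and terms_query are hypothetical stand-ins that look up the given value without analyzing it.

```python
from collections import defaultdict

USERNAMES = {1: "alfred way", 2: "alfred", 3: "lee", 4: "alfred junior way"}

# Build the inverted index: token -> ids of documents containing it.
inverted = defaultdict(set)
for doc_id, username in USERNAMES.items():
    for token in username.lower().split():
        inverted[token].add(doc_id)


def term_query(value):
    """Look up the value as one whole term, without splitting it into words."""
    return sorted(inverted.get(value, set()))


def terms_query(values):
    """Look up several whole terms and return documents matching any of them."""
    hits = set()
    for value in values:
        hits |= inverted.get(value, set())
    return sorted(hits)


print(term_query("alfred"))             # the token exists in the index
print(term_query("alfred way"))         # no single token equals "alfred way"
print(terms_query(["alfred", "way"]))
```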
10 range-query
GET my_index_search/_search
{
"query": {
"range": {
"age": {
"gte": 10,
"lte": 20
}
}
}
}
GET my_index_search/_search
{
"query": {
"range": {
"birth": {
"gte": "1980-01-01"
}
}
}
}
GET my_index_search/_search
{
"query": {
"range": {
"birth": {
"gte": "now-35y"
}
}
}
}