This article takes you from zero to a working knowledge of the basics of Elasticsearch, focusing on data search, that is, the _search API.
1. Introduction
Elasticsearch is a near real-time search platform. It provides a distributed full-text search engine with a REST API for user interaction. Elasticsearch is an open source project written in Java and released under the Apache license, and it is one of the most widely used enterprise search engines. It is common in cloud computing environments, where it delivers near real-time search that is stable, reliable, and fast.
You communicate with Elasticsearch through its comprehensive and powerful REST API, which you can use to:
- Check your cluster, node, and index health, status, and statistics
- Administer your cluster, node, and index data and metadata
- Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
- Execute advanced search operations such as paging, sorting, filtering, scripting, aggregations, and many others
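Each of these operations is an ordinary HTTP request. The following is a minimal sketch, assuming an unsecured node at localhost:9200; the helper names build_request and send are illustrative and are not part of any Elasticsearch client library.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node without security enabled


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


health = build_request("GET", "/_cat/health?v")
put_doc = build_request("PUT", "/customer/_doc/1", {"name": "John Doe"})
# print(send(health))  # uncomment once a node is running on localhost:9200
```

Building the request separately from sending it keeps the sketch usable without a running cluster.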
2. Related concepts
The official website explains the following concepts.
- Near real-time (NRT): Elasticsearch is a near real-time search platform, which means there is only a slight delay (usually about one second) between the time a document is indexed and the time it becomes searchable.
- Cluster: A cluster is a collection of one or more nodes that together hold the entire data and provide federated indexing and search capabilities across all nodes. Each cluster has its own unique cluster name by which nodes join the cluster.
- Node: A node refers to a single Elasticsearch instance that belongs to the cluster, stores data and participates in the indexing and search functions of the cluster.
- Index: An index is equivalent to "a certain type of data". It is a collection of documents with similar characteristics.
- Document: A document is equivalent to "a piece of data" in an index. Documents are the basic units of information that can be indexed, and they are expressed in JSON.
- Shards: Shards are similar in concept to partitions in Kafka. Sharding lets an index scale horizontally, which improves performance and throughput.
- Replicas: Replicas provide high availability in case some nodes fail.
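To make the idea of sharding concrete, here is a simplified sketch of how documents can be spread across the primary shards by hashing a routing value (by default the document ID) and taking the remainder. The function name pick_shard is hypothetical, and zlib.crc32 is only a stand-in: Elasticsearch uses its own hash function internally.

```python
import zlib


def pick_shard(doc_id, number_of_primary_shards):
    """Map a document id to a shard number in [0, number_of_primary_shards)."""
    # crc32 is an illustrative stand-in hash, not the one Elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


shards = {}
for doc_id in ["1", "2", "3", "4", "5", "6"]:
    shards.setdefault(pick_shard(doc_id, 3), []).append(doc_id)
```

Because the shard is derived from the ID, the same document always lands on the same shard, so each node only needs to hold part of the index.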
3. Installation
Note:
Although the author has previously written about installing Elasticsearch and Kibana with Docker, that experience turned out to be poor and brought no real convenience, so this article recommends installing from the release archive rather than with Docker.
Elasticsearch and Kibana must be the same version.
Because Elasticsearch exposes a REST API, installing Elasticsearch alone is enough to follow the examples below. For convenience, however, we also install Kibana, a visualization platform for Elasticsearch, and run the examples there. The steps below use Elasticsearch 6.6.2:
- Elasticsearch download and installation
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.6.2.tar.gz
tar -xvf elasticsearch-6.6.2.tar.gz
cd elasticsearch-6.6.2
./bin/elasticsearch
- Visit http://localhost:9200 with the browser to check whether Elasticsearch is installed successfully.
- Kibana download and installation (the archive below is the macOS build)
curl -O https://artifacts.elastic.co/downloads/kibana/kibana-6.6.2-darwin-x86_64.tar.gz
tar -xzf kibana-6.6.2-darwin-x86_64.tar.gz
cd kibana-6.6.2-darwin-x86_64/
./bin/kibana
- Visit http://localhost:5601 with the browser to check whether Kibana is installed successfully.
If the left-hand menu responds normally, Kibana is working. The zipkin index is left over from my own Zipkin test and can be ignored.
- All of the following examples are run from Kibana's visual interface.
4. View cluster status
- View cluster health status;
GET /_cat/health?v
- View node status;
GET /_cat/nodes?v
- View all index information;
GET /_cat/indices?v
5. Index operations
- Create index and view;
PUT /customer
GET /_cat/indices?v
- Delete index and view;
DELETE /customer
GET /_cat/indices?v
6. Document operations
- Add documents to the index;
PUT /customer/_doc/1
{
"name": "John Doe"
}
- View documents in the index;
GET /customer/_doc/1
- Replace documents in index
PUT /customer/_doc/1?pretty
{
"name": "John Doe"
}
- Modify documents in the index:
POST /customer/_doc/1/_update?pretty
{
  "doc": { "name": "Jane Doe" }
}
POST /customer/_doc/1/_update?pretty
{
  "doc": { "name": "Jane Doe", "age": 20 }
}
POST /customer/_doc/1/_update?pretty
{
  "script": "ctx._source.age += 5"
}
Note that modification is not the same as replacement: an update changes only the fields you supply and keeps the rest of the document, whereas a PUT to the same ID replaces the whole document.
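The difference between the two operations can be sketched with plain Python dictionaries. This is only an illustration of the semantics, not how Elasticsearch implements them; the function names are hypothetical, and the merge here is shallow, which is sufficient for flat documents like the ones in this article.

```python
def replace_document(stored, new_source):
    """PUT semantics: the new body becomes the whole document."""
    return dict(new_source)


def partial_update(stored, doc):
    """_update with "doc": merge the given fields into the stored document."""
    merged = dict(stored)
    merged.update(doc)  # shallow merge, enough for flat documents
    return merged


stored = {"name": "John Doe", "age": 20}
replaced = replace_document(stored, {"name": "Jane Doe"})
updated = partial_update(stored, {"name": "Jane Doe"})
```

After the replacement the age field is gone, while after the partial update it is still present.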
- Delete documents from the index;
DELETE /customer/_doc/1
- Perform bulk operations on documents in the index. The body is newline-delimited JSON, so each action and each document must stay on a single line:
POST /customer/_doc/_bulk
{"index":{"_id":"1"}}
{"name": "John Doe"}
{"index":{"_id":"2"}}
{"name": "Jane Doe"}
7. Data search
The Query DSL is a flexible and expressive query language that Elasticsearch exposes through a simple JSON interface. All of the search operations below use it.
Data search is the core of Elasticsearch.
Data preparation
- First we need to import some data to search. We use a bank account data set as the example; each document has the following structure:
{
  "account_number": 0,
  "balance": 16623,
  "firstname": "Bradshaw",
  "lastname": "Mckenzie",
  "age": 29,
  "gender": "F",
  "address": "244 Columbus Place",
  "employer": "Euron",
  "email": "bradshawmckenzie@euron.com",
  "city": "Hobucken",
  "state": "CO"
}
- Download the official sample data from https://github.com/elastic/elasticsearch/blob/6.6/docs/src/test/resources/accounts.json (alternate address: https://gitee.com/firefish985/article-list/blob/master/%E5%A4%A7%E6%95%B0%E6%8D%AE/Elasticsearch/accounts.json).
- Import the data into Elasticsearch
Run the following command from the directory that contains accounts.json:
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
The data can also be imported with a bulk request in Kibana's Dev Tools.
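The bulk API expects newline-delimited JSON: one action line followed by one source line per document, with a newline at the very end. As a rough sketch of how such a body could be generated (build_bulk_body is a hypothetical helper, not a client library function):

```python
import json


def build_bulk_body(docs_by_id):
    """Return an NDJSON bulk body that indexes each document under its id."""
    lines = []
    for doc_id, source in docs_by_id.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))  # source line
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = build_bulk_body({"1": {"name": "John Doe"}, "2": {"name": "Jane Doe"}})
```

json.dumps never emits line breaks by default, so each action and each document stays on its own line, which is what the bulk format requires.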
- After the import completes, check the index information; the bank index should now contain 1,000 documents.
GET /_cat/indices?v
Getting Started with Search (match_all)
- The simplest search uses match_all, which matches every document:
GET /bank/_search
{
  "query": { "match_all": {} }
}
- Paged search: from is the offset of the first result, starting from 0, and size is the number of results per page:
GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 10
}
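Since from is an offset rather than a page number, a client usually converts one into the other. A minimal sketch, assuming 1-based page numbers (page_params and paged_search_body are illustrative names):

```python
def page_params(page, size):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


def paged_search_body(page, size):
    """Build a match_all search body for the requested page."""
    body = {"query": {"match_all": {}}}
    body.update(page_params(page, size))
    return body


first = paged_search_body(1, 10)
third = paged_search_body(3, 10)
```

With a page size of 10, the third page starts at offset 20.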
- To sort the results, use sort; for example, sort by the balance field in descending order:
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": {
    "balance": { "order": "desc" }
  }
}
- To return only specific fields, use _source; for example, return only the account_number and balance fields:
GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}
Conditional search (match)
- A conditional search uses match to express the condition; for example, search for documents whose account_number is 20:
GET /bank/_search
{
  "query": {
    "match": { "account_number": 20 }
  }
}
- A conditional search on a text field; for example, search for documents whose address field contains mill. Compared with the previous search, match on a numeric field is an exact match, whereas match on a text field is a full-text match: a document matches if the field contains the term, not only if it equals it:
GET /bank/_search
{
  "query": {
    "match": { "address": "mill" }
  },
  "_source": ["address", "account_number"]
}
- A phrase search uses match_phrase; for example, search for documents whose address field contains the phrase mill lane:
GET /bank/_search
{
  "query": {
    "match_phrase": { "address": "mill lane" }
  }
}
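The difference between match and match_phrase on a text field can be sketched in plain Python. This is a simplification under stated assumptions: analyze is only a rough stand-in for the default analyzer (lowercase, split on non-alphanumeric characters), relevance scoring is ignored, and the sample addresses are made up for illustration.

```python
import re


def analyze(text):
    """Rough stand-in for the default analyzer: lowercase and split into terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(field_value, query):
    """match: true if the field contains at least one of the query terms."""
    terms = set(analyze(field_value))
    return any(term in terms for term in analyze(query))


def match_phrase(field_value, query):
    """match_phrase: true if the query terms appear consecutively and in order."""
    tokens, phrase = analyze(field_value), analyze(query)
    return any(tokens[i:i + len(phrase)] == phrase
               for i in range(len(tokens) - len(phrase) + 1))


addresses = ["198 Mill Lane", "990 Mill Road", "171 Putnam Avenue"]  # made-up samples
mill_hits = [a for a in addresses if match(a, "mill")]
phrase_hits = [a for a in addresses if match_phrase(a, "mill lane")]
```

Note that a match query for "mill lane" would also accept "990 Mill Road", because one matching term is enough; only match_phrase requires the whole phrase.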
Combination search (bool)
- A combined search uses bool to combine conditions. must means every condition has to be satisfied; for example, search for documents whose address field contains both mill and lane:
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
- should means at least one of the conditions has to be satisfied; for example, search for documents whose address field contains mill or lane:
GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
- must_not means none of the conditions may be satisfied; for example, search for documents whose address field contains neither mill nor lane:
GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
- must and must_not can be combined; for example, search for documents whose age field equals 40 and whose state field is not ID:
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
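The logic of must, should, and must_not can be sketched as a small Python predicate. This is a simplified model that ignores relevance scoring and the filter clause; bool_matches and field_equals are hypothetical names, and the sample accounts are made up.

```python
def bool_matches(doc, must=(), should=(), must_not=()):
    """Evaluate a simplified bool query; each clause is a predicate on the document."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # Without must clauses, at least one should clause has to match.
    if should and not must:
        return any(clause(doc) for clause in should)
    return True


def field_equals(field, value):
    """Return a predicate that checks one field for an exact value."""
    return lambda doc: doc.get(field) == value


accounts = [  # made-up sample documents
    {"account_number": 1, "age": 40, "state": "ID"},
    {"account_number": 2, "age": 40, "state": "CO"},
    {"account_number": 3, "age": 29, "state": "CO"},
]
hits = [
    a["account_number"]
    for a in accounts
    if bool_matches(a, must=[field_equals("age", 40)], must_not=[field_equals("state", "ID")])
]
```

Only account 2 satisfies both conditions: it is aged 40 and is not in the state ID.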
Filter search (filter)
- Filtering uses filter; for example, keep only the documents whose balance field is between 20000 and 30000, inclusive:
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
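As a small sketch, the same request body can be generated for any numeric field, and the inclusive bounds of gte and lte can be checked locally. Both function names are illustrative assumptions, not Elasticsearch APIs.

```python
def range_filter_body(field, gte, lte):
    """Build a bool query that keeps documents with gte <= field <= lte."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {"range": {field: {"gte": gte, "lte": lte}}},
            }
        }
    }


def passes_range(doc, field, gte, lte):
    """Local check mirroring the inclusive gte/lte bounds."""
    return gte <= doc[field] <= lte


body = range_filter_body("balance", 20000, 30000)
```

Both boundary values pass the check, because gte and lte mean "greater than or equal" and "less than or equal".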
Search aggregation (aggs)
- Aggregations over the search results use aggs, similar to GROUP BY in MySQL; for example, group by the state field and count the number of documents in each state:
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}
This is similar to the SQL statement:
SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10;
Setting "size": 0 returns only the aggregation results, without the search hits.
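What this aggregation computes can be sketched locally with collections.Counter. The function name and the sample documents are made up for illustration; the default of 10 buckets mirrors the LIMIT 10 in the SQL statement.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Count documents per distinct field value, largest buckets first."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


accounts = [  # made-up sample documents
    {"state": "CO"}, {"state": "ID"}, {"state": "CO"},
    {"state": "TX"}, {"state": "CO"}, {"state": "ID"},
]
buckets = terms_aggregation(accounts, "state")
```

Each bucket carries the grouped value and its document count, sorted from the most common state to the least common.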
- Nested aggregation; for example, group by the state field, count the documents in each state, and then compute the average balance for each state:
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
- Sort the results of an aggregation; for example, in descending order of the average balance:
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword",
"order": {
"average_balance": "desc"
}
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
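The nested aggregation with ordering can be sketched in Python as a group-by followed by a sort on the computed average. The function name and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def average_balance_by_state(docs):
    """Group by state, average the balances, and sort by that average, descending."""
    balances = defaultdict(list)
    for doc in docs:
        balances[doc["state"]].append(doc["balance"])
    buckets = [
        {"key": state, "doc_count": len(values), "average_balance": sum(values) / len(values)}
        for state, values in balances.items()
    ]
    return sorted(buckets, key=lambda bucket: bucket["average_balance"], reverse=True)


accounts = [  # made-up sample documents
    {"state": "CO", "balance": 10000},
    {"state": "CO", "balance": 20000},
    {"state": "ID", "balance": 30000},
    {"state": "TX", "balance": 5000},
]
buckets = average_balance_by_state(accounts)
```

The sort key is the value of the inner aggregation, which is the role the "order" entry plays in the request above.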
- Range aggregation groups documents by ranges of a field value; for example, split the age field into the ranges [20,30), [30,40), and [40,50) (from is inclusive, to is exclusive), then within each range group by gender and compute the document count and average balance:
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_age": {
"range": {
"field": "age",
"ranges": [
{
"from": 20,
"to": 30
},
{
"from": 30,
"to": 40
},
{
"from": 40,
"to": 50
}
]
},
"aggs": {
"group_by_gender": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
}
}
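The following sketch mirrors this request locally: documents are first bucketed into half-open age ranges, then each bucket is grouped by gender and averaged. The function names, the bucket key format, and the sample documents are illustrative assumptions.

```python
def range_aggregation(docs, field, ranges):
    """Bucket documents into half-open [low, high) ranges of a numeric field."""
    return {
        f"{low}-{high}": [doc for doc in docs if low <= doc[field] < high]
        for low, high in ranges
    }


def average_by(docs, group_field, value_field):
    """Group documents by one field and average another within each group."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc[group_field], []).append(doc[value_field])
    return {key: sum(values) / len(values) for key, values in groups.items()}


accounts = [  # made-up sample documents
    {"age": 20, "gender": "F", "balance": 1000},
    {"age": 30, "gender": "M", "balance": 3000},
    {"age": 39, "gender": "M", "balance": 5000},
    {"age": 50, "gender": "F", "balance": 7000},
]
by_age = range_aggregation(accounts, "age", [(20, 30), (30, 40), (40, 50)])
result = {key: average_by(bucket, "gender", "balance") for key, bucket in by_age.items()}
```

An account aged exactly 30 falls into the second range, and an account aged 50 falls into none of them, because the upper bound of each range is exclusive.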
8. References
Official getting-started guide: https://www.elastic.co/guide/en/elasticsearch/reference/6.6/getting-started.html