[Elasticsearch] Get started quickly with Elasticsearch: just master these basics (official getting-started example)


This article takes you from zero to one with the basic use of Elasticsearch, focusing on data search, that is, the _search API.

1. Introduction

Elasticsearch is a near real-time search platform. It provides a distributed full-text search engine with a REST API for user interaction. Elasticsearch is an open source project written in Java and released under the Apache license, and it is one of the most popular enterprise search engines. It is widely used in cloud computing, where it delivers stable, reliable, and fast near real-time search.

How do you communicate with Elasticsearch? It provides a comprehensive and powerful REST API that you can use to:

  • Check your cluster, node, and index health, status, and statistics
  • Administer your cluster, node, and index data and metadata
  • Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
  • Execute advanced search operations such as paging, sorting, filtering, scripting, aggregations, and many others
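
As a rough illustration of what "talking to the REST API" means, the sketch below builds plain HTTP requests with Python's standard library. It assumes a local node on the default port 9200; BASE_URL, build_request, and send are names invented for this example, and /_cluster/health is used only as a simple read-only endpoint. Nothing is sent unless send is called against a running cluster.

```python
import json
import urllib.request

# Assumption: a local single-node cluster on the default port.
BASE_URL = "http://localhost:9200"

def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        # Elasticsearch 6.x expects an explicit content type for request bodies.
        req.add_header("Content-Type", "application/json")
    return req

def send(req):
    """Send the request and decode the JSON response (needs a running cluster)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

health_req = build_request("GET", "/_cluster/health")
index_req = build_request("PUT", "/customer/_doc/1", {"name": "John Doe"})
print(health_req.get_method(), health_req.full_url)
print(index_req.get_method(), index_req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect what would go over the wire before a cluster is available.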

2. Related concepts

The official website explains the following concepts.

  • Near Realtime (NRT): Elasticsearch is a near-realtime search platform, which means there is only a slight delay (normally one second) between the time a document is indexed and the time it becomes searchable.
  • Cluster: A cluster is a collection of one or more nodes that together hold the entire data and provide federated indexing and search capabilities across all nodes. Each cluster has its own unique cluster name by which nodes join the cluster.
  • Node: A node refers to a single Elasticsearch instance that belongs to the cluster, stores data and participates in the indexing and search functions of the cluster.
  • Index: An index corresponds to "a certain type of data". It is a collection of documents with similar characteristics.
  • Document: A document corresponds to "a piece of data" in an index. Documents are the basic units of information that can be indexed, and they are expressed in JSON.
  • Shards (shards): The concept of shards is similar to partitions in Kafka. The sharding mechanism endows the index with the ability to expand horizontally, improving performance and throughput.
  • Replicas: Replicas provide high availability in case some nodes fail.
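
To make the relationship between shards and replicas concrete, here is a small arithmetic sketch. It assumes that every primary shard gets the same number of replica copies; number_of_shards and number_of_replicas are the index setting names, while the values and the helper total_shard_copies are illustrative only.

```python
def total_shard_copies(primary_shards, replicas_per_primary):
    """Each primary shard has `replicas_per_primary` additional replica copies."""
    return primary_shards * (1 + replicas_per_primary)

# Hypothetical index settings body; the numbers are illustrative only.
settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}
s = settings["settings"]
copies = total_shard_copies(s["number_of_shards"], s["number_of_replicas"])
print(copies)  # 5 primaries + 5 replicas
```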

3. Installation

Note:

  • Although the author previously wrote an article about installing Elasticsearch and Kibana with Docker, in hindsight the Docker experience was poor and brought no real convenience for this walkthrough, so the installation-package method is recommended here instead of Docker.

  • Elasticsearch and Kibana must be the same version.

Elasticsearch exposes a REST API, so installing Elasticsearch alone is enough for the examples below. For convenience, however, we also install Kibana, the visualization platform for Elasticsearch, and run the examples from it. Elasticsearch 6.6.2 is used as the example version:

  • Elasticsearch download and installation
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.6.2.tar.gz
tar -xvf elasticsearch-6.6.2.tar.gz
cd elasticsearch-6.6.2
./bin/elasticsearch
  • Visit http://localhost:9200 with the browser to check whether Elasticsearch is installed successfully.

  • Kibana download and installation (the macOS package is shown here; use the package that matches your operating system)
curl -O https://artifacts.elastic.co/downloads/kibana/kibana-6.6.2-darwin-x86_64.tar.gz
tar -xzf kibana-6.6.2-darwin-x86_64.tar.gz
cd kibana-6.6.2-darwin-x86_64/
./bin/kibana
  • Visit http://localhost:5601 with the browser to check whether Kibana is installed successfully.

If the items in the left-hand menu open normally, the installation works. (The zipkin index shown there is left over from my Zipkin tests and can be ignored.)

  • All of the following examples are run in Kibana's visual console.

4. View cluster status

  • View cluster health status;
GET /_cat/health?v
  • View node status;
GET /_cat/nodes?v
  • View all index information;
GET /_cat/indices?v
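
The _cat endpoints return plain-text tables, and the v parameter adds a header row. If you want to work with that output in a script, a minimal parsing sketch might look like the following. The sample text is made up in the shape of the indices table, and the parser assumes that no cell is empty or contains spaces, which does not hold in every case.

```python
def parse_cat_table(text):
    """Turn `_cat` output requested with ?v into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

# Made-up sample in the shape of `GET /_cat/indices?v` output.
sample = """\
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer 95SQ4TSUT7mWBT7VNHH67A   5   1          0            0       260b           260b
"""
rows = parse_cat_table(sample)
print(rows[0]["index"], rows[0]["health"])
```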

5. Index operations

  • Create index and view;
PUT /customer
GET /_cat/indices?v
  • Delete index and view;
DELETE /customer
GET /_cat/indices?v

6. Document operations

  • Add documents to the index;
PUT /customer/_doc/1
{
  "name": "John Doe"
}
  • View documents in the index;
GET /customer/_doc/1
  • Replace a document in the index (a PUT to an existing ID overwrites the stored document);
PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}
  • Modify documents in the index:
POST /customer/_doc/1/_update?pretty
{
  "doc": { "name": "Jane Doe" }
}
POST /customer/_doc/1/_update?pretty
{
  "doc": { "name": "Jane Doe", "age": 20 }
}
POST /customer/_doc/1/_update?pretty
{
  "script": "ctx._source.age += 5"
}

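Modification is not the same as replacement: an update changes or adds only the fields you send (or the fields the script touches) and keeps the rest of the document, whereas a PUT to the same ID replaces the whole document.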
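
The difference can be modelled with an ordinary dictionary. This is only an in-memory analogy for flat documents, not how Elasticsearch stores data; store, put_doc, and update_doc are names made up for the sketch.

```python
store = {}  # in-memory stand-in for an index: id -> document

def put_doc(doc_id, doc):
    """PUT semantics: the stored document is replaced as a whole."""
    store[doc_id] = dict(doc)

def update_doc(doc_id, partial):
    """_update with "doc": the given fields are merged into the stored document."""
    store[doc_id].update(partial)

put_doc("1", {"name": "John Doe", "age": 20})
update_doc("1", {"name": "Jane Doe"})   # age is kept
after_update = dict(store["1"])
put_doc("1", {"name": "Jane Doe"})      # age is gone
after_put = dict(store["1"])
print(after_update, after_put)
```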

  • Delete documents from the index;
DELETE /customer/_doc/1
  • Perform bulk operations on documents in the index
POST /customer/_doc/_bulk
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }

7. Data search

The Query DSL is a very flexible and expressive query language that Elasticsearch exposes through a simple JSON interface for building rich searches. All of the following search operations use it.

Data search is the focus of Elasticsearch.

Data preparation

  • First we need to import some data to search. We use a bank accounts data set in which each document has the following structure:
{
  "account_number": 0,
  "balance": 16623,
  "firstname": "Bradshaw",
  "lastname": "Mckenzie",
  "age": 29,
  "gender": "F",
  "address": "244 Columbus Place",
  "employer": "Euron",
  "email": "bradshawmckenzie@euron.com",
  "city": "Hobucken",
  "state": "CO"
}
  • Download the officially prepared data, data address: https://github.com/elastic/elasticsearch/blob/6.6/docs/src/test/resources/accounts.json. Alternate address: https://gitee.com/firefish985/article-list/blob/master/%E5%A4%A7%E6%95%B0%E6%8D%AE/Elasticsearch/accounts.json

  • Import data into Elasticsearch

It can be imported by running the following command in the directory that contains accounts.json:

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"

It can also be imported in batches in Kibana's Dev Tools.
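
The bulk format used for the import is newline-delimited JSON: an action line followed by a document line, each on a single line. The sketch below shows one way to produce such a body from Python data; build_bulk_body is a hypothetical helper and only the index action is covered.

```python
import json

def build_bulk_body(docs):
    """Serialize (id, document) pairs as an NDJSON bulk body of index actions."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(doc))                          # source line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body([("1", {"name": "John Doe"}), ("2", {"name": "Jane Doe"})])
print(body, end="")
```

Because each JSON object must stay on one line, pretty-printed JSON cannot be used in a bulk body.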

  • After the import is completed, check the index information and you will see that 1,000 documents have been created in the bank index.
GET /_cat/indices?v

Getting Started with Search (match_all)

  • The simplest search is match_all, which matches every document:
GET /bank/_search
{
  "query": { "match_all": {} }
}
  • Paged search: from is the offset of the first result (starting at 0) and size is the number of results per page:
GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 10
}
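
User interfaces usually think in page numbers rather than offsets. A small sketch of the conversion, assuming 1-based page numbers (page_to_from_size is a name invented here):

```python
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into the from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}

# Request body for the third page of ten results.
body = {"query": {"match_all": {}}, **page_to_from_size(3, 10)}
print(body)
```

Note that from + size is capped by the index.max_result_window setting (10,000 by default), so this style of paging suits the first pages of a result set rather than deep scrolling.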

  • Sorted search uses sort, for example, sort by the balance field in descending order:
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}
  • To return only specific fields, use _source, for example, return only the account_number and balance fields:
GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}
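
What sort and _source do to the result list can be mimicked on a few in-memory documents. This is only an analogy on made-up data; sort_and_project is a hypothetical helper, not part of any Elasticsearch client.

```python
def sort_and_project(docs, sort_field, order="asc", source=None):
    """Sort documents by one field, then keep only the fields listed in `source`."""
    ordered = sorted(docs, key=lambda d: d[sort_field], reverse=(order == "desc"))
    if source is None:
        return ordered
    return [{k: d[k] for k in source if k in d} for d in ordered]

# Made-up sample documents, not taken from accounts.json.
docs = [
    {"account_number": 1, "balance": 100, "state": "CO"},
    {"account_number": 2, "balance": 300, "state": "ID"},
    {"account_number": 3, "balance": 200, "state": "CO"},
]
top = sort_and_project(docs, "balance", "desc", ["account_number", "balance"])
print(top)
```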

Conditional search (match)

  • Conditional search uses match to express a matching condition, for example, search for the document whose account_number is 20:
GET /bank/_search
{
  "query": { "match": { "account_number": 20 } }
}
  • Conditional search on a text field, for example, search for documents whose address field contains mill. Compared with the previous search, match on a numeric field is an exact match, whereas match on a text field is a full-text match against the analyzed terms:
GET /bank/_search
{
  "query": { "match": { "address": "mill" } },
  "_source": ["address", "account_number"]
}
  • Phrase search uses match_phrase, for example, search for documents whose address field contains the phrase mill lane:
GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
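
The difference between match and match_phrase on a text field can be sketched in a few lines. The tokenizer below (lowercase, split on whitespace) is a deliberately crude stand-in for the real analyzer, scoring is ignored, and the sample addresses are illustrative.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match(field_value, query):
    """match: true if the field contains at least one of the query terms."""
    field_terms = set(tokens(field_value))
    return any(term in field_terms for term in tokens(query))

def match_phrase(field_value, query):
    """match_phrase: true if the query terms appear adjacent and in order."""
    field_terms, query_terms = tokens(field_value), tokens(query)
    n = len(query_terms)
    return any(field_terms[i:i + n] == query_terms
               for i in range(len(field_terms) - n + 1))

addresses = ["198 Mill Lane", "990 Mill Road", "171 Putnam Avenue"]
match_hits = [a for a in addresses if match(a, "mill lane")]
phrase_hits = [a for a in addresses if match_phrase(a, "mill lane")]
print(match_hits)
print(phrase_hits)
```

A match query for "mill lane" therefore also returns addresses that contain only one of the two words, while match_phrase requires both, next to each other.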

Combination search (bool)

  • Combination search uses bool to combine clauses. must means that all clauses must be satisfied, for example, search for documents whose address field contains both mill and lane:
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
  • In a combination search, should means that at least one clause must be satisfied, for example, search for documents whose address field contains mill or lane:
GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
  • In a combination search, must_not means that none of the clauses may be satisfied, for example, search for documents whose address field contains neither mill nor lane:
GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
  • Combine must and must_not, for example, search for documents whose age field equals 40 and whose state field is not ID:
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
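
The logic of the bool clauses can be written down as a small predicate combinator. This is a simplified model that ignores relevance scoring; bool_match and field_is are hypothetical helpers, and field_is does an exact comparison instead of a real match query.

```python
def bool_match(doc, must=(), should=(), must_not=()):
    """Evaluate bool-style clauses, each clause being a predicate on a document."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # With no must clause, at least one should clause has to match.
    if should and not must:
        return any(clause(doc) for clause in should)
    return True

def field_is(field, value):
    """Hypothetical helper: exact comparison instead of a real match query."""
    return lambda doc: doc.get(field) == value

docs = [{"age": 40, "state": "ID"}, {"age": 40, "state": "CO"}, {"age": 29, "state": "CO"}]
hits = [d for d in docs
        if bool_match(d, must=[field_is("age", 40)], must_not=[field_is("state", "ID")])]
print(hits)
```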

Filter search (filter)

  • Filtering uses filter, for example, keep only documents whose balance field is between 20000 and 30000 (inclusive):
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
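
A range filter with gte and lte selects a closed interval, so both bounds are included. A minimal in-memory sketch of that behaviour on made-up balances (range_filter is a name invented for the example):

```python
def range_filter(docs, field, gte=None, lte=None):
    """Keep documents whose field lies in the closed interval [gte, lte]."""
    kept = []
    for doc in docs:
        value = doc[field]
        if gte is not None and value < gte:
            continue
        if lte is not None and value > lte:
            continue
        kept.append(doc)
    return kept

docs = [{"balance": 19999}, {"balance": 20000}, {"balance": 30000}, {"balance": 30001}]
in_range = range_filter(docs, "balance", gte=20000, lte=30000)
print(in_range)
```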

Search aggregation (aggs)

  • Aggregate the search results with aggs, similar to GROUP BY in MySQL, for example, group by the state field and count the documents in each state:
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
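
For readers who think in code rather than query syntax, the same grouping can be sketched with collections.Counter on made-up documents. The key and doc_count names mirror the bucket fields in the response; terms_agg itself is a hypothetical helper, and ties are broken by key only to keep the output deterministic.

```python
from collections import Counter

def terms_agg(docs, field, size=10):
    """Count documents per field value, largest buckets first."""
    counts = Counter(doc[field] for doc in docs)
    # Ties are broken by key here so that the output is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]

docs = [{"state": "CO"}, {"state": "ID"}, {"state": "CO"}, {"state": "TX"}]
buckets = terms_agg(docs, "state")
print(buckets)
```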

This aggregation is similar to the SQL statement

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10;

"size": 0 means that no search hits are returned, only the aggregation results.

  • Nested aggregation, for example, group by the state field, count the documents in each state, and then compute the average balance per state:
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
  • Sort the aggregation results, for example, by the average balance in descending order:
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
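
The nested aggregation with ordering can be pictured as "group, average, then sort by the average". A plain-Python sketch on made-up documents follows; avg_by is a hypothetical helper and the bucket layout is only loosely modelled on the response.

```python
from collections import defaultdict

def avg_by(docs, group_field, value_field):
    """Group documents by one field and average another, highest average first."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    buckets = [{"key": k, "doc_count": len(v), "average_balance": sum(v) / len(v)}
               for k, v in groups.items()]
    return sorted(buckets, key=lambda b: b["average_balance"], reverse=True)

docs = [{"state": "CO", "balance": 100}, {"state": "CO", "balance": 300},
        {"state": "ID", "balance": 500}]
by_state = avg_by(docs, "state", "balance")
print(by_state)
```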
  • Range aggregation groups documents by ranges of a field's values, for example, bucket the age field into the ranges [20,30), [30,40), and [40,50), then group each bucket by gender and compute the average balance:
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 20, "to": 30 },
          { "from": 30, "to": 40 },
          { "from": 40, "to": 50 }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
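
In a range aggregation the from value is included and the to value is excluded, so an age of 30 falls into the second bucket and an age of 50 into none. The sketch below mimics the three-level aggregation on made-up documents; the helper names and the "20-30" style bucket labels are choices made for this example.

```python
def range_buckets(docs, field, ranges):
    """Assign documents to [from, to) ranges; `from` is inclusive, `to` exclusive."""
    buckets = {f"{lo}-{hi}": [] for lo, hi in ranges}
    for doc in docs:
        for lo, hi in ranges:
            if lo <= doc[field] < hi:
                buckets[f"{lo}-{hi}"].append(doc)
    return buckets

def avg_balance_by_gender(docs):
    """Average balance per gender within one bucket."""
    sums = {}
    for doc in docs:
        total, count = sums.get(doc["gender"], (0, 0))
        sums[doc["gender"]] = (total + doc["balance"], count + 1)
    return {gender: total / count for gender, (total, count) in sums.items()}

docs = [{"age": 20, "gender": "F", "balance": 100}, {"age": 29, "gender": "F", "balance": 300},
        {"age": 30, "gender": "M", "balance": 500}, {"age": 50, "gender": "M", "balance": 900}]
age_ranges = [(20, 30), (30, 40), (40, 50)]
result = {name: avg_balance_by_gender(members)
          for name, members in range_buckets(docs, "age", age_ranges).items()}
print(result)
```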

8. References

Official website entry case: https://www.elastic.co/guide/en/elasticsearch/reference/6.6/getting-started.html

Origin blog.csdn.net/yuchangyuan5237/article/details/132114907