Elasticsearch Getting Started Notes (1)

Environment setup

  Elasticsearch is a distributed search and analytics engine built on Apache Lucene, and one of the most commonly used search tools.

  Kibana is an open-source analytics and visualization platform designed to work with Elasticsearch. It provides the ability to search, view, and interact with data stored in Elasticsearch indices, so developers and operators can easily perform advanced data analysis and visualize data in a variety of charts, tables, and maps.

  Other visualization tools include elasticsearch-head (lightweight, with a corresponding Chrome plug-in), which is not covered in detail in this article.

  Both Elasticsearch and Kibana use version 7.17.0, and the environment is built with Docker. The docker-compose.yml file is as follows:

version: "3.1"
# Service configuration
services:
  elasticsearch:
    container_name: elasticsearch-7.17.0
    image: elasticsearch:7.17.0
    environment:
      - "ES_JAVA_OPTS=-Xms1024m -Xmx1024m"
      - "http.host=0.0.0.0"
      - "node.name=elastic01"
      - "cluster.name=cluster_elasticsearch"
      - "discovery.type=single-node"
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - ./es/plugins:/usr/share/elasticsearch/plugins
      - ./es/data:/usr/share/elasticsearch/data
    networks:
      - elastic_net

  kibana:
    container_name: kibana-7.17.0
    image: kibana:7.17.0
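    # Note: the official Kibana 7.x image defaults ELASTICSEARCH_HOSTS to
    # http://elasticsearch:9200, which resolves via the elasticsearch service
    # name on the shared elastic_net network, so no extra wiring is needed here.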
    ports:
      - "5601:5601"
    networks:
      - elastic_net

# Network configuration
networks:
  elastic_net:
    driver: bridge
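
  To bring the stack up, here is a minimal sketch, assuming the docker-compose.yml above sits in the current directory (the chmod is a blunt but common workaround, since the Elasticsearch container writes to the data volume as uid 1000; chown may be preferable):

mkdir -p es/plugins es/data
chmod 777 es/data
docker-compose up -d
# wait a moment for startup, then verify:
curl http://localhost:9200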

Basic commands

  • Check whether Elasticsearch started successfully:
curl http://IP:9200
  • Check whether the cluster is healthy (a sample of the output appears after this list):
curl 'http://IP:9200/_cat/health?v'
  • List all indices in Elasticsearch:
curl http://IP:9200/_cat/indices
  • View the total document count, or the document count of a specific index:
curl 'http://IP:9200/_cat/count?v'
curl 'http://IP:9200/_cat/count/some_index_name?v'
  • View the plugins running on each node (note the quotes, since the URL contains &):
curl 'http://IP:9200/_cat/plugins?v&s=component&h=name,component,version,description'
  • View the word-segmentation results of the ik plugin:
curl -H 'Content-Type: application/json'  -XGET 'http://IP:9200/_analyze?pretty' -d '{"analyzer":"ik_max_word","text":"美国留给伊拉克的是个烂摊子吗"}'
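
  For reference, the health check returns a one-row table shaped roughly like the following (the values are illustrative, for a single-node cluster named as in the compose file above):

epoch      timestamp cluster               status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1690000000 08:00:00  cluster_elasticsearch green           1         1      3   3    0    0        0             0                  -                100.0%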

Index operations

  • View the mapping of an index:
curl http://IP:9200/some_index_name/_mapping
  • View all data of an index:
curl http://IP:9200/some_index_name/_search
  • Query a document by ID (in 7.x mapping types are deprecated and the document type is always _doc):
curl -X GET http://IP:9200/index_name/_doc/ID
  • Retrieve all documents of an index:
curl 'http://IP:9200/index_name/_search?pretty'
curl -X POST -H 'Content-Type: application/json' 'http://IP:9200/index_name/_search?pretty' -d '{"query": {"match_all": {}}}'
  • Retrieve the first few documents of an index (if no size is specified, the default is 10):
curl -X POST -H 'Content-Type: application/json' 'http://IP:9200/index_name/_search?pretty' -d '{"query": {"match_all": {}}, "size": 2}'
  • Retrieve a middle slice of an index (e.g. the 11th through 20th documents):
curl -X POST -H 'Content-Type: application/json' 'http://IP:9200/index_name/_search?pretty' -d '{"query": {"match_all": {}}, "from": 10, "size": 10}'
  • Retrieve an index but return only the context field:
curl -X POST -H 'Content-Type: application/json' 'http://IP:9200/index_name/_search?pretty' -d '{"query": {"match_all": {}}, "_source": ["context"]}'
  • Delete an index:
curl -X DELETE 'http://IP:9200/index_name'
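
  The commands above assume an index already exists. For reference, a minimal sketch of creating one (the index name some_index_name and the single context field are illustrative; the ik_max_word analyzer requires the ik plugin described later):

curl -X PUT -H 'Content-Type: application/json' 'http://IP:9200/some_index_name' -d '
{
  "mappings": {
    "properties": {
      "context": {"type": "text", "analyzer": "ik_max_word"}
    }
  }
}'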

ES search

  1. If there are multiple search keywords, Elasticsearch treats them as being in an OR relationship (a sketch of this follows the example below).
  2. To perform an AND search on multiple keywords, you must use a Boolean query:
curl -H 'Content-Type: application/json' 'localhost:9200/index_name/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "软件" } },
        { "match": { "content": "系统" } }
      ]
    }
  }
}'
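
  For comparison, the OR behavior from item 1 needs no bool clause: a plain match query on space-separated keywords matches documents containing either term (the content field is carried over from the example above):

curl -H 'Content-Type: application/json' 'localhost:9200/index_name/_search?pretty' -d '
{
  "query": {
    "match": { "content": "软件 系统" }
  }
}'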
  3. Complex search. In the DSL translation below, must maps to AND, should together with "minimum_should_match": 1 maps to OR, and must_not maps to !=.

SQL statement:

select * from test_index where name = 'tom' and (hired = true or (personality = 'good' and rude != true))

DSL statement:

GET /test_index/_search
{
    "query": {
            "bool": {
                "must": { "match":{ "name": "tom" }},
                "should": [
                    { "match":{ "hired": true }},
                    { "bool": {
                        "must":{ "match": { "personality": "good" }},
                        "must_not": { "match": { "rude": true }}
                    }}
                ],
                "minimum_should_match": 1
            }
    }
}

ik tokenizer

  The ik tokenizer is a Chinese word-segmentation plug-in for Elasticsearch that handles Chinese text far better than the built-in analyzers. The ik version must match the Elasticsearch version.

  ik 7.17.0 can be downloaded from: https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v7.17.0 After downloading, rename the extracted folder to ik and place it in Elasticsearch's plugins directory.
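
  A sketch of the installation under the Docker layout above (the zip asset name follows the project's usual release naming convention; verify it against the release page, and check that plugin-descriptor.properties ends up directly under plugins/ik):

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.17.0/elasticsearch-analysis-ik-7.17.0.zip
unzip elasticsearch-analysis-ik-7.17.0.zip -d ./es/plugins/ik
docker restart elasticsearch-7.17.0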

  The command to use the ik tokenizer (Kibana environment):

POST _analyze
{
  "text": "戚发轫是哪里人",
  "analyzer": "ik_smart"
}

The output is:

{
  "tokens" : [
    {
      "token" : "戚",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "发轫",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "是",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "哪里人",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}

  ik supports loading user dictionaries and stop words. It provides a configuration file, IKAnalyzer.cfg.xml (placed under the ik/config path), in which you can configure your own extended dictionary, extended stop-word dictionary, and remote versions of both; each entry can list multiple dictionary files.

  After configuring a local extended dictionary or stop-word dictionary you need to restart Elasticsearch, and again after any later update to those files. A remote extended dictionary, once configured, supports hot updates: ik checks for changes about every 60 seconds. Both kinds of extended dictionaries are merged into ik's main dictionary and take effect for all indices.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extended configuration</comment>
	<!-- Users can configure their own extended dictionary here -->
	<entry key="ext_dict">custom/mydict.dic</entry>
	<!-- Users can configure their own extended stop-word dictionary here -->
	<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
	<!-- Users can configure a remote extended dictionary here -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->
	<!-- Users can configure a remote extended stop-word dictionary here -->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

  The user dictionary file path is custom/mydict.dic and the stop-word dictionary path is custom/ext_stopword.dic; both files go under ik/config/custom.
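
  A sketch of creating those files, assuming the container layout from the docker-compose setup above (dictionary files are plain UTF-8 text with one entry per line):

mkdir -p es/plugins/ik/config/custom
echo "戚发轫" >> es/plugins/ik/config/custom/mydict.dic
echo "是" >> es/plugins/ik/config/custom/ext_stopword.dic
# local dictionary changes require a restart to take effect
docker restart elasticsearch-7.17.0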

  Add '戚发轫' to the user dictionary, add '是' to the stop-word dictionary, and segment the original text again:

POST _analyze
{
  "text": "戚发轫是哪里人",
  "analyzer": "ik_smart"
}

The output is as follows:

{
  "tokens" : [
    {
      "token" : "戚发轫",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "哪里人",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}

  If analyzer is set to ik_smart, the text is split at the coarsest granularity; if ik_max_word is selected, it is split at the finest granularity. The test is as follows:

POST _analyze
{
  "text": "戚发轫是哪里人",
  "analyzer": "ik_max_word"
}

The output is as follows:

{
  "tokens" : [
    {
      "token" : "戚发轫",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "发轫",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "哪里人",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "哪里",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "里人",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

Summary

  This article has introduced some basic Elasticsearch commands and usage. It is the first article in the author's Elasticsearch study notes, which will be updated continuously.

  The code for this article is available on GitHub at: https://github.com/percent4/ES_Learning .

Origin: blog.csdn.net/jclian91/article/details/131996556