Basic concepts and common commands of Elasticsearch

Basic concepts of ES

Inverted index

Inverted index: ES search is built on an inverted index, which maps each value (term) to the keys (_id) of the documents that contain it. Conceptually ES has two areas: the index area and the metadata area. The metadata area stores the complete document data. The index area records, for each term, the documents it appears in, how many times it occurs, and the length of the field; these are used to compute the relevance score that orders the results. ES builds an inverted index for every indexed field, so a query on a field term yields the ids of the matching documents, and the documents themselves can then be fetched quickly.
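
To make the term-to-id lookup concrete, here is a minimal Python sketch of an inverted index. It is only an illustration, not how ES stores data internally: the sample documents, build_inverted_index, and search are hypothetical, and whitespace splitting stands in for a real analyzer.

```python
from collections import defaultdict

# Toy documents keyed by _id (hypothetical sample data).
docs = {
    1: "small apple tastes good",
    2: "macbook works well",
    3: "apple phone",
}

def build_inverted_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Whitespace splitting stands in for a real analyzer here.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

inverted = build_inverted_index(docs)

def search(term):
    """Find the ids for a term first, then fetch the stored documents."""
    return [docs[doc_id] for doc_id in inverted.get(term, [])]
```

Calling search("apple") looks up the term in the index to get ids 1 and 3, then reads those two documents from the stored data, which mirrors the two areas described above.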

Analyzer

Analysis: Text analysis is the process of converting full text into a series of terms, also called word segmentation. It is performed by an analyzer, which splits the document into terms; each term then points to the documents that contain it.

An analyzer is made up of character filters, a tokenizer, and token filters.

Order: character filter -> tokenizer -> token filter
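
The three stages can be sketched in plain Python. This is a simplified illustration under stated assumptions, not one of the built-in ES analyzers: the tag-stripping character filter, the regex tokenizer, and the STOP_WORDS list are all hypothetical choices.

```python
import re

STOP_WORDS = {"the", "a", "is"}  # hypothetical stop list

def char_filter(text):
    """Character filter: replace HTML-like tags with a space before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenizer(text):
    """Tokenizer: split the filtered text into word tokens."""
    return re.findall(r"\w+", text)

def token_filter(tokens):
    """Token filter: lowercase each token and drop stop words."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOP_WORDS]

def analyze(text):
    """Run the stages in order: character filter -> tokenizer -> token filter."""
    return token_filter(tokenizer(char_filter(text)))
```

For example, analyze("<p>The Apple is GOOD</p>") strips the tags, splits the text into four tokens, then lowercases them and removes the stop words, leaving "apple" and "good".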

IK tokenizer

ik_max_word: segments the text at the finest granularity

ik_smart: segments the text at the coarsest granularity

POST /_analyze
{
  "analyzer": "ik_max_word", 
  "text": "中华人民共和国群众"
}
Extension words and stop words

Extension words: some words are not recognized as keywords by default, but you want ES to treat them as keywords for retrieval. Add these words to the extension dictionary.

Stop words: some words are keywords, but for business reasons you do not want them to be searchable. Add these words to the stop-word dictionary.

To define extension and stop-word dictionaries, edit the file IKAnalyzer.cfg.xml in the config directory of the IK tokenizer.

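1. Edit the file: vim IKAnalyzer.cfg.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
    <properties>
      <comment>IK Analyzer extension configuration</comment>
      <!-- Configure your own extension dictionary here -->
      <entry key="ext_dict">ext_dict.dic</entry>
      <!-- Configure your own extension stop-word dictionary here -->
      <entry key="ext_stopwords">ext_stopwords.dic</entry>
      <!-- Configure a remote extension dictionary here -->
      <!-- <entry key="remote_ext_dict">words_location</entry> -->
      <!-- Configure a remote extension stop-word dictionary here -->
      <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
    </properties>
2. vim ext_dict.dic: create the file ext_dict.dic in the config directory of the IK tokenizer. The file must be UTF-8 encoded to take effect. Then add the extension words to it.
3. vim ext_stopwords.dic: create the file ext_stopwords.dic in the config directory of the IK tokenizer. The file must be UTF-8 encoded to take effect. Then add the stop words to it.
4. Restart ES for the changes to take effect.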

Basic operations of ES

view index
GET _cat/indices?v
create index
PUT /orders
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
delete index
DELETE /orders
create an index together with its mappings

Note: The mapping of an existing field cannot be modified or deleted. To change it, delete the index and recreate it with the new mapping.

PUT /products 
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "id":{
        "type": "long"
      },
      "name":{
        "type": "keyword"
      },
      "price":{
        "type": "double"
      },
      "create_time":{
        "type": "date"
      },
      "description":{
        "type": "text"
      }
    }
  }
}
View the mapping of an index
GET /products/_mapping
create document

Note: When you specify the _id yourself, indexing a document whose id already exists overwrites the old document with the new one, and the document's version number increases.

# Specify _id
POST /products/_doc/1 
{
  "id": 1,
  "name": "小苹果",
  "price": 5.66,
  "create_time": "2022-06-13",
  "description": "小苹果正好吃"
} 
delete document
  • Delete based on _id

    DELETE /products/_doc/1
    
update document
  • Use PUT: the old document is deleted and the new one is indexed, so any field missing from the request body is lost
PUT /products/_doc/1
{
  "name": "小苹果2"
}
  • Use POST with _update to change only the specified fields; the other fields are kept
POST /products/_update/1
{
  "doc":{
    "id": 1,
    "name": "小苹果",
    "price": 5.66,
    "create_time": "2022-06-13",
    "description": "小苹果正好吃"
  }
}
view document
  • Query based on _id

    GET /products/_doc/1
    
Bulk document operations

The body of a bulk request must not be pretty-printed: each action and each document has to sit on its own single line. Bulk operations are not atomic, so an error in one operation does not affect the others.

POST /products/_bulk
{"index":{"_id":2}}
{"id":2,"name":"macbook","price":9999,"create_time":"2022-06-13","description":"macbook正好用"}
{"index":{"_id":3}}
{"id":3,"name":"phone","price":5999,"create_time":"2022-06-13","description":"phone"}
POST /products/_bulk
{"index":{"_id":4}}
{"id":4,"name":"猪猪","price":9999,"create_time":"2022-06-13","description":"猪猪真好吃呀"}
{"update":{"_id":3}}
{"doc":{"name":"phone13"}}
{"delete":{"_id":2}}
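
Because every action and document must occupy exactly one line, it can help to generate the bulk body rather than write it by hand. The following Python sketch is one way to do that, using only the standard json module; build_bulk_body and the sample data are hypothetical, and actually sending the body to ES is left out.

```python
import json

def build_bulk_body(actions):
    """Serialize (action, metadata, source) tuples as newline-delimited JSON."""
    lines = []
    for action, meta, source in actions:
        # Action line, for example {"index": {"_id": 4}}
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        # A delete action has no source line after it.
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # json.dumps never emits raw newlines, so each entry stays on one line.
    # The final line of a bulk body must also end with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ("index", {"_id": 4}, {"id": 4, "name": "pig", "price": 9999}),
    ("update", {"_id": 3}, {"doc": {"name": "phone13"}}),
    ("delete", {"_id": 2}, None),
])
```

The resulting body has five lines: two for the index action, two for the update, and one for the delete.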

ES advanced queries (Query DSL)

query all documents in an index (match_all)
GET /products/_search
{
  "query":{
    "match_all": {}
  }
}
term: query a field by exact term

Note:

1. In ES only the text type is analyzed (word-segmented); other types are not

2. ES uses the standard analyzer by default, which splits Chinese text character by character and English text word by word

GET /products/_search
{
  "query": {
    "term": {
      "description": {
        "value": "good"
      }
    }
  }
}
range: range query

range: matches documents whose field value falls within the specified range

GET /products/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 0,
        "lte": 1000
      }
    }
  }
}
prefix: prefix query
GET /products/_search
{
  "query": {
    "prefix": {
      "name": {
        "value": "小"
      }
    }
  }
}
wildcard: wildcard query

wildcard: ? matches exactly one arbitrary character, and * matches any number of arbitrary characters (including none)

GET /products/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "小*"
      }
    }
  }
}
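
The meaning of the two wildcard characters can be illustrated with a short Python sketch that translates a pattern into a regular expression. This is only a model of the matching rule: wildcard_to_regex and wildcard_match are hypothetical helpers, not part of any ES client.

```python
import re

def wildcard_to_regex(pattern):
    """Translate ? and * into their regular-expression equivalents."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")       # exactly one arbitrary character
        elif ch == "*":
            parts.append(".*")      # zero or more arbitrary characters
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)

def wildcard_match(pattern, term):
    """Return True when the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None
```

With this model, the pattern "小*" matches both "小" and "小苹果", while "小?" matches only two-character terms that start with "小".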
ids: query by a set of _id values
GET /products/_search
{
  "query": {
    "ids": {
      "values": [1,3]
    }
  }
}
fuzzy: fuzzy query

Note: the maximum edit distance (fuzziness) of a fuzzy query must be between 0 and 2. By default it depends on the length of the search keyword:

  • Length 0-2: no edits are allowed; the keyword must match exactly
  • Length 3-5: one edit is allowed
  • Length greater than 5: at most two edits are allowed
GET /products/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "小果果"
      }
    }
  }
}
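
The length rule can be written down as a small Python sketch. Note the simplifying assumption: edit_distance below is plain Levenshtein distance (insert, delete, substitute) and does not treat a swap of two adjacent characters as a single edit; auto_fuzziness and fuzzy_match are hypothetical names.

```python
def auto_fuzziness(term):
    """Maximum number of edits allowed for a keyword of this length."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2

def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def fuzzy_match(query, term):
    """True when term is within the allowed number of edits of query."""
    return edit_distance(query, term) <= auto_fuzziness(query)
```

Under this model the three-character keyword "小果果" matches "小苹果", because the two differ by one substitution and a keyword of length 3 allows one edit.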
bool: Boolean query

must: equivalent to &&; all of the conditions must match

should: equivalent to ||; at least one of the conditions must match

must_not: equivalent to !; none of the conditions may match

GET /products/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "ids": {
            "values": [1,3]
          }
        },
        {
          "term": {
            "name": {
              "value": "小狗"
            }
          }
        }
      ]
    }
  }
}
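
The way the three clause types combine can be modelled in Python with plain predicates. This is a sketch of the matching logic only (no scoring and no filter clause), and bool_match and the sample documents are hypothetical. It assumes that should clauses are required only when there are no must clauses.

```python
def bool_match(doc, must=(), should=(), must_not=()):
    """Evaluate bool-style clauses, each clause being a predicate over a document."""
    if not all(clause(doc) for clause in must):
        return False  # every must clause has to match
    if any(clause(doc) for clause in must_not):
        return False  # no must_not clause may match
    if should and not must:
        # Assumption: without must clauses, at least one should clause has to match.
        return any(clause(doc) for clause in should)
    return True

# Hypothetical sample documents.
docs = [
    {"_id": 1, "name": "小苹果"},
    {"_id": 2, "name": "macbook"},
    {"_id": 3, "name": "phone"},
    {"_id": 4, "name": "小狗"},
]

def id_is_1_or_3(doc):
    return doc["_id"] in (1, 3)

def name_is_dog(doc):
    return doc["name"] == "小狗"

# Same shape as the must_not request above: exclude ids 1 and 3 and the name 小狗.
hits = [d["_id"] for d in docs if bool_match(d, must_not=[id_is_1_or_3, name_is_dog])]
```

Only the document with _id 2 survives, since the others match at least one of the must_not conditions.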
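multi_match: multi-field query

Note: If a field's type is analyzed, the query text is analyzed first and the resulting terms are matched against that field; if the field's type is not analyzed, the query text is matched against it as a whole.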

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "好吃",
      "fields": ["name","description"]
    }
  }
}
query_string: analyzed query on a default field
GET /products/_search
{
  "query": {
    "query_string": {
      "default_field": "description",
      "query": "好吃"
    }
  }
}
highlight: highlighted results
GET /products/_search
{
  "query": {
    "query_string": {
      "default_field": "description",
      "query": "好吃"
    }
  },
  "highlight": {
    "pre_tags": ["<span style='color:red;'>"],
    "post_tags": ["</span>"],
    "require_field_match": "false",
    "fields": {
      "*": {}
    }
  }
}
Page size, pagination, sorting, and returning specified fields

size: the number of hits to return; 10 by default

from: the offset of the first hit to return, computed as (current page number - 1) * size

sort: sort by the specified field

_source: an array listing the fields to return

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "from": 1, 
  "size": 2,
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ],
  "_source": ["name","price"]
}
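
The from formula is easy to get wrong by one page, so here is a small Python sketch that derives it from a 1-based page number. page_to_from and search_body are hypothetical helpers; the dictionary only mirrors the shape of the request body.

```python
def page_to_from(page, size=10):
    """Convert a 1-based page number into the from offset."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * size

def search_body(page, size=10):
    """Build a paginated match_all request body as a plain dict."""
    return {
        "query": {"match_all": {}},
        "from": page_to_from(page, size),
        "size": size,
    }
```

For instance, page 3 with a size of 20 gives a from value of 40, meaning the first 40 hits are skipped.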

ES filter queries (Filter Query)

Search operations in ES fall into two types: query and filter.

query: by default computes a score for each matching document and sorts the results by that score

filter: only selects the matching documents; no score is computed, and the results can be cached

Therefore, from a performance point of view, filtering is faster than querying. Filtering is suited to narrowing down large amounts of data, while querying is suited to precise matching. In typical use, filter the data first and then run the query against what remains.

Note: when a request contains both a query and a filter, the filter is applied first and then the query

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ],
      "filter": [
        {
          "exists": {
            "field": "name"
          }
        }
      ]
    }
  }
}

ES aggregation queries

Grouping (terms)
GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "price_group":{
      "terms": {
        "field": "price",
        "size": 10
      }
    }
  }
}
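
What a terms aggregation computes can be sketched in Python with a Counter: one bucket per distinct field value, each holding a document count. terms_agg and the sample data are hypothetical, and the ordering used here (largest count first, ties broken by key) is an assumption of this sketch.

```python
from collections import Counter

def terms_agg(documents, field, size=10):
    """Group documents by a field value and count the documents per bucket."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    # Largest buckets first; ties are broken by key in this sketch.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]

# Hypothetical sample documents.
products = [
    {"name": "macbook", "price": 9999},
    {"name": "pig", "price": 9999},
    {"name": "phone", "price": 5999},
]

buckets = terms_agg(products, "price")
```

Here the two products priced 9999 fall into one bucket with a count of 2, followed by a bucket for 5999 with a count of 1.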
maximum value (max)
GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "price_max": {
      "max": {
        "field": "price"
      }
    }
  }
}

ES cluster setup

Note:

  • The cluster name must be the same on all nodes: cluster.name
  • Each node must have a unique name: node.name
  • Enable remote connections on each node: network.host: 0.0.0.0
  • Specify the IP address used for communication between cluster nodes: network.publish_host
  • Set the HTTP port and the TCP transport port: http.port, transport.tcp.port
  • Specify the list of all nodes in the cluster that a node should contact
  • Specify the number of master nodes with which the cluster may initialize
  • Specify the minimum number of nodes that must be available in the cluster
  • Enable cross-origin access on each node

Origin blog.csdn.net/qq798867485/article/details/129999563