Elasticsearch 7.9.2 & Elasticsearch-Head 5.0.0 & IK tokenizer 7.9.2: detailed usage steps, recommended for bookmarking (demo under Windows)

Elasticsearch is a Lucene-based search server. It provides a distributed, multi-user capable full-text search engine behind a RESTful web interface. Elasticsearch is developed in Java and released as open source under the terms of the Apache License. It is a popular enterprise search engine.

One, download and install ES

1. Download ES

// version bundled with a JDK
https://www.elastic.co/cn/downloads/elasticsearch
// version without a JDK
https://www.elastic.co/cn/downloads/elasticsearch-no-jdk

In this example, the version without a JDK is used; we use our own local JDK to avoid conflicts between the local JDK and the bundled one. Elasticsearch does not need to be installed and is ready to use after decompression.

note:

1. Prerequisite for installing Elasticsearch: JDK 1.8 or above
2. The ES installation directory (i.e. the decompression directory) must not contain spaces, otherwise ES will fail to start later

2. ES file directory

ES file directory
This is the directory layout of the ES version that bundles a JDK; the no-JDK version has no jdk directory.

3. Solve the cross-domain problem of ES

Edit the ES configuration file (config/elasticsearch.yml) and add the following settings to allow cross-domain (CORS) access:

http.cors.enabled: true
http.cors.allow-origin: "*"

4. Start ES

Run bin\elasticsearch.bat to start ES; a command-line window will open.
Start ES
Since this is the no-JDK version, ES relies on the local JVM (the JAVA_HOME environment variable).
Common ports: 9200 for the HTTP/REST interface, 9300 for internal communication between nodes.

5. Verify that the ES is started successfully

Open the browser and enter http://127.0.0.1:9200 in the address bar. When the following response appears, ES has started successfully.
Verify
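If you prefer the command line or a REST client, a request like the following should return the node name, cluster name, and version information (the default HTTP port 9200 is assumed here):

// basic info request; the response is JSON containing "name", "cluster_name" and "version"
GET http://127.0.0.1:9200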

Two, download and install the elasticsearch-head plugin

Elasticsearch only exposes APIs on the back end, so how do you work with it intuitively? elasticsearch-head is a client tool built specifically for Elasticsearch.

Node.js and grunt are required to install head for ES 5 and above.

// elasticsearch-head project page
https://github.com/mobz/elasticsearch-head
// Node.js download page
https://nodejs.org/zh-cn/

Unzip elasticsearch-head, open cmd in the elasticsearch-head directory, and execute the following commands:

// install dependencies
npm install
// install grunt (globally, so the grunt command is on the PATH)
npm install -g grunt-cli
// run grunt
grunt server

After the installation is complete, open cmd in the installation directory.
Execute node -v to check the Node.js version.
Execute grunt --version to check the grunt version.
Start Head
Enter http://localhost:9100 in the browser; when the following interface appears, elasticsearch-head has started successfully.
Head web page

Three, the use of ES

1. Overview

Elasticsearch is document oriented, meaning it can store entire objects or documents. However, it does not merely store them; it also indexes the content of each document so that it can be searched. In Elasticsearch you index, search, sort, and filter documents (rather than rows and columns). Elasticsearch compares to a traditional relational database as follows:

Relational DB  -> Databases -> Tables -> Rows      -> Columns
Elasticsearch  -> Indices   -> Types  -> Documents -> Fields

2. Common concepts

Index

An index is a collection of documents with similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and another for order data. An index is identified by a name (which must be all lowercase), and this name is used whenever we index, search, update, or delete documents in that index. A cluster can define any number of indexes.

Type

In an index, you can define one or more types. A type is a logical category/partition of your index, and its semantics are entirely up to you. Usually, a type is defined for documents that share a set of common fields. For example, suppose you run a blogging platform and store all your data in one index. In this index you could define one type for user data, another type for blog data, and of course another type for comment data.

Field

A field is equivalent to a column of a database table; fields classify and identify the document's data by its different attributes.

Mapping

A mapping places constraints on how data is processed, such as a field's data type, default value, analyzer, and whether it is indexed. These and other rules for how data is handled in ES are all configured in the mapping. Processing data according to well-chosen rules greatly improves performance, so it is worth establishing a mapping and thinking about how to design it for the best performance.

Document

A document is the basic unit of information that can be indexed. For example, you can have a document for a customer, a document for a product, and of course a document for an order. Documents are expressed in JSON (JavaScript Object Notation), a ubiquitous Internet data-exchange format.
In an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, it must be indexed into/assigned a type within an index.

Near real-time NRT

Elasticsearch is a near-real-time search platform. This means there is only a slight delay (usually within one second) from the time a document is indexed until it becomes searchable.

Cluster

A cluster is organized by one or more nodes, which together hold all the data and jointly provide indexing and search functions. A cluster is identified by a unique name, which is "elasticsearch" by default. This name is important because a node can only join a cluster by specifying that cluster's name.

Node

A node is a single server in the cluster; as part of the cluster it stores data and participates in the cluster's indexing and search functions. Like a cluster, a node is identified by a name, which is assigned when the node starts (by default the machine's hostname; very old versions used random Marvel comic character names). This name matters for administration, because it is how you determine which servers in the network correspond to which nodes in the Elasticsearch cluster.
A node can join a specified cluster by configuring the cluster name. By default, each node is set up to join a cluster called "elasticsearch", which means that if you start several nodes in your network and they can discover each other, they will automatically form and join a cluster called "elasticsearch".
A cluster can contain as many nodes as you want. Moreover, if no Elasticsearch node is currently running on your network, starting a node will by default create and join a cluster called "elasticsearch".

Sharding and replication (shards & replicas)

An index can store an amount of data that exceeds the hardware limits of a single node. For example, an index with 1 billion documents occupying 1 TB of disk space will not fit on any single node's disk, or a single node may be too slow to serve search requests on its own. To solve this, Elasticsearch can divide an index into multiple pieces called shards. When you create an index, you can specify the number of shards you want. Each shard is itself a fully functional, independent "index" that can be placed on any node in the cluster. Sharding is important for two main reasons: 1) it allows you to split/scale your content capacity horizontally; 2) it allows you to distribute and parallelize operations across shards (potentially on multiple nodes), improving performance/throughput.
How a shard is distributed, and how its documents are aggregated back into search results, is managed entirely by Elasticsearch and is transparent to you as a user.
In a network/cloud environment, failure can happen at any time: a shard or node may go offline or disappear for any reason. In such cases a failover mechanism is very useful and highly recommended. For this purpose, Elasticsearch lets you create one or more copies of a shard. These copies are called replica shards, or simply replicas.
Replication is important for two main reasons: it provides high availability in case a shard or node fails (for this reason, a replica shard is never placed on the same node as its original/primary shard), and it scales your search volume/throughput, because searches can run in parallel on all replicas. In short, each index can be split into multiple shards, and an index can be replicated zero times (meaning no replicas) or more. Once replicated, each index distinguishes primary shards (the original shards that serve as the source of replication) from replica shards (copies of the primary shards). The number of shards and replicas can be specified when the index is created. After the index is created, you can change the number of replicas dynamically at any time, but you cannot change the number of shards afterwards.
In older versions, Elasticsearch gave each index 5 primary shards and 1 replica by default, which meant that with at least two nodes in the cluster an index had 5 primary shards plus 5 replica shards (1 full copy), 10 shards in total; since Elasticsearch 7.0 the default is 1 primary shard and 1 replica per index.

3. Common operations in ES

Create index (index)

Create index index
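For reference, a request of roughly the following shape creates an index; it can be sent from Postman or from the Head plugin's composite-query tab. The index name blog1 and the shard settings here are only illustrative assumptions, not necessarily what the screenshot shows.

// create a hypothetical index named blog1 with explicit shard settings
PUT http://127.0.0.1:9200/blog1
{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1
    }
}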

Set Mapping after index creation

Note: mapping types were removed in ES 7.x

See the official ES description: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/removal-of-types.html

Generally, you get an error: "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true", which is caused by upgrading Elasticsearch from a low version to a high one.

Elasticsearch 7.x has been adjusted so that there is no need to submit a type when setting the mapping. To keep old code compatible, an include_type_name parameter has been added. Such a request cannot be completed from the Head plugin, so we need to use a tool such as Postman, as sketched below.
Set Mapping after index creation
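As a sketch, a 7.x-style mapping request without a type looks like the following; the index name blog1 and the field names are illustrative assumptions.

// set the mapping of the hypothetical blog1 index; note there is no type name in the URL on ES 7.x
PUT http://127.0.0.1:9200/blog1/_mapping
{
    "properties": {
        "id":      { "type": "long" },
        "title":   { "type": "text" },
        "content": { "type": "text" }
    }
}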

Delete index

Delete index
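Deleting an index is a single request (blog1 again being the illustrative index name):

// delete the hypothetical blog1 index
DELETE http://127.0.0.1:9200/blog1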

Create document (document)

You can also create the index and set the Mapping in one step.
Create document document
Create the document.
Create document
All of this can be verified in Head's visual interface.
Head's visual interface
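A document-creation request might look like the following sketch; the index name, document id, and field values are illustrative assumptions only.

// index a document with id 1 into the hypothetical blog1 index
PUT http://127.0.0.1:9200/blog1/_doc/1
{
    "id": 1,
    "title": "ElasticSearch是一个基于Lucene的搜索服务器",
    "content": "它提供了一个分布式多用户能力的全文搜索引擎"
}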

Modify document

Modify document
Modified result
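With this style of API, modifying a document means re-submitting the full document to the same id, for example (still using the hypothetical blog1 index and id 1):

// overwrite document 1 in the hypothetical blog1 index with new field values
PUT http://127.0.0.1:9200/blog1/_doc/1
{
    "id": 1,
    "title": "updated title",
    "content": "updated content"
}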

Delete document

Delete document
Delete result
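The corresponding delete request (hypothetical index and id):

// delete document 1 from the hypothetical blog1 index
DELETE http://127.0.0.1:9200/blog1/_doc/1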

Query documents - query by id

Query by id
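A by-id lookup is a simple GET (hypothetical index and id):

// fetch document 1 from the hypothetical blog1 index by id
GET http://127.0.0.1:9200/blog1/_doc/1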

Query documents - querystring query

querystring query
querystring query
If the search text "Search Server" is changed to "Search Steel Cable", the document is still found. The reason is that ES splits the query string into individual terms, and any document containing any one of those terms is matched.
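A query_string search of this kind can be written roughly as follows; the index name, field name, and query text (搜索服务器, "search server") are assumptions based on the description above.

// query_string search against the hypothetical blog1 index
POST http://127.0.0.1:9200/blog1/_search
{
    "query": {
        "query_string": {
            "default_field": "title",
            "query": "搜索服务器"
        }
    }
}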

Query documents - term query

term query
term query
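A term query has the following shape (again with assumed index, field, and search term); it matches only documents whose inverted index contains that exact term, without analyzing the query text.

// term query against the hypothetical blog1 index
POST http://127.0.0.1:9200/blog1/_search
{
    "query": {
        "term": {
            "title": "搜索"
        }
    }
}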

Four, IK tokenizer

1. Download and install

// project page; the downloaded version should preferably match the ES version
https://github.com/medcl/elasticsearch-analysis-ik

Unzip it and copy the unzipped folder into the ES plugins directory (ES directory\plugins).
Download and install IK tokenizer
Run ES; you can see that the ik tokenizer has been loaded.
Run ES

2. IK tokenizer test

Minimal segmentation ik_smart

Minimal segmentation ik_smart
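Judging from the offsets in the result below, the analyzed text is 这是一个对分词器的测试 ("this is a test of the tokenizer"); the request can be reproduced roughly like this:

// analyze the sample sentence with the ik_smart analyzer
POST http://127.0.0.1:9200/_analyze
{
    "analyzer": "ik_smart",
    "text": "这是一个对分词器的测试"
}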
Returned result

{
    "tokens": [
        {
            "token": "这是",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "一个",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "对",
            "start_offset": 4,
            "end_offset": 5,
            "type": "CN_CHAR",
            "position": 2
        },
        {
            "token": "分词器",
            "start_offset": 5,
            "end_offset": 8,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "的",
            "start_offset": 8,
            "end_offset": 9,
            "type": "CN_CHAR",
            "position": 4
        },
        {
            "token": "测试",
            "start_offset": 9,
            "end_offset": 11,
            "type": "CN_WORD",
            "position": 5
        }
    ]
}

Finest segmentation ik_max_word

Finest segmentation ik_max_word
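The same sample sentence analyzed with ik_max_word, which produces the finest-grained splitting:

// analyze the same sample sentence with the ik_max_word analyzer
POST http://127.0.0.1:9200/_analyze
{
    "analyzer": "ik_max_word",
    "text": "这是一个对分词器的测试"
}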
Return result

{
    "tokens": [
        {
            "token": "这是",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "一个",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "一",
            "start_offset": 2,
            "end_offset": 3,
            "type": "TYPE_CNUM",
            "position": 2
        },
        {
            "token": "个",
            "start_offset": 3,
            "end_offset": 4,
            "type": "COUNT",
            "position": 3
        },
        {
            "token": "对分",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "分词器",
            "start_offset": 5,
            "end_offset": 8,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "分词",
            "start_offset": 5,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 6
        },
        {
            "token": "器",
            "start_offset": 7,
            "end_offset": 8,
            "type": "CN_CHAR",
            "position": 7
        },
        {
            "token": "的",
            "start_offset": 8,
            "end_offset": 9,
            "type": "CN_CHAR",
            "position": 8
        },
        {
            "token": "测试",
            "start_offset": 9,
            "end_offset": 11,
            "type": "CN_WORD",
            "position": 9
        }
    ]
}

3. Using ik as the analyzer

Delete the old index.
Delete index
Create a new index and Mapping, selecting ik_max_word as the analyzer. The new Mapping can then be viewed in Head.
New index and mapping
See Mapping in Head
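A sketch of such an index-plus-mapping request, with ik_max_word as the analyzer; the index and field names are illustrative assumptions.

// create a new hypothetical index whose text fields are analyzed with ik_max_word
PUT http://127.0.0.1:9200/blog2
{
    "mappings": {
        "properties": {
            "id":      { "type": "long" },
            "title":   { "type": "text", "analyzer": "ik_max_word" },
            "content": { "type": "text", "analyzer": "ik_max_word" }
        }
    }
}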
When the keyword "search steel cable" is queried again using the ik tokenizer (ik_max_word):
Query keywords
When the keyword "search" is queried using the ik tokenizer (ik_max_word):
Query keywords
If this article was helpful to you, please like, share, and bookmark it; your support is my biggest motivation!

Origin blog.csdn.net/y1534414425/article/details/108933207