ElasticSearch (installation and use of full-text search service)

Table of contents

1. Introduction to ElasticSearch

1.1 Why use ElasticSearch?

1.2 Introduction to ElasticSearch

1.3 Principle and application

1.3.1. Index structure

1.3.2. Inverted index

1.4 What is ELK?

1.5 Features and advantages of ES

2. Install ElasticSearch

2.1 Environmental requirements

2.2 Install ES

2.2.1 Download

2.2.2. Set virtual machine memory

2.2.3 Create user 

2.2.4 Installation

2.2.5 ES directory structure

2.3 Configuration file

 2.3.1 elasticsearch.yml

 2.3.2 jvm.options

2.3.3 log4j2.properties

2.4 Start ES

2.4.1. Startup and shutdown

2.4.2. Solving kernel problems

2.4.3. Solve the file creation permission problem

2.4.4. Solve the thread opening limit problem

2.4.5. Solving virtual memory problems

2.5. Testing

3. Install Kibana

3.1 What is Kibana

3.2 Download

3.3 Installation

3.4 Modify configuration

3.5 Start

 3.6 Testing

4. Install the head plugin

4.1. What is head

4.2. Installation

4.3. Testing

5. ES quick start

5.1. Index management

5.1.1. Create index

5.1.2. Modify index

5.1.3. Delete index

5.2. Mapping management

5.2.1 create mapping

5.2.2. Query mapping

5.2.3. Update mapping

5.2.4. Delete mapping

5.3. document management

5.3.1. Create document

5.3.2. Query document

5.3.3. Delete Document

5.4. ES read and write process

5.4.1. Document routing (data routing)

5.4.4. Why is the number of primary shards immutable?

5.5.luke view the logical structure of ES

6. IK tokenizer

6.1. Test tokenizer

6.2. Chinese tokenizer

6.2.1. Lucene has its own Chinese word breaker

6.2.2. Third-party Chinese analyzer

6.3. Install IK tokenizer

6.4. Two word segmentation modes

6.5. Custom thesaurus

7. Field (domain) detailed introduction

7.1. Field attribute introduction

7.1.1.type data type:

7.1.2.analyzer specifies word segmentation mode:

7.1.3.index:

7.1.4.source:

7.2. Commonly used field types

7.3. The setting standard of field attribute

8. Spring Boot integrates ElasticSearch

8.1. ES client

8.2. Building the project

8.2.1.pom.xml

8.2.2.application.yml

8.2.3.app

8.3. Index Management

8.3.1. Create index library

8.3.2. Delete the index library

8.3.3. Adding documents

8.3.4. Batch add documents

8.3.5. Modifying documents

8.3.6. Delete document

8.4. Document Search

8.4.1. Prepare the environment

8.4.2. Simple search

8.4.3.DSL search

8.4.3.1.match_all query

8.4.3.2. Pagination query

8.4.3.3. match query

8.4.3.4. multi_match query

8.4.3.5. bool query

8.4.3.6. filter query

8.4.3.7. highlight query

9. Cluster management

9.2. Create node 2

9.3. View cluster health status

9.4. Testing


1. Introduction to ElasticSearch

1.1 Why use ElasticSearch?

        When we visit a shopping website, we can type in keywords and find relevant content. How is this done? Such arbitrary keywords cannot be matched directly against database fields, so how are they queried? Why can all kinds of strange keywords still return results?

        The answer is a full-text search service. ElasticSearch is a Lucene-based full-text search server, and Lucene uses a term-matching scheme. For example, "Beijing Tiananmen" is segmented by Lucene into terms such as "Beijing" and "Tiananmen"; when we search for any of these terms, the document "Beijing Tiananmen" can be retrieved.

1.2 Introduction to ElasticSearch

        ElasticSearch is a Lucene-based search server. It provides a distributed full-text search engine behind a RESTful web interface. Developed in Java and released as open source under the Apache license, ElasticSearch is a popular enterprise-grade search engine. It is used in cloud computing for real-time search and is stable, reliable, fast, and easy to install and use. According to the DB-Engines ranking, ElasticSearch is the most popular enterprise search engine, followed by Apache Solr (also based on Lucene).

Summarize:

1. Elasticsearch is a distributed full-text search server based on Lucene.

2. Elasticsearch hides the complexity of Lucene and provides a Restful interface to operate indexing and searching.

Which one to choose between es and solr?

1. If the Solr your company currently uses meets your needs, don't change it.

2. If your company is about to build a new full-text search project, elasticsearch is recommended, because it is used for large-scale search at sites like GitHub.

1.3 Principle and application

1.3.1. Index structure

The figure below shows the index structure of ElasticSearch. The black and blue part on the right is the original documents, and the yellow part on the left is the logical structure. The logical structure exists to better describe how ElasticSearch works using the index files in the physical structure.

1.3.2. Inverted index

Inverted index: also called a reverse index, an inverted index is a mapping from keywords to documents (given a keyword, find the documents that contain it).

The logical structure part is an inverted index table, which consists of three parts:

1. The documents to be searched are ultimately stored in the form of Documents.

2. The content of the documents to be searched is segmented into words, forming a term list of all unique terms.

3. Each term is associated with the documents that contain it.

as follows:

 Now, if we want to search for documents containing the term quick brown:

Both documents match, but the first document matches more closely than the second. If we use a simple similarity algorithm that only counts the number of matching terms, then we can say that the first document is better than the second for relevance to our query.  
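The term-to-document mapping and the naive "count matching terms" relevance idea described above can be sketched in a few lines of Python (a toy illustration with made-up sample documents, not how Lucene actually stores or scores anything):

```python
from collections import defaultdict

# Two toy documents (hypothetical sample data)
docs = {
    1: "the quick brown fox jumped over the lazy dog",
    2: "quick foxes leap over lazy dogs in summer",
}

# Inverted index: term -> set of ids of documents containing that term
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in set(text.split()):   # unique terms per document
        index[term].add(doc_id)

def search(query):
    """Rank documents by how many query terms they contain."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=scores.get, reverse=True)

print(search("quick brown"))  # doc 1 contains both terms, doc 2 only "quick"
```

Here the first document outranks the second, just as in the relevance discussion above.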

1.4 What is ELK?

ELK = Elasticsearch + Logstash + Kibana

  • elasticsearch: back-end distributed storage and full-text retrieval
  • logstash: log collection and processing (the "porter")
  • kibana: data visualization

The ELK architecture creates a powerful management chain for distributed data storage, visual query, and log parsing. The three cooperate with each other, learn from each other's strengths, and jointly complete the distributed big data processing work.

1.5 Features and advantages of ES

1) Distributed real-time file storage: every field in the index can be stored so that it can be retrieved.
2) A distributed search engine for real-time analysis.
Distributed: the index is split into multiple shards, and each shard can have zero or more replicas; each data node in the cluster hosts one or more shards and coordinates and handles operations; load rebalancing and routing are done automatically in most cases.
3) It scales out to hundreds of servers and handles PB-level structured or unstructured data. It can also run on a single PC (tested).
4) It supports a plug-in mechanism: word segmentation plug-ins, synchronization plug-ins, Hadoop plug-ins, visualization plug-ins, etc.

2. Install ElasticSearch

2.1 Environmental requirements

1. JDK 1.8.0_131 or above is required.

2. ElasticSearch needs at least 4096 threads, permission to open 65536 files, and at least 262144 virtual memory areas (vm.max_map_count) to start normally, so the virtual machine should be allocated at least 1.5 GB of memory.

3. Starting with version 5.0, ElasticSearch's security level was raised and it may no longer be started as the root account.

4. Elasticsearch's system call filter requires a Linux kernel of at least 3.5.

2.2 Install ES

2.2.1 Download

Download from the ElasticSearch official website: https://www.elastic.co

2.2.2. Set virtual machine memory

2.2.3 Create user 

Starting with version 5.0, ElasticSearch's security level was raised and it may not be started as the root account, so we need to add a user.

1. Create elk user group

groupadd elk

2. Create user admin

 useradd admin

 passwd admin

3. Add the admin user to the elk group

usermod -G elk admin

4. Assign permissions to the user

# chown changes the owner of the specified files to the given user or group; -R recurses into all files in the directory and its subdirectories

chown -R admin:elk /usr/upload

chown -R admin:elk /usr/local

Switch user:

su admin

2.2.4 Installation

ES is a Java application; after decompressing it can be used directly:

tar -zxvf elasticsearch-6.2.3.tar.gz -C /usr/local

2.2.5 ES directory structure

bin directory: executable file package
config directory: configuration-related directory
lib directory: jar packages that ES needs to depend on, ES self-developed jar packages
logs directory: log file-related directories
modules directory: storage directory for functional modules, such as aggs, reindex, geoip, xpack, eval
plugins directory: plug-in directory package, three-party plug-in or self-developed plug-in
data directory: a directory that is automatically created after ES starts, and stores data that needs to be saved during ES operation.

2.3 Configuration file

The configuration file in the ES installation directory config is as follows:

        elasticsearch.yml: used to configure Elasticsearch running parameters

        jvm.options: Used to configure Elasticsearch JVM settings

        log4j2.properties: used to configure Elasticsearch logs

 2.3.1 elasticsearch.yml

The configuration of this project is as follows:

cluster.name: power_shop
node.name: power_shop_node_1
network.host: 0.0.0.0
http.port: 9200
transport.tcp.port: 9300
discovery.zen.ping.unicast.hosts: ["192.168.115.135:9300", "192.168.115.136:9300"]
path.data: /usr/local/elasticsearch-6.2.3/data
path.logs: /usr/local/elasticsearch-6.2.3/logs
http.cors.enabled: true
http.cors.allow-origin: /.*/

Note that the path.data and path.logs paths are configured correctly.

Commonly used configuration items are as follows:

 cluster.name:
       Configure the cluster name of elasticsearch, the default is elasticsearch. It is recommended to change it to a meaningful name.   
node.name:
      Node name. Usually one physical server is one node; ES assigns a random name by default, and it is recommended to specify a meaningful name to make nodes easier to manage. One or more nodes form a cluster. A cluster is a logical concept, while a node is a physical concept; this is covered in detail in later chapters.
path.data:
       Set the storage path of the index data, the default is the data folder under es_home, you can set multiple storage paths, separated by commas.      
path.logs:
       Set the storage path of log files, the default is the logs folder under es_home         
network.host:
       Set the IP address the host binds to. Setting it to 0.0.0.0 binds all IPs and allows access from external networks; in production it is recommended to bind a specific IP.
http.port: 9200
       Set the http port for external services, the default is 9200.      
transport.tcp.port: 9300 
       communication ports between cluster nodes      
discovery.zen.ping.unicast.hosts: ["host1:port", "host2:port", "…"]
       Set the initial list of master nodes in the cluster.
discovery.zen.ping.timeout: 3s  
       Set the timeout time for ES to automatically discover node connections. The default is 3 seconds. If the network delay is high, it can be set larger.
http.cors.enabled:
       Whether to support cross-domain, the default is false
http.cors.allow-origin:
       When setting allows cross-origin, the default is *, indicating that all domain names are supported

 2.3.2 jvm.options

Set the minimum and maximum JVM heap memory size:

Set -Xms and -Xmx in jvm.options:

1) Set both values equal.

2) Set Xmx to no more than half of the physical memory.

The defaults use too much memory; adjust them smaller:

-Xms512m
-Xmx512m

2.3.3 log4j2.properties

Log file settings, ES uses log4j, pay attention to the configuration of the log level.

2.4 Start ES

2.4.1. Startup and shutdown

1. Start

./elasticsearch
# or, to run in the background:
./elasticsearch -d

2. close

ps -ef | grep elasticsearch

kill -9 pid

2.4.2. Solving kernel problems

We are using CentOS 6, whose Linux kernel is 2.6, but Elasticsearch's system call filter requires at least kernel 3.5. That is fine: we can disable this feature.

Modify the elasticsearch.yml file and add the following configuration at the bottom:

bootstrap.system_call_filter: false 

2.4.3. Solve the file creation permission problem

 [1]: max file descriptors [4096] for elasticsearch process likely too low, increase to at least [65536]

By default, Linux limits the maximum number of files an application can open to 4096, but ES needs permission to open at least 65536 files. We are running as the admin user, not root, so the default limit applies.

Use the root user to modify the configuration file:

vim /etc/security/limits.conf

 Append the following:

* soft nofile 65536
* hard nofile 65536

2.4.4. Solve the thread opening limit problem

[2]: max number of threads [3795] for user [es] is too low, increase to at least [4096]

        By default, Linux does not limit the number of threads for processes started by root, but processes started by other users may open at most 1024 threads. The limit must be raised to at least 4096, because ES needs a thread pool of at least 4096 threads.

        If the virtual machine has only 1 GB of memory, at most 3000+ threads can be opened, so allocate at least 1.5 GB of memory to the virtual machine.

Use the root user to modify the configuration:

 vim /etc/security/limits.conf

addition:

* hard nproc  4096 

2.4.5. Solving virtual memory problems

[3]: max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144] 

ES needs to be able to create at least 262144 virtual memory areas (memory maps), but by default Linux limits this to 65530.

vim /etc/sysctl.conf 

Append the following:

vm.max_map_count=655360 #Limit the number of VMAs (virtual memory areas) that a process can have 

Then execute the command to make the sysctl.conf configuration take effect:

sysctl -p 

2.5. Testing

        Starting any single ES instance starts an ES cluster; the default cluster name is elasticsearch. If multiple instances are started (on multiple nodes or on a single node), by default they automatically discover each other and join the same cluster.

Browser access: http://192.168.204.132:9200

The returned results are as follows:

{
  "name" : "power_shop_node_1",                      # node name
  "cluster_name" : "power_shop",                     # cluster name (default is elasticsearch)
  "cluster_uuid" : "RqHaIiYjSoOyrTGq3ggCOA",         # unique cluster ID
  "version" : {
    "number" : "6.2.3",                              # ES version number
    "build_hash" : "c59ff00",
    "build_date" : "2018-03-13T10:06:29.741383Z",    # release date
    "build_snapshot" : false,                        # whether this is a snapshot build
    "lucene_version" : "7.2.1",                      # Lucene version number
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },













3. Install Kibana

3.1 What is Kibana

        Kibana is a Node.js-based management console provided for ES that makes advanced data analysis and visualization easy, displaying results in the form of charts.

        Kibana can be used to edit request statements, which is convenient for learning the ES operation syntax. When writing programs, query statements are often composed in Kibana first and then pasted into the program (less error-prone).

3.2 download

Download from the ElasticSearch official website: https://www.elastic.co

3.3 Installation

Installing Kibana on Windows is very convenient: just unzip it.

3.4 Modify configuration

Modify the config/kibana.yml configuration:

server.port: 5601
server.host: "0.0.0.0" #Allow connections from remote users
elasticsearch.url: http://192.168.116.135:9200 #URL of the Elasticsearch instance 

3.5 start

./bin/kibana

 3.6 Testing

Browser access: http://127.0.0.1:5601

 

Fourth, install the head

4.1. What is head

The head plugin is a visual management plugin for ES, used to monitor ES status and interact with the ES service through a browser client, for example to create mappings and indexes. Since ES 6.0, the head plugin runs as a standalone Node.js service.

4.2. Installation

1. Download head

Download address: https://github.com/mobz/elasticsearch-head

2. Install dependencies and run

npm install
npm run start

4.3. Testing

Browser access: http://127.0.0.1:9100

Five, ES quick start

         As an index and search service, ES provides a rich REST interface. The examples in this Quick Start section are tested with Kibana; the goal is a preliminary understanding of how ES is used.

5.1. Index management

5.1.1. Create index

An index (index library) contains a number of Documents with similar structure; it is equivalent to a database in a relational database.

Syntax: PUT /index_name

For example:

 PUT /java2202
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}

number_of_shards - the index is split into this many shards, stored on different nodes, which improves ES's processing capacity

number_of_replicas - the number of replica shards allocated for each primary shard, which improves ES's availability; if there is only one machine, set it to 0
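The shard arithmetic implied by these two settings can be sanity-checked with a one-liner (a trivial sketch, not an ES API):

```python
def total_shards(number_of_shards, number_of_replicas):
    # Each primary shard gets `number_of_replicas` replica copies.
    return number_of_shards * (1 + number_of_replicas)

# The java2202 example above: 2 primaries with 1 replica each -> 4 shards in total
print(total_shards(2, 1))  # 4
```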

Effect:

5.1.2. Modify index

Note: Once the index is created, the number of primary shards cannot be changed, but the number of replica shards can be changed.

Syntax: PUT /index_name/_settings

For example:

PUT /java06/_settings
{
  "number_of_replicas": 1
}

ES has requirements for shard distribution, with its own built-in allocation algorithm:

A replica shard is guaranteed never to be allocated on the same node as its primary shard; if there is only one node, the status of the index after this example must be yellow.

5.1.3. Delete index

DELETE /java06[, other_index]

5.2. Mapping management

Mapping: creating a mapping is the process of defining fields (type, whether indexed, whether stored, etc.) in an index. The following compares document and field with relational database concepts:

elasticsearch           relational database
index (index library)   database
type                    table
document                row (record)
field (domain)          column (field)

Note: Versions before 6.0 have the concept of type (type), which is equivalent to a table in a relational database. After ES6.x version, the concept of type is weakened. ES officials will completely delete type in ES7.0 version.

5.2.1 create mapping

Syntax: POST /index_name/type_name/_mapping

For example:

POST /java06/course/_mapping
{
  "properties": {
     "name": {
        "type": "text"
     },
     "description": {
        "type": "text"
     },
     "studymodel": {
        "type": "keyword"
     }
  }
}

Effect:

 

5.2.2. Query mapping

Query the mapping of an index:

GET /java06/course/_mapping

5.2.3. Update mapping

New fields can be added after the mapping is successfully created, but existing fields are not allowed to be updated.

5.2.4. Delete mapping

Drop the mapping by dropping the index.

5.3. document management

5.3.1. Create document

Documents in ES are equivalent to records in MySQL database tables.

5.3.1.1. POST syntax

With the POST method, ES automatically generates an id for the new Document if none is specified.

Syntax: POST /index_name/type_name[/id]

For example:

POST /java06/course/1
{
  "name": "python from entry to abandonment",
  "description": "Life is short, I use Python",
  "studymodel":"201002"
}
POST /java06/course
{
  "name": ".net from entry to abandonment",
  "description": ".net programmers are not convinced",
  "studymodel":"201003"
}

5.3.1.2. PUT Syntax

This operation is a way to add a Document with a manually specified id.

Syntax: PUT /index_name/type_name/id {field_name: field_value}

For example:

PUT /java06/course/2
{
  "name": "php from entry to abandonment",
  "description": "php is the best language in the world",
  "studymodel":"201001"
}

result:

{
  "_index": "test_index",   # which index the new document is in
  "_type": "my_type",       # which type in the index the new document belongs to
  "_id": "1",               # the specified id
  "_version": 1,            # document version; it starts at 1 and every write operation increments it
  "result": "created",      # result of this operation: created / updated / deleted
  "_shards": {              # shard information
    "total": 2,             # number of shards (primary plus replica) involved in the write
    "successful": 1,        # the document is stored on one primary shard of the index
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

Query data through head:

5.3.2. Query document

Syntax:

GET /index_name/type_name/id

or

GET /index_name/type_name/_search?q=field_name:field_value

For example: query a document by its id

GET /java06/course/1

For example: query all records

GET /java06/course/_search

For example: query the records whose name contains the keyword 门 ("entry")

GET /java06/course/_search?q=name:门

result:

{
  "took": 1,            # execution time, in milliseconds
  "timed_out": false,   # whether the search timed out
  "_shards": {          # shard statistics
    "total": 1,         # total number of shards searched
    "successful": 1,    # number of shards that successfully returned results
    "skipped": 0,
    "failed": 0
  },
  "hits": {             # search result data
    "total": 3,         # number of documents matching the search criteria
    "max_score": 1,     # highest relevance score, the degree of match with the search criteria
    "hits": [           # the matching documents
      {
        "_index": "java06",     # index name
        "_type": "course",      # type name
        "_id": "1",             # id value
        "_score": 1,
        "_source": {
          "name": "python from entry to abandonment",
          "description": "Life is short, I use Python",
          "studymodel": "201001"
        }
      }, {
        "_index": "java06",
        "_type": "course",
        "_id": "2",
        "_score": 0.13353139,
        "_source": {
          "name": "php from entry to abandonment",
          "description": "php is the best language in the world",
          "studymodel": "201001"
        }
      }, {
        "_index": "java06",
        "_type": "course",
        "_id": "6ljFCnIBp91f7uS8FkjS",
        "_score": 0.13353139,
        "_source": {
          "name": ".net from entry to abandonment",
          "description": ".net programmers are not convinced",
          "studymodel": "201003"
        }
      }
    ]
  }
}

5.3.3. Delete Document

When ES performs a delete, it first marks the Document as deleted rather than physically deleting it. When ES storage space runs low or the system is idle, the physical deletion is carried out, and data marked as deleted is not returned by queries. (Deleting an index in ES is also a mark; the physical deletion happens later. All of this marking is done to achieve NRT, near-real-time, behavior.)

Syntax: DELETE /index_name/type_name/id

For example:

DELETE /java06/course/3 

result:

{
  "_index": "java06",
  "_type": "course",
  "_id": "2",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 3,
  "_primary_term": 1
}

5.4. ES read and write process

5.4.1. Document routing (data routing)

When the client creates a document, ES needs to determine which shard the document is placed on. This process is document routing.

Routing process:

    Routing algorithm: shard = hash(routing) % number_of_primary_shards

    routing: defaults to the document's _id (manually specified or automatically generated); it determines which shard a document is stored on

    number_of_primary_shards: the number of primary shards.

5.4.4. Why is the number of primary shards immutable?

Reason: suppose our cluster has 5 primary shards at initialization and we add a document with id=5; if hash(5)=23, the document is placed on shard P3 (23 % 5 = 3). If we then add a primary shard to the cluster, there are now 6 primary shards. When we GET the document with id=5, ES computes the routing for the request and looks for the primary shard that stores it (23 % 6 = 5), locating shard P5 from the calculation. But our data is on P3. Therefore an ES cluster cannot add primary shards, while replica shards can be scaled out.
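The arithmetic in this argument is easy to verify (hash(5) = 23 is the hypothetical hash value used in the example above):

```python
def route(hash_value, number_of_primary_shards):
    # shard = hash(routing) % number_of_primary_shards
    return hash_value % number_of_primary_shards

h = 23  # hypothetical hash of id=5 from the example
print(route(h, 5))  # 3 -> the document is written to P3
print(route(h, 6))  # 5 -> after adding a primary shard, a GET looks on P5: wrong shard
```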

5.5.luke view the logical structure of ES

  1. Copy elasticsearch-6.2.3/data to windows

  2. Double-click luke.bat to start luke

  3. Use luke to open the data\nodes\0\indices path

6. IK tokenizer

6.1. Test tokenizer

Word segmentation is performed when adding a document, and the index stores each term (term). When you search, you use keywords to match the term, and finally find the document associated with the term.

Test the tokenizer used by the current index library:

POST /_analyze
{
  "text": "测试分词器,后边是测试内容:spring cloud实战"
}

The result is as follows:

 You will find that the Chinese word for "test" (测试) is split into two single characters (测 and 试). This is because the current index's default tokenizer segments Chinese one character at a time.

6.2. Chinese tokenizer

6.2.1. Lucene has its own Chinese word breaker

StandardAnalyzer:

Single-character segmentation: splits Chinese one character at a time. For example, "我爱中国" ("I love China") becomes: "我", "爱", "中", "国".

CJKAnalyzer

Bigram segmentation: splits by every two adjacent characters. For example, "我是中国人" ("I am Chinese") becomes: "我是", "是中", "中国", "国人".

The above two tokenizers cannot meet the demand.
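The two behaviors above can be sketched in a few lines (a simplification; the real analyzers also handle punctuation, mixed scripts, and so on):

```python
def single_char_terms(text):
    # StandardAnalyzer-style: one Chinese character per term
    return list(text)

def bigram_terms(text):
    # CJKAnalyzer-style: every two adjacent characters form a term
    return [text[i:i + 2] for i in range(len(text) - 1)]

print(single_char_terms("我爱中国"))  # ['我', '爱', '中', '国']
print(bigram_terms("我是中国人"))     # ['我是', '是中', '中国', '国人']
```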

SmartChineseAnalyzer

Good support for Chinese, but poor extensibility; it is difficult to extend its dictionary or stop-word list

6.2.2. Third-party Chinese analyzer

paoding: hosted at https://code.google.com/p/paoding/ , supports at most Lucene 3.0; the last code commit was on 2008-06-03 (the last svn commit was in 2010). It is obsolete and not considered.

IK-analyzer: hosted at https://code.google.com/p/ik-analyzer/ , supports Lucene 4.10. Since version 1.0 was released in December 2006, IKAnalyzer has gone through 4 major versions. It began as a Chinese word segmentation component based on the open-source Lucene project, combining dictionary segmentation with grammar analysis algorithms. From version 3.0 onward, IK became a general Java word segmentation component independent of Lucene, while still providing a default optimized implementation for Lucene. The 2012 version added a simple segmentation-ambiguity elimination algorithm, marking IK's evolution from pure dictionary segmentation toward simulated semantic segmentation. However, it has not been updated since December 2012.

6.3. Install IK tokenizer

Using the IK tokenizer can achieve the effect of Chinese word segmentation.

Download the IK tokenizer: (Github address: https://github.com/medcl/elasticsearch-analysis-ik )

1. Download zip:

2. Unzip and copy the decompressed files into a directory named ik under plugins in the ES installation directory, then restart ES

3. Test word segmentation effect:

POST /_analyze
{
  "text": "中华人民共和国人民大会堂",
  "analyzer": "ik_smart"
}


6.4. Two word segmentation modes

The ik tokenizer has two segmentation modes: ik_max_word and ik_smart modes.

1. ik_max_word

Splits the text at the finest granularity. For example, "中华人民共和国人民大会堂" (the Great Hall of the People of the People's Republic of China) is split into terms such as: 中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 共和国, 大会堂, 大会, 会堂.

2. ik_smart

Performs the coarsest-grained split. For example, "中华人民共和国人民大会堂" is split into 中华人民共和国 and 人民大会堂.

6.5. Custom thesaurus

If you want the tokenizer to support some proprietary words, you can customize the thesaurus.

The main.dic file that ships with the IK tokenizer is its main dictionary, and stopword.dic is the stop-word dictionary.

You can also create a new my.dic file in the same directory (note: the file must be saved as utf-8, not utf-8 BOM)

You can customize the vocabulary in it:

For example define:

Then register my.dic in the IK configuration file.
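For reference, the extension dictionary is registered in the IK plugin's configuration file, usually plugins/ik/config/IKAnalyzer.cfg.xml. The sketch below shows the usual shape of that file; the my.dic entry is the custom dictionary created above, and paths may differ between IK versions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- register the custom dictionary created above -->
    <entry key="ext_dict">my.dic</entry>
    <!-- optional extra stop-word files -->
    <entry key="ext_stopwords"></entry>
</properties>
```

After editing the file, restart ES for the new vocabulary to take effect.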

Seven, field (domain, field) detailed introduction

The previous chapter installed the IK tokenizer. How is it used when indexing and searching? And how are field types specified, such as date or numeric types?

The field types of ES6.2 core are as follows:

7.1. Field attribute introduction

7.1.1.type data type:

The type of the field is specified by the type attribute .

The more commonly used types:

Strings: text (segmented before being written to the index), keyword (written to the index without segmentation)

Numbers: integer, long, float, double

7.1.2.analyzer specifies word segmentation mode:

Specify the tokenizer through the analyzer attribute: ik_max_word performs fine-grained segmentation on content written to the index, while ik_smart performs coarse-grained segmentation.

"name": {
  "type": "text",
  "analyzer": "ik_max_word",      # when indexing, segment with fine granularity
  "search_analyzer": "ik_smart"   # when searching, segment with coarse granularity
}



7.1.3.index:

Specify whether to write to the index directory through the index attribute .

The default is index=true: the field is indexed, and only indexed fields can be searched from the index library. Some content does not need to be indexed; for example, a product image URL is only used to display the image and is never searched, so its index can be set to false.

Delete the index, recreate the mapping with index set to false for pic, then try searching by pic: the result is that no data can be found.

"pic": {
  "type": "text",
  "index": false
}

7.1.4.source:

The _source setting controls whether a field's original value is stored and returned. For example, a product description: search results should not display a pile of full descriptions, yet you still want to be able to search by description. If nothing is done, the descriptions retrieved for thousands of products amount to a very large volume of data.

includes: fields stored in the document's _source

POST /java06/course/_mapping
{
  "_source": {
    "includes":["description"]
  }
}

 excludes: fields not stored in the document's _source; they can still be searched if segmented and indexed

POST /java06/course/_mapping
{
  "_source": {
    "excludes":["description"]
  }
}

7.2. Commonly used field types

1) Strings: text (segmented), keyword (written to the index without segmentation)

2) date date type

The date type does not need a tokenizer; date fields are usually used for sorting.

1) format: sets the date format; multiple formats are separated by double vertical bars ||, and each format is tried in turn until one matches.

For example: 1. Allow the date field to store three formats: year-month-day hour:minute:second, year-month-day, and epoch milliseconds.

POST /java06/course/_mapping
{
    "properties": {
       "timestamp": {
         "type":   "date",
         "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
       }
     }
}

2. Insert the document:

PUT /java06/course/3
{
  "name": "spring development foundation",
  "description": "spring is very popular in the java field, and java programmers are using it.",
  "studymodel": "201001",
  "pic": "250.jpg",
  "timestamp": "2018-07-04 18:28:58"
}

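The format fallback above can be sketched in plain Java (an illustrative sketch of the `||` fallback behavior, not ES source code; the class name and patterns are made up for this example):

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;

// Illustrative sketch: a date field declared with
// "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd" tries each pattern in turn.
public class DateFormatFallback {
    static final DateTimeFormatter[] FORMATS = {
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"),
            DateTimeFormatter.ofPattern("yyyy-MM-dd")
    };

    static TemporalAccessor parse(String value) {
        for (DateTimeFormatter f : FORMATS) {
            try {
                return f.parse(value);   // first matching format wins
            } catch (DateTimeParseException ignored) {
                // fall through and try the next format
            }
        }
        throw new IllegalArgumentException("no matching date format: " + value);
    }

    public static void main(String[] args) {
        System.out.println(parse("2018-07-04 18:28:58"));
        System.out.println(parse("2018-07-04"));
    }
}
```

Both sample values parse: the first matches the full pattern, the second falls through to the shorter one.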
3) Numeric type

Numeric types in ES support sorting and range search; they are stored without word segmentation.

For example: 1. To update existing mappings:

POST /java06/course/_mapping
{
    "properties": {
    "price": {
        "type": "float"
     }
  }
}  

2. Insert document

PUT /java06/course/3
{
  "name": "spring development foundation",
  "description": "spring is very popular in the java field, and java programmers are using it.",
  "studymodel": "201001",
  "pic": "250.jpg",
  "price": 38.6
}

7.3. The setting standard of field attribute

Attribute    Standard for setting it
type         whether the data type matches the content
index        whether the field needs to be searchable
source       whether the field needs to be displayed in results

8. Spring Boot integrates ElasticSearch

8.1. ES client

ES provides a variety of different clients:

1、TransportClient

The traditional client provided by ES; officially planned to be removed in version 8.0.

2、RestClient

RestClient is the officially recommended client. It comes in two flavors: the REST Low Level Client and the REST High Level Client. ES introduced the REST High Level Client in version 6.0, and of the two it is the officially recommended one, although it is still being completed and some functions are not yet available.

8.2. Construction project

8.2.1.pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.2.RELEASE</version>
    </parent>

    <groupId>com.bjpowernode</groupId>
    <artifactId>springboot_elasticsearch</artifactId>
    <version>1.0-SNAPSHOT</version>
    
    <!-- Override the elasticsearch version -->
    <properties>
        <elasticsearch.version>6.2.3</elasticsearch.version>
    </properties>
    
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>${elasticsearch.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
        </dependency>
    </dependencies>
</project>

8.2.2.application.yml

spring:
  elasticsearch:
    rest:
      uris:
        - http://192.168.204.132:9200 

8.2.3.app

package com.bjpowernode;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class ElasticsearchApp {

	public static void main(String[] args) {
		SpringApplication.run(ElasticsearchApp.class, args);
	}
}

8.3. Index Management

8.3.1. Create index library

8.3.1.1.api

Create an index library:

PUT /java06
{
  "settings":{
       "number_of_shards" : 2,
       "number_of_replicas" : 0
  }
}

Create a mapping:

POST /java06/course/_mapping
{
  "_source": {
    "excludes":["description"]
  }, 
     "properties": {
      "name": {
          "type": "text",
          "analyzer":"ik_max_word",
          "search_analyzer":"ik_smart"
      },
      "description": {
          "type": "text",
          "analyzer":"ik_max_word",
          "search_analyzer":"ik_smart"
       },
       "studymodel": {
          "type": "keyword"
       },
       "price": {
          "type": "float"
       },
       "pic":{
           "type":"text",
           "index":false
        }
  }
}

8.3.1.2.Java Client

package com.bjpowernode.test;

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import org.elasticsearch.action.DocWriteResponse;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.IndicesClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
@RunWith(SpringRunner.class)
@SpringBootTest(classes = {ElasticsearchApp.class})
public class IndexWriterTest {
	@Autowired
    private RestHighLevelClient restHighLevelClient;

    // Create the index library
    @Test
    public void testCreateIndex() throws IOException {
        // Create the CreateIndexRequest and set the index name
        CreateIndexRequest createIndexRequest = new CreateIndexRequest("java06");
        // Set the index settings
        createIndexRequest.settings("{\n" +
                "       \"number_of_shards\" : 2,\n" +
                "       \"number_of_replicas\" : 0\n" +
                "  }", XContentType.JSON);
        createIndexRequest.mapping("course", "{\r\n" + 
        		"  \"_source\": {\r\n" + 
        		"    \"excludes\":[\"description\"]\r\n" + 
        		"  }, \r\n" + 
        		" 	\"properties\": {\r\n" + 
        		"           \"name\": {\r\n" + 
        		"              \"type\": \"text\",\r\n" + 
        		"              \"analyzer\":\"ik_max_word\",\r\n" + 
        		"              \"search_analyzer\":\"ik_smart\"\r\n" + 
        		"           },\r\n" + 
        		"           \"description\": {\r\n" + 
        		"              \"type\": \"text\",\r\n" + 
        		"              \"analyzer\":\"ik_max_word\",\r\n" + 
        		"              \"search_analyzer\":\"ik_smart\"\r\n" + 
        		"           },\r\n" + 
        		"           \"studymodel\": {\r\n" + 
        		"              \"type\": \"keyword\"\r\n" + 
        		"           },\r\n" + 
        		"           \"price\": {\r\n" + 
        		"              \"type\": \"float\"\r\n" + 
        		"           },\r\n" + 
        		"  }\r\n" + 
        		"}", XContentType.JSON);
        // Get the indices client
        IndicesClient indices = restHighLevelClient.indices();

        // Execute the request and get the response
        CreateIndexResponse createIndexResponse = 
            indices.create(createIndexRequest);
        // Read the acknowledgement result
        boolean acknowledged = createIndexResponse.isAcknowledged();
        System.out.println(acknowledged);
    } 
  }

8.3.2. Delete the index library

8.3.2.1.api

 DELETE /java06

8.3.2.2.java client

	// Delete the index library
	@Test
	public void testDeleteIndex() throws IOException {
		// Create the DeleteIndexRequest
		DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("java06");
		// Get the indices client
		IndicesClient indices = restHighLevelClient.indices();
		// Execute the request and get the response
		DeleteIndexResponse deleteIndexResponse = 
            indices.delete(deleteIndexRequest);
		// Read the acknowledgement result
		boolean acknowledged = deleteIndexResponse.isAcknowledged();
		System.out.println(acknowledged);
	}

8.3.3. Adding documents

8.3.3.1.api

 POST /java06/course/1
{
  "name": "spring cloud actual combat",
  "description": "This course is mainly explained in four chapters: 1. Introduction to microservice architecture 2. Introduction to spring cloud basics 3. Actual Spring Boot 4. Registration center eureka.",
  "studymodel": "201001",
  "price": 5.6
}

8.3.3.2.java client

	// Add a document
	@Test
	public void testAddDocument() throws IOException {
		// Create the IndexRequest ("index" here is used as a verb)
		IndexRequest indexRequest = new IndexRequest("java06", "course", "1");
		indexRequest.source("{\n" +
				" \"name\":\"spring cloud实战\",\n" +
				" \"description\":\"本课程主要从四个章节进行讲解: 1.微服务架构入门 " +
				"2.spring cloud 基础入门 3.实战Spring Boot 4.注册中心nacos。\",\n" +
				" \"studymodel\":\"201001\",\n" +
				" \"price\":5.6\n" +
				"}", XContentType.JSON);
		IndexResponse indexResponse = 
            restHighLevelClient.index(indexRequest);
		System.out.println(indexResponse.toString());
	}

8.3.4. Batch add documents

The bulk API supports operations on different indexes in a single call. Four operation types are supported: index, create, update, delete.

  • Syntax:

POST /_bulk
{ action: { metadata }} 
{ requestbody }\n
{ action: { metadata }} 
{ requestbody }\n
... 
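The action-line/source-line pairing can be made concrete by assembling the NDJSON body by hand (a sketch only; the index/type names follow the examples above, and the helper class is hypothetical):

```java
// Illustrative sketch: each action line is followed by its source line,
// and every line of a _bulk body must end with '\n'.
public class BulkBodyBuilder {
    static String bulkBody() {
        StringBuilder body = new StringBuilder();
        // action line: index a document into java06/course
        body.append("{\"index\":{\"_index\":\"java06\",\"_type\":\"course\"}}\n");
        // source line: the document itself
        body.append("{\"name\":\"php实战\",\"price\":\"5.6\"}\n");
        body.append("{\"index\":{\"_index\":\"java06\",\"_type\":\"course\"}}\n");
        body.append("{\"name\":\"net实战\",\"price\":\"7.6\"}\n");
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.print(bulkBody());
    }
}
```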

8.3.4.1.api

POST /_bulk
{"index":{"_index":"java06","_type":"course"}}
{"name":"php实战","description":"php谁都不服","studymodel":"201001","price":"5.6"}
{"index":{"_index":"java06","_type":"course"}}
{"name":"net实战","description":"net从入门到放弃","studymodel":"201001","price":"7.6"}

8.3.4.2.java client

@Test
public void testBulkAddDocument() throws IOException {
    // BulkRequest/BulkResponse come from org.elasticsearch.action.bulk
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.add(new IndexRequest("java06", "course").source("{...}",
                                                                  XContentType.JSON));
    bulkRequest.add(new IndexRequest("java06", "course").source("{...}",
                                                                  XContentType.JSON));
    BulkResponse bulkResponse = 
                   restHighLevelClient.bulk(bulkRequest);
    System.out.println(bulkResponse.hasFailures());
}

8.3.5. Modifying documents

8.3.5.1.api

POST /java06/course/1/_update
{
  "doc": {
    "price": 66.6
  }
}

8.3.5.2.java client

// Update a document
@Test
public void testUpdateDocument() throws IOException {
    UpdateRequest updateRequest = new UpdateRequest("java06", "course", "1");
    updateRequest.doc("{\n" +
            "  \"price\":7.6\n" +
            "}", XContentType.JSON);
    UpdateResponse updateResponse = 
                   restHighLevelClient.update(updateRequest);
    System.out.println(updateResponse.getResult());
}

8.3.6. Delete document

8.3.6.1.api

 DELETE /java06/course/1

8.3.6.2.java client

    // Delete a document by id
    @Test
    public void testDelDocument() throws IOException {
        // Create the delete request
        DeleteRequest deleteRequest = new DeleteRequest("java06","course","1");
        // Execute and get the response
        DeleteResponse deleteResponse = 
            restHighLevelClient.delete(deleteRequest);
        System.out.println(deleteResponse.getResult());
    }

8.4. Document Search

8.4.1. Prepare the environment

Insert the following data into the index library:

PUT /java06/course/1
{
  "name": "Bootstrap development",
  "description": "Bootstrap is a front-end CSS framework launched by Twitter and a very popular development framework that integrates a variety of page effects. It contains a large amount of CSS and JS code, helping developers (especially programmers who are not good at CSS page development) easily build beautiful CSS effects that are not limited by the browser.",
  "studymodel": "201002",
  "price": 38.6,
  "pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg"
}

PUT /java06/course/2
{
  "name": "Java Programming Basics",
  "description": "Java language is the world's number one programming language and has the largest number of users in the field of software development.",
  "studymodel": "201001",
  "price": 68.6,
  "pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg"
}

PUT /java06/course/3
{
  "name": "spring development foundation",
  "description": "spring is very popular in the java field, and java programmers are using it.",
  "studymodel": "201001",
  "price": 88.6,
  "pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg"
}

8.4.2. Simple search

Simple search queries ES with a GET request, passing the conditions through the URL. Syntax:

GET /index_name/type_name/doc_id
GET [/index_name/type_name/]_search[?parameter_name=parameter_value&...]

For example:

GET /java06/course/_search?q=name:spring&sort=price:desc

Note: when the query conditions are complex, the search string is difficult to build, so URL search is rarely used in production. For example, suppose the search requires: product name contains "mobile phone", price between 1000 and 5000, monthly sales above 500, sorted by price ascending, page 2 with 40 results per page. The URL would have to encode something like ?q=xxx:xxx&range=xxx:xxx:xxx&aggs&sort&from&size.

8.4.2.1.api

GET /java06/course/1

8.4.2.2.java client

    // Get a document by id
    @Test
    public void getDoc() throws IOException {
        GetRequest getRequest = new GetRequest("java06","course","1");
        GetResponse getResponse = restHighLevelClient.get(getRequest);
        boolean exists = getResponse.isExists();
        System.out.println(exists);
        String source = getResponse.getSourceAsString();
        System.out.println(source);
    }

8.4.3.DSL search

DSL (Domain Specific Language) is the JSON-based search method provided by ES: search conditions are passed as a JSON body. DSL is more powerful than URI search, and it is the recommended way to search in projects. Syntax:

GET /index_name/type_name/_search
{
    "command": {
        "parameter_name": "parameter_value"
    }
}

8.4.3.1.match_all query

8.4.3.1.1.api

GET /java06/course/_search
{
  "query" : {
    "match_all" : {}
  }
}

8.4.3.1.2.java client

package com.bjpowernode.test;

import com.bjpowernode.ElasticsearchApp;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

import java.io.IOException;

@RunWith(SpringJUnit4ClassRunner.class)
@SpringBootTest(classes = {ElasticsearchApp.class})
public class IndexReaderTest {
    @Autowired
    private RestHighLevelClient restHighLevelClient;
    private SearchRequest searchRequest;
    private SearchResponse searchResponse;

    @Before
    public void init(){
        searchRequest = new SearchRequest();
        searchRequest.indices("java06");
        searchRequest.types("course");
    }

    @Test
    public void testMatchAll() throws IOException {
        // Build the query: match all documents
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(QueryBuilders.matchAllQuery());

        // Set the request source (searchRequest is initialized in @Before)
        searchRequest.source(searchSourceBuilder);

        // Execute the search and keep the response for the @After method
        searchResponse = restHighLevelClient.search(searchRequest);
    }

    @After
    public void show(){
        SearchHits searchHits = searchResponse.getHits();
        long totalHits = searchHits.getTotalHits();
        System.out.println("Found " + totalHits + " documents");

        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
        }
    }
}

8.4.3.2. Pagination query

8.4.3.2.1.api

 GET /java06/course/_search
{
  "query" : { "match_all" : {} },
  "from" : 1,   # offset of the first result, counting from 0
  "size" : 3,   # number of results to return
  "sort" : [
    { "price" : "asc" }
  ]
}

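The arithmetic behind from/size pagination can be sketched as follows (a hypothetical helper, assuming a 1-based page number):

```java
// Illustrative sketch: mapping a 1-based page number to the from/size
// parameters used by ES pagination (from counts documents, starting at 0).
public class PageParams {
    static int from(int pageNo, int pageSize) {
        // offset of the first document on the requested page
        return (pageNo - 1) * pageSize;
    }

    public static void main(String[] args) {
        // page 2 with 40 documents per page starts at document 40
        System.out.println(from(2, 40));
    }
}
```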
8.4.3.2.2.java client

// Paged query
@Test
public void testSearchPage() throws Exception {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchAllQuery());
    searchSourceBuilder.from(1);
    searchSourceBuilder.size(5);
    searchSourceBuilder.sort("price", SortOrder.ASC);

    // Set the search source
    searchRequest.source(searchSourceBuilder);
    // Execute the search
    searchResponse = restHighLevelClient.search(searchRequest);
}

8.4.3.3.match query

match query is a full-text search: the search string is first segmented into terms, and then each term is used to search the index.

8.4.3.3.1.api

query: the search keywords. operator: "or" means the document matches as long as any one of the terms appears in it; "and" means the document matches only if every term appears in it.

1. Basic use:

 GET /java06/course/_search
{
  "query" : {
    "match" : {
      "name": {
        "query": "spring开发"
      }
    }
  }
}

2. operator:

GET /java06/course/_search
{
  "query" : {
    "match" : {
      "name": {
        "query": "spring开发",
        "operator": "and"
      }
    }
  }
}

The execution process of the above search: 1. "spring开发" is segmented into two terms, "spring" and "开发". 2. Both terms are then matched against the index. 3. Because the operator is set to and, a document is returned only when both terms match.
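The and/or semantics can be simulated on plain term lists (an illustrative sketch, not ES internals; the class and method names are made up):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: after the query string is segmented into terms,
// "or" matches a document containing any term, while "and" requires
// every term to appear in the document.
public class MatchOperatorDemo {
    static boolean matches(List<String> docTerms, List<String> queryTerms, String operator) {
        if ("and".equals(operator)) {
            return docTerms.containsAll(queryTerms);             // every term must appear
        }
        return queryTerms.stream().anyMatch(docTerms::contains); // any one term suffices
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("spring", "framework", "java");
        List<String> query = Arrays.asList("spring", "开发");
        System.out.println(matches(doc, query, "or"));   // true: "spring" appears
        System.out.println(matches(doc, query, "and"));  // false: "开发" is missing
    }
}
```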

8.4.3.3.2.java client

@Test
public void testMatchQuery() throws Exception {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Operator comes from org.elasticsearch.index.query.Operator
    searchSourceBuilder.query(QueryBuilders.matchQuery("name", "spring开发")
                                           .operator(Operator.AND));

    // Set the search source
    searchRequest.source(searchSourceBuilder);
    // Execute the search
    searchResponse = restHighLevelClient.search(searchRequest);
}

8.4.3.4. multi_match query

matchQuery matches against a single field, while multiMatchQuery matches the keywords against multiple fields.

8.4.3.4.1.api

1. Basic usage example: match the keyword "开发" (development) against the name and description fields

 GET /java06/course/_search
{
  "query": {
    "multi_match": {
      "query": "开发",
      "fields": ["name","description"]
    }
  }
}

Note: This search operation is suitable for building complex query conditions and is commonly used in production environments.

8.4.3.4.2.java client

@Test
public void testMultiMatchQuery() throws Exception {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.multiMatchQuery("开发","name","description"));
		
    // Set the search source
    searchRequest.source(searchSourceBuilder);
    // Execute the search
    searchResponse = restHighLevelClient.search(searchRequest);
}

8.4.3.5. bool query

Boolean query corresponds to Lucene's BooleanQuery and combines multiple queries. Parameters: must — every query condition must be satisfied (the most commonly used); should — or: at least one of the conditions must be satisfied; must_not — the condition must not be satisfied.

8.4.3.5.1.api

For example: query for documents whose name includes "开发" (development) and whose price is between 50 and 100

 GET /java06/course/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "开发"
          }
        },
        {
          "range": {
            "price": {
              "gte": 50,
              "lte": 100
            }
          }
        }
      ]
    }
  }
}

8.4.3.5.2.java client


    @Test
    public void testBooleanMatch() throws IOException {
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Combine both conditions with a bool query
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        boolQueryBuilder.must(QueryBuilders.matchQuery("name","开发"));
        boolQueryBuilder.must(QueryBuilders.rangeQuery("price").gte(50).lte(100));
        searchSourceBuilder.query(boolQueryBuilder);

        searchRequest.source(searchSourceBuilder);
        searchResponse = restHighLevelClient.search(searchRequest);
    }

8.4.3.6. filter query

Filter query. Filtering is a complementary syntax to the query DSL: a filter does not calculate a relevance score, so it is more efficient than a query, whereas a query calculates how relevant each match is. Query is better suited to complex conditional searches; filter is better for exact yes/no conditions.

8.4.3.6.1.api

For example: use a bool query to search for documents whose name contains "development" and whose price is between 10 and 100. 1. Without filter, both the name and price conditions contribute to the relevance score:

 GET /java06/course/_search
{
  "query": {
    "bool" : {
      "must": [
        {
          "match": {
            "name": "development"
          }
        },
        {
          "range": {            # the field value must fall within the range
            "price": {
              "gte": 10,        # comparison operators: lt gt lte gte
              "lte": 100
            }
          }
        }
      ]
    }
  }
}
 2. Using filter, price does not need to calculate the correlation score:

GET /java06/course/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "development"
          }
        }
      ],
      "filter": {        # filter the matched results, returning only documents in the range
        "range": {
          "price": {
            "gte": 1,
            "lte": 100
          }
        }
      }
    }
  }
}

8.4.3.6.2.java client

@Test
public void testFilterQuery() throws IOException {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.must(QueryBuilders.matchQuery("name","开发"));
    // filter: no relevance score is calculated for the range condition
    boolQueryBuilder.filter(QueryBuilders.rangeQuery("price").gte(10).lte(100));
    searchSourceBuilder.query(boolQueryBuilder);
    searchRequest.source(searchSourceBuilder);
    searchResponse = restHighLevelClient.search(searchRequest);
}

8.4.3.7.highlight query

Highlighting is not a search condition but display logic: when searching, the matched keywords often need to be highlighted in the results.
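That display logic can be sketched as simple tag wrapping (illustrative only; real ES returns highlighted fragments per field, and this naive `String.replace` ignores analysis and case):

```java
// Illustrative sketch: wrap each occurrence of the keyword in pre/post tags,
// mimicking what the pre_tags/post_tags highlight options produce.
public class HighlightDemo {
    static String highlight(String text, String keyword, String preTag, String postTag) {
        return text.replace(keyword, preTag + keyword + postTag);
    }

    public static void main(String[] args) {
        String out = highlight("spring development foundation", "spring",
                               "<font color='red'>", "</font>");
        System.out.println(out);  // <font color='red'>spring</font> development foundation
    }
}
```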

8.4.3.7.1.api

For example:

 GET /java06/course/_search
{
  "query": {
    "match": {
      "name": "开发"
    }
  },
  "highlight": {
      "pre_tags": ["<font color='red'>"],
      "post_tags": ["</font>"],
      "fields": {"name": {}}
  }
}

8.4.3.7.2.java client

1. Query:

  @Test
  public void testHighLightQuery() throws Exception {
      SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
      searchSourceBuilder.query(QueryBuilders.matchQuery("name", "spring"));
      // Configure highlighting
      HighlightBuilder highlightBuilder = new HighlightBuilder();
      highlightBuilder.preTags("<font color='red'>");
      highlightBuilder.postTags("</font>");
      highlightBuilder.fields().add(new HighlightBuilder.Field("name"));
      searchSourceBuilder.highlighter(highlightBuilder);

      searchRequest.source(searchSourceBuilder);
      searchResponse = restHighLevelClient.search(searchRequest);
}

2. Traverse the results:  

 @After
public void displayDoc() {
    SearchHits searchHits = searchResponse.getHits();
    long totalHits = searchHits.getTotalHits();
    System.out.println("Found " + totalHits + " documents");

    SearchHit[] hits = searchHits.getHits();
    for (int i = 0; i < hits.length; i++) {
        SearchHit hit = hits[i];
        String id = hit.getId();
        System.out.println("id:" + id);
        String source = hit.getSourceAsString();
        System.out.println(source);

        // Read the highlight fragments for the "name" field, if any
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        if (highlightFields != null) {
            HighlightField highlightField = highlightFields.get("name");
            if (highlightField != null) {
                Text[] fragments = highlightField.getFragments();
                System.out.println("Highlighted field: " + fragments[0].toString());
            }
        }
    }

}

Nine, cluster management 

ES usually works in cluster mode, which not only improves search capacity and the ability to handle big-data searches, but also increases the system's fault tolerance and high availability.

The following figure is a schematic diagram of the ES cluster structure:

The setup here is: each primary shard has two replicas. If a node goes down there is no data loss; for example, if node 1 goes down, replica 0 on node 2 and node 3 can still be queried.

 

Add document process:

(1) Suppose the user sends the request to node 1

(2) Using the remainder algorithm, the system determines that this document belongs to primary shard 2, so the request is forwarded to node 3, which holds that primary shard

(3) The system saves the document in primary shard 2 on node 3, and then forwards the request to the other two nodes that hold its replicas.
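The remainder algorithm can be sketched as follows (illustrative only: ES actually uses a murmur3 hash of the routing value, not `String.hashCode`, and the class name is made up). Because the shard number depends on the number of primary shards, changing that number would break routing for existing documents, which is why the primary shard count is immutable:

```java
// Illustrative sketch of the routing rule:
//   shard = hash(routing) % number_of_primary_shards
public class RoutingDemo {
    static int shardFor(String docId, int numberOfPrimaryShards) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(docId.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int shard = shardFor("1", 2);
        System.out.println("document 1 -> primary shard " + shard);
    }
}
```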

 

Query document process:

(1) The request is sent to node 1

(2) Node 1 calculates that the data belongs to primary shard 2. There are three choices: replica 2 on node 1, replica 2 on node 2, and primary shard 2 on node 3. Assuming node 1 load-balances with round-robin, it selects node 2 and forwards the request.

(3) Node 2 returns the data to Node 1, and Node 1 finally returns it to the client.

9.2. Create node 2

1. Copy node elasticsearch-1

 

2. Modify the content of elasticsearch.yml as follows:

node.name: power_shop_node_2
discovery.zen.ping.unicast.hosts: ["192.168.204.132:9300", "192.168.204.133:9300"] 

3. Delete the data directory of node 2

9.3. View cluster health status

1. Query the health information of the current cluster:

 GET /_cluster/health

2. Results:

 {
  "cluster_name": "power_shop",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 2,
  "number_of_data_nodes": 2,
  "active_primary_shards": 2,
  "active_shards": 4,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}

status: the health state is shown as one of three colors

green: every primary shard and replica shard of the index library is active

yellow: every primary shard of the index library is active, but some replica shards are not (for example, on a single-node cluster the replicas cannot be allocated)

red: not all primary shards are active.

9.4. Testing

1. Start both nodes and check the cluster health and shard allocation

 

2. Shut down node 2 and check the cluster status

 

3. Allocate replicas, shut down node 2, and then check the cluster status  

 

 


Origin blog.csdn.net/m0_71560190/article/details/126554795