Advanced usage of ElasticSearch in the Gulimall (Grain Mall) project, and integrating ES into the project

1. Advanced operation of ElasticSearch

1.1 Query operation with query conditions

During a query we often need to carry query conditions in the request so that only the desired documents are returned. This section therefore introduces the two ways of carrying query conditions in ElasticSearch. The basic syntax of a query operation is:

 
 

```json
GET /{index_name}/_search(?{query_conditions})
```

1.1.1 Carry query conditions in the request path

We want to query all the data and sort them in ascending order according to the value of the account_number field, so we can use the following command

 
 

```json
GET /bank/_search?q=*&sort=account_number:asc
```

Let's analyze the meaning of this command

  • GET: the HTTP method of the request, used for querying
  • bank: the index to query
  • _search: the built-in search endpoint
  • q=*: query everything; q is short for query
  • sort=account_number:asc: sort by the account_number field in ascending order

The result of executing this command is shown below. The full response is rather long, so some field values are omitted and only part of the output is listed:

 
 

```json
{
  ...
  "max_score" : null,  // score of the best-matching document
  "hits" : [           // the matching documents
    {
      "_index" : "bank",    // index
      "_type" : "account",  // type
      "_id" : "0",          // document id
      "_score" : null,      // relevance score of this document
      "_source" : {         // the document's fields and their values
        "account_number" : 0,
        "balance" : 16623,
        "firstname" : "Bradshaw",
        "lastname" : "Mckenzie",
        "age" : 29,
        "gender" : "F",
        "address" : "244 Columbus Place",
        "employer" : "Euron",
        "email" : "[email protected]",
        "city" : "Hobucken",
        "state" : "CO"
      },
      "sort" : [ 0 ]        // sort value
    },
    { ... "_id" : "1", "_score" : null, "_source" : { "account_number" : 1, ... }, "sort" : [ 1 ] },
    { ... "_id" : "2", "_score" : null, "_source" : { "account_number" : 2, ... }, "sort" : [ 2 ] },
    // ... and so on through "_id" : "9" with "sort" : [ 9 ]
  ]
}
```

This method certainly works, but when there are many query conditions the request path becomes long and hard to read, as in the hypothetical example below. For that reason, ES officially recommends the other method: the Query DSL.
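For instance, a URI search that filters, sorts, and pages at the same time (the conditions here are made up for illustration; q, sort, from and size are standard URI-search parameters) already looks cramped:

```json
GET /bank/_search?q=address:mill&sort=account_number:desc&from=10&size=5
```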

1.1.2 Query DSL way

Official documentation: Query and filter context | Elasticsearch Reference [6.0] | Elastic. Getting-started guide: Executing Searches | Elasticsearch Reference [6.0] | Elastic. You can read the official documentation for details; here is just a brief introduction.

 
 

```json
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": { "order": "asc" } }
  ]
}
```

To paraphrase what is written above:

  • GET /bank/_search: send a query request to the bank index; the specific query conditions are in the curly braces that follow
  • query: the concrete query conditions, an object in which all conditions can be defined
  • match_all: match everything, i.e. query all documents
  • sort: specifies which field of the returned documents to sort by, and how
  • account_number: the field being operated on
  • order: asc: sort account_number in ascending order

This expresses the same meaning as the previous query, just written differently. Note that the opening curly brace of the query body must be placed on a new line after the GET line, otherwise an error is reported. Since the meaning is the same, the results are also the same, so they are not posted again here. The query above can also be abbreviated as follows:

 
 

```json
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
```

We can also notice that each of the three queries above returns only 10 documents, and that the top of each result contains the parameter values shown below. total indicates how many documents matched: 1000 records in total, yet only the first 10 are displayed. This is because ES automatically paginates for us, defaulting to the first page of 10 records. Of course, we can also customize the pagination.

 
 

```json
"total" : {
  "value" : 1000,
  "relation" : "eq"
},
```

1.2 Return only part of the field value

Earlier we used the Query DSL to query all field values, but sometimes we don't need every field, only some of them, and ES provides a corresponding API for this. Assuming we only want the firstname and lastname fields of each document, we can use the _source parameter: just list the names of the fields you want returned. If it is not specified, all fields are returned by default. The concrete syntax is as follows:

 
 

```json
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": { "order": "desc" } }
  ],
  "_source": ["firstname", "lastname"]
}
```

The query results are as follows (again, part of the output is omitted):

 
 

```json
{
  ...
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "0",
        "_score" : null,
        "_source" : {
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie"
        },
        "sort" : [ 0 ]
      }
      // ... 9 more documents omitted
    ]
  }
}
```

You can see that only the two requested field values are left in _source.

1.3 match usage

match retrieves documents matching the specified field name and field value. There are two kinds of match retrieval: full-text retrieval and exact retrieval. How to tell whether a field is full-text or exact is not explained here; it depends on the mapping (_mapping), which is introduced later. For full-text retrieval:

 
 

```json
GET /bank/_search
{
  "query": {
    "match": { "address": "Avenue" }
  }
}
```

The data returned (only part of it is shown here) consists of all documents whose address contains Avenue; there are 214 such documents in total.

 
 

```json
"total" : {
  "value" : 214,
  "relation" : "eq"
},
"max_score" : 1.5400246,
"hits" : [
  {
    "_index" : "bank",
    "_type" : "account",
    "_id" : "25",
    "_score" : 1.5400246,
    "_source" : { ... "address" : "171 Putnam Avenue", ... }
  },
  {
    ...
    "_score" : 1.5400246,
    "_source" : { ... "address" : "759 Newkirk Avenue", ... }
  },
  ...
```

In case of exact search:

 
 

```json
GET /bank/_search
{
  "query": {
    "match": { "age": "20" }
  }
}
```

The data returned at this time is:

 
 

```json
"total" : {
  "value" : 44,
  "relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
  { ... "_score" : 1.0, "_source" : { ... "age" : 20, ... } },
  { ... "_score" : 1.0, "_source" : { ... "age" : 20, ... } },
```

From these two searches we can see that full-text retrieval sorts results by relevance score in descending order: the better the match, the higher the document is displayed. For exact retrieval, every matching document has the fixed relevance score 1.0.

1.4 match_phrase phrase matching

Phrase matching matches only when the same phrase appears with the words in the same order; otherwise it does not match. This differs from the fuzzy matching of full-text retrieval. The following example queries for documents whose address attribute contains the phrase Newkirk Avenue:

 
 

```json
GET /bank/_search
{
  "query": {
    "match_phrase": { "address": "Newkirk Avenue" }
  }
}
```

The data returned at this time is:

 
 

```json
"max_score" : 7.5308537,
"hits" : [
  {
    "_index" : "bank",
    "_type" : "account",
    "_id" : "102",
    "_score" : 7.5308537,
    "_source" : { ... "address" : "759 Newkirk Avenue", ... }
  }
]
```

Only one document matches this time, far fewer than the matches found earlier with match, which confirms that phrase matching differs from the fuzzy matching of full-text retrieval.
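For contrast, running the same two words through a plain match (a sketch; match analyzes the text into the terms newkirk and avenue and matches documents containing either word) returns far more documents:

```json
GET /bank/_search
{
  "query": {
    "match": { "address": "Newkirk Avenue" }
  }
}
```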

1.5 Multi-field matching

Sometimes we don't want to match against just one field, but against several fields at once. For this we need another query type, multi_match, which supports matching on multiple fields.

 
 

```json
GET /bank/_search
{
  "query": {
    "multi_match": {
      "query": "Newkirk Choctaw",
      "fields": ["address", "city"]
    }
  }
}
```

The two parameters used by multi_match are:

  • query: the value to be matched
  • fields: the names of the fields to match the query value against

In match terms, this is equivalent to address:Newkirk, address:Choctaw, city:Newkirk and city:Choctaw combined. The data returned is:

 
 

```json
"total" : {
  "value" : 3,
  "relation" : "eq"
},
"max_score" : 6.505949,
"hits" : [
  {
    ...
    "_score" : 6.505949,
    "_source" : { ... "address" : "759 Newkirk Avenue", ... "city" : "Choctaw", "state" : "NJ" }
  },
  {
    ...
    "_score" : 6.505949,
    "_source" : { ... "address" : "803 Glenmore Avenue", ... "city" : "Newkirk", "state" : "KS" }
  },
  {
    ...
    "_score" : 5.9908285,
    "_source" : { ... "address" : "865 Newkirk Placez", ... "city" : "Ada", "state" : "ID" }
  }
]
```

Only three documents match, and each of them contains one or more of the query words in address, in city, or in both. This is somewhat similar to MySQL's OR query.

1.6 bool compound query

The bool compound query is a query structure that combines multiple query conditions. There are four major structures, namely: must, must_not, should, filter.

  • must: Required, that is, the condition that must be included in the document, providing a relevance score
  • must_not: must not, that is, the condition that must not exist in the document, does not provide a relevance score
  • should: should, i.e. conditions that may or may not be present in the document, provide a relevance score when matched
  • filter: filter, that is, the clause (query) must appear in matching documents, but it does not contribute to the relevance score

Let's query for documents whose address must contain Avenue, whose city must not contain Choctaw, and where gender F is preferred:
 
 

```json
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "Avenue" } }
      ],
      "must_not": [
        { "match": { "city": "Choctaw" } }
      ],
      "should": [
        { "match": { "gender": "F" } }
      ]
    }
  }
}
```

The returned data (address must contain Avenue, city must not contain Choctaw, gender F preferred) is not posted here, since it would take up too much space. Next, let's find documents whose age is between 10 and 20:

 
 

```json
GET /bank/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": { "gte": 10, "lte": 20 }
        }
      }
    }
  }
}
```

1.7 term

term performs exact retrieval: it searches with the entire value, the search term is not analyzed, and both case and content must match exactly. term can be used on fields of both full-text and exact-retrieval types. Two examples follow.

1.7.1 Fields of Full Text Search Type

For fields of the full-text search type, if you use term to search

 
 

```json
GET /bank/_search
{
  "query": {
    "term": {
      "lastname": { "value": "Terry" }
    }
  }
}
```

At this point, the returned data is:

 
 

```json
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
```

No documents are found. That is because when a document is inserted, ES's built-in analyzer splits the text into words and lowercases them during the split.

 
 

```json
GET /bank/_analyze
{
  "analyzer": "standard",
  "text": ["Terry"]
}
```

A brief explanation: analyzer specifies the analyzer to use, here the standard one, and text is the content to be analyzed. The result of analyzing the word Terry is as follows:

 
 

```json
{
  "tokens" : [
    {
      "token" : "terry",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
```

As you can see, the word is analyzed into terry, so terry is what ends up stored in the inverted index. Since term requires the case and content to match exactly, the match here fails.
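If an exact match against the original, unanalyzed value is what we want, one option (a sketch, relying on the keyword sub-field that fields such as lastname carry in this index's mapping, shown in section 2.4.2) is to point term at that sub-field:

```json
GET /bank/_search
{
  "query": {
    "term": {
      "lastname.keyword": { "value": "Terry" }
    }
  }
}
```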

1.7.2 Fields for Exact Search

For fields that are accurately retrieved, using term for retrieval is no different from match

 
 

```json
GET /bank/_search
{
  "query": {
    "term": {
      "age": { "value": "20" }
    }
  }
}
```

All documents found have age 20, so the data is not listed here.

1.8 keyword

keyword is a type that supports exact retrieval. A full-text field can carry a sub-field named keyword; when we want exact retrieval on that field, we can address this keyword sub-field and search it exactly. Next, let's compare the same search with and without keyword. Note in advance that address is a full-text field that also has a keyword sub-field inside it. We can use _mapping to view this field; the request below shows the mapping of all fields of the index:

 
 

```json
GET /bank/_mapping
```

We only look at the address field part:

 
 

```json
"address" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
},
```

As you can see, address is a text field that carries a keyword sub-field of type keyword.

1.8.1 Without keyword

 
 

```json
GET /bank/_search
{
  "query": {
    "match": { "address": "Gunnison" }
  }
}
```

The data returned at this time is:

 
 

```json
{
  "_index" : "bank",
  "_type" : "account",
  "_id" : "157",
  "_score" : 6.501515,
  "_source" : {
    "account_number" : 157,
    "balance" : 39868,
    "firstname" : "Claudia",
    "lastname" : "Terry",
    "age" : 20,
    "gender" : "F",
    "address" : "132 Gunnison Court",
    "employer" : "Lumbrex",
    "email" : "[email protected]",
    "city" : "Castleton",
    "state" : "MD"
  }
}
```

As you can see, one document is found.

1.8.2 Using keyword

 
 

```json
GET /bank/_search
{
  "query": {
    "match": { "address.keyword": "Gunnison" }
  }
}
```

The data returned at this time is:

 
 

```json
"hits" : [ ]
```

This time no data is found at all, because the whole word Gunnison is now used for exact retrieval. As mentioned earlier, exact retrieval requires the field's value to be exactly equal to the search value, and no document's address is exactly Gunnison, so the retrieval fails.
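Conversely, supplying the complete original value to the keyword sub-field should match again; a sketch using the document seen in 1.8.1:

```json
GET /bank/_search
{
  "query": {
    "match": { "address.keyword": "132 Gunnison Court" }
  }
}
```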

1.9 Aggregation analysis

Official documentation: Aggregations | Elasticsearch Guide [7.17] | Elastic. Aggregation analysis means processing the results of a search, for example computing an average or counting how many people fall into each age group. Next, three cases introduce aggregation analysis.

1.9.1 Search the age distribution and average age of all people whose address contains mill

The DSL we use is as follows:

 
 

```json
GET /bank/_search
{
  "query": {
    "match": { "address": "mill" }
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      }
    },
    "ageAvg": {
      "avg": { "field": "age" }
    }
  }
}
```

  • query first finds all the people whose address contains mill
  • the aggs that follows aggregates and analyzes the results returned by that query
  • ageAgg is the name of this aggregation and can be chosen freely
  • terms buckets the query results in a certain way; here the field is age, so the results are bucketed by age
  • ageAvg computes the average of the age field over the same query results. Two aggregations are used here in parallel, ageAgg and ageAvg; both names are our own choice, and the concrete aggregation operation to perform is written inside each. ageAgg buckets by age, while ageAvg averages the age. The result returned is shown below:
 
 

```json
{
  "hits" : {
    "max_score" : 5.4032025,
    "hits" : [
      { "_index" : "bank", "_type" : "account", "_id" : "970", "_score" : 5.4032025, "_source" : { ... "age" : 28, ... } },
      { "_index" : "bank", "_type" : "account", "_id" : "136", "_score" : 5.4032025, "_source" : { ... "age" : 38, ... } },
      { "_index" : "bank", "_type" : "account", "_id" : "345", "_score" : 5.4032025, "_source" : { ... "age" : 38, ... } },
      { "_index" : "bank", "_type" : "account", "_id" : "472", "_score" : 5.4032025, "_source" : { ... "age" : 32, ... } }
    ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : 38, "doc_count" : 2 },
        { "key" : 28, "doc_count" : 1 },
        { "key" : 32, "doc_count" : 1 }
      ]
    },
    "ageAvg" : {
      "value" : 34.0
    }
  }
}
```

The aggregations part holds the results of the aggregation analysis, divided per aggregation.

1.9.2 Aggregate by age, and find the average salary of these people in these age groups

The query and ageAgg are the same as before. The new balanceAvg is the average salary within each age group, so it must be computed on top of the age buckets: we aggregate again inside ageAgg. This is unlike ageAgg and ageAvg in the previous case, which were parallel.

 
 

```json
GET /bank/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs": {
        "balanceAvg": {
          "avg": { "field": "balance" }
        }
      }
    }
  }
}
```

The data returned at this time is as follows:

 
 

```json
"aggregations" : {
  "ageAgg" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 463,
    "buckets" : [
      {
        // these two values come from the ageAgg aggregation
        "key" : 31,
        "doc_count" : 61,
        // and this one is the result of balanceAvg
        "balanceAvg" : {
          "value" : 28312.918032786885
        }
      }
```

As you can see, aggregations can be parallel or nested.

1.9.3 Find all age distributions, and the average salary of M and the average salary of F in these age groups and the overall average salary of this age group

 
 

```json
GET /bank/_search
{
  "query": { "match_all": {} },
  "aggs": {
    // first bucket by age
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      },
      // then aggregate again inside each age bucket
      "aggs": {
        "gender": {
          "terms": {
            "field": "gender.keyword",
            "size": 10
          },
          "aggs": {
            "balanceAvg": {
              "avg": { "field": "balance" }
            }
          }
        },
        "totalBalanceAvg": {
          "avg": { "field": "balance" }
        }
      }
    }
  }
}
```

This query is more complicated than the previous two, mainly because of the multi-level nesting of aggregations. To sum up, the three cases show the three usages of aggregation analysis: parallel, nested, and nesting within nesting.

2. Mapping of ElasticSearch

Official documentation: Mapping | Elasticsearch Guide [7.4] | Elastic. Mapping refers to defining how documents and the fields they contain are stored and indexed.

2.1 Query Mapping

Earlier, in the keyword section, we already queried a mapping:

 
 

```json
GET /bank/_mapping
```

This is also the most basic usage, to get the mapping rules of all fields of an index.

2.1.1 New mapping

New mapping refers to creating a new index, adding some fields to this index, and setting mapping rules for these fields. Its basic syntax is:

 
 

```json
PUT /{index_name}
{
  "mappings": {
    "properties": {
      "{field_name}": {
        "type": "{type}"
      }
    }
  }
}
```

Fill in your own index name for {index_name}, the field name you want to declare for {field_name}, and the corresponding field's type for {type}. In the following example, three fields with three different types are declared:

 
 

```json
PUT /my_index
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "email": { "type": "keyword" },
      "name": { "type": "text" }
    }
  }
}
```

At this point, we can see the mapping rules of the currently created new index by querying the mapping

 
 

```json
{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "age" : { "type" : "integer" },
        "email" : { "type" : "keyword" },
        "name" : { "type" : "text" }
      }
    }
  }
}
```

2.3 Add a new attribute to the mapping

2.3.1 The wrong way

If we want to add a new attribute to an existing mapping, the first idea may be to add the field to the original mapping definition and re-execute it, the way we would when adding a document. To add an employee_id field, that would be the following command:

 
 

```json
PUT /my_index
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "email": { "type": "keyword" },
      "name": { "type": "text" },
      "employee_id": { "type": "long" }
    }
  }
}
```

The data returned after execution is as follows:

 
 

```json
{
  "error": {
    "root_cause": [
      {
        "type": "resource_already_exists_exception",
        "reason": "index [my_index/e2ILrRUVQkmL0SzQxZp__g] already exists",
        "index_uuid": "e2ILrRUVQkmL0SzQxZp__g",
        "index": "my_index"
      }
    ],
    "type": "resource_already_exists_exception",
    "reason": "index [my_index/e2ILrRUVQkmL0SzQxZp__g] already exists",
    "index_uuid": "e2ILrRUVQkmL0SzQxZp__g",
    "index": "my_index"
  },
  "status": 400
}
```

An error occurred because the index already exists. Every index-creation request first checks whether the index exists: if not, creation succeeds; if it does, creation fails. So this method cannot be used.

2.3.2 The right way

 
 

```json
PUT /my_index/_mapping
{
  "properties": {
    "employee_id": {
      "type": "keyword",
      "index": false
    }
  }
}
```

The index option defaults to true, meaning the field is indexed and therefore searchable: as long as a field's index is on, we can find documents through that field during a search. If index is off, the attribute still exists and is stored, but documents cannot be retrieved through it; it is effectively a redundant attribute. Query the mapping rules again:

 
 

```json
{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "age" : { "type" : "integer" },
        "email" : { "type" : "keyword" },
        "employee_id" : {
          "type" : "keyword",
          "index" : false
        },
        "name" : { "type" : "text" }
      }
    }
  }
}
```

As you can see, the addition was successful at this time.
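Because employee_id was declared with "index": false, attempting to search on it should now be rejected by ES with an error saying the field is not indexed. A sketch of such an attempt (the value 1001 is made up):

```json
GET /my_index/_search
{
  "query": {
    "term": {
      "employee_id": { "value": "1001" }
    }
  }
}
```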

2.4 Update mapping and data migration

In actual development we may decide that a field's mapping rule is inappropriate and want to modify it, but that is not possible: the existing mapping is already associated with a lot of data, and modifying it directly could invalidate previously built indexes, while the old data would not be rewritten automatically to match the new rule. All we can do is create a new index and copy the data over.

2.4.1 Get all mapping rules of the current index

 
 

```json
GET /bank/_mapping
```

2.4.2 Copy and modify the original mapping rules, and put them into the new index

 
 

```json
PUT /newbank
{
  "mappings": {
    "properties" : {
      "account_number" : { "type" : "long" },
      "address" : { "type" : "text" },
      "age" : { "type" : "integer" },
      "balance" : { "type" : "long" },
      "city" : { "type" : "keyword" },
      "email" : { "type" : "keyword" },
      "employer" : { "type" : "keyword" },
      "firstname" : { "type" : "text" },
      "gender" : { "type" : "keyword" },
      "lastname" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "state" : { "type" : "keyword" }
    }
  }
}
```

2.4.3 Migrating old index data to new index

 
 

```json
// send a reindex request
POST _reindex
{
  "source": {
    "index": "bank",
    "type": "account"
  },
  "dest": {
    "index": "newbank"
  }
}
```

In this way, the modification of the mapping rules and the migration of data are completed. We can query the document of this new index, and we can find that the content inside is the same as the previous one.
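A quick way to verify the migration (a minimal sketch) is to search the new index and compare the hits and total with the old one:

```json
GET /newbank/_search
{
  "query": { "match_all": {} }
}
```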

3. Word segmentation operation of ElasticSearch

3.1 Introduction to Word Segmentation of ES

The word segmentation (analysis) operation analyzes a sentence and splits it into several phrases or words. We already used it earlier, in section 1.7.1 on full-text fields. The basic syntax is:

 
 

```json
POST _analyze
{
  "analyzer": "{analyzer}",
  "text": ["{text}"]
}
```

Here {analyzer} is the tokenizer to use, generally standard, and {text} is the content to be analyzed. A tokenizer works against a lexicon behind it: sentences are split according to the words in the lexicon. However, the standard tokenizer sometimes cannot do what we want; for example, popular new phrases are not included. If we analyze 法外狂徒张三, we would like it split into 法外狂徒 and 张三.

 
 

```json
POST _analyze
{
  "analyzer": "standard",
  "text": ["法外狂徒张三"]
}
```

But what is it actually divided into?

 
 

```json
{
  "tokens" : [
    { "token" : "法", ... },
    { "token" : "外", ... },
    { "token" : "狂", ... },
    { "token" : "徒", ... },
    { "token" : "张", ... },
    { "token" : "三", ... }
  ]
}
```

Obviously not what we want, so we need a way to add such phrases ourselves.

3.2 Custom tokenizer

We use the ik tokenizer; download the release matching your ES version from the ik project's releases page.

First increase the virtual machine's memory to 3 GB, then raise ES's maximum startup heap to 512 MB. This requires removing the old ES container and creating a new one:

 
 

```shell
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms128m -Xmx512m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:7.4.2
```

Then start an nginx instance; it contains many configuration files that we need:

 
 

```shell
docker run -p 80:80 --name nginx -d nginx:1.10
```

If the nginx image is not present locally, Docker downloads it first and then starts the container. Afterwards, the container can be recreated with the configuration directories mounted:

 
 

```shell
docker run -p 80:80 --name nginx \
  -v /mydata/nginx/html:/usr/share/nginx/html \
  -v /mydata/nginx/logs:/var/log/nginx \
  -v /mydata/nginx/conf/:/etc/nginx \
  -d nginx:1.10
```

Visiting the nginx site now pops up 403 Forbidden, because there is no page to display yet. We can create some pages under the nginx/html/ directory; our custom vocabulary will also be placed here. We create a fenci.txt, write some custom phrases into it, and access it via http://ip/es/fenci.txt. Then we need to modify the configuration of es:

 
 

```shell
/mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
```

The original configuration information content is

 
 

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- users can configure their own extension dictionary here -->
    <entry key="ext_dict"></entry>
    <!-- users can configure their own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- users can configure a remote extension dictionary here -->
    <!--<entry key="remote_ext_dict">remote</entry>-->
    <!-- users can configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
```

What we want to modify is the remote extension dictionary, as follows:

 
 

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <!-- ... unchanged entries omitted ... -->
    <!-- users can configure a remote extension dictionary here -->
    <entry key="remote_ext_dict">http://ip/es/fenci.txt</entry>
    <!-- ... unchanged entries omitted ... -->
</properties>
```

At this point, we can re-segment the word and get the answer we want.
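For example, after restarting ES, the analysis can be repeated with one of ik's analyzers (ik ships ik_smart and ik_max_word); assuming 法外狂徒 and 张三 were added to fenci.txt, the phrase should now be split along those entries:

```json
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": ["法外狂徒张三"]
}
```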

4. Springboot integrates ElasticSearch

Official documentation: Initialization | Java REST Client [7.4] | Elastic

4.1 Integrate ElasticSearch

4.1.1 Create a new project

4.1.2 Introducing dependencies

 
 

```xml
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.4.2</version>
</dependency>
```

At the same time, the following is added to fix transitive dependencies whose version would otherwise not be 7.4.2 (Spring Boot's parent manages the elasticsearch version, so it must be overridden via a property):

 
 

```xml
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.6.13</version>
    <relativePath></relativePath>
</parent>

<properties>
    <elasticsearch.version>7.4.2</elasticsearch.version>
</properties>
```

4.1.3 Initialization

Configure RestHighLevelClient

 
 

```java
@Configuration
public class GulimallElasticSearchConfig {

    @Bean
    public RestHighLevelClient getRestHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("address", 9200, "http")));
        return client;
    }
}
```

Test whether this bean is registered

 
 

```java
@Test
void contextLoads() {
    System.out.println(client);
}
```

At this time an error is reported on startup. This is because we use Nacos, which expects configuration to be imported:

 
 

```java
java.lang.IllegalStateException: Failed to load ApplicationContext
    at org.springframework.test.context.cache.DefaultCacheAwareContextLoaderDelegate.loadContext(DefaultCacheAwareContextLoaderDelegate.java:98)
    at ...
Caused by: org.springframework.cloud.commons.ConfigDataMissingEnvironmentPostProcessor$ImportException: No spring.config.import set
    at org.springframework.cloud.commons.ConfigDataMissingEnvironmentPostProcessor.postProcessEnvironment
```

We can disable the Nacos import check directly in application.yml:

 
 

```yml
spring:
  cloud:
    nacos:
      config:
        import-check:
          enabled: false
```

After the final run, you can see this, which means that the configuration is successful.

 
 

```java
org.elasticsearch.client.RestHighLevelClient@b791a81
```

4.1.4 Setting request options

Configure it in GulimallElasticSearchConfig.java:

 
 

```java
public static final RequestOptions COMMON_OPTIONS;

static {
    RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
    COMMON_OPTIONS = builder.build();
}
```

4.1.5 Adding a document

 
 

```java
@Autowired
private RestHighLevelClient client;

@Data
class User {
    private String userName;
    private Integer age;
    private String gender;
}

@Test
void indexTest() throws IOException {
    // Create an index request; "users" is the index to create or use.
    // If it does not exist it will be created, otherwise it is used directly.
    IndexRequest indexRequest = new IndexRequest("users");
    // Set the document id
    indexRequest.id("1");
    // Prepare the data
    User user = new User();
    user.setUserName("法外狂徒——张三");
    user.setAge(18);
    user.setGender("男");
    // Convert it to JSON
    String jsonString = JSONValue.toJSONString(user);
    indexRequest.source(jsonString, XContentType.JSON);
    IndexResponse index = client.index(indexRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);
    System.out.println(index);
}
```

Before running the test, search the users index in ES:

 
 

```json
{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index [users]",
        "resource.type" : "index_or_alias",
        "resource.id" : "users",
        "index_uuid" : "_na_",
        "index" : "users"
      }
    ],
    "type" : "index_not_found_exception",
    "reason" : "no such index [users]",
    "resource.type" : "index_or_alias",
    "resource.id" : "users",
    "index_uuid" : "_na_",
    "index" : "users"
  },
  "status" : 404
}
```

It can be found that there is no index at this time. Next, execute the code and you can see that some information will be printed in the console.

 
 

```java
IndexResponse[index=users,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]
```

This information is similar to what ES itself returns, so let's check in ES whether the document exists:

 
 

```json
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "users",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "gender" : "男",
          "userName" : "法外狂徒——张三",
          "age" : 18
        }
      }
    ]
  }
}
```

It can be found that there is indeed this document, so far, the ES test is completed.

4.1.6 Retrieving documents

 
 

```java
@Test
public void searchData() throws IOException {
    // 1. Create the search request
    SearchRequest searchRequest = new SearchRequest();
    // 2. Specify the index
    searchRequest.indices("bank");
    // 3. Specify the DSL search conditions
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
    System.out.println(searchSourceBuilder.toString());
    // 4. Wrap the search conditions into the request
    searchRequest.source(searchSourceBuilder);
    // 5. Execute the search
    SearchResponse search = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);
    System.out.println(search.toString());
}
```

After running, you can get the following information

 
 

```java
// the query conditions
{"query":{"match":{"address":{"query":"mill","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}}
// the returned data
{"took":0,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":4,"relation":"eq"},"max_score":5.4032025,"hits":[{"_index":"bank","_type":"account","_id":"970","_score":5.4032025,"_source":{"account_number":970,"balance":19648,"firstname":"Forbes","lastname":"Wallace","age":28,"gender":"M","address":"990 Mill Road","employer":"Pheast","email":"[email protected]","city":"Lopezo","state":"AK"}},{"_index":"bank","_type":"account","_id":"136","_score":5.4032025,"_source":{"account_number":136,"balance":45801,"firstname":"Winnie","lastname":"Holland","age":38,"gender":"M","address":"198 Mill Lane","employer":"Neteria","email":"[email protected]","city":"Urie","state":"IL"}},{"_index":"bank","_type":"account","_id":"345","_score":5.4032025,"_source":{"account_number":345,"balance":9812,"firstname":"Parker","lastname":"Hines","age":38,"gender":"M","address":"715 Mill Avenue","employer":"Baluba","email":"[email protected]","city":"Blackgum","state":"KY"}},{"_index":"bank","_type":"account","_id":"472","_score":5.4032025,"_source":{"account_number":472,"balance":25571,"firstname":"Lee","lastname":"Long","age":32,"gender":"F","address":"288 Mill Street","employer":"Comverges","email":"[email protected]","city":"Movico","state":"MT"}}]}}
```

This is the same data our earlier console command returned. The REST High Level Client also encapsulates many APIs for us, through which every value in the response can be retrieved, so that is not demonstrated here. At this point, the project's ES integration is complete.

While studying the Gulimall (Grain Mall) project I learned ElasticSearch and recorded these notes, with some supplements, along the way. I am sharing them in the hope that they help others; if anything is lacking, please point it out. Thank you!


Origin blog.csdn.net/BASK2312/article/details/131291225