1. Advanced operation of ElasticSearch
1.1 Query operation with query conditions
During a query we may want to carry some query conditions in the request, so that only the matching documents are returned. Next, we introduce the two ways of carrying query conditions in Elasticsearch. The basic syntax of a query is:
```json
GET /{index_name}/_search?{query_conditions}
```
1.1.1 Carry query conditions in the request path
Suppose we want to query all documents and sort them in ascending order by the value of the account_number field; we can use the following command:
```json
GET /bank/_search?q=*&sort=account_number:asc
```
Let's analyze the meaning of this command:

- `GET`: the HTTP method of the request, used for querying
- `bank`: the index to query
- `_search`: the default search endpoint
- `q=*`: query everything; `q` is short for "query"
- `sort=account_number:asc`: sort by the given field in ascending (asc) or descending (desc) order
The result of executing this command is as follows. The full output is rather long, so some field values are omitted and only part of it is listed:
```json
{
  ...
  "max_score" : null,   // score of the best-matching document
  "hits" : [            // the matching documents
    {
      "_index" : "bank",    // index
      "_type" : "account",  // type
      "_id" : "0",          // id of the document
      "_score" : null,      // relevance score of this document
      "_source" : {         // the fields of this document and their values
        "account_number" : 0,
        "balance" : 16623,
        "firstname" : "Bradshaw",
        "lastname" : "Mckenzie",
        "age" : 29,
        "gender" : "F",
        "address" : "244 Columbus Place",
        "employer" : "Euron",
        "email" : "[email protected]",
        "city" : "Hobucken",
        "state" : "CO"
      },
      "sort" : [ 0 ]        // the sort value
    },
    { ... "_id" : "1", "_score" : null, "_source" : { "account_number" : 1, ... }, "sort" : [ 1 ] },
    { ... "_id" : "2", "_score" : null, "_source" : { "account_number" : 2, ... }, "sort" : [ 2 ] },
    { ... "_id" : "3", "_score" : null, "_source" : { "account_number" : 3, ... }, "sort" : [ 3 ] },
    { ... "_id" : "4", "_score" : null, "_source" : { "account_number" : 4, ... }, "sort" : [ 4 ] },
    { ... "_id" : "5", "_score" : null, "_source" : { "account_number" : 5, ... }, "sort" : [ 5 ] },
    { ... "_id" : "6", "_score" : null, "_source" : { "account_number" : 6, ... }, "sort" : [ 6 ] },
    { ... "_id" : "7", "_score" : null, "_source" : { "account_number" : 7, ... }, "sort" : [ 7 ] },
    { ... "_id" : "8", "_score" : null, "_source" : { "account_number" : 8, ... }, "sort" : [ 8 ] },
    { ... "_id" : "9", "_score" : null, "_source" : { "account_number" : 9, ... }, "sort" : [ 9 ] }
  ]
}
```
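As a small additional sketch (hedged: `q` accepts Lucene query-string syntax, so field-specific conditions are also possible in the URI), a query condition can target a single field:

```json
GET /bank/_search?q=address:mill&sort=account_number:asc
```

This would return only documents whose address field matches mill, still sorted ascending by account_number.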
This method certainly works, but when there are many query conditions the request path becomes long and hard to read. Therefore ES officially recommends the other method: Query DSL.
1.1.2 Query DSL way
Official documentation: Query and filter context | Elasticsearch Reference [6.0] | Elastic, and the getting-started page Executing Searches | Elasticsearch Reference [6.0] | Elastic. You can read the official documentation; here is just a brief introduction.
```json
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": { "order": "asc" } }
  ]
}
```
To paraphrase what is written above:
- `GET /bank/_search`: send a query request to the `bank` index; the specific query conditions are in the body that follows
- `query`: the specific query conditions, an object in which all conditions are defined
- `match_all`: match all, i.e. query everything
- `sort`: specifies which field of the returned documents to sort by, and how
- `account_number`: the field being operated on
- `order: asc`: sort `account_number` in ascending order
Therefore this form expresses the same meaning as the previous one, just written differently. Note that the opening curly brace of the query body must start on a new line after the request line, otherwise an error is reported. Since the meaning is the same, the results are identical and are not repeated here. The form above can also be abbreviated as follows:
```json
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
```
Moreover, notice that in all three queries above only 10 documents are returned, and at the top of the result the following values appear: the total is 1000 documents, but only the first 10 are shown. This is because ES paginates automatically, showing the first page of 10 records by default. That is why we see only 10 documents; of course, the pagination can also be customized.
```json
"total" : {
  "value" : 1000,
  "relation" : "eq"
},
```
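As a sketch of customized pagination (hedged: `from` and `size` are the standard paging parameters; the values here are just an example), the offset and page size can be set explicitly:

```json
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [ { "account_number": "asc" } ],
  "from": 10,
  "size": 5
}
```

This would skip the first 10 documents and return the next 5.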
1.2 Return only part of the field value
Earlier we used Query DSL to query all field values, but sometimes we only need some of the fields, and ES provides a corresponding API for this. Suppose we only want the `firstname` and `lastname` fields; then we can use `_source`, listing in this parameter the names of the fields we want returned. If `_source` is not written, all fields are returned by default. The specific syntax is as follows:
```json
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": { "order": "desc" } }
  ],
  "_source": ["firstname", "lastname"]
}
```
The results of the query are as follows (again, part of the output is omitted):
```json
{
  ...
  "hits" : {
    "total" : { "value" : 1000, "relation" : "eq" },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "0",
        "_score" : null,
        "_source" : {
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie"
        },
        "sort" : [ 0 ]
      },
      ... // 9 more documents omitted
    ]
  }
}
```
You can see that only the two field values listed in `_source` are left.
1.3 match usage
match retrieves documents matching the specified field name and value. There are two kinds of match searches: full-text search and exact search. How to tell which one applies is not explained here; it will be introduced later together with `_mapping`. For full-text search:
```json
GET /bank/_search
{
  "query": {
    "match": { "address": "Avenue" }
  }
}
```
The data returned (only part of it shown here) is all documents whose `address` contains `Avenue`. In total there are 214 documents whose `address` contains `Avenue`.
```json
"total" : { "value" : 214, "relation" : "eq" },
"max_score" : 1.5400246,
"hits" : [
  { "_index" : "bank", "_type" : "account", "_id" : "25", "_score" : 1.5400246, "_source" : { ... "address" : "171 Putnam Avenue", ... } },
  { ... "_score" : 1.5400246, "_source" : { ... "address" : "759 Newkirk Avenue", ... } },
  ...
```
In case of exact search:
```json
GET /bank/_search
{
  "query": {
    "match": { "age": "20" }
  }
}
```
The data returned at this time is:
```json
"total" : { "value" : 44, "relation" : "eq" },
"max_score" : 1.0,
"hits" : [
  { ... "_score" : 1.0, "_source" : { ... "age" : 20, ... } },
  { ... "_score" : 1.0, "_source" : { ... "age" : 20, ... } },
```
From the two searches above we can see that full-text search sorts results by relevance score in descending order: the better the match, the higher the score and the earlier the document appears. For exact search, every matching document has a fixed relevance score of 1.0.
1.4 match_phrase phrase matching
Phrase matching only matches when the same phrase appears with the words in the same order; otherwise it does not match. This differs from the fuzzy matching of full-text search. The following example queries documents whose `address` contains the phrase `Newkirk Avenue`:
```json
GET /bank/_search
{
  "query": {
    "match_phrase": { "address": "Newkirk Avenue" }
  }
}
```
The data returned at this time is:
```json
"max_score" : 7.5308537,
"hits" : [
  {
    "_index" : "bank",
    "_type" : "account",
    "_id" : "102",
    "_score" : 7.5308537,
    "_source" : { ... "address" : "759 Newkirk Avenue", ... }
  }
]
```
Only one document matches this time. The number of matching documents is obviously much smaller than when match was used before, confirming that phrase matching differs from the fuzzy matching of full-text search.
1.5 Multi-field matching
Sometimes we don't want to match only one field, but documents that match across multiple fields. For this we use another query, `multi_match`, which supports matching a value against multiple fields.
```json
GET /bank/_search
{
  "query": {
    "multi_match": {
      "query": "Newkirk Choctaw",
      "fields": ["address", "city"]
    }
  }
}
```
Explaining the two parameters of `multi_match`:

- `query`: the value to match
- `fields`: the names of the fields to match the `query` value against

According to match semantics, this is equivalent to `address:Newkirk`, `address:Choctaw`, `city:Newkirk` and `city:Choctaw`. The data returned at this time is:
```json
"total" : { "value" : 3, "relation" : "eq" },
"max_score" : 6.505949,
"hits" : [
  { ... "_score" : 6.505949, "_source" : { ... "address" : "759 Newkirk Avenue", ... "city" : "Choctaw", "state" : "NJ" } },
  { ... "_score" : 6.505949, "_source" : { ... "address" : "803 Glenmore Avenue", ... "city" : "Newkirk", "state" : "KS" } },
  { ... "_score" : 5.9908285, "_source" : { ... "address" : "865 Newkirk Placez", ... "city" : "Ada", "state" : "ID" } }
]
```
Only three documents match, and each of them contains one or more of the words in address, in city, or in both. This is somewhat similar to MySQL's OR query.
1.6 bool compound query
The bool compound query combines multiple query conditions into one query structure. It has four clauses: `must`, `must_not`, `should`, and `filter`.

- `must`: conditions the document must satisfy; contributes to the relevance score
- `must_not`: conditions the document must not satisfy; does not contribute to the relevance score
- `should`: conditions that may or may not be satisfied; contribute to the relevance score when they match
- `filter`: the clause must match the document, but contributes nothing to the relevance score

Query accounts whose address must contain Avenue, whose city does not contain Choctaw, and where gender F is preferred:
```json
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "Avenue" } }
      ],
      "must_not": [
        { "match": { "city": "Choctaw" } }
      ],
      "should": [
        { "match": { "gender": "F" } }
      ]
    }
  }
}
```
The data returned (address must contain Avenue, city does not contain Choctaw, gender F preferred) is not posted here, as it would take too much space. Next, find documents whose age is between 10 and 20:
```json
GET /bank/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": { "gte": 10, "lte": 20 }
        }
      }
    }
  }
}
```
1.7 term
term performs an exact search: it searches with the entire value, the term is not analyzed during the search, and case and content must match exactly. term can be used on both full-text search fields and exact-search fields. Two examples follow.
1.7.1 Fields of Full Text Search Type
For fields of the full-text search type, if you use term to search
```json
GET /bank/_search
{
  "query": {
    "term": {
      "lastname": { "value": "Terry" }
    }
  }
}
```
At this point, the returned data is:
```json
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 0, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  }
}
```
No documents are found, because when a document is inserted, ES's built-in analyzer splits the text into terms and lowercases them during the split. We can verify this:
```json
GET /bank/_analyze
{
  "analyzer": "standard",
  "text": ["Terry"]
}
```
A brief explanation: `analyzer` is the analyzer to use, here standard, and `text` is the text to analyze. The result of analyzing `Terry` is as follows:
```json
{
  "tokens" : [
    {
      "token" : "terry",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
```
As you can see, during analysis `Terry` is analyzed into `terry`, and `terry` is what ends up stored in the inverted index. Since term requires case and content to match exactly, the match fails here.
1.7.2 Fields for Exact Search
For exact-search fields, retrieving with term is no different from match:
```json
GET /bank/_search
{
  "query": {
    "term": {
      "age": { "value": "20" }
    }
  }
}
```
The data found at this time are all age 20, so the data will not be listed here.
1.8 keyword
`keyword` is a type that supports exact retrieval. For some full-text search fields we can add a `keyword` sub-field; when we want an exact match, we address this sub-field and search it exactly. Next, let's compare searching the same thing with and without `keyword`. Note in advance that `address` is a full-text search field which also has a `keyword` sub-field. We can use `_mapping` to view this; it shows the mapping of all fields of the index:
```json
GET /bank/_mapping
```
We only look at the `address` field part:
```json
"address" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
},
```
As you can see, it contains a `keyword` sub-field.
1.8.1 Do not use keywords
```json
GET /bank/_search
{
  "query": {
    "match": { "address": "Gunnison" }
  }
}
```
The data returned at this time is:
```json
{
  "_index" : "bank",
  "_type" : "account",
  "_id" : "157",
  "_score" : 6.501515,
  "_source" : {
    "account_number" : 157,
    "balance" : 39868,
    "firstname" : "Claudia",
    "lastname" : "Terry",
    "age" : 20,
    "gender" : "F",
    "address" : "132 Gunnison Court",
    "employer" : "Lumbrex",
    "email" : "[email protected]",
    "city" : "Castleton",
    "state" : "MD"
  }
}
```
As you can see, one document is found.
1.8.2 Using keywords
```json
GET /bank/_search
{
  "query": {
    "match": { "address.keyword": "Gunnison" }
  }
}
```
The data returned at this time is:
```json
"hits" : [ ]
```
No documents are found this time, because the whole value Gunnison is now used for exact retrieval. As mentioned earlier, exact retrieval requires the field's value to be exactly equal to this value, and no document's address is exactly "Gunnison", so the search fails.
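As a hedged sketch: based on the document shown above (whose address is "132 Gunnison Court"), matching the keyword sub-field against the complete value should succeed, since the stored keyword value equals it exactly:

```json
GET /bank/_search
{
  "query": {
    "match": { "address.keyword": "132 Gunnison Court" }
  }
}
```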
1.9 Aggregation analysis
Official documentation: Aggregations | Elasticsearch Guide [7.17] | Elastic. Aggregation analysis means doing some processing on the results of a search, such as computing an average value or counting how many people are in each age group. Next, we introduce aggregation analysis through three cases.
1.9.1 Search the age distribution and average age of all people whose address contains mill
The DSL we use is as follows:
```json
GET /bank/_search
{
  "query": {
    "match": { "address": "mill" }
  },
  "aggs": {
    "ageAgg": {
      "terms": { "field": "age", "size": 10 }
    },
    "ageAvg": {
      "avg": { "field": "age" }
    }
  }
}
```
- The query first finds all people whose address contains mill
- The aggs that follows aggregates and analyzes the results of that query
- ageAgg is the name of the aggregation, which can be chosen arbitrarily
- terms divides the results of the query into buckets by some field; here the field is age, so the query results are bucketed by age
- ageAvg computes the average age of the documents found by the query

Two aggregations are used here: `ageAgg` and `ageAvg`. These are two names we defined ourselves, and the concrete aggregation operation to perform is written underneath each: `ageAgg` aggregates by age, while `ageAvg` computes the average age. The result returned is:
```json
{
  "hits" : {
    "max_score" : 5.4032025,
    "hits" : [
      { "_index" : "bank", "_type" : "account", "_id" : "970", "_score" : 5.4032025, "_source" : { ... "age" : 28, ... } },
      { "_index" : "bank", "_type" : "account", "_id" : "136", "_score" : 5.4032025, "_source" : { ... "age" : 38, ... } },
      { "_index" : "bank", "_type" : "account", "_id" : "345", "_score" : 5.4032025, "_source" : { ... "age" : 38, ... } },
      { "_index" : "bank", "_type" : "account", "_id" : "472", "_score" : 5.4032025, "_source" : { ... "age" : 32, ... } }
    ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : 38, "doc_count" : 2 },
        { "key" : 28, "doc_count" : 1 },
        { "key" : 32, "doc_count" : 1 }
      ]
    },
    "ageAvg" : {
      "value" : 34.0
    }
  }
}
```
The `aggregations` part holds the results of our aggregation analysis, divided per aggregation.
1.9.2 Aggregate by age, and find the average salary of these people in these age groups
The query and ageAgg are the same as before. balanceAvg here is the average balance within each age group, so it must be computed on top of the age aggregation: we nest another aggs inside ageAgg. This differs from the parallel ageAgg and ageAvg of the previous case.
```json
GET /bank/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "ageAgg": {
      "terms": { "field": "age", "size": 10 },
      "aggs": {
        "balanceAvg": {
          "avg": { "field": "balance" }
        }
      }
    }
  }
}
```
The data returned at this time is as follows:
```json
"aggregations" : {
  "ageAgg" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 463,
    "buckets" : [
      {
        // these two values come from the ageAgg aggregation
        "key" : 31,
        "doc_count" : 61,
        // this one comes from the balanceAvg aggregation
        "balanceAvg" : {
          "value" : 28312.918032786885
        }
      }
```
As you can see, aggregations can be written in parallel or nested.
1.9.3 Find all age distributions, and the average salary of M and the average salary of F in these age groups and the overall average salary of this age group
```json
GET /bank/_search
{
  "query": { "match_all": {} },
  "aggs": {
    // first aggregate by age
    "ageAgg": {
      "terms": { "field": "age", "size": 10 },
      // then aggregate the age buckets again
      "aggs": {
        "gender": {
          "terms": { "field": "gender.keyword", "size": 10 },
          "aggs": {
            "balanceAvg": {
              "avg": { "field": "balance" }
            }
          }
        },
        "totalBalanceAvg": {
          "avg": { "field": "balance" }
        }
      }
    }
  }
}
```
As you can see, this query is more complicated than the previous two, mainly because multi-level nesting of aggregations is used. To sum up, these are the three usages of aggregation analysis: parallel, nested, and nesting within nesting.
2. Mapping of ElasticSearch
Official documentation: Mapping | Elasticsearch Guide [7.4] | Elastic. Mapping refers to defining how documents and the fields they contain are stored and indexed.
2.1 Query Mapping
We already queried a mapping earlier in the keyword section, that is:
```json
GET /bank/_mapping
```
This is also the most basic usage, to get the mapping rules of all fields of an index.
2.2 Creating a new mapping
Creating a new mapping means creating a new index, adding fields to it, and setting mapping rules for those fields. The basic syntax is:
```json
PUT /{index_name}
{
  "mappings": {
    "properties": {
      "{field_name}": {
        "type": "{type}"
      }
    }
  }
}
```
Fill in your own index name for `{index_name}`, the field name you want to declare for `{field_name}`, and the field's type for `{type}`. Field types come in three groups:

- Basic types: text, keyword, date, long, double, boolean, ip
- Types that support the hierarchical nature of JSON: object, nested
- Special types: geo_point, geo_shape, completion

We choose the concrete types ourselves. An example:
```json
PUT /my_index
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "email": { "type": "keyword" },
      "name": { "type": "text" }
    }
  }
}
```
At this point, we can see the mapping rules of the currently created new index by querying the mapping
```json
{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "age" : { "type" : "integer" },
        "email" : { "type" : "keyword" },
        "name" : { "type" : "text" }
      }
    }
  }
}
```
2.3 Adding a new field to a mapping
2.3.1 The wrong way
If we want to add a new field to an existing mapping, our first thought might be to add the field to the original mapping definition and re-execute it, as we did when creating the index. If we want to add an `employee_id` field, that would be the following command:
```json
PUT /my_index
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "email": { "type": "keyword" },
      "name": { "type": "text" },
      "employee_id": { "type": "long" }
    }
  }
}
```
The data returned after execution is as follows:
```json
{
  "error": {
    "root_cause": [
      {
        "type": "resource_already_exists_exception",
        "reason": "index [my_index/e2ILrRUVQkmL0SzQxZp__g] already exists",
        "index_uuid": "e2ILrRUVQkmL0SzQxZp__g",
        "index": "my_index"
      }
    ],
    "type": "resource_already_exists_exception",
    "reason": "index [my_index/e2ILrRUVQkmL0SzQxZp__g] already exists",
    "index_uuid": "e2ILrRUVQkmL0SzQxZp__g",
    "index": "my_index"
  },
  "status": 400
}
```
An error occurs because the index already exists. That is, every time an index is created, ES first checks whether it already exists; if not, creation succeeds, otherwise it fails. Therefore we cannot use this method.
2.3.2 The right way
```json
PUT /my_index/_mapping
{
  "properties": {
    "employee_id": {
      "type": "keyword",
      "index": false
    }
  }
}
```
The `index` parameter defaults to true, meaning the field is indexed. An indexed field is searchable: as long as a field's index is enabled, we can find documents through that field during a search. If indexing is turned off, the field still exists, but documents cannot be retrieved through it; it is effectively just a stored, redundant attribute. Query the mapping rules again:
```json
{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "age" : { "type" : "integer" },
        "email" : { "type" : "keyword" },
        "employee_id" : { "type" : "keyword", "index" : false },
        "name" : { "type" : "text" }
      }
    }
  }
}
```
As you can see, the addition was successful at this time.
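As a hedged sketch of the index-off behavior (the document id and value here are hypothetical; assume some document with an employee_id has been inserted into my_index): ES refuses to search on a field whose index is disabled, so a query like the following should fail with an error rather than return an empty result:

```json
GET /my_index/_search
{
  "query": {
    "match": { "employee_id": "1001" }
  }
}
```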
2.4 Update mapping and data migration
In actual development we may decide that the mapping rule of some field is inappropriate and want to modify it, but this cannot be done in place. The existing mapping rules are already associated with a lot of data; if we changed them directly, indexes built earlier could become invalid, and the existing data would not be rewritten automatically just because we changed the mapping. So all we can do is create a new index and copy all the data over.
2.4.1 Get all mapping rules of the current index
```json
GET /bank/_mapping
```
2.4.2 Copy and modify the original mapping rules, and put them into the new index
```json
PUT /newbank
{
  "mappings": {
    "properties" : {
      "account_number" : { "type" : "long" },
      "address" : { "type" : "text" },
      "age" : { "type" : "integer" },
      "balance" : { "type" : "long" },
      "city" : { "type" : "keyword" },
      "email" : { "type" : "keyword" },
      "employer" : { "type" : "keyword" },
      "firstname" : { "type" : "text" },
      "gender" : { "type" : "keyword" },
      "lastname" : {
        "type" : "text",
        "fields" : {
          "keyword" : { "type" : "keyword", "ignore_above" : 256 }
        }
      },
      "state" : { "type" : "keyword" }
    }
  }
}
```
2.4.3 Migrating old index data to new index
```json
// send the reindex request
POST _reindex
{
  "source": {
    "index": "bank",
    "type": "account"
  },
  "dest": {
    "index": "newbank"
  }
}
```
This completes both the modification of the mapping rules and the migration of the data. Querying the documents of the new index shows the same content as before.
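A quick way to double-check the migration (a sketch; based on the totals shown earlier, the bank index holds 1000 documents) is to search the new index and compare the reported total with the old one:

```json
GET /newbank/_search
```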
3. Word segmentation operation of ElasticSearch
3.1 Introduction to Word Segmentation of ES
The word segmentation (analysis) operation splits a sentence into several phrases or words. We already used it earlier, in [[#① Full-text search type field]]. The basic syntax of the analysis operation is:
```json
POST _analyze
{
  "analyzer": "{standard}",
  "text": ["{text}"]
}
```
Here `{standard}` is the tokenizer we want to use, generally standard, and `{text}` is the content we want to analyze. A tokenizer works against a dictionary behind it, and sentences are split according to the dictionary's contents. However, the standard tokenizer sometimes cannot do what we want; for example, some popular terms are not included. If we analyze 法外狂徒张三, we would like it split into 法外狂徒 and 张三.
```json
POST _analyze
{
  "analyzer": "standard",
  "text": ["法外狂徒张三"]
}
```
But what is it actually divided into?
```json
{
  "tokens" : [
    { "token" : "法", ... },
    { "token" : "外", ... },
    { "token" : "狂", ... },
    { "token" : "徒", ... },
    { "token" : "张", ... },
    { "token" : "三", ... }
  ]
}
```
Obviously this is not what we want, so we need a way to add such popular terms.
3.2 Custom tokenizer
We use the ik tokenizer; download the release that corresponds to your ES version.
First increase the virtual machine's memory to 3G and raise ES's maximum startup heap to 512 MB. The old ES container needs to be removed first, and then a new one created:
```shell
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms128m -Xmx512m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:7.4.2
```
Then start an nginx instance, there are many configuration files in the nginx instance, we need to use these configuration files
```shell
docker run -p 80:80 --name nginx -d nginx:1.10
```
If there is no local nginx image, Docker will download it for us first. Then restart nginx with the configuration directories mounted:
```shell
docker run -p 80:80 --name nginx \
  -v /mydata/nginx/html:/usr/share/nginx/html \
  -v /mydata/nginx/logs:/var/log/nginx \
  -v /mydata/nginx/conf/:/etc/nginx \
  -d nginx:1.10
```
Visiting the nginx site now returns 403 Forbidden because there is no page to display yet, so we create some pages under the nginx/html/ directory; our custom vocabulary will also be placed there. Here we create an es/fenci.txt and write some custom phrases in it, which can then be accessed at http://ip/es/fenci.txt. Next we need to modify the ik configuration of ES:
```shell
/mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
```
The original configuration information content is
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- users can configure their own extension dictionary here -->
    <entry key="ext_dict"></entry>
    <!-- users can configure their own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- users can configure a remote extension dictionary here -->
    <!--<entry key="remote_ext_dict">remote</entry>-->
    <!-- users can configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
```
What we want to modify is the remote extension dictionary, as follows:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    ... omitted here ...
    <!-- users can configure a remote extension dictionary here -->
    <entry key="remote_ext_dict">http://ip/es/fenci.txt</entry>
    ... omitted here ...
</properties>
```
At this point, we can re-segment the word and get the answer we want.
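As a hedged sketch (assuming the ik plugin is installed and fenci.txt contains 法外狂徒), re-running the analysis with one of ik's analyzers should now keep the custom phrase together:

```json
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": ["法外狂徒张三"]
}
```

`ik_max_word` and `ik_smart` are the two analyzers the ik plugin provides; the former produces the finest-grained split, the latter the coarsest.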
4. Springboot integrates ElasticSearch
Official documentation: Initialization | Java REST Client [7.4] | Elastic
4.1 Integrate ElasticSearch
4.1.1 Create a new project
4.1.2 Introducing dependencies
```xml
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.4.2</version>
</dependency>
```
At the same time, introduce the following so that the transitive dependency versions are also 7.4.2:
```xml
<parent>
    <artifactId>spring-boot-starter-parent</artifactId>
    <groupId>org.springframework.boot</groupId>
    <version>2.6.13</version>
    <relativePath></relativePath>
</parent>

<properties>
    <elasticsearch.version>7.4.2</elasticsearch.version>
</properties>
```
4.1.3 Initialization
Configure RestHighLevelClient
```java
@Configuration
public class GulimallElasticSearchConfig {

    @Bean
    public RestHighLevelClient getRestHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("address", 9200, "http")));
        return client;
    }
}
```
Test whether this bean is registered
```java
@Test
void contextLoads() {
    System.out.println(client);
}
```
An error is reported on startup, because we use nacos and a configuration import is expected:
```java
java.lang.IllegalStateException: Failed to load ApplicationContext
    at org.springframework.test.context.cache.DefaultCacheAwareContextLoaderDelegate.loadContext(DefaultCacheAwareContextLoaderDelegate.java:98)
    at ...
Caused by: org.springframework.cloud.commons.ConfigDataMissingEnvironmentPostProcessor$ImportException: No spring.config.import set
    at org.springframework.cloud.commons.ConfigDataMissingEnvironmentPostProcessor.postProcessEnvironment
```
We can directly exclude the detection of nacos in application.yml
```yml
spring:
  cloud:
    nacos:
      config:
        import-check:
          enabled: false
```
After the final run, you can see this, which means that the configuration is successful.
```java
org.elasticsearch.client.RestHighLevelClient@b791a81
```
4.1.4 Setting request options
Configure the following in `GulimallElasticSearchConfig.java`:
```java
public static final RequestOptions COMMON_OPTIONS;

static {
    RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
    COMMON_OPTIONS = builder.build();
}
```
4.1.5 Creating a new document
```java
@Autowired
private RestHighLevelClient client;

@Data
class User {
    private String userName;
    private Integer age;
    private String gender;
}

@Test
void indexTest() throws IOException {
    // Create an index request; "users" is the index to create or use.
    // If it does not exist it will be created, otherwise it is used directly.
    IndexRequest indexRequest = new IndexRequest("users");
    // Set the id of this document
    indexRequest.id("1");
    // Prepare the data
    User user = new User();
    user.setUserName("法外狂徒——张三");
    user.setAge(18);
    user.setGender("男");
    // Convert it to JSON
    String jsonString = JSONValue.toJSONString(user);
    indexRequest.source(jsonString, XContentType.JSON);
    IndexResponse index = client.index(indexRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);
    System.out.println(index);
}
```
Query the index before running the test:
```json
{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index [users]",
        "resource.type" : "index_or_alias",
        "resource.id" : "users",
        "index_uuid" : "_na_",
        "index" : "users"
      }
    ],
    "type" : "index_not_found_exception",
    "reason" : "no such index [users]",
    "resource.type" : "index_or_alias",
    "resource.id" : "users",
    "index_uuid" : "_na_",
    "index" : "users"
  },
  "status" : 404
}
```
It can be found that there is no index at this time. Next, execute the code and you can see that some information will be printed in the console.
```java
IndexResponse[index=users,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]
```
This information is similar to that of ES, so let's go to ES to see if this document exists
```json
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "users",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "gender" : "男",
          "userName" : "法外狂徒——张三",
          "age" : 18
        }
      }
    ]
  }
}
```
It can be found that there is indeed this document, so far, the ES test is completed.
4.1.6 Retrieving documents
```java
@Test
public void searchData() throws IOException {
    // 1. Create the search request
    SearchRequest searchRequest = new SearchRequest();
    // 2. Specify the index
    searchRequest.indices("bank");
    // 3. Specify the DSL search conditions
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
    System.out.println(searchSourceBuilder.toString());
    // 4. Wrap the search conditions
    searchRequest.source(searchSourceBuilder);
    // 5. Execute the search
    SearchResponse search = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);
    System.out.println(search.toString());
}
```
After running, you can get the following information
```json
// the query condition
{"query":{"match":{"address":{"query":"mill","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}}

// the returned data
{"took":0,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":4,"relation":"eq"},"max_score":5.4032025,"hits":[{"_index":"bank","_type":"account","_id":"970","_score":5.4032025,"_source":{"account_number":970,"balance":19648,"firstname":"Forbes","lastname":"Wallace","age":28,"gender":"M","address":"990 Mill Road","employer":"Pheast","email":"[email protected]","city":"Lopezo","state":"AK"}},{"_index":"bank","_type":"account","_id":"136","_score":5.4032025,"_source":{"account_number":136,"balance":45801,"firstname":"Winnie","lastname":"Holland","age":38,"gender":"M","address":"198 Mill Lane","employer":"Neteria","email":"[email protected]","city":"Urie","state":"IL"}},{"_index":"bank","_type":"account","_id":"345","_score":5.4032025,"_source":{"account_number":345,"balance":9812,"firstname":"Parker","lastname":"Hines","age":38,"gender":"M","address":"715 Mill Avenue","employer":"Baluba","email":"[email protected]","city":"Blackgum","state":"KY"}},{"_index":"bank","_type":"account","_id":"472","_score":5.4032025,"_source":{"account_number":472,"balance":25571,"firstname":"Lee","lastname":"Long","age":32,"gender":"F","address":"288 Mill Street","employer":"Comverges","email":"[email protected]","city":"Movico","state":"MT"}}]}}
```
This data matches what our earlier console command returned. The REST High Level client also encapsulates many APIs for us, so every value in the response can be extracted; that is not demonstrated here. With this, integrating ES into the project is complete.
While studying the grain mall project I learned ElasticSearch, so I recorded these notes along with some additions made during learning. I'm sharing them in the hope that they help others; if anything is lacking, please point it out, thank you!