Elasticsearch study notes, part three

Table of contents

ES underlying index principle 

IK tokenizer

Customize extension words and stop words in IK


ES underlying index principle 

IK tokenizer

1. Definition: word segmentation splits a piece of text into keywords

Example sentence: "I'm Xiao Ming's classmate"

Word segmentation principle: split the text into keywords and remove the stop words

2. Analyzers provided by ES

    1. standard analyzer (the default). English: splits into words. Chinese: splits into single characters.

    2. simple analyzer. English: splits into words and drops digits. Chinese: no word segmentation, so a run of Chinese characters stays as one token.

3. Test different tokenizers


GET /_analyze
{
	"analyzer":"simple",
	"text":"redis 非常好用 111"
}
  • Result with the standard analyzer: redis, 非, 常, 好, 用, 111
  • Result with the simple analyzer: redis, 非常好用
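
The difference between the two results can be reproduced with a rough Python stand-in. This is only a sketch: the function names `standard_like` and `simple_like` are made up for illustration, the regular expressions approximate the behavior described above, and they are not the real Lucene analyzers (for instance, only the basic CJK block is handled).

```python
import re

# Rough stand-ins for two built-in ES analyzers; NOT the real Lucene code.
CJK = r"\u4e00-\u9fff"  # assumption: basic CJK Unified Ideographs block only


def standard_like(text):
    """Lowercase, keep runs of ASCII letters/digits, emit each CJK character alone."""
    return re.findall(rf"[a-z0-9]+|[{CJK}]", text.lower())


def simple_like(text):
    """Lowercase and keep only runs of letters; digits and punctuation are dropped."""
    return re.findall(r"[^\W\d_]+", text.lower())


text = "redis 非常好用 111"
print(standard_like(text))  # ['redis', '非', '常', '好', '用', '111']
print(simple_like(text))    # ['redis', '非常好用']
```

To see the real tokens, send the GET /_analyze request above to a running ES node, once with "standard" and once with "simple".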

4. IK tokenizer: an open-source tokenizer plugin for ES, hosted on GitHub

Note: the IK tokenizer version must match the ES version exactly

5. What is the difference between ik_max_word and ik_smart?

  • ik_max_word: splits the text at the finest granularity. For example, "I am Xiao Ming's classmate" is split into "I am Xiao Ming's classmate", "I am", "I am Xiao Ming", "Xiao Ming's classmate", "classmate", and so on, exhausting all possible combinations.
  • ik_smart: splits the text at the coarsest granularity. For example, "I am Xiao Ming's classmate" is split into "I am Xiao Ming's classmate".
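
To make the two granularities concrete, here is a toy Python sketch. Everything in it is an assumption for illustration: the sample text, the tiny `DICTIONARY`, and the function names. The fine-grained function emits every dictionary word found at any position, and the coarse-grained one uses plain forward maximum matching. IK's actual algorithm and dictionary are more involved, so real output can differ.

```python
# Hypothetical mini dictionary; IK ships a much larger one.
DICTIONARY = {
    "中华人民共和国", "中华人民", "中华", "华人",
    "人民共和国", "人民", "共和国", "共和", "国歌",
}


def fine_grained(text, dictionary):
    """Emit every dictionary word found at any start position (ik_max_word-like)."""
    tokens = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in dictionary:
                tokens.append(text[start:end])
    return tokens


def coarse_grained(text, dictionary):
    """Forward maximum matching: always take the longest word (ik_smart-like)."""
    longest = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


text = "中华人民共和国国歌"
print(fine_grained(text, DICTIONARY))
# ['中华', '中华人民', '中华人民共和国', '华人', '人民', '人民共和国', '共和', '共和国', '国歌']
print(coarse_grained(text, DICTIONARY))
# ['中华人民共和国', '国歌']
```

A common design choice is ik_max_word at index time, so more terms are searchable, and ik_smart at search time, so the query is not over-split.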
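
The mapping below sets ik_max_word as the analyzer for the name and content fields. The emp level under mappings is a mapping type, which is ES 6.x syntax; ES 7 and later omit the type level.

PUT /emp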
{
	"mappings":{
		"emp":{
			"properties":{
				"name":{
					"type":"text",
					"analyzer":"ik_max_word"
				},
				"age":{
					"type":"integer"
				},
				"bir":{
					"type":"date"
				},
				"content":{
					"type":"text",
					"analyzer":"ik_max_word"
				},
				"address":{
					"type":"keyword"
				}
			}
		}
	}
}

Customize extension words and stop words in IK

1. Extension words

Definition: a word that the stock IK tokenizer does not segment as a keyword, but that you want treated as a keyword

        Typical examples are popular internet slang terms that IK's built-in dictionary does not contain

        IK configuration file: IKAnalyzer.cfg.xml, in the plugins/ik/config directory under the ES installation directory

       Modify the configuration file to add the following configuration:

                <!-- Users can configure their own extension dictionary here -->

                <entry key="ext_dict">ext.dic</entry>

2. Stop words

Definition: a word that the stock IK tokenizer does segment as a keyword, but that for some reason must not appear as a keyword

<entry key="ext_stopwords">stopext.dic</entry>
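
The effect of the two dictionaries can be sketched in Python. This is a toy model under stated assumptions: the helper names `load_dic` and `segment`, the sample sentence, and the tiny base dictionary are all made up, and the matching is simple forward maximum matching instead of IK's real algorithm. It does reflect the dictionary file format, which is one word per line in UTF-8.

```python
def load_dic(contents):
    """Parse dictionary text with one word per line, skipping blank lines."""
    return {line.strip() for line in contents.splitlines() if line.strip()}


def segment(text, words, stop_words=frozenset()):
    """Forward maximum matching over `words`, then drop stop words."""
    longest = max(map(len, words), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or piece in words:
                tokens.append(piece)
                i += size
                break
    return [t for t in tokens if t not in stop_words]


base = {"言论", "的"}       # stands in for IK's built-in dictionary (assumption)
ext = load_dic("杠精\n")    # stands in for the contents of ext.dic
stop = load_dic("的\n")     # stands in for the contents of stopext.dic

text = "杠精的言论"
print(segment(text, base))              # ['杠', '精', '的', '言论']
print(segment(text, base | ext))        # ['杠精', '的', '言论']
print(segment(text, base | ext, stop))  # ['杠精', '言论']
```

Without the extension word, the slang term falls apart into single characters; with it, the term survives as one keyword, and the stop word is then removed from the result.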

3. Configure remote extension dictionary

 

Queries in ES

1. Query (query string / Query DSL)

    Keyword query -----> computes a relevance score for each document, sorts by it, and so on

2. Filter query: relatively efficient

     Selects the documents that meet the conditions --------> no document score is calculated and no score-based sorting is done, and the results of frequently used filters are cached automatically

     You must use a bool query to combine the two kinds of query

     Note: when a filter and a query are used together, the filter clause is executed first, and then the query clause

Filtering is suited to narrowing down a large range of data, while a query is suited to matching data precisely. In general, filter the data first, and then run the query against what remains.
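
As a minimal sketch of such a combination, the Python below builds a bool query body with a scored must clause and an unscored filter clause. The helper name `filtered_query` and the sample values are assumptions; the content and age fields come from the emp mapping earlier in these notes.

```python
import json


def filtered_query(keyword, min_age, max_age):
    """Build a bool query: `filter` narrows by age, `must` scores by keyword."""
    return {
        "query": {
            "bool": {
                # Scored clause: contributes to each document's relevance score.
                "must": [{"match": {"content": keyword}}],
                # Unscored clause: only includes or excludes documents.
                "filter": [{"range": {"age": {"gte": min_age, "lte": max_age}}}],
            }
        }
    }


body = filtered_query("redis", 20, 30)
print(json.dumps(body, indent=2, ensure_ascii=False))
```

The printed JSON can be used as the body of a GET /emp/_search request.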

Origin blog.csdn.net/weixin_37841366/article/details/109412252