Elasticsearch basics: this article is all you need

One: Why learn Elasticsearch?
Elasticsearch is a distributed, open source search and analytics engine. It is built on the open source library Lucene and wraps Lucene in a simpler package, exposing a REST API that we can use directly (just send it an HTTP request).
Two: What is Elasticsearch used for?
Elasticsearch excels at speed and scalability and can index many types of content, so it suits many use cases:

  1. Application search
  2. Site search
  3. Enterprise search
  4. Log processing and analysis
  5. Infrastructure metrics and container monitoring
  6. Application performance monitoring
  7. Geospatial data analysis and visualization
  8. Security analytics
  9. Business analytics

For more detail, see the official Elasticsearch documentation (available in Chinese and English).

Three: Basic concepts

  1. Index
    As a verb, it is equivalent to insert in MySQL (MySQL says "insert a row of data"; Elasticsearch says "index a piece of data").
    As a noun, it is equivalent to a database in MySQL. (To store data in Elasticsearch, you must first create an index.)
  2. Type
    Within an index, multiple types can be defined. A type is similar to a table in MySQL: data of the same kind is stored together under one type. Note that mapping types were deprecated from Elasticsearch 6.x onward and removed in 8.0, so a modern index holds a single kind of document.
  3. Document
    A piece of data saved under a certain type (a row in a MySQL table) of a certain index (a MySQL database) is called a document. Documents are always in JSON format.
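
As a rough illustration of what "indexing a document" looks like over the REST API, the sketch below builds (but does not send) an HTTP request with Python's standard library. The `PUT /<index>/_doc/<id>` path follows the document API of recent Elasticsearch versions; the host `http://localhost:9200`, the index name `movies`, and the `title` field are assumptions chosen only for this example.

```python
import json
from urllib.request import Request


def build_index_request(base_url, index, doc_id, document):
    """Build a PUT request that would index `document` under `index` with id `doc_id`."""
    # Documents are JSON, so the dict is serialized before being used as the body.
    body = json.dumps(document, ensure_ascii=False).encode("utf-8")
    return Request(
        url=f"{base_url}/{index}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, index name, and field; adjust them to your own cluster.
request = build_index_request(
    "http://localhost:9200", "movies", 1, {"title": "Operation Red Sea"}
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would send it to a running cluster; nothing is sent here, so the sketch runs without Elasticsearch installed.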

Four: Elasticsearch and the inverted index
Elasticsearch can help us find the data we need faster because of its inverted index mechanism.
Inverted index table:

Word          Records
Red Sea       1,2,3,4,5
operation     1,2,3
explore       2,5
special       3,5
documentary   4
agent         5

Word segmentation: split the whole sentence into words.

Saved records:

  • 1. Operation Red Sea
  • 2. Exploring Operation Red Sea
  • 3. Red Sea Special Operation
  • 4. Red Sea Documentary
  • 5. Agent Red Sea Special Exploration

Searches:

  1. Operation Red Sea Agent?
  2. Operation Red Sea?

  • In MySQL, a fuzzy search uses LIKE '%keyword%', which is very inefficient because a pattern with a leading wildcard cannot use the index.

  • When Elasticsearch saves records, such as the 5 records above, it also maintains an inverted index table. For example, when storing record 1, "Operation Red Sea", it proceeds as follows:

  1. Word segmentation: split "Operation Red Sea" into 2 words, "Red Sea" and "operation". (Of course, the Chinese title 红海行动 could also be split into 4 single-character words: "红", "海", "行", and "动".) The inverted index table then receives the following entries (think of 1 as the id of the record):

Word          Records
Red Sea       1
operation     1

  2. Next, we save record 2, "Exploring Operation Red Sea". The text is segmented again, say into 3 words: "explore", "Red Sea", and "operation". The inverted index table becomes:

Word          Records
Red Sea       1,2
operation     1,2
explore       2

  3. In the same way, after saving record 3, "Red Sea Special Operation", the inverted index table becomes:

Word          Records
Red Sea       1,2,3
operation     1,2,3
explore       2
special       3

  4. After saving record 4, "Red Sea Documentary", the inverted index table becomes:

Word          Records
Red Sea       1,2,3,4
operation     1,2,3
explore       2
special       3
documentary   4

  5. After saving record 5, "Agent Red Sea Special Exploration", the inverted index table finally becomes:

Word          Records
Red Sea       1,2,3,4,5
operation     1,2,3
explore       2,5
special       3,5
documentary   4
agent         5
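
The walkthrough above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Elasticsearch or Lucene implement it internally: the word lists are an assumed segmentation of the five records, and `build_inverted_index` is a hypothetical helper name.

```python
from collections import defaultdict

# Assumed segmentation of the five saved records (record id -> words).
# "special" appears only in records 3 and 5.
RECORDS = {
    1: ["Red Sea", "operation"],
    2: ["explore", "Red Sea", "operation"],
    3: ["Red Sea", "special", "operation"],
    4: ["Red Sea", "documentary"],
    5: ["agent", "Red Sea", "special", "explore"],
}


def build_inverted_index(records):
    """Map each word to the ordered list of record ids that contain it."""
    index = defaultdict(list)
    # Save the records one at a time, in id order, as in the walkthrough.
    for record_id in sorted(records):
        for word in records[record_id]:
            if record_id not in index[word]:
                index[word].append(record_id)
    return dict(index)


inverted_index = build_inverted_index(RECORDS)
for word, record_ids in inverted_index.items():
    print(f"{word:<12} {','.join(map(str, record_ids))}")
```

Each loop iteration corresponds to saving one record: its words are looked up in the table and the record id is appended to each word's list.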

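Now suppose we want to retrieve:

  1. Operation Red Sea Agent?
  2. Operation Red Sea?

Retrieval steps:

  1. Split the search text into words.

Saved records:

  • 1. Operation Red Sea
  • 2. Exploring Operation Red Sea
  • 3. Red Sea Special Operation
  • 4. Red Sea Documentary
  • 5. Agent Red Sea Special Exploration

Example: when we search for query 1, "Operation Red Sea Agent", Elasticsearch first splits it into three words: "Red Sea", "agent", and "operation". The inverted index table shows that every one of the 5 records contains at least one of these words, but we only want the best match, so each record Elasticsearch returns carries a relevance score. Records 1, 2, 3, and 5 each hit 2 of the 3 words, while record 4 hits only 1. Which of the 2-hit records matches best?

Compare their hit rates. Record 3 was split into 3 words and 2 of them hit, a rate of 2/3; record 5 has 4 words with 2 hits, and since 2/3 > 2/4 = 1/2, record 3 scores higher than record 5. By the same measure record 2 also scores 2/3, and record 1 scores 2/2 = 1, so record 1 has the highest relevance score and is the data most similar to what we searched for (similar to a fuzzy query in MySQL). This hit rate is a simplification for illustration; Elasticsearch's actual scoring (BM25 by default) also takes into account how rare each word is across all records.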
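
The hit-rate comparison can be checked with a small Python sketch. It assumes the same segmentation of the five records as the inverted index table, and it ranks by number of hits first and hit rate second; `search` and `build_inverted_index` are hypothetical helper names, and this simplified score is not Elasticsearch's real relevance formula.

```python
from collections import defaultdict

# Assumed segmentation of the five saved records (record id -> words).
RECORDS = {
    1: ["Red Sea", "operation"],
    2: ["explore", "Red Sea", "operation"],
    3: ["Red Sea", "special", "operation"],
    4: ["Red Sea", "documentary"],
    5: ["agent", "Red Sea", "special", "explore"],
}


def build_inverted_index(records):
    """Map each word to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, words in records.items():
        for word in words:
            index[word].add(record_id)
    return index


def search(query_words, records):
    """Return (record id, hits, hit rate) tuples, best match first."""
    index = build_inverted_index(records)
    hits = defaultdict(int)
    # Look up each query word in the inverted index instead of scanning records.
    for word in set(query_words):
        for record_id in index.get(word, ()):
            hits[record_id] += 1
    scored = [
        (record_id, count, count / len(records[record_id]))
        for record_id, count in hits.items()
    ]
    # Most hits first, then highest hit rate, then lowest id for stable output.
    return sorted(scored, key=lambda r: (-r[1], -r[2], r[0]))


results = search(["Red Sea", "agent", "operation"], RECORDS)
for record_id, count, rate in results:
    print(f"record {record_id}: {count} hits, hit rate {rate:.2f}")
```

Under this measure record 1 comes first (2 of its 2 words hit), records 2 and 3 tie at 2/3, record 5 follows at 2/4, and record 4 is last with a single hit.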


Origin blog.csdn.net/lq1759336950/article/details/114416740