One: Why learn Elasticsearch?
Elasticsearch is a distributed, open-source search and analytics engine. It is built on top of the open-source library Lucene: Elasticsearch wraps Lucene and exposes its features through a REST API, so using it is simply a matter of sending HTTP requests.
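As a rough illustration of "just send a request", the sketch below builds (but does not send) an HTTP search request with Python's standard library. The address `http://localhost:9200`, the index name `movies`, and the field name `title` are assumptions made for this example; the `_search` endpoint and the `match` query belong to Elasticsearch's REST API.

```python
import json
import urllib.request

# Assumed address of a local Elasticsearch node (9200 is the default HTTP port).
ES_URL = "http://localhost:9200"


def build_search_request(index: str, field: str, text: str) -> urllib.request.Request:
    """Build a full-text search request for `text` in `field` of `index`."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical index and field names, used only for illustration.
request = build_search_request("movies", "title", "Red Sea")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would actually send it, provided a node is listening at that address.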
Two: What is the purpose of Elasticsearch?
Elasticsearch excels in speed and scalability, and it can index multiple types of content, which means it can be used for multiple use cases:
- Application search
- Site search
- Enterprise Search
- Log processing and analysis
- Infrastructure metrics and container monitoring
- Application performance monitoring
- Geospatial data analysis and visualization
- Security analytics
- Business analytics
Elasticsearch official Chinese documentation
Elasticsearch official English documentation
Three: Basic concepts
- Index
  - As a verb, it is equivalent to insert in MySQL: MySQL "inserts a row", while Elasticsearch "indexes a document".
  - As a noun, it is equivalent to a database in MySQL. To store data in Elasticsearch, you must first create an index.
- Type
  - Within an index, one or more types can be defined. A type is similar to a table in MySQL: each type groups data of the same kind. (Mapping types were deprecated in later Elasticsearch versions and removed in 8.0.)
- Document
  - A piece of data saved under a certain type of a certain index is called a document, like a row in a table of a MySQL database. Documents are all in JSON format.
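To make these terms concrete, here is a small sketch of what "indexing a document" looks like as data. The index name `movies`, the document ID, and the `title` field are made up for this example; the `/<index>/_doc/<id>` path follows the document API of recent Elasticsearch versions, where the type segment is fixed to `_doc`.

```python
import json

# A document is just a JSON object; this one is a hypothetical example.
document = {"title": "Operation Red Sea"}


def index_request_line(index: str, doc_id: int) -> str:
    """Return the HTTP request line used to index (store) one document."""
    return f"PUT /{index}/_doc/{doc_id}"


request_line = index_request_line("movies", 1)
request_body = json.dumps(document)

print(request_line)  # the "where": index name plus document ID
print(request_body)  # the "what": the document itself, as JSON
```

In MySQL terms, the index name plays the role of the database and the JSON body plays the role of the row being inserted.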
Four: Elasticsearch and the inverted index
Elasticsearch can help us find the data we need faster because of its inverted index mechanism.
Inverted index table:
Word | Records |
---|---|
Red Sea | 1,2,3,4,5 |
action | 1,2,3 |
explore | 2,5 |
special | 3,5 |
documentary | 4 |
agent | 5 |
Word segmentation: splitting a whole sentence into individual words.
Saved records:
- 1. Operation Red Sea (红海行动)
- 2. Exploring Operation Red Sea (探索红海行动)
- 3. Red Sea Special Operation (红海特别行动)
- 4. Red Sea Documentary (红海纪录片)
- 5. Special Agent Red Sea Special Exploration (特工红海特别探索)
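Word segmentation can be sketched with a toy dictionary-based tokenizer, run here on the original Chinese titles of the records (红海行动 is "Operation Red Sea"). This is only an illustration under assumptions: the vocabulary is hand-written for these five records, and forward maximum matching is a simple stand-in, not the analyzer Elasticsearch actually uses.

```python
# Hand-written vocabulary for the five sample records (an assumption for this sketch).
VOCABULARY = {"红海", "行动", "探索", "特别", "纪录片", "特工"}
MAX_WORD_LENGTH = max(len(word) for word in VOCABULARY)


def segment(text: str) -> list[str]:
    """Split text into words by forward maximum matching against VOCABULARY."""
    words = []
    position = 0
    while position < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for length in range(min(MAX_WORD_LENGTH, len(text) - position), 0, -1):
            candidate = text[position:position + length]
            # A single character is always accepted as a fallback.
            if candidate in VOCABULARY or length == 1:
                words.append(candidate)
                position += length
                break
    return words


print(segment("红海行动"))
print(segment("特工红海特别探索"))
```

With an empty vocabulary the same function falls back to single characters, which corresponds to the character-by-character split of a title.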
Search queries:
- Red Sea Agent Operation (红海特工行动)?
- Operation Red Sea (红海行动)?
- In MySQL, a fuzzy search uses LIKE '%keyword%', which makes the query particularly slow because the index cannot be used.
- When Elasticsearch saves records, such as the 5 records above, it maintains an additional inverted index table. For example, storing record 1, "Operation Red Sea", works as follows:
- Word segmentation: split "Operation Red Sea" (红海行动) into 2 words, "Red Sea" (红海) and "action" (行动). (It could also be split into 4 single characters: "红", "海", "行", "动".) The inverted index table then receives the following entries, where 1 is the ID of the record:
Word | Records |
---|---|
Red Sea | 1 |
action | 1 |
- Next, we save record 2, "Exploring Operation Red Sea". Its text is segmented as well, say into 3 words: "explore", "Red Sea", and "action". The inverted index table now becomes:
Word | Records |
---|---|
Red Sea | 1,2 |
action | 1,2 |
explore | 2 |
- In the same way, after saving record 3, "Red Sea Special Operation", the inverted index table becomes:
Word | Records |
---|---|
Red Sea | 1,2,3 |
action | 1,2,3 |
explore | 2 |
special | 3 |
- After saving record 4, "Red Sea Documentary", the inverted index table becomes:
Word | Records |
---|---|
Red Sea | 1,2,3,4 |
action | 1,2,3 |
explore | 2 |
special | 3 |
documentary | 4 |
- After saving record 5, "Special Agent Red Sea Special Exploration", the inverted index table becomes:
Word | Records |
---|---|
Red Sea | 1,2,3,4,5 |
action | 1,2,3 |
explore | 2,5 |
special | 3,5 |
documentary | 4 |
agent | 5 |
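The step-by-step construction above can be sketched in a few lines. This is a minimal in-memory model, not how Elasticsearch stores its index: each record is written as the list of words it was split into (with 特别 rendered as `special`), and the names `RECORDS` and `build_inverted_index` are invented for the example.

```python
from collections import defaultdict

# The five sample records, already segmented into words.
RECORDS = {
    1: ["Red Sea", "action"],
    2: ["explore", "Red Sea", "action"],
    3: ["Red Sea", "special", "action"],
    4: ["Red Sea", "documentary"],
    5: ["agent", "Red Sea", "special", "explore"],
}


def build_inverted_index(records: dict[int, list[str]]) -> dict[str, list[int]]:
    """Map each word to the ascending list of record IDs that contain it."""
    index = defaultdict(list)
    for record_id in sorted(records):
        # dict.fromkeys drops repeated words while keeping their order,
        # so a record ID is listed at most once per word.
        for word in dict.fromkeys(records[record_id]):
            index[word].append(record_id)
    return dict(index)


inverted_index = build_inverted_index(RECORDS)
for word, record_ids in inverted_index.items():
    print(f"{word} | {','.join(map(str, record_ids))}")
```

Looking up a word is then a single dictionary access, which is the reason a search does not need to scan every record.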
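Now suppose we want to search for:
- Red Sea Agent Operation (红海特工行动)?
- Operation Red Sea (红海行动)?
Search steps:
- Split the search text into words.
Saved records:
- 1. Operation Red Sea
- 2. Exploring Operation Red Sea
- 3. Red Sea Special Operation
- 4. Red Sea Documentary
- 5. Special Agent Red Sea Special Exploration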
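For example, when we search for the first query, "Red Sea Agent Operation" (红海特工行动), Elasticsearch first splits it into three words: "Red Sea", "agent", and "action". The inverted index table shows that every one of the 5 records contains at least one of these words, but we only want the best match, so each record Elasticsearch returns carries a relevance score. Records 1, 2, 3, and 5 each hit 2 of the words, while record 4 hits only 1, so the best match is among those four. But which one matches best?
Record 3 was split into 3 words, 2 of which are hit, a hit ratio of 2/3. Record 5 has 4 words and 2 hits, a ratio of 2/4 = 1/2. Since 2/3 > 1/2, record 3 scores higher than record 5. By the same measure, record 2 scores 2/3 and record 1 scores 2/2 = 1, so under this simplified scoring record 1 has the highest relevance score and is the most similar data (comparable to a fuzzy query in MySQL).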
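The hit-ratio idea can be sketched as follows. This is a toy model of relevance built only from the example's numbers, not Elasticsearch's real scoring function, and the names `INVERTED_INDEX`, `RECORD_LENGTHS`, and `score_records` are invented for the illustration. The sketch scores every record that matches at least one query word, so records 1, 2, and 4 are ranked alongside 3 and 5.

```python
# Inverted index from the walkthrough: word -> IDs of the records containing it.
INVERTED_INDEX = {
    "Red Sea": [1, 2, 3, 4, 5],
    "action": [1, 2, 3],
    "explore": [2, 5],
    "special": [3, 5],
    "documentary": [4],
    "agent": [5],
}
# Number of words each record was split into.
RECORD_LENGTHS = {1: 2, 2: 3, 3: 3, 4: 2, 5: 4}


def score_records(query_words: list[str]) -> dict[int, tuple[int, float]]:
    """Return {record_id: (hit_count, hit_ratio)} for every matching record."""
    hits: dict[int, int] = {}
    for word in dict.fromkeys(query_words):  # ignore repeated query words
        for record_id in INVERTED_INDEX.get(word, []):
            hits[record_id] = hits.get(record_id, 0) + 1
    return {rid: (count, count / RECORD_LENGTHS[rid]) for rid, count in hits.items()}


scores = score_records(["Red Sea", "agent", "action"])
# More hits rank first; among equal hit counts, the higher hit ratio wins.
ranking = sorted(scores, key=lambda rid: (-scores[rid][0], -scores[rid][1], rid))
for rid in ranking:
    hit_count, ratio = scores[rid]
    print(rid, hit_count, round(ratio, 2))
```

Sorting by hit count first and by ratio second mirrors the two-step reasoning in the text: first find the records that hit the most words, then break ties by how much of each record was hit.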