Microservices: Getting Started with the Distributed Search Engine Elasticsearch

⛄Introduction

Elasticsearch is a very powerful open source search engine with many features that help us quickly find what we need in massive amounts of data.

1. What is Elasticsearch?

Elasticsearch is a distributed, RESTful search and data analytics engine that addresses a wide variety of emerging use cases. As the heart of the Elastic Stack, Elasticsearch stores your data centrally, letting you search at lightning speed, fine-tune relevance, perform powerful analytics, and scale with ease.

ElasticSearch official website

What ES is used for

For example: searching for answers on Baidu, or searching for products on JD.com and Taobao.

ELK technology stack

Elasticsearch combined with Kibana, Logstash, and Beats forms the Elastic Stack (ELK). It is widely used in log data analysis, real-time monitoring, and other fields.

Elasticsearch is the core of the Elastic Stack, responsible for storing, searching, and analyzing data.

Elasticsearch and Lucene

Under the hood, Elasticsearch is built on Lucene.

Lucene is a search engine library written in Java. It is a top-level project of the Apache Software Foundation and was developed by Doug Cutting in 1999. Official website: lucene.apache.org/ .

The development history of Elasticsearch:

  • In 2004, Shay Banon developed Compass based on Lucene
  • In 2010, Shay Banon rewrote Compass and named it Elasticsearch

Compared with Lucene, Elasticsearch has the following advantages:

  • It is distributed and supports horizontal scaling
  • It provides a RESTful interface that can be called from any language

Why not other search technologies?

As of 2023, well-known search engines are as follows:

[Figure: well-known search engines as of 2023]

Apache Solr was the most important search engine technology in the early days, but Elasticsearch has gradually overtaken it and taken the lead.

Elasticsearch is now the leading open source distributed search engine.

2. The Elasticsearch Inverted Index

⛅ Forward index

What is a forward index? Take the id column of a table (tb_goods, a product table) as an example, and suppose an index is created on it.

A query by id goes straight through the index, so it is very fast.

A fuzzy query on the title, however, generally cannot use an index because the pattern starts with a wildcard, so the data can only be scanned row by row. The process is as follows:

  • The user searches with the condition that the title matches "%手机%" (手机 means "mobile phone")

  • Rows are fetched one at a time, for example the row with id 1

  • The title of the row is checked against the user's search condition

  • If it matches, the row is added to the result set; if not, it is discarded. The scan then moves on to the next row and repeats the check

A row-by-row scan is a full table scan, and its query efficiency keeps dropping as the amount of data grows. Once the data reaches millions or even tens of millions of rows, the performance is disastrous.
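
The difference between the two access paths can be sketched in a few lines of Python. This is only an illustration: the `TB_GOODS` dictionary and its titles are made-up stand-ins for the tb_goods table, and the `rows_examined` counter exists purely to show how much work the scan does.

```python
# Hypothetical stand-in for the tb_goods table: id -> title.
TB_GOODS = {
    1: "Xiaomi phone",
    2: "Huawei phone",
    3: "Huawei Xiaomi charger",
    4: "Xiaomi band",
}


def find_by_id(goods_id):
    """Lookup through the id index: a single keyed access."""
    return TB_GOODS.get(goods_id)


def find_by_title_like(keyword):
    """Emulate title LIKE '%keyword%': every row has to be examined."""
    matches = []
    rows_examined = 0
    for goods_id, title in TB_GOODS.items():
        rows_examined += 1
        if keyword in title:
            matches.append(goods_id)
    return matches, rows_examined


print(find_by_id(2))
print(find_by_title_like("phone"))
```

The id lookup touches one entry, while the title search examines every row regardless of how many rows match.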

⚡Inverted index

The inverted index is defined in contrast to a forward index such as the ones used in MySQL.

There are two very important concepts in an inverted index:

  • Document: the data being searched. Each piece of data is a document, for example a web page or a product record
  • Term: the meaningful words obtained by segmenting document data or the user's search input with some algorithm. For example, 我是中国人 ("I am Chinese") can be segmented into the terms 我 (I), 是 (am), 中国人 (Chinese person), 中国 (China), and 国人 (countryman)

Creating an inverted index is a special treatment of the forward index. The process is as follows:

  • Segment the data of each document with an algorithm to obtain its terms
  • Create a table in which each row holds a term, the ids of the documents that contain it, the positions where it occurs, and similar information
  • Because terms are unique, an index can be created on the term column, for example a hash table index
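
The three steps above can be sketched as follows. This is a minimal illustration, not how Elasticsearch or Lucene implement it: the sample documents are hypothetical, and `tokenize` simply lowercases and splits on whitespace, whereas a real analyzer (especially for Chinese text) is far more involved.

```python
from collections import defaultdict

# Hypothetical sample documents: id -> title.
DOCS = {
    1: "Xiaomi phone",
    2: "Huawei phone",
    3: "Huawei Xiaomi charger",
    4: "Xiaomi band",
}


def tokenize(text):
    """Assumed, simplified segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the (document id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].append((doc_id, position))
    # A Python dict is a hash table, so term lookup needs no scan.
    return dict(index)


inverted_index = build_inverted_index(DOCS)
for term, postings in sorted(inverted_index.items()):
    print(term, postings)
```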

The search process with an inverted index is as follows (taking a search for "华为手机", "Huawei mobile phone", as an example):

1) The user enters the search condition "华为手机".

2) The user's input is segmented into the terms 华为 (Huawei) and 手机 (mobile phone).

3) The terms are looked up in the inverted index, which yields the ids of the documents that contain them: 1, 2, and 3.

4) The document ids are used to fetch the actual documents from the forward index.

Although this means querying the inverted index first and then the forward index, both the terms and the document ids are indexed, so the query is very fast and no full table scan is required.
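
The four search steps can be sketched in Python as well. Everything here is an assumption made for illustration: the English sample titles stand in for the product data, `tokenize` is a naive whitespace splitter, and a document counts as a hit if it contains any of the query terms.

```python
# Hypothetical forward index: document id -> document.
DOCS = {
    1: "Xiaomi phone",
    2: "Huawei phone",
    3: "Huawei Xiaomi charger",
    4: "Xiaomi band",
}


def tokenize(text):
    """Assumed, simplified segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Inverted index: term -> set of ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(query, index, docs):
    doc_ids = set()
    for term in tokenize(query):           # step 2: segment the input
        doc_ids |= index.get(term, set())  # step 3: look up each term
    # Step 4: fetch the documents from the forward index by id.
    return {doc_id: docs[doc_id] for doc_id in sorted(doc_ids)}


INDEX = build_index(DOCS)
results = search("Huawei phone", INDEX, DOCS)  # step 1: the user's input
print(results)
```

With this sample data the search returns documents 1, 2, and 3, and at no point does it loop over all documents at query time.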

⛄Forward and inverted

So why is one called a forward index and the other an inverted index?

  • A forward index is the most traditional approach: indexing by id. When querying by term, however, you must fetch each document in turn and then check whether it contains the required term. This is the process of finding terms based on documents.

  • An inverted index works the other way around. It first finds the term the user is searching for, uses the term to obtain the ids of the documents that contain it, and then fetches the documents by id. This is the process of finding documents based on terms.

The two are exact opposites.

So what are the pros and cons of the two approaches?

Forward index:

  • Advantages:
    • Indexes can be created on multiple fields
    • Searching and sorting on indexed fields is very fast
  • Disadvantages:
    • Searching on non-indexed fields, or on some of the terms within an indexed field, requires a full table scan

Inverted index:

  • Advantages:
    • Term-based search and fuzzy search are very fast
  • Disadvantages:
    • Indexes can only be created on terms, not on fields
    • Sorting by field is not possible

3. Some concepts of ES

Elasticsearch has many concepts of its own. They differ slightly from those of MySQL, but there are also similarities.

⛅ Documents and Fields

Elasticsearch stores data in a document-oriented way. A document can be a product record or an order record from a database. Document data is serialized into JSON format and stored in Elasticsearch.

JSON documents usually contain many fields (Field), which are similar to columns in a database.
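
As a small illustration of a record becoming a JSON document, the sketch below serializes a made-up product record. The field names and values are hypothetical and are not taken from any real table.

```python
import json

# Hypothetical product record, as it might come out of a database row.
product = {"id": 1, "title": "Xiaomi phone", "price": 3499}

# The document is the JSON serialization of the record.
document = json.dumps(product)
print(document)

# Each top-level key of the JSON document is a field.
fields = sorted(json.loads(document).keys())
print(fields)
```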

⚡ Indexes and Mappings

An index (Index) is a collection of documents of the same type.

For example:

  • All user documents can be grouped together as the user index
  • All product documents can be grouped together as the product index
  • All order documents can be grouped together as the order index

Therefore, an index can be thought of as a table in a database.

A database table has constraint information that defines the table structure, the field names, the field types, and so on. Likewise, an index has a mapping, which holds the field constraints for the documents in the index, similar to the structural constraints of a table.
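
To make the idea of a mapping concrete, here is a sketch of what the body of an index-creation request might look like, written as a Python dictionary. The `mappings` and `properties` keys and the `keyword`, `text`, and `integer` field types are Elasticsearch's own; the field names are hypothetical, and `field_types` is only a helper written for this example.

```python
import json

# Hypothetical mapping for a product index; the field names are illustrative.
goods_index_body = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},     # exact-value field, not segmented
            "title": {"type": "text"},     # full-text field, segmented into terms
            "price": {"type": "integer"},  # numeric field
        }
    }
}


def field_types(index_body):
    """Helper for this sketch: list each field with its declared type."""
    properties = index_body["mappings"]["properties"]
    return {name: spec["type"] for name, spec in properties.items()}


print(json.dumps(goods_index_body, indent=2))
print(field_types(goods_index_body))
```

Read this way, the mapping plays the same role as the column definitions in a CREATE TABLE statement.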

4. MySQL and Elasticsearch

Let's compare the concepts of MySQL and Elasticsearch side by side:

| MySQL | Elasticsearch | Description |
| --- | --- | --- |
| Table | Index | An index is a collection of documents, similar to a table in a database |
| Row | Document | A document is a piece of data, similar to a row in a database; documents are in JSON format |
| Column | Field | A field is a field in a JSON document, similar to a column in a database |
| Schema | Mapping | A mapping is the set of constraints on the documents in an index, such as field type constraints, similar to a table structure (schema) in a database |
| SQL | DSL | DSL is the JSON-style request language provided by Elasticsearch, used to operate Elasticsearch and implement CRUD |
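
To give a feel for the last row of the comparison, the sketch below builds the JSON body of a `match` query, one of the basic full-text queries in the Elasticsearch DSL. The `match_query` helper and the `title` field are assumptions for this example; in practice such a body would be sent to the `_search` endpoint of an index.

```python
import json


def match_query(field, text):
    """Build the body of a full-text `match` query on one field."""
    return {"query": {"match": {field: text}}}


# Hypothetical field name and search text.
body = match_query("title", "Huawei phone")
print(json.dumps(body, indent=2))
```

Where SQL expresses a request as a statement string, the DSL expresses it as a nested JSON structure.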

Does this mean we no longer need MySQL once we have learned Elasticsearch?

Not really. Each has its own strengths:

  • MySQL: good at transactional operations, and can guarantee data safety and consistency

  • Elasticsearch: good at searching, analyzing, and computing over massive amounts of data

Therefore, enterprises often use the two together:

  • Write operations with high safety requirements are implemented with MySQL
  • Search needs that demand high query performance are implemented with Elasticsearch
  • Data is synchronized between the two by some mechanism to keep them consistent
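
The text does not say which synchronization mechanism is used, so the following is only one possible shape, sketched with in-memory stand-ins: every write goes to the database first and is then applied to the search index in the same call. The `ProductService` class, its two dictionaries, and the whitespace tokenization are all hypothetical. Real systems often synchronize asynchronously instead, for example through a message queue.

```python
class ProductService:
    """Sketch: writes go to the database, searches go to the index."""

    def __init__(self):
        self.database = {}      # stand-in for MySQL, the source of truth
        self.search_index = {}  # stand-in for Elasticsearch: term -> ids

    def save(self, product_id, title):
        old_title = self.database.get(product_id)
        self.database[product_id] = title         # 1. write to the database
        self._sync(product_id, old_title, title)  # 2. update the index

    def _sync(self, product_id, old_title, new_title):
        # Remove postings for the old version, then add the new ones.
        if old_title:
            for term in old_title.lower().split():
                self.search_index.get(term, set()).discard(product_id)
        for term in new_title.lower().split():
            self.search_index.setdefault(term, set()).add(product_id)

    def search(self, term):
        ids = sorted(self.search_index.get(term.lower(), set()))
        return [self.database[product_id] for product_id in ids]


service = ProductService()
service.save(1, "Xiaomi phone")
service.save(1, "Xiaomi band")  # update: the index must follow the database
print(service.search("phone"))
print(service.search("band"))
```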

⛵Summary

The above is [ Bug Terminator ]'s brief introduction to Elasticsearch, the distributed search engine for microservices. It covered what Elasticsearch is, how forward and inverted indexes work, the core concepts of documents, fields, indexes, and mappings, and how Elasticsearch compares with MySQL. Technology changes the world!!!


From:
Elastic Search, the initial distributed search engine for microservices - Nuggets

Origin blog.csdn.net/qq_34626094/article/details/130176856