[Elasticsearch] Getting to know elasticsearch

Table of contents

Getting to know elasticsearch

1.1. Understanding ES

1.1.1. The role of elasticsearch

1.1.2. ELK technology stack

1.1.3. elasticsearch and lucene

1.1.4. Why not other search techniques?

1.1.5. Summary

1.2. Inverted index

1.2.1. Forward index

1.2.2. Inverted index

1.2.3. Forward and reverse

1.3. Some ES concepts

1.3.1. Documents and fields

1.3.2. Indexes and Mappings

1.3.3. mysql and elasticsearch

Getting to know elasticsearch

1.1. Understanding ES

1.1.1. The role of elasticsearch

Elasticsearch is a very powerful open source search engine whose many features help us quickly find what we need in massive amounts of data.

For example:

  • Search code on GitHub

  • Search for products on e-commerce sites

  • Search for answers in Baidu

  • Search for nearby cars in a ride-hailing app

1.1.2. ELK technology stack

Elasticsearch combined with Kibana, Logstash, and Beats forms the Elastic Stack (ELK). It is widely used in areas such as log data analysis and real-time monitoring.

Elasticsearch is the core of the elastic stack, responsible for storing, searching, and analyzing data.

1.1.3. elasticsearch and lucene

Under the hood, elasticsearch is built on Lucene.

Lucene is a search engine class library written in Java. It is a top-level project of the Apache Software Foundation and was developed by Doug Cutting in 1999. Official website: https://lucene.apache.org/

The development history of elasticsearch:

  • In 2004, Shay Banon developed Compass based on Lucene

  • In 2010, Shay Banon rewrote Compass, named Elasticsearch.

1.1.4. Why not other search techniques?

Ranking of the currently well-known search engine technologies:

Although Apache Solr was the leading search engine technology in the early days, elasticsearch has gradually overtaken it as it has developed and now leads the field.

1.1.5. Summary

What is elasticsearch?

  • An open source distributed search engine that can be used to implement functions such as search, log statistics, analysis, and system monitoring

What is elastic stack (ELK)?

  • A technology stack with elasticsearch as the core, including beats, Logstash, kibana, elasticsearch

What is Lucene?

  • It is Apache's open source search engine class library, which provides the core API of the search engine

1.2. Inverted index

The concept of an inverted index is defined in contrast to a forward index, such as the indexes used by MySQL.

1.2.1. Forward index

So what is a forward index? For example, suppose an index is created on the id column of a goods table (tb_goods).

If you query by id, the query goes straight to the index and is very fast.

But a fuzzy query on the title can only scan the data row by row. The process is as follows:

1) The user searches for data with the condition that the title matches "%手机%" (手机 means "mobile phone")

2) Fetch the data row by row, for example the row with id 1

3) Check whether the title in that row meets the user's search condition

4) If it matches, add the row to the result set; otherwise discard it. Then go back to step 2 for the next row

Row-by-row scanning is a full table scan, and its query efficiency drops steadily as the amount of data grows. Once the data reaches millions of rows, it becomes a disaster.
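The difference between the two lookups can be sketched in a few lines of Python. This is only an illustration, not how MySQL works internally: the rows in TB_GOODS are made-up sample data (with English titles standing in for the Chinese ones), a plain dictionary stands in for the id index, and the function names are hypothetical.

```python
# Hypothetical rows standing in for tb_goods: (id, title, price)
TB_GOODS = [
    (1, "Xiaomi phone", 3499),
    (2, "Huawei phone", 4999),
    (3, "Huawei Xiaomi charger", 49),
    (4, "Xiaomi band", 299),
]

# Stand-in for the forward index on id: a direct id -> row lookup
ID_INDEX = {row[0]: row for row in TB_GOODS}


def find_by_id(goods_id):
    """Indexed lookup: one dictionary access, no scan."""
    return ID_INDEX.get(goods_id)


def find_by_title_like(keyword):
    """Emulates title LIKE '%keyword%': every row has to be checked."""
    results = []
    rows_checked = 0
    for row in TB_GOODS:
        rows_checked += 1
        if keyword in row[1]:
            results.append(row)
    return results, rows_checked


matches, rows_checked = find_by_title_like("phone")
print(find_by_id(2), [row[0] for row in matches], rows_checked)
```

However few rows match, find_by_title_like always checks every row in the table, which is why the cost grows with the size of the data.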

1.2.2. Inverted index

There are two very important concepts in an inverted index:

  • Document: the data being searched; each piece of data is a document, for example a web page or a product record

  • Term: the meaningful words obtained by segmenting document data or user search input with some algorithm. For example, 我是中国人 ("I am Chinese") can be split into terms such as 我, 是, 中国人, 中国, and 国人

Creating an inverted index is a special processing step applied on top of a forward index. The process is as follows:

  • Use a word segmentation algorithm to split the data of each document into terms

  • Create a table in which each row holds a term, the ids of the documents containing it, its position, and similar information

  • Because terms are unique, an index can be built on the term column, for example a hash table index

As shown in the picture:
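The construction steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than what elasticsearch or Lucene actually does: DOCS is made-up sample data with English titles, segment is a hypothetical stand-in that just lowercases and splits on whitespace (real analyzers, especially for Chinese, are far more involved), and a Python dictionary plays the role of the hash-style index on the term column.

```python
from collections import defaultdict

# Hypothetical documents: id -> title
DOCS = {
    1: "Xiaomi phone",
    2: "Huawei phone",
    3: "Huawei Xiaomi charger",
    4: "Xiaomi band",
}


def segment(text):
    """Toy word segmentation: lowercase, then split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to a list of (document id, position) pairs."""
    index = defaultdict(list)
    for doc_id, title in docs.items():
        for position, term in enumerate(segment(title)):
            index[term].append((doc_id, position))
    return dict(index)


INDEX = build_inverted_index(DOCS)
for term, postings in INDEX.items():
    print(term, postings)
```

Each key appears exactly once, which mirrors the point that terms are unique and can therefore be indexed themselves.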

The search process of the inverted index is as follows (taking a search for "华为手机", i.e. "Huawei mobile phone", as an example):

1) The user enters the search condition "华为手机".

2) Segment the user's input to get the terms 华为 ("Huawei") and 手机 ("mobile phone").

3) Look up the terms in the inverted index to get the ids of the documents that contain them: 1, 2, and 3.

4) Use the document ids to find the actual documents in the forward index.

As shown in the picture:

Although this means querying the inverted index first and then the forward index, both the terms and the document ids are indexed, so the query is very fast and no full table scan is required.
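The four search steps can be sketched in the same spirit. Everything here is an assumption made for illustration: DOCS is made-up sample data (English titles standing in for the Chinese ones), segment is a toy whitespace splitter, and search simply takes the union of the document ids of all query terms, with no relevance scoring.

```python
# Hypothetical documents; this dict also acts as the forward index (id -> document)
DOCS = {
    1: "Xiaomi phone",
    2: "Huawei phone",
    3: "Huawei Xiaomi charger",
    4: "Xiaomi band",
}


def segment(text):
    """Toy word segmentation: lowercase, then split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents containing it."""
    index = {}
    for doc_id, title in docs.items():
        for term in segment(title):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(query, docs, index):
    """Segment the query, collect ids from the inverted index, then fetch the documents."""
    doc_ids = set()
    for term in segment(query):
        doc_ids |= index.get(term, set())
    return [(doc_id, docs[doc_id]) for doc_id in sorted(doc_ids)]


INDEX = build_inverted_index(DOCS)
print(search("Huawei phone", DOCS, INDEX))
```

With this sample data, the query terms "huawei" and "phone" lead to documents 1, 2, and 3 using only dictionary lookups; no title is scanned at query time.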

1.2.3. Forward and reverse

So why is one called a forward index and the other an inverted index?

  • A forward index is the most traditional kind of index, built on the id. When querying by term, however, you must fetch each document one by one and then check whether it contains the required term. This is the process of finding terms based on documents.

  • An inverted index is the opposite. It first finds the term the user is searching for, uses the term to get the ids of the documents that contain it, and then fetches the documents by id. This is the process of finding documents based on terms.

Exactly the other way around, isn't it?

So what are the pros and cons of both approaches?

Forward index:

  • Advantages:

    • Indexes can be created on multiple fields

    • Searching and sorting on indexed fields is very fast

  • Disadvantages:

    • Searching on non-indexed fields, or on part of the content of an indexed field, requires a full table scan

Inverted index:

  • Advantages:

    • Term-based search and fuzzy search are very fast

  • Disadvantages:

    • Indexes can only be created on terms, not on fields

    • Cannot sort by field

1.3. Some ES concepts

Elasticsearch has many concepts of its own. They differ somewhat from those in mysql, but there are also similarities.

1.3.1. Documents and fields

Elasticsearch stores data in a document-oriented way. A document can be a piece of product data or an order record from a database. Document data is serialized into JSON format and stored in elasticsearch.

JSON documents usually contain many fields (Field), which are similar to columns in a database.
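As a small illustration of what such a document looks like, the sketch below serializes a product record to JSON with Python's standard json module. The record and its field names (id, title, price) are made up for the example, and to_document is a hypothetical helper, not part of any elasticsearch client.

```python
import json

# Hypothetical product record, e.g. one row read from a database
goods = {"id": 1, "title": "Xiaomi phone", "price": 3499}


def to_document(record):
    """Serialize a record into the JSON text that would be stored as a document."""
    return json.dumps(record, ensure_ascii=False)


document = to_document(goods)
print(document)

# Each key of the JSON object is a field, comparable to a column
fields = list(json.loads(document))
print(fields)
```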

1.3.2. Indexes and Mappings

An index (Index) is a collection of documents of the same type.

For example:

  • All user documents can be grouped together as the user index;

  • All product documents can be grouped together as the product index;

  • All order documents can be grouped together as the order index;

Therefore, we can think of an index as a table in a database.

A database table has constraint information that defines the structure of the table, such as the names and types of its fields. Likewise, an index has a mapping (Mapping), which is the field constraint information for the documents in the index, similar to the structural constraints of a table.
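To make the idea of a mapping concrete, here is a rough sketch written as a Python dictionary in the shape of an elasticsearch mapping body. The field names are assumptions chosen for the example; text, keyword, and integer are real elasticsearch field types. The conforms function is a deliberately simplified, hypothetical check and is stricter than elasticsearch itself, which by default can coerce values and add unknown fields dynamically.

```python
import json

# Hypothetical mapping for a goods index
GOODS_MAPPING = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},    # exact value, not segmented
            "title": {"type": "text"},    # segmented into terms for search
            "price": {"type": "integer"},
        }
    }
}

# Toy correspondence between mapping types and Python types
PYTHON_TYPES = {"keyword": str, "text": str, "integer": int}


def conforms(document, mapping):
    """Toy check: every field must be declared and hold a value of the declared type."""
    properties = mapping["mappings"]["properties"]
    for field, value in document.items():
        if field not in properties:
            return False
        if not isinstance(value, PYTHON_TYPES[properties[field]["type"]]):
            return False
    return True


print(json.dumps(GOODS_MAPPING, indent=2))
print(conforms({"id": "1", "title": "Xiaomi phone", "price": 3499}, GOODS_MAPPING))
```

The comparison with a table schema is visible here: the mapping names each field and fixes its type, just as a table definition does for its columns.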

1.3.3. mysql and elasticsearch

Let's compare the concepts of mysql and elasticsearch side by side:

MySQL | Elasticsearch | Description
Table | Index | An index is a collection of documents, similar to a table in a database
Row | Document | A document is a piece of data in JSON format, similar to a row in a database
Column | Field | A field is a field in a JSON document, similar to a column in a database
Schema | Mapping | A mapping is the set of constraints on the documents in an index, such as field type constraints, similar to a table structure (schema) in a database
SQL | DSL | DSL is the JSON-style request language provided by elasticsearch, used to operate elasticsearch and implement CRUD
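To give a feel for the last row of the comparison, the sketch below puts a SQL statement next to a roughly comparable DSL request body. The table name, index contents, and search word are assumptions for illustration; the match query is a real elasticsearch query type, and a body like this would normally be sent to the index's _search endpoint. The two are not exact equivalents: LIKE does substring matching, while match works on segmented terms.

```python
import json

# SQL: fuzzy match on the title column (forces a full table scan)
sql = "SELECT * FROM tb_goods WHERE title LIKE '%phone%'"

# DSL: a JSON request body using a match query on the title field
dsl = {
    "query": {
        "match": {
            "title": "phone"
        }
    }
}

request_body = json.dumps(dsl)
print(sql)
print(request_body)
```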

Does this mean we no longer need mysql once we have learned elasticsearch?

Not at all. Each has its own strengths:

  • Mysql: good at transactional operations and can ensure data safety and consistency

  • Elasticsearch: good at searching, analyzing, and computing over massive amounts of data

Therefore, in enterprises the two are often used together:

  • Write operations with high safety requirements are implemented with mysql

  • Search needs that require high query performance are implemented with elasticsearch

  • Some synchronization mechanism keeps the data in the two systems consistent


Origin blog.csdn.net/weixin_45481821/article/details/131624356