Tens of billions of rows, millisecond-level returns: query optimization in practice

In recent years the company's business has grown rapidly and data volumes have exploded, bringing with them the challenge of querying massive data: we need queries over one billion, even tens of billions, of rows to still return in milliseconds, and that is clearly impossible without a search engine. Among search engines, ES (Elasticsearch) is undoubtedly the best of the lot: it has ranked first among search engines in the DB-Engines ranking for years and is the first choice of most large companies. So what advantages does it have over a traditional DB such as MySQL? How is ES index data generated? And when the data reaches PB scale, how do we keep the ES index near real time so it better serves the business?

This article draws on our company's hands-on experience with ES to share some ideas on building quasi-real-time indexes, and I hope it inspires you. The outline is as follows:

  1. Why use a search engine? Isn't a traditional DB such as MySQL enough?

    • Shortcomings of MySQL
    • Introduction to ES
  2. How to build an ES index

  3. PB-level ES quasi-real-time index construction method

Why use a search engine? Isn't a traditional DB such as MySQL enough?

Shortcomings of MySQL

MySQL's architecture is inherently unsuitable for massive-data queries: it works well for storing massive data, but it cannot cope with queries under varied, complex conditions over that data. Some will say that adding indexes avoids full table scans and improves query speed. Why is that not enough? There are two reasons:

1. Adding indexes can indeed improve query speed, but when a MySQL table has multiple indexes, the optimizer picks only the lowest-cost one when executing a SQL statement, and if no index matches the search conditions, a full table scan is triggered. Even with a composite index you must satisfy the leftmost-prefix principle to hit it, and under the varied query conditions of a massive-data workload it is very likely that some queries will not satisfy that principle. We also know storage has a cost: taking InnoDB as an example, every added index creates another B+ tree, which for a large table adds considerable storage cost. Someone once reported that a table at their company held only 10 GB of actual data but 30 GB of indexes! What a huge cost! So don't assume that the more indexes you build, the better.

2. Some query conditions cannot be solved by adding indexes at all. For example, suppose I want to find all products whose title contains the keyword "格力空调" (Gree air conditioner). In MySQL you would write:

SELECT * FROM product WHERE title like '%格力空调%'

A leading-wildcard LIKE like this can hit no index, so a full table scan is triggered. Worse, you cannot expect every user to type the product name exactly: suppose a user accidentally drops a character and searches for "格空调" instead. The SQL statement then becomes:

SELECT * FROM product WHERE title like '%格空调%'

Now even the full table scan returns no products at all. To sum up, MySQL's query capability for this kind of search is genuinely limited.

Introduction to ES

Rather than calling the points above shortcomings of MySQL, it is fairer to say that MySQL was simply not designed for massive-data queries: every technology has its specialty, and massive-data search calls for a dedicated search engine. Among them, ES is the undisputed king. It is an open-source distributed search and analytics engine built on the Lucene library that provides near-real-time queries over PB-scale data and is widely used for full-text search, log analysis, monitoring and analytics, and similar scenarios.

It mainly has the following three characteristics:

  • Easily supports complex query conditions: it is a distributed real-time document store that indexes every field (inverted index); with efficient inverted indexes, custom scoring and sorting, and a rich set of analyzer plugins, it can satisfy full-text search under arbitrarily complex query conditions (a minimal query sketch follows this list)
  • Strong scalability: it natively supports distributed storage; with very simple configuration it scales horizontally to hundreds or thousands of servers and easily handles PB-scale structured or unstructured data
  • High availability and good disaster recovery: primary and replica shards, plus automatic failure detection and recovery, effectively guarantee high availability
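As a hedged taste of the first point, here is a minimal sketch (assuming the official elasticsearch Python client in 8.x style, a local cluster, and a hypothetical product index; none of these details come from the original article) of the fuzzy title search that defeated MySQL above:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Unlike LIKE '%格空调%', a match query analyzes the search text into terms,
# so a query with a dropped character can still match relevant titles.
resp = es.search(
    index="product",                        # hypothetical index name
    query={"match": {"title": "格空调"}},
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```

With older 7.x clients the query would be passed as `body={"query": ...}` instead; the idea is the same.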

Let's first get acquainted with some important ES concepts through an analogy with MySQL:

[Figure: MySQL concepts mapped to ES concepts — Database ↔ Index, Table ↔ Type, Row ↔ Document, Schema ↔ Mapping]

From the analogy, the following ES concepts are easy to see:

  1. A MySQL database (DataBase) corresponds to an ES Index, a logical collection of data; ES's main work is creating and querying indexes
  2. A database holds multiple tables, and likewise an Index holds multiple Types (note that Types were removed in ES 7; the analogy reflects older versions)
  3. A table holds multiple rows (Row), and likewise a Type holds multiple Documents
  4. A Schema specifies the table name, fields, whether a field is indexed, and so on; likewise a Mapping specifies the processing rules for a Type's fields, i.e. how they are indexed, whether they are analyzed, which analyzer to use, etc.
  5. In MySQL, indexes must be created manually; in ES, every field can be indexed, as long as the Mapping says so (a mapping sketch follows this list)
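As a hedged sketch of how a Mapping plays the Schema's role (the index name, field names, and the ik Chinese analyzer are all assumptions, and the keyword-argument style is that of the 8.x Python client):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# The mapping decides, per field, whether and how to index: text fields are
# analyzed (tokenized) for full-text search, keyword fields are indexed as
# exact values, and numeric fields support range queries and sorting.
es.indices.create(
    index="product",                    # hypothetical index name
    mappings={
        "properties": {
            "title": {"type": "text", "analyzer": "ik_max_word"},  # assumes the ik plugin is installed
            "brand": {"type": "keyword"},
            "price": {"type": "double"},
        }
    },
)
```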

So why is indexing in ES so efficient that it returns results so quickly over massive data? Among its many optimizations, the main reason is that it builds indexes using a method called the inverted index, which avoids scanning every document. What is an inverted index? Looking up keywords and other data starting from a document is called a forward index; conversely, finding documents starting from a keyword is called an inverted index. Suppose we have the following three documents (Document):

[Figure: three sample documents, one of which contains the word "comming"]

To find the documents containing "comming" with a forward index, we would have to fetch each document's content and check whether the word appears, which unquestionably amounts to a full scan. How does the inverted index do it? It first analyzes each document's content (splitting it into words, lowercasing, and so on) and then builds a mapping from each word to the documents containing it. If several documents contain the same word, they are sorted by importance, usually scoring each document with TF-IDF. We thus get the following relationship:

[Figure: inverted index mapping each term to the documents that contain it]

Now, to find all documents containing "comming" we only need a single lookup. Multi-word queries also perform very well: look up the document list for each term and take the intersection. Query efficiency improves enormously.
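To make the idea concrete, here is a toy Python sketch (purely illustrative, not how Lucene is implemented; the three documents are hypothetical stand-ins for the figure's):

```python
from collections import defaultdict

# Hypothetical documents standing in for the three in the figure.
docs = {
    1: "winter is comming",
    2: "ours is the fury",
    3: "the choice is yours",
}

# Build: analyze each document (split on whitespace + lowercase),
# then map every term to the set of documents containing it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].add(doc_id)

def search(*terms):
    """AND query: intersect the posting list of every term."""
    postings = [inverted.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

print(search("comming"))    # {1}  -- one lookup, no document scan
print(search("is", "the"))  # {2, 3} -- intersect two posting lists
```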

Voice-over: some steps are simplified here; in practice the term must first be located in the term dictionary. Those steps are very fast, however, and can be ignored. Interested readers can consult further material to learn more.

Besides the inverted index, ES's distributed architecture is also naturally suited to massive-data queries. Let's take a look at it:

[Figure: ES cluster architecture — each index split into shards distributed across nodes]

An ES cluster consists of multiple nodes, and each index lives on several nodes in the form of shards (subsets of the index). When a query request arrives, it runs on every relevant node and the partial results are merged, spreading the query load across nodes and avoiding the CPU, disk, and memory limits of any single machine (a settings sketch follows).
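As a hedged sketch (same assumed Python client as above; names are illustrative), the shard and replica layout is just a pair of index settings:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Primary shard count is fixed at creation (changing it requires a reindex);
# replica count can be adjusted at any time.
es.indices.create(
    index="product_v2",            # hypothetical index name
    settings={
        "number_of_shards": 3,     # primaries spread across data nodes
        "number_of_replicas": 1,   # one copy of each primary, for HA and read throughput
    },
)
```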

In addition, when a new node joins, ES automatically migrates some shards to it to rebalance the load, entirely on its own. Compare that with sharding MySQL, where developers must introduce middleware such as Mycat and specify database- and table-sharding rules by hand: a huge improvement. It also means ES has very strong horizontal scalability; a cluster can easily grow to hundreds or thousands of nodes and comfortably support PB-scale queries.

Of course, ES's power does not stop there. It also uses primary and replica shards to raise search throughput, node failure detection and a Raft-style master election mechanism to improve disaster recovery, and more. These are not the focus of this article and readers can look them up; in short, after this brief summary you only need to understand one thing: ES's distributed architecture is designed for massive-data queries.

So how is ES's index data generated? Next we come to the key point of this article.

How to build an ES index

To build ES index data you first need a data source, and usually that is MySQL. You could fetch data directly from MySQL and write it into ES, but such queries run against the online database and can affect online business. Consider this scenario:

The most common scenario in an e-commerce app is a user typing keywords to search for products. So what information does a product carry? A product has multiple SKUs (an SKU is a specific variant of the same product, e.g. the Apple phone product has iPhone 6, iPhone 6s, and so on), basic attributes such as price and title, plus category (appliances, clothing, ...), brand, inventory, and more. For a sound table design we store these attributes across several tables, say product_sku (SKU table), product_property (basic attributes), sku_stock (inventory), and product_category (category). To display all of this information, these tables must be joined and the result written into ES, so that a query against ES returns the complete product information. A sketch of this direct approach follows.
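A minimal sketch of the naive direct approach (all table, column, and connection details are hypothetical; assumes the pymysql and elasticsearch packages):

```python
import pymysql
from elasticsearch import Elasticsearch, helpers

mysql = pymysql.connect(host="db-host", user="user", password="pass", database="shop")
es = Elasticsearch("http://localhost:9200")

# The join that assembles complete product information -- and exactly the
# query that becomes prohibitively expensive on the online DB at scale.
SQL = """
SELECT s.sku_id, p.title, p.price, c.category_name, st.stock
FROM   product_sku      s
JOIN   product_property p  ON p.product_id = s.product_id
JOIN   product_category c  ON c.product_id = s.product_id
JOIN   sku_stock        st ON st.sku_id    = s.sku_id
"""

with mysql.cursor(pymysql.cursors.DictCursor) as cur:
    cur.execute(SQL)
    actions = (
        {"_index": "product", "_id": row["sku_id"], "_source": row}
        for row in cur
    )
    helpers.bulk(es, actions)  # write everything into ES in batches
```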

[Figure: joining product_sku, product_property, sku_stock, and product_category into an intermediate table (product_tmp) before writing to ES]

Because this solution runs the join directly in MySQL, it hits the online DB's performance hard once products reach the tens of millions, so it is clearly not feasible. How, then, do we generate the intermediate table? Since doing it directly in MySQL is out, can we synchronize the MySQL data somewhere else and generate the intermediate table there, i.e. add an intermediate layer for the processing and thereby avoid touching the online DB directly? At this point I believe everyone has a deeper appreciation for the famous saying in computing: there is no problem that cannot be solved by adding a layer of indirection; if there is, add another layer.

This middle layer is hive

what is hive

Hive is a Hadoop-based data-warehouse tool for data extraction, transformation, and loading; it is a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Its value is that it converts hive SQL, which is easy to write, into map-reduce programs, which are complex and hard to write (map-reduce is a parallel programming model for computing over large data sets, typically larger than 1 TB). In other words, when the data volume is large, we can synchronize the MySQL data into hive and let hive generate the product_tmp intermediate table mentioned above, which greatly improves performance. The temporary tables hive generates are stored in HBase (a distributed, column-oriented open-source database). After generation, a dump task is triggered periodically to invoke the index program, which reads the full data set from HBase, applies the business processing, and refreshes the ES index. The whole process looks like this:

[Figure: pipeline — MySQL → hive (join to build product_tmp) → HBase → periodic dump task → index program → ES]

Building the index this way looks nice, but we need to know that the hive join itself is very time-consuming. In our production scenario, with tens of millions of rows, the join alone usually takes tens of minutes, and the whole pipeline from join to ES update takes at least half an hour. If important fields such as price, inventory, or online status (e.g. a product taken off the shelf) change during that window, the index cannot reflect them, which badly hurts user experience: before this optimization we often saw products in ES search results that appeared on sale but were actually off the shelf. So how to solve it? One feasible idea: build a wide table.

Now that we know the hive join is the main performance bottleneck, can we avoid it altogether? Can we merge product_sku, product_property, sku_stock, and the other tables into one large table in MySQL (which we call a wide table)?

[Figure: a denormalized wide table holding each product's SKU, attributes, stock, and category in a single row]

Now every row carries all of a product's related data, so after MySQL is synchronized to hive, hive no longer needs the time-consuming join, which dramatically shortens processing: the pipeline from synchronizing MySQL into hive through dumping into the ES index dropped from over half an hour to a few minutes. That looks pretty good, but a few minutes of index latency is still unacceptable.

Why can't hive import indexes in real time

Because hive is built on top of Hadoop, which is static and batch-oriented, job submission and scheduling carry heavy overhead and latency is high. Hive therefore cannot deliver low-latency, fast queries over large data sets, and importing the full tens of millions of rows from the index program into the ES cluster takes minutes at best.

Moreover, introducing the wide table creates a new maintenance problem. Imagine SKU stock changes, a product is delisted, or a price is adjusted: besides modifying the original tables (sku_stock, product_category, and so on), all the corresponding records in the wide table must be updated too. That is a maintenance nightmare, because every change to any product-related table must immediately be mirrored by the wide-table update logic, and that logic is tightly coupled to all of those tables!

PB-level ES quasi-real-time index construction method

How to solve it? Look closely at the two problems above: they are really the same problem. If we can monitor DB field changes in real time and synchronize the changed content to ES and to the wide table in real time, both problems disappear.

How can we monitor the changes of table fields in real time?

Answer: binlog

Let's review the master-slave synchronization principle of MySQL

[Figure: MySQL master-slave replication via the binary log and relay log]

  1. The MySQL master writes data changes to the binary log (binlog; its records are called binary log events and can be viewed with show binlog events)
  2. The MySQL slave copies the master's binary log events into its relay log
  3. The MySQL slave replays the events in the relay log, applying the data changes to its own data

The key to master-slave replication is that master and slave follow an agreed protocol by which the slave monitors the binlog in real time and updates its own data accordingly. Could we build a component that follows the same protocol, acts as a slave to receive the binlog, and thereby monitors table field changes in real time? Alibaba's open-source project canal does exactly that. It works as follows (a Python illustration follows the list):

  • canal simulates the MySQL slave's interaction protocol: it pretends to be a MySQL slave and sends the dump protocol to the MySQL master
  • the MySQL master receives the dump request and starts pushing the binary log to the slave (i.e. canal)
  • canal parses the binary log (originally a byte stream) into structured objects
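canal itself is a Java component, but to illustrate the same slave-protocol trick in this article's Python sketches: the open-source python-mysql-replication library likewise masquerades as a replica, sends the dump request, and parses row events out of the binlog byte stream (the connection settings and table names below are assumptions):

```python
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent,
)

# Pretend to be a replica (server_id must be unique in the replication topology).
stream = BinLogStreamReader(
    connection_settings={"host": "db-host", "port": 3306,
                         "user": "repl", "passwd": "secret"},
    server_id=1001,
    only_tables=["product_sku", "sku_stock"],  # hypothetical tables of interest
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    blocking=True,       # keep listening for new binlog events
    resume_stream=True,
)

for event in stream:
    for row in event.rows:
        # Updates carry "before_values"/"after_values"; inserts carry "values".
        print(event.table, row)
```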

[Figure: canal masquerading as a MySQL slave to receive and parse the binlog]

In this way the binlog can be obtained through canal. But canal only receives the binlog from the master; the log still has to be parsed and filtered. And if we only care about certain tables' fields, how is that configured? Who should the parsed binlog be delivered to? All of this needs a unified management component, and that is what Alibaba's otter is for.

what is otter

Otter is a distributed database synchronization system from Alibaba, based on incremental database-log parsing; it synchronizes data in quasi-real time to MySQL databases in the local or a remote data center. Its overall structure is as follows:

[Figure: otter architecture — Manager, Zookeeper, and Nodes running canal and the S/E/T/L pipeline]

Note: the diagram shows our company's business architecture adapted from otter; it differs slightly from vanilla otter but is similar.

The main workflow is as follows:

  1. In the Manager, configure zk, the tables to monitor, and the node responsible for each table, then synchronize the configuration to the Nodes
  2. After a node starts, its canal monitors the binlog, and the data is sent to MQ after passing through the four stages S (select), E (extract), T (transform), and L (load)
  3. The business side then subscribes to the MQ messages and performs its own processing

Voice-over: Zookeeper mainly coordinates work between nodes. For example, synchronizing data across data centers may require shipping data from a node in data center A to a node in data center B, which must be coordinated through Zookeeper.

You may have noticed the four stages inside a node: S, E, T, and L. Their main functions are as follows:

[Figure: the four node stages — Select, Extract, Transform, Load]

  • Select stage: absorbs the differences between data sources, e.g. connecting to canal for incremental data, or to other systems for other kinds of data
  • Extract stage: assembles and filters the data from the various sources, such as mysql, oracle, store, file, etc.
  • Transform stage: extracts and converts the data into the types required by the target data source
  • Load stage: loads the data into the target, e.g. writing it to the destination database, MQ, ES, etc.

We call this set of data services, built on our adaptation of Alibaba's otter, DTS (Data Transfer Service).

With this service in place, we can subscribe to MQ and write to ES in real time so the index updates in real time, and we can likewise update the wide table's fields by subscribing to MQ, which removes the tight coupling between wide-table updates and the original tables described earlier. The improved index architecture based on the DTS service is as follows:

[Figure: DTS-based index update architecture — binlog changes flow through DTS and MQ to the real-time index updater and the wide-table "build data" module]

Note: the "build data" module is transparent to real-time index updates. Its job is to update or insert rows of the MySQL wide table: because a wide-table row is the union of several tables' data, the module does not merely patch whichever field changed; it pulls back all table data related to the affected products and rewrites the corresponding wide-table rows.

So, with full wide-table updates in MySQL plus DTS-based real-time index updates, the index-latency problem is solved, and ES index updates reach second-level latency (a sketch of the consuming side follows)!
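As a hedged sketch of the real-time path (the topic name, message shape, and index are assumptions, not our actual DTS contract; assumes the kafka-python and elasticsearch packages, 8.x client style), a consumer applies each change message to ES:

```python
import json

from elasticsearch import Elasticsearch
from kafka import KafkaConsumer  # kafka-python package

es = Elasticsearch("http://localhost:9200")
consumer = KafkaConsumer(
    "dts-product-changes",                       # hypothetical topic fed by DTS
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda v: json.loads(v),
)

for msg in consumer:
    change = msg.value  # assumed shape: {"sku_id": ..., "fields": {"price": ..., "stock": ...}}
    # Partial update: patch only the changed sensitive fields on the document.
    es.update(index="product", id=change["sku_id"], doc=change["fields"])
```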

Here are a few issues you may be concerned about; I'll briefly address them.

Which fields need to be subscribed?

The MySQL wide table needs the product's complete information, so it subscribes to all fields. The real-time index update path in the figure above, however, only subscribes to fields such as inventory and price, because stale values there directly hurt sales. Our real-time indexing therefore focuses only on these sensitive fields.

With real-time index updates, do we still need full index updates?

Yes, for two main reasons:

  • Real-time updates rely on the message mechanism, which cannot guarantee 100% data integrity, so full updates are needed as a backstop. Losses are rare, and message backlogs trigger alarms, so we run the full index update only once a day.
  • After the index cluster fails or crashes, a full build lets us rebuild the index quickly.

Will the full index update overwrite real-time index updates?

Yes. Imagine a real-time update firing while the full build is still running and has not yet reached that record: after the full build finishes, it will have written the stale value over the fresh real-time update. A feasible mitigation is to delay consumption of real-time update messages while a full build is in progress and consume them after it completes. This is also why we generally run the full build in the early morning: traffic is at its lowest, so such conflicts are minimized to the greatest extent. A sketch of one further safeguard follows.
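One further safeguard, named plainly as an alternative technique rather than what our team runs: ES external version numbers. If both the full build and the real-time path write each document with a version derived from the source row's update timestamp, a stale full-build write can never overwrite a fresher real-time one. A minimal sketch (index and field names are hypothetical):

```python
import time

from elasticsearch import ConflictError, Elasticsearch

es = Elasticsearch("http://localhost:9200")

def write_versioned(doc_id, source, updated_at_ms):
    """Index a document with an external version (e.g. the row's update timestamp).

    ES rejects the write if it already holds an equal-or-newer version, so a
    slow full rebuild can never clobber a fresher real-time update.
    """
    try:
        es.index(
            index="product",            # hypothetical index name
            id=doc_id,
            document=source,
            version=updated_at_ms,
            version_type="external",
        )
    except ConflictError:
        pass  # a newer version is already in the index; safely skip

# Full build and real-time path both call write_versioned(...); whichever
# write carries the newer timestamp wins, regardless of arrival order.
write_versioned("sku-1", {"price": 99.0}, int(time.time() * 1000))
```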

Summary

This article briefly summarized some of our company's ideas for building quasi-real-time ES indexes over PB-scale data, and I hope it helps. We only touched on ES and on Alibaba middleware such as canal and otter, without going into their detailed configuration or internals. The design of these systems is well worth studying: ES, for example, does a great deal of work to improve search efficiency and optimize storage, and otter's cross-data-center synchronization is equally instructive. I suggest interested readers dig into them afterwards; I believe you will benefit a lot.

shoulders of giants

  • Introduction to Elasticsearch and comparison with MySQL query principles: https://www.jianshu.com/p/116cdf5836f2
  • https://www.cnblogs.com/zhjh256/p/9261725.html
  • otter installation: otter-node setup (single machine, multiple nodes): https://blog.csdn.net/u014642915/article/details/96500957
  • Comparative analysis of MySQL and Lucene indexes: https://developer.aliyun.com/article/50481
  • A 10-minute quick start to the massive-data search and analysis engine Elasticsearch: https://www.modb.pro/db/29806
  • Analysis and comparison of ElasticSearch and MySQL query principles: https://www.pianshen.com/article/4254917942/
  • Take you into the god-like Elasticsearch indexing mechanism: https://zhuanlan.zhihu.com/p/137574234
