Syncing a million merchant-product records in real time, with query results in seconds

A while back, the boss handed down a new mission: build a merchant-product search system that gives users fast, accurate search. When a user types a query, we search along two dimensions, merchant name and product name; results are ranked by relevance, with each merchant's products grouped under it in a combined data structure, and the whole thing is exposed as an API for the business systems to call.

The background sounds simple; the reality was anything but! We faced the following challenges:
① The merchant database and the product database sit on separate servers, and together they hold data at the million-row scale. How do we synchronize data across databases?

② Merchants and products have an ownership relationship. Hang McDonald's spicy chicken burger under KFC and things get embarrassing!

③ Merchant and product data are updated frequently: price changes, stock changes, items going on and off the shelf, and so on. The search service cannot be caught serving a pile of stale data; if a customer clearly finds an item that was just taken off the shelf, they will complain! How do we synchronize CRUD operations on the source databases to the search store in real time?

With these three problems in mind, we began the overall architecture design of the search service.

System architecture design

To design the right architecture, we first analyzed the current situation.
First, merchant data and product data are stored in two separate MySQL 8 databases. To associate merchant data with product data, we need to ETL the required tables from both databases into our search system's own database in real time.

Second, as the data is ETL'd from the merchant and product databases into the search database, it must be joined in real time into a combined merchant-product structure, formatted as parent-child documents, and stored in ES.

Finally, CRUD operations on the merchant and product databases must be synchronized to ES in real time; that is, the data in ES needs to support real-time inserts, updates, and deletes.

To this end, we set up two canal components. The first canal handles data ETL: it extracts the needed tables and fields from the merchant and product databases into the search service's database. The second canal then reads the binlog of the search service's MySQL database and streams it in real time to a Kafka message queue; the canal adapter consumes Kafka, performs the multi-table joins and parent-child document mapping, and stores the processed data in ElasticSearch.

The overall system architecture is shown in the figure below.

(Figure: merchant-product search system architecture)


Hands-on implementation

1. Software environment

Operating system: CentOS 7
Canal: canal.adapter-1.1.4, canal.deployer-1.1.4
Kafka: kafka_2.12-2.3.0
ElasticSearch: elasticsearch-6.3.2
Kibana: kibana-6.3.2

2. Data ETL into MySQL 8 with Canal

In this step, we use canal to extract the tables the search service needs from the two separate MySQL 8 databases into the search service's MySQL database.

2.1 Install canal.deployer

(1) Extract canal.deployer-1.1.4.tar.gz
(2) Configure the canal deployer
Enter the canaldeployer/conf directory and edit the canal.properties file; the three main parts to configure are serverMode, MQ, and destinations.
First, set serverMode to kafka, which adds buffering capacity and improves system stability:

(Figure: serverMode configuration)


Next, configure the MQ section with your Kafka information (installing Kafka itself is up to you):

(Figure: Kafka MQ configuration)


Finally, configure the instances to launch. Here we configure three destinations, which means this canal deployer starts three instances, each synchronizing its source's MySQL binlog into its own Kafka topic. As shown below:

(Figure: instance destinations configuration)
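For reference, the relevant parts of canal.properties might look like the following sketch; the broker address and the destination names are placeholders, not taken from the original setup:

```properties
# conf/canal.properties (sketch; addresses and names are placeholders)

# run the canal server in Kafka mode so binlog events are buffered in MQ
canal.serverMode = kafka

# Kafka broker list of your own installation
canal.mq.servers = 192.168.1.10:9092
canal.mq.retries = 0
canal.mq.batchSize = 16384
canal.mq.lingerMs = 100

# three instances, one per source schema to synchronize
canal.destinations = merchantdb,goodsdb,xxxsearch
```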


(3) Configure the canal deployer instances
Enter the canaldeployer/conf/example directory; the instance.properties file there is canal's sample instance, which we can use as a reference.
① Copy the whole example directory and rename it after one of the destinations configured in the previous step, e.g. xxxsearch;
② Enter the xxxsearch directory and edit instance.properties, configuring mainly the source database information, the tables and fields required, and the name of the target Kafka topic. The binlog of this source database will be converted to JSON and streamed in real time by the canal deployer to that Kafka topic. As follows:

(Figure: canal deployer instance source database configuration)


(Figure: canal deployer instance Kafka topic configuration)
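A sketch of what such an instance.properties might contain; addresses, credentials, the table filter, and the topic name are all placeholders:

```properties
# conf/xxxsearch/instance.properties (sketch; values are placeholders)

# source MySQL 8 address and the canal account
canal.instance.master.address = 192.168.1.20:3306
canal.instance.dbUsername = canal
canal.instance.dbPassword = canal
canal.instance.connectionCharset = UTF-8

# only capture the tables the search service needs
canal.instance.filter.regex = merchantdb\\.merchant,merchantdb\\.shop

# Kafka topic that this instance's binlog JSON is sent to
canal.mq.topic = xxxsearch-topic
```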


③ Enter the canaldeployer/bin directory and run ./startup.sh to start the canal deployer and its instances.
This completes the canal deployer setup.

2.2 Install canal.adapter

We use canal.adapter to consume the binlog JSON from the Kafka topics, clean and transform it, and store it in MySQL 8. Since canal does not support MySQL 8 natively, we need a small adjustment.
(1) Add a MySQL 8 connector driver
Extract canal.adapter-1.1.4.tar.gz, enter the canaladapter/lib directory, remove mysql-connector-java-5.1.40.jar, and drop in mysql-connector-java-8.0.18.jar.

(2) Configure the canal adapter so that data is output to MySQL 8.
Enter the canaladapter/conf directory and edit the application.yml file, configuring mainly the Kafka consumer settings, the source database information, and the search system's database information, as follows:

(Figure: ETL-to-MySQL 8 configuration)
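A minimal sketch of the application.yml for this stage, assuming a source schema merchantdb and a target schema searchdb; all names, addresses, and credentials are placeholders:

```yaml
# conf/application.yml (sketch; names and addresses are placeholders)
canal.conf:
  mode: kafka                          # consume binlog JSON from Kafka
  mqServers: 192.168.1.10:9092         # same brokers the deployer writes to
  batchSize: 500
  syncBatchSize: 1000
  srcDataSources:
    defaultDS:                         # source: merchant/product schema
      url: jdbc:mysql://192.168.1.20:3306/merchantdb?useUnicode=true
      username: canal
      password: canal
  canalAdapters:
    - instance: xxxsearch-topic        # Kafka topic name
      groups:
        - groupId: g1
          outerAdapters:
            - name: rdb                # write out to a relational database
              key: searchdb
              properties:
                jdbc.driverClassName: com.mysql.cj.jdbc.Driver   # MySQL 8 driver
                jdbc.url: jdbc:mysql://192.168.1.30:3306/searchdb?useUnicode=true
                jdbc.username: canal
                jdbc.password: canal
```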


Then enter the canaladapter/conf/rdb directory. Taking the official mytest_user.yml as an example, configure the Kafka topic name, the source database and source table names, and the target database and target table names; one yml file per table is recommended.

(Figure: table-mapping ETL configuration)
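Following the official mytest_user.yml, a per-table mapping file might look like this sketch; table and key names are placeholders:

```yaml
# conf/rdb/merchant.yml (sketch; table and key names are placeholders)
dataSourceKey: defaultDS          # key defined under srcDataSources above
destination: xxxsearch-topic      # canal instance / Kafka topic name
groupId: g1
outerAdapterKey: searchdb         # key of the rdb outerAdapter above
concurrent: true
dbMapping:
  database: merchantdb            # source schema
  table: merchant                 # source table
  targetTable: searchdb.merchant  # target schema.table
  targetPk:
    id: id
  mapAll: true                    # copy all columns; use targetColumns to pick fields
```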


(3) Start the canal adapter
Enter the canaladapter/bin directory, run ./startup.sh to start the canal adapter, and watch the logs/adapter/adapter.log log file. Manually insert a record into one of the source databases and check whether two log records are printed, one INFO and one DEBUG; if so, the configuration works.
(Figure: canal adapter log)
At this point the data ETL stage is complete: data from the two separate MySQL 8 databases is synchronized in real time to the search service's MySQL database.

3. Multi-table joins and parent-child document mapping

(1) Configure the canal adapter of the second canal
Enter the canaladapter/conf directory and edit the application.yml file, configuring mainly the Kafka consumer settings, the search system's database information, and the ES connection information, as shown below:

(Figure: canal adapter MQ and MySQL configuration)


(Figure: canal adapter ES configuration)
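A sketch of the second adapter's application.yml, assuming the search database searchdb and a local ES cluster; addresses and names are placeholders:

```yaml
# conf/application.yml for the second canal (sketch; values are placeholders)
canal.conf:
  mode: kafka
  mqServers: 192.168.1.10:9092
  srcDataSources:
    searchDS:                            # the search service's MySQL 8 database
      url: jdbc:mysql://192.168.1.30:3306/searchdb?useUnicode=true
      username: canal
      password: canal
  canalAdapters:
    - instance: searchdb-topic           # topic fed by the second canal deployer
      groups:
        - groupId: g1
          outerAdapters:
            - name: es
              hosts: 192.168.1.40:9300   # ES transport address
              properties:
                mode: transport          # transport or rest
                cluster.name: elasticsearch
```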


(2) Configure the multi-table join
Enter the canaladapter/conf/es directory, vim mytest_user.yml, and edit the multi-table join configuration:

(Figure: multi-table join configuration)
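A sketch of such a join configuration, assuming merchant and goods tables in searchdb; the index, table, and field names are placeholders:

```yaml
# conf/es/merchant_goods.yml (sketch; index, table, and field names are placeholders)
dataSourceKey: searchDS
destination: searchdb-topic
groupId: g1
esMapping:
  _index: merchant_goods
  _type: _doc
  _id: _id
  # left outer join with the main table leftmost, per the restrictions below
  sql: "select g.id as _id, g.goods_name, g.merchant_id, m.merchant_name
        from goods g left join merchant m on m.id = g.merchant_id"
  commitBatch: 3000
```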

Note that the sql supports free combinations of multi-table joins, with some restrictions:
(A) The main table cannot be a subquery.
(B) Only left outer join may be used, i.e. the leftmost table must be the main table.
(C) If a joined secondary table is a subquery, it cannot contain more than one table.
(D) The main sql cannot have a where clause (a where clause inside a secondary-table subquery is allowed but not recommended, as it may cause inconsistent synchronization, e.g. when a field referenced in the where condition is modified).
(E) Join conditions only allow '=' comparisons between primary and foreign keys; no other constant conditions may appear, e.g. on a.role_id = b.id and b.statues = 1 is not allowed.
(F) A field used in a join condition must appear in the main query, e.g. for on a.role_id = b.id, either a.role_id or b.id must appear in the main select statement.
(G) The Elastic Search mapping properties correspond one to one with the sql's selected values (select * is not supported). For example, with select a.id as _id, a.name, a.email as _email from user, name maps to the name field of the es mapping and _email maps to the _email field, i.e. the alias (if any) is used as the final mapped field. The _id here can be wired up in the configuration file as the _id: _id mapping.
(3) Configure parent-child documents
Taking the official biz_order.yml as an example, vim biz_order.yml and configure the parent-child document mapping:

(Figure: parent-child document mapping configuration)
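A sketch modeled on the official biz_order.yml, for the child (goods) side: the relations block declares the join-field name and the parent routing. Every name here is a placeholder, and the exact relations syntax should be checked against your canal version:

```yaml
# conf/es/goods.yml (sketch modeled on the official biz_order.yml; all names are placeholders)
dataSourceKey: searchDS
destination: searchdb-topic
groupId: g1
esMapping:
  _index: merchant_goods
  _type: _doc
  _id: _id
  relations:
    relation_type:              # name of the join field in the ES mapping
      name: goods               # this config writes child (goods) documents
      parent: merchant_id       # column used to route a child to its parent
  sql: "select concat('g_', g.id) as _id, g.goods_name, g.merchant_id
        from goods g"
```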


(4) In ElasticSearch6, create the index and the parent-child document mapping
Open the Kibana page, click Dev Tools, and run a command like the following to create the index and the parent-child document mapping:
(Figure: creating the index and parent-child document mapping)
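In ES6, the parent-child relationship is expressed with a join field. A sketch of the Dev Tools command, using the placeholder index and field names from the sketches above:

```json
PUT /merchant_goods
{
  "mappings": {
    "_doc": {
      "properties": {
        "merchant_name": { "type": "text" },
        "goods_name":    { "type": "text" },
        "relation_type": {
          "type": "join",
          "relations": { "merchant": "goods" }
        }
      }
    }
  }
}
```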
As for installing ES6 and Kibana themselves, there is nothing special involved, so it is not covered here.

(5) Start the canal adapter
Enter the canaladapter/bin directory, run ./startup.sh to start the canal adapter, and watch the logs/adapter/adapter.log log file. Manually add a record in the search system's database and check whether a log line like the following is printed; if it is, the configuration is successful.
(Figure: adapter log when configured correctly)

4. Results

We can now run a DSL query through Kibana and take a look.
We added a "KFC" store in the merchant system, then added two products, "tomato" and "fresh tomato", in the product system and associated them with "KFC". Querying for "KFC" or "tomato" gives results like the following (ES default fields removed):
(Figure: DSL query results)
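The query itself was along these lines; the index and field names follow the placeholder sketches above:

```json
GET /merchant_goods/_search
{
  "query": {
    "multi_match": {
      "query": "KFC",
      "fields": ["merchant_name", "goods_name"]
    }
  }
}
```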
As the figure shows, we can query a merchant and its products by merchant name, and likewise query by product name. And since canal keeps up with inserts, updates, and deletes in real time, the data in ES stays consistent with the merchant and product systems, with each merchant and its corresponding products combined in one structure, which meets the business requirements.

5. Summary

At this point, the basic framework of the merchant-product search system, built on Canal, Kafka, MySQL 8, and ElasticSearch 6, is complete. We use a canal deployer to capture the MySQL binlogs of the merchant and product systems in real time, consolidate the needed tables into the search database, then stream that database's binlog to Kafka; a canal adapter consumes the binlog JSON from Kafka, performs the multi-table joins and parent-child document mapping, and finally stores the result in ES6 for the upper search service to call.
The search service eventually went live, achieving real-time synchronization and second-level query results over the company's million-scale merchant and product data. The boss said he'd add a chicken leg to every R&D team member's meal! A little exciting just thinking about it, hehe~~

Author: Kevin, senior Java engineer at Pico Technology. Follow "Pico Technology" for more technical deep dives.



Origin: juejin.im/post/5e6989faf265da5756326908