10+ years on: what kind of search engine has Alibaba distilled? (2019-09-24)

1. Overall structure

The search engine is divided into data source aggregation (commonly known as dump), full/incremental/real-time index building, and online serving. With Tisplus as the entry point, the pipeline flows through Bahamut (with Maat for workflow scheduling) -> Blink -> HDFS/Swift -> Build Service -> Ha3 -> SP -> SW, providing highly available, high-performance search services to customers. Data source aggregation is done on the Tisplus and Blink platforms, Build Service and Ha3 run on the Suez platform, and SP and SW are deployed via Drogo. The architecture diagram is as follows:

2. Tisplus

1688 currently operates and maintains the spu, cspu, company, buyoffer, feed and other engines, together with their offline dumps, on Tisplus. The platform mainly builds and maintains Ha3 and SP. The overall architecture is as follows:

Daily maintenance occasionally runs into data source output failures, mainly caused by expired data source table permissions and zk jitter. On the performance side, after the group's search mid-platform team introduced the Blink Batch mode, dump execution time was shortened. The specific figures are as follows (taking the buyoffer engine as an example):

On the Tisplus platform, the entry page for an offline dump looks like this:

DAG data source graph example:

The rest of this section walks through the offline dump pipeline: Bahamut, Maat, and data output.

2.1 Bahamut - data source graph processing

Bahamut is a component-based platform for offline data source processing; it translates the dataflow graph assembled on the web UI into executable SQL statements via the JobManager. Bahamut currently provides four categories of components:

  1. Data input: datasource (supports tddl and odps)

  2. KV input: HbaseKV (HBase data table)

  3. Data processing: Rename (field renaming), DimTrans (1-to-many data aggregation), Functions (simple field processing), Selector (field selection), UDTF (custom data-processing logic), Merge (data source merging), Join (left join)

  4. Data output: Ha3 (Hdfs/swift)

The data source processing flow is described as follows:


The Bahamut -> Blink handoff can be described as follows:

Bahamut breaks the task apart and hands it to the JobManager, which converts logical nodes into physical nodes. The resulting nodes are then merged and combined into a complete SQL statement; for example, Kratos_SQL in the figure above is the complete SQL for an incremental Join. The SQL is submitted, together with its resource files, through the BayesSDK. In addition, the platform offers a lightweight per-task configuration feature, which lets you control a specific task's concurrency, node memory, CPU, and other parameters.
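
As a toy illustration of folding a chain of logical nodes into one nested SQL statement (this is not Bahamut's actual code, and the table/field names are invented), the idea can be sketched like this:

```python
def wrap(inner_sql, projection):
    """Wrap the previous query in a new SELECT, the way a logical
    Rename/Selector node re-projects its upstream node's output."""
    return f"SELECT {projection} FROM ({inner_sql}) t"

# datasource node: pull the raw fields (hypothetical table and fields)
sql = "SELECT id, title, price FROM offer_table"
# Rename node: price -> offer_price
sql = wrap(sql, "id, title, price AS offer_price")
# Selector node: keep only the fields the engine needs
sql = wrap(sql, "id, title, offer_price")
print(sql)
```

Each logical node just adds one layer of projection, so the whole graph collapses into a single statement that Blink can execute.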

2.2 Maat - Distributed Process Scheduling System

Maat is a distributed workflow scheduling system built on the open-source project Airflow. Its strengths include visual editing, a set of common node types, Drogo-based deployment, cluster management, and a complete monitoring & alerting mechanism.

A comparison between Airflow and other workflow systems is shown below:

Take the feed engine as an example; the Maat scheduling page looks like this:

When a task goes wrong, you can use this page to mark a specified step as failed and then rerun the whole task, or find out why a task failed by viewing the log of an individual step.
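
To make the "mark a step failed, then rerun" behavior concrete, here is a minimal stand-alone sketch of this style of scheduling (the task names and the scheduler itself are invented for illustration, not Maat code): on rerun, steps already marked successful are skipped and only the failed step, plus anything downstream that never ran, executes again.

```python
from collections import deque

def topo_order(deps):
    """Kahn's algorithm: order tasks so each runs after its upstreams.
    deps maps task -> list of upstream tasks."""
    indeg = {t: len(ups) for t, ups in deps.items()}
    downstream = {t: [] for t in deps}
    for t, ups in deps.items():
        for u in ups:
            downstream[u].append(t)
    ready = deque(t for t, d in sorted(indeg.items()) if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for d in downstream[t]:
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    return order

def rerun(deps, status):
    """Rerun the workflow, skipping steps already marked success."""
    executed = []
    for t in topo_order(deps):
        if status.get(t) != "success":
            status[t] = "success"   # pretend the step now succeeds
            executed.append(t)
    return executed

# A hypothetical dump workflow.
DEPS = {
    "pull_odps":  [],
    "blink_join": ["pull_odps"],
    "write_hdfs": ["blink_join"],
}

# write_hdfs failed on the first run; mark it failed and rerun.
status = {"pull_odps": "success", "blink_join": "success", "write_hdfs": "failed"}
print(rerun(DEPS, status))
```

Only the failed `write_hdfs` step executes on the second pass, which is exactly the operational shortcut the scheduling page offers.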

2.3 Ha3 doc - data output

After the steps above, the data is finally output in xml (isearch format) to an HDFS/Pangu path (full) and a Swift topic (incremental); the engine consumes incremental update messages from the Swift topic to update itself. The offline platform exposes table-information queries to the Tisplus engine module through a service. Here is the information contained in one HA3 table:

{
 "1649992010": [
   {
     "data": "hdfs://xxx/search4test_st3_7u/full", // HDFS path
     "swift_start_timestamp": "1531271322", // start timestamp of today's incremental stream
     "swift_topic": "bahamut_ha3_topic_search4test_st3_7u_1",
     "swift_zk": "zfs://xxx/swift/swift_hippo_et2",
     "table_name": "search4test_st3_7u", // HA3 table name, currently the same as the application name
     "version": "20190920090800" // time the data was produced
   }
 ]
}
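
Since this table info is plain JSON once the annotations are removed, a consumer (such as the table-query service mentioned above) needs only ordinary JSON parsing to read it; a minimal sketch:

```python
import json

# The table info shown above, with the annotation comments removed
# so that it is valid JSON.
TABLE_INFO = '''{
 "1649992010": [
   {
     "data": "hdfs://xxx/search4test_st3_7u/full",
     "swift_start_timestamp": "1531271322",
     "swift_topic": "bahamut_ha3_topic_search4test_st3_7u_1",
     "swift_zk": "zfs://xxx/swift/swift_hippo_et2",
     "table_name": "search4test_st3_7u",
     "version": "20190920090800"
   }
 ]
}'''

info = json.loads(TABLE_INFO)
for key, tables in info.items():
    # pick the newest data version (version is a sortable timestamp string)
    latest = max(tables, key=lambda t: t["version"])
    print(key, latest["table_name"], latest["version"])
```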

3. Suez

After the steps above, the data is written to HDFS and Swift in xml (isearch format); then, in the offline table configuration of the suez_ops platform, select zk as the data type and configure the corresponding zk_server and zk_path.

Build Service then performs the full/incremental/real-time index build and distributes the result to the Ha3 online cluster to serve traffic.

The offline table construction logic of suez is as follows:

The logic of suez online service is as follows:

The following is a brief description of offline (buildservice) and online (ha3):

3.1 Build Service - Index Build

Build Service (BS for short) is the build system that produces full, incremental, and real-time indexes.

There are five types of roles in build_service:

  • admin: responsible for controlling the overall build process, switching between full and incremental states, launching periodic tasks, and responding to user control requests;

  • processor : Responsible for data processing, converting the user's original document into a lightweight buildable document form;

  • builder : responsible for building the index;

  • merger: responsible for index consolidation (merging and reorganizing index segments);

  • rtBuilder: Responsible for real-time construction of online indexes.

Among them, admin, processor, builder, and merger run as binary programs on hippo, while rtBuilder is provided to the online side as a library.

A complete full + incremental build produces a generation id, and the generation goes through process full -> builder full -> merger full -> process inc -> builder inc -> merger inc. After entering the inc phase, builder inc and merger inc alternate. Before the Ha3 upgrade, 1688 often hit the "build too slow" problem because the build was scheduled onto a bad node or got stuck in the builder inc / merger inc stage.
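
The phase sequence of one generation can be written down as a tiny sketch (the function is illustrative, not build_service code):

```python
def generation_phases(inc_rounds):
    """Phases one generation passes through: a full build, then
    alternating incremental build/merge rounds."""
    phases = ["process full", "builder full", "merger full", "process inc"]
    for _ in range(inc_rounds):
        phases += ["builder inc", "merger inc"]
    return phases

print(generation_phases(2))
```

The "build too slow" symptom described above corresponds to one of the later `builder inc` / `merger inc` entries in this sequence never completing.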

3.2 Ha3 - Online Search Service

Ha3 is a full-text search engine built on the suez framework, providing rich online query clauses, filter clauses, sort clauses, and aggregation clauses, and supporting user-developed ranking plug-ins. The service architecture is as follows:

The 1688 main search engine consists of a set of Qrs, searchers and summary:

  • Qrs: parses and validates the incoming query and, if it passes, forwards it to the corresponding searchers; Qrs then collects and merges the results returned by the searchers, post-processes them, and returns them to the user. You can also influence the merge rules by writing merger plugins;

  • searcher: It can be a document recall service (searcher), a document scoring and ranking service (ranker), or a document summary service (summary);

  • summary: the 1688 main search separates searcher and summary; the summary cluster only serves product-detail fetches.

Machines such as qrs/searcher/summary provide services by registering with cm2. For example, qrs registers with an external cm2 so it can serve callers such as SP, while searcher and summary register with an internal cm2 so they can receive requests from qrs and complete recall, ranking, detail fetching, and other services.

A caller's query goes through qrs -> query parsing -> seek -> filter -> rank (coarse ranking) -> agg (aggregation) -> rerank (fine ranking) -> extraRank (final ranking) -> merger -> summary (fetching details). The process is described as follows:
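
As a toy end-to-end sketch of these stages (document fields, scores, and formulas are all invented for illustration; the real Ha3 stages are far richer), the flow can be pictured like this:

```python
def qrs_search(query, docs, top_n=2):
    """Toy walk through the stages named above: seek -> filter ->
    rank -> rerank -> summary. Scoring is invented for illustration."""
    # seek: recall documents whose title contains the query term
    hits = [d for d in docs if query in d["title"]]
    # filter: apply a filter clause (here: priced items only)
    hits = [d for d in hits if d["price"] > 0]
    # rank (coarse): cheap static score over the full recall set
    hits.sort(key=lambda d: d["static_score"], reverse=True)
    # rerank (fine): rescore only the top-N with a richer formula
    top = hits[:top_n]
    for d in top:
        d["final_score"] = d["static_score"] + 0.1 * d["price"]
    top.sort(key=lambda d: d["final_score"], reverse=True)
    # summary: return only the detail fields the caller needs
    return [(d["title"], round(d["final_score"], 2)) for d in top]

DOCS = [
    {"title": "red shirt",  "price": 10, "static_score": 1.0},
    {"title": "blue shirt", "price": 0,  "static_score": 5.0},
    {"title": "shirt rack", "price": 30, "static_score": 2.0},
    {"title": "pants",      "price": 5,  "static_score": 9.0},
]
print(qrs_search("shirt", DOCS))
```

Note how the expensive rerank step only touches the small top-N slice that survives recall, filtering, and coarse ranking; that is the point of splitting the pipeline into stages.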

Among them, rerank and extraRank are implemented by the Hobbit plug-in and the Hobbit-based Warhorse plug-in. Business teams can develop Warhorse features for their own needs and assign each feature a weight to produce the final item score.
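
The "weight per feature" idea boils down to a weighted sum; a minimal sketch (the feature names and weights below are made up, not real Warhorse features):

```python
def final_score(features, weights):
    """Combine feature values into one score, each scaled by its
    business-configured weight, in the spirit of composing Warhorse
    features described above."""
    return sum(weights[name] * value for name, value in features.items())

# hypothetical features for one offer
features = {"text_relevance": 0.8, "sales_volume": 0.5, "seller_grade": 1.0}
weights  = {"text_relevance": 2.0, "sales_volume": 1.0, "seller_grade": 0.5}
print(final_score(features, weights))
```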

4. Drogo

Drogo is a management and control platform for data-free (stateless) services, built on the second-layer scheduling service Carbon; 1688's SP service and QP proxy service are deployed on it.

The deployment of the main service platforms of the 1688 search link is briefly described as follows:

Reprinted from: 10+ years on: what kind of search engine has Alibaba distilled?


Origin blog.csdn.net/yangbindxj/article/details/123912006