A brief look at scenario-based technology in the search presentation layer: tanGo in practice

Author | Search Technology Platform

Introduction 

This article covers technologies related to the search presentation layer. It first introduces the product form of Search Aladdin, so that readers get a preliminary understanding of what Aladdin is and of the related presentation concepts. It then focuses on scenario-based products: scenario-based search is one of the solutions for building an immersive experience, recombining full-page Aladdin results with natural results. Many search technologies are involved and cannot all be covered here, so this article concentrates on the development framework that supports these products, tanGo, and describes in detail the thinking behind it, the problems encountered during its construction, and the corresponding solutions. We hope readers find it useful.

The full text is 4412 words and the estimated reading time is 12 minutes.

01 Related background introduction

First, what is Aladdin? Aladdin is a vertical product launched by Baidu Search. When a user's query expresses a requirement such as encyclopedia, weather, POI, film and television, sports, stocks, Chinese language, or translation, search recalls Aladdin products to varying degrees (examples of some products are shown below).

picture

The above shows the search presentation for a single requirement and for a requirement cluster (an aggregation of multiple single results for the same requirement). For more complex scenarios, such as major events like the college entrance examination or the Olympics, search needs to identify the scenario and then recall the requirement clusters that belong to it.

picture

In addition, some vertical businesses are migrating their technology stack from PHP to Go. Against this product and technology background, the search business R&D team incubated and designed the tanGo business framework. The following sections proceed from requirements analysis to business abstraction to overall design and core capabilities, focusing on the abstraction logic, and introduce some of the problems and thinking encountered while building and applying the framework. Before this work, the Group and Search had already accumulated some relatively mature basic network frameworks and lib assets (including cgo), and building on these foundations improved efficiency considerably during implementation. A business framework should answer three questions well: Why build it (what business problem does it solve)? How to build it (design and implementation)? How to measure it (metric construction)? The specific practices are discussed below.

02 Requirements analysis

From the perspective of search concepts, let’s understand the different scenarios that scenario-based products need to satisfy:

1. Single result: one recalled result, the basic unit of search recall. Examples: resource-based recall (looking up movies, TV shows, novels, etc.) and computation-based recall (calendar, calculator, etc.).

2. Requirement cluster: an aggregation of multiple single results for the same requirement. For example, for the person query "Andy Lau": the encyclopedia entry, character relationships, and works.

3. Requirement group: a collection of multiple requirement clusters. For example, in the major-event scenario of the college entrance examination, the main venue, exam schedule, batch lines, real questions, and "one point, one section" tables are each an independent requirement cluster. The same applies to other large scenarios such as the Olympic Games.

4. Result page: the overall result returned for a query. It may contain multiple requirements, that is, multiple requirement groups (for example, q=Sanya Tourism includes both a tourism requirement and a financial requirement).

03 tanGo design ideas and practice

3.1 Business abstraction

3.1.1 Scenario-based product abstraction

From the perspective of presentation technology, requirement clusters and requirement groups are both collections of single results that differ only in dimension and granularity. The retrieval request processing flow is abstracted into several stages, and each stage is modeled as a concrete entity.

1. Resource: the unit that represents recalled abstracts. The abstracts may be retrieved from various search engines or databases.

  • Resource entity abstraction: pre-strategy (resource), retrieval analysis (resource), retrieval recall, data mapping

2. Card: the smallest unit of search display. It schedules resource recall and assembles the abstracts into the front-end template mapping.

  • Card entity abstraction: pre-strategy (card), retrieval analysis (card), resource recall scheduling (graph-based scheduling of the resource list under the card), front-end template assembly

3. Scenario: retrieval identifies the sub-scenario to display based on query analysis and completes the recall of the different requirement groups.

  • Scheduling layer: request level

  • Scenario entity abstraction: request pre-strategy, query analysis and scenario calculation, card scheduling (graph-based scheduling of the card set under each requirement cluster), recall post-strategy, and assembling the response packet
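The three entities above can be pictured as nested interfaces. The following Go sketch is only an illustration of that layering, not tanGo's actual API: the names `Abstract`, `Resource`, `Card`, `Scenario`, and the toy implementations are all assumptions made for this example, and the strategy and analysis phases are omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// Abstract is one recalled summary (hypothetical fields).
type Abstract struct {
	Source string
	Text   string
}

// Resource recalls abstracts for a query.
type Resource interface {
	Recall(query string) ([]Abstract, error)
}

// Card schedules its resources and maps the abstracts onto a front-end template.
type Card interface {
	Render(query string) (string, error)
}

// Scenario schedules the cards that belong to the identified scenario.
type Scenario interface {
	Handle(query string) ([]string, error)
}

// staticResource is a toy Resource that fabricates one abstract per query.
type staticResource struct{ name string }

func (r staticResource) Recall(query string) ([]Abstract, error) {
	return []Abstract{{Source: r.name, Text: r.name + " result for " + query}}, nil
}

// simpleCard recalls its resources serially and joins the texts.
type simpleCard struct {
	template  string
	resources []Resource
}

func (c simpleCard) Render(query string) (string, error) {
	var texts []string
	for _, r := range c.resources {
		abstracts, err := r.Recall(query)
		if err != nil {
			return "", fmt.Errorf("card %s: %w", c.template, err)
		}
		for _, a := range abstracts {
			texts = append(texts, a.Text)
		}
	}
	return c.template + "[" + strings.Join(texts, "; ") + "]", nil
}

// simpleScenario renders every card in order.
type simpleScenario struct{ cards []Card }

func (s simpleScenario) Handle(query string) ([]string, error) {
	out := make([]string, 0, len(s.cards))
	for _, c := range s.cards {
		page, err := c.Render(query)
		if err != nil {
			return nil, err
		}
		out = append(out, page)
	}
	return out, nil
}

func main() {
	var scenario Scenario = simpleScenario{cards: []Card{
		simpleCard{template: "baike", resources: []Resource{staticResource{"encyclopedia"}}},
		simpleCard{template: "works", resources: []Resource{staticResource{"films"}, staticResource{"songs"}}},
	}}
	pages, err := scenario.Handle("Andy Lau")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range pages {
		fmt.Println(p)
	}
}
```

Each level only knows about the level directly beneath it, which is what lets a requirement cluster and a requirement group reuse the same card and resource building blocks.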

3.1.2 Technical ideas for framework construction

Standardize business processing. Core considerations in the processing flow:

1. Protocol conversion

  • Support multiple protocols such as http/nshead

  • Support conversion between multiple data protocols such as pb/json

  • Support both synchronous and asynchronous retrieval, covering development scenarios such as result pages, asynchronous scenario pages, mini programs, and standalone sites

2. Configuration: to keep operation and maintenance costs down, retrieval requests are configuration-driven and visualized

3. Componentization and operatorization: to facilitate later co-construction

4. Graph-based resource scheduling: card scheduling, resource scheduling

Establish a standardized component co-construction mechanism

1. Define standardized data, retrieval, and strategy component interfaces

2. Component contribution mechanism

Establish a standardized class library co-construction mechanism

1. Contribution standards for the standard Lib, for example: sampling, DAG, Trace, algorithms, strings, protocol conversion, encryption and decryption, etc.

2. Standard Lib index page

Other key points

1. Development stage: one-click generation, visual programming, and user manuals to improve R&D efficiency

2. Testing stage: compilation acceleration, QATest, level-0 interception, and other delivery safeguards

3. After launch: a supporting monitoring system, building Prometheus, business, downstream, retrieval, and scheduling panels

3.1.3 Metrics

Scale:

  • Application scale, team coverage

Efficiency:

  • New project creation cost

  • New product delivery cycle

  • New employee training and learning costs

  • Number of common components and Libs, and the lines of code saved by consolidating components

  • Team efficiency: feedback on improvements in team delivery efficiency

User satisfaction:

  • NPS, regular user satisfaction feedback

3.2 Framework technology block diagram

Based on the business-scenario and technical abstractions above, we drew up the following framework block diagram. The core points are:

1. Ease of use: end-to-end tool chain creation

2. Framework hierarchical structure: business process, components, Lib

3. Business process: synchronous retrieval, asynchronous retrieval, data processing

picture

3.3 Core point design

3.3.1 Search process design

Design goals:

A standard processing flow with abstracted retrieval stages:

  • Request-level processing

  • Card-level processing

  • Resource-level processing

picture

The above figure shows how retrieval requests are processed. Each stage is organized as components, which are developed jointly by architecture and business engineers, and different types of components are scheduled at each stage of retrieval. Business engineers can focus on developing domain components, while protocol handling, packaging, and conversion are provided and integrated uniformly by the framework.
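To make the staged, component-based flow concrete, here is a minimal Go sketch under stated assumptions: `Context`, `Component`, `Stage`, `RunPipeline`, and the toy components are hypothetical names invented for illustration, and the stages run strictly in sequence, whereas the real framework schedules cards and resources as a graph.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Context is the shared state passed through every component (illustrative fields).
type Context struct {
	Query string
	Data  map[string]string
	Trace []string // records which component ran at which level
}

// Component is one pluggable processing step.
type Component interface {
	Name() string
	Run(ctx *Context) error
}

// Stage groups the components scheduled at one level: request, card, or resource.
type Stage struct {
	Level      string
	Components []Component
}

// step adapts a plain function to the Component interface.
type step struct {
	name string
	fn   func(*Context) error
}

func (s step) Name() string           { return s.name }
func (s step) Run(ctx *Context) error { return s.fn(ctx) }

// RunPipeline runs the stages in order and stops at the first failing component.
func RunPipeline(ctx *Context, stages []Stage) error {
	for _, st := range stages {
		for _, c := range st.Components {
			ctx.Trace = append(ctx.Trace, st.Level+"/"+c.Name())
			if err := c.Run(ctx); err != nil {
				return fmt.Errorf("%s/%s: %w", st.Level, c.Name(), err)
			}
		}
	}
	return nil
}

// demoStages builds a three-level pipeline out of toy components.
func demoStages() []Stage {
	normalize := step{"normalize", func(c *Context) error {
		c.Query = strings.TrimSpace(c.Query)
		if c.Query == "" {
			return errors.New("empty query")
		}
		return nil
	}}
	pickCard := step{"pick-card", func(c *Context) error {
		c.Data["card"] = "weather"
		return nil
	}}
	recall := step{"recall", func(c *Context) error {
		c.Data["abstract"] = c.Data["card"] + " for " + c.Query
		return nil
	}}
	return []Stage{
		{Level: "request", Components: []Component{normalize}},
		{Level: "card", Components: []Component{pickCard}},
		{Level: "resource", Components: []Component{recall}},
	}
}

func main() {
	ctx := &Context{Query: "  Beijing ", Data: map[string]string{}}
	if err := RunPipeline(ctx, demoStages()); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ctx.Data["abstract"])
	fmt.Println(ctx.Trace)
}
```

Because every component sees only the `Context`, a business team could add a domain component to any level without touching the protocol or scheduling code.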

3.3.2 Configuration design

Design goals:

  • The retrieval process needs to be described in one configuration

  • To control learning costs, the grammar must be concise and simple

Key technical points:

  • Process topology abstraction: three-layer topology (policy, card, resource)

  • Component management: manage the component life cycle while remaining GC-friendly, using native capabilities such as Go reflection and object pooling

  • Configuration hot loading: realize dynamic update of configuration

picture

The above figure is a concrete example of component configuration at different stages. The advantages brought by configuration are: the retrieval process is transparent and visible; the operation and maintenance cost is controllable; and the learning and acceptance costs are low.
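The component management and hot loading points can be sketched with the Go standard library alone. This is an assumption-laden illustration, not tanGo's implementation: `Registry`, `Config`, `Reload`, and `Handle` are hypothetical names, the configuration is reduced to an ordered list of component names, and file watching is left out. It shows instances created by reflection, reused through `sync.Pool`, and a configuration swapped atomically via `atomic.Value`.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
	"sync/atomic"
)

// Component is a poolable processing step; Reset clears per-request state.
type Component interface {
	Reset()
	Run(input string) string
}

// Registry maps a component name used in configuration to a pool of instances.
type Registry struct {
	mu    sync.RWMutex
	pools map[string]*sync.Pool
}

func NewRegistry() *Registry { return &Registry{pools: map[string]*sync.Pool{}} }

// Register records the concrete type behind prototype, which must be a pointer
// to a struct, and creates new instances by reflection when the pool is empty.
func (r *Registry) Register(name string, prototype Component) {
	t := reflect.TypeOf(prototype).Elem()
	pool := &sync.Pool{New: func() interface{} { return reflect.New(t).Interface() }}
	r.mu.Lock()
	r.pools[name] = pool
	r.mu.Unlock()
}

// Acquire takes an instance from the named pool.
func (r *Registry) Acquire(name string) (Component, bool) {
	r.mu.RLock()
	pool, ok := r.pools[name]
	r.mu.RUnlock()
	if !ok {
		return nil, false
	}
	return pool.Get().(Component), true
}

// Release resets the instance and returns it to its pool for reuse.
func (r *Registry) Release(name string, c Component) {
	c.Reset()
	r.mu.RLock()
	pool, ok := r.pools[name]
	r.mu.RUnlock()
	if ok {
		pool.Put(c)
	}
}

// Config lists component names in execution order (a stand-in for the real topology).
type Config struct {
	Version int
	Steps   []string
}

var current atomic.Value // always holds a *Config

// Reload swaps in a new configuration; requests already running keep their snapshot.
func Reload(c *Config) { current.Store(c) }

// Handle runs the query through the components named by the current configuration.
func Handle(reg *Registry, query string) (string, error) {
	cfg, _ := current.Load().(*Config)
	if cfg == nil {
		return "", fmt.Errorf("no configuration loaded")
	}
	out := query
	for _, name := range cfg.Steps {
		c, ok := reg.Acquire(name)
		if !ok {
			return "", fmt.Errorf("unknown component %q", name)
		}
		out = c.Run(out)
		reg.Release(name, c)
	}
	return out, nil
}

// bracket is a toy component that wraps its input in brackets.
type bracket struct{ calls int }

func (b *bracket) Reset()                  { b.calls = 0 }
func (b *bracket) Run(input string) string { b.calls++; return "[" + input + "]" }

func main() {
	reg := NewRegistry()
	reg.Register("bracket", &bracket{})
	Reload(&Config{Version: 1, Steps: []string{"bracket"}})
	fmt.Println(Handle(reg, "weather"))
	Reload(&Config{Version: 2, Steps: []string{"bracket", "bracket"}})
	fmt.Println(Handle(reg, "weather"))
}
```

Pooling keeps short-lived component objects from adding allocation pressure on every request, and reading the configuration through a single atomic load means a reload never exposes a half-updated topology to a request.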

3.3.3 Resource scheduling design

From the scenario processing flow and configuration introduced earlier, we can see that both card scheduling in the scenario calculation process and resource scheduling in the card processing process use serial and parallel topological scheduling, so a simple DAG execution engine needs to be designed for the framework.

Design goals:

1. Based on DAG, design a set of simple grammar rules to implement a resource scheduling engine

2. Capture and record error information for program exceptions, timeouts, etc.

Key technical points:

1. Design a DAG rule grammar that is simple yet meets the requirements

  • Basic process control: serial, parallel, conditional control syntax

  • Exception control: capturing and handling program exceptions, timeouts, etc.

  • Simple syntax with a low cost of getting started

2. DAG scheduling engine: Execute graph scheduling based on configured DAG rules

picture

From the figure, you can see that while a retrieval request is processed, scenarios and cards are scheduled as graphs according to the needs of the user scenario, so that complex user requirements can be recalled and answered. Components are chained together and communicate through a shared context.

3.3.4 Creating a co-construction mechanism

When designing the retrieval process, we considered the sustainability and scalability of the framework and abstracted the concept of components. In the retrieval process above, various components are scheduled at different stages to respond to a request. The request level, card level, and resource level are all extensible scheduling stages.

picture

As the figure shows, components fall into two types, built jointly by the architecture and business teams to maximize reusability and business extensibility.

1. Architectural components: common scenarios abstracted uniformly by the architecture team, maximizing reuse across vertical categories

2. Business components: components customized by each business for its own scenarios

3.3.5 Built for ease of use

After completing the construction of the framework, we faced the questions of how to bring it closer to the business and how to make it easier to use. The work on ease of use is described below.

Ease of use is a very important step for adoption and scale-up. The entire delivery process needs to be examined end to end from the perspective of front-line developers, and tool chains then built to address the issues at each stage and improve efficiency.

picture

The above picture shows the support the framework team provides at different stages of research and development. On the one hand this ensures onboarding and development efficiency; on the other, the team routinely and actively collects feedback to drive improvements.

04 Conclusion and outlook

This article started from the product form of Search Aladdin, extended to scenario-based search, and abstracted the tanGo framework from the characteristics of scenario-based search products. It then systematically shared the technical design ideas and practices of the tanGo framework, conveying the core thinking and core design points as far as possible. Space does not allow every design point to be covered, and the framework still has shortcomings, which will continue to be addressed in future iterations.

In the future, the framework will focus more on the "whole process" of production and research. By improving its capabilities, it will provide more comprehensive support and coverage across development, testing, and post-launch. Examples include integrating the framework with code hosting services to improve ease of use at the repository-creation stage, standardizing compilation and release, integrating the framework with continuous integration services, and improving testability and post-launch support, so that the entire R&D process is covered more comprehensively.

picture

From requirements to implementation, and on to adoption and scale-up, every stage of a project has many more details. If you are interested in Search Aladdin products or presentation technology, feel free to leave a message to discuss.

We are currently hiring for the "Search Product R&D Engineer" position, which mainly covers search product R&D back end, AI applications, and architecture.

Interested candidates are welcome to submit their resumes to [email protected]

——END——
