Use Case | Shopee's Practice with Vector Retrieval Systems in Its Multimedia Understanding Business

Shopee is a global e-commerce platform whose business covers Southeast Asia, Latin America, and other regions. Within Shopee, the Multimedia Understanding (MMU) team focuses on multimedia content understanding services, supporting e-commerce, live streaming, short video, and other businesses.

The MMU team supports the company's multimedia understanding needs across different business scenarios. Taking vector retrieval as an example, typical business scenarios include:

  • Real-time recommendation, such as the video recall system

  • Video supply, such as the video originality system

  • Video deduplication, such as the video fingerprint system

The MMU team's vector retrieval scenarios therefore require both the basic engine capability of vector recall and the business architecture capabilities for each scenario. Since different business scenarios place different requirements on the engine and the architecture, and factors such as manpower and time must be weighed during implementation, building the corresponding systems posed many challenges for the team.

Against this background, this article shares the Shopee MMU team's practices in building retrieval business systems and platforms on top of the Milvus engine.

01. Milvus Practice

1.1 Vector search engine

As more and more vector retrieval requirements arose, such as recall and deduplication based on video content, the team needed a general-purpose vector retrieval engine to ensure development efficiency and system stability and thereby support the business better.

According to the team's research on open-source vector search engines in the industry, Milvus's strengths best match the team's needs. In particular, Milvus's cloud-native architecture fits Shopee's internal cloud-native ecosystem and enabled the team to build the retrieval system from zero quickly. In addition, Milvus offers rich features, including distributed deployment, GPU support, incremental updates, and scalar filtering, which adapt well to the team's business scenarios and effectively support efficient implementation.

After comprehensive evaluation, the team chose Milvus as the underlying search engine and built its retrieval business systems from scratch. This section gives an overall introduction to the team's vector retrieval engine practice.

  • Kite 1.x

The first requirement arose before Milvus 2.x was released. Because the data had already reached a considerable scale, a single-node Milvus deployment could not meet the demand, so the team adopted a distributed solution based on Milvus 1.1 + Mishards.

Figure 1: Milvus 1.x + Mishards architecture

However, in actual business scenarios, as data scale and request volume grew, retrieval performance and throughput hit a bottleneck and no longer scaled as read-only nodes were added.

After analysis, the MMU team found the following two reasons:

  1. Mishards' default sharding strategy can, in some cases, distribute segments unevenly across read-only nodes.

  2. Each read-only node retrieves its own Top K, so with N nodes Mishards must merge roughly N × K candidates in its Reduce step; as nodes are added, the amount of data to merge grows sharply, driving latency up.

The mitigation was to deploy multiple Mishards clusters sharing the same database and S3 bucket. However, the deployment and maintenance costs of this approach are high, and a more suitable deployment solution was needed in the long run.

Figure 2: Mishards multi-cluster solution

  • Kite 2.x

After Milvus 2.x was released, the Milvus engine was gradually upgraded across business scenarios. In practice, the stability and scalability of Milvus 2.x are greatly improved over the Mishards cluster. In particular, the multi-replica capability released in Milvus 2.1 further improved overall cluster performance and can satisfy most business scenarios. Moreover, thanks to the cloud-native architecture of Milvus 2.x, introducing logging and monitoring is cheap, and both are friendlier and more complete. A minimal sketch of enabling replicas follows the figures below.

Figure 3: Milvus 2.x architecture

Figure 4: Milvus 2.x monitoring (some indicators)
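As a minimal sketch, enabling in-memory replicas after upgrading to Milvus 2.1+ looks roughly like this with the pymilvus client; the host and collection name are illustrative assumptions, not Shopee's actual configuration:

from pymilvus import connections, Collection

# Connect through the cluster's Proxy entry point (illustrative address).
connections.connect(host="milvus-proxy", port="19530")

collection = Collection("video_embeddings")  # hypothetical collection name
# Load two in-memory replicas so queries are spread across query nodes
# (replica_number is available from Milvus 2.1 onward).
collection.load(replica_number=2)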

1.2 Milvus deployment solution

  • GitOps

Early on, Milvus clusters were deployed manually with the helm command ( https://milvus.io/docs/v2.2.x/install_cluster-helm.md ). To achieve resource isolation, each business needs its own independent Milvus cluster, and as the number of businesses grew, maintaining the deployed clusters became a new challenge.

To solve this problem, the MMU team stored each business's Helm chart directory in a single Git repository and deployed them to the production Kubernetes clusters using tools such as Jenkins and Argo CD.

Figure 5: Milvus 2.x GitOps

Milvus Operator (see "Install Milvus Cluster with Milvus Operator") has since been released; it further reduces the files that need to be kept in Git and streamlines configuration, lowering configuration costs.

  • Load balancing

Once a Milvus cluster is deployed, its external traffic entry point is the Milvus Proxy node. Accessing a Proxy directly creates a single point of failure, so a layer-7 load-balancing component that supports the gRPC protocol, such as Nginx, must be added in front (the Milvus SDK uses a single long-lived connection, so layer-4 load balancing cannot distribute requests).

Figure 6: Milvus 2.x architecture (quoted from Milvus official documentation)

The above is a partial introduction to the MMU team's application practice with Milvus, for reference only. The following sections briefly introduce the business architecture the team built on the Milvus engine.

02. Business Architecture

2.1 Real-time retrieval business

The real-time retrieval business serves real-time online requests, with the system returning search results immediately, as in the recommendation recall system. Its defining characteristic is strict requirements on availability and latency.

2.1.1 Video recall

The video recall system is a content-based recall capability the MMU team provides to the business and is one of the recall components of the video recommendation system. A business request retrieves the Top K candidates from Milvus and returns the recall results after a fine-ranking step.

  • Based on Milvus 1.x architecture

Since Milvus 1.x was mainly oriented toward data analysis scenarios and was not specifically optimized for low-latency real-time recall, latency degrades once the vector collection and request volume reach a certain scale. Therefore, in some scenarios under the Milvus 1.x architecture, a cached-TopK + background-update approach was used: real-time queries are answered from a cache, while TopK results are refreshed in a background update data flow (see the sketch after the process list below).

Figure 7: Similar video recall system (Milvus 1.x architecture)

The core process based on Milvus 1.x architecture is as follows:

  1. Preprocessing module

○ Monitors the video database, selects the incremental video data the system needs, and hands it to the incremental ingestion module

  2. Incremental ingestion module

○ Extracts features from incremental videos and writes them to the KV database and the vector retrieval database

○ Queries the vector database with each incremental video and sends the retrieved Top N list to the result update module (each video in the Top N list will have its TopK result cache refreshed in the subsequent step)

  3. Result update module

○ Executes the complete recall logic (feature extraction, vector retrieval, fine ranking) for each video in the Top N list and writes the results to the TopK cache
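As a minimal sketch of this cached-TopK pattern, assuming a Redis cache keyed by video ID, a placeholder fine-ranking step, and the pymilvus 1.x client (the call signature below is an assumption of the 1.x API):

import json

import redis
from milvus import Milvus  # pymilvus 1.x client

cache = redis.Redis(host="localhost", port=6379)
client = Milvus(host="mishards-proxy", port="19530")  # illustrative address

def fine_rank(video_id, raw_results):
    # Placeholder for the real fine-ranking logic.
    return [str(hit.id) for hit in raw_results[0]]

def query_similar(video_id):
    """Real-time path: serve TopK directly from the cache."""
    cached = cache.get("topk:" + video_id)
    return json.loads(cached) if cached else []

def refresh_topk(video_id, embedding):
    """Background path: run the full recall and overwrite the cache."""
    status, results = client.search(
        collection_name="video_embeddings",  # hypothetical collection name
        query_records=[embedding],
        top_k=100,
        params={"nprobe": 16},
    )
    cache.set("topk:" + video_id, json.dumps(fine_rank(video_id, results)))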

  • Based on Milvus 2.x architecture

After Milvus 2.x was released, its strong distributed capabilities and system performance allow the system to provide TopK recall directly through the Milvus 2.x query interface, as sketched after the figure below.

Figure 8: Similar video recall system (Milvus 2.x architecture)
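As a minimal sketch of this direct recall path with the pymilvus 2.x client; the collection name, field names, and index parameters are illustrative assumptions:

from pymilvus import connections, Collection

connections.connect(host="milvus-proxy", port="19530")  # illustrative address
collection = Collection("video_embeddings")  # hypothetical collection name
collection.load()

def recall_topk(embedding, k=100):
    # A single call: Milvus 2.x merges and returns the global TopK itself,
    # with no client-side Reduce step.
    results = collection.search(
        data=[embedding],
        anns_field="embedding",
        param={"metric_type": "IP", "params": {"nprobe": 16}},
        limit=k,
        output_fields=["video_id"],
    )
    return [(hit.entity.get("video_id"), hit.distance) for hit in results[0]]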

2.2 Offline retrieval business

The offline retrieval business performs retrieval on bypass traffic and writes the final results into storage as features, so that different services can consume them in their own scenarios, for example the video fingerprint system and the video originality system.

2.2.1 Video originality

To build a healthy content ecosystem, the platform hopes to encourage users to produce original content through certain incentive mechanisms. For these mechanisms to work well, the system needs to analyze video content and determine whether it is an original work.

The video originality system was designed for this need. The system determines a video's originality through a processing pipeline and provides the result to business parties for subsequent handling.

  • System architecture

Figure 9: Video originality system architecture

  • Core process
  1. Preprocessing module

○ Selects the videos the system needs, assembles the core video information, and outputs it to the feature extraction module

  2. Feature extraction module

○ Extracts features from the video and outputs them to the business logic module

  3. Business logic module

○ Executes TopK retrieval and the originality-judgment logic for each video (a sketch follows this list)

○ Stores the results in the database and outputs them to the business

  4. Rescan module

○ A bypass process that applies a fallback strategy to data that fails at any stage
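The article does not spell out the originality criteria, so the following is a purely illustrative sketch of what the originality judgment in the business logic module might look like, assuming a simple rule: a video is original unless an earlier-published, sufficiently similar video already exists. The threshold and input format are assumptions:

DUP_THRESHOLD = 0.9  # hypothetical similarity threshold

def judge_originality(publish_time, top_hits):
    """top_hits: [(similarity, candidate_publish_time), ...] from TopK retrieval.
    Returns False if an earlier, sufficiently similar video exists."""
    for similarity, candidate_time in top_hits:
        if similarity >= DUP_THRESHOLD and candidate_time < publish_time:
            return False
    return True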

2.2.2 Video fingerprinting

Duplicate or near-duplicate videos lack freshness and do not give users a valuable consumption experience. This gives rise to the need for video deduplication: identifying whether a video is a duplicate, thereby providing deduplication capabilities to various business scenarios.

The video fingerprint system was designed for this need. It assigns a fingerprint ID to each video (the fingerprint ID serves as the video's identifier, and videos with the same ID are regarded as duplicates) and outputs it for use by the various business parties.

  • System architecture

Figure 10: Video fingerprint system architecture

  • Core process

The core process of the fingerprint system is similar to that of the originality system; the main difference lies in the business logic module.

  • Business logic module

○ Processes videos in batches, performing TopK retrieval, fine ranking, and clustering, and assigns fingerprint IDs (a sketch follows this list)

○ Stores the results in the database and outputs them to the business
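The exact clustering logic is not described in the article; as an illustrative sketch, assume the simplest assignment rule: reuse the nearest neighbor's fingerprint ID when it is similar enough, otherwise mint a new one. The collection name, field names, and threshold are assumptions:

import uuid

from pymilvus import connections, Collection

connections.connect(host="milvus-proxy", port="19530")  # illustrative address
collection = Collection("video_fingerprints")  # hypothetical collection name
collection.load()

SIM_THRESHOLD = 0.9  # hypothetical duplicate threshold (inner product)

def assign_fingerprint(embedding):
    hits = collection.search(
        data=[embedding],
        anns_field="embedding",
        param={"metric_type": "IP", "params": {"nprobe": 16}},
        limit=1,
        output_fields=["fingerprint_id"],
    )[0]
    if len(hits) > 0 and hits[0].distance >= SIM_THRESHOLD:
        # Duplicate: join the nearest neighbor's cluster.
        return hits[0].entity.get("fingerprint_id")
    return str(uuid.uuid4())  # new content: mint a fresh fingerprint ID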

2.2.3 Overall design

  • Unified data flow
  1. Unified access protocol - aggregates data-source input from all parties and provides a unified data flow

  2. Unified preprocessing - aggregates and maintains video availability status, reducing the processing complexity of downstream systems

  3. Insert/delete/update semantics are built in to meet the fine-grained needs of different business scenarios (a sketch of the event envelope follows this list)
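As an illustrative sketch, the insert/delete/update semantics of the unified data flow could be carried by a message envelope like the following; all field names are assumptions rather than the team's actual protocol:

import json
from dataclasses import asdict, dataclass

@dataclass
class VideoEvent:
    op: str         # "insert" | "update" | "delete"
    video_id: str
    version: int    # monotonically increasing; consumed by idempotent updates
    payload: dict   # video metadata or feature references

event = VideoEvent(op="update", video_id="v123", version=7,
                   payload={"status": "available"})
print(json.dumps(asdict(event)))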

  • Flexible (soft) transactions
  1. A retry + idempotent-update mechanism ensures that the output data meets business expectations (a sketch follows this list)

  2. Partition routing + local locks improve concurrency efficiency while keeping data accurate
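As a minimal sketch of the retry + idempotent-update idea, with an in-memory dict standing in for the real KV store, hash-based partition routing, and one local lock per partition (the version field matches the event envelope sketched above):

import threading

NUM_PARTITIONS = 16
locks = [threading.Lock() for _ in range(NUM_PARTITIONS)]
store = {}  # video_id -> (version, payload); stand-in for the real KV store

def idempotent_update(video_id, version, payload):
    """Apply the update only if it is newer than what is stored, so retries
    and out-of-order deliveries cannot roll results back."""
    lock = locks[hash(video_id) % NUM_PARTITIONS]  # partition routing
    with lock:  # a local lock serializes writers within one partition
        current = store.get(video_id)
        if current is not None and current[0] >= version:
            return False  # stale or duplicate retry; safe to drop
        store[video_id] = (version, payload)
        return True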

  • Fallback strategy
  1. Per-stage retry policies are locally configurable

  2. Failed data is periodically rescanned

  • Logic orchestration engine

The team developed a general-purpose logic orchestration engine that standardizes components such as input/output, middleware invocation, and AI service invocation, improving the efficiency of developing policy logic.

The above introduced some of the Shopee MMU team's engineering practices in vector retrieval systems. As the business keeps growing, the team's requirements for R&D efficiency, system quality, and developer experience grow ever stronger. To improve these aspects, the team launched corresponding platform-building projects. The next section briefly introduces the team's platform practice.

03. Platform Practice

3.1 Early R&D model

In the early stages of the business and the team, taking the retrieval scenario as an example, the overall R&D process was as follows:

Figure 11: Early R&D model

  • Algorithm R&D responsibilities
  1. Train models based on business characteristics

  2. Deliver the algorithm SDK as the output artifact

  • Engineering R&D responsibilities
  1. Deploy the Milvus cluster

  2. Develop and deploy AI online services (such as feature extraction, fine ranking, and clustering) based on the algorithm SDK

  3. Develop and deploy the related microservices the business needs (such as feature storage and caching services)

  4. Implement the orchestration service according to business needs and connect the entire business process

  5. Interface with business parties, providing APIs, joint debugging, testing, access support, and so on

With the rapid development of the business, the team's requirements for iteration efficiency and quality keep rising, challenging this traditional model. Several links in the early R&D model can be optimized, such as model online services, business logic orchestration, and business access.

3.2 Model service platform

AI model atomic services with fixed patterns, such as vector extraction, fine ranking, and clustering, are standardized on top of the MMU scenario's unified model engine and protocol. During logic orchestration, the required atomic services can then be selected according to business characteristics.

To further improve the R&D efficiency and developer experience of atomic services, the team built a model service R&D platform. Built on the model engine and unified protocol, the platform standardizes development, deployment, testing, operations, and other links, providing one-stop, full-process self-service for the entire model service R&D cycle.

In development, the platform provides self-service SDK development capabilities. Algorithm developers can integrate their model SDK against the unified protocol of the MMU scenario, which shields the engineering details of the model engine and reduces interaction costs and cognitive load.

Figure 12: Model service platform (development)

In deployment, the platform connects to the various underlying infrastructures, shields their usage details and dependencies, unifies the differences between CPU and GPU deployment platforms, and gives users a unified, one-click deployment experience.

Figure 13: Model service platform (deployment)

In testing, the platform builds on the model engine and unified protocol to provide general-purpose SDK self-testing, functional testing, performance testing, service debugging, and other test functions. Through automated, one-click workflows, the various tests of an atomic service can be completed efficiently.

Figure 14: Model service platform (test)

In operations, the platform provides unified logging, monitoring, and alerting, and brings new atomic services into the online operations monitoring system in a non-intrusive way.

3.3 Logical orchestration platform

The MMU team interfaces with many business scenarios, each with its own customized business logic. For example, the offline retrieval business requires customized fine-ranking strategies, and the overall process calls many external interfaces. Moreover, on the input and output side, some systems are tightly coupled with business data, and the data input and output forms vary widely.

To avoid writing customized code for every scenario, the MMU team developed a logic orchestration engine that abstracts the common steps of business logic, such as vector database access, model atomic services, middleware, error handling, and input/output.

The logic orchestration engine describes an entire business process with several YAML files:

kind: workflow
nodes:
    - name: mq_datasource
      node: { "ref": { "file": "datasource.yaml" } }
      nexts:
        - external_name: video_embedding
    - name: video_embedding
      node: { "ref": { "file": "video_emb.yaml" } }
      value:
        mms_vid: $.mq_datasource.video_id
      nexts:
        - external_name: milvus_search
    - name: milvus_search
      nexts:
        - external_name: save_milvus_result
errorHandle: retry

Figure 15: Business orchestration example

The example above describes the following flow:

  1. Receive video data from the MQ data source

  2. Compute embeddings for the video data

  3. Perform a Milvus search

  4. Store the search results

To further improve orchestration efficiency, the team turned the orchestration engine into a platform, managing each business's orchestration logic uniformly through a drag-and-drop UI.

Figure 16: Logical orchestration platform

3.4 Business access platform

The business access platform's main purpose is to let business parties onboard MMU services in a self-service way, covering the entire process of browsing, trying out, accessing, and operating the services, thereby improving access efficiency and user experience.

Through this process, business parties can explore the relevant capabilities on their own and, once they confirm that their needs are met, apply for formal access on the platform.

Figure 17: Capability experience

Figure 18: Self-service debugging

Figure 19: Access application

3.5 Platform summary

By standardizing and platformizing the three key links of model online-service R&D, business logic R&D, and business access, the team's end-to-end efficiency from capability R&D to business access has improved greatly, and each role can focus on its own core work, raising the overall productivity of the whole team.

Figure 20: Summary of platformization

In the future, the MMU team plans to continue improving the existing platforms while exploring some vertical scenarios.

04. Summary

Thanks to the Milvus vector database team: the stable vector retrieval capabilities and diverse features Milvus provides were a great convenience for the MMU team when building business systems in vector retrieval scenarios, and its reliable distributed scaling effectively supports the ever-growing data size.

As this article was being written, ChatGPT set off the AIGC wave, and vector databases represented by Milvus are among AIGC's most important pieces of infrastructure. OpenAI's introduction of the ChatGPT Retrieval Plugin and NVIDIA's press conference both explicitly mentioned the Milvus vector database and its significance. We look forward to Milvus continuing to offer users richer functionality, such as GPU support and resource isolation, and to Milvus shining even brighter in the AI era.


  • If you have any problems using Milvus or Zilliz products, you can add the assistant WeChat "zilliz-tech" to join the communication group.

  • Welcome to follow the WeChat public account "Zilliz" to learn the latest information.
