Volcano Engine Cloud Search Service upgrades to a new cloud-native architecture, providing billion-scale distributed vector database capabilities

Since the early days of the Internet, search technology has created enormous social and economic value. As the information society has developed rapidly and data has grown explosively, search technology has met the need for information sharing and fast retrieval through data collection and processing.
Cloud Search Service ESCloud is a fully managed online distributed search service provided by Volcano Engine. It is compatible with Elasticsearch, Kibana, and other software, as well as commonly used open-source plug-ins. It provides multi-condition retrieval, statistics, and reporting over structured and unstructured text, offering one-click deployment, elastic scaling, and simplified operations and maintenance, so that services such as log analysis and information retrieval and analysis can be built quickly.
Following the rise of Serverless and the broader industry trend, Volcano Engine Cloud Search Service has been upgraded to a new cloud-native architecture.
 

Cloud Search Service: Cloud-Native Edition

 

k-NN: native vector search and vector database for the era of large models

With new applications emerging in fields such as recommendation, audio, and video, and with the demands of large-model scenarios, introducing multimodal search has become essential for meeting more complex search requirements. On top of full-text search, we have added vector search so that unstructured data can be analyzed and retrieved.
In vector search, a machine learning model generates vectors (embeddings) that represent data objects such as text, images, audio, and video, and the distance between two vectors represents how similar the corresponding objects are. Commonly used vector libraries rely on approximate nearest neighbor (ANN) algorithms to search massive collections of vectors in a very short time.
k-NN can serve as a vector database. It builds vector indexes with an advanced vector algorithm library and persists the built indexes to disk, which makes them more stable. Combined with the inverted index of ESCloud, vector search and full-text search can be integrated into a more powerful hybrid search capability. Running on ESCloud clusters, the k-NN vector database provides large-scale distributed capabilities, giving users scalable vector search.
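To make the distance-as-similarity idea concrete, here is a minimal Python sketch of exact (brute-force) k-nearest-neighbor search with Euclidean distance. It is not ESCloud's API: the function names, the toy three-dimensional vectors, and the choice of Euclidean distance are illustrative assumptions. ANN algorithms exist precisely to approximate this result without scanning every stored vector.

```python
import math
from typing import Dict, List, Tuple


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query: List[float], corpus: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Exact k-NN: score every stored vector and keep the k closest.

    ANN libraries approximate this result while avoiding a full scan.
    """
    scored = [(doc_id, l2_distance(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1])  # ascending distance
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy embeddings; real models emit hundreds of dimensions.
    corpus = {
        "cat_photo": [0.9, 0.1, 0.0],
        "kitten_photo": [0.85, 0.15, 0.05],
        "car_photo": [0.0, 0.2, 0.95],
    }
    # The two nearest items are cat_photo, then kitten_photo.
    print(knn([0.88, 0.12, 0.02], corpus, k=2))
```

The cost of this scan grows linearly with the number of stored vectors, which is why large collections use ANN indexes instead.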
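The article does not say how ESCloud merges full-text and vector results in hybrid search. As one common technique, shown here purely as an assumption, Reciprocal Rank Fusion (RRF) combines ranked lists by summing 1 / (k + rank) for each document. The minimal Python sketch below uses the conventional constant k = 60; the function name and document IDs are hypothetical.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs with Reciprocal Rank Fusion.

    Each list adds 1 / (k + rank) to every document it contains (rank
    starts at 1); documents are returned by descending summed score.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Hypothetical results of a full-text (inverted index) query and a k-NN query.
    full_text_hits = ["doc_a", "doc_b", "doc_c"]
    vector_hits = ["doc_b", "doc_d", "doc_a"]
    print(reciprocal_rank_fusion([full_text_hits, vector_hits]))
```

RRF only needs ranks, not raw scores, so it sidesteps the fact that text-relevance scores and vector distances live on different scales.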

Scenarios

Business scenarios built on k-NN fall mainly into the following six categories, which are currently used in complex business scenarios within ByteDance:
  • Multimodal search: image search, semantic search, audio and video similarity search, etc.;
  • Intelligent recommendation: video, advertising, relationship, and product recommendation, etc.;
  • Intelligent Q&A: Transformer-based FAQ, LLM domain-knowledge Q&A, and LangChain-based generative Q&A;
  • Data deduplication: review and deduplication of videos, audio, and images, and copyright detection for all kinds of material;
  • Security risk control: fraud detection, crime prevention, risk assessment, and anomaly detection;
  • Other applications: data mining, data analysis, search re-ranking, and text-to-image search.
 
Take a text similarity detection solution as an example.
When copy is pushed to users, duplicate content must be avoided to preserve the user experience, so every piece of copy goes through similarity detection and deduplication before it is pushed. Each piece of copy is converted into an embedding by a BERT model and queried once against the cloud search service. If its similarity to the closest stored copy is below the threshold, it is judged to be new copy and is written into the k-NN vector database, which gradually grows into a complete copy library; if the similarity is above the threshold, it is judged to be a duplicate and is pushed less often.
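The deduplication flow can be sketched in a few lines of Python under several assumptions: embeddings are taken as already computed (the article uses BERT; here they are toy vectors), an in-memory list with a linear scan stands in for the k-NN vector index, similarity is cosine similarity, and the 0.9 threshold is illustrative. The names `CopyLibrary` and `check_and_add` are hypothetical. Because the article does not specify the case where similarity equals the threshold exactly, the sketch treats it as a duplicate.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CopyLibrary:
    """Hypothetical in-memory stand-in for the k-NN index of accepted copy."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.vectors: List[List[float]] = []

    def check_and_add(self, embedding: List[float]) -> bool:
        """Return True and store the copy if it is new; False if it is a duplicate."""
        # Similarity to the closest stored copy (0.0 when the library is empty).
        best = max((cosine_similarity(embedding, v) for v in self.vectors), default=0.0)
        if best >= self.threshold:
            return False  # duplicate: the caller should push it less often
        self.vectors.append(embedding)  # new copy: grow the library
        return True


if __name__ == "__main__":
    library = CopyLibrary(threshold=0.9)  # hypothetical threshold
    print(library.check_and_add([1.0, 0.0, 0.0]))   # True: library was empty
    print(library.check_and_add([0.99, 0.1, 0.0]))  # False: cosine is about 0.995
    print(library.check_and_add([0.0, 1.0, 0.0]))   # True: orthogonal to stored copy
```

In a real deployment the linear scan would be replaced by a k-NN query against the vector database, and the threshold would be tuned on labeled duplicate pairs.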
 
Learn more about the product: https://www.volcengine.com/product/elasticsearch-service

Origin my.oschina.net/u/5941630/blog/10088218