WhalePaper paper reading, issue 3: Chen Qi, Microsoft researcher and Peking University Ph.D., winner of the NeurIPS Outstanding Paper Award

Datawhale practical content

Source: WhalePaper; organizer: Fu Qu

Introduction to WhalePaper

WhalePaper was initiated by members of the Datawhale team. It shares mature topics and open-source solutions from current academic papers, and helps participants study more efficiently, comprehensively, and with more self-discipline by reading and discussing papers together, so that everyone comes away with something. Covered directions include paper interpretation and sharing in natural language processing (NLP), computer vision (CV), recommendation (Res), and related fields, with more to be added in the future.

Open source address: https://datawhalechina.github.io/whale-paper

WhalePaper | GitHub

Current activities


Time: Saturday, July 29, 2023, at 20:00

Topic area: Vector retrieval

Platform: Tencent Meeting, meeting ID 815-856-759

Agenda: 45 minutes for the talk, followed by open-ended Q&A.

Sharing Outline:

  1. Introduction and Latest Development of Vector Retrieval Algorithms

  2. Algorithm and System Design of Vector Database

Guest & paper overview



Guest profile: Chen Qi is a Principal Researcher in the System Research Group at Microsoft Research Asia. She received her BS and PhD degrees in computer science from Peking University in 2010 and 2016, respectively, where she worked on distributed systems, cloud computing, and parallel computing under the supervision of Prof. Zhen Xiao. From 2013 to 2014, she was a visiting student in the System Group at New York University, where she worked on a distributed array framework under the guidance of Prof. Jinyang Li. She has published more than 20 papers in top conferences and journals, several of which have won major awards, including the OSDI Best Paper Award and the NeurIPS Outstanding Paper Award. Her current research interests include distributed systems, cloud computing, and deep learning algorithms and frameworks.

Topic: Vector Search and Vector Database

Topic introduction: Recent advances in deep learning have made it possible to map many types of data into high-dimensional vectors. Current state-of-the-art vector search libraries focus mainly on fast, high-recall search in memory. Extremely large-scale vector search, however, raises several challenges. Tens of billions of vectors combined with limited memory cause capacity problems. Scalability is also a problem: adding more servers increases query latency and computing cost. Furthermore, high-dimensional vector indexes do not possess monotonicity, a key property of traditional indexes. Lacking it, existing vector systems have to rely on tentative indexes that preserve monotonicity, built from the TopK nearest neighbors of a target vector, in order to support complex queries that combine approximate similarity search with relational operations. This hurts performance, because the optimal value of K is hard to predict.
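To make the "hard to predict K" point concrete, here is a minimal sketch, not taken from the talk or from any particular vector system. It uses exact brute-force search as a stand-in for a vector index, and the `prices` attribute, the `max_price` predicate, and all function names are hypothetical. It fetches the K nearest neighbors first and only then applies a relational filter, so the number of rows that survive depends heavily on the chosen K.

```python
import math
import random


def top_k(vectors, query, k):
    """Exact brute-force top-k by Euclidean distance (stand-in for a vector index).

    Returns the indices of the k nearest vectors, nearest first.
    """
    order = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))
    return order[:k]


def top_k_then_filter(vectors, prices, query, k, max_price):
    """Fetch the k nearest neighbors, then apply the relational predicate."""
    return [i for i in top_k(vectors, query, k) if prices[i] <= max_price]


# Synthetic data; the price column is a hypothetical relational attribute.
random.seed(0)
n, dim, wanted = 1000, 8, 10
vectors = [[random.random() for _ in range(dim)] for _ in range(n)]
prices = [random.uniform(0, 100) for _ in range(n)]
query = [0.5] * dim

# Too small a K returns fewer rows than wanted; too large a K wastes work.
for k in (10, 50, 200, 1000):
    hits = top_k_then_filter(vectors, prices, query, k, max_price=10.0)
    print(f"K={k:4d} -> {len(hits)} rows pass the filter (wanted {wanted})")
```

With a selective predicate, a small K tends to return too few rows, while a large K scans far more candidates than needed; the right K depends on the predicate's selectivity, which is generally unknown in advance.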

In this talk, we introduce SPANN, a distributed disk-based approximate nearest neighbor search (ANNS) system that has been integrated into Bing and searches tens of billions of vectors with millisecond-level response time. We also introduce VBASE, a vector database system that handles complex queries efficiently by relying on a common property called relaxed monotonicity. This approach unifies two seemingly incompatible systems and delivers three orders of magnitude better performance than existing state-of-the-art vector systems.
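As a rough illustration of why avoiding a fixed K can help with filtered queries, the sketch below streams candidates in nearest-first order and stops as soon as enough rows satisfy the predicate. This is only an assumption-laden toy: it does not implement relaxed monotonicity or VBASE's actual termination condition, it uses an exact heap-based traversal where a real index would produce only an approximately ordered stream, and the `prices` attribute and all function names are hypothetical.

```python
import heapq
import math
import random


def nearest_first(vectors, query):
    """Lazily yield (index, distance) pairs in increasing distance order.

    Exact traversal; a real vector index would only approximate this order.
    """
    heap = [(math.dist(v, query), i) for i, v in enumerate(vectors)]
    heapq.heapify(heap)
    while heap:
        d, i = heapq.heappop(heap)
        yield i, d


def filtered_search(vectors, prices, query, wanted, max_price):
    """Consume candidates until `wanted` rows satisfy the predicate.

    Returns the matching indices and the number of candidates examined.
    """
    results, scanned = [], 0
    for i, _ in nearest_first(vectors, query):
        if len(results) >= wanted:
            break
        scanned += 1
        if prices[i] <= max_price:  # hypothetical relational predicate
            results.append(i)
    return results, scanned


random.seed(0)
n, dim = 1000, 8
vectors = [[random.random() for _ in range(dim)] for _ in range(n)]
prices = [random.uniform(0, 100) for _ in range(n)]
query = [0.5] * dim

results, scanned = filtered_search(vectors, prices, query, wanted=10, max_price=10.0)
print(f"found {len(results)} matching rows after scanning {scanned} candidates")
```

No K has to be guessed here: the traversal stops on its own once the requested number of qualifying rows has been found, and the number of candidates scanned adapts to the predicate's selectivity.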

How to participate

Scan the QR code to join the WhalePaper group

[QR code image]

If the group is full, reply "paper" to the official account.

WhalePaper organizer contacts:

Fu Qu (WeChat ID: MePhyllis)

Hua Hui (WeChat ID: BuShouY)



Origin blog.csdn.net/Datawhale/article/details/131989808