Process more data at significantly lower cost: a deep dive into Milvus MMap

As the fastest open-source vector database in VectorDBBench, Milvus serves users with high performance needs well. At the same time, we have noticed that some users run Milvus for offline workloads, and others are not particularly performance-sensitive. What these users need is to process more data at lower cost on instances of the same specification.

Milvus 2.3 therefore introduces the MMap feature. With MMap enabled, an instance of the same specification can handle a larger volume of data, and memory requirements are shifted onto disk, significantly reducing cost.

In Milvus 2.3, you can enable MMap by modifying milvus.yaml: add a new mmapDirPath entry under the queryNode configuration section and set its value to any valid path:
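For example (a sketch of the setting; the directory shown is a placeholder, so substitute any valid path on your instance):

```yaml
queryNode:
  mmapDirPath: /var/lib/milvus/mmap   # placeholder path; any valid directory works
```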

Next, let’s take a deeper look at MMap.

01. What is MMap?

MMap (memory-mapped files) is an operating-system technique for mapping files into memory. Through MMap, the contents of a file are mapped directly into a process's address space, so the file can be treated as a contiguous region of memory without explicit read or write operations. MMap provides an efficient and convenient way to access files, which is especially useful when dealing with large files or when random access to file contents is required.

A simple C language example is as follows:

void* map = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, offset);

When data is subsequently read through the map pointer, the contents of the file behind fd are read directly. If the accessed region is not yet in memory, the operating system caches the corresponding page and adjacent pages in the page cache, and infrequently accessed pages may later be swapped out.

After MMap is enabled in Milvus, data is no longer loaded wholesale into memory. When a query arrives, the required data is loaded from disk on demand, and the system dynamically evicts infrequently used data. Since the data in the Milvus query cluster is immutable, evicting data from memory incurs no disk writes.

02. Performance, cost, and the system's upper limit

Because they must hold vector data, vector databases have high memory-capacity requirements. Users who want to process more data with limited memory, and who are not very performance-sensitive, can use the MMap feature. Based on load and usage patterns, the system evicts some data from memory, allowing more data to be served with the same memory capacity.

Seeking a balance between space and time

There is no free lunch, and the price of MMap is performance. According to our tests, when memory is sufficient and all data resides in memory after warm-up, system performance is not significantly degraded. As the data volume keeps growing, performance gradually declines. We therefore recommend the MMap feature only for users who are not performance-sensitive.

As is well known, data access patterns can greatly affect performance, and Milvus's MMap implementation tries hard to account for locality. Scalar data is typically accessed during filtering and reads, and accessed sequentially, so it is written to disk in sequence. For variable-length types we optimize further; as shown in the figure below, the three strings are:

  • Vector

  • Database

  • The kite

Variable-length values are flattened and written to one contiguous region, while an offsets array kept in memory indexes into the data. This preserves locality of access and also eliminates the overhead of storing each variable-length value separately.

For vector indexing, more detailed optimization is required. Taking the most commonly used HNSW as an example, HNSW can be divided into two parts:

  • An adjacency list that stores connections between points in a graph

  • Raw vector data

Since a vector itself is relatively large, usually hundreds or thousands of consecutive float32 values, accessing a single vector already exploits locality, while the adjacency list is accessed in a fairly random pattern during queries. Vector data is usually much larger than the adjacency list, so we chose to apply MMap only to the vector data and keep the adjacency list in memory, saving a lot of memory while ensuring performance does not drop too much.

Zero Copy

For MMap to raise the ceiling on how much data the system can handle, we first had to ensure that peak memory usage during data loading stays far below the actual data size. In earlier versions of Milvus, QueryNode read all the data during loading and copied it throughout the process. While developing the MMap feature, we made this process streaming and removed many unnecessary copies, significantly reducing memory overhead during data loading.

After these optimizations, MMap genuinely raises the system's capacity: our tests show that with MMap enabled, Milvus 2.3 can handle roughly 2x as much data.

The MMap feature is currently in beta. Going forward, we will further optimize memory usage across the system to support even larger data volumes on a single node, and iterate on the usage model to support finer-grained control, such as dynamically changing the loading mode of a collection or even an individual field.


  • If you have any problems using Milvus or Zilliz products, you can add the assistant WeChat "zilliz-tech" to join the communication group.

  • Welcome to follow the WeChat public account "Zilliz" to learn the latest information.


Origin my.oschina.net/u/4209276/blog/10110698