"Vector Database Guide" - Key considerations that shaped the Milvus Cloud 2.0 design from the very beginning

First, we considered how to achieve scalability.

Second, we considered the properties that matter most for a cloud-native database, including elasticity and isolation. In version 1.0, building an index slowed down the queries users were running at the same time, so one of our goals was to isolate index building from querying effectively.

Third, we considered data freshness, that is, how soon newly written data becomes queryable. Because version 1.0 offered only weak consistency guarantees, one of our users inserted a large amount of data and then could not find it when querying. We spent a lot of time troubleshooting and finally found that the inserted data had not yet been indexed and therefore could not be served online, and the user had no way of knowing this.
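To make the freshness problem concrete, here is a minimal Python sketch built on a toy in-memory store, not on anything taken from Milvus itself. The `Collection` class, its `indexed` and `pending` maps, and the `include_pending` flag are all hypothetical names. Searching only the indexed data reproduces the 1.0-style symptom (a vector inserted after the last index build is invisible), while also scanning the not-yet-indexed data makes it visible immediately. Both parts are brute-forced here for simplicity; a real system would answer the indexed part with an ANN index.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def l2_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Collection:
    """Hypothetical toy store: 'indexed' stands in for indexed data, 'pending' for rows not yet indexed."""
    indexed: Dict[int, Vector] = field(default_factory=dict)
    pending: Dict[int, Vector] = field(default_factory=dict)

    def insert(self, pk: int, vec: Vector) -> None:
        # New rows land in the unindexed buffer first.
        self.pending[pk] = vec

    def build_index(self) -> None:
        # Placeholder for index building: it only moves rows into the indexed part.
        # In the design described above, this work would be isolated from querying.
        self.indexed.update(self.pending)
        self.pending.clear()

    def search(self, query: Vector, k: int, include_pending: bool = True) -> List[int]:
        # Brute-force both parts for simplicity.
        candidates = list(self.indexed.items())
        if include_pending:
            candidates += list(self.pending.items())  # makes fresh rows searchable immediately
        candidates.sort(key=lambda item: l2_distance(item[1], query))
        return [pk for pk, _ in candidates[:k]]


c = Collection()
c.insert(1, (0.0, 0.0))
c.build_index()
c.insert(2, (1.0, 1.0))
print(c.search((1.0, 1.0), k=1, include_pending=False))  # [1]
print(c.search((1.0, 1.0), k=1))                          # [2]
```

In this toy run, row 2 is inserted after the index build. A search that ignores pending data returns only `[1]`, while the default search finds the exact match `[2]`. Scanning unindexed rows by brute force costs more per query, which is the price this sketch pays for freshness.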

For this reason, we paid close attention to real-time behavior when building version 2.0, so that users can work with data promptly after inserting it. The ideal is of course strong consistency, where data can be queried immediately after it is written. In some scenarios a certain delay is acceptable, but a guarantee on that delay is still essential. We therefore put a lot of effort into version 2.0 to support streaming writes, to combine streaming data with traditional batch-imported data, and to handle updates and deletes. These problems became quite complex in 2.0, especially under a cloud-native architecture. Most readers will be somewhat familiar with the same issues in data lakes: early products could not update or delete data, and those capabilities were added only later. The design of version 2.0 therefore focuses heavily on solving real-time problems.
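The trade-off between strong consistency and a delay that is acceptable but bounded can be sketched with timestamps. The Python below is a hypothetical model, not Milvus's API. It assumes that every insert and delete carries a monotonically increasing timestamp, that a query replica records how far it has applied the write log (`applied_ts`), and that a query may run only once `applied_ts` reaches a guarantee timestamp derived from the requested consistency level. Deletes are kept as timestamped tombstones so that streamed inserts and later deletes resolve consistently. The names `QueryNode`, `guarantee_ts`, `try_query`, and the three level strings are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QueryNode:
    """Hypothetical query replica that applies an ordered write log."""
    applied_ts: int = 0                                        # newest log timestamp applied so far
    rows: Dict[int, int] = field(default_factory=dict)        # primary key -> insert timestamp
    tombstones: Dict[int, int] = field(default_factory=dict)  # primary key -> delete timestamp

    def apply(self, op: str, pk: int, ts: int) -> None:
        # The log must be consumed in timestamp order.
        if ts < self.applied_ts:
            raise ValueError("log entries must arrive in timestamp order")
        if op == "insert":
            self.rows[pk] = ts
        elif op == "delete":
            self.tombstones[pk] = ts
        else:
            raise ValueError(f"unknown op: {op}")
        self.applied_ts = ts

    def visible(self, read_ts: int) -> List[int]:
        # A row is visible if it was inserted at or before read_ts and not deleted by then.
        return sorted(
            pk for pk, ins_ts in self.rows.items()
            if ins_ts <= read_ts
            and not (pk in self.tombstones and self.tombstones[pk] <= read_ts)
        )


def guarantee_ts(level: str, now: int, staleness: int = 0) -> int:
    """Oldest log position a replica must have applied before answering."""
    if level == "strong":
        return now                # every write up to now must be visible
    if level == "bounded":
        return now - staleness    # tolerate at most `staleness` ticks of lag
    if level == "eventual":
        return 0                  # answer with whatever has been applied
    raise ValueError(f"unknown level: {level}")


def try_query(node: QueryNode, level: str, now: int, staleness: int = 0) -> Optional[List[int]]:
    """Return visible primary keys, or None if the replica has not caught up yet."""
    if node.applied_ts < guarantee_ts(level, now, staleness):
        return None               # caller would wait and retry
    return node.visible(min(now, node.applied_ts))
```

With inserts at timestamps 1 and 2, a delete of row 1 at timestamp 3, and the clock at 5, a `"strong"` query returns `None` because the replica must catch up first, while a `"bounded"` query with `staleness=2` returns `[2]` right away. Making the allowed lag an explicit parameter is one way to express "some delay is acceptable, but it must be guaranteed".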

Around June 2021, we released the first


Source: blog.csdn.net/qinglingye/article/details/132832291