Vector database Milvus Cloud Partition Key: three solutions for many tenants with little data per tenant

Three solutions

When this question was raised, the latest version of Milvus was 2.2.8. Putting ourselves in the user's shoes, we saw three options available at that time:

  • Create a collection for each tenant

  • Create a partition for each tenant

  • Create a scalar field for the tenant name
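
To make the three options concrete, here is a minimal, hypothetical Python sketch of where one tenant's search would be routed under each option. It does not call Milvus. The collection name `docs`, the `docs_<tenant>` and `p_<tenant>` naming scheme, the scalar field `tenant_name`, and the `SearchTarget` and `target_for` helpers are all invented for illustration. It also assumes tenant IDs contain only letters, digits, and underscores, so they are valid in collection and partition names and need no escaping inside the filter expression.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchTarget:
    """Hypothetical description of where a tenant's query is sent."""
    collection: str           # collection to search
    partition: Optional[str]  # partition to restrict the search to, if any
    expr: Optional[str]       # scalar filter expression, if any


def target_for(tenant: str, option: int) -> SearchTarget:
    """Route a query for `tenant` under option 1, 2, or 3 (illustrative names)."""
    if option == 1:
        # Option 1: one collection per tenant, so N tenants need N collections.
        return SearchTarget(collection=f"docs_{tenant}", partition=None, expr=None)
    if option == 2:
        # Option 2: one shared collection with one partition per tenant.
        return SearchTarget(collection="docs", partition=f"p_{tenant}", expr=None)
    if option == 3:
        # Option 3: one shared collection; the tenant is stored in a scalar
        # field and selected with a boolean filter on every query.
        return SearchTarget(collection="docs", partition=None,
                            expr=f'tenant_name == "{tenant}"')
    raise ValueError(f"unknown option: {option}")


if __name__ == "__main__":
    for opt in (1, 2, 3):
        print(opt, target_for("acme", opt))
```

Options 2 and 3 keep every tenant in a single collection; they differ only in whether a tenant is isolated by a partition or by a filter expression attached to each query.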

Next, we analyze the feasibility of these three options in turn:

  • Option 1: Create a collection for each tenant.

This is the most natural approach, and the most intuitive and easiest to use, but it has a fatal flaw: a Milvus cluster can create at most 65,536 collections. The limit exists because each Milvus collection is bound to topics in the message system (Pulsar or Kafka), and Pulsar and Kafka cap the number of topics. When there are many collections, many of them have to share the same topic, and this high topic reuse causes severe read amplification. Although our 10K to 20K tenants are below the hard cap, that many collections would already mean heavy topic reuse, so one collection per tenant does not work.
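The trade-off can be put into rough numbers. The sketch below is a simplified model, not how Milvus actually assigns collections to topics: it assumes collections are spread as evenly as possible over a fixed pool of topics and that every collection produces the same traffic. A consumer that needs one collection still reads every message on its topic, so it reads about k times the data it needs, where k is the number of collections sharing that topic. The topic-pool size of 256 is an arbitrary illustrative figure, not a Milvus default, and the function names are hypothetical.

```python
import math

# Per-cluster collection cap cited in the article (Milvus 2.2.x era).
MAX_COLLECTIONS = 65536


def fits_collection_limit(num_tenants: int) -> bool:
    """One collection per tenant is only possible up to the cluster cap."""
    return num_tenants <= MAX_COLLECTIONS


def worst_case_read_amplification(num_collections: int, num_topics: int) -> int:
    """Collections sharing the busiest topic under an even spread.

    Under the uniform-traffic assumption, this is also the factor by which a
    consumer of one collection over-reads data on that topic.
    """
    if num_topics <= 0:
        raise ValueError("num_topics must be positive")
    return math.ceil(num_collections / num_topics)


if __name__ == "__main__":
    TOPICS = 256  # hypothetical topic-pool size, for illustration only
    for tenants in (10_000, 20_000):
        print(tenants, fits_collection_limit(tenants),
              worst_case_read_amplification(tenants, TOPICS))
    # prints "10000 True 40" and then "20000 True 79"
```

Under these assumptions both tenant counts fit under the cap, yet each collection's consumer would read roughly 40 to 79 times more data than it needs, which is the read amplification that rules out this option.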

The good news is that the community is already planning to introduce lighter-weight message systems such as NATS, so the collection limit is expected to rise in the future. If the collection-count problem is solved, Milvus could approach a ceiling in the hundreds of millions, similar to the number of tables MySQL supports.

Origin: blog.csdn.net/qinglingye/article/details/132263066