Deep dive #2: Detailed API and Python SDKs

Author: Yang Xuan

Editor's note:

What is an API? What are Python SDKs? Readers familiar with the Milvus database will immediately think of the Milvus protocol API and PyMilvus. This issue will help you see Milvus and the Milvus open source community from a fresh perspective.

Speaker profile:

Yang Xuan, pymilvus-admin and Milvus contributor, is responsible for maintaining the PyMilvus project and developing the DataNode module of Milvus. An avid fan of the open source community, he actively participates in various open source activities and is a core volunteer member of GDG Shanghai. He holds a master's degree in software engineering from Wuhan University. Besides working as a software development engineer at Zilliz, he describes himself as a Vim beginner who once urged his department head to give up VS Code for Vim.

Outline of the talk:

  1. Background on PyMilvus and the Milvus protocol API
  2. How to use the protocol API
  3. PyMilvus code deep dive
  4. PyMilvus development goals and plans

Background introduction

Let's treat Milvus as a black box. The following picture shows how SDKs interact with Milvus through gRPC. gRPC uses Protocol Buffers as its interface definition language to describe the server's interface and the structure of the messages passed over it. Protocol Buffers definition files usually have the .proto extension. All the behaviors of the Milvus black box are therefore defined in the protocol API.

img

The protocol API, as found in the PyMilvus repository, consists of three files:

  1. milvus.proto
  2. common.proto
  3. schema.proto

For SDKs to work properly, they all need these three proto files to interact with Milvus. This talk on the Milvus API is a preliminary walk-through of these three files.

Interested readers can check the Milvus 2.0 SDK roadmap: https://milvus.io/docs/v2.0.0/roadmap.md#SDK

PyMilvus 2.0 also introduces a new set of interfaces in the object-relational mapping (ORM) style, which differ considerably from the interfaces of previous versions. Today we will also talk about the characteristics of the ORM interface and the next steps in PyMilvus development.

Milvus protocol API

milvus.proto

The most important part of the Milvus protocol API is milvus.proto. It defines MilvusService, which in turn defines all the RPC interfaces of Milvus. This is the file to pay close attention to when interacting with Milvus.

The RPC interface in the figure below takes a CreatePartitionRequest as its parameter. The two main fields of the request are collection_name and partition_name. With these two values, a partition can be created in a collection.

img

What does this protocol look like? In the PyMilvus repository (https://github.com/milvus-io/pymilvus/blob/master/pymilvus/grpc_gen/proto/milvus.proto), you can find the example mentioned above on line 19:

img

The definition of CreatePartitionRequest can be found in the same file:

img

If you need to develop new features or a new SDK, the PyMilvus repository can help you find the interfaces that Milvus exposes through RPC.
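
To make the request shape concrete, here is a minimal Python sketch. It does not use the generated gRPC code: CreatePartitionRequest is a simplified local stand-in carrying only the two fields named above, and FakeMilvusService is a hypothetical in-memory substitute for MilvusService, so the behavior shown is an assumption for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class CreatePartitionRequest:
    """Simplified stand-in for the message of the same name (two fields only)."""
    collection_name: str
    partition_name: str


@dataclass
class FakeMilvusService:
    """Hypothetical in-memory substitute for MilvusService; no network involved."""
    partitions: dict = field(default_factory=dict)

    def CreatePartition(self, request: CreatePartitionRequest) -> bool:
        # Record the partition under its collection.
        # Return False if the partition already exists (an assumption of this sketch).
        names = self.partitions.setdefault(request.collection_name, set())
        if request.partition_name in names:
            return False
        names.add(request.partition_name)
        return True


service = FakeMilvusService()
request = CreatePartitionRequest(collection_name="films", partition_name="adventure")
created = service.CreatePartition(request)
```

A real SDK would serialize the request with the classes generated from milvus.proto and send it over a gRPC channel; only the two-field shape of the request is taken from the text.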

common.proto

As the name suggests, common.proto defines the common types shared across the API. These include the frequently used ErrorCode and Status:

img
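
As a rough illustration of how an SDK might consume these two types, the sketch below models them with the Python standard library. Only two ErrorCode values are listed, the Status fields error_code and reason are assumed to match the proto definition, and RpcError and check_status are hypothetical helpers, not part of PyMilvus.

```python
from dataclasses import dataclass
from enum import IntEnum


class ErrorCode(IntEnum):
    # Two illustrative values only; the real enum defines many more.
    Success = 0
    UnexpectedError = 1


@dataclass
class Status:
    # Assumed field names, mirroring the Status message.
    error_code: ErrorCode = ErrorCode.Success
    reason: str = ""


class RpcError(Exception):
    """Hypothetical exception type used only in this sketch."""


def check_status(status: Status) -> None:
    # Raise if the server reported anything other than success.
    if status.error_code != ErrorCode.Success:
        raise RpcError(f"{status.error_code.name}: {status.reason}")
```

Calling check_status on every response is one simple way to turn a returned Status into a Python exception at the SDK boundary.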

schema.proto

When passing parameters, all the required schemas are defined in this file. CollectionSchema is one example:

img
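
The general shape of a collection schema can be sketched with plain dataclasses. This is a simplified model, not the generated protobuf classes: the field names below are assumptions that keep only a few attributes, the data types are plain strings rather than an enum, and primary_field is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldSchema:
    name: str
    data_type: str               # a plain string here; the real message uses an enum
    is_primary_key: bool = False


@dataclass
class CollectionSchema:
    name: str
    description: str = ""
    fields: List[FieldSchema] = field(default_factory=list)

    def primary_field(self) -> FieldSchema:
        # This sketch requires exactly one primary key field.
        primaries = [f for f in self.fields if f.is_primary_key]
        if len(primaries) != 1:
            raise ValueError("expected exactly one primary key field")
        return primaries[0]


films_schema = CollectionSchema(
    name="films",
    description="demo schema",
    fields=[
        FieldSchema("film_id", "Int64", is_primary_key=True),
        FieldSchema("embedding", "FloatVector"),
    ],
)
```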

Together, these three proto files form the external API of Milvus, in which all of Milvus's externally visible behavior can be found.

If you read the source code of create_index(), you will see that an external interface like create_index actually calls several RPC interfaces, such as describe_collection and describe_index. In short, many external interfaces in Milvus are functions composed of multiple RPC interfaces.

So, once you understand the behavior of these RPCs, you can build new functions by composing them at the SDK level. You are welcome to use your creativity and imagination here and contribute to the Milvus community!
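
The idea of composing one SDK-level function from several RPCs can be sketched as follows. FakeStub is a hypothetical stand-in that records calls instead of sending them, and the order and checks inside create_index are assumptions of this sketch, not a copy of the PyMilvus source.

```python
class FakeStub:
    """Hypothetical stand-in for a gRPC stub; records calls instead of sending them."""

    def __init__(self):
        self.calls = []
        self.collections = {"films": {"fields": ["film_id", "embedding"]}}
        self.indexes = {}

    def describe_collection(self, collection_name):
        self.calls.append("describe_collection")
        return self.collections.get(collection_name)

    def create_index(self, collection_name, field_name, params):
        self.calls.append("create_index")
        self.indexes[(collection_name, field_name)] = params

    def describe_index(self, collection_name, field_name):
        self.calls.append("describe_index")
        return self.indexes.get((collection_name, field_name))


def create_index(stub, collection_name, field_name, params):
    """SDK-level function composed of three RPC-like calls."""
    # 1. Check that the collection and the field exist.
    info = stub.describe_collection(collection_name)
    if info is None:
        raise ValueError(f"collection not found: {collection_name}")
    if field_name not in info["fields"]:
        raise ValueError(f"field not found: {field_name}")
    # 2. Ask the server to build the index.
    stub.create_index(collection_name, field_name, params)
    # 3. Read the index back so the caller gets what the server stored.
    return stub.describe_index(collection_name, field_name)


stub = FakeStub()
result = create_index(stub, "films", "embedding", {"index_type": "IVF_FLAT"})
```

The caller sees a single function, while the stub's call log shows the three underlying calls it was built from.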

PyMilvus 2.0

Object-relational mapping (ORM)

ORM can be summed up in one sentence: an operation on a local object affects the corresponding object on the server. The ORM interface of PyMilvus has the following three characteristics:

  1. Operate directly on objects.
  2. Isolate business logic from data access details.
  3. Hide the implementation complexity, so the same code works everywhere.

The third point means that whether Milvus is a DBS, a local library, or a cloud service, and whether the intermediate implementation is RPC or Python, ORM-style code can be applied without any changes, whatever form the Milvus server takes. This is the benefit of the ORM abstraction.
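
The "same code everywhere" point can be illustrated with a toy model. Everything here is hypothetical: Backend, InProcessBackend and RecordingRpcBackend are invented for this sketch, and the Collection class is a small stand-in rather than the PyMilvus class of the same name.

```python
from typing import Dict, List, Protocol


class Backend(Protocol):
    def insert(self, collection: str, rows: List[dict]) -> int: ...
    def count(self, collection: str) -> int: ...


class InProcessBackend:
    """Keeps rows in local memory, as an embedded library might."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[dict]] = {}

    def insert(self, collection, rows):
        self._rows.setdefault(collection, []).extend(rows)
        return len(rows)

    def count(self, collection):
        return len(self._rows.get(collection, []))


class RecordingRpcBackend:
    """Pretends to send one message per call and logs it; no network is used."""

    def __init__(self) -> None:
        self.sent: List[str] = []
        self._remote = InProcessBackend()  # plays the role of the server

    def insert(self, collection, rows):
        self.sent.append(f"Insert({collection}, {len(rows)} rows)")
        return self._remote.insert(collection, rows)

    def count(self, collection):
        self.sent.append(f"Count({collection})")
        return self._remote.count(collection)


class Collection:
    """Toy ORM object: every method delegates to whichever backend it was given."""

    def __init__(self, name: str, backend: Backend) -> None:
        self.name = name
        self._backend = backend

    def insert(self, rows):
        return self._backend.insert(self.name, rows)

    @property
    def num_entities(self):
        return self._backend.count(self.name)


def load_films(backend: Backend) -> int:
    # Business logic: identical no matter which backend is passed in.
    films = Collection("films", backend)
    films.insert([{"film_id": 1}, {"film_id": 2}])
    return films.num_entities
```

Here load_films never mentions how data reaches the server, so swapping the backend does not require touching it.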

ORM style API

The most notable feature of the ORM-style API is that it lets you control the connections to Milvus. For example, you can register several Milvus servers under different aliases and use an alias to connect to or disconnect from one of them. You can also delete a server address stored locally, and control precisely which objects use which machine and which connection.

img
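
The alias mechanism described above can be modeled in a few lines. This Connections class is a hypothetical stand-in written for illustration. Its method names are chosen to read like a connection manager, but nothing here opens a real connection, and the hosts are made-up values.

```python
class Connections:
    """Toy alias registry: remembers addresses and which aliases are connected."""

    def __init__(self):
        self._addresses = {}     # alias -> (host, port)
        self._connected = set()  # aliases currently marked as connected

    def add_connection(self, alias, host, port):
        # Only stores the address; connecting is a separate step.
        self._addresses[alias] = (host, port)

    def connect(self, alias):
        if alias not in self._addresses:
            raise KeyError(f"unknown alias: {alias}")
        self._connected.add(alias)
        return self._addresses[alias]

    def disconnect(self, alias):
        self._connected.discard(alias)

    def remove_connection(self, alias):
        # Forget the locally stored address, disconnecting first if needed.
        self.disconnect(alias)
        self._addresses.pop(alias, None)

    def list_connections(self):
        return sorted((alias, alias in self._connected) for alias in self._addresses)


connections = Connections()
connections.add_connection("dev", "localhost", 19530)
connections.add_connection("prod", "10.0.0.5", 19530)
connections.connect("dev")
```

Because objects refer to a server only by alias, pointing them at another machine is a matter of changing what the alias maps to.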

The second feature is that all operations are performed directly on objects, namely collections, partitions and indexes. Once these three are abstracted as objects, every operation on them can be performed directly on the object.

For example, to obtain a collection object, whether the collection is newly created or already exists on the Milvus server, we can create an object, films, through the Collection interface. We can also bind a connection to the object, using the DEV alias to refer to the Milvus server. The films object then holds some state locally, and we can perform various operations directly on it.

There are two ways to construct a partition object, but only one is shown here: creating a partition through the collection object films. Once we have this partition object, it also holds some state of its own, and we can operate on partitions just as we do on collections. There is also an Index object. As with a partition, we can create an index through the collection object films for subsequent operations.

Besides creating a new partition or a new index, if a partition or an index already exists in the films collection, it can also be retrieved through the collection object films.

img
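
The object relationships described above can be sketched with a toy model. FakeServer and the three classes below are hypothetical stand-ins that only mimic the idea of creating or retrieving partitions and indexes through a collection object. The method names and the partition name are assumptions of this sketch, not the PyMilvus API.

```python
class FakeServer:
    """Hypothetical server state: collection name -> partitions and indexes."""

    def __init__(self):
        self.collections = {}


class Partition:
    def __init__(self, collection, name):
        self.collection = collection
        self.name = name


class Index:
    def __init__(self, collection, field_name, params):
        self.collection = collection
        self.field_name = field_name
        self.params = params


class Collection:
    def __init__(self, name, server):
        self.name = name
        # Bind to the server-side collection, creating it only if it is missing.
        self._state = server.collections.setdefault(
            name, {"partitions": set(), "indexes": {}}
        )

    def create_partition(self, partition_name):
        self._state["partitions"].add(partition_name)
        return Partition(self, partition_name)

    def partition(self, partition_name):
        # Retrieve a partition that already exists on the server.
        if partition_name not in self._state["partitions"]:
            raise KeyError(partition_name)
        return Partition(self, partition_name)

    def create_index(self, field_name, params):
        self._state["indexes"][field_name] = params
        return Index(self, field_name, params)

    def index(self, field_name):
        # Retrieve an index that already exists on the server.
        return Index(self, field_name, self._state["indexes"][field_name])


server = FakeServer()
films = Collection("films", server)
adventure = films.create_partition("adventure")
index = films.create_index("embedding", {"index_type": "IVF_FLAT"})

# A second local object with the same name sees what already exists on the server.
films_again = Collection("films", server)
same_partition = films_again.partition("adventure")
```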

More help

We recommend going through the PyMilvus documentation for a thorough, practical understanding of how to use it. The documentation consists of two parts: the first is generated automatically from the API docstrings, and the second is written by PyMilvus contributors from the user's perspective, which makes it very practical.

You can view all our documentation here: https://milvus.io/docs

The source of the PyMilvus documentation is in the docs directory on the master branch of the milvus-io/pymilvus repository.


With a vision to redefine data science, Zilliz is committed to becoming a global leader in open source technology innovation and to unlocking the hidden value of unstructured data for enterprises through open source and cloud-native solutions.

Zilliz built the Milvus vector database to accelerate the development of a next-generation data platform. Milvus is a graduated project of the LF AI & Data Foundation. It can manage massive unstructured data sets and has a wide range of applications in new drug discovery, recommendation systems, chatbots, and more.
