Milvus is back with a new release: Upsert, Kafka Connector, and Airbyte integration for efficient data stream processing


Milvus now supports Upsert, Kafka Connector, and Airbyte!


In last week's article, "Login to Azure, release new version... What happened to Zilliz last night and this morning?", we revealed that Milvus (Zilliz Cloud) now supports Upsert, Kafka Connector, and Airbyte to make data stream processing more efficient. These features simplify data processing and integration and give developers more efficient tools for managing complex data. Today we introduce them one by one.


01.

Upsert: Simplify the data update process


Before Upsert was introduced, updating data in Milvus took two steps: delete the old data, then insert the new data. This approach works, but it does not guarantee atomicity and is cumbersome. Milvus 2.3 introduces a new Upsert feature. (The overseas version of Zilliz Cloud has also launched Upsert in Beta.)


Upsert changes how data is updated and managed. When you call Upsert, Milvus checks whether the data already exists: if it does not, the data is inserted; if it does, the data is updated. This atomic approach is particularly important in a system like Milvus, where inserts and deletes are managed separately.


Internally, Upsert works in a specific order: it inserts the new data first, then deletes the duplicate (old) data. This order ensures that the data remains visible throughout the operation.
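The insert-then-delete order can be illustrated with a toy model. This is a minimal sketch, assuming an append-only list of rows plus a set of delete markers; `ToyStore` and its methods are hypothetical names for illustration and do not reflect Milvus internals.

```python
class ToyStore:
    """Append-only rows plus delete markers; a toy model, not Milvus internals."""

    def __init__(self):
        self.rows = []        # list of (row_id, primary_key, value)
        self.deleted = set()  # row_ids that carry a delete marker
        self._next_row_id = 0

    def _append(self, pk, value):
        row_id = self._next_row_id
        self._next_row_id += 1
        self.rows.append((row_id, pk, value))
        return row_id

    def get(self, pk):
        """Return the newest live value for pk, or None if there is none."""
        for row_id, row_pk, value in reversed(self.rows):
            if row_pk == pk and row_id not in self.deleted:
                return value
        return None

    def upsert(self, pk, value):
        # Step 1: insert the new version first.
        new_id = self._append(pk, value)
        # A reader arriving between the two steps still finds a live row.
        visible_mid_operation = self.get(pk)
        # Step 2: mark every older version of this key as deleted.
        for row_id, row_pk, _ in self.rows:
            if row_pk == pk and row_id != new_id:
                self.deleted.add(row_id)
        return visible_mid_operation


store = ToyStore()
store.upsert(1, "v1")        # key absent: behaves like an insert
mid = store.upsert(1, "v2")  # key present: behaves like an update
print(mid, store.get(1))     # prints: v2 v2
```

If the two steps ran in the opposite order, a reader arriving between them would find no live row for the key, which is the gap the insert-first order avoids.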


Upsert also addresses the case of modifying a primary key: primary key columns cannot be changed during an update. This is consistent with how Milvus distributes data across shards based on a hash of the primary key, and the restriction avoids the complexity and potential data inconsistency of cross-shard operations.
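To see why a changed primary key would turn an update into a cross-shard operation, consider a generic hash-based router. This is only a sketch: the CRC32 hash, the shard count, and the `shard_for` helper are assumptions chosen for illustration, not the hash function or configuration Milvus actually uses.

```python
import zlib

NUM_SHARDS = 2  # hypothetical shard count


def shard_for(primary_key, num_shards=NUM_SHARDS):
    """Map a primary key to a shard with a stable hash (illustrative only)."""
    digest = zlib.crc32(str(primary_key).encode("utf-8"))
    return digest % num_shards


# The same key always routes to the same shard, so an update that keeps the
# key stays within one shard.
assert shard_for("user-1") == shard_for("user-1")

# A different key may route elsewhere, so "changing" a key could require
# a delete on one shard and an insert on another.
old_key, new_key = "user-1", "user-2"
print(shard_for(old_key), shard_for(new_key))
```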


Upsert is as simple to use as insert. You can integrate it into existing workflows without major changes to the original process. In SDKs such as PyMilvus, the Upsert call takes the same form as the insert call, so users familiar with Milvus can expect a consistent, smooth experience.



When a command executes, Upsert reports whether the operation succeeded and which data was affected, which makes it more convenient for developers. See the Upsert documentation for more details.
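A call might look like the sketch below. It assumes a PyMilvus-style collection object whose `upsert(data)` method returns a result exposing `upsert_count` and `primary_keys`; check those attribute names against the Upsert documentation for your SDK version. `StubCollection` and `upsert_and_report` are hypothetical helpers, and the stub exists only so the example runs without a Milvus server.

```python
from types import SimpleNamespace


def upsert_and_report(collection, data):
    """Call collection.upsert(data) and summarize the returned result."""
    result = collection.upsert(data)
    return {
        "upsert_count": result.upsert_count,         # assumed attribute name
        "primary_keys": list(result.primary_keys),   # assumed attribute name
    }


class StubCollection:
    """Stand-in for a real collection; echoes the primary key column back."""

    def upsert(self, data):
        ids = data[0]  # column-based data: first column holds primary keys
        return SimpleNamespace(upsert_count=len(ids), primary_keys=ids)


# Column-based data: one list of primary keys, one list of vectors.
data = [[1, 2], [[0.1, 0.2], [0.3, 0.4]]]
report = upsert_and_report(StubCollection(), data)
print(report)  # prints: {'upsert_count': 2, 'primary_keys': [1, 2]}
```

With a real collection, the same `data` argument you would pass to insert is passed to upsert unchanged.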


However, keep the following two points in mind when using Upsert:


  • AutoID limitation: Upsert requires AutoID to be set to false. If AutoID is set to true in the Collection Schema, Upsert cannot be performed. The reason is that Upsert includes an update operation, and the updated data needs a primary key value. If a primary key value provided by the user collides with one generated automatically by AutoID, data may be overwritten. Upsert is therefore unavailable for Collections with AutoID enabled. We may remove this restriction in a later version.


  • Performance overhead: Upsert may incur a performance cost. Milvus uses a WAL (write-ahead log) architecture, and excessive delete operations can degrade performance. A delete in Milvus does not remove the data immediately; it only marks the data as deleted. The data is actually removed later, during compaction, based on these markers. Frequent deletes can therefore cause data bloat and hurt performance. We recommend not calling Upsert too frequently.
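One way to follow the recommendation above is to buffer updates on the client and collapse repeated writes to the same key before issuing a single Upsert, so each key produces at most one delete marker per batch. The sketch below assumes row dictionaries with an `id` primary key field and an application that tolerates a short buffering delay; `coalesce_by_primary_key` is a hypothetical helper, not part of any Milvus SDK.

```python
def coalesce_by_primary_key(rows, pk_field="id"):
    """Keep only the last row seen for each primary key, in last-write order."""
    latest = {}
    for row in rows:
        pk = row[pk_field]
        latest.pop(pk, None)  # re-insert so the key moves to the end
        latest[pk] = row
    return list(latest.values())


pending = [
    {"id": 1, "price": 10},
    {"id": 2, "price": 5},
    {"id": 1, "price": 12},  # supersedes the first row for id 1
]
batch = coalesce_by_primary_key(pending)
print(batch)  # prints: [{'id': 2, 'price': 5}, {'id': 1, 'price': 12}]
```

Here three pending writes become a two-row batch, so the row for `id` 1 is rewritten once instead of twice.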


02.

Kafka Connector: Empowering real-time data processing


Milvus and Zilliz Cloud recently added support for a Kafka Sink Connector. Vector data can now be streamed into the Milvus or Zilliz Cloud vector database in real time through Confluent/Kafka. This integration further unlocks the potential of vector databases for real-time generative AI applications, especially those that use large models such as OpenAI GPT-4.


Today, unstructured data accounts for more than 80% of the information we obtain, and it is still growing explosively. Zilliz's partnership with Confluent marks a major advance in unstructured data management and analysis: it lets us store and process real-time vector data streams more efficiently and turn them into easily searchable data.


Common use cases for Kafka Connector + Milvus / Zilliz Cloud include:


  • Enhancing generative AI: Supplying GenAI applications with up-to-date vector data keeps their output accurate and timely. This matters most in fields such as finance and media, which must process streaming data from many sources in real time.


  • Optimizing e-commerce recommendation systems: E-commerce platforms need to adjust recommended products or content dynamically, based on real-time inventory and customer behavior, to improve the user experience.


The steps to use Kafka Connector in Zilliz Cloud are also very simple:


  • Download the Kafka Sink Connector from GitHub or Confluent Hub.

  • Configure Confluent and Zilliz Cloud accounts.

  • Read the guide available in the GitHub repository and configure the Kafka Connector.

  • Run Kafka Connector to import real-time streaming data to Zilliz Cloud.
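The configuration step above could be sketched as a request body for the Kafka Connect REST API, which accepts a JSON object with `name` and `config` when creating a connector. `connector.class` and `topics` are standard Kafka Connect settings; the connector class value and the `public.endpoint`, `token`, and `collection.name` keys are assumptions here and should be checked against the guide in the GitHub repository. The endpoint and token values are placeholders.

```python
import json


def build_connector_request(name, topic, collection, endpoint, token):
    """Build a Kafka Connect 'create connector' payload (keys partly assumed)."""
    config = {
        # Assumed class name for the Milvus sink connector.
        "connector.class": "com.milvus.io.kafka.MilvusSinkConnector",
        "topics": topic,                # standard Kafka Connect sink setting
        "public.endpoint": endpoint,    # assumed key: Zilliz Cloud endpoint
        "token": token,                 # assumed key: API token
        "collection.name": collection,  # assumed key: target collection
    }
    return {"name": name, "config": config}


payload = build_connector_request(
    name="zilliz-sink-demo",
    topic="topic_0",
    collection="topic_0",
    endpoint="https://<public-endpoint>:<port>",  # placeholder
    token="<your-token>",                         # placeholder
)
print(json.dumps(payload, indent=2))
```

The resulting JSON would be sent with a POST request to the `/connectors` path of a Kafka Connect worker; the sketch only builds and prints it.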


For a more in-depth look at setting up the Kafka Connector and related use cases, head to the GitHub repository or visit this page.


03.

Airbyte integration: More efficient data processing


Milvus recently collaborated with the Airbyte team to integrate Airbyte with Milvus, improving how data is acquired and used in large language model (LLM) and vector database applications. The integration strengthens developers' ability to store, index, and search high-dimensional vector data, and greatly simplifies building applications such as generative chatbots and product recommendations.


Key highlights of this integration include:


  • More efficient data transfer: Airbyte can move data from a variety of sources into Milvus or Zilliz Cloud and convert it into embedding vectors along the way, simplifying data processing.


  • More powerful search capabilities: The integration enhances the semantic search capabilities of the vector database. Using embedding vectors, the system can automatically find content that is semantically similar to a query, which supports applications that need efficient retrieval of unstructured data.


  • Easier setup: Setting up a Milvus cluster and configuring Airbyte to sync data is simple. Building an application on top of it with Streamlit and the OpenAI Embedding API takes similarly simple setup steps.
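The semantic search idea in the list above can be shown with a brute-force ranking by cosine similarity. This is a minimal sketch with made-up three-dimensional vectors standing in for real embeddings; a vector database answers the same kind of query with indexes rather than a full scan, and `cosine_similarity` and `top_k` are hypothetical helper names. The vectors are assumed to be nonzero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the texts of the k documents most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "embeddings"; real ones come from an embedding model.
documents = [
    ("reset password", [1.0, 0.0, 0.0]),
    ("refund policy", [0.0, 1.0, 0.0]),
    ("login failure", [0.8, 0.3, 0.0]),
]
query = [1.0, 0.05, 0.0]  # stands in for the embedding of a support question
print(top_k(query, documents))  # prints: ['reset password', 'login failure']
```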


This integration simplifies data transfer and processing and opens up new possibilities for real-time AI applications. For example, in a customer support system, integrating Airbyte with Milvus or Zilliz Cloud makes it possible to build an intelligent support ticket system based on semantic search, giving users immediate, useful information, reducing manual intervention, and improving the user experience.


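Zilliz is committed to advancing the capabilities and technology for managing and processing unstructured data, and the Upsert, Kafka Connector, and Airbyte integrations introduced here reflect that commitment. Going forward, we will further optimize data ingestion and data pipeline features. Stay tuned!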





This article is shared from the WeChat official account ZILLIZ (Zilliztech).


Origin my.oschina.net/u/4209276/blog/10315996