Tribute to Turing! HashData embraces the new era of data intelligence!

Figure 1: The site of the 2023 ACM China Turing Conference

Alan Turing, born in 1912, is known as "the father of computer science" and "the father of artificial intelligence". In 1966, to honor this outstanding scientist, the Association for Computing Machinery (ACM) established the ACM Turing Award, named after him, to recognize scientists who have made significant contributions to the field of computing.

To this day, Turing's spirit of innovation and continuous exploration inspires generations of scientists to devote themselves to computer science research, and the resulting IT revolution has profoundly shaped social and economic development.

Today, a new round of AI-driven technological revolution and industrial transformation has arrived. Since the end of last year, many Chinese technology companies have launched large model products one after another and actively promoted their industrial application.

With large AI models drawing intense attention, the 2023 ACM China Turing Conference, themed "General Intelligence, Human-Machine Symbiosis", was held in Wuhan from July 28 to 30. As a leading company in China's cloud-native data warehouse market, Kuke Data was invited to participate. At the conference, Kuke Data demonstrated HashML, a next-generation advanced analytics and data science tool that it developed in-house on top of the HashData cloud data warehouse to facilitate the production, application, and large-scale deployment of AI models.

Three-layer decoupling reduces data analysis costs

AI model training relies on massive amounts of text, image, and video data. Managing and using that data efficiently and at low cost is a major challenge for enterprises in the era of artificial intelligence. Through its three-layer decoupled architecture, the HashData cloud data warehouse ensures data consistency while reducing the resources consumed by data storage and access.

Figure 2: HashData product architecture

HashData's integrated lakehouse design helps enterprises manage multi-modal training data in one place. The product architecture is well suited to hybrid cloud, helping users keep public-domain and private training data secure and compliant under unified management and scheduling. The elasticity and concurrency of the cloud architecture support almost unlimited horizontal scaling to meet peak demand for long-running, high-density data extraction and computation. In addition, HashData supports vector storage at the billion scale, providing underlying support for large model training.

At the same time, the HashData cloud data warehouse takes full advantage of the elasticity and scalability of the cloud platform and persists data to the underlying object storage, greatly reducing the cost of data analysis for enterprises.

The HashData data warehouse connects seamlessly to various public and hybrid clouds and provides data warehouse, data lake, data science, data engineering, and data sharing capabilities on a unified platform. It can support tens of millions of database objects, data volumes of 100+ PB, and thousands of concurrent applications.

In addition, by supporting object storage, HashData provides data management capabilities that are fully compatible with multiple public and hybrid clouds, giving enterprises a flexible, easy-to-use way to implement multi-cloud strategies.

Two engines efficiently manage massive data

In the data intelligence industry chain, infrastructure construction and model production and application are essential to the development of large AI models, and machine learning is an important lever for AI's growth.

Traditional MPP databases have many shortcomings as machine learning solutions, such as a limited range of supported algorithms, no data-parallel training, and difficulty developing new algorithms, which makes it hard for them to meet the needs of data management and model development.
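To make the term "data-parallel training" concrete, here is a minimal sketch in plain Python. It is a generic illustration, not HashData's or HashML's implementation: the function names, the toy data, and the learning rate are all assumptions chosen for the example. Each shard stands in for the data held by one worker; every worker computes a gradient on its own shard, and the averaged gradient updates the shared parameter.

```python
def shard_gradient(w, shard):
    """Gradient of the mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, lr):
    """One training step: per-shard gradients, then an averaged update."""
    # In a real system each shard's gradient is computed on a separate worker;
    # here the workers are simulated sequentially.
    grads = [shard_gradient(w, shard) for shard in shards]
    # With equal-sized shards, the average equals the full-batch gradient.
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad


# Hypothetical toy data generated from y = 2 * x, split across two "workers".
shards = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0)],
]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)

print(round(w, 6))
```

The point of the pattern is that no single worker needs to see the whole dataset, which is what lets training scale with the number of nodes holding data.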

In contrast, HashData has two computing engines: the MPP computing engine for SQL query analysis tasks, and the ML/DL computing engine for machine learning and deep learning tasks.

Based on an architecture that separates storage from compute, HashData not only supports traditional data warehouse workloads through the SQL computing engine but also supports machine learning and deep learning efficiently through the ML/DL computing engine, including fine-tuning and inference of large language models in an in-database environment. HashML is the next-generation in-database advanced analytics and data science tool that Kuke Data built on HashData's computing engines.

Figure 3: Building HashML based on the HashData dual computing engine architecture

At the same time, HashData also provides efficient storage and retrieval of large-scale vector data, making it easier to build knowledge-enhanced LLM applications.
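As a rough illustration of what vector retrieval means in a knowledge-enhanced application, the sketch below ranks stored embeddings by cosine similarity to a query vector and returns the closest matches. It is a brute-force toy in plain Python, not HashData's storage engine or API; the document ids, the three-dimensional vectors, and the function names are hypothetical. Production systems typically use approximate indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the ids of the k stored vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
store = {
    "doc_pricing": [0.9, 0.1, 0.0],
    "doc_security": [0.0, 1.0, 0.1],
    "doc_billing": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, store, k=2)
print(results)
```

In a knowledge-enhanced LLM application, the retrieved documents would then be passed to the model as context for answering the user's question.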

One-stop delivery facilitates the large-scale application of large models

After an AI model has been developed, it must be deployed to a production environment before it can deliver value.

Research by Gartner, an international research firm, shows that only 53% of AI projects make it from prototype to production. The main reason for this low conversion rate is problems in model life cycle management, including difficult cross-team collaboration, a lack of process and asset management, and long production and delivery cycles.

To solve these problems, HashML provides a full set of tools spanning data query and processing, advanced analytics, ML/DL model training, inference, and service deployment, and it offers end-to-end support for fine-tuning and inference of large language models. For example, with HashML, the parameters of the LLaMA2 model can be fine-tuned efficiently with as few as 3 lines of code:

Figure 4: Fine-tuning the LLaMA2 model with HashML

HashML shares unified storage and computing resources with the HashData cloud data warehouse and delivers out-of-the-box, one-stop AI capabilities as part of the data warehouse deployment. This greatly reduces the cost and complexity of system deployment and gives developers a unified environment for data query, analysis, and modeling.

Figure 5: Overview of the main features of HashML

As a data science tool with advanced algorithms and excellent performance, HashML helps users build, train, and deploy models efficiently and easily. It greatly lowers the barrier to modeling and lets users try many combinations of model architectures and parameters in a short time to better meet the needs of diverse application scenarios.

HashML's features are currently being refined, and the first official version is expected to be released in August. At that time, we will invite partners to take part in verification testing to speed up product iteration, accelerate the adoption of AI, and let AI benefit all walks of life.

Origin blog.csdn.net/m0_54979897/article/details/131998313