Jian Mo’s evolution: creating a cloud storage base for large model data computing systems

On Programmer's Day, October 24, the 2023 Tuoshupai Annual Technology Forum, themed "Large Model Data Computing System", concluded successfully in Shanghai, and Tuoshupai's large model data computing system (PieData Computing System, abbreviated πDataCS) was released as scheduled. πDataCS uses cloud-native technology to restructure data storage and computing: one copy of the data is stored and multiple engines compute over it. The aim is to make AI models bigger and faster and to upgrade big data systems for the large model era. As the cloud storage base of πDataCS, the Jianmo storage system aims to provide a data management and storage base that meets the needs of high-performance computing systems in various cloud scenarios.

1 πDataCS: One data storage, multi-engine data calculation

πDataCS aims to help enterprises remove computing bottlenecks, make full use of the advantages of data scale, build core technical barriers, and better empower business development, so that an autonomous and controllable large model data computing system can maintain global leadership and bring the capabilities of large model technology to all walks of life.

Computing platforms have gone through three generations, from mainframes and PCs to today's cloud platforms. The cloud platform currently offers the largest computing power, storage capacity and horizontal scalability. In the PC era, metadata and user data were mapped to the local hard disk, computation was mapped to the local CPU, and storage and computation were tightly coupled on the same server.

πDataCS uses cloud-native technology to restructure data storage and computing. It first separates computing from data in the data computing system to make the system more flexible. Then, with future data governance and data transactions in mind, Tuoshupai further separated metadata from user data and implemented a new eMPP architecture. Metadata is mapped to block storage and managed by the metadata management system "Mu"; user data is mapped to object storage and managed by the "Jianmo" storage system; computing is mapped to containers or virtual machines and managed by the computing system.

πDataCS upgrades data governance and realizes data value through Data Mesh. It takes the requirements of global data transactions and data governance into full account. As a new factor of production, data is an important fuel for model development. On the premise of privacy and security, the data owner can share metadata containing the data catalog with other users. The data operator uses the metadata to locate the owner's user data and, as needed, pays for authorized access to it. When data operators access the owner's data, they call the data computing engine provided by the data processor.

The overall architecture of πDataCS is divided into four layers, as shown in the figure below:

Data computing system πDataCS architecture

The top layer is the calculation engine supported by πDataCS. Currently πDataCS supports the following calculation engines:

  • PieCloudDB: Tuoshupai's first cloud-native data warehouse computing engine, which supports the SQL language model and is compatible with HTAP
  • PieCloudVector: A cloud-native vector computing engine built to support vector computing with large models
  • PieCloudML: A cloud-native machine learning engine built to support machine learning languages like Python and R

1.1 PieCloudDB: the first cloud-native data warehouse computing engine

As the first computing engine of πDataCS, the PieCloudDB cloud-native virtual data warehouse is available in the πDataCS public cloud, community, enterprise and all-in-one machine editions, and provides three deployment methods: public cloud, private cloud and bare hardware. Warehouse virtualization technology helps enterprises break down data silos, integrate all structured data resources, and handle computations with strong logic.

Its cloud-native architecture separates storage from computing using three layers (metadata, computing and data), so that storage resources and computing resources can be managed independently on the cloud. On the cloud, PieCloudDB uses its patented eMPP (elastic Massive Parallel Processing) technology to execute tasks concurrently across multiple clusters. Enterprises can expand and shrink capacity flexibly, scale efficiently as load changes, and handle petabyte-level data.

1.2 PieCloudVector: cloud native vector computing engine

A vector database is a database system specifically designed to store, query, and analyze vector data (such as feature vectors).

After comparing the implementation and performance of pgvector and pgembedding, we did not use the open source implementation, but completely independently developed PieCloudVector to meet the usage scenarios of our users. PieCloudVector has functions such as efficient storage and retrieval of vector data, similarity search, vector indexing, vector clustering and classification, high-performance parallel computing, strong scalability and fault tolerance.

Cloud native vector computing engine: PieCloudVector
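To make the idea of similarity search over stored vectors concrete, here is a minimal sketch in plain Python. It is a brute-force scan using cosine similarity, not PieCloudVector's implementation or API; the function names and the sample vectors are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), vec_id)
              for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [vec_id for _, vec_id in scored[:k]]


# Hypothetical feature vectors keyed by document id.
store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.05, 0.0], store, k=2)
```

A real vector engine would replace the full scan with a vector index so that only a small part of the data is compared against the query.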

1.3 PieCloudML: Cloud-native machine learning engine

As artificial intelligence develops, more and more economic activities will be driven by AI. For this reason πDataCS includes the cloud-native machine learning engine PieCloudML. With the ML, graph and large model algorithms built into PieCloudML, data scientists can use familiar tools such as Python and R to complete their tasks and use the data computing system to generate the models those tasks require.

Cloud native machine learning engine: PieCloudML

To accelerate big data processing and computing, πDataCS makes full use of new hardware such as GPUs and FPGAs for asynchronous computing. Through the unified metadata management layer "Wu Shou", the three computing engines share one data storage base, "Jianmo", so that a single copy of the data serves multiple computing engines.

Next, we will introduce in detail Jianmo, the cloud storage base of this big data computing system.

2 Jian Mo: Cloud storage base for big data computing systems

As the cloud storage base of πDataCS, the Jianmo storage system aims to provide a data management and storage base that meets the needs of high-performance computing systems in various cloud scenarios. Built on modern hardware and infrastructure, Jianmo aims to make full use of the potential of the cloud, guarantee data security, simplify the whole process of data loading, reading and computing in big data processing, and provide adaptive data management, ACID transaction support and other functions, so that data computing and analysis tasks in various scenarios run with the best possible performance.

In order to achieve this goal, the evolution of Jianmo mainly goes through three major stages:

  • Stage 1: New generation of cloud-native storage
  • Stage 2: Cloud storage base for big data computing system
  • Stage 3: Unified big data computing system storage engine

2.1 Evolution Stage 1: New Generation of Cloud Native Storage

In its first stage, Jianmo mainly serves as the cloud-native storage of the cloud-native virtual data warehouse PieCloudDB, and the research and development work for this stage has been completed.

Jianmo is compatible with public, private and hybrid clouds. It uses object storage as the persistent storage layer and is designed around data distribution and elasticity under the eMPP (elastic MPP) architecture. It uses consistent hashing to ensure that each node in the distributed environment accesses roughly the same amount of data and that, when capacity is expanded or reduced, as few cache entries as possible are invalidated. For data security, Jianmo works with the transparent encryption in PieCloudDB so that data is encrypted when it is written to persistent storage. Transparent encryption uses three levels of keys to protect the data. In addition, Jianmo includes many read and write performance optimizations that greatly improve the efficiency of data loading and querying.
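The consistent hashing mentioned above can be sketched as a hash ring. This is a generic textbook version under stated assumptions, not Jianmo's code: the class name, the use of SHA-256, the 64 virtual points per node and the file names are all hypothetical.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hypothetical hash ring; each node owns several virtual points."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _point(key):
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._point(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [pair for pair in self._ring if pair[1] != node]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (self._point(key), ""))
        return self._ring[idx % len(self._ring)][1]


files = [f"table/part-{i}.janm" for i in range(1000)]
ring = ConsistentHashRing(["node-1", "node-2", "node-3"])
before = {f: ring.node_for(f) for f in files}

ring.add_node("node-4")  # scale out by one node
after = {f: ring.node_for(f) for f in files}
moved = [f for f in files if before[f] != after[f]]
```

When a node is added, the only files that change owner are the ones taken over by the new node, so the caches on the existing nodes stay valid for every file they still own.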

2.1.1 New file format: janm

Jianmo's new generation of cloud-native storage is built around the janm file format. The janm file format uses a hybrid row-column storage design, which lets the system combine the efficiency of row storage with the high compression ratio and cache-line friendliness of column storage when reorganizing data. The janm file format also supports vectorized (SIMD) and parallel computing. In the design, Jianmo also considered how data is represented both in memory and on disk, and redefined the table data format for both, so that converting table data between disk and memory incurs no additional overhead.

Within the file format, Jianmo will also collect data statistics within the file to speed up queries and support performance optimization features such as precomputation. In order to speed up I/O, the janm file format has multiple built-in compression algorithms, such as zstd, lz, etc. For different types, Jianmo can adaptively select different encoding methods, including delta encoding, dictionary encoding, etc.
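The two encodings named above are easy to show in a few lines. The sketch below is a simplified illustration, not the janm implementation: `choose_encoding` is a toy heuristic invented here to suggest what type-based adaptive selection could look like.

```python
def delta_encode(values):
    """Store the first value, then the difference to the previous value."""
    if not values:
        return []
    encoded = [values[0]]
    for prev, cur in zip(values, values[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(encoded):
    decoded = []
    running = 0
    for i, delta in enumerate(encoded):
        running = delta if i == 0 else running + delta
        decoded.append(running)
    return decoded


def dictionary_encode(values):
    """Replace each distinct value with a small integer code."""
    dictionary, index, codes = [], {}, []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    return [dictionary[code] for code in codes]


def choose_encoding(values):
    """Toy heuristic (an assumption): integers use delta, the rest dictionary."""
    if values and all(isinstance(v, int) for v in values):
        return "delta"
    return "dictionary"


timestamps = [100, 101, 103, 106]
countries = ["cn", "us", "cn", "cn"]
encoded_ts = delta_encode(timestamps)
dictionary, codes = dictionary_encode(countries)
```

Delta encoding pays off for sorted or slowly changing numbers because the differences are small, while dictionary encoding pays off for columns with few distinct values.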

Jianmo provides complete transaction support through MVCC at the file-block level. Whether the data in a file block is visible is decided from the MVCC information of the file that contains it, according to the current transaction isolation level. In PieCloudDB, Jianmo has a deeply customized access layer so that PieCloudDB can make full use of the optimizations Jianmo provides.
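A file-level visibility check of this kind might look like the sketch below. It is a simplified model under assumptions, not Jianmo's transaction engine: the `FileVersion` and `Snapshot` types, their field names and the transaction ids are hypothetical. The isolation level is modeled only by when the snapshot is taken: at transaction start (repeatable read) or at statement start (read committed).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class FileVersion:
    name: str
    created_by: int                   # transaction that added the file
    deleted_by: Optional[int] = None  # transaction that removed it, if any


@dataclass(frozen=True)
class Snapshot:
    committed: FrozenSet[int]  # transactions committed when it was taken
    own_txid: int              # the reading transaction sees its own writes

    def sees(self, txid):
        return txid == self.own_txid or txid in self.committed


def is_visible(file, snapshot):
    if not snapshot.sees(file.created_by):
        return False  # the file was added by a transaction we cannot see
    if file.deleted_by is not None and snapshot.sees(file.deleted_by):
        return False  # the file was removed by a transaction we can see
    return True


def visible_files(files, snapshot):
    return [f.name for f in files if is_visible(f, snapshot)]


files = [
    FileVersion("f1", created_by=1),
    FileVersion("f2", created_by=2, deleted_by=4),
    FileVersion("f3", created_by=3),
]
# Repeatable read: snapshot taken when transaction 5 started.
at_txn_start = Snapshot(frozenset({1, 2}), own_txid=5)
# Read committed: fresh snapshot for a later statement, after 3 and 4 committed.
at_statement = Snapshot(frozenset({1, 2, 3, 4}), own_txid=5)
```

With the older snapshot the reader still sees `f2` and does not yet see `f3`; with the fresh snapshot it sees `f3` and no longer sees the deleted `f2`.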

At present, Jianmo includes many optimizations for data reading and querying, and implements features such as data skipping, precomputation to accelerate aggregate queries, Smart Analyze and TOAST support:

  • Data skipping: When querying, Jianmo uses the query conditions to read as little data as possible, which saves I/O and improves query performance.

  • Precomputation: Jianmo collects aggregate values for each data block, so aggregate queries can be answered from these precomputed values instead of recomputing them from the raw data.

  • Smart Analyze: Generally speaking, the query optimizer generates a query execution plan from the table's data distribution information, which is collected by analyzing the table. In analytical scenarios, when the amount of data is too large, the distribution information collected by an ordinary analyze has large errors, which leads to poor execution plans. Smart Analyze calculates the distribution information of each data block while the data is being loaded, and then merges the statistics of all data blocks with a merge algorithm to produce more accurate table-level distribution information. The basic idea is to sample as much user data as possible without affecting performance.

  • Support for very large field storage: Jianmo already supported basic read and write operations on very large fields. In the new version of PieCloudDB, Jianmo has been further optimized to fully support UPDATE/DELETE and VACUUM for very large field storage.
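The first three items in the list above all rest on per-block statistics, which the following sketch illustrates. It is a deliberately small model and not Jianmo's code: the `BlockStats` fields and function names are assumptions, and the merge step only combines counts, minima, maxima and sums, whereas merging real distribution information (for example histograms) is more involved.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    row_count: int
    min_value: int
    max_value: int
    total: int


def build_stats(values):
    """Collected once per data block, at load time."""
    return BlockStats(len(values), min(values), max(values), sum(values))


def blocks_to_read(stats_list, low, high):
    """Data skipping: keep only blocks whose [min, max] overlaps [low, high]."""
    return [i for i, s in enumerate(stats_list)
            if s.max_value >= low and s.min_value <= high]


def precomputed_sum(stats_list):
    """Precomputation: answer SUM from block statistics, without a scan."""
    return sum(s.total for s in stats_list)


def merge_stats(stats_list):
    """Merge per-block statistics into table-level statistics."""
    return BlockStats(
        row_count=sum(s.row_count for s in stats_list),
        min_value=min(s.min_value for s in stats_list),
        max_value=max(s.max_value for s in stats_list),
        total=sum(s.total for s in stats_list),
    )


blocks = [[1, 5, 9], [20, 25, 30], [41, 47, 50]]
stats = [build_stats(block) for block in blocks]

needed = blocks_to_read(stats, low=22, high=45)  # WHERE x BETWEEN 22 AND 45
table_sum = precomputed_sum(stats)               # SELECT SUM(x)
table_stats = merge_stats(stats)
```

Here the range query never touches the first block, and the sum is produced without reading any block at all.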

...

With the completion of this stage, combined with the needs of πDataCS, the R&D team carried out the second stage of design and implementation of Jianmo, with the goal of turning Jianmo into the cloud storage base of the big data computing system.

2.2 Evolution Stage 2: Cloud Storage Base of Big Data Computing System

At this stage, Jianmo will serve as the cloud storage base of πDataCS. The goal is to truly achieve "one data, multiple engine calculations", and the corresponding research and development work is in progress.

To achieve this goal, Jianmo plans to implement the following features:

  • More file formats supported
  • Data interoperability
  • More efficient external data extraction and loading
  • Streaming data processing
  • High-performance ACID transaction processing
  • Adaptive data management
  • Support for CDC scenarios
  • More cloud-native Index support

...

The figure below details all the levels of the JANM Table Format. Each level depends on the level below it and draws the required capabilities from it. The user stores the data in the corresponding file format. Scalable cloud storage provides data for upper-layer calculations.

Jian Mo: Cloud storage base for big data computing systems

2.2.1 Storage Access Abstract layer

The lowest level is Jianmo's storage access abstraction layer. Jianmo uses abstract APIs to interact with any type of storage, including cloud object storage (such as S3) and HDFS. In this way, Jianmo stays compatible with all storage engines. In addition, Jianmo wraps the file system to add further storage functions, such as monitoring and a choice of read and write strategies.
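One way to picture such a layer is an abstract backend interface plus a wrapper that adds monitoring. The sketch below is an assumption-laden illustration, not Jianmo's API: `StorageBackend`, `InMemoryBackend` and `MonitoredBackend` are hypothetical names, and the in-memory backend stands in for a real object store or HDFS client.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical minimal interface every storage type must implement."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryBackend(StorageBackend):
    """Stand-in for an object store; keeps objects in a dict."""

    def __init__(self):
        self._objects = {}

    def read(self, path):
        return self._objects[path]

    def write(self, path, data):
        self._objects[path] = bytes(data)


class MonitoredBackend(StorageBackend):
    """Wraps any backend and counts the bytes moved through it."""

    def __init__(self, inner):
        self._inner = inner
        self.bytes_read = 0
        self.bytes_written = 0

    def read(self, path):
        data = self._inner.read(path)
        self.bytes_read += len(data)
        return data

    def write(self, path, data):
        self._inner.write(path, data)
        self.bytes_written += len(data)


storage = MonitoredBackend(InMemoryBackend())
storage.write("table/part-0.janm", b"hello")
payload = storage.read("table/part-0.janm")
```

Because callers depend only on `StorageBackend`, a different storage type can be swapped in without touching the layers above it.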

2.2.2 Data file format abstraction (File Format Abstract) layer

At this layer, Jianmo supports multiple file formats behind a unified access interface, which simplifies data access and lets users freely choose the file format in which their data is stored. At a higher level, Jianmo's file layout scheme records all changes to each file, which lets Jianmo build an independent redo log that can be used to implement richer functions.
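A unified access interface over several file formats could be organized as a small registry, as in the sketch below. The two formats shown (CSV and JSON Lines) are chosen only because Python's standard library can read and write them; the text does not say which formats Jianmo supports, and the class and function names are hypothetical.

```python
import csv
import io
import json


class CsvFormat:
    name = "csv"

    def encode(self, rows):
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(rows)
        return buffer.getvalue().encode("utf-8")

    def decode(self, data):
        return list(csv.reader(io.StringIO(data.decode("utf-8"))))


class JsonLinesFormat:
    name = "jsonl"

    def encode(self, rows):
        return "\n".join(json.dumps(list(row)) for row in rows).encode("utf-8")

    def decode(self, data):
        lines = data.decode("utf-8").splitlines()
        return [json.loads(line) for line in lines if line]


# Registry: callers pick a format by name and use the same two functions.
FORMATS = {fmt.name: fmt for fmt in (CsvFormat(), JsonLinesFormat())}


def write_rows(format_name, rows):
    return FORMATS[format_name].encode(rows)


def read_rows(format_name, data):
    return FORMATS[format_name].decode(data)


rows = [["1", "alice"], ["2", "bob"]]
round_trips = {name: read_rows(name, write_rows(name, rows)) for name in FORMATS}
```

Adding another format then means registering one more class with `encode` and `decode`, while callers keep using `write_rows` and `read_rows`.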

2.2.3 Core layer of Table Format

The core layer of the table format encapsulates and implements the various table features. It includes the following 5 subsystems:

Table Format core layer

  • Table Transaction Engine

The core layer contains a table transaction engine that implements file-level MVCC, supports data visibility checks based on isolation levels, and provides concurrency control. For transaction guarantees, Jianmo's basic idea is that the log is the data, where the data in question is the transaction visibility information.

  • Index

Indexes help the database plan better queries, reduce overall I/O and provide faster response times. Index information about file lists and columns is sufficient for the OLAP engine to quickly generate efficient query plans in OLAP (analytical) scenarios. Currently, janm supports the indexes required for data skipping. In the future, we will continue to explore more index implementations, even row-level indexes.

  • Adaptive management of table data (Table Management)

The adaptive management functions of table data supported by Jianmo mainly include:

➢ VACUUM: Data cleanup, reclaiming the garbage space left behind by operations

➢ Smart Analyze: Data distribution information sampling

➢ Compaction: merge small files to improve I/O efficiency

➢ Cluster: Cluster similar data into the same file as much as possible to make data skipping more effective and improve query speed

➢ Sort: Sort data according to specified fields or conditions

...

  • Encapsulation of table format operations and controls

At this layer, Jianmo supports control over the composition and layout of tables, traversal of table files, and table data size statistics. In object storage, listing files is a very expensive operation. Jianmo uses the functions provided by the table format layer to traverse files and compute data size statistics quickly.
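One common way for a table format to avoid listing objects is to keep its own record of which files belong to the table. The sketch below assumes such a manifest; the text does not describe how Jianmo implements this, so `TableManifest` and its methods are hypothetical. It also shows how the same record can suggest inputs for the Compaction function listed earlier.

```python
class TableManifest:
    """Hypothetical per-table record of data files and their sizes."""

    def __init__(self):
        self._files = {}  # path -> size in bytes

    def add_file(self, path, size_bytes):
        self._files[path] = size_bytes

    def remove_file(self, path):
        self._files.pop(path, None)

    def files(self):
        """Traverse table files without listing the object store."""
        return sorted(self._files)

    def total_size(self):
        """Table size statistics computed from the manifest alone."""
        return sum(self._files.values())

    def compaction_candidates(self, small_file_bytes):
        """Files small enough to be worth merging into larger ones."""
        return sorted(path for path, size in self._files.items()
                      if size < small_file_bytes)


manifest = TableManifest()
manifest.add_file("t1/part-0.janm", 64 * 1024 * 1024)
manifest.add_file("t1/part-1.janm", 2 * 1024)
manifest.add_file("t1/part-2.janm", 4 * 1024)

all_files = manifest.files()
table_bytes = manifest.total_size()
small = manifest.compaction_candidates(small_file_bytes=1024 * 1024)
```

Both traversal and size statistics are answered from the manifest, so their cost no longer depends on how slow the object store is at listing.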

2.2.4 Extensible programming interface

For the upper-layer interface, Jianmo provides a unified API for interacting with external services, which makes it easier for third-party applications to connect. Jianmo supports different implementations of extended services without additional application development, which saves users cost and effort. This layer provides an entry point for data access, table access services, snapshot-based operations, and rich functions including Time Travel.

Unified API layer
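Snapshot-based operations and Time Travel, mentioned above, can be illustrated with a list of snapshots ordered by commit time. This is a sketch under assumptions rather than Jianmo's API: `SnapshotLog`, its methods and the integer timestamps are hypothetical.

```python
import bisect


class SnapshotLog:
    """Hypothetical ordered log of table snapshots."""

    def __init__(self):
        self._timestamps = []  # commit times, strictly increasing
        self._file_sets = []   # files that make up the table at each commit

    def commit(self, timestamp, files):
        if self._timestamps and timestamp <= self._timestamps[-1]:
            raise ValueError("snapshots must be committed in time order")
        self._timestamps.append(timestamp)
        self._file_sets.append(frozenset(files))

    def files_as_of(self, timestamp):
        """Time Travel: the table's files as of the given point in time."""
        idx = bisect.bisect_right(self._timestamps, timestamp) - 1
        if idx < 0:
            return frozenset()  # the table did not exist yet
        return self._file_sets[idx]


log = SnapshotLog()
log.commit(100, {"part-a"})
log.commit(200, {"part-a", "part-b"})
log.commit(300, {"part-b", "part-c"})  # part-a was rewritten into part-c

past = log.files_as_of(250)
latest = log.files_as_of(300)
```

A query "as of" time 250 reads exactly the files of the snapshot committed at time 200, regardless of later changes.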

For table application services, Jianmo provides stateless data management applications that can be registered to any service to achieve adaptive data management.

After the second stage is complete, Tuoshupai plans to open-source Jianmo to achieve true data interoperability between different services, fully support many services including Spark and Clickhouse, and realize one data, multiple engine calculations.

2.3 Evolution Stage Three: Unified Big Data Computing System Storage Engine (Outlook)

In the third stage of its evolution, Jianmo hopes to become a unified storage engine for big data computing systems: a unified access protocol will be created, and table formats, data lakes, table engines and so on will be unified under that protocol to simplify user access. We hope everyone will continue to follow Jianmo's progress!

Build a unified data access protocol


Origin my.oschina.net/u/5944765/blog/10150423