Unlocking the Cloud-Native Virtual Data Warehouse PieCloudDB Database: Part 1

PieCloudDB, the flagship product of Tuoshupai, uses leading data warehouse virtualization technology to give enterprises a cloud-native virtual data warehouse that is highly secure, highly reliable, and highly available: in short, "rock-solid". This series of articles introduces the latest developments and new features of PieCloudDB Database.

Product trial: https://app.pieclouddb.com

As computing and network resources have become abundant, computing platforms have gone through three generations: the mainframe era, the PC era, and today's cloud era. In the third shift, breakthroughs in server virtualization technology ushered in the era of cloud computing.

Three generations of computing platform changes 

To take full advantage of the benefits of the cloud, Tuoshupai has built a new database management platform for the cloud era: PieCloudDB. PieCloudDB separates three logical core components (user data, metadata, and the computing engine) and reassembles them on the cloud. This separation of storage and compute makes the system highly elastic on the cloud, and decoupling software from hardware gives it high fault tolerance and high availability. Users can scale storage or compute resources elastically and on demand.

A breakthrough in cloud-native data warehouse virtualization leads the era of data computing

Since October 24, 2022, Tuoshupai has successively released the community edition, enterprise edition, and all-in-one version of PieCloudDB. On π Day, March 14, Tuoshupai released a new version of PieCloudDB: Cloud on Cloud. PieCloudDB now fully supports three deployment methods: bare metal, private cloud, and public cloud.

Various deployment methods of PieCloudDB 

In the new version, PieCloudDB fully realizes cloud virtualization of data warehouses. Cloud-native data warehouse virtualization breaks through many bottlenecks of traditional MPP databases, realizes a new eMPP architecture on the cloud, and allows multiple cloud-native virtual data warehouses to run concurrently. Users gain the benefits of the new architecture on the cloud, including the elimination of data silos, scale-out and scale-in within seconds, dynamic resource allocation, and pay-as-you-go billing.

PieCloudDB implements the eMPP architecture on the cloud 

The new version adds many new functions and brings performance and stability improvements across the board, making PieCloudDB truly "rock-solid". These include:

  • Aggregate pushdown functionality is enhanced 

Analytical database workloads often contain a large number of aggregation operations. The aggregate pushdown function implemented by PieCloudDB executes the aggregation before the join, which can greatly reduce the amount of data the join has to process and so significantly improve query performance.

In testing, aggregate pushdown improved PieCloudDB query performance by nearly a hundred times, or even a thousand times, in some complex query scenarios.

Aggregate Pushdown Function 
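
The effect of the aggregate pushdown described above can be illustrated with a small sketch. It is not PieCloudDB's implementation: the `orders` and `customers` tables and both function names are hypothetical, and the rewrite is shown only for `SUM` grouped through the join key, a case where aggregating before the join is known to give the same answer.

```python
from collections import defaultdict

# Hypothetical toy tables: orders(customer_id, amount), customers(customer_id, region).
orders = [(1, 10.0), (1, 5.0), (2, 7.5), (2, 2.5), (2, 1.0), (3, 4.0)]
customers = [(1, "east"), (2, "west"), (3, "east")]


def join_then_aggregate(orders, customers):
    """Join every order row to its customer, then sum amounts per region."""
    region_of = dict(customers)
    joined_rows = 0
    totals = defaultdict(float)
    for customer_id, amount in orders:
        if customer_id in region_of:
            joined_rows += 1
            totals[region_of[customer_id]] += amount
    return dict(totals), joined_rows


def aggregate_then_join(orders, customers):
    """Pre-aggregate orders per customer, then join the smaller result."""
    per_customer = defaultdict(float)
    for customer_id, amount in orders:
        per_customer[customer_id] += amount  # aggregation pushed below the join
    region_of = dict(customers)
    joined_rows = 0
    totals = defaultdict(float)
    for customer_id, subtotal in per_customer.items():
        if customer_id in region_of:
            joined_rows += 1
            totals[region_of[customer_id]] += subtotal
    return dict(totals), joined_rows


baseline, baseline_rows = join_then_aggregate(orders, customers)
pushed, pushed_rows = aggregate_then_join(orders, customers)
print(baseline == pushed, baseline_rows, pushed_rows)
```

Both plans return the same per-region totals, but the second feeds one row per customer into the join instead of one row per order. The gap grows with the number of rows per join key.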

  • Block File Skipping optimization

PieCloudDB stores user data in object storage in a hybrid row-column format and uses block files as the unit of storage. Within a block file, data is stored by column to achieve efficient compression and save storage space. The Block File Skipping optimization in the new version of PieCloudDB pre-computes aggregate information for the columns in each block file. When the database runs a query, it uses this information to skip unnecessary data blocks during execution, which reduces the amount of data read and improves query performance.

PieCloudDB row and column mixed storage 
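
A minimal sketch of the block-skipping idea follows. It assumes the pre-computed column information is a per-block minimum and maximum, which is a common choice but is not stated in this article; the `BlockFile` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockFile:
    """Hypothetical block file: one column's values plus precomputed min/max."""
    values: list
    min_value: int
    max_value: int


def make_block(values):
    """Build a block and compute its column statistics once, at write time."""
    values = list(values)
    return BlockFile(values=values, min_value=min(values), max_value=max(values))


def scan_greater_than(blocks, threshold):
    """Return values > threshold, reading only blocks that can contain a match."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block.max_value <= threshold:
            continue  # no value in this block can satisfy the predicate
        blocks_read += 1
        matches.extend(v for v in block.values if v > threshold)
    return matches, blocks_read


blocks = [make_block(range(0, 100)), make_block(range(100, 200)), make_block(range(200, 300))]
matches, blocks_read = scan_greater_than(blocks, 250)
print(len(matches), blocks_read)
```

Here the predicate `value > 250` touches only the last of the three blocks. The first two are rejected from their stored maximum alone, without reading their contents.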

  • Achieve extremely fast Analyze 

The "Analyze" operation examines the contents of database tables and gathers statistics about the distribution of values in each column of each table. The database query engine uses these statistics to generate an optimal query plan.

In most database systems, Analyze is run manually or triggered automatically by autovacuum, and it can take too long on large tables that hold a lot of data.

In the new version, PieCloudDB implements an extremely fast Analyze that runs automatically when data changes and produces more accurate query planning statistics in a timely manner.
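
To make "statistics about the distribution of values" concrete, the sketch below collects a few typical per-column statistics. The choice of statistics and the `analyze_column` function are illustrative assumptions; the article does not describe what PieCloudDB's Analyze collects or how it does so quickly.

```python
from collections import Counter


def analyze_column(values, top_n=2):
    """Collect simple planner-style statistics for one column (None means NULL)."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "row_count": len(values),
        "null_fraction": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct_count": len(counts),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_common": counts.most_common(top_n),  # (value, frequency) pairs
    }


stats = analyze_column([3, 1, 3, None, 2, 3, 1, None])
print(stats)
```

A planner can use figures like these to estimate how many rows a predicate will return, which is why stale statistics after a data change can lead to a poor plan.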

  • Brand new caching mechanism 

PieCloudDB implements a new caching mechanism in the metadata layer. It reduces the network communication overhead of accessing the metadata server, lowers the load on that server, and speeds up metadata access.
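
The benefit of caching metadata near the compute side can be sketched with a simple read-through cache. This shows the general technique only, not PieCloudDB's design: `MetadataServer`, `CachedMetadataClient`, and the explicit `invalidate` call are hypothetical, and a real system would also need a policy for keeping cached entries consistent.

```python
class MetadataServer:
    """Hypothetical stand-in for a remote metadata service."""

    def __init__(self, catalog):
        self._catalog = dict(catalog)
        self.requests = 0

    def fetch(self, table_name):
        self.requests += 1  # each call models one network round trip
        return self._catalog[table_name]


class CachedMetadataClient:
    """Read-through cache: contact the server only on a cache miss."""

    def __init__(self, server):
        self._server = server
        self._cache = {}

    def get(self, table_name):
        if table_name not in self._cache:
            self._cache[table_name] = self._server.fetch(table_name)
        return self._cache[table_name]

    def invalidate(self, table_name):
        """Drop an entry, for example after the table definition changes."""
        self._cache.pop(table_name, None)


server = MetadataServer({"orders": ["customer_id", "amount"]})
client = CachedMetadataClient(server)
for _ in range(5):
    schema = client.get("orders")
print(schema, server.requests)
```

Five lookups cost a single round trip to the server. The remaining four are answered locally, which is the source of both the lower latency and the lower server load.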

  • Support for fast ETL/ELT and queries on external data sources

In the new version, PieCloudDB natively supports importing streaming data from Kafka. The copy operation now runs across the entire cluster instead of on a single node, so its performance is greatly improved and scales in proportion to the size of the cluster.

The new version also supports the foreign-data wrapper module, which lets users access external data sources including, but not limited to, HDFS and MySQL. Users can also develop their own modules to access new storage data sources.

In addition to these five major optimizations, the kernel of the new version of PieCloudDB also implements:

  • Enhanced Observability
  • Vacuum optimization
  • Support native storage format on HDFS/NAS system
  • Support for the open source optimizer Orca
  • Support for the open source machine learning library MADlib
  • Support for very large data volume fields

…. 

And many other optimizations.

PieCloudDB has a newly built storage engine, JANM. The name, Jianmo, comes from "bamboo slips and ink", an image that describes PieCloudDB's hybrid row-column storage format.

In the new version of PieCloudDB, the storage engine JANM implements: 

  • Enhancements to JANM Distributed Processing 
  • JANM dynamically allocates and reads files to enhance dispatch performance 
  • Optimization of JANM exception handling 

…. 

and many other functions.

The PieCloudDB cloud-native management and control platform has completed the following improvements:

  • User permission optimization 
  • Registration options added 
  • Data insight optimization 
  • Data import optimization 
  • External access supports more types 

…. 

And many other optimizations.

PieCloudDB will continue to iterate and move forward. You are welcome to try the Cloud on Cloud version at https://app.pieclouddb.com. We also invite you to scan the QR code to join our technical community and work together with us!

Origin blog.csdn.net/OpenPie/article/details/130286454