Alibaba Cloud's Alex Chen: Data is like a vast universe

Every day, we encounter unanswered questions such as: How many planets are there in the universe?

There is no final answer to this question. After all, there are an estimated 6 billion Earth-like planets in our own Milky Way galaxy alone.

According to Alex Chen, senior product director of Alibaba Cloud Intelligence and head of Alibaba Cloud storage products, the data generated by human production and daily life today is like a vast universe: it is enormous and still expanding, it hides many unforeseen risks, and it is bound by gravity. Every enterprise is like a spaceship. To fly through this universe, it needs effective ways to handle rapidly expanding data, to shield itself from risks and withstand disasters, and to break free of data gravity so that it can reach cosmic velocity and discover the hidden value in its data.

1. Tiering hot and cold data and leveraging economies of scale to help customers reduce costs

This is an era of explosive data growth. According to a report released by IDC, China's data volume will grow from 23.88 ZB in 2022 to 76.6 ZB in 2027, a compound annual growth rate (CAGR) of 26.3%. Data at this scale can be divided into online data, which requires real-time access, and offline data, which does not (such as data kept in archive-type tape libraries).
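The growth rate quoted above can be checked from the two data volumes. The following is a minimal sketch that only uses the figures from the IDC report as cited in this article; the function name `cagr` is illustrative.

```python
# Figures as quoted above from the IDC report.
start_zb = 23.88        # China's data volume in 2022, in ZB
end_zb = 76.6           # projected data volume in 2027, in ZB
years = 2027 - 2022     # five compounding periods


def cagr(start: float, end: float, periods: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / periods) - 1."""
    return (end / start) ** (1 / periods) - 1


rate = cagr(start_zb, end_zb, years)
print(f"CAGR: {rate:.2%}")
```

The result is a little over 26%, which is consistent with the reported 26.3%.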

At present, Alibaba Cloud Object Storage Service (OSS) stores dozens of exabytes of data. The network drive and photo album service PDS provides data storage for products such as Quark, UC, Alibaba Cloud Disk, and China Mobile Cloud Disk, serving 800 million end users in total. To help customers manage this data, OSS provides five storage classes. Online data can be placed in the Standard, Infrequent Access, or Archive class, and offline data in the Cold Archive or Deep Cold Archive class.

Previously, data in the OSS Archive class had to be restored (unfrozen) before it could be read. The newly released archive direct-read capability removes that step, so the data can be accessed directly. For data lifecycle management, lifecycle rules can be created based on either the last modified time (Last Modified Time) or the last access time (Last Access Time), so that objects (Object) in a bucket (Bucket) are periodically transitioned to a specified storage class, which saves storage costs. In addition, the OSS Archive class now offers an intra-city redundancy option, which further improves data reliability.
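To make the idea of lifecycle rules concrete, here is a minimal sketch of how a rule based on last modified time or last access time could select a target storage class. It is a local simulation, not the OSS API: the class names `LifecycleRule` and `ObjectMeta`, the function `pick_transition`, and the day thresholds are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class LifecycleRule:
    basis: str          # "last_modified" or "last_access"
    min_age_days: int   # hypothetical threshold, in days
    target_class: str   # storage class to transition to


@dataclass
class ObjectMeta:
    key: str
    storage_class: str
    last_modified: datetime
    last_access: datetime


def pick_transition(obj: ObjectMeta, rules: List[LifecycleRule],
                    now: datetime) -> Optional[str]:
    """Return the target class of the matching rule with the largest
    age threshold, or None if no rule matches."""
    best = None
    for rule in rules:
        ts = obj.last_modified if rule.basis == "last_modified" else obj.last_access
        age_days = (now - ts).days
        if age_days >= rule.min_age_days and (
            best is None or rule.min_age_days > best.min_age_days
        ):
            best = rule
    return best.target_class if best else None


now = datetime(2023, 6, 1)
rules = [
    LifecycleRule("last_access", 30, "Infrequent Access"),
    LifecycleRule("last_modified", 180, "Archive"),
]
hot = ObjectMeta("logs/today.log", "Standard",
                 now - timedelta(days=2), now - timedelta(days=1))
stale = ObjectMeta("logs/2022.log", "Standard",
                   now - timedelta(days=400), now - timedelta(days=200))

print(pick_transition(hot, rules, now))    # no rule matches
print(pick_transition(stale, rules, now))  # both rules match; the older threshold wins
```

In this sketch the rule with the largest satisfied threshold wins, which is one reasonable way to resolve overlapping rules; the actual service may resolve conflicts differently.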

Recently, Alibaba Cloud Storage released the OSS Deep Cold Archive class. Its list price is only 0.75 cents/GB/month, which is close to the price of a tape library, and it supports restoring (unfreezing) up to 100 TB per day without lengthy restore times.

2. Comprehensive data protection to deal with various security threats

Security and reliability are the foundation of cloud storage. To prevent data loss and damage caused by ransomware, system failures, natural disasters, and operation and maintenance accidents, data on the cloud and in the local data center needs unified protection. ECS snapshots and hybrid cloud backup (HBR) provide backup and disaster recovery protection for entire machines, cloud disks, files, and databases.

HBR also provides tamper-proof backups, which add an extra layer of protection for backup data. It can tier backup data into hot and cold layers, reducing costs and increasing efficiency while keeping the retention time in line with audit requirements. When multiple accounts are involved, customers can easily share snapshots with other authorized users or use HBR for cross-account backup.

To guard against region-level disasters, cross-region replication is essential. OSS offers replication time control between regions: an object can be asynchronously replicated to another city, for example from Beijing to Guangzhou, within ten minutes. Block storage (EBS) has the same asynchronous replication capability.

This year, all Alibaba Cloud snapshots and backup vaults have gradually gained intra-city redundancy. Data can be stored across three data centers, so when one of them fails, snapshots and backup data can still be read from the others, allowing enterprises to achieve high availability at the lowest cost.

OSS supports access policies based on organizations, users, and resources. Enterprises can use Access Point to simplify permission management for shared data, and use Control Policy to set security baselines. For example, every business department can be required to enforce encryption when using OSS, to allow access only over a specified TLS version, to apply VPC access control, and to keep ACLs set to private, all of which helps prevent data leakage from OSS.
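The security baseline described above can be pictured as a set of checks that every bucket configuration must pass. The sketch below is a local illustration only: the configuration keys (`encryption_enabled`, `min_tls_version`, `vpc_restricted`, `acl`), the `TLSv1.2` minimum, and the function names are assumptions, not the format of an actual Control Policy.

```python
BASELINE_MIN_TLS = "TLSv1.2"  # assumed minimum; the article does not name a version


def _tls_tuple(version: str) -> tuple:
    """Turn 'TLSv1.2' into (1, 2) so versions compare numerically."""
    return tuple(int(part) for part in version[len("TLSv"):].split("."))


def baseline_violations(config: dict) -> list:
    """Return a list of human-readable baseline violations for one bucket."""
    violations = []
    if not config.get("encryption_enabled", False):
        violations.append("encryption is not enforced")
    tls = config.get("min_tls_version")
    if tls is None or _tls_tuple(tls) < _tls_tuple(BASELINE_MIN_TLS):
        violations.append("minimum TLS version is below the baseline")
    if not config.get("vpc_restricted", False):
        violations.append("access is not restricted to a VPC")
    if config.get("acl") != "private":
        violations.append("ACL is not private")
    return violations


compliant = {"encryption_enabled": True, "min_tls_version": "TLSv1.3",
             "vpc_restricted": True, "acl": "private"}
risky = {"encryption_enabled": True, "min_tls_version": "TLSv1.0",
         "vpc_restricted": True, "acl": "public-read"}

print(baseline_violations(compliant))
print(baseline_violations(risky))
```

Expressing the baseline as code-like checks makes it easy to see why a central policy is useful: each department's bucket is evaluated against the same rules rather than configured by hand.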

3. Separating storage and compute to accelerate scenario-specific performance

Different business scenarios place different demands on performance. Some enterprise workloads (such as OLTP and web caching) sit closer to the application front end, so they care most about real-time responsiveness and are especially sensitive to latency. Big data analytics scenarios sit closer to the back end; they need higher bandwidth and are relatively less sensitive to latency.

1. E-commerce scenario: ESSD AutoPL specification, leading the new direction of IO performance elasticity

In the serverless era, storage needs to adapt intelligently to load changes. Four years ago, Alibaba Cloud released the ESSD cloud disk with million-level IOPS. It is built on Pangu 2.0, a new generation of self-developed distributed storage engine, and suits latency-sensitive applications and I/O-intensive business scenarios (such as large OLTP databases). Taking a typical e-commerce business as an example, a design that couples performance to capacity faces the following challenges:

  • Daily traffic and business peaks differ enormously, so utilization is low during off-peak periods and many resources are wasted;
  • During promotions, the business peak is short and the peak demand is hard to estimate, so the business may be impacted.

To address this, Alibaba Cloud launched the ESSD AutoPL cloud disk for the serverless era. It decouples cloud disk capacity from cloud disk performance while retaining the original functions and performance of ESSD cloud disks. When configuring the capacity of an ESSD AutoPL cloud disk, users can also set its provisioned performance and burst performance according to business needs, which makes it easy to handle scenarios ranging from daily e-commerce operations to flash-sale promotions. The Zhihuo app used ESSD AutoPL cloud disks to handle its Double 11 traffic peak. The cost was 42% lower than upgrading to PL2 cloud disks, and there was no need to hold the higher specification for the long term.
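The decoupling of capacity and performance can be illustrated with a small model. This is a sketch under stated assumptions, not the actual ESSD AutoPL specification: the class `AutoPLDisk`, its fields, and every IOPS figure are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass
class AutoPLDisk:
    capacity_gib: int
    baseline_iops: int       # performance that comes with the disk (hypothetical)
    provisioned_iops: int    # extra performance reserved in advance
    burst_enabled: bool
    burst_ceiling_iops: int  # upper bound reachable while bursting (hypothetical)

    def guaranteed_iops(self) -> int:
        return self.baseline_iops + self.provisioned_iops

    def delivered_iops(self, demand_iops: int) -> int:
        """IOPS actually served: demand, capped by the guaranteed level
        or, when bursting is enabled, by the burst ceiling."""
        ceiling = self.guaranteed_iops()
        if self.burst_enabled:
            ceiling = max(ceiling, self.burst_ceiling_iops)
        return min(demand_iops, ceiling)

    def burst_iops_used(self, demand_iops: int) -> int:
        """Portion of the served IOPS above the guaranteed level."""
        return max(0, self.delivered_iops(demand_iops) - self.guaranteed_iops())


disk = AutoPLDisk(capacity_gib=500, baseline_iops=5_000,
                  provisioned_iops=10_000, burst_enabled=True,
                  burst_ceiling_iops=100_000)

daily, peak = 8_000, 60_000
print(disk.delivered_iops(daily), disk.burst_iops_used(daily))
print(disk.delivered_iops(peak), disk.burst_iops_used(peak))

no_burst = replace(disk, burst_enabled=False)
print(no_burst.delivered_iops(peak))  # capped at the guaranteed level
```

Note that `capacity_gib` never enters the performance calculation: in this model the disk stays the same size on ordinary days and during the promotion peak, and only the burst portion grows with demand.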

2. Data lake scenario: optimal performance solution under the storage-computing separation architecture

As mentioned earlier, big data analysis scenarios have higher bandwidth requirements. The high-throughput and low-latency service response capabilities of OSS can effectively support the access of various types of hotspot data. In order to meet higher throughput requirements, OSS also introduces the OSS accelerator function, which can cache hot objects in OSS, and is suitable for scenarios requiring large bandwidth and repeated data reading such as genetic training, machine learning, and big data computing.

The OSS accelerator is a standard server-side cache service that is completely decoupled from computing. At the same time, based on the OSS intelligent metadata architecture, the OSS accelerator provides strong consistency that traditional caching solutions do not have. When a file on OSS is updated, the accelerator can automatically identify it to ensure that the engine reads the latest data.
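One generic way to get consistent reads from a cache is to validate each cached entry against the authoritative metadata before serving it. The sketch below shows that pattern with toy classes; it is not a description of how the OSS accelerator is implemented, and `ObjectStore` and `ValidatingCache` are hypothetical names.

```python
class ObjectStore:
    """Toy backing store: maps key -> (version, data)."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        version = self._objects.get(key, (0, b""))[0] + 1
        self._objects[key] = (version, data)

    def head(self, key: str) -> int:
        """Cheap metadata lookup: current version only."""
        return self._objects[key][0]

    def get(self, key: str):
        return self._objects[key]


class ValidatingCache:
    """Serves cached data only if its version matches the store's metadata."""

    def __init__(self, store: ObjectStore):
        self.store = store
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        current = self.store.head(key)
        cached = self._cache.get(key)
        if cached is not None and cached[0] == current:
            self.hits += 1
            return cached[1]
        self.misses += 1
        version, data = self.store.get(key)
        self._cache[key] = (version, data)
        return data


store = ObjectStore()
cache = ValidatingCache(store)
store.put("table/part-0", b"v1")
first = cache.read("table/part-0")     # miss: fetched from the store
second = cache.read("table/part-0")    # hit: version unchanged
store.put("table/part-0", b"v2")       # object updated behind the cache
third = cache.read("table/part-0")     # stale entry detected, refetched
print(first, second, third, cache.hits, cache.misses)
```

The trade-off in this pattern is one small metadata lookup per read in exchange for never returning stale data.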

In the data lake scenario, data can only flow freely if protocols and metadata are compatible, so multi-protocol access is indispensable. Object storage uses a flat metadata architecture. A bucket may hold billions of files, and metadata operations (such as renaming) can take a long time. As the foundation of cloud-native data lakes, OSS-HDFS integrates fully with the big data storage ecosystem. In addition to the flat namespace of object storage, it provides a hierarchical namespace service. The hierarchical namespace organizes objects into a directory hierarchy and, through unified metadata management, converts between the two automatically, which greatly shortens the data processing pipeline.
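The cost difference between the two namespace models shows up clearly in a directory rename. The following is a minimal sketch using plain dictionaries as stand-ins for the two metadata layouts; the function names are illustrative and nothing here reflects the internal design of OSS-HDFS.

```python
def rename_prefix_flat(objects: dict, old: str, new: str) -> int:
    """Flat namespace: a 'directory' is just a key prefix, so every key
    under it must be rewritten. Returns the number of keys touched."""
    touched = 0
    for key in [k for k in objects if k.startswith(old)]:
        objects[new + key[len(old):]] = objects.pop(key)
        touched += 1
    return touched


def rename_dir_hierarchical(tree: dict, old: str, new: str) -> int:
    """Hierarchical namespace: a directory is a single entry in its
    parent, so a rename moves one entry regardless of its contents."""
    tree[new] = tree.pop(old)
    return 1


flat = {"logs/2023/a": b"1", "logs/2023/b": b"2", "data/x": b"3"}
flat_ops = rename_prefix_flat(flat, "logs/", "archive/")

tree = {"logs": {"2023": {"a": b"1", "b": b"2"}}, "data": {"x": b"3"}}
tree_ops = rename_dir_hierarchical(tree, "logs", "archive")

print(flat_ops, sorted(flat))
print(tree_ops, sorted(tree))
```

With billions of objects under a prefix, the flat rename grows with the number of keys, while the hierarchical rename stays a single metadata operation.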

3. Model training scenario: CPFS accelerates AI innovation

Whether for the currently popular AIGC or for autonomous driving, AI training is indispensable. Large-scale parallel training across many machines and many GPUs requires a high-performance file system that can sustain the high-throughput reads and writes of the training process.

Alibaba Cloud Storage has used RDMA technology at scale since 2018 and has developed its own Solar-RDMA protocol to provide a stable, high-performance storage network. CPFS's fully parallel architecture for metadata and data takes full advantage of the end-to-end RDMA network to accelerate I/O, increasing the training efficiency of PAI-Lingjun Intelligent Computing by 3 times.

File storage CPFS now supports convenient two-way data flow with OSS. During AI training, data can be stored in OSS, and after preprocessing it is lazily loaded (Lazyload) into CPFS for training. The resulting data then flows back to OSS for persistent storage, which reduces long-term data storage costs.
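The lazy-loading flow can be sketched as a fast tier that knows the names of all objects up front but pulls each object's data only when it is first read, and writes results back on request. This is a toy model under that assumption; the class `LazyLoadFS` and its methods are hypothetical and do not represent the CPFS interface.

```python
class LazyLoadFS:
    """Toy fast tier in front of an object store (a plain dict here)."""

    def __init__(self, object_store: dict):
        self.object_store = object_store
        # Names are imported eagerly; None marks data not loaded yet.
        self.namespace = {key: None for key in object_store}
        self.fetches = 0

    def read(self, path: str) -> bytes:
        if self.namespace[path] is None:
            # First access: pull the data from the object store.
            self.namespace[path] = self.object_store[path]
            self.fetches += 1
        return self.namespace[path]

    def write(self, path: str, data: bytes) -> None:
        """Write a result (for example a checkpoint) to the fast tier."""
        self.namespace[path] = data

    def export(self, path: str) -> None:
        """Flow a result back to the object store for long-term storage."""
        self.object_store[path] = self.namespace[path]


oss = {"train/a.bin": b"aaa", "train/b.bin": b"bbb"}
fs = LazyLoadFS(oss)

sample = fs.read("train/a.bin")   # loaded on first read
fs.read("train/a.bin")            # served from the fast tier
fs.write("ckpt/model.bin", b"weights")
fs.export("ckpt/model.bin")
print(sample, fs.fetches, sorted(oss))
```

Only the data that training actually touches is copied to the fast tier, which is the point of lazy loading: `train/b.bin` is visible in the namespace but never fetched in this run.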

4. High-performance computing scenarios: Elastic file clients boost cloud-native computing power

In the serverless era, traditional file storage needs to evolve toward high density, elasticity, and speed. Alibaba Cloud File Storage has launched the Elastic File Client (EFC). Its access technology provides stable connections for high-density compute nodes, performance that scales elastically with the size of the compute cluster, and fast mounting.

The metadata cache of the elastic file client speeds up everyday metadata operations on shared file storage NAS by 10 times, and opening and reading small 4 KB files becomes 5 times faster, which is close to the level of local EXT4. Its multi-client Lease technology ensures that, even with cache acceleration, strong data consistency across multiple clients is maintained and the results of parallel AI training remain correct. The distributed data cache lets throughput grow in step with the compute cluster, breaking through the throughput limit of file storage.
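A lease-based cache keeps clients consistent by having the server remember who holds a cached copy and revoke those copies before a write takes effect. The sketch below shows that general idea in its simplest form; it omits lease expiry and failure handling, and the classes `MetadataServer` and `Client` are hypothetical, not the EFC implementation.

```python
class MetadataServer:
    def __init__(self):
        self.attrs = {}    # path -> attributes
        self.leases = {}   # path -> set of clients holding a read lease

    def read(self, client, path: str):
        """Grant the client a read lease and return the attributes."""
        self.leases.setdefault(path, set()).add(client)
        return self.attrs[path]

    def write(self, client, path: str, value) -> None:
        """Revoke all other leases on the path, then apply the write."""
        for holder in self.leases.pop(path, set()):
            if holder is not client:
                holder.invalidate(path)
        self.attrs[path] = value


class Client:
    def __init__(self, server: MetadataServer):
        self.server = server
        self.cache = {}
        self.remote_reads = 0

    def stat(self, path: str):
        if path not in self.cache:
            self.cache[path] = self.server.read(self, path)
            self.remote_reads += 1
        return self.cache[path]

    def update(self, path: str, value) -> None:
        self.server.write(self, path, value)
        self.cache.pop(path, None)  # next stat re-acquires a lease

    def invalidate(self, path: str) -> None:
        self.cache.pop(path, None)


server = MetadataServer()
server.attrs["/data/f"] = {"size": 1}
a, b = Client(server), Client(server)

a.stat("/data/f")                    # remote read, lease granted
a.stat("/data/f")                    # served from a's cache
b.update("/data/f", {"size": 2})     # revokes a's lease before writing
latest = a.stat("/data/f")           # a must go back to the server
print(latest, a.remote_reads)
```

Repeated reads stay local and fast, yet a client never keeps serving a cached value after another client has changed it.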

In addition, EFC is also integrated with Alibaba Cloud ACK, ASK, and ECI through CNFS and Fluid. It can be used out of the box, which can perfectly match the high-density computing needs of scientific research, industrial simulation, AI training and other fields, and improve data processing efficiency.

4. End-to-end observability and precise queries to improve operation and maintenance efficiency

Cloud computing has become a utility of the new era, like water, electricity, and coal, so cloud resources need to be used and managed sensibly. Alibaba Cloud's CloudLens was created for this purpose. It includes six modules: usage analysis, access analysis, anomaly detection, security analysis, performance monitoring, and data protection. With them, enterprises can carry out fine-grained operation and maintenance analysis of cloud products such as OSS, SLS, EBS, and ALB, quickly build observability for those products, and make good use of the cloud.

As industry after industry goes through digital transformation, more and more enterprises choose to build their IT systems on the cloud, which makes faster problem diagnosis and more efficient troubleshooting necessary. Logs, Traces, and Metrics, the three pillars of IT observability data, can meet most monitoring, alerting, analysis, and troubleshooting needs. As a cloud-native observability and analysis platform, Log Service SLS stores and analyzes Log, Trace, Metric, and other data in a unified way, and has built-in functions such as automatic inspection, real-time anomaly notification, and root cause location to help enterprises troubleshoot quickly.

In data audit and supervision scenarios, stronger metadata indexing is essential for efficient operation and maintenance. OSS creates and maintains an independent metadata index for each bucket and provides 9 categories of metadata filter conditions combined with 5 aggregation output methods, helping users index and aggregate data across billions of files within seconds. In addition, when a new file is uploaded to OSS, it can be automatically added to the index within 10 seconds.
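Conceptually, a metadata index query combines a filter over object attributes with an aggregation over the matches. The sketch below illustrates that combination on a small in-memory list. The record fields, the function `filter_and_aggregate`, and the sample data are all hypothetical; the article does not enumerate the 9 filter categories or the 5 aggregation methods, so none are assumed here.

```python
from collections import defaultdict

# Hypothetical metadata records for four objects in one bucket.
records = [
    {"key": "img/a.jpg", "size": 2_000_000, "storage_class": "Standard"},
    {"key": "img/b.jpg", "size": 500_000, "storage_class": "Standard"},
    {"key": "bak/c.tar", "size": 9_000_000, "storage_class": "Archive"},
    {"key": "bak/d.tar", "size": 4_000_000, "storage_class": "Archive"},
]


def filter_and_aggregate(items, predicate, group_by: str) -> dict:
    """Return {group: {"count": n, "total_size": bytes}} for the
    records that satisfy the predicate."""
    result = defaultdict(lambda: {"count": 0, "total_size": 0})
    for rec in items:
        if predicate(rec):
            group = result[rec[group_by]]
            group["count"] += 1
            group["total_size"] += rec["size"]
    return dict(result)


# Objects of at least 1 MB, grouped by storage class.
large = filter_and_aggregate(
    records, lambda r: r["size"] >= 1_000_000, "storage_class"
)
print(large)
```

A real index answers such queries without scanning every object, which is what makes second-level responses over billions of files possible; the linear scan here is only for illustration.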

At the end of his talk, Alex also officially announced on behalf of Alibaba Cloud that the first Data Insight Innovation Challenge is about to launch. The competition has two tracks: intelligent operation and maintenance, and data management innovation. Each track has its own problems, and entries are evaluated on the quality of the code that contestants submit. Developers can participate for free, and the total prize pool is as high as 200,000 yuan!

Digitalization is moving towards intelligence, which is the main line of a new round of cloud computing revolution. On the road of digital transformation of enterprises, Alibaba Cloud Storage will continue to implement the concept of "stable, safe, high-performance, inclusive and intelligent new storage" to help enterprises open the next chapter of digital innovation.


This article is the original content of Alibaba Cloud and may not be reproduced without permission.


Origin blog.csdn.net/yunqiinsight/article/details/131102931