Architecture Innovation and Commercial Value of Cloud Native Distributed Storage

2023 Global Distributed Cloud Conference

The Global Distributed Cloud Conference is a flagship platform for exchange on distributed cloud technology and business. The 2023 Global Distributed Cloud Conference, Beijing Station, runs from June 28 to 29. Its theme covers the new opportunities that large models open up for artificial intelligence (AI), the trend toward a new, ubiquitous computing power network, and how distributed cloud, distributed databases, distributed storage, edge cloud, and related technologies can be used to build that network and create a more powerful value engine for the digital economy.

At the Distributed Storage Forum on the afternoon of June 28, Cui Jian, head of Tencent's cloud storage products, delivered a speech titled "Architecture Innovation and Commercial Value of Cloud Native Distributed Storage".

01

The full Tencent Cloud native storage matrix

After more than ten years of refinement, the Tencent Cloud storage team has built fairly complete underlying capabilities, including reliability, consistency, availability, and scalability, on top of Tencent Group's storage engine. All of Tencent Group's internal and external storage businesses are built on this engine.

More product capabilities and solutions are built upward from this Tencent storage base, and the product capabilities are divided into several layers. The bottom layer consists of Tencent Storage's externally facing core engines. The first is object storage COS, which provides the core productized capabilities of Tencent Cloud Storage on the public cloud. The second is Data Vientiane CI, which provides Tencent Cloud's intelligent storage and data processing capabilities. The third is TStor, the core engine behind Tencent Cloud Storage's private deployment and delivery for industries such as government, finance, and universities.

On top of the three engines, CI, COS, and TStor, Tencent Cloud Storage builds a variety of products and solutions. The first group is product-level solutions such as data lakes, hybrid storage, view computing, enterprise network disks, and backup services. Delivered as PaaS or SaaS, these solutions sit closer to customers' industries and let customers use storage services out of the box.

Building further on these product capabilities, Tencent Cloud Storage works with Tencent Cloud's industry teams to go deep into user scenarios. It integrates scenarios such as big data, AI, and hybrid cloud, and serves the operator, finance, government, university, and Internet industries, giving customers in each industry targeted solutions for their particular usage scenarios. This is the overall layout of Tencent Cloud native storage.

02

Public cloud object storage COS - providing a stable, massive and elastic cloud-native storage base           

Cloud-native storage solutions on the public cloud rely on public cloud object storage as their base, and public cloud object storage has seen major changes and innovations over the years. Its classic scenarios include storage and distribution, archiving and backup, and big data. From the user's perspective, using the product involves several stages.

The first stage is data upload: data has to be generated and then uploaded. Around this stage, Tencent Cloud Storage packages a large number of operation paths and solutions, such as UGC upload and upload of offline data from a local IDC. It provides the CDM offline migration device, the MSP online cross-cloud data migration platform, and similar tools to solve the data upload problem for users.

After upload, the data enters the storage stage, which corresponds to data management by the customer's operations or R&D staff. The first tasks are to ensure availability, choose a storage class, and settle the cost-performance of the storage product. For multi-tier storage, Tencent Cloud Storage offers one of the most diverse sets of storage classes in the industry. Based on the customer's own definition of hot and cold data, data is moved to a more suitable storage class at the right time, and cross-region integration is handled as well.

The third stage is data processing and mining, which splits into several sub-scenarios. Take building a global UGC distribution platform: the raw data cannot be used directly after upload and has to pass through several processing steps, such as content review, quality processing, cropping, resizing, and watermarking. Data Vientiane provides comprehensive data processing capabilities for this.

There is also offline processing. For example, to build an offline big data analysis system and connect it to BI, you may need to log user behavior for MPP analysis. This relies on the Tencent Cloud Storage data lake solution, which gives upper-layer big data compute better storage support and unlocks the high-bandwidth, low-latency performance of Tencent Cloud Storage.

The final stage is data release: the usable data must be handed to its consumers, who are either Internet users across the country or the world, or data engineers inside the enterprise. Here the real-time processing capability of Data Vientiane can be combined with downstream systems, for example Data Vientiane image compression plus CDN for global distribution, so that data is released at the lowest cost. This is a fairly classic public cloud object storage application model. Tencent Cloud Storage has spent years refining it, aiming to provide a stable, massive, and elastic cloud-native storage base.

Tencent object storage COS currently offers standard, low-frequency, and archive storage classes, ordered from hot to cold. Tencent Cloud Storage will soon add a new class, cold storage, which sits between low-frequency and archive. Cold storage is still online storage: a GET returns data immediately, with no offline wait of minutes or hours, which suits online systems that need real-time retrieval. At the same time, its cost is many percentage points lower than that of standard and low-frequency storage.

Deep archive should also be mentioned here. It is the coldest tier of Tencent storage. Deep archive innovatively brings media other than HDDs, such as tape and Blu-ray, to the cloud. Using these media further lowers the cost of extremely cold cloud storage: compared with disk-based storage, its cost drops by more than 50%.

Having this many storage classes creates difficulties for the user's operations engineers: which class should each piece of hot or cold data go into? The concept is easy to describe: hot data goes into standard and cold data goes into archive. In practice, deciding what counts as hot or cold is hard. A very experienced operations engineer may be able to place data in the right storage class based on an understanding of the business system. But if the data system is more complex, or the operations engineers are less experienced, problems arise.

Tencent Cloud Storage provides a series of solutions for this pain point. The first is intelligent tiered storage, which packages part of the underlying logic in a foolproof way. Users simply put their hot data into the intelligent tiering storage class. The system then profiles the data based on the user's usage pattern and predicts how often it will be read and written in the future. If the system finds that the data has not been accessed for a long time, it automatically moves the data down to the cooler low-frequency tier to save the user money.

If data that has dropped to the low-frequency tier suddenly receives more access requests over some period, the system automatically promotes it back to the high-frequency tier so that users get lower latency and better performance. This product form greatly reduces the workload of operations engineers.
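
The demote-on-idle and promote-on-access behavior described above can be sketched in a few lines of Python. This is only an illustration, not Tencent Cloud's implementation: the thresholds (`DEMOTE_AFTER`, `PROMOTE_WINDOW`, `PROMOTE_HITS`), the tier names, and the `StoredObject` type are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical thresholds; the article does not state the real ones.
DEMOTE_AFTER = timedelta(days=30)    # idle time before demotion
PROMOTE_WINDOW = timedelta(days=1)   # window in which accesses are counted
PROMOTE_HITS = 3                     # accesses in the window that trigger promotion


@dataclass
class StoredObject:
    key: str
    last_access: datetime
    tier: str = "standard"           # "standard" (hot) or "low_frequency" (cool)
    recent_hits: list = field(default_factory=list)


def record_access(obj: StoredObject, now: datetime) -> None:
    """Register one access and promote the object if it has become hot again."""
    obj.last_access = now
    # Keep only the accesses that fall inside the promotion window.
    obj.recent_hits = [t for t in obj.recent_hits if now - t <= PROMOTE_WINDOW]
    obj.recent_hits.append(now)
    if obj.tier == "low_frequency" and len(obj.recent_hits) >= PROMOTE_HITS:
        obj.tier = "standard"


def sweep(obj: StoredObject, now: datetime) -> None:
    """Periodic check: demote an object that has been idle for too long."""
    if obj.tier == "standard" and now - obj.last_access >= DEMOTE_AFTER:
        obj.tier = "low_frequency"
        obj.recent_hits.clear()
```

In this sketch `sweep` would run on a schedule over all objects, while `record_access` would be called on every read. A real system would also account for the cost of moving data between tiers.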

The other is storage class analysis. The user first narrows the scope by choosing which bucket and which data to analyze. Tencent Cloud Storage then builds a fairly large judgment model at the underlying layer and analyzes the user's raw access logs entry by entry, giving the user a direct result that recommends which storage class is the better choice for the data.
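
As a rough illustration of log-based storage class analysis, the sketch below scans access records within a chosen prefix and recommends a class from the time since each object's last access. The record format, the idle-day cut-offs, and the rule that objects absent from the log count as coldest are assumptions for the example; the actual model is not described in the talk.

```python
from datetime import datetime


def recommend_storage_class(logs, keys, now, prefix=""):
    """Recommend a storage class per object from its access history.

    logs:   iterable of (object_key, access_time) records
    keys:   all object keys in the bucket
    prefix: restricts the analysis scope, e.g. "img/"
    """
    # Pass 1: find the most recent access per object inside the scope.
    last_seen = {}
    for key, ts in logs:
        if key.startswith(prefix):
            if key not in last_seen or ts > last_seen[key]:
                last_seen[key] = ts

    # Pass 2: map idle time to a class (hypothetical cut-offs).
    result = {}
    for key in keys:
        if not key.startswith(prefix):
            continue
        ts = last_seen.get(key)
        idle_days = None if ts is None else (now - ts).days
        if idle_days is None or idle_days >= 180:
            result[key] = "archive"        # never seen in the log, or very cold
        elif idle_days >= 30:
            result[key] = "low_frequency"
        else:
            result[key] = "standard"
    return result
```

Calling `recommend_storage_class(logs, keys, datetime.now(), prefix="img/")` would return a dictionary mapping each object under `img/` to a suggested class.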

With these product capabilities, Tencent Cloud Storage provides not only a variety of storage classes but also a fairly complete intelligent recommendation system, helping users enjoy lower costs and lighter operations work.

Tencent Cloud was one of the earliest proponents of the smart storage concept. Users generate massive amounts of data and store it in object storage, and as the data accumulates they spend more on storage every day. Tencent Cloud is committed to helping customers reduce costs and increase efficiency: lowering storage costs while mining and extracting the value of the data.

03

Data Vientiane CI - cloud-native intelligent storage base, empowering business intelligence          

Tencent Cloud Storage provides a set of intelligent storage solutions under the product name Data Vientiane (CI). Data Vientiane CI is a one-stop intelligent platform focused on data processing. For the common situation where users would otherwise have to deal with multiple services, it provides multimedia data processing capabilities such as image processing, media processing, content review, AI content recognition, and document services. It is also deeply integrated with object storage COS, providing out-of-the-box data processing and AI capabilities that reduce user costs, improve user experience, and help users tap the value of their data, making it a powerful tool for reducing costs and increasing efficiency.

With Data Vientiane CI, Tencent Cloud was the first cloud vendor in China to offer AVIF image compression (files 50%+ smaller than JPEG, WebP, and other formats); CI is also the storage-plus-processing platform with the richest processing capabilities.

Data Vientiane CI has three major features: 1-stop, 0 traffic, and 30% faster.

1-stop: a single set of APIs, ready to use out of the box, lowering the barrier to entry;

0 traffic: calls between services generate zero external network traffic, keeping user costs low;

30% faster: image processing is on average 30% faster than competing products.

On top of these three features, Tencent Cloud Storage also provides workflow capabilities at the engineering level. Users build a workflow directly by dragging and dropping; the workflow is triggered automatically and the system runs its processing steps in series or in parallel, further reducing operations effort. Tencent Cloud Storage is also developing upper-layer solutions: content production, mobile photo album clustering, intelligent search, and similar applications can all be built quickly with intelligent storage Data Vientiane CI as the core engine.
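
To make the serial and parallel execution model concrete, here is a minimal Python sketch of a workflow runner. It is not the Data Vientiane workflow API: the stage layout, the step functions (`review`, `thumbnail`, `watermark`), and `run_workflow` are hypothetical names for this example. Stages run one after another; the steps inside a stage run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical processing steps; each takes an object key and returns a result.
def review(key):
    return "pass"


def thumbnail(key):
    return key + ".thumb"


def watermark(key):
    return key + ".marked"


# A workflow is a list of stages; each stage is a list of (name, step) pairs.
WORKFLOW = [
    [("review", review)],                               # stage 1: serial gate
    [("thumbnail", thumbnail), ("watermark", watermark)],  # stage 2: parallel
]


def run_workflow(stages, key):
    """Run stages in order; run the steps within each stage concurrently."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for stage in stages:
            futures = {name: pool.submit(step, key) for name, step in stage}
            # Waiting on every future here keeps the stages strictly ordered.
            for name, future in futures.items():
                results[name] = future.result()
    return results
```

In a real deployment `run_workflow` would be invoked by an upload event on the bucket rather than called by hand.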

04

Tencent Cloud View Computing Platform - providing an end-to-end cloud-native video surveillance storage solution

Unlike traditional video surveillance systems, the view computing platform has the following characteristics:

1. It provides several ways to connect to the cloud, such as direct connection over standard protocols, encrypted connection over a private protocol, and cascading through edge gateways. This solves the problem of managing users' cross-region, multi-vendor, multi-protocol terminal devices in a unified way and helps users get devices onto the cloud faster.

2. It supports storing data directly in the customer's own COS bucket and tiered storage of that data, meeting customers' compliance requirements for data on the cloud and reducing their costs.

3. It offers a full-link "access + storage + analysis" service together with basic video SaaS and AI algorithm applications, giving customers a genuine one-stop, closed-loop video experience.

This is part of the distinctive value brought by Tencent's cloud-native storage team. The product is also connected with object storage COS and intelligent storage CI to provide cost-effective intelligent view management. Tencent Cloud is working with ecosystem partners in the pan-Internet, retail, manufacturing, operator, and other industries to provide more scenario-based solutions.

05

Data lake storage GooseFS - a new form of cloud-native data lake storage, a multi-level acceleration system to help businesses release performance        

Data lakes are divided into computing and storage, and there is still a lack of relatively uniform standards in the direction of data lake storage.

GooseFS targets the various business scenarios of data lakes. Depending on dataset size and performance requirements, it provides cache acceleration options including memory (MEM) and NVMe SSD, delivering terabyte-scale throughput and millions of IOPS. The full data set is persisted on COS, which provides massive low-cost storage and supports data lifecycle management, while spare local memory and disk fragments on compute nodes are pooled into a high-performance cache. The four ecosystems of big data (search), AI training, model training, and autonomous driving all require high bandwidth, low latency, and strong small-I/O read performance from storage.
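
The multi-level acceleration idea (memory, then SSD, then the persistent object store) can be sketched as a read-through cache. The code below is a simplified illustration and not GooseFS itself: `LRUCache` and `TieredReader` are hypothetical names, both cache tiers are plain in-memory dictionaries, and a dictionary stands in for COS.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the oldest entry


class TieredReader:
    """Read-through lookup: memory tier, then SSD tier, then backing store."""

    def __init__(self, backing, mem_capacity, ssd_capacity):
        self.mem = LRUCache(mem_capacity)
        self.ssd = LRUCache(ssd_capacity)
        self.backing = backing              # stands in for the object store
        self.hits = {"mem": 0, "ssd": 0, "backing": 0}

    def read(self, key):
        value = self.mem.get(key)
        if value is not None:
            self.hits["mem"] += 1
            return value
        value = self.ssd.get(key)
        if value is not None:
            self.hits["ssd"] += 1
            self.mem.put(key, value)        # warm the faster tier
            return value
        value = self.backing[key]           # slowest path: persistent storage
        self.hits["backing"] += 1
        self.ssd.put(key, value)
        self.mem.put(key, value)
        return value
```

Repeated reads of a hot dataset are then served from the faster tiers, while the full data set stays in the backing store.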

06

Tencent Cloud Enterprise Network Disk - improving enterprise office efficiency and helping data create business value

The cloud-native enterprise network disk is a SaaS product that works out of the box: users work with files and with each other through a client or by logging in from a browser. Business (B-end) and consumer (C-end) network disks differ greatly, however. A consumer network disk mainly covers backup or long-term storage of personal photos, videos, and contacts, while an enterprise network disk serves the material storage and distribution needs of employees, managers, and customers.

Tencent Cloud Enterprise Network Disk is integrated with Tencent Conference, Tencent Electronic Signature, Tencent Cloud Desktop, and other products to form an all-in-one suite for Tencent Cloud enterprise office scenarios. It also draws on Data Vientiane's OCR, search-by-image, tag search, and clustering capabilities to provide an AI-assisted office experience. Through collaborative document editing, efficient data distribution and sharing, and a one-click enterprise knowledge base, it builds an intelligent office system and improves enterprise office efficiency.

Tencent Cloud Enterprise Network Disk covers basic file operations, offering a range of cloud file operations that match the local Windows experience. It provides collaboration features such as multi-user collaborative editing and efficient distribution and sharing of data to improve teamwork. It supports mobile work, with access from a variety of mobile devices anytime and anywhere. It can also be deployed flexibly, in modes including public cloud and private cloud.

Cloud-native storage usage scenarios - AIGC training and inference platform built on cloud-native storage

AIGC has been a hot topic recently. With the emergence and maturing of ChatGPT in North America at the end of 2022, the wave of large models has also reached China. AIGC is a typical data lake + AI scenario: data needs to be stored in one place and connected to several processing platforms at once so that it can flow freely between them.

In the training scenario, storage faces three main demands: unified data lake storage, free flow of data between businesses, and high throughput with low latency. The inference scenario has two core demands: content review and content intelligence.

Content review is a very important task, and Tencent Cloud Storage has put a great deal of work into the compliance of inference output. In the inference flow, the user asks a question and the model produces an inference result. That result must first be sent to the data review engine on the cloud, which judges it across multiple dimensions and confidence levels. Only material judged reasonable and free of problems is finally returned to the user.
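
The review-before-release flow can be sketched as a simple gate. The example below is an assumption-laden illustration, not the Tencent Cloud review engine: the dimension names, the thresholds, and the `generate` and `score` callables are placeholders supplied by the caller.

```python
def find_violations(scores, thresholds):
    """Return the dimensions whose risk confidence reaches its threshold.

    A dimension missing from `scores` is treated as maximum risk (1.0),
    so the gate fails closed when the reviewer returns incomplete results.
    """
    return [
        dim for dim, limit in thresholds.items()
        if scores.get(dim, 1.0) >= limit
    ]


def respond(prompt, generate, score, thresholds):
    """Generate an answer and release it only if every dimension passes.

    generate: callable that maps a prompt to model output (placeholder)
    score:    callable that maps output to {dimension: risk confidence}
    """
    output = generate(prompt)
    violations = find_violations(score(output), thresholds)
    if violations:
        return None, violations     # blocked: nothing is shown to the user
    return output, []
```

For example, with `thresholds = {"violence": 0.8, "politics": 0.7}` an output scored `{"violence": 0.1, "politics": 0.2}` is released, while one scored `{"violence": 0.9, "politics": 0.2}` is blocked with `["violence"]` reported.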

The advantages of Tencent Cloud Storage are: 1. Convenient access: an integrated storage and content security solution, one-click review of incremental data, and very low development cost. 2. Accurate models: review strategies tailored to AIGC scenarios, with special tuning and customized development of the underlying models. 3. Higher performance: processing clusters are scheduled intelligently based on where data is stored, and processing close to the storage side gives lower data transfer latency and lower cost.

07

Autonomous driving acquisition, training and simulation platform based on cloud native technology          

Over the past two years, China has developed rapidly in autonomous driving, especially in its AI aspects, and this is a segment where China has some local advantages over other countries. In this scenario, Tencent Cloud cooperates with many car manufacturers and autonomous driving solution providers to offer a one-stop service, in which storage plays an important role. Tencent Cloud provides full-process services covering autonomous driving data collection, storage, labeling, computation, algorithm training, simulation and evaluation, and algorithm iteration based on returned data.

Origin blog.csdn.net/Tencent_COS/article/details/131769676