Fluid 0.3 officially released: general-purpose data acceleration for cloud-native scenarios

Introduction: To address the high data access latency, difficult joint analysis, and complex multi-dimensional management that data-intensive applications such as big data and AI face when compute and storage are separated in cloud-native environments, Nanjing University PASALab, Alibaba, and Alluxio jointly launched the open source project Fluid in September 2020.

Author | Gu Rong, Nanjing University PASALab

Fluid is an efficient support platform for data-intensive applications in cloud-native environments. Since its open source release, the project has attracted the attention of many experts and engineers in related fields, and with their positive feedback the community has been growing rapidly. Fluid 0.3 was recently released, adding three important features:

  • General-purpose storage acceleration: access acceleration for Kubernetes data volumes
  • Stronger data access security: fine-grained permission control for data sets
  • Simpler configuration: built-in optimized defaults for the internal parameters of the underlying system components

Fluid project address: https://github.com/fluid-cloudnative/fluid

The requirements behind these three features came from production feedback from many community users. Fluid v0.3 also includes bug fixes and documentation updates. You are welcome to try Fluid v0.3! Thanks to the community partners who contributed to this version. We will continue to follow and adopt community suggestions to advance the Fluid project, and we look forward to hearing more feedback from you!

Fluid v0.3 download link: https://github.com/fluid-cloudnative/fluid/releases

The following is a further introduction to the functions of this new version release.

1. Support Kubernetes data volume access acceleration

Although earlier versions of Fluid already support many underlying storage systems (such as HDFS and OSS), the storage systems used inside enterprises in real production environments are often more diverse, and there are still cases where Fluid cannot be connected because a storage system is incompatible. For example, a user of the Lustre distributed file system could not use Fluid normally, because the distributed cache engine used by earlier versions of Fluid is not yet compatible with Lustre.

To make Fluid more general for cloud-native data access acceleration, Fluid v0.3 adds acceleration support for data mounted through a Kubernetes Persistent Volume Claim (PVC) or a host directory (Host Path). This gives any storage system a generic way to connect to Fluid: regardless of the underlying storage system, as long as it can be mapped to a native Kubernetes PVC resource object or to a host directory on the cluster nodes, it can benefit from Fluid features such as distributed data caching and data-affinity scheduling. The basic concept is shown in the figure below:

1.png

Usage is simple: set mountPoint to pvc://nfs-imagenet, where nfs-imagenet is an existing data volume in the Kubernetes cluster.

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: fluid-imagenet
spec:
  mounts:
  - mountPoint: pvc://nfs-imagenet
    name: nfs-imagenet
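
For context, a Dataset is normally paired with a cache runtime and then consumed by an application Pod. The manifests below are a minimal sketch rather than an official sample. They assume the Alluxio-based runtime (AlluxioRuntime), with field names as they appear in the project's public samples, which may differ between releases, and they assume that Fluid exposes a bound data set to applications as a PVC named after the Dataset. The cache size, the Pod name, the container image, the /data mount path, and the /mnt/imagenet host path are hypothetical. The local:// scheme for host directories is recalled from the project's samples and should be checked against the v0.3 documentation.

```yaml
# Cache runtime for the Dataset above; it shares the Dataset's name.
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: fluid-imagenet
spec:
  replicas: 1                # number of cache workers (illustrative)
  tieredstore:
    levels:
      - mediumtype: MEM      # keep cached data in memory
        path: /dev/shm
        quota: 2Gi           # hypothetical cache size
        high: "0.95"
        low: "0.7"
---
# Application Pod that reads the accelerated data.
# Assumption: the bound data set is exposed as a PVC named fluid-imagenet.
apiVersion: v1
kind: Pod
metadata:
  name: imagenet-reader      # hypothetical name
spec:
  containers:
    - name: reader
      image: busybox         # placeholder image
      command: ["sh", "-c", "ls -R /data && sleep 3600"]
      volumeMounts:
        - name: imagenet
          mountPath: /data   # hypothetical mount path
  volumes:
    - name: imagenet
      persistentVolumeClaim:
        claimName: fluid-imagenet
---
# Host directory variant of the Dataset (assumed local:// scheme).
# It would need its own runtime with the same name.
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: fluid-imagenet-hostpath          # hypothetical name
spec:
  mounts:
    - mountPoint: local:///mnt/imagenet  # hypothetical host path
      name: imagenet
```

The application Pod only references a PVC, so switching an existing workload from the original nfs-imagenet volume to the accelerated one should mostly come down to changing claimName.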

We verified PVC access acceleration by training a ResNet-50 model with TensorFlow Benchmark. The speedup results are as follows:

2.jpg

According to the evaluation results, the distributed cache provided by Fluid speeds up the training task and shortens the overall training time by more than 20%. For more details about the test, please refer to the relevant sample documents on GitHub.

2. Access control of data sets

Many companies that provide machine learning platform services run storage systems that are shared by multiple users. For security reasons, these providers need strict access control to ensure data isolation between users: no unauthorized user may access another user's data sets.

Fluid v0.3 supports this scenario. After an underlying storage system shared by multiple users is mounted in Fluid, the file permission information (such as user and file mode) exposed by Fluid stays consistent with the underlying storage system. In other words, file permissions are passed through transparently from the underlying storage system to the nodes where Fluid is deployed. The access control defined in the underlying storage system therefore also takes effect on each of those nodes, so data isolation between users is preserved.

In addition, Fluid v0.3 allows data sets to be "temporarily borrowed", that is, a user can be given temporary access to a data set that belongs to another user. Administrators can configure a change of data set ownership on the nodes where Fluid is deployed, which gives designated users the ability to "temporarily borrow" data sets from others. This helps cluster administrators manage data set permissions at a finer granularity and with more flexibility.
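
As an illustration of what such a configuration might look like, the sketch below sets a runAs block on the runtime so that the data set is exposed under a specific user identity. This is an assumption modeled on the non-root access documentation linked in this section, not a definitive recipe: the runAs field names (uid, gid, user, group) should be checked against the v0.3 API, and the runtime name, IDs, user name, and group name are hypothetical.

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: shared-project       # hypothetical; matches the Dataset name
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 2Gi           # hypothetical cache size
        high: "0.95"
        low: "0.7"
  # Assumed field: identity under which the data set is exposed on the nodes.
  runAs:
    uid: 1201                # hypothetical user ID
    gid: 1201                # hypothetical group ID
    user: user-a             # hypothetical user name
    group: group-a           # hypothetical group name
```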

Documentation for accessing non-root user data: https://github.com/fluid-cloudnative/fluid/blob/master/docs/zh/samples/nonroot_access.md

3. Default parameter configuration optimization

Fluid exposes many configuration parameters so that users can tune it for their own applications. Before Fluid 0.3, users had to configure everything manually according to their actual environment and business goals. For most users, however, tuning the configuration by hand is difficult and involves a heavy workload.

Fluid v0.3 builds in a large number of optimized default parameter settings for internal components such as Alluxio and FUSE, so users no longer need to focus on parameter tuning. These defaults, chosen based on our experience, achieve good performance in most common Fluid usage scenarios.
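
In practice this means a runtime can be declared without any tuning parameters, while individual settings can still be overridden when needed. The sketch below assumes that the AlluxioRuntime properties field accepts Alluxio property keys. The runtime name and the values are hypothetical and are not recommended settings.

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: demo                 # hypothetical; matches the Dataset name
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 2Gi           # hypothetical cache size
        high: "0.95"
        low: "0.7"
  # Optional: omit this block entirely to rely on the built-in defaults.
  # Assumed field; keys are Alluxio property names, values are illustrative.
  properties:
    alluxio.user.block.size.bytes.default: 256MB
    alluxio.user.streaming.reader.chunk.size.bytes: 256MB
```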

Summary

Fluid v0.3 mainly addresses problems and needs that community users encountered in real production environments. Support for host directory and PVC mounting provides a general solution for compatibility with different underlying storage systems; access control for data sets lets Fluid meet the needs of production environments shared by multiple users; and the optimized default parameter configuration makes Fluid easier to use while maintaining stable performance in most scenarios.

If you have any questions, please join the DingTalk exchange group to participate and discuss: https://img.alicdn.com/tfs/TB1Cm4ciNvbeK8jSZPfXXariXXa-452-550.png

Thanks

  • Thanks to Zhihao Xu and Yili Luo (PASALab, Nanjing University) for their contribution to supporting Kubernetes data volume access acceleration
  • Thanks to Lu Dongdong and Xie Yuandong (Yun Zhisheng) for their contribution to the data set permission control function

About the Author

Rong Gu, Ph.D., is an associate professor in the Department of Computer Science at Nanjing University whose research focuses on big data processing systems. Gu has published more than 20 papers in leading journals and conferences in the field, including TPDS, ICDE, JPDC, IPDPS, and ICPP, and has led a number of projects funded by the National Natural Science Foundation of China (including Youth projects) and by special grants from the China Postdoctoral Science Foundation. The research results have been applied at companies including Alibaba, Baidu, ByteDance, Sinopec, and Huatai Securities, and in the open source projects Apache Spark and Alluxio. Awards include the 2018 Jiangsu Science and Technology First Class Award and the 2019 Young Science and Technology Award of the Jiangsu Computer Society. Gu serves as a member of the System Software Committee and a communication member of the Big Data Committee of the Chinese Computer Society, and as Secretary-General of the Big Data Committee of the Jiangsu Computer Society. Gu is also a co-founder of the Fluid open source project and a PMC member of the Alluxio open source project.

" Alibaba Cloud Native focuses on technical fields such as microservices, serverless, containers, and Service Mesh, focuses on the trend of cloud native popular technologies, and cloud native large-scale landing practices, and is the official account for developers who know the best about cloud native."

Original link: https://developer.aliyun.com/article/775708

Origin blog.csdn.net/alitech2017/article/details/109100913