High-performance storage SIG monthly news: ANCK ublk completes POC test, EROFS optimizes xattr metadata overhead

High-performance storage technology SIG (Special Interest Group) goal: The SIG is dedicated to improving storage stack performance. It currently focuses on using kernel io_uring to optimize asynchronous I/O performance, using persistent memory to improve the cost-effectiveness of business workloads, and optimizing storage for container scenarios. Through the community platform, it aims to build a standard high-performance storage software stack and promote the co-development of software and hardware.

01 Overall progress of SIG this month

This month, 26 PRs were merged into the Anolis mainline, including updates to several important components. erofs now supports compact long xattr name prefixes, which reduce xattr metadata overhead in overlayfs scenarios. ublk on ANCK 5.10 has completed its POC test: compared with tcmu, latency is roughly halved. The io_uring asio coroutine optimization scheme has been settled and is expected to improve performance by 10%. Development of the DSMS management platform is essentially complete, and the adaptation of dsms-storage to Anolis 23 is in progress. Thanks to the contributors from ZTE for submitting multiple bug fixes.

02 Project-specific progress

1. Anolis OS

ext4: fix ext4_xattr_delete_inode hang (PR1362)

xfs: xfs_qm cleanup (PR1326), fix xfs_sysfs_init memory leak (PR1332/PR1334), remove incorrect ASSERT in xfs_rename (PR1351), fix force shutdown UAF (PR1376)

fuse: fix fuse flush/resend interface bug (PR1302)

nfs: fix RECLAIM_COMPLETE EACCES problem (PR1324/PR1325), fix memory leak on slot allocation failure (PR1346/PR1347), fix missing lock protection when traversing grace_list (PR1350), fix null pointer in parameter parsing (PR1370), handle CREATE_SESSION NFS4ERR_NOSPC (PR1368/PR1360)

misc: fix hugetlbfs_parse_param null pointer (PR1352), fix configfs_create_dir memory leak (PR1357), fix nbd_start_device_ioctl hang (PR1356), fix rbd_sysfs_init ), fix md_cluster unlock_all_bitmaps wild pointer (PR1367), fix nvme_alloc_admin_tags null pointer (PR1405)

vfio: clear caps->buf to NULL after free (PR1422/1427), fix drbd_create_device UAF (PR1251)

VFS: fix ltp/openat04 (PR1 )

2. Container image acceleration

The erofs file system now supports compact long xattr name prefixes, which remove the extra overhead of storing overlayfs xattr metadata whose names repeat the same prefix (for example in composefs mode). For more background, please refer to: https://lore.kernel.org/r/ [email protected]
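To make the saving concrete, here is a minimal sketch of the idea of sharing a long name prefix across many xattr entries. It is an illustrative model only and does not reproduce the EROFS on-disk format: the `LONG_PREFIXES` table and the helper names are hypothetical, and the per-entry index field that a real format would need is not counted.

```python
# Illustrative model only: this is NOT the EROFS on-disk layout.
# LONG_PREFIXES is a hypothetical table of shared name prefixes.
LONG_PREFIXES = ["trusted.overlay."]


def encode_name(name, prefixes=LONG_PREFIXES):
    """Return (prefix_index, suffix); the index is None if no prefix matches."""
    best = None
    for i, prefix in enumerate(prefixes):
        # Prefer the longest matching prefix.
        if name.startswith(prefix) and (
            best is None or len(prefix) > len(prefixes[best])
        ):
            best = i
    if best is None:
        return None, name
    return best, name[len(prefixes[best]):]


def decode_name(index, suffix, prefixes=LONG_PREFIXES):
    """Rebuild the full xattr name from its index and suffix."""
    return suffix if index is None else prefixes[index] + suffix


def stored_bytes(names, prefixes=LONG_PREFIXES):
    """Bytes spent on name strings: the table once, plus one suffix per entry."""
    table = sum(len(p) for p in prefixes)
    return table + sum(len(encode_name(n, prefixes)[1]) for n in names)


if __name__ == "__main__":
    # Two overlayfs xattr names repeated over 1000 hypothetical inodes.
    names = ["trusted.overlay.metacopy", "trusted.overlay.redirect"] * 1000
    print("full names:     ", stored_bytes(names, prefixes=[]))
    print("shared prefixes:", stored_bytes(names))
```

In this model the shared prefix is stored once, so every entry only pays for its short suffix; the more inodes carry the same overlayfs xattrs, the larger the relative saving.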

erofs-utils supports generating an index from a tarball that can be mounted directly; the latest patch is being tested: https://lore.kernel.org/r/[email protected]

3. User-space storage

So far, the upstream community has proposed three ublk zero-copy solutions: https://lwn.net/Articles/926118/

We are also investigating a zero-copy solution based on the io_uring register mode, and expect to send an RFC patch to the upstream community in the future.

ublk has been backported to ANCK 5.10, and a POC test was conducted in the distributed storage project. The results show that ublk's per-I/O latency drops to about half that of tcmu, which is a significant advantage.

The performance of ublk on ANCK 5.10 is very close to the data from the earlier test on the 6.1 mainline, indicating that ublk on ANCK 5.10 is basically usable. It will be released with version 5.10.134-014.

4. io_uring

We optimized the io_uring echo server test framework by adding a simulated workload to each I/O request to mimic a real business environment. With this change, io_uring now consistently outperforms epoll by about 10%, because io_uring's submission batching improves: while the workload of one request is being processed, more requests keep arriving from the network, so several requests can be submitted in a single io_uring_enter call.
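The batching effect described above can be illustrated with a toy closed-loop model. This is a sketch under stated assumptions, not the SIG's test framework: the function `simulate` and all of its parameters (`clients`, `stagger`, `rtt`, `work`) are hypothetical, time is measured in abstract integer units, and one simulated `io_uring_enter` is assumed to submit every response queued in the current loop iteration.

```python
def simulate(clients, total, stagger, rtt, work):
    """Toy closed-loop echo server; returns (requests_handled, enter_calls).

    clients: number of connections, each with one request in flight
    total:   stop once at least this many requests have been handled
    stagger: gap between the first requests of consecutive clients
    rtt:     delay between a response and that client's next request
    work:    simulated per-request workload on the server
    """
    # Time at which each client's next request reaches the server.
    next_arrival = [c * stagger for c in range(clients)]
    now = handled = enters = 0
    while handled < total:
        # Idle until at least one request has arrived.
        now = max(now, min(next_arrival))
        ready = [c for c in range(clients) if next_arrival[c] <= now]
        # Processing the batch takes time, during which more requests arrive.
        now += work * len(ready)
        # One simulated io_uring_enter submits all queued responses.
        enters += 1
        for c in ready:
            next_arrival[c] = now + rtt
        handled += len(ready)
    return handled, enters


if __name__ == "__main__":
    print("no workload:  ", simulate(8, 80, stagger=1, rtt=100, work=0))
    print("with workload:", simulate(8, 80, stagger=1, rtt=100, work=4))
```

With no per-request workload, each request in this model is picked up and submitted on its own. Once each request carries some work, later arrivals accumulate while earlier ones are processed, so the same number of requests needs fewer enter calls.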

We adjusted the implementation of the io_uring asio coroutine scheme: asynchronous programming is now built on io_uring's multi-shot recv and provided-buffer mechanisms, using the completion model.

The advantage of this scheme is that the io_uring recv operation is triggered directly when the network interrupt arrives, which shortens the entire I/O path. The traditional readiness-based programming model has to wait for an I/O event first and only then issue the I/O operation. The POC code is currently about 60% complete.
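As a rough mental model of how multi-shot recv and provided buffers fit together in a completion-based design, consider the user-space sketch below. It does not call io_uring or liburing: the classes `ProvidedBuffers` and `MultishotRecv` are hypothetical stand-ins, and `on_data` plays the role of the kernel side that fills a pre-registered buffer and posts a completion without the application issuing a new recv each time.

```python
from collections import deque


class ProvidedBuffers:
    """Model of a buffer pool handed over in advance (hypothetical)."""

    def __init__(self, count, size):
        self.bufs = [bytearray(size) for _ in range(count)]
        self.free = deque(range(count))  # ids of buffers ready for use


class MultishotRecv:
    """Stand-in for one armed multi-shot recv on a socket (hypothetical)."""

    def __init__(self, pool):
        self.pool = pool
        self.cq = deque()  # completion queue of (buffer_id, length)

    def on_data(self, payload):
        """'Kernel' side: data arrived, so fill a buffer and post a completion."""
        if not self.pool.free:
            # No buffer left; in this model the data is simply refused.
            return False
        bid = self.pool.free.popleft()
        buf = self.pool.bufs[bid]
        n = min(len(payload), len(buf))  # the model truncates oversized data
        buf[:n] = payload[:n]
        self.cq.append((bid, n))
        return True

    def reap(self):
        """Application side: consume completions, then recycle the buffers."""
        out = []
        while self.cq:
            bid, n = self.cq.popleft()
            out.append(bytes(self.pool.bufs[bid][:n]))
            self.pool.free.append(bid)  # hand the buffer back to the pool
        return out


if __name__ == "__main__":
    rx = MultishotRecv(ProvidedBuffers(count=2, size=8))
    rx.on_data(b"ping")
    rx.on_data(b"pong")
    print(rx.reap())
```

The point of the model is the shape of the data flow: the application arms the receive once, data lands in buffers it supplied ahead of time, and the application only sees finished completions. A readiness-based loop would instead be woken for each event and then have to issue the read itself.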

5. DSMS

Development of dsms-engine is in progress, mainly on deployment-related functions, and testing of version B is proceeding in parallel. The ceph version on Anolis 23 is 17.x, whereas the dsms-storage repository is currently based on 15.x. After discussion at the regular SIG meeting, it was decided to maintain it in a dsms sub-repository.

03 SIG's next plan

erofs: prepare for the Linux v6.4 merge window, merge tarball mode support into the erofs-utils mainline, add deflate compression algorithm support, etc.

ublk: business adaptation for the distributed storage project.

io_uring: implement the asio optimization scheme and run the POC test.

DSMS: continue adapting to Anolis 23.

Appendix: List of SIG projects

  • io_uring
  • virtiofs
  • container image acceleration
  • database optimization
  • User Mode Storage
  • DSMS

For more details, please refer to the High Performance Storage Technology SIG. Interested developers are welcome to join and contribute.

Origin blog.csdn.net/weixin_60347558/article/details/130149720