Curve block storage application practice -- iSCSI

Curve is a Sandbox project of the Cloud Native Computing Foundation (CNCF). It is an open-source high-performance, easy-to-operate, and cloud-native distributed storage system initiated by NetEase Shufan.

To make Curve easier to use and understand, we are publishing a series of application practice articles, organized by topic.

This article is the first in the Curve block storage application practice series, which includes:

  • Curve block storage application practice, part 1: iSCSI
  • Curve block storage application practice, part 2: nbd
  • Curve block storage application practice, part 3: cloud hosts
  • Curve block storage application practice, part 4: cloud-native databases
  • Curve block storage application practice, part 5: performance tuning

Introduction to iSCSI and tgt

tgt is an open-source iSCSI server; see the tgt GitHub repository[1] for details. When developing the Curve block device server, we wanted more systems, not just Linux, to be able to use Curve block devices. iSCSI is a widely used block device protocol, so we modified tgt so that Curve can provide iSCSI services.

Curve block storage

We provide tgt with a driver for accessing Curve. For details, see Deploying the Network High-Performance Version tgt[2], which lists the operation steps. With this driver, users can use Curve block storage from any operating system that supports iSCSI, such as Windows.

Curve also encountered some problems when initially using tgt:

We observed that the original tgt processes iSCSI commands in a single epoll-based main thread, and that the management plane's Unix domain socket is handled in the same thread.

On a 10 Gbit/s or faster network, a single thread (that is, a single CPU) can no longer process iSCSI commands fast enough. When one thread serves multiple targets, even a slightly higher request rate from multiple iSCSI initiators keeps that thread's CPU 100% busy.

This article therefore focuses on the performance optimization of tgt. Community users have also run into single-point-of-failure and performance problems with the nebd service, and have optimized it as well. For details, see Chuangyun Rongda's article on its Curve-based smart tax scenario practice.

Curve's performance optimization practice for tgt

1. Use multiple threads to do epoll

We implemented multiple event loop threads, each responsible for processing iSCSI commands on a certain number of socket connections. This puts the processing power of multiple CPUs to use.

2. Create an epoll thread for each target

To keep multiple targets that share one epoll from exceeding the processing capacity of a single CPU, we give each target its own epoll thread. The OS schedules the CPU time of the target epoll threads, so each target gets a fair share of CPU. Of course, on a fast enough network a single epoll thread may still be unable to keep up with the requests on one iSCSI target, but this solution is the best we can do for now.
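
Steps 1 and 2 can be illustrated with a minimal sketch, assuming Linux epoll and POSIX threads. The names used here (struct evloop, evloop_start, evloop_add_fd, evloop_stop) are hypothetical and are not tgt's actual API, and the read() call is only a stand-in for real iSCSI command processing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical per-target event loop; not tgt's real data structure. */
struct evloop {
        int epfd;           /* epoll instance private to this target */
        int stop_fd;        /* eventfd that wakes the loop and asks it to exit */
        pthread_t thread;   /* the thread running evloop_run() */
        atomic_int handled; /* count of I/O events processed so far */
};

/* Thread body: wait only on this target's epoll set. */
static void *evloop_run(void *arg)
{
        struct evloop *loop = arg;
        struct epoll_event events[16];
        char buf[4096];

        for (;;) {
                int n = epoll_wait(loop->epfd, events, 16, -1);

                if (n < 0 && errno != EINTR)
                        return NULL;
                for (int i = 0; i < n; i++) {
                        int fd = events[i].data.fd;

                        if (fd == loop->stop_fd)
                                return NULL;
                        /* Stand-in for reading and executing an iSCSI command. */
                        if (read(fd, buf, sizeof(buf)) > 0)
                                atomic_fetch_add(&loop->handled, 1);
                        else
                                epoll_ctl(loop->epfd, EPOLL_CTL_DEL, fd, NULL);
                }
        }
}

/* Register a connection's fd with this target's loop. */
static int evloop_add_fd(struct evloop *loop, int fd)
{
        struct epoll_event ev = { .events = EPOLLIN };

        ev.data.fd = fd;
        return epoll_ctl(loop->epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Create the epoll set and start one thread for this target. */
static int evloop_start(struct evloop *loop)
{
        loop->epfd = epoll_create1(0);
        loop->stop_fd = eventfd(0, 0);
        atomic_init(&loop->handled, 0);
        if (loop->epfd < 0 || loop->stop_fd < 0 ||
            evloop_add_fd(loop, loop->stop_fd) < 0)
                return -1;
        return pthread_create(&loop->thread, NULL, evloop_run, loop) ? -1 : 0;
}

/* Ask the loop to exit, wait for its thread, then free its resources. */
static void evloop_stop(struct evloop *loop)
{
        uint64_t one = 1;
        ssize_t r = write(loop->stop_fd, &one, sizeof(one));

        (void)r;
        pthread_join(loop->thread, NULL);
        close(loop->stop_fd);
        close(loop->epfd);
}
```

With one struct evloop per target, a busy target saturates only its own thread, and the OS scheduler spreads the threads across CPUs.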

3. Management plane

The management plane stays compatible with the original tgt: the command-line usage is unchanged. It provides its services on the program's main thread, which is also an epoll loop thread, just as in the original tgt. It manages targets, LUNs, login/logout, discovery, sessions, connections, and so on. When an initiator connects to the iSCSI server, the connection is always served by the management plane thread first. If the connection eventually needs to create a session to access a target, it is migrated to the epoll thread of that target.

4. Data structure locks

Each target has its own mutex. While the target's epoll thread is running, the thread holds the lock, so it can end any session or connection at will. The thread releases the lock when it enters epoll_wait and takes it again when epoll_wait returns. We modified the relevant code so that the epoll thread no longer traverses the target list and only accesses the structures of the target it serves, so no lock on the target list is needed. When the management plane adds or deletes a session or connection, it must also take the target's lock. The management plane and the target's epoll thread therefore exclude each other through this mutex, and both can safely access the sessions and connections on the target.

5. Connection establishes a session

When login_finish succeeds, it creates a session if none exists yet. login_finish then stores the destination iSCSI target in the migrate_to field of the connection structure.

6. When to do connection migration

When the call returns to iscsi_tcp_event_handler, the handler sees that login_finish has set migrate_to to the destination target. It then locks that iSCSI target structure and inserts the connection's fd into the target's evloop, which completes the migration.
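
Steps 5 and 6 can be sketched like this, assuming Linux epoll. Only the migrate_to field name comes from the description above; struct mig_target, struct connection, and maybe_migrate are hypothetical names chosen for illustration.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical destination target: its mutex and its own epoll set. */
struct mig_target {
        pthread_mutex_t lock;
        int epfd;
};

/* Hypothetical connection: migrate_to is set by the login code. */
struct connection {
        int fd;
        struct mig_target *migrate_to;
};

/*
 * Called on the management thread after the login handler returns.
 * Returns 1 if the connection moved, 0 if it stays, -1 on error.
 */
static int maybe_migrate(int main_epfd, struct connection *conn)
{
        struct mig_target *t = conn->migrate_to;
        struct epoll_event ev = { .events = EPOLLIN };
        int ret;

        if (t == NULL)
                return 0; /* still logging in or doing discovery */

        /* Stop watching the fd on the management plane's loop. */
        if (epoll_ctl(main_epfd, EPOLL_CTL_DEL, conn->fd, NULL) < 0)
                return -1;

        /* Insert the fd into the target's loop under the target lock. */
        ev.data.ptr = conn;
        pthread_mutex_lock(&t->lock);
        ret = epoll_ctl(t->epfd, EPOLL_CTL_ADD, conn->fd, &ev);
        pthread_mutex_unlock(&t->lock);
        if (ret < 0)
                return -1;

        conn->migrate_to = NULL;
        return 1;
}
```

After the call returns 1, all further traffic on the connection wakes the target's epoll thread rather than the management thread.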

7. Set pthread name

Name each target's event loop thread tgt/n, where n is the target ID, so that tools such as top make it easy to see which target uses the most CPU.
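
On Linux with glibc, one way to do this is pthread_setname_np. The helper below is a minimal sketch; the function name name_target_thread is hypothetical, and the article does not say which call tgt actually uses.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>

/* Name the calling thread "tgt/<tid>". Returns 0 on success. */
static int name_target_thread(int tid)
{
        /* Linux limits thread names to 16 bytes, including the NUL. */
        char name[16];

        snprintf(name, sizeof(name), "tgt/%d", tid);
        return pthread_setname_np(pthread_self(), name);
}
```

Each target's event loop thread would call this once at startup; per-thread names are then visible in top when thread display is enabled (top -H).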

8. An example

If the management plane (MGMT) wants to delete a target, the following code illustrates the process:

/* called by mgmt */
tgtadm_err tgt_target_destroy(int lld_no, int tid, int force)
{
        struct target *target;
        struct acl_entry *acl, *tmp;
        struct iqn_acl_entry *iqn_acl, *tmp1;
        struct scsi_lu *lu;
        tgtadm_err adm_err;

        eprintf("target destroy\n");

        /*
         * The control plane is single-threaded and the SCSI I/O threads
         * never delete a target, so no lock is needed to look up the target.
         */

        target = target_lookup(tid);                                  
        if (!target)                                            
                return TGTADM_NO_TARGET;

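        /*
         * Lock the target here: we are about to delete data structures,
         * so they must not be shared with the iSCSI I/O thread. This can
         * only proceed once the SCSI thread has released the lock.
         */
        target_lock(target);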
        if (!force && !list_empty(&target->it_nexus_list)) {
                eprintf("target %d still has it nexus\n", tid);
                target_unlock(target);                 
                return TGTADM_TARGET_ACTIVE;
        }        
 …
        /* The steps above have removed all resources, so the lock can be released */
        target_unlock(target);                                               
        if (target->evloop != main_evloop) {
                /* Tell the target's evloop to stop and wait for the evloop thread to exit */
                tgt_event_stop(target->evloop);                         
                if (target->ev_td != 0)                                 
                        pthread_join(target->ev_td, NULL);
                /* Now clean up the evloop's resources */
                work_timer_stop(target->evloop);                      
                lld_fini_evloop(target->evloop);
                tgt_destroy_evloop(target->evloop);
        }

Performance optimization results

We configured three disks for tgt: one Curve block storage volume and two local disks.

<target iqn.2019-04.com.example:curve.img01>
    backing-store cbd:pool//iscsi_test_
    bs-type curve
</target>

<target iqn.2019-04.com.example:local.img01>
    backing-store /dev/sde
</target>

<target iqn.2019-04.com.example:local.img02>
    backing-store /dev/sdc
</target>

Log in to the iSCSI targets from the same machine:

iscsiadm --mode node --portal 127.0.0.1:3260 --login

Point fio at these iSCSI block devices with the following job file:

[global]
rw=randread
direct=1
iodepth=128
ioengine=aio
bsrange=16k-16k
runtime=60
group_reporting

[disk01]
filename=/dev/sdx

[disk02]
filename=/dev/sdy
size=10G

[disk03]
filename=/dev/sdz
size=10G

The test results are as follows:

Before optimization, fio reported 38.8K IOPS.

After the multi-thread optimization, fio reported 60.9K IOPS, about 1.57 times the unoptimized figure.

Original author: Xu Yifeng, Curve PMC

References

[1] https://github.com/fujita/tgt
[2] https://github.com/opencurve/curveadm/wiki/curve-tgt-deployment#%E7%AC%AC-4-%E6%AD%A5%E5%90%AF%E5%8A%A8-tgtd-%E5%AE%88%E6%8A%A4%E8%BF%9B%E7%A8%8B
