Notes on the NVMe Protocol

NVMe Overview

NVMe is a high-performance, scalable host controller interface designed for PCIe-based SSDs.

A signature feature of NVMe is its use of many queues to handle I/O commands. A single NVMe device supports up to 64K I/O queues, and each I/O queue can hold up to 64K commands.

When the host issues an I/O command, it places the command into a submission queue (SQ) and then notifies the NVMe device through a doorbell (DB) register.

After the NVMe device has processed an I/O command, it writes the result into a completion queue (CQ) and raises an interrupt to inform the host.

NVMe uses MSI/MSI-X interrupts and interrupt coalescing to improve performance.

 

NVMe Driver Overview

The NVMe driver is a C library that can be linked directly into an application to provide direct, zero-copy data transfer between the application and an NVMe SSD. It is entirely passive, meaning it spawns no threads and only runs in the context of function calls from the application itself. The library controls NVMe devices directly by mapping the PCI BAR registers into the local process and then performing memory-mapped I/O (MMIO). I/O is submitted asynchronously through queue pairs (QPs), and the general flow is not all that different from Linux libaio.
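To make that concrete, here is a minimal sketch of the flow, assuming a controller and namespace have already been attached via spdk_nvme_probe(), and omitting all error handling:

#include <stdbool.h>
#include "spdk/nvme.h"
#include "spdk/env.h"

static void
read_complete(void *arg, const struct spdk_nvme_cpl *cpl)
{
        *(bool *)arg = true;    /* mark the I/O as finished */
}

static void
read_one_block(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_ns *ns)
{
        /* One queue pair (SQ + CQ) per thread; no locks needed. */
        struct spdk_nvme_qpair *qpair =
                spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, NULL, 0);

        /* DMA-able buffer: data moves between it and the SSD with no copies. */
        void *buf = spdk_zmalloc(4096, 4096, NULL,
                                 SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);
        bool done = false;

        /* Asynchronous submit: returns as soon as the command is queued. */
        spdk_nvme_ns_cmd_read(ns, qpair, buf, 0 /* LBA */, 1 /* block count */,
                              read_complete, &done, 0);

        /* Entirely passive: the application polls for completions itself. */
        while (!done) {
                spdk_nvme_qpair_process_completions(qpair, 0);
        }

        spdk_free(buf);
        spdk_nvme_ctrlr_free_io_qpair(qpair);
}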

 

NVM Express (NVMe) is a register-level interface that allows host software to communicate with an NVM subsystem. The NVMe Management Interface (NVMe-MI) allows a management controller to communicate with an NVM subsystem out of band over one or more external interfaces.

NVMe is a communication protocol between the host and the SSD.

Figure 1: NVMe Management Interface protocol layering


NVMe-MI uses the Management Component Transport Protocol (MCTP) as its command transport and makes use of the existing MCTP SMBus/I2C binding and the PCIe physical layer.

NVMe was born for the SSD. Before NVMe appeared, the vast majority of SSDs used SATA with the AHCI protocol, which was really designed to serve traditional HDDs. Compared with HDDs, SSDs have lower latency and higher performance; AHCI could no longer keep up with advances in SSD performance and had become the bottleneck.

Compared with the commands defined in the ATA spec, NVMe defines far fewer commands, tailored entirely to the SSD.

The three NVMe essentials: the Submission Queue (SQ), the Completion Queue (CQ), and the Doorbell (DB) register. SQ and CQ are located in host memory, while the DB registers are located inside the SSD controller. As shown in the figure:

[Figure: an NVMe subsystem in an SSD, with SQ and CQ in host memory and the DB registers on the SSD side]

With the host's SQ and CQ in memory and the DB at the SSD end, the figure above shows the general shape of an NVMe subsystem in an SSD.
The SQ is located in host memory. When the host wants to send a command, it first prepares the command in the SQ and then notifies the SSD to come and fetch it. The CQ is also located in host memory. Whenever a command completes, successfully or not, the SSD writes the completion status of that command into the CQ.

What is the DB for, then? When the host sends a command, it does not hand the command directly to the SSD; it places the command in its own memory. So how does the SSD know to go fetch it? The host writes a DB register at the SSD end to tell the SSD.
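A conceptual sketch of that doorbell write, using the register layout from the NVMe spec (doorbells start at BAR0 offset 1000h, spaced by the CAP.DSTRD stride; the function name here is made up for illustration):

#include <stdint.h>

/* SQ y's Tail doorbell lives at BAR0 offset 1000h + (2y) * (4 << CAP.DSTRD);
 * the matching CQ y Head doorbell is at (2y + 1) * (4 << CAP.DSTRD). */
static void
ring_sq_tail_doorbell(volatile uint8_t *bar0, uint32_t dstrd,
                      uint16_t qid, uint16_t new_tail)
{
        volatile uint32_t *db = (volatile uint32_t *)
                (bar0 + 0x1000 + (2 * qid) * (4 << dstrd));

        /* A single MMIO write: "commands up to slot new_tail - 1 are ready". */
        *db = new_tail;
}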

  

1.4 Architecture Model

Figure 2: Single Port PCIe SSD


Figure 3: Dual port PCIe SSD with SMBus/I2C

The NVMe Management Interface allows a management controller to send command messages: standard NVMe Admin commands targeting a controller in the NVM subsystem; commands to access the PCI Express configuration, I/O, and memory space of controllers in the NVM subsystem; and Management Interface specific commands for inventorying, configuring, and monitoring the NVM subsystem.

 

Figure 4: NVM subsystem associated with the single port PCIe SSD


Figure 5 shows an example NVM subsystem corresponding to the PCIe SSD shown in Figure 3. The NVM subsystem contains one controller associated with PCIe Port 0 and two controllers associated with PCIe Port 1. There is a management endpoint associated with each PCIe port and with the SMBus/I2C port. Since the NVM subsystem contains management endpoints, all controllers have an associated controller management interface.

Figure 5: NVM subsystem associated with the dual port PCIe SSD with SMBus/I2C


Management Interface request and response messages are carried as MCTP messages, with the MCTP message type set to NVM Express Management Message (see the MCTP IDs and Codes specification). All command messages originate from a management controller, and response messages are generated by management endpoints.


4 Message Processing Model

NVMe-MI uses a request/response processing model.

Figure 14: NVMe-MI MCTP message classification


4.1 Request Messages

A request message is an NVMe-MI message generated by a management controller for delivery to a management endpoint.

A request message specifies an operation for the management endpoint to perform.

4.2 Response Messages

A response message is an NVMe-MI message generated by a management endpoint when it completes processing a previously issued request message.

 

NVM Express is based on a paired submission and completion queue mechanism.

Commands are placed into a submission queue by host software. Completions are placed into the associated completion queue by the controller. Multiple submission queues may share the same completion queue. Submission and completion queues are allocated in memory.
An Admin Submission Queue and its associated Admin Completion Queue exist for controller management and control (e.g., creating and deleting I/O submission and completion queues, aborting commands, and so on). Only commands that are part of the Admin Command Set may be submitted to the Admin Submission Queue.
An I/O command set is used with a pair of I/O queues. The specification defines one I/O command set, named the NVM Command Set. The host selects one I/O command set that is used for all I/O queue pairs.
Host software creates queues, up to the maximum supported by the controller. Typically, the number of command queues created is based on the system configuration and the anticipated workload. For example, on a system based on a four-core processor, there might be one queue pair per core to avoid locking and to ensure the data structures are created in the appropriate processor core's cache. Figure 1 gives a graphical representation of the queue pair mechanism, showing a 1:1 mapping between submission queues and completion queues. Figure 2 shows an example where multiple I/O submission queues share the same I/O completion queue on core B. Figures 1 and 2 both show a 1:1 mapping between the Admin Submission Queue and the Admin Completion Queue.

A submission queue (SQ) is a circular buffer with a fixed slot size that host software uses to submit commands for execution by the controller. Host software updates the corresponding SQ Tail doorbell register when there are one to n new commands to execute. The previous SQ Tail value in the controller is overwritten by each new doorbell register write. The controller fetches SQ entries in order, but it may execute those commands in any order.
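In code terms, submitting n commands just advances the host's copy of the tail modulo the queue size before the doorbell write; a sketch:

#include <stdint.h>

/* After placing n new commands in a circular SQ of qsize slots, the host
 * computes the new tail (wrapping past the last slot) and writes it to
 * the SQ Tail doorbell. The controller recovers the number of newly
 * submitted commands as (new_tail - old_tail + qsize) % qsize. */
static uint16_t
advance_sq_tail(uint16_t old_tail, uint16_t n, uint16_t qsize)
{
        return (uint16_t)((old_tail + n) % qsize);
}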

 

4.1.1 Empty Queue
A queue is Empty when the Head entry pointer equals the Tail entry pointer. Figure 8 defines the Empty queue condition.
                                  Figure 8: Empty Queue Definition

4.1.2 Full Queue
A queue is Full when the Head equals one more than the Tail. The number of entries in a queue when it is Full is one less than the queue size.
Figure 9 defines the Full queue condition. Note: the queue wrap condition must be taken into account when determining whether a queue is Full.
                                  Figure 9: Full Queue Definition
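Both conditions reduce to a few lines of C; a sketch of the definitions above, with wrap-around handled by the modulo:

#include <stdint.h>
#include <stdbool.h>

static bool
queue_is_empty(uint16_t head, uint16_t tail)
{
        return head == tail;                      /* nothing to consume */
}

static bool
queue_is_full(uint16_t head, uint16_t tail, uint16_t qsize)
{
        /* Full when Head equals one more than Tail (with wrap-around);
         * a full queue therefore holds qsize - 1 entries. */
        return head == (uint16_t)((tail + 1) % qsize);
}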


7 Controller Architecture

Host software submits commands to the controller through pre-allocated submission queues. The controller is alerted to newly submitted commands through SQ Tail doorbell register writes. The difference between the previous doorbell value and the newly written value indicates the number of commands submitted.

The controller fetches the commands from the submission queues and sends them to the NVM subsystem for processing.

Command Processing

1. The host places one or more commands for execution in the next available slot(s) of a submission queue (SQ) located in memory.
2. The host updates the SQ's TailDB register with the new value of the SQ tail pointer. This tells the SSD controller that a new command has been submitted and needs processing.
3. The SSD controller transfers the command from the SQ into the controller for execution. (For the arbitration method that decides which SQ the next candidate command is fetched from, see Section 4.11.)
4. The controller then executes the next command. Commands may complete out of order (independent of when they were submitted or began executing).
5. After a command finishes executing, the SSD controller places a completion queue entry (CQE) in the next free slot of the associated completion queue (CQ). As part of the CQE, the SSD controller indicates the most recent SQE consumed by updating the SQ head pointer in the completion entry. Each new CQE carries a Phase Tag inverted from the previous entry's, to signal to the host that this CQE is a new entry.
6. The SSD controller generates an interrupt to the host to indicate that a new CQE is ready to be consumed and processed. The figure shows an MSI-X interrupt, but the interrupt could also be pin-based or MSI. Note: depending on the interrupt coalescing settings, an interrupt may or may not be generated for every new CQE.
7. The host consumes and processes the new CQEs placed in the CQ, including taking any actions required by error conditions. The host keeps consuming and processing CQEs until it reaches a previously consumed entry whose Phase Tag is inverted relative to the current CQEs' value.
8. The host updates the CQ's HeadDB register to indicate that the CQEs have been consumed. The host may consume multiple CQEs before updating the associated CQ HeadDB register.
 
To sum up in plain language:
1. The host writes the command into the SQ.
2. The host updates the SQ's TailDB, notifying the SSD to fetch the command.
3. The SSD, having received the notification, fetches the command from the SQ.
4. The SSD executes the command.
5. When the command finishes, the SSD writes the execution result into the CQ and updates the CQ's TailDB.
6. The SSD "texts" the host (raises an interrupt) to say the command has completed.
7. On receiving the notification, the host checks the command's completion status in the CQ.
8. Once it has processed the results in the CQ, the host updates the CQ's HeadDB, replying to the SSD: "results processed, thanks for the hard work."
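Steps 7 and 8 are the interesting ones. Here is a sketch of the Phase Tag polling loop, using a made-up minimal CQE layout (in a real CQE the Phase Tag is bit 0 of the Status field in Dword 3):

#include <stdint.h>

/* Hypothetical, minimal CQE: just the fields this sketch needs. */
struct cqe {
        uint16_t sq_head;   /* how far the controller has consumed the SQ */
        uint16_t status;    /* bit 0 = Phase Tag (P) */
};

/* Consume CQEs while their Phase Tag matches the phase the host expects;
 * a mismatch means "no new entries yet". Returns the new CQ head, which
 * the caller then writes to the CQ Head doorbell (step 8). */
static uint16_t
drain_cq(volatile struct cqe *cq, uint16_t head, uint16_t qsize,
         uint16_t *expected_phase)
{
        while ((cq[head].status & 1) == *expected_phase) {
                /* ... process cq[head], including any error handling ... */
                if (++head == qsize) {
                        head = 0;
                        *expected_phase ^= 1;   /* phase flips on each wrap */
                }
        }
        return head;
}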
 
NVMe over PCIe and RDMA are both, at heart, games of queues. NVMe over PCIe has two queues, a submission queue (SQ) and a completion queue (CQ); RDMA has three, a send queue (SQ), a receive queue (RQ), and a completion queue (CQ), where an SQ plus an RQ is called a QP (queue pair).
 
Corresponding locations in the NVMe driver code:

nvme_pcie.c

1. int nvme_pcie_qpair_submit_request()

              TAILQ_INSERT_TAIL(&pqpair->outstanding_tr, tr, tq_list);  /* tracker joins the outstanding list on submit */

(Corresponding app-level code: bdev_nvme_submit_request()?? Is that right, or are the names merely similar?)

2. static void nvme_pcie_qpair_complete_tracker()

              TAILQ_INSERT_HEAD(&pqpair->free_tr, tr, tq_list);  /* tracker returns to the free list on completion */

 

bdev_nvme.c

static const struct spdk_bdev_fn_table nvmelib_fn_table = {   /* device function table */
        .destruct           = bdev_nvme_destruct,
        .submit_request     = bdev_nvme_submit_request,
        .io_type_supported  = bdev_nvme_io_type_supported,
        .get_io_channel     = bdev_nvme_get_io_channel,
        .dump_info_json     = bdev_nvme_dump_info_json,
        .write_config_json  = bdev_nvme_write_config_json,
        .get_spin_time      = bdev_nvme_get_spin_time,
};

static struct spdk_bdev_module nvme_if = {
        .name          = "nvme",
        .module_init   = bdev_nvme_library_init,
        .module_fini   = bdev_nvme_library_fini,
        .config_text   = bdev_nvme_get_spdk_running_config,
        .config_json   = bdev_nvme_config_json,
        .get_ctx_size  = bdev_nvme_get_ctx_size,
};


On MSI-X: see igbuio_msix_mask_irq() in igb_uio.c.
A brief look at NVMe and MSI-X:

https://blog.csdn.net/wangpeng22/article/details/78390694?locationNum=2&fps=1
https://blog.csdn.net/weijitao/article/details/46566789
http://blog.sina.com.cn/s/blog_6472c4cc0102dskj.html

 

NVMe defines the commands used for communication between the host and the SSD, and how those commands are executed.
NVMe has two kinds of commands:
one is the Admin Command, used by the host to manage and control the SSD;
the other is the I/O Command, used to transfer data between the host and the SSD. Below are the command lists supported by NVMe 1.2:

Admin Commands supported by NVMe

I/O Commands supported by NVMe
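As a concrete example from the Admin side, here is a sketch of issuing an Identify Controller command (opcode 06h, CNS 01h) through SPDK's raw admin interface; ctrlr is assumed to be an already-attached controller, and admin completions are reaped with spdk_nvme_ctrlr_process_admin_completions():

#include <string.h>
#include "spdk/nvme.h"

static void
identify_done(void *arg, const struct spdk_nvme_cpl *cpl)
{
        /* arg points at the 4 KiB buffer the SSD filled with identify data */
}

static int
send_identify_controller(struct spdk_nvme_ctrlr *ctrlr, void *buf)
{
        struct spdk_nvme_cmd cmd;

        memset(&cmd, 0, sizeof(cmd));
        cmd.opc   = 0x06;   /* Identify (Admin Command Set) */
        cmd.cdw10 = 1;      /* CNS = 01h: Identify Controller data structure */

        /* Queued on the Admin SQ; the doorbell/CQ dance is the same as for I/O. */
        return spdk_nvme_ctrlr_cmd_admin_raw(ctrlr, &cmd, buf, 4096,
                                             identify_done, NULL);
}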

lib/bdev/nvme/bdev_nvme.c
_bdev_nvme_submit_request() contains the handling for I/O operations; see the sketch below.
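Its overall shape is a switch over the generic bdev I/O type, roughly like the following paraphrase (not the verbatim SPDK source):

#include <errno.h>
#include "spdk/bdev_module.h"

/* Abridged paraphrase of the _bdev_nvme_submit_request() dispatch: each
 * generic bdev I/O type is mapped onto the matching NVMe command. */
static int
submit_request_sketch(struct spdk_bdev_io *bdev_io)
{
        switch (bdev_io->type) {
        case SPDK_BDEV_IO_TYPE_READ:
        case SPDK_BDEV_IO_TYPE_WRITE:
                /* becomes spdk_nvme_ns_cmd_read()/_write() on the channel's qpair */
                return 0;
        case SPDK_BDEV_IO_TYPE_UNMAP:
                /* becomes an NVMe Dataset Management (deallocate) command */
                return 0;
        case SPDK_BDEV_IO_TYPE_FLUSH:
                /* becomes an NVMe Flush command */
                return 0;
        default:
                return -EINVAL;  /* unsupported types are failed back to the bdev layer */
        }
}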
  
nvme_ctrlr_ut.c ----- controller initialization, options, and qpair allocation tests
test_nvme_ctrlr_init_en_1_rdy_0
test_nvme_ctrlr_init_en_1_rdy_1
test_nvme_ctrlr_init_en_0_rdy_0
test_nvme_ctrlr_init_en_0_rdy_1
test_nvme_ctrlr_init_en_0_rdy_0_ams_rr
test_nvme_ctrlr_init_en_0_rdy_0_ams_wrr
test_nvme_ctrlr_init_en_0_rdy_0_ams_vs
test_alloc_io_qpair_rr_1
test_ctrlr_get_default_ctrlr_opts
test_ctrlr_get_default_io_qpair_opts
test_alloc_io_qpair_wrr_1
test_alloc_io_qpair_wrr_2
test_spdk_nvme_ctrlr_update_firmware
test_nvme_ctrlr_fail
test_nvme_ctrlr_construct_intel_support_log_page_list
test_nvme_ctrlr_set_supported_features
test_spdk_nvme_ctrlr_doorbell_buffer_config----5 Admin Command Set
test_nvme_ctrlr_test_active_ns
 
nvme_ctrlr_cmd_ut.c ----- functional tests for the Admin Command Set
test_get_log_pages----5 Admin Command Set
test_set_feature_cmd----5 Admin Command Set
test_set_feature_ns_cmd----5 Admin Command Set
test_get_feature_cmd----5 Admin Command Set
test_get_feature_ns_cmd----5 Admin Command Set
test_abort_cmd----5 Admin Command Set
test_io_raw_cmd
test_io_raw_cmd_with_md
test_namespace_attach----5 Admin Command Set
test_namespace_detach----5 Admin Command Set
test_namespace_create----5 Admin Command Set
test_namespace_delete----5 Admin Command Set
test_format_nvme
test_fw_commit----5 Admin Command Set
test_fw_image_download----5 Admin Command Set
