Investigating an SPDK problem

Symptom

Running the SPDK program produces the following error:


starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status

As a result, the program cannot run. What causes this?

Analysis

Restrictions on NVMe hardware queues

According to the NVMe specification, a hardware queue is a pair made up of a submission queue and a completion queue, which work together to process I/O requests.

The specification describes command submission as follows:

When host software builds a command for the controller to execute, it first checks to make sure that the appropriate Submission Queue (SQx) is not full. The Submission Queue is full when the number of entries in the queue is one less than the queue size. Once an empty slot (pFreeSlot) is available:
1. Host software builds a command at SQx[pFreeSlot] with:
a. CDW0.OPC is set to the appropriate command to be executed by the controller;
b. CDW0.FUSE is set to the appropriate value, depending on whether the command is a
fused operation;
c. CDW0.CID is set to a unique identifier for the command when combined with the
Submission Queue identifier;
d. The Namespace Identifier, CDW1.NSID, is set to the namespace the command applies to;
e. MPTR shall be filled in with the offset to the beginning of the Metadata Region, if there is a data transfer and the namespace format contains metadata as a separate buffer;
f. PRP1 and/or PRP2 (or SGL Entry 1 if SGLs are used) are set to the source/destination of data transfer, if there is a data transfer; and
g. CDW10 – CDW15 are set to any command specific information;
and
2. Host software writes the corresponding Submission Queue doorbell register (SQxTDBL)
to submit one or more commands for processing.
The write to the Submission Queue doorbell register triggers the controller to consume one or more new commands contained in the Submission Queue entry. The controller indicates the most recent SQ entry that has been consumed as part of reporting completions. Host software may use this information to determine when SQ slots may be re-used for new commands.
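
The submission rules quoted above can be sketched as a small ring-buffer model in C. This is only an illustration under stated assumptions: struct sq_model, SQ_SIZE, and the three functions are hypothetical names, not SPDK or NVMe driver code, a plain integer stands in for a 64-byte command, and an ordinary variable stands in for the SQxTDBL register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SQ_SIZE 4u  /* hypothetical queue size, kept small for illustration */

/* Simplified host-side view of one submission queue (not an SPDK type). */
struct sq_model {
    uint32_t entries[SQ_SIZE]; /* stands in for 64-byte NVMe commands */
    uint32_t head;             /* last head position reported by the controller */
    uint32_t tail;             /* next free slot, owned by host software */
    uint32_t doorbell;         /* stands in for the SQxTDBL register */
};

/* Full when the queue holds SQ_SIZE - 1 entries: tail + 1 == head (mod size). */
static bool sq_is_full(const struct sq_model *sq)
{
    return (sq->tail + 1u) % SQ_SIZE == sq->head;
}

/* Step 1: build the command in the free slot; step 2: write the doorbell. */
static bool sq_submit(struct sq_model *sq, uint32_t cmd)
{
    if (sq_is_full(sq)) {
        return false;            /* caller must wait for completions first */
    }
    sq->entries[sq->tail] = cmd;
    sq->tail = (sq->tail + 1u) % SQ_SIZE;
    sq->doorbell = sq->tail;     /* the new tail tells the controller what to fetch */
    return true;
}

/* Models the controller reporting, in a completion, how far it has consumed. */
static void sq_update_head(struct sq_model *sq, uint32_t new_head)
{
    assert(new_head < SQ_SIZE);
    sq->head = new_head;
}
```

With a queue size of 4, three submissions succeed and the fourth is refused until sq_update_head frees a slot. Note that building the entry and writing the doorbell are two separate steps on shared state, which is why two threads running sq_submit on the same queue at the same time could interleave them.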

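In the NVMe command-processing flow, steps 3, 4, 5, and 6 (fetching the command, processing it, queuing the completion, and generating the interrupt) are carried out by the controller hardware. Steps 1 and 2 (queuing the command and ringing the submission queue doorbell) and steps 7 and 8 (processing the completion and ringing the completion queue doorbell) are carried out by host software. Steps 1, 2, 7, and 8 must be performed in a strict order.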

SPDK binds threads to cores by default

Based on this flow, SPDK wraps steps 1, 2, 7, and 8 in API functions. If multiple threads call these functions on the same hardware queue pair, the required ordering can be violated. For this reason, SPDK by default binds its thread to a processor core during initialization.

@@ -448,7 +448,7 @@ int init(const char * dev_name) {
     spdk_env_opts_init(&opts);
     opts.name = "append_demo";
     opts.shm_id = 0;
     opts.core_mask = "0x8";
     if (spdk_env_init(&opts) < 0) {
         fprintf(stderr, "Unable to initialize Spdk env\n");
         return -1;
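
The core_mask value is a hexadecimal bitmask in which bit N selects CPU core N, so "0x8" pins the program to core 3. The sketch below decodes such a mask into a list of core indices. The helper core_mask_to_list is a hypothetical name introduced here for illustration; it is not an SPDK function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of SPDK): expand a hexadecimal core mask
 * such as "0x8" into the indices of the CPU cores whose bits are set.
 * Returns the number of cores written to `cores` (at most `max`). */
static int core_mask_to_list(const char *mask, int *cores, int max)
{
    assert(mask != NULL && cores != NULL);
    /* With base 16, strtoull accepts an optional "0x" prefix. */
    unsigned long long bits = strtoull(mask, NULL, 16);
    int n = 0;
    for (int core = 0; bits != 0 && n < max; core++, bits >>= 1) {
        if (bits & 1ull) {
            cores[n++] = core;
        }
    }
    return n;
}
```

For example, the mask "0x5" has bits 0 and 2 set, so the helper reports cores 0 and 2.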

SPDK thread considerations

The analysis above shows that a single hardware queue pair cannot be used by multiple threads at the same time, but different hardware queue pairs can be used by different threads concurrently.
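
One way to follow this rule is to give every thread its own queue pair. The sketch below models that pattern with plain pthreads. It is an assumption-laden illustration and not the original program's fix: struct qpair_model, worker, and run_workers are hypothetical names, and the struct only stands in for a real queue pair. In SPDK itself, each thread would allocate its own I/O queue pair, for example with spdk_nvme_ctrlr_alloc_io_qpair.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_THREADS 4u
#define IOS_PER_THREAD 1000u
#define QUEUE_SIZE 128u   /* arbitrary queue size for the model */

/* Stand-in for a hardware queue pair (not an SPDK type). */
struct qpair_model {
    uint32_t tail;       /* submission tail, advanced by the owning thread only */
    uint32_t submitted;  /* number of commands submitted on this queue pair */
};

static struct qpair_model qpairs[NUM_THREADS];

/* Each worker receives its own queue pair and never touches another one,
 * so no lock is needed around the tail update. */
static void *worker(void *arg)
{
    struct qpair_model *qp = arg;
    for (uint32_t i = 0; i < IOS_PER_THREAD; i++) {
        qp->tail = (qp->tail + 1u) % QUEUE_SIZE;
        qp->submitted++;
    }
    return NULL;
}

/* Start one thread per queue pair, wait for all of them,
 * and return the total number of submissions. */
static uint32_t run_workers(void)
{
    pthread_t threads[NUM_THREADS];
    uint32_t total = 0;
    for (uint32_t i = 0; i < NUM_THREADS; i++) {
        qpairs[i] = (struct qpair_model){0};
        int rc = pthread_create(&threads[i], NULL, worker, &qpairs[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (uint32_t i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
        total += qpairs[i].submitted;
    }
    return total;
}
```

Because no queue pair is shared, every counter ends at exactly IOS_PER_THREAD without any locking. On older toolchains the code may need to be compiled with -pthread.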

Validation results

After the program was modified according to this analysis, the error no longer occurred.

Origin blog.51cto.com/xiamachao/2425054