Embedded OS design strategy explained | How Yilian enterprise-class SSDs achieve high reliability and high performance

Enterprise-class SSDs must stay extremely stable while reading and writing large volumes of enterprise data around the clock. Read/write speed, service life, stability, and reliability are the main concerns of enterprise users. To meet the demand for enterprise SSDs that combine high performance, low latency, a lightweight footprint, and high reliability, Yilian designed and developed an embedded operating system (OS) tailored to SSDs, and on top of it built a highly reusable SSD controller software system.

Embedded software is divided into three layers: the driver layer, the OS layer, and the business layer. The business layer integrates large, complex product functions and handles business logic, typically including interface protocols, the implementation of business functions, and storage of system data. The driver layer abstracts the device's hardware access into software interfaces that serve the OS layer and the business layer. The OS layer provides software platform services to the business layer, so that the business layer can concentrate on implementing its large and complex functions.

Figure 1
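To make the layering concrete, here is a minimal C sketch, assuming a driver layer that exposes flash access through a table of function pointers. The names `flash_driver_t`, `read_page`, `fake_read_page`, and `page_checksum` are hypothetical illustrations, not Yilian's interfaces, and the OS layer is left out. The business-layer function reaches the flash only through the driver interface.

```c
#include <assert.h>
#include <stdint.h>

/* Driver layer (hypothetical interface): hardware access is hidden
   behind a table of function pointers. */
typedef struct {
    int (*read_page)(uint32_t page, uint8_t *buf, uint32_t len);
} flash_driver_t;

/* Fake driver standing in for real controller registers: it fills the
   buffer with a pattern derived from the page number. */
static int fake_read_page(uint32_t page, uint8_t *buf, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(page + i);
    return 0;   /* 0 = success */
}

static const flash_driver_t fake_flash = { fake_read_page };

/* Business layer: product logic (here, a checksum over one page) that
   touches the flash only through the driver interface. */
static uint32_t page_checksum(const flash_driver_t *drv, uint32_t page)
{
    uint8_t buf[16];
    uint32_t sum = 0;
    int rc = drv->read_page(page, buf, (uint32_t)sizeof buf);
    assert(rc == 0);
    (void)rc;   /* silence unused-variable warnings when NDEBUG is set */
    for (uint32_t i = 0; i < sizeof buf; i++)
        sum += buf[i];
    return sum;
}
```

Swapping `fake_flash` for a table that points at a real driver would leave `page_checksum` unchanged, which is the point of the driver-layer abstraction.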

A general-purpose OS provides many functions, including processor management, memory management, device management, file management, and job management. An embedded OS, however, concentrates on core functions such as processor management and memory management. This article focuses on processor management (scheduling), mutual exclusion, and communication.

Processor architecture and OS deployment in SSD systems

With PCIe 4.0, SSD read/write bandwidth reaches 8 GB/s; with PCIe 5.0 it may reach 16 GB/s. To deliver this performance, SSD controllers usually adopt multi-CPU or even multi-cluster processor architectures, with separate processors handling SSD business computation and NAND flash operations.

Figure 2
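The bandwidth figures above match a four-lane (x4) link, the usual width for NVMe SSDs; the x4 width is an assumption here. PCIe 4.0 signals at 16 GT/s per lane and PCIe 5.0 at 32 GT/s, both using 128b/130b encoding, so the raw per-direction link bandwidth works out roughly as:

```latex
\begin{aligned}
B_{\text{PCIe 4.0, x4}} &= \frac{16\ \text{GT/s} \times 4\ \text{lanes} \times \frac{128}{130}}{8\ \text{bit/byte}} \approx 7.88\ \text{GB/s} \\
B_{\text{PCIe 5.0, x4}} &= \frac{32\ \text{GT/s} \times 4\ \text{lanes} \times \frac{128}{130}}{8\ \text{bit/byte}} \approx 15.75\ \text{GB/s}
\end{aligned}
```

Packet and protocol overhead lowers the usable throughput somewhat further, which is why sustaining these rates calls for the parallel processor arrangement described above.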

In a multi-processor, multi-cluster architecture, services can be deployed in SMP mode, AMP mode, or a mix of the two. SSD systems generally mix SMP and AMP, and the OS is responsible for scheduling, mutual exclusion, and communication. Communication between AMP domains in an embedded system is comparable to inter-process communication in a general-purpose system: because the domains share no global data, they must communicate through a separate storage area outside their own memory (for example, shared DDR).

Scheduling strategy of Yilian OS

In a conventional OS the unit of scheduling is the thread: each thread has its own stack and priority, and threads can preempt one another. While designing its SSD embedded system, Yilian's analysis found that thread scheduling has high overhead, that threads suffer from complex mutual exclusion problems and are prone to priority inversion and deadlock, and that the random, unordered way threads run keeps the system in a permanently "nondeterministic" state. Yilian SSD therefore adopted its own scheduling strategy, in which the unit of scheduling is a piece of "function code" (an entry function and all the functions it calls). For convenience, this article calls these OS-schedulable units of function code "transactions".

Programs in the SSD system are therefore no longer organized as threads but as independent, non-blocking transaction-processing flows, which the OS can schedule in various ways.
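The transaction model can be sketched in C as follows, assuming a single CPU and a simple FIFO of pending transactions. `txn_t`, `txn_post`, `txn_run_pending`, `read_req_t`, and the read example are hypothetical names chosen for illustration, not Yilian's API. The loop runs each transaction to completion on the caller's stack, and a read that would wait for NAND is split at the blocking point into a start transaction and a finish transaction that the completion path posts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transaction descriptor: an entry function and its argument.
   There is no per-transaction stack or saved context; state that must
   survive between steps lives in the object ctx points to. */
typedef void (*txn_entry_t)(void *ctx);
typedef struct { txn_entry_t entry; void *ctx; } txn_t;

#define TXN_QLEN 16u   /* power of two, so wrapping unsigned indices stay valid */

static txn_t txn_q[TXN_QLEN];        /* pending transactions of this CPU */
static unsigned txn_head, txn_tail;

/* Ask the OS to run a transaction later; false if the queue is full.
   Single-threaded sketch: a real queue also touched from interrupt
   context must be protected against the ISR (not shown). */
static bool txn_post(txn_entry_t entry, void *ctx)
{
    assert(entry != NULL);
    if (txn_tail - txn_head == TXN_QLEN)
        return false;
    txn_q[txn_tail++ % TXN_QLEN] = (txn_t){ entry, ctx };
    return true;
}

/* Scheduler loop: run pending transactions one after another, each to
   completion, all on the caller's stack. Returns how many ran. */
static unsigned txn_run_pending(void)
{
    unsigned ran = 0;
    while (txn_head != txn_tail) {
        txn_t t = txn_q[txn_head++ % TXN_QLEN];
        t.entry(t.ctx);            /* returns only when the transaction ends */
        ran++;
    }
    return ran;
}

/* Example: a read that would block waiting for NAND, split at the
   blocking point into two transactions. */
typedef struct { uint32_t page; uint32_t data; bool done; } read_req_t;

static read_req_t *nand_busy_req;  /* stand-in for a command in flight */

static void read_finish_txn(void *ctx)   /* work after the blocking point */
{
    read_req_t *req = ctx;
    req->done = true;
}

static void read_start_txn(void *ctx)    /* work before the blocking point */
{
    nand_busy_req = ctx;   /* "issue" the command, then return at once */
}

/* Stand-in for the NAND completion path: it does not finish the read
   itself, it posts the continuation as a new transaction. */
static void nand_done_irq(uint32_t value)
{
    assert(nand_busy_req != NULL);
    nand_busy_req->data = value;
    (void)txn_post(read_finish_txn, nand_busy_req);
    nand_busy_req = NULL;
}
```

Because no transaction ever waits, the loop needs no context switch and no per-transaction stack; the state that must survive the NAND wait lives in `read_req_t` rather than on a stack.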

  •  Main characteristics of transactions:

(1) Transactions never block. A transaction performs one specific computation; it never waits for peripheral actions along the way and runs until it ends. If a process would block, it must be split at the blocking point into several independent transactions. Once a CPU starts executing a transaction, it executes it to the end. At any moment, the only transaction in the running state on each CPU is the one that CPU is executing; all other transactions are in the finished state. Compared with a thread, which can also be in a blocked state, the state model of a transaction is extremely simple.

(2) Transactions share a stack. When a transaction completes, it leaves no local variables that must be preserved on the stack. Transactions can therefore share one stack, stack requirements are small, and scheduling incurs no stack-switching overhead.

(3) On a given CPU, transactions execute serially: the CPU must finish one transaction before it starts the next, so the same transaction never re-enters on the same CPU.

(4) In SMP mode, mutual exclusion between transactions becomes simple. See Figure 3.

Figure 3

Remarks:

(1) When a transaction completes, it no longer holds any lock protecting global variables.

(2) AMP domains share no global data, so no mutual exclusion is needed between them.
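A minimal sketch of SMP mutual exclusion between transactions, assuming C11 atomics: a transaction takes a spinlock around its access to shared data and always releases it before returning, which is the property stated in remark (1). `txn_spinlock_t`, `spin_lock`, `spin_unlock`, and `account_read_txn` are hypothetical names. Because a transaction cannot be blocked or preempted by another transaction, the holder leaves the critical section after a few instructions, so plain spinning is a reasonable choice here; locking against interrupt handlers on the same CPU needs separate treatment and is not covered.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal test-and-set spinlock built on C11 atomics. */
typedef struct { atomic_bool held; } txn_spinlock_t;

static void spin_lock(txn_spinlock_t *l)
{
    /* Acquire: keep trying to flip held from false to true. */
    while (atomic_exchange_explicit(&l->held, true, memory_order_acquire))
        ;   /* another CPU's transaction is inside; it cannot block, so it exits soon */
}

static void spin_unlock(txn_spinlock_t *l)
{
    assert(atomic_load_explicit(&l->held, memory_order_relaxed)); /* must be locked */
    atomic_store_explicit(&l->held, false, memory_order_release);
}

/* Global data shared by transactions running on different SMP CPUs
   (zero-initialized static atomics start in the unlocked state). */
static txn_spinlock_t stats_lock;
static unsigned long host_read_pages;

/* Hypothetical transaction: update shared statistics under the lock.
   The lock is taken and released inside the transaction, so it is never
   held once the transaction returns. */
static void account_read_txn(void *ctx)
{
    unsigned long pages = *(const unsigned long *)ctx;
    spin_lock(&stats_lock);
    host_read_pages += pages;
    spin_unlock(&stats_lock);
}
```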

  • How transactions are scheduled

Figure 4

  • Transaction priority and real-time behavior

One-time transactions support priorities. The OS accepts scheduling requests for one-time transactions and handles urgent transactions in the system at high priority, which provides real-time processing of specific events.

Figure 5
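One possible way to realize these priorities, sketched under assumptions: one FIFO per level ("urgent", "high", and "regular", the levels this article names for interrupt-posted messages) and a scheduler that always starts the oldest request from the most urgent non-empty level. The queue layout and the names `txn_request`, `txn_run_one`, and `log_txn` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*txn_entry_t)(void *ctx);

/* Priority levels for one-time transaction requests. */
typedef enum { PRIO_URGENT, PRIO_HIGH, PRIO_REGULAR, PRIO_COUNT } txn_prio_t;

#define PRIO_QLEN 8u   /* power of two, so wrapping unsigned indices stay valid */

typedef struct {
    txn_entry_t entry[PRIO_QLEN];
    void *ctx[PRIO_QLEN];
    unsigned head, tail;
} txn_fifo_t;

static txn_fifo_t ready[PRIO_COUNT];   /* one FIFO per priority level */

/* Accept a one-time scheduling request; false if that level is full. */
static bool txn_request(txn_prio_t prio, txn_entry_t entry, void *ctx)
{
    assert(prio < PRIO_COUNT && entry != NULL);
    txn_fifo_t *q = &ready[prio];
    if (q->tail - q->head == PRIO_QLEN)
        return false;
    q->entry[q->tail % PRIO_QLEN] = entry;
    q->ctx[q->tail % PRIO_QLEN] = ctx;
    q->tail++;
    return true;
}

/* Start one transaction from the most urgent non-empty level and run it
   to completion. Returns false when nothing is pending. */
static bool txn_run_one(void)
{
    for (int p = 0; p < PRIO_COUNT; p++) {
        txn_fifo_t *q = &ready[p];
        if (q->head != q->tail) {
            unsigned i = q->head++ % PRIO_QLEN;
            q->entry[i](q->ctx[i]);
            return true;
        }
    }
    return false;
}

/* Example transaction that records the order in which transactions ran. */
static char run_log[8];
static unsigned run_len;

static void log_txn(void *ctx)
{
    run_log[run_len++] = *(const char *)ctx;
}
```

Priority decides only which pending transaction starts next. A running transaction is never preempted, so an urgent request waits at most for the transaction already running plus any earlier urgent requests.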

  • Interrupts and transactions

As mentioned earlier, in Yilian OS transactions on the same CPU never preempt one another: another transaction runs only after the current one completes. Interrupts, however, can still preempt.

Interrupt handlers follow the conventional interrupt-handling flow. When an interrupt arrives, it preempts the current transaction: the context of the running transaction is saved on the stack and execution jumps to the interrupt handler. When the handler finishes, the interrupted transaction resumes.

The interrupt handler also runs on the same stack that transactions use.

Because interrupt handlers can preempt, they serve the scenarios in the system with strict real-time requirements, but they also introduce mutual exclusion problems between interrupts and transactions. Figure 6 shows how interrupts and transactions are mutually excluded.

Figure 6

An interrupt handler can be split into a top half and a bottom half. The top half reads data from the peripheral; the bottom half is handled by sending a transaction message to the OS, which schedules and processes it. Mutual exclusion for the bottom half thus reduces to the simple mutual exclusion between transactions. For the transaction scheduling message that the interrupt sends to the OS, an "urgent", "high", or "regular" priority can be chosen as needed.

  • Interrupt processing example: IPC interrupt

IPC interrupts are generally used for communication between CPUs running in AMP mode.

The sender writes the data to shared DDR and then sends an IPC interrupt to the receiver.

The receiver's IPC interrupt handler reads the communication data from DDR and then sends a transaction message to its own OS, which schedules and processes it as a transaction.

Reading the communication data from DDR in the interrupt handler is the top half of the interrupt; the transaction that then processes that data is the bottom half.
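The IPC flow can be sketched like this, assuming a single-slot mailbox in DDR that both AMP CPUs can see, plus a sequence counter that the sender increments after writing the payload. The mailbox layout, the release/acquire ordering on `seq`, and every name here are assumptions; the hardware IPC interrupt is simulated by calling the handler directly, and posting the transaction is reduced to setting a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mailbox at a fixed DDR address visible to both AMP CPUs.
   Assumes the region is cache-coherent or mapped uncached; otherwise the
   sender must clean and the receiver invalidate the cache lines.
   Only one message may be in flight at a time in this sketch. */
typedef struct {
    uint32_t payload[4];   /* communication data */
    atomic_uint seq;       /* incremented by the sender after payload is written */
} ipc_mailbox_t;

static ipc_mailbox_t shared_ddr_mbox;   /* stands in for the shared DDR region */

/* Receiver-side state. */
static unsigned rx_seen_seq;
static uint32_t rx_copy[4];
static int rx_txn_pending;              /* stands in for a posted transaction */
static uint32_t rx_sum;

/* Receiver top half: IPC interrupt handler. Copy the message out of DDR
   and leave the real work to a transaction. */
static void ipc_isr(void)
{
    unsigned seq = atomic_load_explicit(&shared_ddr_mbox.seq, memory_order_acquire);
    if (seq == rx_seen_seq)
        return;                          /* nothing new: spurious interrupt */
    memcpy(rx_copy, shared_ddr_mbox.payload, sizeof rx_copy);
    rx_seen_seq = seq;
    rx_txn_pending = 1;                  /* e.g. post a "handle IPC" transaction */
}

/* Receiver bottom half: the transaction that processes the data. */
static void ipc_handle_txn(void)
{
    assert(rx_txn_pending);
    rx_txn_pending = 0;
    for (int i = 0; i < 4; i++)
        rx_sum += rx_copy[i];
}

/* Sender: write the data to shared DDR, publish it, then interrupt. */
static void ipc_send(const uint32_t msg[4])
{
    memcpy(shared_ddr_mbox.payload, msg, sizeof shared_ddr_mbox.payload);
    atomic_fetch_add_explicit(&shared_ddr_mbox.seq, 1u, memory_order_release);
    ipc_isr();   /* stand-in for raising the hardware IPC interrupt */
}
```

The release increment on `seq` paired with the acquire load in the handler ensures the receiver never sees the new sequence number before the payload it describes. A production design would typically use a ring of slots or an acknowledgement so the sender cannot overwrite a message the receiver has not yet copied.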

Summary

Through transaction scheduling, Yilian has achieved high performance and high reliability in its enterprise-class SSD products. Designing, implementing, and extending business functions has become simpler and more flexible, and business code has become easier to reuse. As a result, when developing a new generation of SSDs, Yilian can carry over the functions and quality of its existing products, so that it can keep providing customers with high-quality SSD products and storage solutions.


Source: blog.csdn.net/UnionMemory/article/details/132077929