OpenCL Programming Guide-9.1 Commands, Queues, Events

Overview

Command queues are at the heart of OpenCL. A platform defines a context, which contains one or more computing devices. Each computing device can have one or more command queues. Commands submitted to these queues will do the specific work of the OpenCL program.

In a simple OpenCL program, commands submitted to a command queue execute in order: once a command completes, the next one can start, so the program unfolds as a strictly ordered sequence of commands. This in-order approach delivers the performance an application needs when there is a large amount of concurrency within individual commands.

However, real applications are often not that simple. In most cases, an application does not need its commands to execute in strict order. Memory objects can be moved between the device and the host while other commands execute. Commands that operate on unrelated memory objects can execute concurrently. In a typical application, there is ample concurrency to be gained from running commands at the same time. The runtime system can exploit this concurrency to increase the achievable parallelism, which can lead to significant performance gains.

Another common situation is one where the dependencies between commands can be expressed as a Directed Acyclic Graph (DAG). Such a graph may contain independent branches that can safely run concurrently. Requiring these commands to run in serial order would place unnecessary constraints on the system. An out-of-order command queue lets the system take advantage of the concurrency among these commands, but there is more concurrency to exploit: independent branches of the DAG can run on different command queues, which may be associated with different compute devices.
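The idea of independent DAG branches can be made concrete with a small sketch. The plain-C code below does not call OpenCL. It assumes a hypothetical dependency table (`deps[i][j]` nonzero means command i must wait for command j) and a hypothetical helper `dag_levels`, and it assigns each command a level equal to the length of the longest dependency chain leading to it. Two commands on the same level cannot have a dependency path between them, so an out-of-order queue, or several queues, would be free to run them concurrently.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CMDS 8  /* hypothetical upper bound for this sketch */

/* deps[i][j] != 0 means command i must wait for command j.
 * Assumes commands are listed in a valid enqueue order, i.e. a
 * command only depends on commands with a smaller index.
 * Writes level[i] = length of the longest dependency chain ending
 * at command i, and returns the number of distinct levels. */
static int dag_levels(size_t n, const int deps[][MAX_CMDS], int level[])
{
    int max_level = -1;
    assert(n <= MAX_CMDS);
    for (size_t i = 0; i < n; ++i) {
        level[i] = 0;
        for (size_t j = 0; j < i; ++j) {
            if (deps[i][j] && level[j] + 1 > level[i])
                level[i] = level[j] + 1;
        }
        if (level[i] > max_level)
            max_level = level[i];
    }
    return max_level + 1;
}
```

For a three-command graph in which the third command depends on the first two, the levels come out as 0, 0, 1: the first two commands form independent branches, and only the third has to wait.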

These examples share a common feature: the application has many opportunities for concurrency that a strictly ordered command queue cannot exploit. Relaxing the ordering constraints can bring significant performance benefits. However, these benefits come at a price. If the in-order semantics of the command queue are not used to guarantee a safe execution order, that responsibility falls to the programmer. In OpenCL, this is done with events.

Events are objects in OpenCL that communicate command status. Commands in the command queue generate events that other commands may wait for before executing. Users can create custom events to provide an additional layer of control between the host and computing devices. The event mechanism can be used to control the interaction between OpenCL and graphics standards such as OpenGL. Finally, in the kernel, programmers use events to allow the movement of data to overlap with operations on that data.

Events and Command Queues

OpenCL events are objects that convey information about commands. The status of an event describes the status of the associated command. An event can take the following status values:

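CL_QUEUED: The command has been enqueued in the command queue.
CL_SUBMITTED: The enqueued command has been submitted by the host to the device associated with the command queue.
CL_RUNNING: The compute device is executing the command.
CL_COMPLETE: The command has completed.
ERROR_CODE: A negative value indicates that some error condition was encountered. The specific value is the one returned by the platform or runtime API that generated the event.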

There are many ways to create events. The most common source of events is the commands themselves. Any command enqueued in a command queue generates or waits for events. Events appear in the API in the same way for every command, so a single example is enough to explain how they work. Consider the command that enqueues a kernel for execution on a compute device:

cl_int clEnqueueNDRangeKernel (
    cl_command_queue command_queue,
    cl_kernel kernel,
    cl_uint work_dim,
    const size_t *global_work_offset,
    const size_t *global_work_size,
    const size_t *local_work_size,
    cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list,
    cl_event *event)

For now, we are only interested in the last three parameters of this function:

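cl_uint num_events_in_wait_list: The number of events this command must wait on before it executes.

const cl_event *event_wait_list: A pointer to an array that defines the num_events_in_wait_list events this command waits for.
                                 The context associated with the events in event_wait_list must be the same as the context associated with command_queue.

cl_event *event: A pointer to the event object generated by this command.
                 Subsequent commands or the host can use it to follow the status of this command.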

When valid values are provided for num_events_in_wait_list and event_wait_list, the command runs only after every event in the list has either reached the CL_COMPLETE status or taken a negative value indicating an error condition.
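The rule above can be sketched as a small predicate. This is a plain-C illustration that does not include the OpenCL headers: the `EV_*` constants, `event_settled`, and `wait_list_satisfied` are hypothetical names, with values chosen to mirror those that `CL/cl.h` assigns to the corresponding `CL_*` status codes. In a real program, the status of an event would typically be read with `clGetEventInfo` and `CL_EVENT_COMMAND_EXECUTION_STATUS`, and the runtime itself performs this check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants mirroring the values CL/cl.h gives the
 * corresponding CL_* execution-status codes. */
enum {
    EV_COMPLETE  = 0,  /* CL_COMPLETE  */
    EV_RUNNING   = 1,  /* CL_RUNNING   */
    EV_SUBMITTED = 2,  /* CL_SUBMITTED */
    EV_QUEUED    = 3   /* CL_QUEUED    */
};

/* An event no longer blocks its waiters once it is complete or
 * carries a negative error code. */
static int event_settled(int status)
{
    return status <= EV_COMPLETE;
}

/* Returns 1 when every event in the wait list has settled, so the
 * waiting command may run. An empty list (count 0) imposes no wait. */
static int wait_list_satisfied(const int *statuses, size_t count)
{
    assert(count == 0 || statuses != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (!event_settled(statuses[i]))
            return 0;
    }
    return 1;
}
```

A list in which one event is still running keeps the command waiting, whereas a list whose events are all complete, or in which one has failed with a negative code, releases it.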

An event defines a sequence point at which two commands are brought to a known state within the program, so it can serve as a synchronization point in OpenCL. As with every synchronization point in OpenCL, memory objects are brought to a well-defined state with respect to the execution of multiple kernels, according to the OpenCL memory model. Memory objects are associated with a context, so a consistent state is guaranteed even when a computation involves multiple command queues in that context.

Consider the following simple example:

cl_event k_events[2];

// enqueue two kernels, exposing an event for each
err = clEnqueueNDRangeKernel(commands, kernel1, 1, NULL, &global, &local, 0, NULL, &k_events[0]);

err = clEnqueueNDRangeKernel(commands, kernel2, 1, NULL, &global, &local, 0, NULL, &k_events[1]);

// enqueue the next kernel, which waits for the two prior
// events before launching
err = clEnqueueNDRangeKernel(commands, kernel3, 1, NULL, &global, &local, 2, k_events, NULL);

Here three kernels are enqueued for execution. The first two clEnqueueNDRangeKernel commands enqueue kernel1 and kernel2. The last argument of each of these calls receives the generated event, which is stored in the corresponding element of the array k_events[]. The third clEnqueueNDRangeKernel command enqueues kernel3. As its seventh and eighth arguments show, kernel3 will not run until the two events in k_events[] have completed. Note, however, that the last argument when enqueuing kernel3 is NULL. This means we do not want an event generated for subsequent commands to use.

Events are crucial when you need detailed control over the order in which commands execute. When such control is not required, however, it can be convenient to have a command ignore events (both their consumption and their generation). A command can be told to ignore events as follows:
1) Set the number of events the command waits for (num_events_in_wait_list) to 0.
2) Set the pointer to the event array (event_wait_list) to NULL. Note that if this pointer is NULL, num_events_in_wait_list must be 0.
3) Set the pointer to the generated event (event) to NULL.
This ensures that no events are waited on and no event is generated, which of course means that the application cannot query the status of, or enqueue a wait for, this particular kernel execution instance.

When enqueuing commands, it is often useful to mark a synchronization point: all commands enqueued before it must complete before any subsequent command can begin. Such a point can be placed among the commands in a queue with the clEnqueueBarrier() function.

cl_int clEnqueueBarrier(cl_command_queue command_queue)

This function has only one parameter, which identifies the queue the barrier applies to. If the function executes successfully, it returns CL_SUCCESS; otherwise, it returns one of the following errors:

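CL_INVALID_COMMAND_QUEUE: The command queue is not a valid command queue.
CL_OUT_OF_RESOURCES: Failure to allocate resources required by the OpenCL implementation on the device.
CL_OUT_OF_HOST_MEMORY: Failure to allocate resources required by the OpenCL implementation on the host.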

The clEnqueueBarrier command defines a synchronization point. This matters for understanding the ordering constraints between commands. More importantly, in the OpenCL memory model, the consistency of memory objects is defined with respect to synchronization points. Specifically, at a synchronization point, updates to memory objects that are visible across commands must be complete so that subsequent commands see the new values.

To define more general synchronization points, OpenCL uses events and markers. A marker is set with the following command:

cl_int clEnqueueMarker(
    cl_command_queue command_queue,
    cl_event *event)

cl_command_queue command_queue: The command queue to which the marker is applied.
cl_event *event: A pointer to an event object used to communicate the status of the marker.

The marker command does not complete until all commands enqueued before it have completed. For an in-order queue, the effect of clEnqueueMarker is similar to that of a barrier. Unlike a barrier, however, the marker command returns an event. The host or other commands can wait on this event to ensure that all commands enqueued before the marker have completed. If the function executes successfully, clEnqueueMarker returns CL_SUCCESS; otherwise, it returns one of the following errors:

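CL_INVALID_COMMAND_QUEUE: command_queue is not a valid command queue.
CL_INVALID_VALUE: event is a NULL value.
CL_OUT_OF_RESOURCES: Failure to allocate resources required by the OpenCL implementation on the device.
CL_OUT_OF_HOST_MEMORY: Failure to allocate resources required by the OpenCL implementation on the host.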

The following function enqueues a wait for a specific event or group of events to complete before any subsequently enqueued commands execute.

cl_int clEnqueueWaitForEvents(
   cl_command_queue command_queue,
   cl_uint num_events,
   const cl_event *event_list)

cl_command_queue command_queue: The command queue to which the wait is applied.
cl_uint num_events: The number of events this command waits on to complete.
const cl_event *event_list: A pointer to an array that defines the num_events events this command waits for.

These events define synchronization points. This means that by the time clEnqueueWaitForEvents completes, updates to memory objects as defined in the memory model must have completed, and subsequent commands can rely on a consistent state of the memory objects. The context associated with the events in event_list must be the same as the context associated with command_queue.

clEnqueueWaitForEvents returns CL_SUCCESS if the function executes successfully; otherwise, it returns one of the following errors:

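CL_INVALID_COMMAND_QUEUE: command_queue is not a valid command queue.
CL_INVALID_CONTEXT: The context associated with command_queue and the context associated with the events in event_list are not the same.
CL_INVALID_VALUE: num_events is 0 or event_list is NULL.
CL_INVALID_EVENT: An event object specified in event_list is not a valid event.
CL_OUT_OF_RESOURCES: Failure to allocate resources required by the OpenCL implementation on the device.
CL_OUT_OF_HOST_MEMORY: Failure to allocate resources required by the OpenCL implementation on the host.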

These three commands, clEnqueueBarrier, clEnqueueMarker, and clEnqueueWaitForEvents, impose ordering constraints on the commands in a queue and define synchronization points, which affect the consistency of OpenCL memory. Together they provide the fundamental building blocks for synchronization protocols in OpenCL.

For example, consider two queues that share a context but direct commands to different compute devices. Memory objects can be shared between the two devices (because they share a context), but under OpenCL's relaxed-consistency memory model a shared memory object may, at any given point, be in an undefined state with respect to one queue or the other. Placing a barrier at a strategic point can solve this problem, and the programmer might use the clEnqueueBarrier() command to do so, as shown in Figure 9-1.
[Figure 9-1]
However, a barrier command in OpenCL constrains only the order of commands in the queue in which it is placed. How, then, does a programmer define a barrier that spans two command queues? One way is shown in Figure 9-2.
[Figure 9-2]
In one of the queues, a clEnqueueMarker() command is enqueued, which returns a valid event object. A marker acts like a barrier in its own queue, but it also returns an event that other commands can wait on. In the second queue, we place a barrier at the desired location and follow it with a clEnqueueWaitForEvents() call. The clEnqueueBarrier() command gives that queue the desired behavior: all commands before clEnqueueBarrier() must complete before subsequent commands can execute. The clEnqueueWaitForEvents() call defines the connection to the marker in the other queue. The end result is a synchronization protocol that provides barrier functionality across two queues.
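To see the ordering this protocol enforces, here is a toy model in plain C. It is only a sketch under simplifying assumptions, not OpenCL code: the `command` and `queue` types and the `step` and `run_two_queues` helpers are hypothetical, both queues are modeled as strictly in order, and each command completes instantly. In real code the corresponding calls would be clEnqueueMarker on the first queue, then clEnqueueBarrier followed by clEnqueueWaitForEvents on the second.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types are hypothetical and are not OpenCL API. */
typedef enum { CMD_KERNEL, CMD_MARKER, CMD_BARRIER, CMD_WAIT } cmd_kind;

typedef struct {
    cmd_kind kind;
    int arg;  /* CMD_KERNEL: label; CMD_MARKER/CMD_WAIT: event index */
} command;

typedef struct {
    const command *cmds;
    size_t count;
    size_t next;  /* index of the next command to execute */
} queue;

/* Execute the next command of an in-order queue if it is not blocked.
 * Returns 1 when a command was executed, 0 when the queue is drained
 * or blocked on an event. */
static int step(queue *q, int event_done[], int trace[], size_t *trace_len)
{
    const command *c;
    if (q->next == q->count)
        return 0;
    c = &q->cmds[q->next];
    switch (c->kind) {
    case CMD_WAIT:
        if (!event_done[c->arg])
            return 0;            /* blocked until the marker's event completes */
        break;
    case CMD_MARKER:
        event_done[c->arg] = 1;  /* earlier commands in this queue have finished */
        break;
    case CMD_BARRIER:
        break;                   /* no-op here: the model is already strictly in order */
    case CMD_KERNEL:
        trace[(*trace_len)++] = c->arg;  /* record the kernel's label */
        break;
    }
    q->next++;
    return 1;
}

/* Drain both queues, always preferring queue b, so that any ordering in
 * the trace is forced by the event rather than by the scheduler.
 * Returns the number of kernel labels written to trace. */
static size_t run_two_queues(queue *a, queue *b, int event_done[], int trace[])
{
    size_t trace_len = 0;
    while (step(b, event_done, trace, &trace_len) ||
           step(a, event_done, trace, &trace_len)) {
        /* keep going while either queue can make progress */
    }
    return trace_len;
}
```

With queue a holding kernels 1 and 2, a marker, then kernel 3, and queue b holding kernel 10, a barrier, a wait on the marker's event, then kernel 11, the recorded trace is 10, 1, 2, 11, 3. Even though the scheduler favors queue b, kernel 11 cannot start until kernels 1 and 2 have run and the marker has signaled its event.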

Origin blog.csdn.net/qq_36314864/article/details/132107633