OpenCL resource management

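I was running an OpenCL program on an NVIDIA GPU that allocates just under 40 buffers, about 1.5 GB in total. After repeating the initialize, execute, release sequence a few times, clEnqueueNDRangeKernel starts returning -4, that is, CL_MEM_OBJECT_ALLOCATION_FAILURE, the error reported when memory for a buffer cannot be allocated.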

A similar case that I found online is quoted further below.

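I used NVML, the NVIDIA management library that comes with CUDA, to watch GPU memory usage.

That finally showed that part of the GPU memory was never released. Once everything was released, the growth in GPU memory usage from one run to the next became almost negligible.

When this kind of error appears, the most likely cause is GPU memory that has not been fully released.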

It is common sense that for a long-running system, it’s of utmost importance that no piece of code is leaking memory or other resources. Otherwise, the system will crash sooner or later due to memory exhaustion. This is what happened with a distributed system I developed for high-performance image processing.

In this system, each network node (each having at least one GPU) runs a small piece of server software that prepares an image processing graph and processes incoming data. When the data stream has been processed, the server releases almost all resources and waits for the next data processing task. To be sure that everything was working as expected, I ran a stress test during the night. Of course, at some point the server crashed with a segmentation fault. Using nvidia-smi, I found out that the GPU had allocated more and more memory until it ran out of memory.

OpenCL has a straightforward system of reference counted resources. You create a context, command queue, buffer, program or kernel with its corresponding clCreateFoo() function. If several independent software parts reference an OpenCL resource for internal use, the reference count can be increased with clRetainFoo(). When a resource is no longer of any use, it can be discarded with clReleaseFoo(). Once the reference count reaches zero, any associated resources such as memory are freed. At least, that is what I expected.
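The retain/release life cycle described above can be illustrated with a small plain-C model. This is only a sketch of the counting rule, not OpenCL code: resource_t, resource_create, resource_retain, resource_release and resource_live_count are hypothetical names that stand in for a cl_mem-like object and its clCreateFoo(), clRetainFoo() and clReleaseFoo() calls.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an OpenCL object such as cl_mem. */
typedef struct {
    unsigned ref_count;
    void *payload;          /* models the device memory owned by the object */
} resource_t;

static int live_payloads = 0;   /* payloads that are currently allocated */

/* Like clCreateFoo(): the new object starts with a reference count of 1. */
resource_t *resource_create(size_t size)
{
    resource_t *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->payload = malloc(size);
    if (r->payload == NULL) {
        free(r);
        return NULL;
    }
    r->ref_count = 1;
    live_payloads++;
    return r;
}

/* Like clRetainFoo(): another owner now references the object. */
void resource_retain(resource_t *r)
{
    r->ref_count++;
}

/* Like clReleaseFoo(): frees the payload only when the count reaches zero.
   Returns the remaining reference count. */
unsigned resource_release(resource_t *r)
{
    unsigned remaining = --r->ref_count;
    if (remaining == 0) {
        free(r->payload);
        free(r);
        live_payloads--;
    }
    return remaining;
}

/* Number of payloads that have not been freed yet. */
int resource_live_count(void)
{
    return live_payloads;
}
```

In this model, one create followed by one retain needs two releases before resource_live_count() drops back to zero; a single missing release keeps the payload alive indefinitely.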

However, on NVIDIA systems, a final clReleaseMemObject() will not free the memory segment in GPU memory unless every other OpenCL object has been freed too. Consider this small snippet:

n_elements = 1024 * 1024;
mem = clCreateBuffer (context, CL_MEM_READ_WRITE,
                      n_elements * sizeof (float),
                      NULL, &errcode);

/* Launch kernel with one parameter */
clSetKernelArg (kernel, 0, sizeof (cl_mem), &mem);
clEnqueueNDRangeKernel (cmd_queue, kernel,
                        1, NULL, &n_elements, NULL,
                        0, NULL, &event);

/* Wait for end of execution and release all resources */
clWaitForEvents (1, &event);
clReleaseMemObject (mem);
clReleaseKernel (kernel);
clReleaseProgram (program);
clReleaseCommandQueue (cmd_queue);
clReleaseContext (context);

This looks innocent; however, repeating it over and over again will eventually give you a CL_MEM_OBJECT_ALLOCATION_FAILURE, because we did not release the event object and thus did not clean up GPU memory.

So, if you are developing with NVIDIA’s OpenCL implementation and want to ensure a stable system:

Check that each call to clCreateFoo() and clRetainFoo() is accompanied by a clReleaseFoo() call.
Beware of cl_event objects that are created implicitly by calls to the clEnqueueFoo() function family.
Do not assume that clReleaseContext() will release resources that are associated with the context.
