OpenCL Programming Guide, Chapter 8: Images and Samplers

Image and Sampler Objects

GPUs were originally designed to render 3D graphics with high performance. One of the most important features of the 3D graphics pipeline is the application of texture images to polygonal surfaces. As a result, GPUs have evolved to access and filter texture images with extremely high performance. Although most image operations can be emulated using the generic memory objects introduced in Chapter 7, doing so tends to be significantly slower than using image objects. In addition, image objects make operations such as texture edge clamping and filtering very easy.

The main reason image objects exist in OpenCL, then, is to let programs take full advantage of the high-performance texture hardware in GPUs. Other hardware may benefit in other ways as well, so image objects are the best way to handle 2D and 3D image data in OpenCL.

An image object encapsulates several pieces of information about an image:
1) Image size: the width and height of a 2D image (and the depth of a 3D image).
2) Image format: the bit depth and layout of the image pixels in memory.
3) Memory access flags: for example, whether the image is used for reading, for writing, or for both.

A sampler is required in the kernel to read data from an image object. A sampler tells the image-reading functions how to access the image:
1) Coordinate mode: whether the texture coordinates used to fetch from the image are normalized to the range [0...1] or lie in the range [0...image_dim - 1].
2) Addressing mode: what a fetch returns when the coordinates fall outside the bounds of the image.
3) Filtering mode: whether a fetch takes a single sample or filters multiple samples (for example, bilinear filtering).

One thing about samplers that can be confusing at first: there are two ways to create one. A sampler can be declared directly in the kernel code (as a sampler_t), or created as a sampler object in the C/C++ host program. The main reason to create the sampler as an object rather than declaring it statically in the kernel is that the same kernel can then be run with different filtering and addressing options.

Creating Image Objects

Creating an image object can be done with clCreateImage2D() or clCreateImage3D():

cl_mem clCreateImage2D(cl_context context,
                       cl_mem_flags flags,
                       const cl_image_format * image_format,
                       size_t image_width,
                       size_t image_height,
                       size_t image_row_pitch,
                       void * host_ptr,
                       cl_int * errcode_ret)

cl_mem clCreateImage3D(cl_context context,
                       cl_mem_flags flags,
                       const cl_image_format * image_format,
                       size_t image_width,
                       size_t image_height,
                       size_t image_depth,
                       size_t image_row_pitch,
                       size_t image_slice_pitch,
                       void * host_ptr,
                       cl_int * errcode_ret)

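/*
context The context in which the image object is created.
flags A bit field that specifies allocation and usage information for the image.
      The valid values of flags are defined by the enumeration cl_mem_flags; see Table 7-1.

image_format Describes the channel order and the type of the image channel data.
image_width The width of the image in pixels.
image_height The height of the image in pixels.
image_depth (3D images only) The number of slices in the 3D image.

image_row_pitch If host_ptr is not NULL, the number of bytes in each row of the image.
                If 0, the pitch is taken to be image_width * (bytes_per_pixel).

image_slice_pitch (3D images only) If host_ptr is not NULL, the number of bytes in each slice of the image.
                  If 0, the pitch is taken to be image_height * image_row_pitch.

host_ptr A pointer to an image buffer laid out linearly in memory. For a 2D image, the buffer is a linear array of scanlines.
         For a 3D image, the buffer is a linear array of 2D image slices.
         Each 2D slice has the same layout as a 2D image.

errcode_ret If non-NULL, the error code returned by the function is returned in this parameter.
*/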
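
The pitch defaults described above can be illustrated on the host. The following is only a sketch: ImageLayout, EffectiveRowPitch(), EffectiveSlicePitch() and PixelOffset() are hypothetical names that are not part of OpenCL. It shows how a pitch of 0 falls back to a tightly packed layout and how the byte offset of a pixel in the host_ptr buffer follows from the two pitches.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an OpenCL type): describes the linear host-side
// layout that the host_ptr buffer is expected to follow.
struct ImageLayout {
    std::size_t width;          // pixels per row
    std::size_t height;         // rows per slice
    std::size_t bytesPerPixel;  // e.g. 4 for CL_RGBA with CL_UNORM_INT8
    std::size_t rowPitch;       // bytes per row, 0 = tightly packed
    std::size_t slicePitch;     // bytes per slice, 0 = tightly packed
};

// A row pitch of 0 is taken to mean width * bytes_per_pixel.
inline std::size_t EffectiveRowPitch(const ImageLayout& img) {
    return img.rowPitch != 0 ? img.rowPitch : img.width * img.bytesPerPixel;
}

// A slice pitch of 0 is taken to mean height * row pitch.
inline std::size_t EffectiveSlicePitch(const ImageLayout& img) {
    return img.slicePitch != 0 ? img.slicePitch
                               : img.height * EffectiveRowPitch(img);
}

// Byte offset of pixel (x, y, z) from the start of the host buffer.
inline std::size_t PixelOffset(const ImageLayout& img,
                               std::size_t x, std::size_t y, std::size_t z) {
    assert(x < img.width && y < img.height);
    return z * EffectiveSlicePitch(img) + y * EffectiveRowPitch(img)
         + x * img.bytesPerPixel;
}
```

For a 640 x 480 RGBA8 image with both pitches left at 0, each row occupies 2560 bytes and each slice 2560 * 480 bytes.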

Listing 8-1, from the ImageFilter2D example, shows how to use the FreeImage library to load an image from a file and then create a 2D image object from its contents. The image is first loaded from disk and converted to a 32-bit RGBA buffer in which each channel occupies 1 byte (8 bits). Next, the cl_image_format structure is filled in with the channel order CL_RGBA and the channel data type CL_UNORM_INT8. Finally, the image is created using clCreateImage2D(). The 32-bit image buffer is passed as host_ptr and copied to the OpenCL device. mem_flags is set to CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, which copies the data from the host pointer into a 2D image object that is read-only in the kernel.

An important point to note is that clCreateImage2D() and clCreateImage3D() return a cl_mem object. There isn't any special object type for image objects, which means they must be released using standard memory object functions such as clReleaseMemObject().

// Listing 8-1: Creating a 2D image object from a file

cl_mem LoadImage(cl_context context, char* fileName, int& width, int& height)
{
	FREE_IMAGE_FORMAT format = FreeImage_GetFileType(fileName, 0);
	FIBITMAP* image = FreeImage_Load(format, fileName);
	if (image == NULL)
	{
		std::cerr << "Error loading image file " << fileName << std::endl;
		return 0;
	}

	// Convert to 32-bit image
	FIBITMAP* temp = image;
	image = FreeImage_ConvertTo32Bits(image);
	FreeImage_Unload(temp);

	width = FreeImage_GetWidth(image);
	height = FreeImage_GetHeight(image);

	// Copy the pixels into a tightly packed host buffer
	char* buffer = new char[width * height * 4];
	memcpy(buffer, FreeImage_GetBits(image), width * height * 4);

	FreeImage_Unload(image);

	// Create OpenCL image
	cl_image_format clImageFormat;
	clImageFormat.image_channel_order = CL_RGBA;
	clImageFormat.image_channel_data_type = CL_UNORM_INT8;

	cl_int errNum;
	cl_mem clImage;
	clImage = clCreateImage2D(context,
		CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
		&clImageFormat,
		width,
		height,
		0,
		buffer,
		&errNum);

	// CL_MEM_COPY_HOST_PTR copies the pixels during creation, so the host
	// buffer can be freed as soon as clCreateImage2D() returns.
	delete [] buffer;

	if (errNum != CL_SUCCESS)
	{
		std::cerr << "Error creating CL image object" << std::endl;
		return 0;
	}

	return clImage;
}

In addition to the 2D image object used as input, the sample program also creates an output 2D image object, which stores the result of Gaussian filtering the input image. This output object is created using the code shown in Listing 8-2. Note that host_ptr is not specified when creating this object, because the kernel will fill it with data. Also, mem_flags is set to CL_MEM_WRITE_ONLY, because the kernel only writes to this image and never reads from it.

// Listing 8-2: Creating a 2D image object for output

// Create output image object
cl_image_format clImageFormat;
clImageFormat.image_channel_order = CL_RGBA;
clImageFormat.image_channel_data_type = CL_UNORM_INT8;
imageObjects[1] = clCreateImage2D(context,
	CL_MEM_WRITE_ONLY,
	&clImageFormat,
	width,
	height,
	0,
	NULL,
	&errNum);

After creating an image object, you can use the general memory object function clGetMemObjectInfo() introduced in Chapter 7 to query object information. Additional information specific to an image object can be queried using clGetImageInfo():

cl_int clGetImageInfo(cl_mem image,
                      cl_image_info param_name,
                      size_t param_value_size,
                      void * param_value,
                      size_t * param_value_size_ret)

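/*
image A valid image object to query.
param_name The parameter to query; must be one of the following.
CL_IMAGE_FORMAT (cl_image_format): The format the image was created with.
CL_IMAGE_ELEMENT_SIZE (size_t): The size in bytes of a single pixel element of the image.
CL_IMAGE_ROW_PITCH (size_t): The number of bytes in each row of the image.

CL_IMAGE_SLICE_PITCH (size_t): The number of bytes in each 2D slice of a 3D image;
                               0 for a 2D image.

CL_IMAGE_WIDTH (size_t): The width of the image in pixels.
CL_IMAGE_HEIGHT (size_t): The height of the image in pixels.
CL_IMAGE_DEPTH (size_t): For a 3D image, the depth of the image in pixels;
                         0 for a 2D image.

param_value_size The number of bytes in param_value.
param_value A pointer to the location where the result is stored.
            Enough bytes must be allocated at this location to hold the requested result.
param_value_size_ret The actual number of bytes written to param_value.

*/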

Image Formats

As shown in Listing 8-1, the cl_image_format parameter passed to clCreateImage2D() and clCreateImage3D() specifies how the pixels of the image are laid out in memory. The cl_image_format structure specifies the channel order and bit representation in detail, and is defined as follows:

typedef struct _cl_image_format
{
     cl_channel_order image_channel_order;
     cl_channel_type image_channel_data_type;
} cl_image_format;

The valid values of image_channel_order and image_channel_data_type are shown in Table 8-1 and Table 8-2. In addition to providing a layout, indicating how the image bits are stored in memory, cl_image_format also determines how the results are interpreted when read in the kernel. The choice of channel data type affects the applicable OpenCL C functions for reading/writing images (such as read_imagef, read_imagei or read_imageui). The last column in Table 8-1 shows how image channel order affects the interpretation of fetched results in the kernel.

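Table 8-1 Image channel order
Channel order        Description                                                  Result read in kernel
CL_R, CL_Rx          One channel of image data, read into the R component.        (R, 0.0, 0.0, 1.0)
                     CL_Rx contains two channels, but only the first one is
                     available when read in the kernel.

CL_A                 One channel of image data, read into the A component.        (0.0, 0.0, 0.0, A)

CL_INTENSITY         One channel of image data, read into all four components.    (I, I, I, I)
                     This format can only be used with the channel data types
                     CL_UNORM_INT8, CL_UNORM_INT16, CL_SNORM_INT8,
                     CL_SNORM_INT16, CL_HALF_FLOAT, or CL_FLOAT.

CL_RG, CL_RGx        Two channels of image data, read into the R and G            (R, G, 0.0, 1.0)
                     components. CL_RGx contains three channels, but the
                     third channel of data is ignored.

CL_RA                Two channels of image data, read into the R and A            (R, 0.0, 0.0, A)
                     components.

CL_RGB, CL_RGBx      Three channels of image data, read into the R, G, and B      (R, G, B, 1.0)
                     components. These formats can only be used with the
                     channel data types CL_UNORM_SHORT_565,
                     CL_UNORM_SHORT_555, or CL_UNORM_INT_101010.

CL_RGBA, CL_BGRA,    Four channels of image data, read into the R, G, B, and A    (R, G, B, A)
CL_ARGB              components. CL_BGRA and CL_ARGB can only be used with the
                     channel data types CL_UNORM_INT8, CL_SNORM_INT8,
                     CL_SIGNED_INT8, or CL_UNSIGNED_INT8.

CL_LUMINANCE         One channel of image data, replicated into the R, G, and B   (L, L, L, 1.0)
                     components in the kernel. This format can only be used
                     with the channel data types CL_UNORM_INT8, CL_UNORM_INT16,
                     CL_SNORM_INT8, CL_SNORM_INT16, CL_HALF_FLOAT, or CL_FLOAT.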
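
The last column of Table 8-1 can be read as a small expansion rule. The sketch below restates it in host-side C++ purely for illustration: ChannelOrder, Float4 and ExpandChannels() are hypothetical stand-ins, not OpenCL names, and the x-suffixed and reordered variants (CL_Rx, CL_BGRA and so on) are left out for brevity.

```cpp
#include <array>
#include <cassert>

// Stand-in enum for illustration only; real code uses the CL_* constants
// from the OpenCL headers.
enum class ChannelOrder { R, A, Intensity, RG, RA, RGB, RGBA, Luminance };

using Float4 = std::array<float, 4>;  // (x, y, z, w) as seen by the kernel

// Expands the channels stored in memory (already converted to float) into
// the four-component value a read in the kernel would produce, following
// the last column of Table 8-1. `c` holds the stored channels in order;
// unused trailing entries are ignored.
inline Float4 ExpandChannels(ChannelOrder order, const Float4& c) {
    switch (order) {
        case ChannelOrder::R:         return Float4{{c[0], 0.0f, 0.0f, 1.0f}};
        case ChannelOrder::A:         return Float4{{0.0f, 0.0f, 0.0f, c[0]}};
        case ChannelOrder::Intensity: return Float4{{c[0], c[0], c[0], c[0]}};
        case ChannelOrder::RG:        return Float4{{c[0], c[1], 0.0f, 1.0f}};
        case ChannelOrder::RA:        return Float4{{c[0], 0.0f, 0.0f, c[1]}};
        case ChannelOrder::RGB:       return Float4{{c[0], c[1], c[2], 1.0f}};
        case ChannelOrder::RGBA:      return Float4{{c[0], c[1], c[2], c[3]}};
        case ChannelOrder::Luminance: return Float4{{c[0], c[0], c[0], 1.0f}};
    }
    assert(false && "unknown channel order");
    return Float4{{0.0f, 0.0f, 0.0f, 0.0f}};
}
```

Note how the formats that carry no alpha channel fill the fourth component with 1.0, while CL_A and CL_RA leave the missing color components at 0.0.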

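Table 8-2 Image channel data types
Channel data type       Description
CL_SNORM_INT8           Each 8-bit integer value is mapped to the range [-1.0, 1.0]
CL_SNORM_INT16          Each 16-bit integer value is mapped to the range [-1.0, 1.0]
CL_UNORM_INT8           Each 8-bit integer value is mapped to the range [0.0, 1.0]
CL_UNORM_INT16          Each 16-bit integer value is mapped to the range [0.0, 1.0]
CL_SIGNED_INT8          Each 8-bit integer value is read as an integer in the range [-128, 127]
CL_SIGNED_INT16         Each 16-bit integer value is read as an integer in the range [-32768, 32767]
CL_SIGNED_INT32         Each 32-bit integer value is read as an integer in the range [-2147483648, 2147483647]
CL_UNSIGNED_INT8        Each 8-bit unsigned integer value is read as an unsigned integer in the range [0, 255]
CL_UNSIGNED_INT16       Each 16-bit unsigned integer value is read as an unsigned integer in the range [0, 65535]
CL_UNSIGNED_INT32       Each 32-bit unsigned integer value is read as an unsigned integer in the range [0, 4294967295]
CL_HALF_FLOAT           Each 16-bit component is treated as a half-precision floating-point value
CL_FLOAT                Each 32-bit component is treated as a single-precision floating-point value
CL_UNORM_SHORT_565      A 5:6:5 16-bit value in which each component (R, G, B) is normalized to the range [0.0, 1.0]
CL_UNORM_SHORT_555      An x:5:5:5 16-bit value in which each component (R, G, B) is normalized to the range [0.0, 1.0]
CL_UNORM_INT_101010     An x:10:10:10 32-bit value in which each component (R, G, B) is normalized to the range [0.0, 1.0]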
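
To make the "mapped to the range" wording of Table 8-2 concrete, here is a minimal host-side sketch of three of the conversions. The function and struct names are hypothetical; the arithmetic follows the conversion rules as I understand them from the OpenCL specification (divide by the largest positive value, and clamp the most negative signed value to -1.0).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// CL_UNORM_INT8: 0..255 maps linearly onto [0.0, 1.0].
inline float UnormInt8ToFloat(std::uint8_t v) {
    return v / 255.0f;
}

// CL_SNORM_INT8: -127..127 maps onto [-1.0, 1.0]; -128 is clamped to -1.0.
inline float SnormInt8ToFloat(std::int8_t v) {
    return std::max(v / 127.0f, -1.0f);
}

// Hypothetical result type for the packed format below.
struct Rgb {
    float r, g, b;
};

// CL_UNORM_SHORT_565: 5 bits of R, 6 bits of G, 5 bits of B packed into
// 16 bits, with R in the most significant bits.
inline Rgb UnormShort565ToFloat(std::uint16_t v) {
    return Rgb{((v >> 11) & 0x1F) / 31.0f,
               ((v >> 5) & 0x3F) / 63.0f,
               (v & 0x1F) / 31.0f};
}
```

With these rules, the 8-bit RGBA pixels loaded in Listing 8-1 arrive in the kernel as floats between 0.0 and 1.0.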

Any of the image formats in Tables 8-1 and 8-2 may be supported by an OpenCL implementation, but only a subset is mandatory. Table 8-3 lists the image formats that every OpenCL implementation must support if it supports images at all. An implementation is not required to support images; whether a device does can be queried with clGetDeviceInfo() using the Boolean setting CL_DEVICE_IMAGE_SUPPORT. If images are supported, the formats in Table 8-3 can be used directly, without first querying which formats are available.

Table 8-3 Mandatory supported image formats
Channel order                  Channel data type
CL_RGBA                        CL_UNORM_INT8
                               CL_UNORM_INT16
                               CL_SIGNED_INT8
                               CL_SIGNED_INT16
                               CL_SIGNED_INT32
                               CL_UNSIGNED_INT8
                               CL_UNSIGNED_INT16
                               CL_UNSIGNED_INT32
                               CL_FLOAT

CL_BGRA                        CL_UNORM_INT8

If an image format not listed in Table 8-3 is used, OpenCL must be queried using clGetSupportedImageFormats() to determine whether the desired image format is supported:

cl_int clGetSupportedImageFormats(cl_context context,
                                  cl_mem_flags flags,
                                  cl_mem_object_type image_type,
                                  cl_uint num_entries,
                                  cl_image_format * image_formats,
                                  cl_uint * num_image_formats)
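/*
context The context for which the supported image formats are queried.
flags A bit field that specifies allocation and usage information for the image.
      The valid values of flags are defined by the enumeration cl_mem_flags; see Table 7-1.
      Set this to the flags you plan to use when creating the image.

image_type The type of the image; must be CL_MEM_OBJECT_IMAGE2D or CL_MEM_OBJECT_IMAGE3D.
num_entries The number of entries that can be returned.

image_formats A pointer to the location where the list of supported image formats is stored.
              It can be set to NULL to query only the number of supported image formats.

num_image_formats A pointer to a cl_uint in which the number of image formats is stored.
*/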

Querying for Image Support

The ImageFilter2D example uses only a mandatory format, so it only checks whether images are supported at all, as shown in Listing 8-3. If a program uses a format that is not mandatory, it must also call clGetSupportedImageFormats() to make sure that format is supported.

Listing 8-3: Querying the device for image support
// Make sure the device supports images, otherwise exit
cl_bool imageSupport = CL_FALSE;
clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(cl_bool),
	&imageSupport, NULL);
if (imageSupport != CL_TRUE)
{
	std::cerr << "OpenCL device does not support images." << std::endl;
	Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
	return 1;
}

Creating Sampler Objects

So far we have covered how the ImageFilter2D example creates image objects for the input and output images. The kernel is now almost ready to be executed, but one more object needs to be created first: the sampler object. A sampler object specifies the filtering, addressing, and coordinate modes to use when fetching data from an image. Each of these options corresponds to a capability of the GPU's texture-fetch hardware.

The filter mode specifies whether data is fetched using nearest sampling or linear sampling. With nearest sampling, the value is read from the location in the image that is closest to the coordinates. With linear sampling, several values close to the coordinates are combined in a weighted average. For 2D images, a linear filter takes the 4 closest samples and averages them, which is called bilinear sampling. For 3D images, a linear filter takes 4 samples from each of the two closest slices and then linearly interpolates between these two averages, which is called trilinear sampling. The cost of filtering varies with the GPU hardware, but it is generally considered to be quite efficient, and much more efficient than filtering manually.
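
The difference between the two filter modes can be sketched on the CPU. This is an illustration only, not how any particular GPU implements filtering: GrayImage, SampleNearest() and SampleBilinear() are hypothetical names, the image has a single float channel, coordinates are in pixels, and out-of-range fetches are clamped to the edge. The bilinear weights assume texel centers at half-integer positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image stored row by row.
struct GrayImage {
    int width;
    int height;
    std::vector<float> pixels;  // width * height values

    // Fetches one texel, clamping out-of-range indices to the image edge.
    float At(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Nearest sampling: return the texel that contains (u, v).
inline float SampleNearest(const GrayImage& img, float u, float v) {
    return img.At(static_cast<int>(std::floor(u)),
                  static_cast<int>(std::floor(v)));
}

// Bilinear sampling: weighted average of the 4 texels around (u, v).
inline float SampleBilinear(const GrayImage& img, float u, float v) {
    // Texel centers sit at half-integer positions, hence the 0.5 shift.
    const float fx = u - 0.5f;
    const float fy = v - 0.5f;
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const float a = fx - x0;  // horizontal weight of the right-hand texels
    const float b = fy - y0;  // vertical weight of the lower texels
    return (1 - a) * (1 - b) * img.At(x0, y0)
         + a * (1 - b) * img.At(x0 + 1, y0)
         + (1 - a) * b * img.At(x0, y0 + 1)
         + a * b * img.At(x0 + 1, y0 + 1);
}
```

Sampling exactly at a texel center returns that texel unchanged, while sampling at the corner shared by four texels returns their plain average.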

The coordinate mode specifies whether the coordinates used when reading from the image are normalized (float values in the range [0.0, 1.0]) or non-normalized (integer values in the range [0, image_dimension - 1]). Normalized coordinates are independent of the image size. Non-normalized coordinates are expressed in pixels and therefore depend on it.
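
The relationship between the two coordinate modes is a simple scale by the image dimension. A minimal sketch, using hypothetical helper names and assuming texel centers at half-integer positions:

```cpp
#include <cassert>
#include <cmath>

// Maps a normalized coordinate s in [0.0, 1.0) to the integer texel index
// it falls in, for an image dimension of `size` pixels.
inline int NormalizedToTexel(float s, int size) {
    assert(size > 0);
    return static_cast<int>(std::floor(s * size));
}

// Normalized coordinate of the center of texel i.
inline float TexelCenterToNormalized(int i, int size) {
    assert(size > 0 && i >= 0 && i < size);
    return (i + 0.5f) / size;
}
```

The same normalized coordinate 0.5 selects texel 320 in a 640-pixel-wide image and texel 640 in a 1280-pixel-wide one, which is what makes normalized coordinates convenient when a kernel has to work across image sizes.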

The addressing mode specifies what to do when coordinates fall outside the range [0.0, 1.0] (for normalized coordinates) or [0, dimension - 1] (for non-normalized coordinates). These modes are given in the description of clCreateSampler():

cl_sampler clCreateSampler(cl_context context,
                           cl_bool normalized_coords,
                           cl_addressing_mode addressing_mode,
                           cl_filter_mode filter_mode,
                           cl_int * errcode_ret)

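/*
context The context in which the sampler object is created.
normalized_coords Whether the coordinates are normalized floating-point values or integer values within the image size.

addressing_mode The addressing mode specifies what happens when the image is fetched with a coordinate outside the image range.

                CL_ADDRESS_CLAMP: Coordinates outside the image range return the border color.
                                  For CL_A, CL_INTENSITY, CL_Rx, CL_RA, CL_RGx, CL_RGBx, CL_ARGB, CL_BGRA, and CL_RGBA, this color is (0.0, 0.0, 0.0, 0.0).
                                  For CL_R, CL_RG, CL_RGB, and CL_LUMINANCE, this color is (0.0, 0.0, 0.0, 1.0).

                CL_ADDRESS_CLAMP_TO_EDGE: Coordinates are clamped to the edge of the image.
                CL_ADDRESS_REPEAT: Coordinates outside the image range repeat.
                CL_ADDRESS_MIRRORED_REPEAT: Coordinates outside the image range are mirrored and repeat.

filter_mode The filter mode specifies how the image is sampled.
            CL_FILTER_NEAREST: Takes the sample closest to the coordinates.
            CL_FILTER_LINEAR: Takes an average of the samples closest to the coordinates.
                              For 2D images, this performs bilinear filtering;
                              for 3D images, this performs trilinear filtering.

errcode_ret If non-NULL, the error code returned by the function is returned in this parameter.

*/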
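
The four addressing modes can be sketched as a mapping from an out-of-range texel index to the texel that is actually fetched. This is a simplified illustration with hypothetical names (AddressMode, kBorder, ResolveIndex()); it works on integer texel indices along one dimension, whereas in OpenCL the two repeat modes are defined in terms of normalized coordinates.

```cpp
#include <algorithm>
#include <cassert>

// Stand-in enum; the real constants are CL_ADDRESS_* in the OpenCL headers.
enum class AddressMode { Clamp, ClampToEdge, Repeat, MirroredRepeat };

// Sentinel meaning "return the border color instead of a texel".
constexpr int kBorder = -1;

// Resolves texel index i against an image dimension of n texels.
inline int ResolveIndex(AddressMode mode, int i, int n) {
    assert(n > 0);
    switch (mode) {
        case AddressMode::Clamp:
            // Anything outside the image yields the border color.
            return (i < 0 || i >= n) ? kBorder : i;
        case AddressMode::ClampToEdge:
            // Out-of-range indices snap to the nearest edge texel.
            return std::min(std::max(i, 0), n - 1);
        case AddressMode::Repeat: {
            // The image tiles: index n wraps to 0, index -1 wraps to n - 1.
            const int m = i % n;
            return m < 0 ? m + n : m;
        }
        case AddressMode::MirroredRepeat: {
            // The image tiles with every other copy flipped.
            int m = i % (2 * n);
            if (m < 0) m += 2 * n;
            return m < n ? m : 2 * n - 1 - m;
        }
    }
    return kBorder;
}
```

For a dimension of 4 texels, index 5 resolves to 3 under clamp-to-edge, to 1 under repeat, to 2 under mirrored repeat, and to the border color under clamp.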

In the ImageFilter2D example, a sampler is created that performs nearest sampling and clamps the coordinates to the edge of the image, as shown in Listing 8-4. The coordinates are specified as non-normalized, which means that the x-coordinate is an integer in the range [0, width - 1] and the y-coordinate is an integer in the range [0, height - 1].

Listing 8-4: Creating a sampler object
// Create sampler for sampling image object
sampler = clCreateSampler(context,
	CL_FALSE, // Non-normalized coordinates
	CL_ADDRESS_CLAMP_TO_EDGE,
	CL_FILTER_NEAREST,
	&errNum);

if (errNum != CL_SUCCESS)
{
	std::cerr << "Error creating CL sampler object." << std::endl;
	Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
	return 1;
}

There is no requirement to create sampler objects in a C program. For the ImageFilter2D example, the sampler object created in Listing 8-4 is passed as an argument to the kernel function. The advantage of creating sampler objects this way is that properties of the sampler object can be modified without modifying the kernel. However, it is also possible to create a sampler directly in the kernel code. For example, this sampler could be created in kernel code and behave the same:

const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                          CLK_ADDRESS_CLAMP_TO_EDGE |
                          CLK_FILTER_NEAREST;

Whether you need the flexibility of a sampler object created with clCreateSampler(), or can simply declare the sampler directly in the kernel, is entirely up to you. For the ImageFilter2D example, there is really no need to create the sampler outside the kernel; it is done that way only to show that it can be done. In general, however, doing so does provide more flexibility.

When the application has finished using a sampler object, it can be released using clReleaseSampler():

cl_int clReleaseSampler(cl_sampler sampler)
// sampler The sampler object to release.

In addition, a sampler object's settings can be queried using clGetSamplerInfo():

cl_int clGetSamplerInfo(cl_sampler sampler,
                        cl_sampler_info param_name,
                        size_t param_value_size,
                        void * param_value,
                        size_t * param_value_size_ret)

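/*
sampler A valid sampler object to query.
param_name The parameter to query; must be one of the following:
           CL_SAMPLER_REFERENCE_COUNT (cl_uint): The reference count of the sampler object.
           CL_SAMPLER_CONTEXT (cl_context): The context the sampler is associated with.
           CL_SAMPLER_NORMALIZED_COORDS (cl_bool): Whether coordinates are normalized or non-normalized.
           CL_SAMPLER_ADDRESSING_MODE (cl_addressing_mode): The addressing mode of the sampler.
           CL_SAMPLER_FILTER_MODE (cl_filter_mode): The filter mode of the sampler.

param_value_size The size in bytes of the memory pointed to by param_value.

param_value A pointer to the location where the result is stored.
            Enough bytes must be allocated at this location to hold the requested result.

param_value_size_ret The actual number of bytes written to param_value.

*/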

OpenCL C Functions for Working with Images

We have explained how the ImageFilter2D example creates image objects and sampler objects. Now we turn to the Gaussian filter kernel itself, shown in Listing 8-5. A Gaussian filter is a convolution kernel that is typically used to smooth or blur an image by reducing its high-frequency noise.

Listing 8-5: Gaussian filter kernel

// Gaussian filter of image

__kernel void gaussian_filter(__read_only image2d_t srcImg,
                              __write_only image2d_t dstImg,
                              sampler_t sampler,
                              int width, int height)
{
    // Gaussian Kernel is:
    // 1  2  1
    // 2  4  2
    // 1  2  1
    float kernelWeights[9] = { 1.0f, 2.0f, 1.0f,
                               2.0f, 4.0f, 2.0f,
                               1.0f, 2.0f, 1.0f };

    int2 startImageCoord = (int2) (get_global_id(0) - 1, get_global_id(1) - 1);
    int2 endImageCoord   = (int2) (get_global_id(0) + 1, get_global_id(1) + 1);
    int2 outImageCoord = (int2) (get_global_id(0), get_global_id(1));

    if (outImageCoord.x < width && outImageCoord.y < height)
    {
        int weight = 0;
        float4 outColor = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
        for( int y = startImageCoord.y; y <= endImageCoord.y; y++)
        {
            for( int x = startImageCoord.x; x <= endImageCoord.x; x++)
            {
                outColor += (read_imagef(srcImg, sampler, (int2)(x, y)) * (kernelWeights[weight] / 16.0f));
                weight += 1;
            }
        }

        // Write the output value to image
        write_imagef(dstImg, outImageCoord, outColor);
    }
}

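gaussian_filter() has 5 parameters:

__read_only image2d_t srcImg: The source image object to be filtered.
__write_only image2d_t dstImg: The destination image object, to which the filtered result is written.
sampler_t sampler: The sampler object, which specifies the addressing, coordinate, and filter modes used by read_imagef().
int width, int height: The width and height in pixels of the image to be filtered.
                       Note that the source and destination image objects are the same size.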

The ImageFilter2D program sets the kernel arguments and then queues the kernel for execution, as shown in Listing 8-6. The kernel arguments are set by calling clSetKernelArg() once for each argument. After the arguments are set, the kernel is queued for execution. The localWorkSize is hard-coded to 16 × 16 (it may need to be tuned to the optimal size for the device, but a fixed value is used here for the sake of illustration). The global work size rounds the width and height up to the nearest multiple of localWorkSize. This is necessary because globalWorkSize must be a multiple of localWorkSize. This setup allows the kernel to handle arbitrary image sizes (without requiring the image width and height to be multiples of 16).

Listing 8-6: Queuing the Gaussian kernel for execution
// Set the kernel arguments
errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &imageObjects[0]);
errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &imageObjects[1]);
errNum |= clSetKernelArg(kernel, 2, sizeof(cl_sampler), &sampler);
errNum |= clSetKernelArg(kernel, 3, sizeof(cl_int), &width);
errNum |= clSetKernelArg(kernel, 4, sizeof(cl_int), &height);
if (errNum != CL_SUCCESS)
{
	std::cerr << "Error setting kernel arguments." << std::endl;
	Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
	return 1;
}

size_t localWorkSize[2] = { 16, 16 };
size_t globalWorkSize[2] = { RoundUp(localWorkSize[0], width),
                             RoundUp(localWorkSize[1], height) };

// Queue the kernel up for execution
errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 2, NULL,
	globalWorkSize, localWorkSize,
	0, NULL, NULL);
if (errNum != CL_SUCCESS)
{
	std::cerr << "Error queuing kernel for execution." << std::endl;
	Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
	return 1;
}
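
Listing 8-6 calls a RoundUp() helper whose definition does not appear in this section. A minimal sketch of what such a helper could look like follows; it assumes the argument order used in the listing (group size first, then the size to round), and the implementation itself is my own rather than taken from the sample.

```cpp
#include <cassert>
#include <cstddef>

// Rounds globalSize up to the nearest multiple of groupSize, so that the
// global work size is evenly divisible by the local work size.
inline std::size_t RoundUp(std::size_t groupSize, std::size_t globalSize) {
    assert(groupSize > 0);
    const std::size_t remainder = globalSize % groupSize;
    return remainder == 0 ? globalSize : globalSize + groupSize - remainder;
}
```

With a group size of 16, a 640-pixel-wide image keeps a global size of 640, while a 641-pixel-wide image is rounded up to 656. The extra 15 work-items in that case are the ones rejected by the bounds check at the top of the kernel.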

Looking again at the Gaussian filter kernel in Listing 8-5, the image coordinates are checked to see whether they are within the width and height of the image. This is necessary because the global work size was rounded up. If the image size were known to be a multiple of a certain value, this test could be omitted, but this example is written to handle arbitrary image sizes, so the test is done in the kernel to ensure that reads and writes stay within the image bounds.

The main loop of gaussian_filter() (the nested for loop in Listing 8-5) reads nine values in a 3-by-3 region. Each value read from the image is multiplied by a weighting factor specified in the Gaussian convolution kernel. The result of this operation is to blur the input image. Use the OpenCL C function read_imagef() to read individual values from the image:

read_imagef(srcImg, sampler, (int2)(x, y));

The first argument is the image object, the second argument is the sampler, and the third argument is the image coordinates to use. In this case, the sampler is specified to take non-normalized coordinates, so the (x, y) values are integers in the ranges [0, width - 1] and [0, height - 1]. If the sampler used normalized coordinates, the function call would be the same, but the last argument would be a float2 holding the normalized coordinates. The read_imagef() function returns a float4 color. The range of the color values depends on the format of the image. Here, the image is specified as CL_UNORM_INT8, so the returned color values fall in the floating-point range [0.0, 1.0]. Also, since the image channel order is specified as CL_RGBA, the returned color is read into the (R, G, B, A) components of the result.
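
When debugging a kernel like this, it can help to compare its output against a plain CPU version. The following is a sketch of such a reference, not code from the sample: GaussianFilterReference() is a hypothetical name, it processes a single float channel, and it emulates the clamp-to-edge sampler by clamping the neighbor coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the 3x3 Gaussian filter, applied to one channel of a
// width x height image stored row by row.
inline std::vector<float> GaussianFilterReference(const std::vector<float>& src,
                                                  int width, int height) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    static const float kWeights[9] = {1.0f, 2.0f, 1.0f,
                                      2.0f, 4.0f, 2.0f,
                                      1.0f, 2.0f, 1.0f};
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            int w = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    // Emulate clamp-to-edge addressing for the neighbors.
                    const int sx = std::min(std::max(x + dx, 0), width - 1);
                    const int sy = std::min(std::max(y + dy, 0), height - 1);
                    const float texel =
                        src[static_cast<std::size_t>(sy) * width + sx];
                    sum += texel * (kWeights[w] / 16.0f);
                    ++w;
                }
            }
            dst[static_cast<std::size_t>(y) * width + x] = sum;
        }
    }
    return dst;
}
```

Because the nine weights sum to 16, a uniformly colored image passes through the filter unchanged, which makes a convenient first sanity check.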

Table 5-16 and Table 5-17 list all the functions for reading 2D and 3D images. Which read function to use depends on the channel data type specified for the image, as described earlier in the discussion of image formats. The choice of coordinates (integer non-normalized coordinates or floating-point normalized coordinates) depends on the settings of the sampler passed to the read_image{f|i|ui}() function.
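
As a quick reminder of how the channel data type selects the read function, here is a host-side lookup sketch. ChannelType and ReadFunctionFor() are hypothetical names used only for illustration; the mapping reflects my understanding that normalized and floating-point types go with read_imagef(), signed integer types with read_imagei(), and unsigned integer types with read_imageui().

```cpp
#include <cassert>
#include <string>

// Stand-in enum mirroring the channel data types of Table 8-2; real code
// uses the CL_* constants from the OpenCL headers.
enum class ChannelType {
    SnormInt8, SnormInt16, UnormInt8, UnormInt16,
    SignedInt8, SignedInt16, SignedInt32,
    UnsignedInt8, UnsignedInt16, UnsignedInt32,
    HalfFloat, Float,
    UnormShort565, UnormShort555, UnormInt101010
};

// Returns the name of the OpenCL C read function that matches the type.
inline std::string ReadFunctionFor(ChannelType type) {
    switch (type) {
        case ChannelType::SignedInt8:
        case ChannelType::SignedInt16:
        case ChannelType::SignedInt32:
            return "read_imagei";   // returns an int4
        case ChannelType::UnsignedInt8:
        case ChannelType::UnsignedInt16:
        case ChannelType::UnsignedInt32:
            return "read_imageui";  // returns a uint4
        default:
            return "read_imagef";   // returns a float4
    }
}
```

The ImageFilter2D images use CL_UNORM_INT8, which is why the kernel calls read_imagef() and write_imagef().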

At the end of Listing 8-5, the result of the Gaussian filter kernel is written to the target image:

write_imagef(dstImg, outImageCoord, outColor);

When writing to an image, the coordinates must be integers within the bounds of the image. Image writes take no sampler, because there is no filtering and no addressing mode (the coordinates must be in range), and the coordinates are always non-normalized. Which write_image{f|i|ui}() function to use again depends on the channel format chosen for the destination image. All the functions for writing 2D and 3D images are listed in Table 5-21 and Table 5-22.

Transferring Image Objects

So far we have covered every operation on image objects except moving them around. OpenCL provides the following functions, all of which are placed in a command queue, for transferring image data:

clEnqueueReadImage() reads an image from device memory into host memory.
clEnqueueWriteImage() writes an image from host memory to device memory.
clEnqueueCopyImage() copies one image to another.
clEnqueueCopyImageToBuffer() copies an image object (or part of it) to a generic memory buffer.
clEnqueueCopyBufferToImage() copies a generic memory buffer to an image object (or part of it).
clEnqueueMapImage() maps an image (or part of it) to a host memory pointer.

An image is read from the device into host memory by queuing a clEnqueueReadImage() command:

cl_int clEnqueueReadImage(cl_command_queue command_queue,
                          cl_mem image,
                          cl_bool blocking_read,
                          const size_t origin[3],
                          const size_t region[3],
                          size_t row_pitch,
                          size_t slice_pitch,
                          void * ptr,
                          cl_uint num_events_in_wait_list,
                          const cl_event * event_wait_list,
                          cl_event * event)

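/*
command_queue The command queue in which the read command will be queued.
image A valid image object to read from.

blocking_read If set to CL_TRUE, clEnqueueReadImage blocks until the data has been read into ptr;
              otherwise it returns immediately, and the user must query event to check the status of the command.

origin The (x, y, z) integer coordinates, relative to the image origin, at which to start reading. For a 2D image, the z coordinate is 0.
region The (width, height, depth) of the region to read. For a 2D image, the depth is 1.
row_pitch The number of bytes in each row of the image. If 0, the pitch is taken to be image_width * (bytes_per_pixel).
slice_pitch The number of bytes in each slice of a 3D image. If 0, the pitch is taken to be image_height * image_row_pitch.

ptr A pointer to the host memory into which the data that is read will be written.

num_events_in_wait_list The number of entries in the array event_wait_list.
                        If event_wait_list is NULL, this must be 0; otherwise it must be greater than 0.

event_wait_list If non-NULL, event_wait_list is an array of events associated with OpenCL commands that must complete.
                That is, those commands must be in the CL_COMPLETE state before the read command begins executing.

event If non-NULL, the event corresponding to the read command is returned in this parameter.

*/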
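
The roles of origin, region and row_pitch can be illustrated with a host-only emulation of a 2D region read. This is a sketch under stated assumptions, not what an OpenCL runtime does internally: CopyRegion2D() is a hypothetical name, the source image is assumed to be tightly packed, and a row pitch of 0 is treated as tightly packed rows of the region being read (for a full-image read this equals image_width * bytes_per_pixel).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies the rectangle starting at origin (x, y) with size region
// (width, height) out of a tightly packed source image into dst, placing
// successive rows rowPitch bytes apart.
inline void CopyRegion2D(const unsigned char* src, std::size_t srcWidth,
                         std::size_t bytesPerPixel,
                         const std::size_t origin[2],
                         const std::size_t region[2],
                         std::size_t rowPitch, unsigned char* dst) {
    const std::size_t rowBytes = region[0] * bytesPerPixel;
    if (rowPitch == 0) rowPitch = rowBytes;  // 0 selects tightly packed rows
    assert(rowPitch >= rowBytes);
    assert(origin[0] + region[0] <= srcWidth);
    for (std::size_t y = 0; y < region[1]; ++y) {
        const unsigned char* from =
            src + ((origin[1] + y) * srcWidth + origin[0]) * bytesPerPixel;
        std::memcpy(dst + y * rowPitch, from, rowBytes);
    }
}
```

A non-zero row pitch larger than the row size leaves padding bytes between rows in the destination untouched, which is useful when the host buffer belongs to a library that aligns its scanlines.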

In the ImageFilter2D example, a blocking read is specified when using clEnqueueReadImage() to read an image filtered with a Gaussian kernel back into a host memory buffer. Then use FreeImage to write this buffer to disk as an image file, as shown in Listing 8-7.

Listing 8-7: Reading the image back to host memory
bool SaveImage(char* fileName, char* buffer, int width, int height)
{
	FREE_IMAGE_FORMAT format = FreeImage_GetFIFFromFilename(fileName);
	FIBITMAP* image = FreeImage_ConvertFromRawBits((BYTE*)buffer, width,
		height, width * 4, 32,
		0xFF000000, 0x00FF0000, 0x0000FF00);
	bool saved = (FreeImage_Save(format, image, fileName) == TRUE);
	FreeImage_Unload(image); // Release the bitmap created above
	return saved;
}

// Read the output buffer back to the Host
char* buffer = new char[width * height * 4];
size_t origin[3] = { 0, 0, 0 };
size_t region[3] = { static_cast<size_t>(width), static_cast<size_t>(height), 1 };
errNum = clEnqueueReadImage(commandQueue, imageObjects[1], CL_TRUE,
	origin, region, 0, 0, buffer,
	0, NULL, NULL);
if (errNum != CL_SUCCESS)
{
	std::cerr << "Error reading result buffer." << std::endl;
	Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
	return 1;
}

You can also use clEnqueueWriteImage() to write an image from host memory to device memory.

cl_int clEnqueueWriteImage(cl_command_queue command_queue,
                           cl_mem image,
                           cl_bool blocking_write,
                           const size_t origin[3],
                           const size_t region[3],
                           size_t input_row_pitch,
                           size_t input_slice_pitch,
                           const void * ptr,
                           cl_uint num_events_in_wait_list,
                           const cl_event * event_wait_list,
                           cl_event * event)
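/*
command_queue The command queue in which the write command will be queued.
image A valid image object to write to.

blocking_write If set to CL_TRUE, clEnqueueWriteImage blocks until the data has been written from ptr;
               otherwise it returns immediately, and the user must query event to check the status of the command.

origin The (x, y, z) integer coordinates, relative to the image origin, at which to start writing. For a 2D image, the z coordinate is 0.
region The (width, height, depth) of the region to write. For a 2D image, the depth is 1.
input_row_pitch The number of bytes in each row of the input image.
input_slice_pitch The number of bytes in each slice of the input 3D image. For a 2D image, this value is 0.

ptr A pointer to the host memory from which the data is written.
    Enough storage must be allocated at this pointer to hold the image bytes specified by region.

num_events_in_wait_list The number of entries in the array event_wait_list.
                        If event_wait_list is NULL, this must be 0; otherwise it must be greater than 0.

event_wait_list If non-NULL, event_wait_list is an array of events associated with OpenCL commands that must complete.
                That is, those commands must be in the CL_COMPLETE state before the write command begins executing.

event If non-NULL, the event corresponding to the write command is returned in this parameter.

*/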

It is also possible to copy an image from one image object to another without going through host memory. This is the quickest way to copy the contents of one image object to another. The copy is done using the function clEnqueueCopyImage().

cl_int clEnqueueCopyImage(cl_command_queue command_queue,
                          cl_mem src_image,
                          cl_mem dst_image,
                          const size_t src_origin[3],
                          const size_t dst_origin[3],
                          const size_t region[3],
                          cl_uint num_events_in_wait_list,
                          const cl_event * event_wait_list,
                          cl_event * event)

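/*
command_queue The command queue in which the copy command will be queued.
src_image A valid image object to read from.
dst_image A valid image object to write to.
src_origin The (x, y, z) integer coordinates, relative to the source image origin, at which to start reading.
           For a 2D image, the z coordinate is 0.
dst_origin The (x, y, z) integer coordinates, relative to the destination image origin, at which to start writing.
           For a 2D image, the z coordinate is 0.
region The (width, height, depth) of the region to read and write. For a 2D image, the depth is 1.
num_events_in_wait_list The number of entries in the array event_wait_list.
                        If event_wait_list is NULL, this must be 0; otherwise it must be greater than 0.
event_wait_list If non-NULL, event_wait_list is an array of events associated with OpenCL commands that must complete.
                That is, those commands must be in the CL_COMPLETE state before the copy command begins executing.
event If non-NULL, the event corresponding to the copy command is returned in this parameter.

*/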

The reverse of clEnqueueCopyImageToBuffer() is also possible: copying a generic memory buffer to an image. As with a host memory buffer allocated to hold an image, the source buffer must use the same linear layout. Copying from a buffer to an image is done using clEnqueueCopyBufferToImage().

cl_int clEnqueueCopyBufferToImage(cl_command_queue command_queue,
                                  cl_mem src_buffer,
                                  cl_mem dst_image,
                                  size_t src_offset,
                                  const size_t dst_origin[3],
                                  const size_t region[3],
                                  cl_uint num_events_in_wait_list,
                                  const cl_event * event_wait_list,
                                  cl_event * event)
/*
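command_queue The command queue in which the buffer-to-image copy command will be queued.
src_buffer A valid buffer object to read from.
dst_image A valid image object to write to.
src_offset The starting offset (in bytes) at which to read from the source memory buffer.

dst_origin The (x, y, z) integer coordinates, relative to the origin of the destination image, to write to.
           For a 2D image, the z coordinate is 0.
region The (width, height, depth) of the region to write. For a 2D image, the depth is 1.

num_events_in_wait_list Number of entries in the event_wait_list array.
                        Must be 0 if event_wait_list is NULL; otherwise it must be greater than 0.

event_wait_list If non-NULL, event_wait_list is an array of events associated with OpenCL commands that must complete.
                That is, those commands must be in the CL_COMPLETE state before the copy command begins executing.

event If non-NULL, the event corresponding to the copy command is returned through this parameter.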

*/
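
The linear layout requirement can be written down as index arithmetic: the pixels of the region are consumed from the buffer row by row, slice by slice, starting at src_offset. The following sketch assumes a tightly packed region; SourceByteFor and BufferCoversRegion are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset inside the source buffer that supplies the image pixel at
// dst_origin + (dx, dy, dz), assuming the region is tightly packed.
std::size_t SourceByteFor(std::size_t srcOffset, const std::size_t region[3],
                          std::size_t dx, std::size_t dy, std::size_t dz,
                          std::size_t bytesPerPixel)
{
    assert(dx < region[0] && dy < region[1] && dz < region[2]);
    return srcOffset +
           ((dz * region[1] + dy) * region[0] + dx) * bytesPerPixel;
}

// True if a buffer of bufferSize bytes contains the whole region
// (width * height * depth * bytesPerPixel bytes) starting at srcOffset.
bool BufferCoversRegion(std::size_t bufferSize, std::size_t srcOffset,
                        const std::size_t region[3],
                        std::size_t bytesPerPixel)
{
    const std::size_t needed =
        region[0] * region[1] * region[2] * bytesPerPixel;
    return srcOffset <= bufferSize && needed <= bufferSize - srcOffset;
}
```

Checking BufferCoversRegion on the host before enqueuing the copy is a cheap way to catch a buffer that is too small for the requested region.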

Finally, there is one more way to access the memory of an image object. Like regular buffers, image objects can be mapped directly into host memory. Mapping is done with the function clEnqueueMapImage(), and the image is unmapped with the general buffer function clEnqueueUnmapMemObject().

void * clEnqueueMapImage(cl_command_queue command_queue,
                         cl_mem image,
                         cl_bool blocking_map,
                         cl_map_flags map_flags,
                         const size_t origin[3],
                         const size_t region[3],
                         size_t * image_row_pitch,
                         size_t * image_slice_pitch,
                         cl_uint num_events_in_wait_list,
                         const cl_event * event_wait_list,
                         cl_event * event,
                         cl_int * errcode_ret)
/*
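command_queue The command queue in which the map command will be queued.
image A valid image object (the data will be read from it).

blocking_map If set to CL_TRUE, clEnqueueMapImage blocks until the data is mapped into host memory;
             otherwise it returns immediately, and the user must query event to check the status of the command.

map_flags A bit field that indicates how the region of the image object specified by (origin, region) is mapped.
          The valid values of map_flags are defined by the enumeration cl_map_flags; see Table 7-4.

origin The (x, y, z) integer coordinates, relative to the image origin, to read from. For a 2D image, the z coordinate is 0.
region The (width, height, depth) of the region to read. For a 2D image, the depth is 1.
image_row_pitch If non-NULL, set to the row pitch of the mapped image.
                (The OpenCL specification requires this pointer to be non-NULL.)

image_slice_pitch If non-NULL, set to the slice pitch of the mapped 3D image.
                  For a 2D image, this value is set to 0.

num_events_in_wait_list Number of entries in the event_wait_list array.
                        Must be 0 if event_wait_list is NULL; otherwise it must be greater than 0.

event_wait_list If non-NULL, event_wait_list is an array of events associated with OpenCL commands that must complete.
                That is, those commands must be in the CL_COMPLETE state before the map command begins executing.

event If non-NULL, the event corresponding to the map command is returned through this parameter.
errcode_ret If non-NULL, the error code of the function is returned through this parameter.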
*/

The ImageFilter2D example in this chapter could be modified to read the results back to the host with clEnqueueMapImage() instead of clEnqueueReadImage(). Listing 8-8 shows the changes the sample program needs in order to do so.

Listing 8-8 Mapping the image results to a host memory pointer
//Create the image object. Needs to be
//created with CL_MEM_READ_WRITE rather than
//CL_MEM_WRITE_ONLY since it will need to
//be mapped to the host
imageObjects[1] = clCreateImage2D(context,
		CL_MEM_READ_WRITE,
		&clImageFormat,
		width,
		height,
		0,
		NULL,
		&errNum);

//...Execute the kernel...
//Map the results back to a host buffer
size_t rowPitch = 0;
char * buffer = (char*)clEnqueueMapImage(commandQueue,
                                         imageObjects[1],
                                         CL_TRUE,
                                         CL_MAP_READ,
                                         origin,
                                         region,
                                         &rowPitch,
                                         NULL,
                                         0,
                                         NULL,
                                         NULL,
                                         &errNum);
if(errNum != CL_SUCCESS)
{
    std::cerr << "Error mapping result buffer." << std::endl;
    Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
    return 1;
}
// Save the image out to disk.
// Note: SaveImage() as defined in this chapter assumes tightly packed
// rows, that is, rowPitch == width * 4.
if (!SaveImage(argv[2], buffer, width, height))
{
    std::cerr << "Error writing output image: " << argv[2] << std::endl;
    // buffer is a mapped pointer owned by the OpenCL runtime,
    // so it must not be released with delete[]
    Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
    return 1;
}
//Unmap the image buffer
errNum = clEnqueueUnmapMemObject(commandQueue,
                                 imageObjects[1],
                                 buffer,
                                 0,
                                 NULL,
                                 NULL);
if(errNum != CL_SUCCESS)
{
    std::cerr << "Error unmapping result buffer." << std::endl;
    Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
    return 1;
}

This time the result image object is created with the memory flag CL_MEM_READ_WRITE (instead of the original CL_MEM_WRITE_ONLY). This is necessary because clEnqueueMapImage() is called with CL_MAP_READ as the map flag, which allows the host to read the contents of the returned buffer. Another change is that the row pitch must be read back explicitly and cannot be assumed to equal width * bytesPerPixel. Finally, the host pointer must be unmapped with clEnqueueUnmapMemObject() to release its resources.
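
Because the returned row pitch may be larger than width * bytesPerPixel, code that expects tightly packed rows needs the rows repacked first. The sketch below shows one way to do that in plain C++; PackRows is a hypothetical helper and not part of the sample program or of OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy `height` rows of `width * bytesPerPixel` bytes out of a mapped
// region whose rows start `rowPitch` bytes apart, producing a tightly
// packed buffer. Hypothetical helper for illustration.
std::vector<char> PackRows(const char* mapped, std::size_t rowPitch,
                           std::size_t width, std::size_t height,
                           std::size_t bytesPerPixel)
{
    const std::size_t rowBytes = width * bytesPerPixel;
    assert(rowPitch >= rowBytes); // padding can only make rows longer

    std::vector<char> packed(rowBytes * height);
    for (std::size_t y = 0; y < height; ++y) {
        // Skip the per-row padding in the source, keep only pixel bytes.
        std::memcpy(packed.data() + y * rowBytes,
                    mapped + y * rowPitch,
                    rowBytes);
    }
    return packed;
}
```

In Listing 8-8 this could be applied as PackRows(buffer, rowPitch, width, height, 4), passing the data() pointer of the result to SaveImage() instead of the mapped pointer. The cost is one extra host-side copy, which is only needed when rowPitch differs from width * 4.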

There is also an important performance issue to understand about copying and mapping image data: the OpenCL specification does not mandate the internal storage layout of images. That is, although the image looks like a linear buffer on the host, the OpenCL implementation can store it internally in a non-linear format. Most commonly, an implementation tiles the image data for optimal access by the hardware. This tiled format is opaque (and often proprietary), and users of the OpenCL implementation can neither see nor access the tiled buffers. From a performance perspective, however, this means that when reading, writing, or mapping images to and from the host, the implementation may need to re-tile the data to match its internal format. The cost depends heavily on the underlying OpenCL hardware, but it is important to be aware of it and to limit such tiling/untiling operations to cases where they are absolutely necessary.
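
To give a feeling for what tiling means, the sketch below compares a linear pixel index with the index in a simple 4 x 4 tiled layout. This layout is purely an assumption made for illustration: real implementations use their own vendor-specific formats, which are not exposed to the application.

```cpp
#include <cassert>
#include <cstddef>

// Edge length of a square tile, in pixels (hypothetical value).
const std::size_t kTile = 4;

// Index of pixel (x, y) in an ordinary row-major (linear) layout.
std::size_t LinearIndex(std::size_t x, std::size_t y, std::size_t width)
{
    return y * width + x;
}

// Index of pixel (x, y) when the image is stored tile by tile, with the
// pixels inside each tile stored row-major. Assumes width is a multiple
// of kTile.
std::size_t TiledIndex(std::size_t x, std::size_t y, std::size_t width)
{
    assert(width % kTile == 0);
    const std::size_t tilesPerRow = width / kTile;
    const std::size_t tile = (y / kTile) * tilesPerRow + (x / kTile);
    const std::size_t within = (y % kTile) * kTile + (x % kTile);
    return tile * kTile * kTile + within;
}
```

For a 16-pixel-wide image, the pixel directly below (0, 0) sits 16 elements away in the linear layout but only 4 elements away in this tiled layout, which is why tiling helps 2D-local access patterns such as filtering. Converting between the two layouts touches every pixel, which is the re-tiling cost described above.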

Gaussian filter kernel example

The ImageFilter2D sample program first loads a 2D image from a file (such as .png, .bmp, etc.), and stores the image bits in a 2D image object. This program also creates another 2D image object that will store the result of running a Gaussian blur filter on the input image. This program queues the kernel for execution, then reads the image from the OpenCL device back to a host memory buffer. Finally, the contents of this host memory buffer are written to a file.

//main.cpp

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <string.h>

#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif

#include "FreeImage.h"
#pragma warning(disable : 4996)

///
//  Create an OpenCL context on the first available platform using
//  either a GPU or CPU depending on what is available.
//
cl_context CreateContext()
{
	cl_int errNum;
	cl_uint numPlatforms;
	cl_platform_id firstPlatformId;
	cl_context context = NULL;

	// First, select an OpenCL platform to run on.  For this example, we
	// simply choose the first available platform.  Normally, you would
	// query for all available platforms and select the most appropriate one.
	errNum = clGetPlatformIDs(1, &firstPlatformId, &numPlatforms);
	if (errNum != CL_SUCCESS || numPlatforms <= 0)
	{
		std::cerr << "Failed to find any OpenCL platforms." << std::endl;
		return NULL;
	}

	// Next, create an OpenCL context on the platform.  Attempt to
	// create a GPU-based context, and if that fails, try to create
	// a CPU-based context.
	cl_context_properties contextProperties[] =
	{
		CL_CONTEXT_PLATFORM,
		(cl_context_properties)firstPlatformId,
		0
	};
	context = clCreateContextFromType(contextProperties, CL_DEVICE_TYPE_GPU,
		NULL, NULL, &errNum);
	if (errNum != CL_SUCCESS)
	{
		std::cout << "Could not create GPU context, trying CPU..." << std::endl;
		context = clCreateContextFromType(contextProperties, CL_DEVICE_TYPE_CPU,
			NULL, NULL, &errNum);
		if (errNum != CL_SUCCESS)
		{
			std::cerr << "Failed to create an OpenCL GPU or CPU context." << std::endl;
			return NULL;
		}
	}

	return context;
}

///
//  Create a command queue on the first device available on the
//  context
//
cl_command_queue CreateCommandQueue(cl_context context, cl_device_id* device)
{
	cl_int errNum;
	cl_device_id* devices;
	cl_command_queue commandQueue = NULL;
	size_t deviceBufferSize = 0;

	// First get the size of the devices buffer
	errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &deviceBufferSize);
	if (errNum != CL_SUCCESS)
	{
		std::cerr << "Failed call to clGetContextInfo(...,CL_CONTEXT_DEVICES,...)";
		return NULL;
	}

	// deviceBufferSize is unsigned, so compare against zero directly
	if (deviceBufferSize == 0)
	{
		std::cerr << "No devices available.";
		return NULL;
	}

	// Allocate memory for the devices buffer
	devices = new cl_device_id[deviceBufferSize / sizeof(cl_device_id)];
	errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES, deviceBufferSize, devices, NULL);
	if (errNum != CL_SUCCESS)
	{
		std::cerr << "Failed to get device IDs";
		delete[] devices;
		return NULL;
	}

	// In this example, we just choose the first available device.  In a
	// real program, you would likely use all available devices or choose
	// the highest performance device based on OpenCL device queries
	commandQueue = clCreateCommandQueue(context, devices[0], 0, NULL);
	if (commandQueue == NULL)
	{
		std::cerr << "Failed to create commandQueue for device 0";
		delete[] devices;
		return NULL;
	}

	*device = devices[0];
	delete[] devices;
	return commandQueue;
}

///
//  Create an OpenCL program from the kernel source file
//
cl_program CreateProgram(cl_context context, cl_device_id device, const char* fileName)
{
	cl_int errNum;
	cl_program program;

	std::ifstream kernelFile(fileName, std::ios::in);
	if (!kernelFile.is_open())
	{
		std::cerr << "Failed to open file for reading: " << fileName << std::endl;
		return NULL;
	}

	std::ostringstream oss;
	oss << kernelFile.rdbuf();

	std::string srcStdStr = oss.str();
	const char* srcStr = srcStdStr.c_str();
	program = clCreateProgramWithSource(context, 1,
		(const char**)&srcStr,
		NULL, NULL);
	if (program == NULL)
	{
		std::cerr << "Failed to create CL program from source." << std::endl;
		return NULL;
	}

	errNum = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
	if (errNum != CL_SUCCESS)
	{
		// Determine the reason for the error
		char buildLog[16384];
		clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
			sizeof(buildLog), buildLog, NULL);

		std::cerr << "Error in kernel: " << std::endl;
		std::cerr << buildLog;
		clReleaseProgram(program);
		return NULL;
	}

	return program;
}


///
//  Cleanup any created OpenCL resources
//
void Cleanup(cl_context context, cl_command_queue commandQueue,
	cl_program program, cl_kernel kernel, cl_mem imageObjects[2],
	cl_sampler sampler)
{
	for (int i = 0; i < 2; i++)
	{
		if (imageObjects[i] != 0)
			clReleaseMemObject(imageObjects[i]);
	}
	if (commandQueue != 0)
		clReleaseCommandQueue(commandQueue);

	if (kernel != 0)
		clReleaseKernel(kernel);

	if (program != 0)
		clReleaseProgram(program);

	if (sampler != 0)
		clReleaseSampler(sampler);

	if (context != 0)
		clReleaseContext(context);

}

///
//  Load an image using the FreeImage library and create an OpenCL
//  image out of it
//
cl_mem LoadImage(cl_context context, char* fileName, int& width, int& height)
{
	FREE_IMAGE_FORMAT format = FreeImage_GetFileType(fileName, 0);
	FIBITMAP* image = FreeImage_Load(format, fileName);
	if (image == NULL)
	{
		// The file is missing or could not be decoded
		return 0;
	}

	// Convert to 32-bit image
	FIBITMAP* temp = image;
	image = FreeImage_ConvertTo32Bits(image);
	FreeImage_Unload(temp);

	width = FreeImage_GetWidth(image);
	height = FreeImage_GetHeight(image);

	char* buffer = new char[width * height * 4];
	memcpy(buffer, FreeImage_GetBits(image), width * height * 4);

	FreeImage_Unload(image);

	// Create OpenCL image
	cl_image_format clImageFormat;
	clImageFormat.image_channel_order = CL_RGBA;
	clImageFormat.image_channel_data_type = CL_UNORM_INT8;

	cl_int errNum;
	cl_mem clImage;
	clImage = clCreateImage2D(context,
		CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
		&clImageFormat,
		width,
		height,
		0,
		buffer,
		&errNum);

	// CL_MEM_COPY_HOST_PTR copies the pixels during creation, so the
	// host-side staging buffer can be released right away
	delete[] buffer;

	if (errNum != CL_SUCCESS)
	{
		std::cerr << "Error creating CL image object" << std::endl;
		return 0;
	}

	return clImage;
}

///
//  Save an image using the FreeImage library
//
bool SaveImage(char* fileName, char* buffer, int width, int height)
{
	FREE_IMAGE_FORMAT format = FreeImage_GetFIFFromFilename(fileName);
	FIBITMAP* image = FreeImage_ConvertFromRawBits((BYTE*)buffer, width,
		height, width * 4, 32,
		0xFF000000, 0x00FF0000, 0x0000FF00);
	bool saved = (FreeImage_Save(format, image, fileName) == TRUE);
	// Release the FreeImage bitmap created above
	FreeImage_Unload(image);
	return saved;
}

///
//  Round up to the nearest multiple of the group size
//
size_t RoundUp(int groupSize, int globalSize)
{
	int r = globalSize % groupSize;
	if (r == 0)
	{
		return globalSize;
	}
	else
	{
		return globalSize + groupSize - r;
	}
}

///
//	main() for ImageFilter2D example
//
int main(int argc, char** argv)
{
	cl_context context = 0;
	cl_command_queue commandQueue = 0;
	cl_program program = 0;
	cl_device_id device = 0;
	cl_kernel kernel = 0;
	cl_mem imageObjects[2] = { 0, 0 };
	cl_sampler sampler = 0;
	cl_int errNum;

	// Use the command-line arguments when both file names are given
	// (<inputImageFile> <outputImageFile>); otherwise fall back to the
	// hard-coded paths below. Writing into argv[1] and argv[2] directly
	// would be out of bounds when fewer than two arguments are passed.
	std::string src_name = "C://Users//qzh//source//repos//ConsoleApplication5//x64//Debug//picture.jpeg";
	std::string dst_name = "C://Users//qzh//source//repos//ConsoleApplication5//x64//1.png";
	char* defaultArgs[3] = { argv[0], &src_name[0], &dst_name[0] };
	if (argc != 3)
	{
		argv = defaultArgs;
	}

	// Create an OpenCL context on first available platform
	context = CreateContext();
	if (context == NULL)
	{
		std::cerr << "Failed to create OpenCL context." << std::endl;
		return 1;
	}

	// Create a command-queue on the first device available
	// on the created context
	commandQueue = CreateCommandQueue(context, &device);
	if (commandQueue == NULL)
	{
		Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
		return 1;
	}

	// Make sure the device supports images, otherwise exit
	cl_bool imageSupport = CL_FALSE;
	clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(cl_bool),
		&imageSupport, NULL);
	if (imageSupport != CL_TRUE)
	{
		std::cerr << "OpenCL device does not support images." << std::endl;
		Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
		return 1;
	}

	// Load input image from file and load it into
	// an OpenCL image object
	int width, height;
	imageObjects[0] = LoadImage(context, argv[1], width, height);
	if (imageObjects[0] == 0)
	{
		std::cerr << "Error loading: " << std::string(argv[1]) << std::endl;
		Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
		return 1;
	}

	// Create output image object
	cl_image_format clImageFormat;
	clImageFormat.image_channel_order = CL_RGBA;
	clImageFormat.image_channel_data_type = CL_UNORM_INT8;
	imageObjects[1] = clCreateImage2D(context,
		CL_MEM_WRITE_ONLY,
		&clImageFormat,
		width,
		height,
		0,
		NULL,
		&errNum);

	if (errNum != CL_SUCCESS)
	{
		std::cerr << "Error creating CL output image object." << std::endl;
		Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
		return 1;
	}


	// Create sampler for sampling image object
	sampler = clCreateSampler(context,
		CL_FALSE, // Non-normalized coordinates
		CL_ADDRESS_CLAMP_TO_EDGE,
		CL_FILTER_NEAREST,
		&errNum);

	if (errNum != CL_SUCCESS)
	{
		std::cerr << "Error creating CL sampler object." << std::endl;
		Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
		return 1;
	}

	// Create OpenCL program
	program = CreateProgram(context, device, "ImageFilter2D.cl");
	if (program == NULL)
	{
		Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
		return 1;
	}

	// Create OpenCL kernel
	kernel = clCreateKernel(program, "gaussian_filter", NULL);
	if (kernel == NULL)
	{
		std::cerr << "Failed to create kernel" << std::endl;
		Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
		return 1;
	}

	// Set the kernel arguments
	errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &imageObjects[0]);
	errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &imageObjects[1]);
	errNum |= clSetKernelArg(kernel, 2, sizeof(cl_sampler), &sampler);
	errNum |= clSetKernelArg(kernel, 3, sizeof(cl_int), &width);
	errNum |= clSetKernelArg(kernel, 4, sizeof(cl_int), &height);
	if (errNum != CL_SUCCESS)
	{
		std::cerr << "Error setting kernel arguments." << std::endl;
		Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
		return 1;
	}

	size_t localWorkSize[2] = { 16, 16 };
	size_t globalWorkSize[2] = { RoundUp(localWorkSize[0], width),
								  RoundUp(localWorkSize[1], height) };

	// Queue the kernel up for execution
	errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 2, NULL,
		globalWorkSize, localWorkSize,
		0, NULL, NULL);
	if (errNum != CL_SUCCESS)
	{
		std::cerr << "Error queuing kernel for execution." << std::endl;
		Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
		return 1;
	}

	// Read the output buffer back to the Host
	char* buffer = new char[width * height * 4];
	size_t origin[3] = { 0, 0, 0 };
	// Explicit casts avoid a narrowing conversion from int inside the braces
	size_t region[3] = { (size_t)width, (size_t)height, 1 };
	errNum = clEnqueueReadImage(commandQueue, imageObjects[1], CL_TRUE,
		origin, region, 0, 0, buffer,
		0, NULL, NULL);
	if (errNum != CL_SUCCESS)
	{
		std::cerr << "Error reading result buffer." << std::endl;
		Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
		delete[] buffer;
		return 1;
	}

	std::cout << std::endl;
	std::cout << "Executed program successfully." << std::endl;

	// Save the image out to disk
	if (!SaveImage(argv[2], buffer, width, height))
	{
		std::cerr << "Error writing output image: " << argv[2] << std::endl;
		Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
		delete[] buffer;
		return 1;
	}

	delete[] buffer;
	Cleanup(context, commandQueue, program, kernel, imageObjects, sampler);
	return 0;
}
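
The global work size in main() is rounded up to a multiple of the 16 x 16 work-group size, so some work-items fall outside the image; this is why the gaussian_filter kernel compares its coordinates against width and height before writing. The sketch below reproduces the rounding in standalone form and counts those extra work-items. RoundUpToMultiple and PaddingWorkItems are hypothetical names used only here.

```cpp
#include <cstddef>

// Smallest multiple of groupSize that is >= globalSize
// (same arithmetic as RoundUp in the sample, using size_t throughout).
std::size_t RoundUpToMultiple(std::size_t groupSize, std::size_t globalSize)
{
    const std::size_t r = globalSize % groupSize;
    return r == 0 ? globalSize : globalSize + groupSize - r;
}

// Number of work-items launched beyond the image bounds, which the
// kernel has to skip with its width/height check.
std::size_t PaddingWorkItems(std::size_t width, std::size_t height,
                             std::size_t localX, std::size_t localY)
{
    const std::size_t globalX = RoundUpToMultiple(localX, width);
    const std::size_t globalY = RoundUpToMultiple(localY, height);
    return globalX * globalY - width * height;
}
```

For a 500 x 300 image and 16 x 16 work-groups, the global size becomes 512 x 304, so 5648 of the 155648 work-items do nothing. That overhead is small compared with the alternative of choosing a work-group size that divides the image dimensions exactly.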

//ImageFilter2D.cl
// Gaussian filter of image

__kernel void gaussian_filter(__read_only image2d_t srcImg,
                              __write_only image2d_t dstImg,
                              sampler_t sampler,
                              int width, int height)
{
    // Gaussian Kernel is:
    // 1  2  1
    // 2  4  2
    // 1  2  1
    float kernelWeights[9] = { 1.0f, 2.0f, 1.0f,
                               2.0f, 4.0f, 2.0f,
                               1.0f, 2.0f, 1.0f };

    int2 startImageCoord = (int2) (get_global_id(0) - 1, get_global_id(1) - 1);
    int2 endImageCoord   = (int2) (get_global_id(0) + 1, get_global_id(1) + 1);
    int2 outImageCoord = (int2) (get_global_id(0), get_global_id(1));

    if (outImageCoord.x < width && outImageCoord.y < height)
    {
        int weight = 0;
        float4 outColor = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
        for( int y = startImageCoord.y; y <= endImageCoord.y; y++)
        {
            for( int x = startImageCoord.x; x <= endImageCoord.x; x++)
            {
                outColor += (read_imagef(srcImg, sampler, (int2)(x, y)) * (kernelWeights[weight] / 16.0f));
                weight += 1;
            }
        }

        // Write the output value to image
        write_imagef(dstImg, outImageCoord, outColor);
    }
}
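
A CPU reference is a convenient way to check the kernel's output. The sketch below applies the same 3 x 3 weights with clamp-to-edge addressing, matching the CL_ADDRESS_CLAMP_TO_EDGE sampler created in main(). It is a simplification under stated assumptions: it processes one float channel per pixel instead of a float4, and GaussianBlurReference is a hypothetical helper, not part of the sample.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Single-channel CPU version of the 3x3 Gaussian filter. Coordinates
// outside the image are clamped to the nearest edge pixel.
std::vector<float> GaussianBlurReference(const std::vector<float>& src,
                                         int width, int height)
{
    static const float weights[9] = { 1.0f, 2.0f, 1.0f,
                                      2.0f, 4.0f, 2.0f,
                                      1.0f, 2.0f, 1.0f };
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            int w = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    // Clamp-to-edge addressing
                    const int sx = std::min(std::max(x + dx, 0), width - 1);
                    const int sy = std::min(std::max(y + dy, 0), height - 1);
                    sum += src[static_cast<std::size_t>(sy) * width + sx] *
                           (weights[w] / 16.0f);
                    ++w;
                }
            }
            dst[static_cast<std::size_t>(y) * width + x] = sum;
        }
    }
    return dst;
}
```

Because the weights sum to 16 and are divided by 16, a constant image is left unchanged, which makes a quick sanity test. Comparing against the GPU result should allow for a small tolerance, since the kernel's output is stored in 8-bit normalized channels.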

For how to configure FreeImage.h and FreeImaged.dll, and for other related issues, see the references.

References

https://blog.csdn.net/qq_36314864/article/details/132041933