Qualcomm_Mobile_OpenCL.pdf translation, part 2

2 Introduction to OpenCL

         This chapter covers the key concepts of the OpenCL standard and the basics of developing OpenCL programs on mobile platforms. For more detail on OpenCL itself, see "The OpenCL Specification" in the references. Developers who already have basic OpenCL knowledge and experience can skip this chapter and go straight to the next one.

2.1 OpenCL background and overview

         OpenCL is an open, royalty-free standard for cross-platform parallel programming of heterogeneous systems, developed and maintained by the Khronos Group. It is designed to help developers exploit the large computing power of modern heterogeneous systems and to make cross-platform application development easier.

         On Snapdragon platforms, the Qualcomm Adreno GPU series was among the earliest mobile GPUs to fully support OpenCL.

        

Figure 2-1 Heterogeneous system using OpenCL

 

         Figure 2-1 shows a typical heterogeneous system that supports OpenCL. The system has three main parts:

  • A host CPU, which acts as the director/controller that manages and controls the application.
  • One or more OpenCL devices, such as a GPU, DSP, FPGA, or other hardware accelerator.
  • Kernel code, which the host compiles and downloads to the OpenCL devices for execution.

2.2 OpenCL on mobile

         In recent years, mobile systems-on-chip (SoCs) have advanced significantly in computing power, complexity, and functionality. The GPUs in mobile SoCs (mobile GPUs) are very powerful: in raw computing power, some top-tier mobile GPUs reach the level of console or discrete GPUs (desktop GPUs).

         Developers therefore face a challenge: how to use this GPU computing power effectively, and how to develop applications quickly without knowing the low-level implementation details of the GPU, while keeping the application compatible across different SoCs.

         OpenCL was created to solve these problems. Its cross-platform support lets developers readily use the computing power of mobile SoCs. With OpenCL, many advanced use cases become practical on mobile SoCs, such as image/audio processing, computer vision, and machine vision.

         At Qualcomm, OpenCL on Adreno GPUs has successfully accelerated many use cases, with excellent performance, power efficiency, and portability. For applications developed on Snapdragon SoCs, using OpenCL to accelerate work on the GPU is strongly recommended.

 

2.3 OpenCL standard

         The OpenCL standard covers two main aspects: the OpenCL runtime API and the OpenCL C language specification (the language used in .cl files). The API defines a set of functions that run on the host and handle resource management, kernel dispatch (sending kernel functions to run on OpenCL devices), and many other tasks. The OpenCL C language is used to write kernel functions, which run on the OpenCL devices (see Figure 2-1). The API and the C language are described in the following sections.

         (Referring to Figure 2-1: the API defined by OpenCL runs on the host CPU, which splits the work into kernel functions and dispatches them to run on the OpenCL devices.)

        

2.3.1 OpenCL API functions

         OpenCL API functions fall into two layers: the platform layer and the runtime layer. Table 2-1 and Table 2-2 summarize the high-level functions of the platform layer and the runtime layer, respectively.

 

Table 2-1 OpenCL platform layer functions

Function | Description
Platform discovery | Checks whether an OpenCL platform is available.
Device discovery | Finds the available OpenCL devices, such as a GPU, CPU, or other device.
Device information query | Queries OpenCL device information, including global memory size, local memory size, maximum work-group size, and so on. Also checks which extension functions the device supports (functions defined as extensions to the OpenCL standard).
Context | Context management, such as creating, retaining, and releasing a context.
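
         As a rough illustration of the platform-layer steps in Table 2-1, the following C sketch discovers a platform and a GPU device, queries a few device properties, and creates and releases a context. It assumes an OpenCL SDK whose header is available as `CL/cl.h` (some systems use a different path) and simply takes the first platform and first GPU it finds. A real application would likely enumerate all of them and check every return code.

```c
#include <stdio.h>
#include <CL/cl.h>   /* assumption: header path used by most OpenCL SDKs */

int main(void)
{
    /* Platform discovery: take the first available platform. */
    cl_platform_id platform;
    cl_uint num_platforms = 0;
    if (clGetPlatformIDs(1, &platform, &num_platforms) != CL_SUCCESS ||
        num_platforms == 0) {
        fprintf(stderr, "no OpenCL platform found\n");
        return 1;
    }

    /* Device discovery: ask for one GPU device on that platform. */
    cl_device_id device;
    cl_uint num_devices = 0;
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device,
                       &num_devices) != CL_SUCCESS || num_devices == 0) {
        fprintf(stderr, "no OpenCL GPU device found\n");
        return 1;
    }

    /* Device information queries (return codes ignored for brevity). */
    char name[256] = "";
    char profile[64] = "";
    cl_ulong global_mem = 0, local_mem = 0;
    size_t max_wg_size = 0;
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof name, name, NULL);
    clGetDeviceInfo(device, CL_DEVICE_PROFILE, sizeof profile, profile, NULL);
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof global_mem, &global_mem, NULL);
    clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,
                    sizeof local_mem, &local_mem, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof max_wg_size, &max_wg_size, NULL);
    printf("device: %s (%s)\n", name, profile);
    printf("global memory: %llu bytes\n", (unsigned long long)global_mem);
    printf("local memory: %llu bytes\n", (unsigned long long)local_mem);
    printf("max work-group size: %zu\n", max_wg_size);

    /* Context management: create a context for the device, then release it. */
    cl_int err = CL_SUCCESS;
    cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clCreateContext failed: %d\n", (int)err);
        return 1;
    }
    clReleaseContext(context);
    return 0;
}
```

         Building this typically requires linking against the vendor's OpenCL library (for example with `-lOpenCL`). The exact build setup depends on the SDK and is not covered here.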

 

        

Table 2-2 OpenCL runtime layer functions

Function | Description
Command queue management | Command queues are used for communication between an OpenCL device (such as a GPU) and the host (such as the main CPU). An application can have multiple queues.
Create and build OpenCL programs and kernels (compile the .cl file) | Checks whether the kernel was downloaded and built correctly.
Prepare data for the kernel to execute; create and initialize memory objects | Which memory flags to use (read-only, write-only, and so on)? Should zero-copy memory be created (zero copy is explained in detail in Chapter 7)?
Create a kernel call and submit it to the corresponding OpenCL device | How should work-groups be used?
Synchronization | Memory synchronization (wait for the OpenCL work to finish before copying the results).
Resource management | Delivers the results (copies them from the OpenCL device to the host) and releases resources.

 

         Understanding these two layers is a basic requirement for writing applications against the OpenCL API. See the references for more details.
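
         The runtime-layer steps in Table 2-2 can be sketched as one small host program in C. This is only an outline under several assumptions: the header is `CL/cl.h`, the first platform and its default device are used, the `vec_add` kernel and the `CHECK` macro are names made up for this example, and `clCreateCommandQueue` (deprecated since OpenCL 2.0) is used for brevity with the corresponding macro defined.

```c
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS  /* clCreateCommandQueue is a 1.2-era API */
#include <stdio.h>
#include <CL/cl.h>   /* assumption: header path used by most OpenCL SDKs */

/* Hypothetical helper for this sketch: bail out of main() on any error. */
#define CHECK(e) do { if ((e) != CL_SUCCESS) { \
    fprintf(stderr, "OpenCL error %d at line %d\n", (int)(e), __LINE__); \
    return 1; } } while (0)

#define N 1024

/* Kernel source, normally kept in a .cl file. */
static const char *kSource =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *c)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void)
{
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * (float)i; }

    cl_int err;
    cl_platform_id platform;
    cl_device_id device;
    CHECK(clGetPlatformIDs(1, &platform, NULL));
    CHECK(clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL));
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    CHECK(err);

    /* Command queue: the channel between host and device. */
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);
    CHECK(err);

    /* Create and build the program, then check that it built correctly. */
    cl_program program = clCreateProgramWithSource(ctx, 1, &kSource, NULL, &err);
    CHECK(err);
    if (clBuildProgram(program, 1, &device, NULL, NULL, NULL) != CL_SUCCESS) {
        char log[4096] = "";  /* logs longer than this are not retrieved here */
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                              sizeof log, log, NULL);
        fprintf(stderr, "build failed:\n%s\n", log);
        return 1;
    }
    cl_kernel kernel = clCreateKernel(program, "vec_add", &err);
    CHECK(err);

    /* Memory objects: read-only inputs copied from host, write-only output. */
    cl_mem buf_a = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                  sizeof a, a, &err);
    CHECK(err);
    cl_mem buf_b = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                  sizeof b, b, &err);
    CHECK(err);
    cl_mem buf_c = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, &err);
    CHECK(err);

    /* Kernel call: set arguments and submit N work-items; the runtime
     * chooses the work-group size because local_work_size is NULL. */
    CHECK(clSetKernelArg(kernel, 0, sizeof buf_a, &buf_a));
    CHECK(clSetKernelArg(kernel, 1, sizeof buf_b, &buf_b));
    CHECK(clSetKernelArg(kernel, 2, sizeof buf_c, &buf_c));
    size_t global_size = N;
    CHECK(clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL,
                                 0, NULL, NULL));

    /* Synchronize, then copy the result back with a blocking read. */
    CHECK(clFinish(queue));
    CHECK(clEnqueueReadBuffer(queue, buf_c, CL_TRUE, 0, sizeof c, c,
                              0, NULL, NULL));
    printf("c[10] = %.1f\n", c[10]);

    /* Resource management: release everything that was created. */
    clReleaseMemObject(buf_a);
    clReleaseMemObject(buf_b);
    clReleaseMemObject(buf_c);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```

         The sketch copies the input and output buffers explicitly and leaves the work-group size to the runtime. Both are simple defaults, not tuning advice for any particular GPU.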

 

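2.3.2  OpenCL C language (the rules for writing .cl files)

         As a subset of the C99 standard, the OpenCL C language is used to write kernel functions that can be compiled for and run on a device (from here on, "device" means an OpenCL device). Developers with C programming experience can get started with OpenCL C quickly. However, to avoid some common mistakes, it is essential to understand the differences between the C99 standard and OpenCL C. Two key differences are:

  •   Because of hardware limitations and the OpenCL execution model, some C99 features are not supported in OpenCL, such as function pointers and dynamic memory allocation (malloc/calloc, etc.).
  •   OpenCL C extends the C99 standard in some respects to better serve the programming model and ease development. For example:
    • OpenCL adds built-in functions for querying the execution parameters of an OpenCL kernel.
    • To make better use of GPU hardware, it adds image load and store functions.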
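
         To make the built-in query functions concrete, here is a sketch of a kernel that uses `get_global_id` and `get_global_size`, arranged so that it also compiles as plain C for experimentation without an OpenCL runtime. The stand-in macros, the `emu_` variables, and `emu_launch_reverse` are assumptions made for this emulation and are not part of OpenCL. On a real device the built-ins are provided by the OpenCL C compiler and work-items run in parallel, not in a loop. The kernel uses the unprefixed `kernel` and `global` spellings, which OpenCL C accepts alongside `__kernel` and `__global`. Image built-ins are not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the kernel below compiles as plain C. These are NOT the real
 * OpenCL definitions, only a serial emulation for illustration. */
#define kernel
#define global
static size_t emu_id;    /* index of the work-item currently being emulated */
static size_t emu_size;  /* total number of emulated work-items */
static size_t get_global_id(unsigned dim)   { assert(dim == 0); return emu_id; }
static size_t get_global_size(unsigned dim) { assert(dim == 0); return emu_size; }

/* Kernel body as it could appear in a .cl file: each work-item moves one
 * element, and there is no malloc and no function pointer. */
kernel void reverse(global const float *in, global float *out)
{
    size_t i = get_global_id(0);    /* built-in: which work-item is this? */
    size_t n = get_global_size(0);  /* built-in: how many work-items exist? */
    out[n - 1 - i] = in[i];
}

/* Emulated one-dimensional launch: run the kernel once per work-item. */
void emu_launch_reverse(size_t n, const float *in, float *out)
{
    emu_size = n;
    for (emu_id = 0; emu_id < n; ++emu_id)
        reverse(in, out);
}
```

         Calling `emu_launch_reverse(4, in, out)` with `in = {1, 2, 3, 4}` fills `out` with `{4, 3, 2, 1}`, which mirrors what a one-dimensional launch with a global size of 4 would compute on a device.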

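2.3.3  OpenCL versions and profiles

         The current OpenCL v2.2 and provisional SPIR-V 1.2 specifications include many improved features. See the references for more details.

         OpenCL defines two profiles: the embedded profile and the full profile. The embedded profile is mainly for mobile devices, which, compared with traditional computing devices such as desktop GPUs, have lower computational precision and fewer hardware features. The references list the main differences between the embedded and full profiles.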

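2.4 OpenCL portability and backward compatibility

2.4.1 Program portability

         As a strictly defined computing standard, OpenCL has good portability. An OpenCL program written for one vendor's platform should run well on another vendor's platform, provided it does not use any vendor-specific or platform-specific extensions or features.

         The compatibility of OpenCL programs is ensured by Khronos's conformance process: a vendor that claims to conform to the OpenCL standard must pass rigorous conformance tests on its platform.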

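2.4.2 Performance portability

         Unlike program portability, OpenCL performance is not portable. OpenCL is a high-level computing standard, and its hardware implementation is left to the vendor. Different hardware vendors have different hardware architectures, each with its own strengths and weaknesses. As a result, an OpenCL application developed and optimized for one vendor's platform may not achieve the same performance on another vendor's platform.

         Even within a single vendor, different GPU generations differ in microarchitecture and features, which can lead to significant performance differences for the same OpenCL program. Programs optimized for an older generation of hardware therefore often need some tuning to make full use of the computing power of a newer generation.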

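2.4.3 Backward compatibility

         OpenCL is designed to be fully backward compatible, so that code written for an older OpenCL version runs without problems on a newer version. Note, however, that some API functions are deprecated in newer versions. A program that includes the OpenCL 2.x header files and uses APIs from OpenCL 1.1 or OpenCL 1.2 that have since been deprecated needs to define the macro CL_USE_DEPRECATED_OPENCL_1_1_APIS or CL_USE_DEPRECATED_OPENCL_1_2_APIS.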
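
         A minimal sketch of how such a macro is typically used, assuming the headers are installed as `CL/cl.h`. The macro has to be defined before the header is included, or passed on the compiler command line (for example `-DCL_USE_DEPRECATED_OPENCL_1_2_APIS`). `clCreateCommandQueue` is mentioned only as one example of a function deprecated as of OpenCL 2.0.

```c
/* Define before the first OpenCL include so the header does not mark the
 * OpenCL 1.2 entry points as deprecated. */
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS
#include <CL/cl.h>

/* Code below may now call, for example, clCreateCommandQueue(), which
 * OpenCL 2.0 deprecated in favor of clCreateCommandQueueWithProperties(). */
```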

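         OpenCL extensions are not guaranteed to remain available on new devices, so applications that use extensions must check whether a new device supports them.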
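
         One way to perform such a check is to fetch the device's extension string with `clGetDeviceInfo` and `CL_DEVICE_EXTENSIONS`, which returns a space-separated list of names, and then look for the extension as a whole token. The helper below is a sketch of the string-matching half only. `has_extension` is a name made up for this example, and the caller is assumed to have already obtained the extension string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole space-separated token in
 * `ext_list` (the string reported for CL_DEVICE_EXTENSIONS). A plain
 * strstr() alone would also match prefixes of longer extension names. */
bool has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return false;
    for (const char *p = strstr(ext_list, name); p != NULL;
         p = strstr(p + 1, name)) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
    }
    return false;
}
```

         For example, `has_extension("cl_khr_fp16 cl_khr_3d_image_writes", "cl_khr_fp16")` is true, so an application could use half-precision code paths on that device and fall back to another path when the check fails.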

Origin www.cnblogs.com/xiajingwang/p/10985575.html