Why are GPUs so popular among server manufacturers?

After years of experimentation, graphics processing units (GPUs) are finally being taken seriously by mainstream server manufacturers. Dell and IBM are the first tier-one server vendors to adopt GPUs for high-performance computing (HPC). GPUs are usually found in desktop PCs, mainly as high-speed graphics accelerators for video games, but server makers have discovered that, beyond their excellent game rendering performance, they also have inherent advantages in mathematical calculation.

In May of this year, IBM announced plans to offer a pair of Tesla M2050 GPUs in its iDataPlex dx360 M3 scale-out server. Dell was not far behind, announcing in June that its PowerEdge M610x blade server would be available with a pair of Tesla M2050 GPUs. Equipped with Intel Xeon 5500 or 5600 processors, the M610x can deliver up to 40 billion operations per second.

Nvidia is the biggest winner in this rush by server makers toward GPU computing. As the saying goes, the early bird gets the worm. Nvidia has long promoted the GPU as a processing unit for math-intensive workloads, but until Dell and IBM announced their Tesla M2050 integrations, it had no support from tier-one server vendors.

GPU History

If you know the history of the PC, you will remember that the 8086, 80286 and 80386 processors had companion math coprocessors: the 8087, 80287 and 80387 respectively. If you bought a PC in the late 1980s for mathematical or scientific computing, the salesperson probably told you about them. These add-on chips were designed for fast, accurate calculation, and the main buyers were spreadsheet users: Lotus 1-2-3 was the killer app on x86 at the time, and it calculated faster with a math coprocessor installed.

With the 80486, the math coprocessor was integrated into the CPU, and later processor architectures kept adding instructions to speed up math. Today's CPU design documents call this part the "floating-point unit", because the calculations in question are mainly floating-point operations.

Computers treat numbers as either integers or floating-point numbers. Integers have no fractional part (13 people, for example), while floating-point numbers do (3.14159, for example). Calculations involving fractions are the job of the floating-point unit.

This matters especially for graphics, because positioning the triangles that make up a smooth 3D image requires very precise fractional values. If a value is slightly off, cracks appear between triangles and ruin the overall image. Graphics software must carry its calculations to many decimal places to get fit, color and brightness right.

Multicore Math Processors

Over the years, Nvidia and its competitor ATI (acquired by AMD in 2006) have effectively been building massively multicore math processors. Intel and AMD CPUs mostly have 4 to 6 cores, whereas Nvidia's latest Fermi architecture has up to 512 stream processors (at the cost of more power and heat), and the ATI Radeon 5000 series reaches 1600 stream processors.

Stream processing applies many compute units to data in parallel, relying on software to manage memory allocation, data synchronization and communication. The cores are linked by high-speed interconnects.

GPU threads are lighter than CPU threads because they contain only a batch of math instructions, often nothing more than simple additions. A GPU can also switch threads faster: a core can move from one thread to another in a single clock cycle, which CPUs typically cannot do, since a CPU thread is a series of complex instructions such as system processes or operating system calls.

Lately, people who need high-performance computing have realized that those hundreds of math cores might be good for something other than rendering games. Nvidia and AMD have been happy to oblige, and have recently strengthened the math capabilities of their GPUs.

The last piece is double-precision floating-point arithmetic, which complex scientific calculations require. Both Nvidia and ATI have added double-precision support to their chips. A single-precision number is 32 bits long (2^32 possible bit patterns) and a double-precision number is 64 bits long (2^64 possible bit patterns). Games have little use for the extra precision, but scientific work such as global climate simulation depends on it.
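
To make the difference concrete, here is a minimal sketch in Python, chosen only for illustration; it runs on the CPU and says nothing about GPU performance. It uses the standard `struct` module to round-trip a value through the 32-bit and 64-bit IEEE 754 formats. The helper name `round_trip` is hypothetical.

```python
import struct

def round_trip(value: float, fmt: str) -> float:
    """Pack value into the given IEEE 754 format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

pi = 3.141592653589793

single = round_trip(pi, "<f")   # 32-bit single precision
double = round_trip(pi, "<d")   # 64-bit double precision

# calcsize returns bytes, so multiply by 8 to get the width in bits.
print(struct.calcsize("<f") * 8, "bits:", repr(single))
print(struct.calcsize("<d") * 8, "bits:", repr(double))
print("error introduced by single precision:", abs(single - pi))
```

The double-precision value survives the round trip unchanged, while the single-precision value is off after roughly seven significant digits. Errors of that size are invisible in a game frame but can accumulate over a long-running simulation.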

GPU Programming

If the servers in your data center have GPUs, you should take that into account when writing server applications. But exploiting a GPU is not easy: it requires coordination between the CPU and the GPU, and it is not something you can do by pulling in a few off-the-shelf libraries and writing a few lines of code.

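Video games are a good example of heavy floating-point use, but they are far from the only one. Almost every field that involves visualization relies on floating-point arithmetic: medical imaging, 3D imaging, scientific imaging, oil and gas exploration visualization, entertainment, advertising and financial modeling.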

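Handing this kind of work to the GPU is called GPGPU computing, or general-purpose GPU computing. Computational tasks that the CPU would otherwise handle have to be explicitly programmed to run on the GPU, which often means rewriting code. Nvidia's tool for this is the CUDA development language.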

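CUDA is a C-like programming language for developing applications that run in parallel on Nvidia GPUs. Unlike on x86 processors, an application does not run just 2, 4 or 8 threads in parallel, but hundreds.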
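
The shape of such a program can be sketched without a GPU. The following plain-Python emulation is not CUDA and gains no speed; it only illustrates the data-parallel style, in which one small function (the kernel) is executed once per thread index. The names `saxpy_kernel` and `launch` are hypothetical and do not come from any GPU toolkit.

```python
def saxpy_kernel(thread_id, a, x, y, out):
    """Work done by ONE logical thread: a single multiply-add."""
    if thread_id < len(x):  # guard: launching more threads than elements is allowed
        out[thread_id] = a * x[thread_id] + y[thread_id]

def launch(kernel, num_threads, *args):
    """Stand-in for a GPU launch. A real GPU would run these iterations
    in parallel; this loop simply runs them one after another."""
    for thread_id in range(num_threads):
        kernel(thread_id, *args)

n = 500
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n

# 512 logical threads cover 500 elements; the last 12 do nothing.
launch(saxpy_kernel, 512, 2.0, x, y, out)
print(out[:4], out[-1])
```

Each thread touches only its own element, so no thread depends on another. That independence is what lets hardware with hundreds of cores work on the whole array at once.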

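Nvidia's effort has paid off: more than 350 universities worldwide now offer CUDA development courses. But if Nvidia remains the only company pushing this hard, the effort may still come to nothing.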

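OpenCL is a sibling of OpenGL, the graphics library for 3D cards that serves as an alternative to much of Microsoft's DirectX; both are maintained by the Khronos Group. Apple originated the OpenCL framework, which is used to write programs that execute across CPUs, GPUs and other processors. OpenCL comprises a language for writing kernels and a set of APIs, and supports both task-based and data-based parallel programming.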

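Compared with CUDA, OpenCL has both strengths and weaknesses. First, it supports many kinds of processors, whereas CUDA supports only Nvidia GPUs. Code written for OpenCL does not have to be rewritten for each vendor's hardware, whereas CUDA code must be written in CUDA's C dialect specifically for Nvidia GPUs. OpenCL is also designed to be processor-agnostic, so it can in principle target Itanium, Sun UltraSPARC and ARM embedded processors as well.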

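OpenCL is newer than CUDA, so it lacks many of CUDA's features and is less mature. Most notably, CUDA has a fast Fourier transform (FFT) kernel and OpenCL does not. The FFT is a complex algorithm that is widely used in advanced scientific computing and image processing.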
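
For readers who have not met the FFT, here is a minimal textbook sketch: a recursive radix-2 Cooley-Tukey transform in plain Python. It is not the CUDA FFT kernel and is far slower than any tuned library; the function name `fft` and the power-of-two length restriction are assumptions made to keep the example short.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])   # transform of the even-indexed samples
    odd = fft(samples[1::2])    # transform of the odd-indexed samples
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result

# One full cosine cycle over 8 samples: the energy lands in bins 1 and 7.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
spectrum = fft(signal)
print([round(abs(c), 6) for c in spectrum])
```

The two half-size transforms are independent of each other, and so are the iterations of the combining loop. That structure is why the FFT maps well onto the many small cores of a GPU.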

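Both frameworks have their pros and cons. CUDA, with Nvidia's strong backing, has captured most of the market. OpenCL is no weakling, and it is managed by a standards body, but so far it has little to show. Should we tie ourselves to Nvidia or wait for the standard to mature? It is a hard call.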

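Besides CUDA and OpenCL there is a third contender: Microsoft's DirectCompute. DirectCompute is a component of the DirectX 11 API, which ships with Windows 7, and like OpenCL it lets applications use the GPU's computing power. Because it exists only in DirectX 11, it comes with restrictions. It runs only on Windows, since Microsoft does not offer DirectX for other operating systems, and there are not yet many DirectX 11 graphics cards on the market; ATI currently leads here, though Nvidia is working to catch up. So far, DirectCompute has not been formally adopted by server vendors.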

Use Cases

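So who is best suited to GPU computing? As noted above, advanced mathematics is one case. Others include reverse engineering of diseases in medical research, ultrasound image rendering in medical imaging, and video and image processing such as film special effects.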

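Nvidia has been riding high lately. At the 2010 Oscars, all three visual effects contenders (Star Trek, Avatar and District 9) owed something to Nvidia, because all of them were rendered on Nvidia GPUs. In business, Nvidia GPUs are also widely used for energy research, oil and gas reserve estimation, and stock market trend analysis.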

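Your data center does not necessarily need GPUs, however, so do not assume your next server purchase has to include them. For basic server tasks such as file serving, web serving or database serving, a GPU is no help. I/O-intensive workloads such as application servers or database servers do not need one either: what they need is large memory, fast interconnects or solid-state storage, none of which has anything to do with the GPU.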

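Using GPUs means programming them, and this is still an emerging kind of programming. Developers who want to enter the field should get proper training and education. It does not have to come from a hardware company: training from vendors such as Nvidia and AMD may be more thorough, but it is not necessarily suited to enterprise development.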
