How to calculate the computing power of a processor

We use double-precision floating-point performance to measure a processor's capacity for scientific computing, i.e., its ability to handle 64-bit floating-point data.
  • With AVX2 support, the SIMD instruction width is 256 bits. Assuming each Intel core contains two FMA units, and each FMA performs one multiplication and one addition in a single clock cycle, one processor core can execute 256 bit × 2 FMA × 2 (mul + add) / 64 bit = 16 floating-point operations per clock cycle, i.e., 16 FLOPs/cycle (multiplied by the clock frequency, this becomes FLOPS: Floating Point Operations Per Second);
  • With AVX-512 support, the SIMD instruction width is 512 bits. Again assuming two FMA units per core, each performing one multiplication and one addition per clock cycle, one processor core can execute 512 bit × 2 FMA × 2 (mul + add) / 64 bit = 32 floating-point operations per clock cycle, i.e., 32 FLOPs/cycle.
This means that, in theory, the latter doubles the computing power of the former, but that is not achievable in practice: executing wider vector instructions increases power draw and heat density, so the core frequency is lowered, which reduces the throughput of the processor as a whole.
A processor's computing power is determined by three factors: the number of cores, the core frequency, and the per-cycle capability of a single core.
  • For example: the flagship of Intel's Purley platform, the Skylake Xeon Platinum 8180 (28 cores @ 2.5 GHz, with AVX-512 support), has a theoretical double-precision floating-point peak of 28 cores × 2.5 GHz × 32 FLOPs/cycle = 2240 GFLOPS = 2.24 TFLOPS;
  • For example: the top Cascade Lake part on Intel's Purley platform, the Xeon Platinum 8280 (28 cores @ 2.7 GHz, with AVX-512 support), has a theoretical double-precision floating-point peak of 28 cores × 2.7 GHz × 32 FLOPs/cycle = 2419.2 GFLOPS = 2.4192 TFLOPS.
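The two examples above can be reproduced with a short script (a minimal sketch; the function name and parameters are illustrative, and the core counts and frequencies are the ones quoted above):

```python
def peak_gflops(cores, freq_ghz, simd_bits, fma_units, precision_bits=64):
    """Theoretical peak in GFLOPS: cores x frequency x FLOPs-per-cycle.

    FLOPs per cycle per core = (simd_bits / precision_bits) * fma_units * 2,
    where the factor 2 counts the multiply and the add of one FMA.
    """
    flops_per_cycle = simd_bits // precision_bits * fma_units * 2
    return cores * freq_ghz * flops_per_cycle

# Xeon Platinum 8180: 28 cores @ 2.5 GHz, AVX-512, 2 FMA units per core
print(peak_gflops(28, 2.5, 512, 2))   # → 2240.0 GFLOPS = 2.24 TFLOPS
# Xeon Platinum 8280: 28 cores @ 2.7 GHz, AVX-512, 2 FMA units per core
print(peak_gflops(28, 2.7, 512, 2))   # ≈ 2419.2 GFLOPS
```

Passing simd_bits=256 instead gives the AVX2 figure of 16 FLOPs/cycle per core.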
 
 
What a GPU can do, a CPU can also do; but what a CPU can do, a GPU cannot necessarily do. A GPU core generally operates on 64-bit data in one clock cycle, with each core implementing one FMA.
  • The per-core, per-cycle computing capability of a GPU is therefore: 64 bit × 1 FMA × 2 (mul + add) / 64 bit = 2 FLOPs/cycle.
GPU computing power is determined by the same three factors: the number of cores, the core frequency, and the per-cycle capability of a single core.
But a GPU has so many cores that the CPU simply cannot keep up.
  • For example: the top Pascal part in NVIDIA's Tesla line, the P100 (1792 FP64 cores @ 1.328 GHz), has a theoretical double-precision floating-point peak of 1792 cores × 1.328 GHz × 2 FLOPs/cycle = 4759.552 GFLOPS ≈ 4.76 TFLOPS;
  • For example: the top Volta part in NVIDIA's Tesla line, the V100 (2560 FP64 cores @ 1.245 GHz), has a theoretical double-precision floating-point peak of 2560 cores × 1.245 GHz × 2 FLOPs/cycle = 6374.4 GFLOPS ≈ 6.37 TFLOPS.
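The GPU figures follow the same cores × frequency × FLOPs-per-cycle formula; a minimal sketch (function name is illustrative, numbers are the ones quoted above):

```python
def gpu_peak_gflops(cores, freq_ghz, flops_per_cycle=2):
    """Theoretical FP64 peak in GFLOPS: each GPU core performs one FMA
    (one multiply + one add = 2 FLOPs) per clock cycle."""
    return cores * freq_ghz * flops_per_cycle

# Tesla P100: 1792 FP64 cores @ 1.328 GHz
print(gpu_peak_gflops(1792, 1.328))   # ≈ 4759.55 GFLOPS
# Tesla V100: 2560 FP64 cores @ 1.245 GHz
print(gpu_peak_gflops(2560, 1.245))   # ≈ 6374.4 GFLOPS
```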
 
 
Now, in the flourishing era of ML, the demand for 64-bit floating-point arithmetic is not that large; the demand for 32-bit and even 16-bit floating-point operations is comparatively much larger.
Accordingly, NVIDIA's latest Tesla products have been emphasizing single-precision and even half-precision performance; Turing is one such architecture.
To accelerate these calculations, Intel has likewise implemented low-precision acceleration instructions in a number of its processors.
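By the SIMD formula used earlier, halving the operand precision doubles the number of lanes per vector register and hence the per-cycle throughput. A sketch under that assumption (i.e., assuming the hardware supports FMA at each width; real chips may differ):

```python
def flops_per_cycle(simd_bits=512, fma_units=2, precision_bits=64):
    """Per-core FLOPs per cycle: lanes x FMA units x 2 (mul + add)."""
    return simd_bits // precision_bits * fma_units * 2

# For a 512-bit core with 2 FMA units, each halving of precision doubles throughput:
for bits in (64, 32, 16):
    print(f"FP{bits}: {flops_per_cycle(precision_bits=bits)} FLOPs/cycle")
# → FP64: 32, FP32: 64, FP16: 128
```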

Origin www.cnblogs.com/kongchung/p/11295636.html