NVIDIA GPU Architecture Development Comparison


1 Introduction

These notes are collected here for easy reference.

2 Glossary

FLOPS: an abbreviation for "floating-point operations per second" (sometimes glossed as "peak speed per second"), i.e. the number of floating-point operations a processor can execute each second. "Floating point" covers all operations that involve non-integer (decimal) values; such operations are common in certain classes of application software and generally take longer than integer arithmetic. Most modern processors include a dedicated floating-point unit (FPU), so FLOPS in practice measures the execution speed of the FPU. The most commonly used benchmark for measuring FLOPS is LINPACK. A short sketch after the unit list below shows how a GPU's theoretical peak FLOPS is typically estimated.

  • One MFLOPS (megaFLOPS) equals one million (10^6) floating-point operations per second,
  • One GFLOPS (gigaFLOPS) equals one billion (10^9) floating-point operations per second,
  • One TFLOPS (teraFLOPS) equals one trillion (10^12) floating-point operations per second,
  • One PFLOPS (petaFLOPS) equals one quadrillion (10^15) floating-point operations per second,
  • One EFLOPS (exaFLOPS) equals one quintillion (10^18) floating-point operations per second,
  • One ZFLOPS (zettaFLOPS) equals one sextillion (10^21) floating-point operations per second.
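As a rough illustration of how a GPU's theoretical peak FLOPS is estimated (SM count × FP32 cores per SM × 2 operations per FMA × clock rate), here is a minimal CUDA sketch. The file name and the CORES_PER_SM constant are assumptions for illustration only: the number of FP32 cores per SM differs across NVIDIA architectures, so the constant must be adjusted to the actual GPU.

```cpp
// peak_flops.cu -- rough estimate of theoretical peak FP32 FLOPS (illustrative sketch).
// CORES_PER_SM is an assumption; it varies by architecture (e.g. 64 or 128 FP32 cores per SM).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    const double CORES_PER_SM = 128.0;          // assumed FP32 cores per SM; adjust per GPU
    double clock_hz = prop.clockRate * 1000.0;  // clockRate is reported in kHz
    // Each FP32 core can issue one fused multiply-add (2 floating-point ops) per cycle.
    double peak_flops = prop.multiProcessorCount * CORES_PER_SM * 2.0 * clock_hz;

    printf("%s: %d SMs @ %.0f MHz -> ~%.2f TFLOPS FP32 (assuming %g cores/SM)\n",
           prop.name, prop.multiProcessorCount, clock_hz / 1e6,
           peak_flops / 1e12, CORES_PER_SM);
    return 0;
}
```

Vendor-quoted peak numbers follow the same kind of calculation; measured benchmarks such as LINPACK typically reach only a fraction of this theoretical figure.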

Floating-point precision: half precision, single precision, and double precision. NVIDIA's official white papers refer to FP16 (half precision, a 16-bit floating-point number), FP32 (single precision, a 32-bit floating-point number), and FP64 (double precision, a 64-bit floating-point number).
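A minimal sketch (a hypothetical kernel, not taken from any NVIDIA white paper) showing how the three precisions appear in CUDA code: FP16 via the __half type from cuda_fp16.h, FP32 as plain float, and FP64 as double. The file name, kernel name, and input values are made up for illustration; FP16 arithmetic intrinsics require compute capability 5.3 or later.

```cpp
// precision_demo.cu -- the same a*x + y computed at half, single and double precision.
// Illustrative only; FP16 arithmetic intrinsics need sm_53+ (compile with -arch=sm_53 or newer).
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

__global__ void axpy_all_precisions(float a, float x, float y, float* out) {
    // FP16: convert operands to __half and use half-precision arithmetic intrinsics.
    __half h = __hadd(__hmul(__float2half(a), __float2half(x)), __float2half(y));
    // FP32: native float arithmetic.
    float  s = a * x + y;
    // FP64: promote to double for the widest precision.
    double d = (double)a * (double)x + (double)y;

    out[0] = __half2float(h);
    out[1] = s;
    out[2] = (float)d;
}

int main() {
    float* out;
    cudaMallocManaged(&out, 3 * sizeof(float));
    axpy_all_precisions<<<1, 1>>>(3.14159f, 2.71828f, 1.0f, out);
    cudaDeviceSynchronize();
    printf("FP16: %f  FP32: %f  FP64 (stored as float): %f\n", out[0], out[1], out[2]);
    cudaFree(out);
    return 0;
}
```

Running it makes the precision trade-off visible: the FP16 result drifts from the FP32/FP64 results because a 16-bit mantissa holds far fewer significant digits.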

INT8: 8-bit integer arithmetic (one byte per value). The enhanced 8-bit integer support packs four 8-bit integers (denoted A0, A1, A2, A3) into one 32-bit word and completes the operation Y = A0*B0 + A1*B1 + A2*B2 + A3*B3 + X in a single instruction, where X and Y are 32-bit integers and the A and B operands are 8-bit integers. INT8 can therefore, in theory, deliver 400% of FP32 throughput for these multiply-accumulate operations, which has recently made it popular for neural-network inference.
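The packed 8-bit multiply-accumulate described above corresponds to CUDA's __dp4a intrinsic, available on compute capability 6.1 and later. Below is a minimal sketch with made-up input values; the file and kernel names are hypothetical.

```cpp
// dp4a_demo.cu -- INT8 dot-product-accumulate Y = A0*B0 + A1*B1 + A2*B2 + A3*B3 + X.
// Uses the __dp4a intrinsic; requires compute capability 6.1+ (compile with -arch=sm_61 or newer).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dp4a_demo(int* result) {
    // Four signed 8-bit values packed into one 32-bit word (lowest byte first):
    // A = {1, 2, 3, 4}, B = {5, 6, 7, 8}, accumulator X = 10.
    int a = (4 << 24) | (3 << 16) | (2 << 8) | 1;
    int b = (8 << 24) | (7 << 16) | (6 << 8) | 5;
    int x = 10;
    // Y = 1*5 + 2*6 + 3*7 + 4*8 + 10 = 80
    *result = __dp4a(a, b, x);
}

int main() {
    int* result;
    cudaMallocManaged(&result, sizeof(int));
    dp4a_demo<<<1, 1>>>(result);
    cudaDeviceSynchronize();
    printf("__dp4a result: %d (expected 80)\n", *result);
    cudaFree(result);
    return 0;
}
```

Because one instruction performs four 8-bit multiplies plus the accumulation, this is where the theoretical 4x throughput over FP32 comes from.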



Source: www.cnblogs.com/shouhuxianjian/p/9817243.html