PCPOWERPLAY

WHAT ARE TFLOPS?

One of the base measurements of GPU performance is TFLOPS, or teraflops. What is a teraflop? It’s 1 trillion “Floating-point Operations Per Second.” Think about that for a moment. How long would it take you to add or multiply two numbers, like π (pi) and e (Euler’s number)? We’ll be nice and let you round off to just 3.14159 and 2.71828. Let us know when you’re finished – you can even use a calculator if you want. If you have a modern graphics card, it could do that same sort of calculation anywhere from 10 trillion to 30 trillion times per second.

At its basic level, computer graphics involves math – lots and lots of math. So much math that those trillions of multiplications and additions can actually be put to good use. A lot of the calculations are matrix operations, used to manipulate and project a 3D object onto a 2D plane like your PC’s display. That specific type of workload often involves multiplying two matrices and then adding a third matrix to the result, so GPUs have a special operation called FMA: fused multiply-add. An FMA multiplies two numbers and adds a third to the product in a single step.
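To make the multiply-then-add pattern concrete, here is a minimal Python sketch of a matrix multiply-add built from scalar multiply-add steps. The function name `matrix_fma` and the small 2x2 matrices are illustrative assumptions, not anything from a GPU vendor. Plain Python rounds the multiply and the add separately, whereas a hardware FMA rounds once, so this only mimics the arithmetic.

```python
def matrix_fma(a, b, c):
    """Return (a x b + c, number of scalar multiply-adds) for lists of rows.

    Hypothetical helper for illustration only.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [row[:] for row in c]  # start from C, then accumulate A x B into it
    fma_count = 0
    for i in range(rows):
        for j in range(cols):
            for p in range(inner):
                # One multiply plus one add: the work a single FMA performs.
                d[i][j] = a[i][p] * b[p][j] + d[i][j]
                fma_count += 1
    return d, fma_count


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
result, fmas = matrix_fma(a, b, c)
print(result)  # [[20, 23], [44, 51]]
print(fmas)    # 8 multiply-adds, i.e. 16 floating-point operations
```

Even this tiny 2x2 case needs eight multiply-adds, which hints at why full 3D scenes keep trillions of operations per second busy.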

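Each FP32 graphics core can execute one such FMA each clock cycle, and an FMA counts as two floating-point operations: one multiply and one add. So to get the theoretical peak, you take the number of “cores” times the clock speed, times two. With the clock speed in GHz, that gives GFLOPS; divide by 1,000 and that’s where you get TFLOPS. Related to this is a secondary metric, TOPS, or teraops. This is used when the workload isn’t floating-point but integer, such as INT32, or even INT8 or INT4.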
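The cores-times-clock-times-two rule can be sketched in a few lines of Python. The function name `peak_tflops` and the example GPU (4,000 cores at 1.5 GHz) are made-up values for illustration, not the specifications of any real card, and the result is a theoretical peak rather than measured performance.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak TFLOPS: cores x clock (GHz) x FLOPs per core per clock.

    The default of 2 reflects one FMA (a multiply plus an add) per clock.
    """
    gflops = cores * clock_ghz * flops_per_core_per_clock
    return gflops / 1000.0  # 1 TFLOPS = 1,000 GFLOPS


# Hypothetical GPU: 4,000 FP32 cores running at 1.5 GHz.
example = peak_tflops(4000, 1.5)
print(f"{example:.1f} TFLOPS")  # 12.0 TFLOPS
```

The same function covers TOPS for integer workloads if the per-clock rate for that data type is passed as `flops_per_core_per_clock`.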

Nvidia’s Turing and Ampere architectures don’t just have CUDA cores, however. There are also Tensor cores, which are even more optimized to do massive amounts of math. On Turing, each Tensor core could do a 4x4x4 matrix FMA each cycle. That’s 64 FMAs, or 128 floating-point operations, per Tensor core per clock. The catch is that the Tensor cores use a lower-precision FP16 (16-bit floating-point) format, and they’re not tuned to do all the graphics calculations that the regular cores can handle. Ampere ups the ante with 8x4x4 Tensor core calculations, so twice the FP16 throughput for Tensor operations. Ampere also uses the Tensor cores for fast math (FP16) GPU calculations, but only at the same rate as the GPU FP32 cores.
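The per-clock Tensor core figures follow from the tile sizes quoted above. As a rough sketch, an m x n x k matrix FMA performs m × n × k multiply-adds, each worth two operations. The helper name `matrix_fma_flops` is an assumption made for this example; the tile dimensions are the ones given in the text.

```python
def matrix_fma_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations in one m x n x k matrix FMA.

    Each of the m*n*k multiply-adds counts as two operations.
    """
    return 2 * m * n * k


turing = matrix_fma_flops(4, 4, 4)  # 4x4x4 tile per clock
ampere = matrix_fma_flops(8, 4, 4)  # 8x4x4 tile per clock
print(turing)           # 128
print(ampere)           # 256
print(ampere / turing)  # 2.0
```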
