APC Australia

Nvidia’s Ampere doubles everything

It almost doubles everything.


There’s a saying from the days of DOS: Wait for the point release, or the inevitable first service pack. That was about software, but there’s a corollary for PC hardware: Skip the first generation of any radically new hardware, or don’t set your hopes too high. Turing was Gen1 ray tracing. Ampere is Gen2, and it’s looking like an absolute beast.

Nvidia has doubled just about everything. Let’s start with transistor counts. The TU102 GPU found in the RTX 2080 Ti had a then-impressive 18.6 billion transistors (via TSMC 12FFN 12nm). GA102, which is used in both the RTX 3090 and RTX 3080, tips the scales at 28 billion transistors, fabricated on an optimized Samsung 8N 8nm process. Yeah, it’s not quite double (it’s about 50% more), but at 9.4 billion extra transistors it’s a bigger absolute jump than the 6.6 billion that TU102 added over GP102’s 12 billion. More important is how you use those transistors.

GA102 has seven GPCs (graphics processing clusters), each with 12 SMs (streaming multiprocessors), for 84 SMs in the full chip. The RTX 3090 has 82 SMs enabled, while the RTX 3080 has 68. That’s already a healthy step up, but what’s inside the SMs has been massively updated. There are 64 dedicated FP32 CUDA cores, but where Turing had 64 more INT cores, Ampere has 64 more CUDA cores that can do FP32 or INT operations – potentially double the FP32 performance per SM. The RT cores have also been enhanced, doubling performance per RT core. There are situations where the new RT cores are even faster, but real-world performance is closer to 1.7x faster than Turing. The Tensor cores received even more upgrades; instead of 4x4x4 matrices (128 FMA ops per cycle), they work on 8x4x8 matrices. But there are half as many Tensor cores per SM (four instead of eight), so FP16 Tensor ops performance per SM only doubles.

Memory is 10GB of 19Gb/s GDDR6X on a 320-bit bus for the RTX 3080, or 24GB of 19.5Gb/s GDDR6X on a 384-bit bus for the RTX 3090. That’s 760GB/s of bandwidth for the 3080, and 936GB/s for the 3090. Not quite double that of the 20-series parts, but a significant jump. Oh, and the GPUs are clocked at 1,700MHz boost for the 3090 and 1,710MHz for the 3080, though in tests I’m seeing more like 1,850–1,900MHz.
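The bandwidth figures follow directly from bus width and per-pin data rate. Here is a minimal Python sketch of that arithmetic; the function name is made up for illustration, and the only inputs are the numbers quoted above.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s.

    Each of the bus's data pins moves `data_rate_gbps` gigabits per second,
    and dividing by 8 converts gigabits to gigabytes.
    """
    return bus_width_bits * data_rate_gbps / 8


if __name__ == "__main__":
    # Figures quoted in the article: (bus width in bits, data rate in Gb/s).
    cards = {"RTX 3080": (320, 19.0), "RTX 3090": (384, 19.5)}
    for name, (width, rate) in cards.items():
        print(f"{name}: {memory_bandwidth_gbs(width, rate):.0f} GB/s")
```

This is a theoretical peak; sustained bandwidth in real workloads will be lower.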

Going back to the CUDA cores, Turing allowed for fully concurrent FP32 and INT operations, while Ampere can run either concurrent FP32 and INT, or FP32 and FP32. That means theoretical TFLOPS has skyrocketed, so the RTX 3080 checks in at 29.8 TFLOPS and the RTX 3090 hits 35.7 TFLOPS. That’s nearly triple the RTX 20-series, but in games the performance is more like 70% higher for the 3080 versus the 2080 (at 4K ultra – it’s less of a jump at 1440p, and very much CPU limited at 1080p). That’s because games typically have around a 65/35 split between FP32 and INT workloads. In other words, if the two sets of cores share the work evenly, roughly two-thirds of the FP32/INT CUDA cores’ time is spent on INT operations, leaving only about a third of it for extra FP32 work.
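The TFLOPS numbers and the "roughly two-thirds" figure can both be reproduced with a few lines of Python. This is a back-of-the-envelope sketch, not Nvidia's own method. It assumes 128 FP32-capable cores per SM (the 64 dedicated plus the 64 shared ones), counts a fused multiply-add as two floating-point operations, uses the rated boost clocks, and treats the two sets of cores as perfectly load-balanced. The function names are hypothetical.

```python
def peak_fp32_tflops(sm_count: int, boost_mhz: float, fp32_cores_per_sm: int = 128) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes every FP32-capable core issues one FMA per clock,
    and that an FMA counts as 2 floating-point operations.
    """
    ops_per_clock = sm_count * fp32_cores_per_sm * 2
    return ops_per_clock * boost_mhz * 1e6 / 1e12


def shared_path_int_fraction(int_share: float) -> float:
    """Fraction of the shared FP32/INT cores' time spent on INT work.

    Simplifying assumption: the dedicated FP32 cores and the shared cores
    each handle half of all operations, and every INT operation must run
    on the shared cores.
    """
    per_path = 0.5  # each set of cores handles half the instruction mix
    return min(int_share, per_path) / per_path


if __name__ == "__main__":
    print(f"RTX 3080: {peak_fp32_tflops(68, 1710):.1f} TFLOPS")
    print(f"RTX 3090: {peak_fp32_tflops(82, 1700):.1f} TFLOPS")
    # 65/35 FP32/INT split quoted in the article.
    print(f"Shared-core INT time: {shared_path_int_fraction(0.35):.0%}")
```

With a 35% INT share the shared cores come out at about 70% INT under this model, which is why the real-world gain lands well short of the theoretical doubling of FP32 throughput.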

What does it all mean? More performance at lower prices. The RTX 3080 is around 35% faster than the 2080 Ti, and sometimes twice as fast as the RTX 2080 FE, all for the same price as the outgoing 2080 Super. It’s a great time to start thinking about a new GPU upgrade.


JARRED WALTON Jarred Walton has been a PC and gaming enthusiast for over 30 years.
