Making a Mesh of Things
AT THE BEATING HEART of your PC sits the CPU, crunching tasty bits of 0s and 1s in byte-sized chunks. It does so at an extremely fast rate—around four billion cycles per second—and each core can execute up to six instructions per cycle, with Skylake-X CPUs combining up to 18 such cores.
Modern CPUs are complex, and the task of routing data between various parts—cache, cores, memory, and I/O controllers—is a critical element of the CPU architecture. For several generations, Intel’s HEDT CPUs have used a ring bus architecture.
Think of it as mass transit for data, running in a loop, with stations where data can get on or off. As core counts increase, a second ring is added, with a buffered switch between the two. If data needs to move between rings, it's like getting off at a transfer station and waiting for the next train—a five-cycle penalty, on top of the delay from traversing the rings themselves.
The Broadwell HCC (High Core Count) designs supported up to 24 cores, and while it's possible to add more rings for higher core counts, the increased latency limits scalability. With the Skylake-X HCC/XCC designs (6 to 28 cores), Intel is using a new mesh network. Each block (core, memory, I/O, cache, and so on) has a router, with the blocks arranged in a grid. It's like city blocks, with router switches at each intersection directing traffic. The goal is improved scalability through lower latency.
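The scalability argument can be sketched with simple hop counts—a rough model of my own, not Intel's actual routing math: on a bidirectional ring, the worst-case trip is halfway around (hops grow with core count), while on a near-square 2D mesh it's corner to corner (hops grow roughly with the square root of core count).

```python
import math

def ring_worst_hops(n_stops):
    """Worst-case hops on a bidirectional ring: halfway around the loop."""
    return n_stops // 2

def mesh_worst_hops(n_stops):
    """Worst-case hops on a near-square 2D mesh: corner to corner."""
    rows = math.ceil(math.sqrt(n_stops))
    cols = math.ceil(n_stops / rows)
    return (rows - 1) + (cols - 1)

# Compare at the article's core counts: 10 (i9-7900X), 18 (i9-7980XE), 28 (XCC)
for n in (10, 18, 28):
    print(f"{n} stops: ring {ring_worst_hops(n)} hops, mesh {mesh_worst_hops(n)} hops")
```

At 10 stops the two are comparable (5 hops each), which fits the observation below that the mesh shows no latency win on the 10-core parts; by 28 stops the ring's worst case has doubled to 14 while the mesh sits at 9. This toy model ignores per-hop costs and the dual-ring transfer penalty—it only shows the growth trend.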
Despite all the talk of the mesh lowering latency, reducing power use, and improving scalability, in testing it’s not all sunshine and roses. Comparing the 10-core Broadwell-E i7-6950X to the 10-core Skylake-X i9-7900X, inter-core communication latencies have increased from 80ns to 100ns. The real-world impact is nowhere near that, though, and higher per-core performance does compensate.
More concerning in my testing of Skylake-X CPUs is that power draw has gone way up from Broadwell-E. At stock, power draw isn’t too bad, but all X299 motherboards I’ve tested auto-overclock. Intel rates the i9-7900X for all-core turbo of 4.0GHz, with a maximum turbo of 4.3GHz (or 4.5GHz via Turbo Boost 3.0 Max), but the base clock is only 3.3GHz.
That base clock is what the CPU is guaranteed to achieve without exceeding TDP, and it’s up to the mobo firmware to keep things in check. If power use goes over TDP, clock speeds should drop, but boards are being more aggressive. Some run all cores on the i9-7900X at 4.0GHz, no matter what, and others default to 4.3GHz and even 4.5GHz. Power use scales rapidly, but the real problems start to show up on the i9-7960X and i9-7980XE.
The TDP is 165W for both, and both chips went well over that in my testing. System power use in Cinebench R15 is around 350W, with 50-100W going to other components, so the CPUs draw over 200W. Push clock speeds to 4.0-4.4GHz on all cores, and total system draw exceeds 500W. Overclocker der8auer took things to the next level with liquid nitrogen on the i9-7980XE, getting all 18 cores to 6.1GHz—using over 1,000W just for the CPU.
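The back-of-envelope math above can be made explicit. This is a sketch using only the article's figures; the function name and the lumping of everything non-CPU into one range are mine:

```python
def cpu_power_estimate(system_watts, other_watts_range=(50, 100)):
    """Rough CPU package power: measured system draw minus everything
    else (GPU, drives, fans, conversion losses lumped together).
    The 50-100W 'other components' range comes from the article."""
    low_other, high_other = other_watts_range
    return system_watts - high_other, system_watts - low_other

low, high = cpu_power_estimate(350)
print(f"Estimated CPU draw: {low}-{high}W against a 165W TDP")
# -> Estimated CPU draw: 250-300W against a 165W TDP
```

Even the conservative end of that range is roughly 50 percent past the rated TDP, which is the point: the spec sheet number only holds at the base clock the motherboards refuse to run.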
These are amazingly fast CPUs, but we're hitting the limits of 14nm. The mesh topology may pave the way for more cores, yet despite having two fewer cores, the i9-7960X is only 3-5 percent slower than the i9-7980XE. If Intel made a 24-core CPU without moving to 10nm, it's unlikely to deliver a significant boost in performance without an equivalent increase in power use. Moore's Law is dead, right when we need it most.
Skylake-X uses a mesh topology to allow scaling to higher core counts.