PCWorld (USA)

AMD’s 32-core Threadripper performance

HERE’S JUST HOW MUCH MEMORY BANDWIDTH CONSTRAINTS MIGHT BE HURTING THE PERFORMANCE.

BY GORDON MAH UNG

AMD’s 32-core Threadripper 2990WX is the fastest consumer CPU ever sold (go.pcworld.com/299x). And let’s be clear: We’re in full agreement with anyone who says that. But we would also be the first to say it has its limitations, too.

The most glaring is the lack of consumer applications that can truly exploit the cores available. The other limitation is apparent in the diagram below, which shows how AMD built this 32-core monster. Rather than a single chip with every single CPU core on it, AMD connects four dies using its high-speed Infinity Fabric.

WHY MEMORY BANDWIDTH AFFECTS THE 32-CORE THREADRIPPER

If you look closer at the diagram, you can see that two of the dies don’t have their own memory controllers or PCIe access. Instead, they have to talk to an adjacent CPU die.

It is, essentially, like having a two-apartment unit where the second one must access the hallway outside by going through the first apartment.

Perhaps more important is the overall bandwidth available. AMD had initially said the bandwidth available between the four CPU dies was 25GBps bi-directional. The company later amended its documentation to state that 25GBps was the total bandwidth. Compare that with the 16-core Threadripper 2950X, with its 50GBps of bandwidth and two links between the two dies (also updated information from AMD).

Many believe this is the Threadripper 2990WX’s main weakness: A lack of memory bandwidth per core hurts it in memory-intensive tasks such as compression and encoding. Even worse for the Threadripper 2990WX, that bandwidth has to be shared on a CPU with 14 more cores than Intel’s Core i9-7980XE.

Below, you can see the result of Sandra 2018 Titanium’s memory bandwidth test and the available bandwidth per core. As you can see, the bandwidth per core plummets from almost 5GB at 8 cores and 16 cores to just 2GB when you utilize all 32 cores.

Synthetic memory bandwidth tests are one thing. To dig further into performance in memory-intensive tests, we fired up the newest version of the free and popular 7-Zip application. Written by Igor Pavlov, this open-source compression and decompression utility is popular and generally awesome. For example, when I run tests on a laptop and decompress Cinebench R15.08 and its thousands of small files with Windows 10’s built-in utility, it takes several minutes to finish. I can actually connect to the Internet, download 7-Zip, and decompress the contents of Cinebench R15.08 with it in less time than it takes the built-in Windows utility to do its thing.

The GUI version runs two tests, for compression and decompression. The overall score looks like a simple average of the two results.

WHAT ARE 7-ZIP TESTS?

You can read more about the test on the 7-cpu.com website (go.pcworld.com/7cpu), but we’ve highlighted some of the key information about the tests here. Regarding the Compression test, the website discusses the factors that influence the test results, saying it “strongly depends from memory (RAM) latency, Data Cache size/speed and TLB. Out-of-order execution feature of CPU is also important for that test.” The site goes on: “The compression test has big number of random accesses to RAM and Data Cache. So big part of execution time the CPU waits the data from Data Cache or from RAM.”

About the Decompression test, the website says it “strongly depends on CPU integer operations. The most important things for that test are: branch misprediction penalty (the length of pipeline) and the latencies of 32-bit instructions (‘multiply’, ‘shift’, ‘add’ and other). The decompression test has very high number of unpredictable branches.”

HOW WE RETESTED THREADRIPPER VS. CORE I9

For our retest, we decided to lock both the Threadripper 2990WX and the Core i9-7980XE at 3GHz to remove any variables from each CPU’s boost schemes. This was done to make the comparison more dependent on the test rather than the clock speed differences between the two. We also set both to DDR4/3200 clocks, and both were run in quad-channel mode except where noted. To be up-front: The Threadripper system had a slight edge in CAS latency at CL14 and 1T, while the Core i9 was running at CL15 and 2T. As in our original review, both were running Founders Edition GTX 1080 cards using the same drivers and the same version of Windows 10 Enterprise Edition.

Because much of the concern over Threadripper is its per-core memory bandwidth performance, we decided to run from 1 thread to the maximum number of threads on each CPU. We also decided to see whether performance of the Threadripper would change if you turned off dies, so we ran it with a single die (8 cores/16 threads), two dies (16 cores/32 threads), and all four (32 cores/64 threads).

In the integer-focused decompression component of 7-Zip, the performance was quite nice. Although we don’t see perfect scaling, there’s little difference in 7-Zip decompression performance as you switch off dies.

All of the tests were also completed using the GUI version of 7-Zip 18.05 with the default dictionary size of 32MB (although we did decide to recompile our own version, too).
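
For readers who want to repeat the thread sweep from the command line rather than the GUI, here is a minimal sketch. It assumes the 7-Zip console program is on the path as 7z, and relies on its benchmark command (b) with the -mmt switch for thread count and the -md switch for dictionary size given as a power of two (25 for 32MB). The helper name benchmark_commands is hypothetical, and this is not how the charts in this article were produced; those came from the GUI benchmark.

```python
def benchmark_commands(max_threads, dict_log2=25, exe="7z"):
    """Build one 7-Zip benchmark command per thread count, from 1 to max_threads.

    dict_log2=25 asks for a 2**25-byte (32MB) dictionary.
    exe is an assumed name for the 7-Zip console binary.
    """
    if max_threads < 1:
        raise ValueError("max_threads must be at least 1")
    return [
        [exe, "b", f"-mmt{n}", f"-md{dict_log2}"]
        for n in range(1, max_threads + 1)
    ]


if __name__ == "__main__":
    # 64 threads is the maximum on the 32-core Threadripper 2990WX.
    for cmd in benchmark_commands(64):
        print(" ".join(cmd))
```

Each list can be handed to subprocess.run as-is; the script only prints the commands so nothing runs until you decide it should.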

You’re probably more interested in the Core i9 vs. Threadripper 2990WX, so we ran that, of course. For the most part, the results aren’t bad for either chip. Interestingly, Threadripper 2990WX seems to have that slight fall-off in decompression performance as you cross the threshold of 8 cores. Core i9 has a decent performance advantage up to about 16 cores, but after that it runs out of steam and ends up losing to the 32-core Threadripper 2990WX CPU.

This shouldn’t surprise too many, though. The CPU performance of the Threadripper 2990WX when it doesn’t run out of memory bandwidth is a known quantity. You only have to look at our multi-threaded rendering tests to see how it’s simply a monster.

The question is, what happens under memory bandwidth or memory latency tests? Here are the results of the Threadripper 2990WX in 7-Zip’s compression test. It’s not pretty, but the good news is switching dies off didn’t seem to matter. As you can see, the CPU appears to hit a ceiling at 26 threads, and then it just gets worse from there.

Perhaps worse is when you compare it to the Core i9-7980XE. Again, remember that both of the CPUs were at a fixed clock speed of 3GHz and DDR4/3200.

That’s just not a good look for the 32-core Threadripper 2990WX and does seem to confirm that chores sensitive to memory latency and bandwidth suffer greatly.

But can memory bandwidth also hurt Core i9? To find out, we switched the Core i9 system from quad-channel mode into single-channel mode. Unfortunately, for our test, we did have to lower total memory to 16GB rather than 32GB due to lack of density on modules. The good news is that 7-Zip with the default dictionary fits fine, and we don’t believe overall memory capacity was the issue. We can say that overall memory bandwidth as measured in Sandra 2018 was cut from 77GBps in quad-channel mode to 18.5GBps in single-channel mode on the Intel part. Per-core memory bandwidth went from 4.8GBps in quad-channel to 1GBps in single-channel mode.

As you can see, the performance of Core i9-7980XE also suffers when its memory bandwidth is drastically cut. It doesn’t suffer as much as the Threadripper 2990WX, but the Threadripper’s result doesn’t appear to be the fault of some pro-Intel code at work.

LINUX TESTS SHOW HOW WINDOWS 10 AFFECTS RESULTS

I’d normally say, okay, memory bandwidth and latency are the real issues, but there is that Linux thing. That is, in tests run by Michael Larabel at Linux-focused site Phoronix (go.pcworld.com/wslp), the Threadripper 2990WX actually performs on a par with the Core i9-7980XE rather than heavily trailing it. Phoronix runs a slightly older version of 7-Zip, but it’s clear that moving to Linux helps Threadripper 2990WX. A lot. Phoronix even tested it using Windows 10 Server.

Phoronix’s Linux test shows issues not just with 7-Zip, but also several other tests where Windows 10 underperformed the Linux version. So it’s clear Windows has an issue right now. But if you’re in the crowd that wholesale dismisses memory bandwidth as a weakness at all, I’m not so sure.

One set of Linux versus Windows tests that would back up memory bandwidth and latency as issues comes from Steve Walton over at TechSpot.com (go.pcworld.com/lvwb). Walton tested Windows and Linux performance using the latest 7-Zip version and found Core i9 still ahead despite having fewer cores. Greatly improved for Threadripper? Yes. But still clearly slower in a multithreaded test that does scale to all available cores.

THE COMPILER IS ANOTHER FACTOR

In searching for more answers on Threadripper’s 7-Zip performance, we wondered whether the compiler was at fault. If an outdated compiler was used to build the 7-Zip executable, it could certainly hurt the Threadripper’s performance.

To find out, we downloaded the source code for 7-Zip and the latest version of Microsoft’s Visual Studio 2017, and compiled the source into an executable.

We ended up with basically the same result, and it looks like the latest version of 7-Zip is actually built with the latest available Visual C++ compiler. This doesn’t completely rule out compilers as a factor, as different compilers do matter. If, for example, the applications on Linux were compiled with the GCC or Intel compiler, it might explain the performance differences.

HANDBRAKE TEST BRINGS UP MORE QUESTIONS

While Windows 10 clearly, clearly has issues with the design of Threadripper, it would be wrong to say memory bandwidth and latency aren’t in play.

To see just how much memory bandwidth helps or hurts both CPUs, we took VeraCrypt and ran it with the larger 1GB workload. As we saw with 7-Zip, the Core i9’s VeraCrypt performance in single-channel mode drops off a cliff and is actually worse than the Threadripper’s (albeit with the Threadripper on quad-channel memory), as you can see in the chart below.

The Threadripper 2990WX does suffer greatly with the 1GB workload. But if the issue is how Windows handles the memory configuration on the Threadripper, it should get better after shutting off two dies, right? It does, but as you can see in the green bars below, performance increases only slightly when limiting it to just 16 cores and 32 threads. The result is again confusing, because if Windows 10 is at fault for the poor performance of the shared memory controller design, why is the performance of the Threadripper 2990WX not as fast as the Core i9’s? Remember: Both CPUs are locked at 3GHz.

Our last test used HandBrake 1.1.1 to encode a 4K video file using the 1080p Chromecast preset. Note: This HandBrake result is different from others we’ve run, so it can’t be compared to previous results.

Video encoding is often assumed to depend heavily on memory bandwidth. While bandwidth does matter, we can see it’s not a big deal on this particular preset, even when you go from 77GBps to 18GBps on the Core i9.

Cutting the Threadripper’s die use from four to two also isn’t a big deal. It’s actually slightly faster with two dies turned off, but almost within the margin of error in HandBrake encodes.

This leads us to believe that a 32-core Threadripper is slightly slower than an 18-core Core i9 in this particular HandBrake run because of the vagaries of HandBrake itself, and how well it runs on each processor. We should also note that the app itself is multi-threaded, but doesn’t scale with core counts.

THERE’S NO EASY ANSWER

If you were hoping for an easy answer to your lingering Threadripper performance questions, take a number. Based on our tests, the answer is: It’s complicated.

While we didn’t do Linux testing, we’ve seen enough results run by others now to say that Windows 10 is handcuffing performance in certain applications (although the compiler used for those particular tests might share some blame, too).

We also believe that the Threadripper 2990WX can be handcuffed by memory bandwidth and latency in some workloads. It just makes sense when you’re talking about sharing quad-channel memory among 32 cores, versus sharing quad-channel memory among 18 cores.
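
To put rough numbers on that, here is a back-of-the-envelope sketch of theoretical peak bandwidth per core. It assumes each DDR4 channel is 64 bits (8 bytes) wide and moves 3,200 million transfers per second, as in our DDR4/3200 setup. The function names are made up for illustration, and the results are ceilings rather than measurements, so they won’t match the Sandra figures.

```python
def peak_bandwidth_gbps(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB per second.

    Assumes every channel is bus_bytes wide and is fully saturated,
    which real workloads never quite manage.
    """
    return channels * mega_transfers_per_s * bus_bytes / 1000


def per_core_gbps(total_gbps, cores):
    """Split the total evenly across cores (an idealized assumption)."""
    return total_gbps / cores


if __name__ == "__main__":
    quad_channel = peak_bandwidth_gbps(channels=4, mega_transfers_per_s=3200)
    print(f"Quad-channel DDR4/3200 peak: {quad_channel:.1f}GBps")
    for name, cores in (("Threadripper 2990WX", 32), ("Core i9-7980XE", 18)):
        share = per_core_gbps(quad_channel, cores)
        print(f"{name}: {share:.1f}GBps per core across {cores} cores")
```

On paper that works out to 102.4GBps in total, or about 3.2GBps per core on the 32-core chip versus about 5.7GBps per core on the 18-core chip.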

In the end, we think you should still choose your high-performance CPU based on the task it’ll do. Our results from our original review still basically apply. If you do thread-heavy tasks such as 3D rendering or modeling or tend to multi-task, having 32 cores and 64 threads in a Threadripper 2990WX ($1,749 on Amazon [go.pcworld.com/29wx]) will be unlike anything you’ve ever had before.

If, however, you tend to stick to workloads that aren’t as heavily threaded, such as most video encoding chores, need higher clock speeds for lightly threaded applications, and are also very dependent on memory bandwidth, the Core i9-7980XE ($2,000 on Amazon [go.pcworld.com/79xe]) might be the better choice for you.

AMD says the four-die Threadripper has 25GBps of bandwidth shared among all of the chips.

A two-die 16-core Threadripper 2950X has 50GBps and two links between two dies versus the 25GBps among four dies that AMD originally claimed (and then amended).

SiSoft Sandra 2018 Titanium’s per-core memory bandwidth results say the Threadripper has only 2GB per core available.

Maybe it’s not the Threadripper after all?

TechSpot’s Linux vs. Windows test still puts Threadripper behind the Core i9.

We recompiled the source code for 7-Zip 18.05 using the latest version of Visual Studio 2017 and found that, well, that’s probably what 7-Zip was recently compiled with.

Cutting memory bandwidth just kills the performance of the Core i9, but oddly the Threadripper’s performance doesn’t bump up when two of the dies are switched off.

Gutting memory bandwidth on the Core i9 didn’t produce as drastic a change in performance as you’d expect, which tells you that video encoding isn’t as dependent on memory bandwidth as you might think.

Ryzen Threadripper 2990WX vs. Core i9-7980XE (Percent Performance): If your applications tend to use fewer threads and prefer higher clock speeds, you live on the left side of this chart, and the Core i9 makes more sense. If, however, you need more cores, you live on the right side of this chart, and the Threadripper is the better choice.
