AMD’s 32-core Threadripper performance

HERE’S JUST HOW MUCH MEMORY BANDWIDTH CONSTRAINTS MIGHT BE HURTING THE PERFORMANCE.

PCWorld (USA) - BY GORDON MAH UNG

AMD’s 32-core Threadripper 2990WX is the fastest consumer CPU ever sold (go.pcworld.com/299x). And let’s be clear: We’re in full agreement with anyone who said that. But we would also be the first ones to say it has its limitations, too.

The most glar­ing is the lack of con­sumer ap­pli­ca­tions that can truly ex­ploit the cores avail­able. The other lim­i­ta­tion is ap­par­ent in the di­a­gram be­low, which shows how AMD built this 32-core mon­ster. Rather than a sin­gle chip with ev­ery sin­gle CPU core on it, AMD con­nects four dies us­ing its high-speed In­fin­ity Fab­ric.

WHY MEM­ORY BAND­WIDTH AF­FECTS THE 32-CORE THREAD­RIP­PER

If you look closer at the diagram, you can see that two of the dies don’t have their own memory controllers or PCIe access. Instead, they have to talk to an adjacent CPU die.

It is, es­sen­tially, like hav­ing a two-apart­ment unit where the sec­ond one must ac­cess the hall­way out­side by go­ing through the first apart­ment.

Perhaps more important is the overall bandwidth available. AMD had initially said the bandwidth available between the four CPU dies was 25GBps bi-directional. The company later amended its documentation to state that the figure is total bandwidth. Compare that with the 16-core Threadripper 2950X, with its 50GBps of bandwidth and two links between the two dies (also updated information from AMD).

Many believe this is Threadripper 2990WX’s main weakness: Lack of memory bandwidth per core is impacting it in memory-intensive tasks such as compression and encoding. Even worse for Threadripper 2990WX is that bandwidth has to be shared on a CPU with 14 more cores than Intel’s Core i9-7980XE.

Below, you can see the result of Sandra 2018 Titanium’s memory bandwidth test and the available bandwidth per core. As you can see, the bandwidth per core plummets from almost 5GBps with 8 and 16 cores to just 2GBps when you utilize all 32 cores.

Synthetic memory bandwidth tests are one thing. To dig further into performance in memory-intensive tests, we fired up the newest version of the free 7-Zip application. Written by Igor Pavlov, this open-source compression and decompression utility is popular and generally awesome. For example, when I run tests on a laptop and decompress Cinebench R15.08 and its thousands of small files with Windows 10’s built-in utility, it takes several minutes to finish. I can actually connect to the Internet, download 7-Zip, and decompress the contents of Cinebench R15.08 with it in less time than it takes the built-in Windows utility to do its thing.

The GUI ver­sion runs two tests, for com­pres­sion and de­com­pres­sion. The over­all score looks like a sim­ple av­er­age of the two re­sults.

WHAT ARE 7-ZIP TESTS?

You can read more about the test on the 7-cpu.com website (go.pcworld.com/7cpu), but we’ve highlighted some of the key information about the tests here. Regarding the Compression test, the website discusses the factors that influence the test results, saying it “strongly depends from memory (RAM) latency, Data Cache size/speed and TLB. Out-of-order execution feature of CPU is also important for that test.” The site goes on: “The compression test has big number of random accesses to RAM and Data Cache. So big part of execution time the CPU waits the data from Data Cache or from RAM.”

About the De­com­pres­sion test, the web­site says it “strongly de­pends on CPU in­te­ger op­er­a­tions. The most im­por­tant things for that test are: branch mis­pre­dic­tion penalty (the length of pipe­line) and the la­ten­cies of 32-bit in­struc­tions (‘mul­ti­ply’, ‘shift’, ‘add’ and other). The de­com­pres­sion test has very high num­ber of un­pre­dictable branches.”

HOW WE RETESTED THREAD­RIP­PER VS. CORE I9

For our retest, we decided to lock both the Threadripper 2990WX and the Core i9-7980XE at 3GHz to remove any variables from each CPU’s boost schemes. This was done to make the comparison more dependent on the test rather than the clock speed differences between the two. We also set both to DDR4/3200 clocks, and both were run in quad-channel mode except where noted. To be up-front: The Threadripper system had a slight edge in memory timings at CL14 and 1T, while the Core i9 was running at CL15 and 2T. As in our original review, both were running Founders Edition GTX 1080 cards using the same drivers and the same version of Windows 10 Enterprise Edition.

Because much of the concern over Threadripper is its per-core memory bandwidth performance, we decided to run from 1 thread to the maximum number of threads on each CPU. We also decided to see whether performance of the Threadripper would change if you turned off dies, so we ran it with a single die (8 cores/16 threads), two dies (16 cores/32 threads), and all four (32 cores/64 threads).
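The sweep described above was run in the 7-Zip GUI. For readers who want to approximate it from the command line, here is a minimal Python sketch, not the procedure we used. It assumes the `7z` command-line executable is on the PATH and relies on the benchmark command's `-mmt` (thread count) and `-md` (dictionary size as a power of two) switches. The function names are hypothetical and exist only for illustration.

```python
import shutil
import subprocess


def benchmark_command(threads, dict_log2=25, exe="7z"):
    """Build a 7-Zip benchmark command line for a given thread count.

    dict_log2=25 means a 2**25-byte (32MB) dictionary, matching the
    default dictionary size mentioned in the article.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [exe, "b", f"-mmt{threads}", f"-md{dict_log2}"]


def sweep_commands(max_threads, dict_log2=25, exe="7z"):
    """Return one benchmark command per thread count from 1 to max_threads."""
    return [benchmark_command(n, dict_log2, exe) for n in range(1, max_threads + 1)]


def run_sweep(max_threads, exe="7z"):
    """Run the sweep and return {threads: raw text output}.

    Score parsing is left out on purpose, because the output layout
    can differ between 7-Zip versions.
    """
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    results = {}
    for threads, cmd in enumerate(sweep_commands(max_threads, exe=exe), start=1):
        completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
        results[threads] = completed.stdout
    return results
```

Calling `run_sweep(64)` would cover the full 32-core/64-thread range on the Threadripper, and `run_sweep(36)` the 18-core/36-thread Core i9. Turning dies off is a firmware or vendor-utility setting and is outside what this sketch can do.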

In the in­te­ger-fo­cused de­com­pres­sion com­po­nent of 7-Zip, the per­for­mance was quite nice. Although we don’t see per­fect scal­ing, there’s lit­tle dif­fer­ence in 7-Zip de­com­pres­sion per­for­mance as you switch off dies.

All of the tests were also completed using the GUI version of 7-Zip 18.05 with the default dictionary size of 32MB (although we did decide to recompile our own version, too).

You’re prob­a­bly more in­ter­ested in the Core i9 vs. Thread­rip­per 2990WX, so we ran that, of course. For the most part, it’s not bad for ei­ther part. In­ter­est­ingly, Thread­rip­per 2990WX seems to have that slight fall-off in de­com­pres­sion per­for­mance as you cross the thresh­old of 8 cores. Core i9 has a de­cent per­for­mance ad­van­tage up to about 16 cores, but after that it runs out of steam and ends up los­ing to the 32-core Thread­rip­per 2990WX CPU.

This shouldn’t surprise too many, though. The Threadripper 2990WX’s CPU performance when it doesn’t run out of memory bandwidth is a known quantity. You only have to look at our multi-threaded rendering tests to see how it’s simply a monster.

The ques­tion is, what hap­pens un­der mem­ory band­width or mem­ory la­tency tests? Here are the re­sults of the Thread­rip­per 2990WX in 7-Zip’s com­pres­sion test. It’s not pretty, but the good news is switch­ing dies off didn’t seem to mat­ter. As you can see, the CPU ap­pears to hit a ceil­ing at 26 threads, and then it just gets worse from there.
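One way to pin down a ceiling like this from a thread sweep is to find the thread count with the best score, and to look at parallel efficiency (speedup over the single-thread score, divided by the thread count). The Python sketch below shows the idea. The scores in it are made-up placeholders chosen only to mimic a curve that peaks at 26 threads; they are not our measured results.

```python
def peak_threads(scores):
    """Return the thread count that produced the highest score."""
    return max(scores, key=scores.get)


def scaling_efficiency(scores):
    """Map each thread count to (score / single-thread score) / threads.

    1.0 means perfect scaling; lower values mean extra threads are
    contributing less than a full thread's worth of work.
    """
    base = scores[1]
    return {n: (score / base) / n for n, score in sorted(scores.items())}


# Hypothetical scores in arbitrary units, NOT data from this article.
example = {1: 100.0, 8: 720.0, 16: 1280.0, 26: 1690.0, 32: 1600.0, 64: 1400.0}

ceiling = peak_threads(example)
efficiency = scaling_efficiency(example)
```

With real sweep data, a peak well below the maximum thread count, followed by falling scores, is the pattern described above.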

Perhaps worse is when you compare it to the Core i9-7980XE. Again, remember that both of the CPUs were at a fixed clock speed of 3GHz and DDR4/3200.

That’s just not a good look for the 32-core Thread­rip­per 2990WX and does seem to con­firm that mem­ory la­tency and band­width chores suf­fer greatly.

But can memory bandwidth also hurt Core i9? To find out, we switched the Core i9 system from quad-channel mode into single-channel mode. Unfortunately, for our test, we did have to lower total memory to 16GB rather than 32GB due to lack of density on modules. The good news is that 7-Zip with the default dictionary fits fine, and we don’t believe overall memory capacity was the issue. We can say that overall memory bandwidth as measured in Sandra 2018 was cut from 77GBps in quad-channel memory mode to 18.5GBps in single-channel mode on the Intel part. Per-core memory bandwidth went from 4.8GBps in quad-channel to 1GBps in single-channel mode.

As you can see, the performance of Core i9-7980XE also suffers when its memory bandwidth is drastically cut. It doesn’t suffer as much as the Threadripper 2990WX, but this doesn’t appear to be the fault of some pro-Intel code at work.

LINUX TESTS SHOW HOW WIN­DOWS 10 AF­FECTS RE­SULTS

I’d normally say, okay, memory bandwidth and latency are the real issues, but there is that Linux thing. That is, in tests run by Michael Larabel at Linux-focused site Phoronix (go.pcworld.com/wslp), the Threadripper 2990WX actually performs on a par with the Core i9-7980XE rather than heavily trailing it. Phoronix runs a slightly older version of 7-Zip, but it’s clear that moving to Linux helps Threadripper 2990WX. A lot. Phoronix even tested it using Windows 10 Server.

Phoronix’s Linux test shows issues not just with 7-Zip, but also several other tests where Windows 10 underperformed the Linux version. So it’s clear Windows has an issue right now. But if you’re in the crowd that wholesale dismisses memory bandwidth as a weakness at all, I’m not so sure.

One set of Linux-versus-Windows tests that would back up memory bandwidth and latency as issues comes from Steve Walton over at Techspot.com (go.pcworld.com/lvwb). Walton tested Windows and Linux performance using the latest 7-Zip version and found Core i9 still ahead despite having fewer cores. Greatly improved for Threadripper? Yes. But still clearly slower in a multithreaded test that does scale to all available cores.

THE COM­PILER IS AN­OTHER FAC­TOR

In search­ing for more an­swers on Thread­rip­per’s 7-Zip per­for­mance, we won­dered whether the com­piler was at fault. If an out­dated com­piler was used to build the 7-Zip ex­e­cutable, it could cer­tainly hurt the Thread­rip­per’s per­for­mance.

To find out, we down­loaded the source code for 7-Zip, the lat­est ver­sion of Mi­crosoft’s Visual Stu­dio 2017, and com­piled it into an ex­e­cutable.

We ended up with ba­si­cally the same re­sult, and it looks like the lat­est ver­sion of 7-Zip is ac­tu­ally on the lat­est avail­able Visual C++ com­piler. This doesn’t com­pletely dis­miss com­pil­ers, as dif­fer­ent com­pil­ers do mat­ter. If, for ex­am­ple, the ap­pli­ca­tions on Linux were com­piled with the GCC or In­tel com­piler, it might ex­plain the per­for­mance dif­fer­ences.

HAND­BRAKE TEST BRINGS UP MORE QUES­TIONS

While Win­dows 10 clearly, clearly has is­sues with the de­sign of Thread­rip­per, it would be wrong to say mem­ory band­width and la­tency aren’t in play.

To see just how much memory bandwidth helps or hurts both CPUs, we took Veracrypt and ran it with the larger 1GB workload. As we saw with 7-Zip, the Core i9’s Veracrypt performance drops off a cliff when its memory bandwidth is cut and is actually worse than the Threadripper’s (albeit with the Threadripper on quad-channel memory), as you can see in the chart below.

The Threadripper 2990WX does suffer greatly with the 1GB workload. But if the issue is how Windows handles the memory configuration on the Threadripper, it should get better after shutting off two dies, right? It does, but as you can see in the green bars below, performance increases only slightly when limiting it to just 16 cores and two dies. The result is again confusing, because if Windows 10 is at fault for the poor performance of the shared memory controller design, why is the performance of the Threadripper 2990WX not as fast as the Core i9’s? Remember, both CPUs are locked at 3GHz.

Our last test used Hand­brake 1.1.1 to encode a 4K video file us­ing the 1080p Chrome­cast pre­set. Note: This Hand­brake re­sult is dif­fer­ent from oth­ers we’ve run, so it can’t be com­pared to pre­vi­ous re­sults.

Video encoding is often associated with increased memory bandwidth. While it does matter, we can see it’s not a big deal even when you go from 77GBps to 18GBps on the Core i9 on this particular preset.

Cutting the Threadripper’s die use from four to two also isn’t a big deal. It’s actually slightly faster with two dies turned off, but almost within the margin for error in Handbrake encodes.

This leads us to believe that the reason a 32-core Threadripper is slightly slower than an 18-core Core i9 in this particular Handbrake run is likely the vagaries of Handbrake itself, and how well it runs on each processor. We should also note that the app itself is multi-threaded, but doesn’t scale with core counts.

THERE’S NO EASY AN­SWER

If you were hop­ing for an easy an­swer to your lin­ger­ing Thread­rip­per per­for­mance ques­tions—take a num­ber. Based on our tests, the an­swer is, it’s com­pli­cated.

While we didn’t do Linux testing, we’ve seen enough results run by others now to say that Windows 10 is handcuffing performance in certain applications (although the compiler used for those particular tests might share some blame, too).

We also be­lieve that the Thread­rip­per 2990WX can be hand­cuffed by mem­ory band­width and la­tency in some work­loads. It just makes sense when you’re talk­ing about shar­ing quad-chan­nel mem­ory among 32 cores, ver­sus shar­ing quad-chan­nel mem­ory among 18 cores.
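To put rough numbers on that sharing argument, here is a back-of-the-envelope Python sketch of theoretical peak DRAM bandwidth per core. It assumes DDR4-3200 with a standard 64-bit (8-byte) channel and counts 1GB as 10^9 bytes. These are theoretical ceilings only; the measured Sandra figures reported earlier are lower, and the function name is ours for illustration.

```python
def peak_bandwidth_gb_per_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (1GB = 10**9 bytes).

    Each 64-bit DDR4 channel moves bus_bytes (8) bytes per transfer.
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


# Quad-channel DDR4-3200, the configuration used for both test systems.
quad_channel = peak_bandwidth_gb_per_s(4, 3200)   # about 102.4 GB/s

# The same theoretical pool divided evenly across each CPU's cores.
per_core_threadripper = quad_channel / 32         # about 3.2 GB/s per core
per_core_core_i9 = quad_channel / 18              # about 5.7 GB/s per core
```

An even split is a simplification, since real workloads rarely load every core equally, but it shows why the same four channels stretch much thinner across 32 cores than across 18.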

In the end, we think you should still choose your high-performance CPU based on the task it’ll do. Our results from our original review still basically apply. If you do thread-heavy tasks such as 3D rendering or modeling or tend to multi-task, having 32 cores and 64 threads in a Threadripper 2990WX ($1,749 on Amazon [go.pcworld.com/29wx]) will be unlike anything you’ve ever had before.

If, however, you tend to stick to workloads that aren’t as heavily threaded, such as most video encoding chores, need higher clock speeds for lightly threaded applications, and also run tasks that are very memory bandwidth-dependent, the Core i9-7980XE ($2,000 on Amazon [go.pcworld.com/79xe]) might be the better choice for you.

AMD says the four-die Threadripper has 25GBps of bandwidth shared among all of the chips.

A two-die 16-core Threadripper 2950X has 50GBps and two links between two dies versus the 25GBps among four dies that AMD originally claimed (and then amended).

SiSoft Sandra 2018 Titanium’s per-core memory bandwidth results say the Threadripper has only 2GBps per core available.

Maybe it’s not the Thread­rip­per after all?

Techspot’s Linux vs. Win­dows test still puts Thread­rip­per be­hind the Core i9.

We re­com­piled the source code for 7-Zip 18.05 us­ing the lat­est ver­sion of Visual Stu­dio 2017 and found that, well, that’s prob­a­bly what 7-Zip was re­cently com­piled with.

Cut­ting mem­ory band­width just kills the per­for­mance of the Core i9 but oddly the Thread­rip­per’s per­for­mance doesn’t bump up when two of the dies are switched off.

Gut­ting mem­ory band­width on the Core i9 didn’t see as dras­tic a change in per­for­mance as you’d ex­pect, which tells you how video en­cod­ing isn’t as de­pen­dent on mem­ory band­width as you think.

Ryzen Threadripper 2990WX vs. Core i9-7980XE (percent performance): If your applications tend to use fewer threads and prefer higher clock speeds, you live on the left side of this chart, and the Core i9 makes more sense. If, however, you need more cores, you live on the right side of this chart, and the Threadripper is the better choice.
