Pure ray-tracing performance?
How far are we from doing pure ray tracing in a game? Theoretically, it could be done right now, but how would it perform? The best estimate we have of how a pure RT rendering solution would perform comes from 3DMark, with its DirectX Ray Tracing Feature Test. The test runs at a relatively ambitious 2560x1440 resolution, and we ran it on the latest AMD and Nvidia RT-capable GPUs. Full ray tracing at 1440p? We could be in for a bumpy ride!
3DMark DXR Feature Test (avg. FPS)
OK, honestly that doesn’t even look that bad. The 3090 is already hitting nearly 60fps! Except, you have to see the test in action to understand what’s happening, and how it differs from a real game.
The whole test uses a static scene, and the camera moves around, adjusting focal length at times. Once the camera stops moving, the randomised rays being cast converge on an ideal ray-traced rendering of the scene, and things look pretty good. But when the camera’s moving, each pixel only gets 12 rays cast per frame, which results in pixel noise – especially apparent on the slower GPUs.
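That convergence behaviour is just Monte Carlo averaging: every extra ray per pixel reduces noise, so a static camera (which lets samples accumulate across frames) converges on the ideal image, while a moving one is stuck with its 12 rays per frame. A minimal sketch of the idea in Python, with the `estimate_pixel` function and the noise model entirely hypothetical:

```python
import random

def estimate_pixel(true_value, noise, rays, rng):
    """Average `rays` noisy ray samples to estimate one pixel's brightness."""
    samples = [true_value + rng.uniform(-noise, noise) for _ in range(rays)]
    return sum(samples) / rays

rng = random.Random(42)

# Camera moving: only 12 rays per pixel per frame -> visible grain.
moving = estimate_pixel(0.5, 0.4, 12, rng)

# Camera still: samples accumulate over (say) 200 frames and converge on 0.5.
still = estimate_pixel(0.5, 0.4, 12 * 200, rng)

print(round(moving, 3), round(still, 3))
```

The error shrinks roughly with the square root of the ray count, which is why denoising and upscaling matter so much: brute-forcing clean pixels with more rays gets expensive fast.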
So, maybe this isn’t a true representation of a fully RT game engine. However, it does give us an interesting comparison of the latest GPUs, and the picture it paints doesn’t look too good for AMD. AMD’s fastest GPU, the 6900 XT, barely beats Nvidia’s 3060 Ti. AMD also falls behind Nvidia’s previous-generation RTX 2080 Ti. What’s particularly concerning is that even the RX 6800 is faster than the new consoles, which means the best the consoles can hope for is about one-third of the pure RT potential of the RTX 3090.
that in turn makes the rest of the room at least partially visible. Without RT, most games define a general minimum lighting level so that non-lit areas aren’t just black, but it’s not realistic.
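That minimum-lighting hack amounts to clamping every shaded value to a floor so nothing renders as pure black. A one-line illustration, where the function name and the floor value are our own invention rather than any real engine's setting:

```python
def shade(direct_light, ambient_floor=0.08):
    """Rasteriser-style hack: never let a surface go fully black.
    `ambient_floor` is a hypothetical engine setting, not from any real game."""
    return max(direct_light, ambient_floor)

print(shade(0.0))  # an unlit corner still renders at the floor value: 0.08
print(shade(0.6))  # lit surfaces are unaffected: 0.6
```

It's cheap and it keeps scenes readable, but it's exactly the kind of global fudge that ray-traced global illumination replaces with actual light transport.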
The problem is that sometimes realism isn’t as fun. There’s a scene in a helicopter in Cyberpunk 2077, for example, that takes place at night. With RT enabled, all you can really make out is the dark silhouette of a person aiming a mounted minigun. Turn off RT and suddenly there are multiple people and lots of equipment inside the helicopter. It’s not necessarily that the RT implementation is wrong, but in the real world there would likely be other light sources that simply aren’t included in the game.

• AMBIENT OCCLUSION, REFRACTIONS, AND CAUSTICS (OH MY!): These last three effects aren’t quite as eye-catching as the above, so we’re putting them all into one pot. Refractions have to do with the way light bends as it passes through a transparent object, such as glass or water. Frankly, a rough approximation of the distortion done via shaders is probably good enough for most games. Caustics is related to refractions, and has to do with the bright focus areas of light that you can get, such as with a magnifying glass or even a glass of wine. Cool? Perhaps, but also not critical for most games. Then finally there’s AO, which focuses on the way shadows tend to be darker in corners where polygons intersect.
The story is pretty much the same for these as for the earlier RT effects: Some things clearly look better, while other aspects hardly seem to change, and the performance hit remains. Can we see the difference between the varying forms of ambient occlusion – SSAO, HBAO+, or RTAO? Yes. Does RTAO look best? Yes. Would we actually notice the change in AO if we weren’t specifically looking for it? Perhaps in a more cerebral game, but in a fast-paced shooter, probably not.
Building for the future
What we’re getting at is that RT still feels very much in its infancy, as far as games are concerned, and game developers aren’t free to focus on the RT implementation as long as a significant percentage of the PC ecosystem still consists of old non-RT GPUs. And it does, if we’re going by the Steam Hardware Survey, which shows only 12 percent of all PCs using an Nvidia RTX graphics card (it also shows seven percent of PCs surveyed still using Intel integrated graphics).
This isn’t something new or unusual. We’ve been through this same sort of progression multiple times. There was the shift from 2D to 3D accelerators in the 90s, the “first GPUs” that added hardware transform and lighting support in the late 90s, and multiple generations of hardware-programmable shaders. Each time, it took years before the old hardware was truly abandoned by newer games. The past two years of consumer RT hardware have just been the latest case of hardware preceding the software.
The good news is that things are starting to change. Nvidia is no longer the only ray-tracing solution in town. Now that AMD also has hardware RT support for both PC GPUs and the latest consoles, we could potentially see an acceleration in the use of RT.
When will it become a required feature? Judging by the way things are progressing, we’re at least five years away from that happening for the biggest releases, and probably more like 10 years away. Individual games might decide to require RT hardware, but it’s going to take a long time before major studios will go all-in on ray tracing.
Part of that is simply a matter of economics. Major game launches have already reached the level of Hollywood movies, with costs in the tens of millions of dollars (or more), and that’s with relatively tame visuals.
If we want RT games to look anything like RT movies, it will require even more artistic talent to make RT shine. To recover the costs of creating a game, publishers need to be able to sell as many copies as possible, and that sits in direct opposition to the idea of making a game that requires RT hardware support.
In other words, we have the classic chicken and egg scenario. Game developers want more gamers to have RT-capable hardware before they pour resources into creating RT games. Gamers, on the other hand, want to see some real advantages for RT hardware before they’re willing to fork over the money for a graphics card upgrade. That brings us to the AMD vs. Nvidia (and maybe Intel) discussion.
Meet the Hardware Contenders
Let’s move on from what ray tracing can do for games and look at the hardware implementations. Nvidia has first-gen RTX 20-series GPUs that have defined the baseline for RT performance. The new Ampere architecture ushers in the second round of Nvidia RT hardware, promising up to double the performance for RT calculations. AMD, meanwhile, has just released its first-gen RT hardware, the RX 6000-series RDNA2 GPUs. How do the various GPUs compare in terms of RT capabilities? Well, it depends.
Nvidia hasn’t disclosed a lot of the low-level details about how Ampere and Turing RT cores differ from each other. We know that Ampere has an additional ray/triangle intersection functional unit, and Ampere’s RT cores also have the ability to take a time component (useful for things like RT motion blur). While Ampere is theoretically up to twice as fast as Turing per RT core, in practice Nvidia says that it’s about 70 percent faster. Unfortunately, that’s only scratching the surface of what the RT cores do and how they work.
For example, we know that Nvidia’s RT cores can perform ray/bounding box intersections in addition to ray/triangle intersections. This is all part of the BVH (Bounding Volume Hierarchy) implementation used for RT in both DirectX Raytracing (DXR) and VulkanRT. In short, BVH is a structure that helps accelerate the process of determining which, if any, triangle a ray intersects. Rather than checking the ray against every triangle, it starts with comparing the ray against bounding boxes that get progressively smaller, until the algorithm reaches a point where checking a ray against individual triangles makes sense.
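To make the pruning concrete, here’s a toy BVH in Python. One-dimensional intervals stand in for triangles and bounding boxes, and the class and field names are purely illustrative – this is a sketch of the general technique, not Nvidia’s or AMD’s actual implementation:

```python
class Node:
    """A BVH node: either an inner node with two children, or a leaf."""
    def __init__(self, lo, hi, left=None, right=None, tris=None):
        self.lo, self.hi = lo, hi           # bounding interval (the "box")
        self.left, self.right = left, right
        self.tris = tris or []              # leaf only: candidate triangles

def traverse(node, x, counter):
    """Find triangles containing query point x, skipping missed subtrees."""
    counter["box_tests"] += 1
    if not (node.lo <= x <= node.hi):       # ray/box test fails -> prune subtree
        return []
    if node.tris:                           # leaf: do the ray/triangle tests
        counter["tri_tests"] += len(node.tris)
        return [t for t in node.tris if t[0] <= x <= t[1]]
    return traverse(node.left, x, counter) + traverse(node.right, x, counter)

# Four "triangles" split into two leaves under one root box.
tris = [(0, 1), (1, 2), (8, 9), (9, 10)]
root = Node(0, 10,
            left=Node(0, 2, tris=tris[:2]),
            right=Node(8, 10, tris=tris[2:]))

counter = {"box_tests": 0, "tri_tests": 0}
hits = traverse(root, 1.5, counter)
print(hits)                  # [(1, 2)]
print(counter["tri_tests"])  # 2 -- brute force would have tested all 4
```

With a real scene of millions of triangles, the same idea turns a linear scan into roughly logarithmic work per ray, which is why dedicated ray/box hardware matters so much.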
What we don’t know exactly is how fast Nvidia’s GPUs are at ray/box vs. ray/triangle intersections. There are some situations where Ampere can do twice as many RT calculations per cycle, and others where one of the functional units may go unused. For AMD’s part, the RDNA2 GPUs have Ray Accelerators that can do either four ray/box intersection calculations per cycle, or one ray/triangle intersection. It’s also unclear if different types of RT calculations – for example, for reflections vs. shadows vs. global illumination – take different amounts of time per ray, or if they simply require more rays in general.
With hardware in hand, however, it’s possible to run tests to get a reasonable estimate of the performance – see Pure Ray Tracing Performance on page 51. That’s one specific test of RT performance, and it may not be applicable to all RT implementations, but it does provide interesting data points. Which brings us to the final topic: Actual RT performance in currently shipping games.
RT Hardware Performance
We selected 10 graphics tests and games that use the DirectX Raytracing API and run on both AMD and Nvidia GPUs. That last point is important, because there are currently two games that use DXR and only work on AMD GPUs (Godfall and The Riftbreaker), and at least a few tests and games (Bright Memory Infinite, Cyberpunk 2077, and Wolfenstein Youngblood) that only work on Nvidia GPUs. Not surprisingly, each of the games is promoted by the respective GPU company, so the upcoming RT gaming wars could get messy (see table above).
Right now, it looks like Nvidia is crushing AMD in RT performance. Overall, the RTX 3070 comes in just ahead of the 6800 XT, and the 3080 is 25 percent faster than AMD’s RX 6900 XT. As for the 3090, it’s in a class (and price) of its own.
But then we look at Dirt 5 and have to wonder how much vendor-specific optimisations play a role. There, the 6800 XT beats the 3080 by 8 percent. Yes, this is an AMD-promoted game, and the DXR is still in beta. Maybe Nvidia will close the gap, and to be fair the RT effects in Dirt 5 aren’t particularly impressive. As for the other games, they came out before AMD’s RX 6000 series launched, which means they were optimised for Nvidia by default.
The thing to keep in mind is that AMD RDNA2 GPUs are in both the PlayStation 5 and Xbox Series S/X, which means that every console game that implements RT will be targeting AMD by default. Only time will tell how that muddies the waters.
Resolution upscaling: Getting more from less
There’s still one more important aspect of ray tracing that we haven’t discussed: DLSS (Deep Learning Super Sampling). RT is computationally intensive, so any way to reduce the number of rays cast is extremely helpful. RT solutions already use denoising to help improve performance, but the goal of DLSS is to reduce the number of pixels rendered from the start.
At its core, the idea of DLSS is pretty easy to grasp. Use machine learning to train a deep-learning network on how to upscale and anti-alias games. The trick is that while the training process can be incredibly time-consuming, the inference aspect – using the trained network and running it against frames in a game – is far less demanding. Across the five DLSS 2.0 games we tested, DLSS Quality mode improved performance on the RTX 3060 Ti by 65 percent at 1440p.
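Most of that uplift comes straight from shading fewer pixels. DLSS 2.0’s Quality mode renders internally at roughly two-thirds of the output resolution per axis (the exact internal resolutions below are our back-of-the-envelope assumption, not official figures), and the arithmetic shows why that alone buys so much:

```python
def render_pixels(width, height, num=1, den=1):
    """Pixels shaded at an internal resolution scaled by num/den per axis."""
    return (width * num // den) * (height * num // den)

native = render_pixels(2560, 1440)            # full 1440p: every pixel shaded
quality = render_pixels(2560, 1440, 2, 3)     # ~2/3 per axis, then upscaled

print(native)                      # 3686400
print(quality)                     # 1706 * 960 = 1637760
print(round(native / quality, 2))  # 2.25 -- ~2.25x fewer pixels to shade
```

The upscaling network claws back the lost detail, so the GPU does well under half the per-pixel work while the output still targets 1440p.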
For games that support Nvidia’s proprietary tech, Nvidia GPUs enjoy a commanding lead over AMD. Even the RTX 3060 Ti outperforms the RX 6800 XT by 50 percent on average in games that use the latest DLSS 2.0 implementation – and that’s using the DLSS Quality mode; higher-performance DLSS modes are also available.
How does it look? If you take screenshots of native rendering and DLSS rendering and compare them, sometimes DLSS looks better than native with TAA (Temporal Anti-Aliasing); other times it looks perhaps a bit worse. In motion, though, you’d be hard-pressed to tell the difference – except for the fact that DLSS runs at far more palatable frame rates.
AMD is working on an alternative to DLSS – FidelityFX Super Resolution. The thing is, DLSS 2.0 is here, support is integrated into Unreal Engine, and quite a few game developers and publishers have also jumped on the DLSS train. And why wouldn’t they? RT is so demanding that even the mightiest of GPUs can struggle at higher settings, especially at higher resolutions. With all the RT options maxed out, not even the RTX 3090 can maintain 60fps in Fortnite – though it can with DLSS enabled.
Perhaps more important than ray tracing right now, DLSS has very tangible benefits. Games such as Cyberpunk 2077 can even use DLSS without enabling RT, which makes 4K viable on cards like the RTX 3060 Ti. A $900 GPU, running the most anticipated game of 2020 at maxed out settings (minus RT) and 4K at 60fps? Yeah, we didn’t see that coming at the start of the year. The 3060 Ti can also do maxed out RT with DLSS at 1080p and 60fps if you prefer.
Ray-traced mountains to climb
Two years on, and it’s interesting to see what has changed and what hasn’t. Nvidia is faster at RT, and enabling RT in most games is still a great way to tank performance for a modest improvement in visuals. But in the right situations, RT can make a big difference. There’s no way to get high-quality visuals suitable for film without ray tracing or path tracing, and while games continue to improve, we’re a long way off from playing anything that has visuals worthy of a summer Hollywood blockbuster.
One thing that has changed since the first RTX cards launched is the number of games with support for ray tracing. Big publishers may have been first out of the gate, but those first forays into real-time RT feel pretty lackluster compared to some of the latest games. At present, there are around two dozen games that use RT in some form, with many more in the works.
It’s funny how we can get used to games looking a certain way and be happy with it, and then along comes something new and our expectations change. Control paved the way with a new level of visual fidelity, and Cyberpunk 2077 now joins it, with most other games still trying to catch up. While you can certainly play either game without ray tracing, once you’ve spent some time running around the hallways of the Federal Bureau of Control or Night City in all their RT glory, you miss the improved reflections, lighting, and shadows once they’re gone.
We remember the days of the first programmable shader GPUs, with games like Crysis causing hardware to physically cry out in pain at times. It took many years before that level of graphics became relatively commonplace, and ray tracing will follow the same path. We’re still nowhere near the point where RT hardware has fully penetrated the market. Part of that is thanks to the rise of laptops. 12 percent of all PCs on Steam might have RT-capable GPUs, but if we’re looking purely at laptops, that number will certainly be lower. Consoles getting RT hardware was probably necessary before we’d get to see the tech proliferate.
With the latest GPUs, we now have cards that are significantly faster than two years ago, and future generations will continue the upward climb. In that sense, right now we’re really just trudging through the mist-shrouded foothills of ray tracing, looking up to the summit in the distance. It’s going to take time to get there, but in another two or three generations of graphics hardware we’ll think back fondly on our hike through the valley of rasterisation and conclude that, yes, the ray-traced panorama before us was worth the effort.