THE GRAPHICAL UNION ADDRESS: NOW WITH MORE RAY TRACING
The ray-tracing party is kicking into full swing, with AMD arriving fashionably late. Jarred Walton takes us on the tour.
OVER TWO YEARS AGO, Nvidia surprised nearly everyone by announcing that its next-gen Turing architecture would feature the holy grail of computer graphics: ray tracing (RT). It’s a technique that’s been around for decades, and the movie industry makes extensive use of it these days—or at least it did, back when movies were still being made, and we could go out to watch them. The thought of getting hyper-realistic visuals in games that could rival the latest blockbusters sounded too good to be true, and it was. Today, we have Nvidia’s second generation of RT hardware in the Ampere architecture, and AMD has joined the fray with its RDNA2 GPUs. It’s time for a State of the Union address on the subject of ray tracing in games.
A BRIEF OVERVIEW OF RT EFFECTS
We’re not going to dig deep into the details of how RT works—that’s familiar territory by now. The short summary is that it involves calculating the projection of lines (rays) into the geometry of a scene to figure out what polygon the ray intersects, and then depending on the material properties of that polygon, additional rays can be cast. Ultimately, the results of all those individual rays are then combined to determine the final color value for a pixel.
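That cast, hit, bounce, combine loop can be sketched in a few lines of Python. Everything here (the scene format, the precomputed bounce direction, the 50/50 color blend) is invented for illustration; real renderers compute reflection vectors from surface normals and shade far more carefully.

```python
# Toy illustration of the core ray-tracing loop described above.
# Surfaces are plain dicts; "hit" returns the parametric distance
# along the ray to the intersection, or None on a miss.

def intersect(ray_origin, ray_dir, scene):
    """Return the closest hit as (distance, surface), or None."""
    best = None
    for surface in scene:
        t = surface["hit"](ray_origin, ray_dir)
        if t is not None and (best is None or t < best[0]):
            best = (t, surface)
    return best

def trace(ray_origin, ray_dir, scene, depth=0, max_depth=2):
    """Follow one ray, spawning a bounce ray on reflective surfaces."""
    hit = intersect(ray_origin, ray_dir, scene)
    if hit is None:
        return (0.2, 0.2, 0.2)  # background color
    t, surface = hit
    color = surface["color"]
    if surface["reflective"] and depth < max_depth:
        # Each bounce is a brand-new ray, which is why costs add up fast.
        # A real renderer would reflect about the surface normal here;
        # this sketch cheats with a precomputed "bounce_dir".
        hit_point = tuple(o + t * d for o, d in zip(ray_origin, ray_dir))
        bounced = trace(hit_point, surface["bounce_dir"], scene,
                        depth + 1, max_depth)
        color = tuple(0.5 * c + 0.5 * b for c, b in zip(color, bounced))
    return color
```

The `max_depth` cap is the knob games turn down hard: every extra bounce multiplies the ray count.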
While it might seem like a single ray per pixel would suffice, it doesn’t really get the job done. Each bounce counts as a new ray, and there are different types of material and ray calculations that need to be done. The main takeaway is that fully ray-traced scenes in movies can use hundreds or even thousands of rays per pixel, and take hours of rendering time per frame. That’s many orders of magnitude slower than what we want from PC games. Even a relatively “fast” rendering time of 10 minutes per frame in a movie is still about 36,000 times slower than a PC game running at 60 frames per second, and multi-hour frame times push that ratio into the millions.
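The arithmetic behind that comparison is simple enough to do in a couple of lines; the 10-minute and 10-hour frame times are just illustrative examples.

```python
# Comparing offline film render times to a 60fps game frame budget.
fps = 60
fast_film_frame_s = 10 * 60       # a "fast" 10 minutes per frame
slow_film_frame_s = 10 * 60 * 60  # ten-hour frames are not unusual

# One film frame's render time covers this many game frames:
print(fast_film_frame_s * fps)    # 36000
print(slow_film_frame_s * fps)    # 2160000, over 2 million
```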
The trick for real-time ray tracing is to find a compromise between quality and performance. GPUs containing RT hardware aren’t going to give us a million-fold speed-up, but by combining the results of rasterisation where it looks good enough with RT, it’s possible to get hybrid rendering up to real-time levels of performance. That’s the idea, at least.
In general, rasterisation is really good at handling textures and geometry, and shader calculations and post processing effects can further improve the resulting output. Ray tracing doesn’t need to reinvent the wheel, then, but instead just focus on improving the areas where rasterisation comes up short. Here’s a quick overview of the major RT effects being pursued right now in games.
RAY-TRACED EFFECTS
• Reflections: This is a great example of where RT has real advantages over traditional rendering modes, as the rays can bounce into areas of the scene that aren’t directly visible. It’s easy to set up an environment where the presence or lack of proper reflections makes a huge difference in how things look—2019’s Control being a great example with its mirrored office environments. You can even take things a step further with multiple bounce reflections, where a ray that hits a reflective surface and bounces into another reflective surface keeps propagating, which is required to render a hall of mirrors effect. Of course, more reflections and more rays mean slower performance, but the Bright Memory Infinite demo has some great examples of how more complex RT modes can improve the overall visual fidelity.
Right now, the list of games using ray tracing for reflections continues to grow, with at least 10 games released and more on the way. Control, Cyberpunk 2077, and Watch Dogs Legion are some of the best examples. The problem is that even though reflections can make a big visual difference, there are still scenes where they hardly matter, yet the performance hit remains. Plus, even though screen space reflections (SSR) miss some things (stuff that’s not currently visible on the screen can’t be reflected), SSR often does a reasonable job at faking it.
Limited RT reflections can also break the illusion the game is trying to create. Case in point: Cyberpunk 2077 apparently doesn’t have a model for the player character V present in the game world. That or V is for “vampire” and so V doesn’t cast a reflection. Either way, it feels very weird to walk up to a car window and not see your player. Oddly, you can “turn on” mirrors in bathrooms to see your reflection, and V is present in the game’s photo mode, but after playing Control and Watch Dogs Legion, Cyberpunk 2077’s missing reflections just feel wrong—and may end up getting patched by the time you read this.
• Shadows: Proper shadows require looking at how much light reaches each surface, which can be incredibly computationally expensive if there are lots of lights in a scene—not to mention surfaces that can reflect lights. Most games tend to implement RT shadows for only a few light sources, focusing on those that are within a certain distance from the objects, plus a single global light source.
Our impression is that RT shadows are at the other end of the spectrum from reflections, in that they’ve been decidedly disappointing so far. It’s not that RT shadows don’t make a difference in visuals, but do they make a big enough difference? That’s the rub. In practice, shadow-mapping techniques tend to look good, and more advanced solutions can mimic soft shadows while still performing better than RT shadows. Shadow of the Tomb Raider and the two latest Call of Duty games are good examples of how little RT shadows often matter, and they still generally omit proper shadows for things like procedurally generated foliage.
• Global Illumination / Lighting: This one gets a bit nebulous—isn’t lighting sort of the same as shadows? But here the focus is on figuring out how much indirect lighting reaches a surface, as well as the color of the lighting. For a pure RT implementation, it’s possible to basically lump a lot of these effects together in a single unified engine, but then we’re back to millions of times more complexity than is perhaps necessary for a game.
Metro Exodus only used RT for GI, and it could certainly make a difference inside some buildings. One example that was shown when the game launched was how proper GI leads to darker shadows in some areas of a room, but lighter areas in others. A building with a single window to the outside, for instance, doesn’t just have a bright rectangle where light enters the room. Instead, you get a lit-up area that in turn makes the rest of the room at least partially visible. Without RT, most games define a general minimum lighting level so that non-lit areas aren’t just black, but it’s not realistic.
The problem is that sometimes realism isn’t as fun. There’s a scene in a helicopter in Cyberpunk 2077, for example, that takes place at night. With RT enabled, all you can really make out is the dark silhouette of a person aiming a mounted minigun. Turn off RT and suddenly there are multiple people and lots of equipment inside the helicopter. It’s not necessarily that the RT implementation is wrong, but in the real world there would likely be other light sources that simply aren’t included in the game.
• Ambient Occlusion, Refractions, and Caustics (oh my!): These last three effects aren’t quite as eye-catching as the above, so we’re putting them all into one pot. Refractions have to do with the way light bends as it passes through a transparent object, such as glass or water. Frankly, a rough approximation of the distortion done via shaders is probably good enough for most games. Caustics are related to refraction, and have to do with the bright focused areas of light that you can get, such as with a magnifying glass or even a glass of wine. Cool? Perhaps, but also not critical for most games. Then finally there’s AO, which focuses on the way shadows tend to be darker in corners where polygons intersect.
The story is pretty much the same for these as for the earlier RT effects: Some things clearly look better, while other aspects hardly seem to change, and the performance hit remains. Can we see the difference between the varying forms of ambient occlusion—SSAO, HBAO+, or RTAO? Yes. Does RTAO look best? Yes. Would we actually notice the change in AO if we weren’t specifically looking for it? Perhaps in a more cerebral game, but in a fast-paced shooter, probably not.
BUILDING FOR THE FUTURE
What we’re getting at is that RT still feels very much in its infancy, as far as games are concerned, and game developers aren’t free to focus on the RT implementation as long as a significant percentage of the PC ecosystem still consists of old non-RT
GPUs. And it does, if we’re going by the Steam Hardware Survey, which shows only 12 percent of all PCs using an Nvidia RTX graphics card. (It also shows 7 percent of PCs surveyed still using Intel integrated graphics.)
This isn’t something new or unusual. We’ve been through this same sort of progression multiple times. There was the shift from 2D to 3D accelerators in the 90s, the “first GPUs” that added hardware transform and lighting support in the late 90s, and multiple generations of hardware-programmable shaders. Each time, it took years before the old hardware was truly abandoned by newer games. The past two years of consumer RT hardware have just been the latest case of hardware preceding the software.
The good news is that things are starting to change. Nvidia is no longer the only ray-tracing solution in town. Now that AMD also has hardware RT support for both PC GPUs and the latest consoles, we could potentially see an acceleration in the use of RT. When will it become a required feature? Judging by the way things are progressing, we’re at least five years away from that happening for the biggest releases, and probably more like 10 years away. Individual games might decide to require RT hardware, but it’s going to take a long time before major studios are likely to go all-in on ray tracing.
Part of that is simply a matter of economics. Major game launches have already reached the level of Hollywood movies, with costs in the tens of millions of dollars (or more), and that’s with relatively tame visuals. If we want RT games to look anything like RT movies, it will require even more artistic talent to make RT shine. To recover the costs of creating a game, publishers need to be able to sell as many copies as possible, and that sits in direct opposition to the idea of making a game that requires RT hardware support.
In other words, we have the classic chicken and egg scenario. Game developers want more gamers to have RT-capable hardware before they pour resources into creating RT games. Gamers, on the other hand, want to see some real advantages for RT hardware before they’re willing to fork over the money for a graphics card upgrade. That brings us to the AMD vs. Nvidia (and maybe Intel) discussion.
MEET THE HARDWARE CONTENDERS
Let’s move on from what ray tracing can do for games and look at the hardware implementations. Nvidia has first-gen RTX 20-series GPUs that have defined the baseline for RT performance. The new Ampere architecture ushers in the second round of Nvidia RT hardware, promising up to double the performance for RT calculations. AMD, meanwhile, has just released its first-gen RT hardware, the RX 6000-series RDNA2 GPUs. How do the various GPUs compare in terms of RT capabilities? Well, it depends.
Nvidia hasn’t disclosed a lot of the low-level details about how Ampere and Turing RT cores differ from each other. We know that Ampere has an additional ray/triangle intersection functional unit, and Ampere’s RT cores also have the ability to take a time component (useful for things like RT motion blur). While Ampere is theoretically up to twice as fast as Turing per RT core, in practice Nvidia says that it’s about 70 percent faster. Unfortunately, that’s only scratching the surface of what the RT cores do and how they work.
For example, we know that Nvidia’s RT cores can perform ray/bounding box intersections in addition to ray/triangle intersections. This is all part of the BVH (Bounding Volume Hierarchy) implementation used for RT in both DirectX Raytracing (DXR) and Vulkan RT. In short, BVH is a structure that helps accelerate the process of determining which, if any, triangle a ray intersects. Rather than checking the ray against every triangle, it starts with comparing the ray against bounding boxes that get progressively smaller, until the algorithm reaches a point where checking a ray against individual triangles makes sense.
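In code, that descent is a short recursive walk. This toy sketch assumes a simple dictionary-based node layout and uses the standard slab test for axis-aligned boxes; real BVHs are flattened into GPU-friendly arrays, but the logic is the same: one cheap box test can cull an entire subtree of triangles.

```python
# Toy BVH traversal: test the ray against a node's bounding box first,
# and only descend (or test triangles) when that box test passes.

def ray_hits_box(ray, box):
    """Slab test for an axis-aligned box, given as (min, max) per axis."""
    origin, direction = ray
    t_near, t_far = float("-inf"), float("inf")
    for o, d, (lo, hi) in zip(origin, direction, box):
        if d == 0:
            if not (lo <= o <= hi):
                return False  # parallel to this slab and outside it
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far and t_far >= 0

def traverse(ray, node):
    """Return the candidate triangles the ray might hit."""
    if not ray_hits_box(ray, node["box"]):
        return []            # whole subtree culled with one box test
    if "triangles" in node:  # leaf: hand triangles to the exact test
        return node["triangles"]
    hits = []
    for child in node["children"]:
        hits.extend(traverse(ray, child))
    return hits
```

Dedicated RT hardware exists precisely to make those box and triangle tests fast, which is why the per-cycle intersection rates below matter.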
What we don’t know exactly is how fast Nvidia’s GPUs are at ray/box vs. ray/triangle intersections. There are some situations where Ampere can do twice as many RT calculations per cycle, and others where one of the functional units may go unused. For AMD’s part, the RDNA2 GPUs have Ray Accelerators that can do either four ray/box intersection calculations per cycle, or one ray/triangle intersection. It’s also unclear if different types of RT calculations—for example, for reflections vs. shadows vs. global illumination—take different amounts of time per ray, or if they simply require more rays in general.
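Those per-cycle figures make for an easy back-of-the-envelope peak throughput estimate. The unit count and clock speed below are round-number assumptions for illustration, not published specs for any particular card.

```python
# Rough peak intersection throughput for an RDNA2-style GPU:
# each Ray Accelerator does 4 ray/box OR 1 ray/triangle test per cycle.
ray_accelerators = 80  # assumed: one per compute unit on a big die
clock_hz = 2.0e9       # assumed: ~2GHz sustained clock

box_tests_per_sec = ray_accelerators * 4 * clock_hz
tri_tests_per_sec = ray_accelerators * 1 * clock_hz
print(f"{box_tests_per_sec:.2e} box tests/s")       # 6.40e+11
print(f"{tri_tests_per_sec:.2e} triangle tests/s")  # 1.60e+11
```

Peak numbers like these never survive contact with real workloads, of course; memory latency while chasing BVH nodes matters at least as much as raw test rates.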
With hardware in hand, however, it’s possible to run tests to get a reasonable estimate of the performance—see Pure Ray Tracing Performance. That’s one specific test of RT performance, and it may not be applicable to all RT implementations, but it does provide interesting data points. Which brings us to the final topic: Actual RT performance in currently shipping games.
RT HARDWARE PERFORMANCE
We selected 10 graphics tests and games that use the DirectX Raytracing API and run on both AMD and Nvidia GPUs. That last point is important, because there are currently two games that use DXR and only work on AMD GPUs (Godfall and The Riftbreaker), and at least a few tests and games (Bright Memory Infinite, Cyberpunk 2077, and Wolfenstein Youngblood) that only work on Nvidia GPUs. Not surprisingly, each of the games is promoted by the respective GPU company, so the upcoming RT gaming wars could get messy (see table above).
Right now, it looks like Nvidia is crushing AMD in RT performance. Overall, the RTX 3070 comes in just ahead of the 6800 XT, and the 3080 is 25 percent faster than AMD’s RX 6900 XT. As for the 3090, it’s in a class (and price) of its own.
But then we look at Dirt 5 and have to wonder how much vendor-specific optimizations play a role. There, the 6800 XT beats the 3080 by 8 percent. Yes, this is an AMD-promoted game, and its DXR support is still in beta. Maybe Nvidia will close the gap, and to be fair the RT effects in Dirt 5 aren’t particularly impressive. As for the other games, they came out before AMD’s RX 6000 series launched, which means they were optimised for Nvidia by default.
The thing to keep in mind is that AMD RDNA2 GPUs are in both the PlayStation 5 and Xbox Series S/X, which means that every console game that implements RT will be targeting AMD by default. Only time will tell how that muddies the waters.
RESOLUTION UPSCALING: GETTING MORE FROM LESS
There’s still one more important aspect of ray tracing that we haven’t discussed: DLSS (Deep Learning Super Sampling). RT is computationally intensive, so any way to reduce the number of rays cast is extremely helpful. RT solutions already use denoising to help improve performance, but the goal of DLSS is to reduce the number of pixels rendered from the start.
At its core, the idea of DLSS is pretty easy to grasp. Use machine learning to
train a deep-learning network on how to upscale and anti-alias games. The trick is that while the training process can be incredibly time-consuming, the inference aspect—using the trained network and running it against frames in a game—is far less demanding. Across the five DLSS 2.0 games we tested, DLSS Quality mode improved performance on the RTX 3060 Ti by 65 percent at 1440p.
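The payoff comes from shading fewer pixels. DLSS 2.0’s Quality mode renders at two-thirds of the target resolution on each axis; here’s what that works out to for the 1440p case mentioned above:

```python
# Why upscaling helps: DLSS renders fewer pixels and reconstructs the
# rest. Quality mode's 1/1.5 per-axis scale is the DLSS 2.0 figure;
# the target resolution is the 1440p case from the text.
target_w, target_h = 2560, 1440
scale = 1 / 1.5  # DLSS 2.0 Quality mode

render_w, render_h = round(target_w * scale), round(target_h * scale)
pixel_savings = 1 - (render_w * render_h) / (target_w * target_h)
print(render_w, render_h)      # 1707 960
print(f"{pixel_savings:.0%}")  # 56% fewer pixels shaded
```

More aggressive modes (Balanced, Performance, Ultra Performance) shrink the internal resolution further, trading image quality for even bigger savings.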
For games that support Nvidia’s proprietary tech, Nvidia GPUs enjoy a commanding lead over AMD. Even the RTX 3060 Ti outperforms the RX 6800 XT by 50 percent on average in games that use the latest DLSS 2.0 implementation—and that’s using the DLSS Quality mode; higher performance DLSS modes are also available.
How does it look? If you take screenshots of native rendering and DLSS rendering and compare them, sometimes DLSS looks better than native with TAA (Temporal Anti-Aliasing); other times it looks perhaps a bit worse. In motion, though, you’d be hard-pressed to tell the difference—except for the fact that DLSS runs at far more palatable frame rates.
AMD is working on an alternative to DLSS—FidelityFX Super Resolution. The thing is, DLSS 2.0 is here, support is integrated into Unreal Engine, and quite a few game developers and publishers have also jumped on the DLSS train. And why wouldn’t they? RT is so demanding that even the mightiest of GPUs can struggle at higher settings, especially at higher resolutions. With all the RT options maxed out, not even the RTX 3090 can maintain 60fps in Fortnite—though it can with DLSS enabled.
Perhaps more important than ray tracing right now, DLSS has very tangible benefits. Games such as Cyberpunk 2077 can even use DLSS without enabling RT, which makes 4K viable on cards like the RTX 3060 Ti. A $400 GPU, running the most anticipated game of 2020 at maxed out settings (minus RT) and 4K at 60fps? Yeah, we didn’t see that coming at the start of the year. The 3060 Ti can also do maxed out RT with DLSS at 1080p and 60fps if you prefer.
RAY-TRACED MOUNTAINS TO CLIMB
Two years on, and it’s interesting to see what has changed and what hasn’t. Nvidia is faster at RT, and enabling RT in most games is still a great way to tank performance for a modest improvement in visuals. But in the right situations, RT can make a big difference. There’s no way to get high-quality visuals suitable for film without ray tracing or path tracing, and while games continue to improve, we’re a long way off from playing anything that has visuals worthy of a summer Hollywood blockbuster.
One thing that has changed since the first RTX cards launched is the number of games with support for ray tracing. Big publishers may have been first out of the gate, but those first forays into real-time RT feel pretty lackluster compared to some of the latest games. At present, there are around two dozen games that use RT in some form, with many more in the works.
It’s funny how we can get used to games looking a certain way and be happy with it, and then along comes something new and our expectations change. Control paved the way with a new level of visual fidelity, and Cyberpunk2077 now joins it, with most other games still trying to catch up. While you can certainly play either game without ray tracing, once you’ve spent some time running around the hallways of the Federal Bureau of Control or Night City in all their RT glory, you miss the improved reflections, lighting, and shadows once they’re gone.
We remember the days of the first programmable shader GPUs, with games like Crysis causing hardware to physically cry out in pain at times. It took many years before that level of graphics became relatively commonplace, and ray tracing will follow the same path. We’re still nowhere near the point where RT hardware has fully penetrated the market. Part of that is thanks to the rise of laptops. 12 percent of all PCs on Steam might have RT-capable GPUs, but if we’re looking purely at laptops, that number will certainly be lower. Consoles getting RT hardware was probably necessary before we’d get to see the tech proliferate.
With the latest GPUs, we now have cards that are significantly faster than two years ago, and future generations will continue the upward climb. In that sense, right now we’re really just trudging through the mist-shrouded foothills of ray tracing, looking up to the summit in the distance. It’s going to take time to get there, but in another two or three generations of graphics hardware we’ll think back fondly on our hike through the valley of rasterisation and conclude that, yes, the ray-traced panorama before us was worth the effort.