TALKING TECH
Maximum PC grills AMD’s Jason Megit on exactly what makes Polaris tick.
AMD on Polaris and next-gen tech.
This year has seen some of the craziest advancements in graphics performance since both manufacturers jumped down from 40nm to 28nm. We sat down with Jason Megit, AMD's Technical Marketing Manager, to get some insight into why Polaris, with its new 14nm process, is such a revolutionary step for the company.
Maximum PC: Can you tell us a little bit about Polaris' design process? Where do you start? How does an architecture develop over time?

Jason Megit: The design process for Polaris began over three years ago. At that point, we spent a great deal of time looking at our current product specifications, projecting forward, and trying to understand the types of use cases and resulting performance requirements that would be asked of our products in two years' time. Our architects and engineers then needed to work out how technologies such as manufacturing processes, memory availability, and new input/output standards were going to affect our ability to deliver on certain design characteristics or requirements.
In Polaris' case, the writing was on the wall for our architects three years ago: 1080p was rapidly becoming the minimum standard gaming resolution for the PC platform. One key technology we also saw on the horizon was VR. Designing for VR involved choices such as: which display outputs do we need to support? At what bandwidth? How can we reduce latency? What frame rate should we aim for, and in what types of workloads? This is where we buckled down and drove toward an understanding of the level of performance we needed to hit. We hit the mark with Polaris, with the Radeon RX 480 becoming the industry's first sub-$200 premium VR-capable GPU.

MPC: What do you believe is the most innovative feature introduced with Polaris 10 and 11?

JM: Polaris includes a couple of new features to ensure higher-quality game streams and captures. First, Polaris adds video encode acceleration for HEVC, supporting up to 1080p @ 240Hz, 1440p @ 120Hz, or 4K @ 60Hz. That gives Polaris the encoding horsepower to capture or stream a high-quality, beyond-HD game stream.
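For a sense of scale, the three quoted encode modes work out to roughly the same pixel throughput, which hints at a shared encoder budget (an inference from the numbers, not an AMD statement):

```python
# Pixel throughput implied by each quoted HEVC encode mode.
# 1080p @ 240 and 4K @ 60 are identical; 1440p @ 120 is slightly lower.
modes = {
    "1080p @ 240Hz": 1920 * 1080 * 240,
    "1440p @ 120Hz": 2560 * 1440 * 120,
    "4K @ 60Hz": 3840 * 2160 * 60,
}
for name, px_per_s in modes.items():
    print(f"{name}: {px_per_s / 1e6:.0f} megapixels/s")
```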
To complement this world-class video encoding feature set, Polaris further improves game stream quality through two-pass encoding. Two-pass encoding in Polaris allows for real-time, picture-level analysis of the game content you are streaming or capturing, resulting in richer video captures of your gaming experience. If you've ever seen a game stream go blocky during quick scene transitions, or complex objects like distant bushes turn blurry, two-pass encoding can solve some of the encoding issues that cause these problems.

MPC: We've been hearing a lot about Asynchronous Compute with QRQ; could you go into detail about how it operates?
JM: AMD's Graphics Core Next (GCN) architecture can handle the balancing of processing within the graphics and compute command queues in three different ways: concurrent compute and graphics execution, compute preemption of graphics, and priority compute (or QRQ). In contrast, the competition only provides developers with one option (compute preemption). This level of choice in asynchronous compute techniques is exactly why applications like the DirectX 12-equipped Ashes of the Singularity gain so much performance on AMD hardware compared to DirectX 11.
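As a toy model (in Python, and nothing like real GCN hardware), "priority compute" amounts to letting developers tag certain work so the scheduler services it ahead of the regular graphics and compute mix:

```python
import heapq

# Toy scheduler: lower priority number is serviced first. Task names are
# invented for illustration; real GPU command queues are far more complex.
tasks = []
heapq.heappush(tasks, (1, "graphics: shadow pass"))
heapq.heappush(tasks, (1, "compute: particle physics"))
heapq.heappush(tasks, (0, "compute: audio convolution"))  # QRQ-style priority

order = [heapq.heappop(tasks)[1] for _ in range(len(tasks))]
print(order[0])  # the prioritized audio work is serviced first
```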
Quick Response Queue (QRQ) is the third method that AMD's GCN exposes to developers for handling the parallel processing of compute and graphics command queues. QRQ allows developers to give preferential treatment to certain commands in the queue, ensuring they are executed in a timely fashion. A great example of the type of workload that benefits greatly from QRQ is TrueAudio Next. Latency is incredibly important to audio workloads, and QRQ gives developers the option to enforce a guaranteed latency on audio workloads being executed on the GPU.

MPC: Can you explain to us how TrueAudio Next works, and how important it is for Virtual Reality?

JM: With the widespread introduction of VR HMDs this year, we are truly looking at an exciting time for audio in real-time immersive content. One of the key reasons VR is so interesting is that it allows us to reach new levels of immersion that were impossible with a simple 2D screen on a desktop. That said, when you increase immersion through new interaction techniques (head tracking) and display techniques (lenses, stereo 3D, and HMDs), other areas of sensory input become that much more important.
With AMD TrueAudio Next, we are addressing the need for more immersive audio to complement VR experiences. TrueAudio Next makes use of a feature unique to AMD's GCN architecture—CU Reservation. This feature allows a certain number of CUs (Compute Units) within your GPU to be reserved for dealing with a real-time queue of audio commands.
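The real-time constraint here is easy to quantify: at 48kHz, a 64-sample audio block must be fully processed before the next block arrives. A quick check of the arithmetic:

```python
# Per-block processing deadline for streaming audio: if a block is not
# finished before the next one arrives, the output glitches audibly.
SAMPLE_RATE_HZ = 48_000
BLOCK_SIZE_SAMPLES = 64

deadline_ms = BLOCK_SIZE_SAMPLES / SAMPLE_RATE_HZ * 1000
print(f"{deadline_ms:.2f} ms per block")  # about 1.33 ms
```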
CU Reservation is needed for TrueAudio Next because of the importance of real-time processing for audio. A certain priority needs to be given to the audio queue to ensure that all processing is taken care of in a timely fashion—if you miss a deadline with audio, it makes itself known to your ears through some nasty, glitch-ridden sound. By isolating audio workloads from graphics workloads in this way, TrueAudio Next can provide a glitch-free convolution filter with latency as low as 1.33ms (or 64 samples @ 48kHz).

MPC: You're using delta color compression to help reduce overall memory bandwidth through the 256-bit memory bus—how does that impact overall performance?

JM: If you consider a typical game world, you may have a lot of green-colored objects in one portion of each frame (grass and vegetation), a lot of blues in another portion (the sky or water), and potentially a lot of blacks or greys elsewhere (your character, a car's dashboard in a racing game, or a weapon your character is wielding). Without DCC, the GPU would have to store the full color information for each pixel in the frame. With DCC, Polaris products are able to take advantage of the fact that colors are likely to be similar, or to change only gradually, within a given area of a frame.
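A toy sketch of the block-based delta idea (illustrative only; the actual DCC scheme is proprietary and runs in hardware): one anchor color is stored per block, and every other pixel is stored as its difference from that anchor, which reverses losslessly.

```python
# Toy delta encoding for one block of single-channel pixel values.
def encode_block(pixels):
    anchor = pixels[0]                        # the block's assigned color
    return anchor, [p - anchor for p in pixels[1:]]

def decode_block(anchor, deltas):
    return [anchor] + [anchor + d for d in deltas]

block = [200, 201, 199, 200, 202, 198]        # slowly varying grays
anchor, deltas = encode_block(block)
assert decode_block(anchor, deltas) == block  # lossless round trip
print(anchor, deltas)                         # small deltas pack tightly
```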
So, in AMD's 4th generation of the GCN architecture, we are able to treat a single frame as a set of separate smaller blocks. The GPU hardware and driver then work together to make intelligent decisions on each of these blocks, assigning one color per block, with every other pixel in that block defined as the delta from the block's assigned color. A delta is much easier to store (and therefore move around) because it is smaller than the full color information. Because this technique uses deltas and never changes the source color data, it is lossless, and it essentially means we can move up to 35%* more data across the memory bus.

MPC: Arguably this year has seen some of the biggest advancements in performance-per-watt since 2010. Where do you see graphics processors heading in the next five years?

JM: I certainly agree with that qualifier. This year has indeed been exciting for those who consider performance-per-watt a key variable in their GPU purchasing decisions. Not only are we able to offer Polaris products at amazing price points, but we are also able to enable new form factors and designs based on reduced thermal requirements.
A major contributor to performance-per-watt with graphics cards is the manufacturing process used to create the chip at their heart. Previously, we made a shift from 40nm to 28nm (early 2012), which resulted in impressive gains on the performance-per-watt front. With the RX 480, RX 470, and RX 460, we have shifted from the 28nm process to a 14nm FinFET process.
The 14nm FinFET process offers a significant step forward in reducing operating voltages and leakage in comparison to the 28nm process that came before it. The 14nm FinFET process utilizes 3D transistors to increase performance while reducing the total power required per transistor. The end result is that AMD's architects were able to pack more transistors into a tight space, all while requiring less power per transistor.

MPC: With Intel struggling to maintain its Tick-Tock manufacturing cadence, how difficult is it going to be to move below 14nm from a GPU point of view?

JM: It's no secret to anyone familiar with the key performance and efficiency metrics of GPUs that the manufacturing process is typically the primary influence in pushing those metrics forward. That being said, major manufacturing changes are not dictated by AMD; we just shifted to the 14nm process for Polaris and previously spent roughly four years at 28nm.
I personally can’t wait to see sub-14nm chips and whatever comes after that. But in the meantime, we must think of the ways that we can get the most out of current manufacturing technologies. There are plenty of opportunities in terms of design optimization in hardware and software to ensure this can happen going forward. Our technologies will not stop moving forward, especially given the increased demands that 4K and VR will place on our products over the next couple of years.
From a GPU-specific point of view, I am very excited about the massive opportunities for optimization on the software side with our current 14nm products. Technologies such as AMD LiquidVR, DirectX 12, and Vulkan give developers improved access to and control over GPUs; they are already making a huge difference, and there is still more to come in the immediate future.

MPC: We've seen a big push lately towards HDR color. How important do you see that being in the graphical evolution?

JM: Constantly driving the industry towards new display technologies and visualization techniques is absolutely paramount to AMD's Radeon Technologies Group's overall graphics strategy. As a leader in providing the world with graphics hardware, it is in no small part AMD's responsibility to ensure that new technologies like HDR are well supported. Without the underlying hardware and software to support them, there is minimal incentive for display vendors to push these new technologies into their latest hardware.
If you look at the previous decade of display technology, a tremendous amount of focus has gone into increasing overall resolutions and refresh rates. Luminance, in my opinion, has not been given its fair share of the excitement. After seeing a few HDR displays in action last May during Polaris Tech Day in Macau, I truly believe that this is about to change.
Our eyes detect objects based on the photons of light bouncing off them and hitting our retinas, and we differentiate between objects by ratios of brightness; in the display world, this is referred to as a contrast ratio. Right now, most typical displays are capable of a 667:1 contrast ratio, whereas HDR displays will be capable of increasing that ratio to anywhere from 20,000:1 up to 20,000,000:1.

MPC: FreeSync seems to be improving quite rapidly, especially with the launch of DisplayPort 1.3 and 1.4. What improvements are being made to avoid ghosting and other color anomalies?

JM: I personally believe that AMD has been at the forefront of display technologies for years, especially considering our history with AMD Eyefinity technology and DisplayPort. Recently, we have made great improvements to the FreeSync ecosystem by enabling an industry-first option for variable refresh rates over an HDMI-connected display. We accomplished this by implementing vendor-specific extensions to the HDMI standard. This support will roll out across many different HDMI display models, which can be viewed at our FreeSync product page (http://amd.com/freesync/). Further, FreeSync now supports Low Framerate Compensation (LFC) to enable smoother gameplay when your game's frame rate falls below the minimum refresh rate supported by your FreeSync display. LFC is handled by adaptive software that gracefully handles sudden drops in frame rate, and it is automatically enabled on all AMD FreeSync-ready monitors.
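A minimal sketch of the LFC idea (a reconstruction of the behavior, not AMD's driver logic): when the frame rate drops below the panel's minimum refresh, each frame is repeated an integer number of times so that the effective refresh rate lands back inside the supported window.

```python
def lfc_refresh_hz(frame_rate, panel_min_hz, panel_max_hz):
    """Pick an effective refresh rate for a variable-refresh panel."""
    if frame_rate >= panel_min_hz:
        return frame_rate                  # already inside the panel's window
    multiple = 2                           # show each frame 2, 3, ... times
    while frame_rate * multiple < panel_min_hz:
        multiple += 1
    return min(frame_rate * multiple, panel_max_hz)

# Hypothetical 40-144Hz FreeSync panel: a 25fps game is displayed at 50Hz.
print(lfc_refresh_hz(25, 40, 144))   # 50
print(lfc_refresh_hz(60, 40, 144))   # 60
```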
In terms of ghosting and other color anomalies, these were typically handled by the scaler vendor specific to your FreeSync-ready panel. Our FreeSync partners have been quick to update their scaler firmware to address this; some have even gone above and beyond and offered further tuning and optimizations. We believe that the panel vendors should be given the control and capabilities necessary to tune their scalers and panels for the best possible performance. This, in my opinion, is the beauty of an ever-evolving and open ecosystem such as FreeSync: there are constant improvements and creative new features being introduced that serve only to improve our experience and enjoyment of our display investments.

*Based on AMD internal memory bandwidth test as of 6/14/2016. Radeon R9 290X: 263GB/s peak memory bandwidth. Radeon R9 Fury: 333 peak GB/s without DCC vs. 387 peak GB/s with DCC. Radeon RX 480: 186 peak GB/s without DCC vs. 251 peak GB/s with DCC. System configuration: Core i7-6700K, 16GB DDR4-2666, Windows 10 x64, Radeon Software 16.5.2.