A brief introduction to AMD Ryzen and Sense MI
A brief introduction to Sense MI and the Ryzen CPU.
Any high-level overview of AMD’s Ryzen processors will mention a set of five technologies collectively known as Sense MI. These are Pure Power, Precision Boost, eXtended Frequency Range (XFR), Neural Net Prediction and Smart Prefetch.
Pure Power and Precision Boost
Pure Power and Precision Boost control the voltage and operating frequency of the Ryzen processor. The concept isn’t new: every modern processor adjusts its voltage and clock frequency in some way to optimize performance-per-watt depending on workload.
What makes Ryzen different, however, is how fine-grained this control has become. AMD claims Ryzen has more than a thousand embedded sensors distributed per core complex that continuously analyze power, operating temperature and frequency. Ryzen processors won’t just respond to varying workload needs more quickly (in as little as 1ms), but with finer granularity as well (25MHz adjustments compared to the 100MHz steps of past generations). The combined effect of Pure Power and Precision Boost is to better optimize the performance-per-watt of a Ryzen CPU at any workload, with the goal of reducing power consumption while maintaining similar performance compared to previous architectures.
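To see why the step size matters, here is a minimal sketch in Python. It assumes the CPU simply picks the highest available clock step that fits under whatever frequency the power and thermal budget can sustain at that moment. The `snap_down` helper and the 3790MHz figure are hypothetical and chosen only for illustration; they do not describe AMD’s actual control logic.

```python
def snap_down(target_mhz: float, step_mhz: int) -> int:
    """Return the highest multiple of step_mhz that does not exceed target_mhz."""
    return int(target_mhz // step_mhz) * step_mhz


# Hypothetical clock the power and thermal budget could sustain right now.
sustainable_mhz = 3790

for step in (100, 25):
    chosen = snap_down(sustainable_mhz, step)
    unused = sustainable_mhz - chosen
    print(f"{step}MHz steps: run at {chosen}MHz, leaving {unused}MHz unused")
```

With 100MHz steps the chip would have to settle for 3700MHz, while 25MHz steps let it reach 3775MHz, so less of the available headroom is left on the table.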
eXtended Frequency Range
All modern processors have what’s known as a base clock and a boost clock, the latter being the maximum rated frequency. The Ryzen 7 1800X, for example, has base and boost clocks of 3.60GHz and 4.00GHz respectively. However, if the CPU detects that it is continuously operating under a certain temperature threshold, which would indicate that a high-performance custom cooling solution is being used (such as water cooling), an additional reserve frequency beyond the CPU’s maximum boost clock is unlocked. In the case of the Ryzen 7 1800X, that’s an additional 100MHz. Depending on how you look at it, XFR is either the lazy man’s overclocking mode or the processor thanking you for using better cooling.
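The XFR behaviour described above can be modelled roughly as follows. The clock figures are the ones quoted for the Ryzen 7 1800X, but the temperature threshold is an assumption made up for this sketch, since the actual criteria AMD uses are not given here. The `ClockSpec` and `max_clock_mhz` names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockSpec:
    base_mhz: int
    boost_mhz: int
    xfr_headroom_mhz: int


# Figures quoted in the article for the Ryzen 7 1800X.
RYZEN_7_1800X = ClockSpec(base_mhz=3600, boost_mhz=4000, xfr_headroom_mhz=100)

# Assumed value for illustration only; the real threshold is not stated.
ASSUMED_XFR_TEMP_LIMIT_C = 60.0


def max_clock_mhz(spec: ClockSpec, sustained_temp_c: float) -> int:
    """Clock ceiling: the boost clock, plus XFR headroom if the chip runs cool."""
    if sustained_temp_c < ASSUMED_XFR_TEMP_LIMIT_C:
        return spec.boost_mhz + spec.xfr_headroom_mhz
    return spec.boost_mhz


print(max_clock_mhz(RYZEN_7_1800X, sustained_temp_c=45.0))  # well cooled
print(max_clock_mhz(RYZEN_7_1800X, sustained_temp_c=75.0))  # running warm
```

Under these assumptions the well-cooled chip gets a 4100MHz ceiling, while the warmer one stays at the rated 4000MHz boost clock.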
All three features (Pure Power, Precision Boost and XFR) are highly integrated, automatic and fully controlled by the CPU. What this means for the average consumer is that Ryzen CPUs will perform best if you just leave them alone. AMD even recommends that Windows’ power profile be set to High Performance instead of the default Balanced mode to give full hardware control over to the CPU.
For power users, however, Pure Power, Precision Boost and XFR won’t mean much because any attempt at manual overclocking will override and disable them anyway.
Simultaneous Multi Threading and the Infinity Fabric
Ryzen is the first AMD processor to support proper Simultaneous Multi Threading (SMT), which is the execution of two threads per core in the same way Intel does Hyper Threading. In the previous Bulldozer architecture, AMD used Clustered Multi Threading (CMT), where each module paired two physical integer cores with one shared floating-point unit.
It is interesting to point out that physically, Ryzen is built around a CPU Complex (CCX) structure. Each CCX features 4 cores, each with its own private 512KB L2 cache, and a shared 8MB L3. Now, when you look at the Ryzen 7 1800X for example, you’ll notice that it is an 8-core processor with 16MB L3. This means it features two CCXes connected at the SoC level via AMD’s custom Infinity Fabric interconnect, which is based on an enhanced coherent HyperTransport protocol.
AMD claims that with its Infinity Fabric design, it is able to scale not just processor cores but multi-socket configurations with an almost linear performance gain per core increase. An example of this claim is the recently announced 32-core, 64-thread Naples server processor based on the Zen microarchitecture.
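The CCX figures mentioned in the Infinity Fabric discussion are easy to sanity-check with a little arithmetic. The sketch below uses only the per-CCX numbers quoted in the article (4 cores, 512KB of L2 per core, 8MB of shared L3) and two threads per core for SMT. The `CCX` class and `summarize` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCX:
    cores: int = 4              # cores per CPU Complex
    l2_per_core_kb: int = 512   # private L2 per core
    shared_l3_mb: int = 8       # L3 shared within one CCX


def summarize(ccx_count: int, threads_per_core: int = 2) -> dict:
    """Total up cores, threads and cache for a chip built from ccx_count CCXes."""
    ccx = CCX()
    cores = ccx_count * ccx.cores
    return {
        "cores": cores,
        "threads": cores * threads_per_core,
        "l2_total_mb": cores * ccx.l2_per_core_kb / 1024,
        "l3_total_mb": ccx_count * ccx.shared_l3_mb,
    }


# Two CCXes, as in the Ryzen 7 1800X.
print(summarize(ccx_count=2))
```

Two CCXes give 8 cores, 16 threads, 4MB of L2 in total and 16MB of L3, which matches the 8-core, 16MB L3 configuration of the Ryzen 7 1800X.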
Neural Net Prediction and Smart Prefetch
AMD claims that each Ryzen CPU has a “true artificial intelligence neural network” for better instruction prediction. This is a bit of clever marketing on AMD’s behalf: the neural network is essentially the CPU’s branch predictor, albeit a much smarter one based on hashed perceptrons. Ryzen can predict two branches per cycle, and the penalty for an incorrect prediction has been reduced by 3 cycles with the introduction of a micro-op cache. The addition of a deeper 3-level Translation Lookaside Buffer (TLB) and a 0-cycle Recent Predictor means that prefetched instructions get loaded faster as well. The entire cache system has been redesigned to be much faster and more efficient overall (approximately double the bandwidth for L1 and L2, and up to 5X the total bandwidth for L3).
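To give a feel for what a hashed perceptron predictor does, here is a toy version in Python. It is a textbook-style sketch and not AMD’s design: the table count, table size, history lengths, training threshold and hash constants are all assumptions picked for readability. The idea is that several small weight tables are each indexed by a hash of the branch address and a slice of recent branch history, the selected weights are summed, and the sign of the sum is the prediction.

```python
class HashedPerceptronPredictor:
    """Toy hashed-perceptron branch predictor (illustrative, not AMD's design)."""

    def __init__(self, history_lengths=(0, 4, 8, 16), table_size=256,
                 threshold=12, weight_limit=31):
        self.history_lengths = history_lengths
        self.table_size = table_size
        self.threshold = threshold
        self.weight_limit = weight_limit
        self.tables = [[0] * table_size for _ in history_lengths]
        self.history = 0  # global outcome history, newest branch in bit 0
        self.history_mask = (1 << max(history_lengths)) - 1

    def _indices(self, pc):
        """Hash the branch address with a slice of recent history, one per table."""
        indices = []
        for length in self.history_lengths:
            recent = self.history & ((1 << length) - 1)
            mixed = pc ^ (recent * 0x9E3779B1) ^ (length * 0x85EBCA6B)
            indices.append(mixed % self.table_size)
        return indices

    def _sum(self, indices):
        return sum(table[i] for table, i in zip(self.tables, indices))

    def predict(self, pc):
        """Return True if the branch at address pc is predicted taken."""
        return self._sum(self._indices(pc)) >= 0

    def update(self, pc, taken):
        """Train on the real outcome, then shift it into the history."""
        indices = self._indices(pc)
        total = self._sum(indices)
        mispredicted = (total >= 0) != taken
        # Train on a miss, or when the sum is not yet confidently large.
        if mispredicted or abs(total) <= self.threshold:
            step = 1 if taken else -1
            for table, i in zip(self.tables, indices):
                new_weight = table[i] + step
                table[i] = max(-self.weight_limit, min(self.weight_limit, new_weight))
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def count_correct(predictor, pc, outcomes):
    """Predict, then train, on each outcome; return how many predictions were right."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct


# A branch that alternates taken / not taken defeats a simple "same as last
# time" guess, but is easy to learn once history feeds into the prediction.
alternating = [i % 2 == 0 for i in range(200)]
predictor = HashedPerceptronPredictor()
print(count_correct(predictor, 0x400, alternating), "of", len(alternating), "correct")
```

After a few early misses the weights tied to each history pattern settle, and the alternating branch is predicted correctly from then on. A real predictor does this in hardware with far larger tables, but the principle is the same.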