OpenSource For You

How Heterogeneous Systems Evolved and the Challenges Going Forward

In layman’s terms, computer architecture refers to how a computer is designed and how software interacts with it. With the demands on computing power increasing, it is a natural corollary that computer architecture should also evolve. This article highlights how heterogeneous systems have evolved and the challenges that lie ahead.

- By: Vikas Sajjan and Saravanan Chidambaram (Saro). Vikas Sajjan works at Hewlett Packard Enterprise. He has a decade of experience in embedded systems, device drivers, firmware standards and enterprise-class platform development, focusing on Intel, ARM and

The emergence of mobility, social networking, campus networking and edge computing has brought with it an essential need to process varieties of data – within and outside of enterprise data centres. Traditional computing needs to cope not only with large volumes of data, but also with the wide variety and velocity of data. The need to handle data velocity, variety and volume exists across the entire computing landscape, starting from handheld devices to supercomputing infrastructure.

Purpose-built computers armed with FPGAs (field-programmable gate arrays) or accelerators, and off-loading, help to meet some of today’s special data processing needs, such as scene detection from a remote camera feed, image search and speech recognition. Faced with humongous Big Data processing needs, be it text, images, audio, video or data from billions of sensors, traditional computing architectures find it difficult to handle the volume, velocity and variety of the data. While it is true that over-provisioned general-purpose computing infrastructure or expensive specialised infrastructure can get the job done somehow, traditional computing infrastructure has shown itself to be inefficient in meeting the needs of Big Data. For efficient Big Data computing at the right cost, a paradigm shift is needed in computer architecture.

In today’s computing landscape, certain basic technology drivers have reached their limits. Shrinking transistors have powered 50 years of advances in computing, as captured by Moore’s Law, named after Intel co-founder Gordon Moore, which states that the number of transistors in a dense integrated circuit doubles approximately every two years. But for some time now, making transistors smaller has not made them more energy-efficient. So while the benefits of making things smaller have been decreasing, the costs have been rising. This is in large part because the components are approaching a fundamental limit of smallness: the atom.

On the other hand, disruptive technology trends like non-volatile memory and photonic interconnects call for revamping the existing, traditional architectures. There has also been a move by certain parts of the industry to adopt a data-centric computing paradigm, as opposed to the traditional CPU- or compute-centric approach, though this nascent shift has largely remained at the level of academic research.

At the same time, there is a constant demand for improved performance to enable compelling new user experiences. New use cases like molecular modelling, photorealistic ray tracing and modelling, seismic analysis, augmented reality and life sciences computing have called into question how effective the traditional compute units are in today’s context. To address the twilight of Moore’s Law and to navigate this complex set of requirements, the computer industry needs an approach that promises to deliver improvements across four vectors: power, performance, programmability and portability. This has led the industry to explore heterogeneous systems in order to meet the demands of newer compute- and data-centric applications.

In this article, we discuss the evolution of heterogeneous systems, as well as the challenges and opportunities in this field, with regard to compute units, interconnects, software architecture, etc. Given the vastness and depth of this topic, we intend to provide a brief overview of heterogeneous systems in this issue, followed by a detailed, in-depth discussion of their components in a series of subsequent articles over the following months. Understanding the changing landscape of heterogeneous systems will benefit anyone interested in next-generation, disruptive changes to computer architecture, as these bring rich opportunities in the open source hardware and software ecosystem. With that in mind, let us start by looking at how the computer architecture landscape has evolved over the years.

The evolution of computer architecture started out with computers that had single cores, before reaching today’s level of heterogeneous multi-core scale-out systems. As shown in Figure 1, from 1975 to 2005, the industry accomplished something phenomenal: getting PCs on desks, into millions of homes, and into virtually every pocket. The year 2005 saw a shift to computers with multi-core architecture, where parallel computing became a possibility, catering to users’ ongoing demand for better performance. It was in 2011 that these multi-core parallel machines began appearing in popular form factors like tablets and smartphones.

The multi-core era

Multi-core systems can provide high energy efficiency, since they allow the clock frequency and supply voltage to be reduced together to dramatically reduce power dissipation during periods when the full rate of computation is not needed.

A multi-core CPU combines multiple independent execution units into one processor chip, in order to execute multiple instructions in a truly parallel fashion. The cores of a multi-core processor are sometimes also denoted as processing elements or computational engines. According to Flynn’s taxonomy, the resulting systems are true multiple-instruction, multiple-data (MIMD) machines, able to process multiple threads of execution at the same time.
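As a minimal illustration of the MIMD idea, the following sketch (plain C++11 with std::thread; the data and variable names are ours) runs two different instruction streams over the same data set at the same time, one summing the values and the other searching for the maximum. On a multi-core machine, the operating system is free to schedule the two threads on separate cores.

// A minimal MIMD illustration: two threads execute different instruction
// streams concurrently. Build with, for example, g++ -std=c++11 -pthread.
#include <thread>
#include <numeric>
#include <vector>
#include <cstdio>

int main() {
    std::vector<int> data(1000);
    std::iota(data.begin(), data.end(), 1);   // 1, 2, ..., 1000

    long long sum = 0;
    int maximum = 0;

    // Thread 1 computes a sum; thread 2 scans for the maximum.
    std::thread t1([&] { sum = std::accumulate(data.begin(), data.end(), 0LL); });
    std::thread t2([&] { for (int v : data) if (v > maximum) maximum = v; });
    t1.join();
    t2.join();

    printf("sum = %lld, max = %d\n", sum, maximum);
    return 0;
}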

The multi-core era saw two classes of systems: homogeneous multi-core systems (SMPs) and heterogeneous multi-core systems (HMPs).

Homogeneous multi-core systems

A symmetric multi-processor (SMP) system has a centralised shared memory, called the main memory (MM), and operates under a single operating system with two or more processors of the same kind. It is a tightly coupled multiprocessor system with a pool of homogeneous processors running independently. Each processor executes different programs and works on different data, with the ability to share resources (memory, I/O devices, interrupt systems, etc) over a bus or crossbar interconnect.

Heterogeneous multi-core systems

Heterogeneous (or asymmetric) cores promise further performance and efficiency gains, especially for multimedia, recognition and networking applications. For example, ARM’s big.LITTLE design pairs a high-performance core (called ‘big’) with a low-power core (called ‘LITTLE’). There is also a trend towards improving energy efficiency by focusing on performance-per-watt, with advanced fine-grained power management and dynamic voltage and frequency scaling (e.g., in laptops and portable media players).

Software impact

Parallel programming techniques can benefit from multiple cores directly. Existing parallel programming models such as Cilk Plus, OpenMP, OpenHMPP, FastFlow, Skandium, MPI and Erlang can be used on multi-core platforms. Intel introduced Threading Building Blocks (TBB), an abstraction for C++ parallelism. Other research efforts include the Codeplay Sieve System, Cray’s Chapel, Sun’s Fortress and IBM’s X10.
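As a concrete (and deliberately tiny) sketch of one of these models, the OpenMP snippet below parallelises a SAXPY-style loop; the array sizes and names are ours, and error handling is omitted. A single pragma asks the compiler to distribute the loop iterations across the available cores (build with, for example, g++ -fopenmp).

// Minimal OpenMP sketch: one pragma spreads the loop across all cores.
#include <vector>
#include <cstdio>

int main() {
    const int n = 1000000;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    const float a = 3.0f;

    #pragma omp parallel for              // iterations are split among threads
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);          // expect 5.0
    return 0;
}

Higher-level libraries such as TBB express the same idea with parallel algorithms (for example, a parallel for-loop over a range) instead of compiler pragmas.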

Limitations

Having said that, multi-core systems do have limitations in terms of the speed-up gained and the effective use of the available cores (refer to Figure 2). As software developers, we will be expected to enable a single application to exploit an enormous number of cores that are increasingly diverse (specialised for different tasks) and at multiple locations (from local to very remote; whether on-die, in-box, on-premises or in the cloud). This increasing heterogeneity will continue to spur a deep and fast evolution of mainstream software development. We can attempt to predict some of the changes that could occur.

Amdahl’s Law

The original idea presented by Gene Amdahl is a general observation about the performance improvement limits of any enhancement, and was later summarised as the well-known Amdahl’s Law. When we apply Amdahl’s Law to parallel processing, the speed-up metric is:

S(p) = T(1) / T(p)

where T(1) is the execution time on a single processor and T(p) is the execution time on p processors.

Let’s suppose that α is the fraction of the code that is sequential, which cannot be parallelised, and p is the number of processors. Assuming that all overheads are ignored, we have:

S_Amdahl = 1 / (α + (1 − α)/p)

This formula is called Amdahl’s Law for parallel processing. When p, the number of processors, increases to infinity, the speed-up becomes:

lim (p → ∞) S_Amdahl = lim (p → ∞) 1 / (α + (1 − α)/p) = 1/α

This shows that the speed-up is limited by the sequential fraction, which is a property of the problem under study, even when the number of processors is scaled to infinity. Amdahl’s Law thus suggests that large-scale parallel processing is less interesting, because the speed-up has an upper bound of 1/α.
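To make the bound tangible with assumed numbers: if 10 per cent of a program is sequential (α = 0.1), even an infinite number of processors cannot deliver more than a 10x speed-up. The short C++ sketch below simply evaluates the formula for a few processor counts.

// Evaluating Amdahl's Law for an assumed sequential fraction of 10 per cent.
#include <cstdio>

// Speed-up predicted by Amdahl's Law: alpha is the sequential fraction,
// p is the number of processors.
double amdahl_speedup(double alpha, int p) {
    return 1.0 / (alpha + (1.0 - alpha) / p);
}

int main() {
    const double alpha = 0.10;                      // assumed value
    const int procs[] = {1, 2, 4, 8, 16, 64, 1024};
    for (int p : procs)
        printf("p = %4d  speed-up = %5.2f\n", p, amdahl_speedup(alpha, p));
    printf("upper bound (p -> infinity) = %.1f\n", 1.0 / alpha);   // 10.0
    return 0;
}

Even at 1,024 processors, the predicted speed-up is only about 9.9x, already pressed against the 10x ceiling set by the sequential fraction.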

GPGPU

As the multi-core era was evolving, some adventurous programmers were exploring ways to leverage other compute units, like the GPU (graphics processing unit), for general-purpose computing. The concept of GPGPU (general-purpose computing on graphics processing units) evolved at NVIDIA, AMD and other organisations. It involves offloading some pieces of code to run on the GPU in parallel, boosting performance significantly. With the multi-core (scale-up) era hitting a dead end, heterogeneous system architecture offered a ray of hope (the scale-out model) by bringing the various compute units, like GPUs, DSPs, ASSPs and FPGAs, together to address the needs of the present day.
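To give a flavour of what offloading looks like in practice, here is a heavily condensed OpenCL host-plus-kernel sketch that adds two vectors on a GPU. It assumes an OpenCL 1.x runtime and headers are available (build with, for example, g++ vadd.cpp -lOpenCL), omits all error checking, and uses kernel and variable names of our own choosing; it is an illustration of the offload pattern rather than production code.

// Condensed OpenCL vector-add: copy data to the device, run a kernel
// across many work-items in parallel, copy the result back.
#include <CL/cl.h>
#include <vector>
#include <cstdio>

static const char *kSrc =
    "__kernel void vadd(__global const float *a,\n"
    "                   __global const float *b,\n"
    "                   __global float *c) {\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main() {
    const size_t n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    // Pick the first platform and the first GPU device on it.
    cl_platform_id platform; cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, nullptr);

    // Device buffers: inputs are copied from the host at creation time.
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               n * sizeof(float), a.data(), nullptr);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               n * sizeof(float), b.data(), nullptr);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n * sizeof(float), nullptr, nullptr);

    // Build the kernel from source and bind its arguments.
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, nullptr);
    clBuildProgram(prog, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel k = clCreateKernel(prog, "vadd", nullptr);
    clSetKernelArg(k, 0, sizeof(cl_mem), &da);
    clSetKernelArg(k, 1, sizeof(cl_mem), &db);
    clSetKernelArg(k, 2, sizeof(cl_mem), &dc);

    // Launch n work-items, then read the result back to the host.
    clEnqueueNDRangeKernel(q, k, 1, nullptr, &n, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, n * sizeof(float), c.data(), 0, nullptr, nullptr);

    printf("c[0] = %f\n", c[0]);   // expect 3.0
    return 0;
}

Even in this toy example, the essential pattern is visible: the host copies data into device buffers, enqueues a kernel that runs in parallel across work-items, and copies the result back. It is precisely this explicit data movement between disjoint memory spaces that the unified-memory ideas discussed later in this article aim to reduce.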

Heterogeneous systems

Heterogeneous system architecture integrates heterogeneous processing elements into a coherent processing environment, enabling power-efficient performance. It is designed to enable extremely flexible, efficient processing for specific workloads, and increased portability of code across processors and platforms. Figure 4 shows a hardware vendor’s perspective.

There is a wide range of applications that need heterogeneous high performance computing (HPC), such as astrophysics, atmospheric and ocean modelling, bioinformatics, bio-molecular simulation, protein folding, computational chemistry, computational fluid dynamics, computational physics, computer vision and image understanding, data mining and data-intensive computing, global climate modelling and forecasting, material sciences, and quantum chemistry. Figure 5 provides an overview of AMD’s heterogeneous system architecture, called HSA.

So what does a heterogeneous system comprise? Dissecting such a system reveals various types of compute units, interconnects, memories and software entities. The compute units include CPUs, GPUs, ASICs, DSPs, ASSPs, FPGAs and SoCs. Interconnects such as fabric connectivity, InfiniBand, RapidIO, PCIe, Ethernet and Omni-Path are used to connect all these compute units. Each compute unit has its own memory hierarchy, and the aim is for all of them to access memory in a unified manner. Supporting HSA has a significant impact on the software entities, including the operating system, virtual machine (ISA), runtime libraries and the compiler tool chain.

Heterogeneous systems pose a unique challenge in computing, as they contain distinct sets of compute units, each with its own architecture. Since the instruction set architecture of each compute unit is different, it becomes difficult to achieve load balancing between them. Each compute unit views memory as a disjoint space, so consistency and coherency semantics are a challenge. Memory bandwidth and data transfer between the compute units can be a challenge too. And each of these units has its own programming model.

A CPU-centric execution model requires all functions to be delegated by the CPU, even if they are intended for an accelerator, and executed through the operating system and existing software layers. This leads to challenges such as the overhead of scheduling parallel tasks on these units; maintaining data dependencies (programmers must account for and maintain the state of memory contents when switching between compute units such as CPUs and GPUs); workload partitioning (choosing an appropriate compute unit for a given workload); and reaching higher levels of energy efficiency and power saving.

Heterogeneous systems are still evolving and various developments are happening in areas like the operating system, virtual machine (ISA), tool chain and runtime libraries. Open source compilers and tools like GCC and LLVM are supporting the polyhedral model for heterogeneous architectures. There is a separate community driving polyhedral compilation research. Figure 6 provides an overview of the ecosystem influencing the evolution of HSA.

NVIDIA’s CUDA alone could not cater to the evolving aspects of heterogeneous systems. Newer open platforms, standards and architectures, like the HSA Foundation, OpenCL, OpenACC, IBM’s Liquid Metal and OpenHPC, have emerged to provide the software stack for heterogeneous systems. Various open source projects are being run as part of the HSA Foundation, which one can leverage and contribute back to.

Distributed computing platforms like Apache Spark have also contributed a lot to the evolution of heterogeneous systems. As a result, platforms like SparkCL and HeteroSpark have evolved, combining the scale-out power of distributed computing with the scale-up power of heterogeneous computing to create a very powerful HPC platform. Efforts are also underway to use the advancements made in NVM (non-volatile memory) technology in heterogeneous systems. OpenARC (the Open Accelerator Research Compiler) is one such effort.

In short, it is clear that mainstream hardware is becoming permanently parallel, heterogeneous and distributed. These changes are here to stay, and will disrupt the way we write performance-intensive code on mainstream architectures. This evolution clearly highlights the increasing demand for compute-efficient architectures. There are various proprietary heterogeneous architectures that address today’s challenges in their own way, and the growing demand for them shows that heterogeneous system architecture is inevitable and is the way forward. We will discuss the components of heterogeneous system architectures in depth in the forthcoming articles in this series.

References

[1] HSAIL LLVM tree: https://github.com/HSAFoundation/HLC-HSAIL-Development-LLVM
[2] Polyhedral compilation community, including the latest developments: http://polyhedral.info/
[3] Standard for the advancement of heterogeneous computing: http://www.hsafoundation.com/standards/
[4] The OpenACC Application Program Interface: http://www.openacc.org/
[5] Source collateral related to OpenHPC: http://www.openhpc.community/development/source-repository/
[6] A list of open source projects is available at the HSA Foundation’s GitHub repository: http://www.hsafoundation.com/hsaf-open-source-developer-program/
[7] SparkCL, an OpenCL-based framework for heterogeneous clusters: https://www.khronos.org/news/permalink/mora-group-announces-sparkcl-an-opencl-based-framework-for-heterogeneous-cl
[8] OpenARC, the Open Accelerator Research Compiler: http://ft.ornl.gov/research/openarc

Figure 1: Timeline showing the evolution of computer architecture
Figure 2: Depiction of the various computing eras (Source: http://www.hsafoundation.com/)
Figure 3: Exynos 5420 SoC comprising heterogeneous CPUs (big.LITTLE architecture) (Source: http://www.cnx-software.com/2013/11/08/179-arndale-octa-developement-board-gets-an-upgrade-to-exynos-5420-big-little-soc/)
Figure 4: The various eras of heterogeneous systems (Source: http://semiaccurate.com/2011/06/20/amd-talks-about-next-generation-software-and-fusion/)
Figure 5: AMD’s heterogeneous high performance computing architecture (Source: http://www.overclock3d.net/news/software/amd_team_up_with_microsoft_for_hsa_c/1)
Figure 6: The HSA ecosystem
