APC Australia

MapReduce: the ‘Big Data’ idea inside your Android phone

It’s a common buzzword in ‘big data’, but what is MapReduce, how does it work and why is it in your Android phone?

- Darren Yates explains.

It’s a few years old now, but IBM’s oft-quoted statement that 90% of the world’s data was created in the previous two years is mind-boggling when you think about it. Basically, we’re swimming in a sea of data that’s rising at a breathtaking rate. As PC users, we’re used to the idea of multi-core CPUs and multi-threaded apps. However, when it comes to machine-learning this ‘big data’, new processing ideas are needed. What might surprise you is that some of those ideas have made their way into your Android phone.


Many machine-learning algorithms, like most ‘decision tree’ and ‘forest’ (collections of trees) algorithms, require all of the data to fit into a computer’s system memory. Before cloud computing, that wasn’t always possible with big data, so ‘distributed computing’ was born. Here, groups or ‘clusters’ of computers each handle part of the data and the results are recombined at the end. The benefits here are speed and cost – by processing the data over multiple computers, the work is completed faster and it also allows the use of cheaper hardware. If you’ve tried SETI@home (setiathome.berkeley.edu) or other similar experiments, they’re perfect examples of this ‘distributed computing’ idea.

MapReduce has been a bit of a ‘big data’ machine-learning buzzword over the last decade or so. The name refers to its two main functions, ‘map’ and ‘reduce’, which are used to process data in distributed environments.

This will be a bit of a simple ‘drawn with crayons’ view of MapReduce, but imagine your data exists as a typical spreadsheet with rows and columns and a single point of data in each cell. The Map function essentially allows a processing task or algorithm to be executed once on each cell, with each result transferred or ‘mapped’ to the matching cell of a new spreadsheet. So for example, if you start with a spreadsheet with 100 rows and 200 columns, you end up with a second, processed spreadsheet of the same size based on the first. The Reduce function enables another processing task to combine or ‘reduce’ groups of cells to a single result at the end. A simple example often used to describe this is counting the frequency of words in a document. The Map function splits the document into separate words, while the Reduce function counts up the occurrences of each word.
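To make the word-count example concrete, here is a minimal sketch in plain Python. It is not Hadoop or Renderscript code, and the function names (map_words, reduce_counts, word_count) are made up for illustration. The map step emits a (word, 1) pair for each word and the reduce step adds up the pairs for each distinct word.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_counts(pairs):
    """Reduce step: add up the 1s emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(document):
    """Run the map step, then feed its output to the reduce step."""
    return reduce_counts(map_words(document))


counts = word_count("the cat sat on the mat the end")
print(counts["the"])  # prints 3
```

Splitting on whitespace is a simplification; a real word counter would also need to deal with punctuation.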

However, the key thing about the MapReduce framework is that it can be processed in parallel – you can throw as many processor cores at it as you have to speed things up.
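The parallel idea can be sketched like this, again in plain Python with made-up function names. The data is split into independent chunks, each chunk is mapped and reduced to a partial result by its own worker, and a final reduce combines the partial results. A thread pool is used here only to keep the sketch short; in standard CPython, threads do not run this kind of number-crunching truly in parallel, so a real speed-up would need separate processes, separate machines or a GPU.

```python
from concurrent.futures import ThreadPoolExecutor


def square(value):
    """Map step applied to a single value."""
    return value * value


def process_chunk(chunk):
    """Map every value in one chunk, then reduce the chunk to a partial sum."""
    return sum(square(v) for v in chunk)


def split_into_chunks(data, n_chunks):
    """Split data into at most n_chunks roughly equal, independent slices."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, workers=4):
    chunks = split_into_chunks(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker handles one chunk without needing the others.
        partial_sums = list(pool.map(process_chunk, chunks))
    # Final reduce: combine the per-chunk results into one answer.
    return sum(partial_sums)


print(parallel_sum_of_squares(list(range(1, 101))))  # prints 338350
```

The important design point is that no chunk depends on any other, which is what lets the work spread across however many cores are available.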


Take a look at either the Samsung Galaxy S10 or Google’s new Pixel 4 XL phone and each has a variant that includes Qualcomm’s new Adreno 640 graphics processing unit (GPU), with its whopping 384 numeric pipelines or ‘arithmetic logic units’ (ALUs). By contrast, the old Galaxy S5 uses an Adreno 330 GPU with only 128 ALUs. That’s still 128 processing units able to work in parallel, yet they’re often employed only when processing 3D images for your favourite games.

The idea of ‘general-purpose computing on graphics processing units’ or ‘GPGPU’ came about to take advantage of the GPU’s ability to process simple mathematical tasks in bulk and to apply it to areas other than gaming. The most obvious example in the last five years has been the boom in cryptocurrency mining, where multiple graphics cards are often crammed into PC boxes to run the blockchain calculations that earn new coins.

However, this highly parallelised processing of relatively simple mathematical tasks isn’t limited to PCs – it’s also available in Android devices, thanks to a little-known framework called ‘Renderscript’. It’s supported in Android versions going back to Android 2.3/Gingerbread and has been there ever since. What’s clever about Renderscript is that it allows you to develop code that can run on a phone’s CPU or GPU cores without you worrying about which core, the ‘when’ or the ‘how’. Android takes care of these issues, although that also means it, not you, decides whether a CPU core or a GPU core runs your code.

To use the Renderscript framework, you write an algorithm or ‘kernel’ function that the Android device executes on your data. But here’s the thing: Renderscript supports two standard types of kernel – a ‘mapping’ kernel and a ‘reduction’ kernel (sound familiar?).


In Renderscript, a mapping kernel applies a transformation function, executed once per element, to each value in a memory block that Google calls an ‘allocation’. You can think of an allocation as a data array, whether a list or a two-dimensional array like a spreadsheet.
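As a rough illustration of what a mapping kernel does, the sketch below uses a Python list of lists to stand in for a two-dimensional allocation. This is not the Renderscript API, which is written in its own C-like language. The names map_kernel and invert are hypothetical, and the loop here runs one element at a time, whereas Renderscript is free to spread the same calls across many cores.

```python
def map_kernel(allocation, kernel):
    """Apply `kernel` once to every element of a 2D list, returning a new one."""
    return [[kernel(value) for value in row] for row in allocation]


def invert(value):
    """Example kernel: invert an 8-bit value (0 becomes 255, 255 becomes 0)."""
    return 255 - value


allocation = [
    [0, 64, 128],
    [192, 255, 32],
]
print(map_kernel(allocation, invert))  # prints [[255, 191, 127], [63, 0, 223]]
```

Because the kernel only ever sees one value and never looks at its neighbours, the order in which elements are processed does not matter.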

Here’s a quick quiz – what data structure does your phone often generate that appears for all the world like a big spreadsheet? If you said ‘photos’, give yourself a prize.

A digital photo is essentially a large two-dimensional spreadsheet where each ‘cell’ is a pixel holding a 24-bit number, combining three eight-bit blocks identifying the red, green and blue colour components.
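Here is a small sketch of how three eight-bit colour values can be combined into one 24-bit number and pulled apart again. It assumes red sits in the top eight bits, then green, then blue; real image formats may use a different order or add a fourth ‘alpha’ byte. The function names are made up for illustration.

```python
def pack_rgb(red, green, blue):
    """Combine three 8-bit values (0-255) into one 24-bit number."""
    return (red << 16) | (green << 8) | blue


def unpack_rgb(pixel):
    """Split a 24-bit number back into its red, green and blue parts."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


orange = pack_rgb(255, 165, 0)
print(hex(orange))         # prints 0xffa500
print(unpack_rgb(orange))  # prints (255, 165, 0)
```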

In fact, Google uses digital images in its programming examples to show how Renderscript mapping kernels can apply real-time transformations.

If you’re interested in trying them out, you’ll need an Android device, plus the latest version of Android Studio, which you’ll find at https://developer.android.com/studio.

Google provides these examples at https://github.com/android/renderscript-samples.

The ‘BasicRenderScript’ example allows you to change the colour saturation of an image in real-time using a slider control, while the ‘RenderScriptIntrinsic’ example allows you to similarly apply various visual effects to an image including blur, emboss and hue – again, all in real-time with parallel processing using your phone’s GPU and/or CPU cores.
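To give a feel for the kind of per-pixel work such an example does, here is one common way to adjust saturation, sketched in plain Python. It is not taken from Google’s sample code. Each pixel is blended with its own grey level, using the widely used Rec. 601 weights for grey, which is an assumption on our part. The function names are hypothetical.

```python
def adjust_saturation(pixel, saturation):
    """Blend an (r, g, b) pixel with its grey level.

    saturation = 0.0 gives greyscale, 1.0 leaves the pixel unchanged.
    """
    red, green, blue = pixel
    # Rec. 601 luma weights: an assumed, commonly used grey conversion.
    grey = 0.299 * red + 0.587 * green + 0.114 * blue

    def blend(channel):
        value = grey + (channel - grey) * saturation
        return max(0, min(255, round(value)))  # keep within 8 bits

    return blend(red), blend(green), blend(blue)


def saturate_image(image, saturation):
    """Mapping step: run the same function once on every pixel."""
    return [[adjust_saturation(p, saturation) for p in row] for row in image]


image = [[(255, 0, 0), (0, 0, 255)]]
print(saturate_image(image, 1.0))  # prints [[(255, 0, 0), (0, 0, 255)]]
print(saturate_image(image, 0.0))  # prints [[(76, 76, 76), (29, 29, 29)]]
```

Moving a slider simply changes the saturation value and re-runs the same function over every pixel, which is exactly the sort of job that suits hundreds of small processing units.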

If you want to find out more about Renderscript, head to the Google Developers’ website at developer.android.com/guide/topics/renderscript/compute. Renderscript has a setup overhead, meaning it takes a certain amount of time to set up before the parallel processing takes place. That also means it’s not ideal for every application, particularly where only small amounts of data are to be processed (here, normal code running on the CPU would likely be more efficient). Still, GPGPU capability on a phone is pretty cool.
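The setup-overhead trade-off can be shown with some back-of-envelope arithmetic. Every figure below is hypothetical and chosen purely for illustration, not measured on any phone. The model also assumes the work divides perfectly across the processing units and that each unit is as fast as a CPU core, which real hardware will not quite manage.

```python
def cpu_time_ms(n_items, per_item_ms):
    """Time to process every item one after another on a single core."""
    return n_items * per_item_ms


def parallel_time_ms(n_items, per_item_ms, setup_ms, n_units):
    """Fixed setup cost plus the work divided evenly across n_units."""
    return setup_ms + (n_items * per_item_ms) / n_units


# Hypothetical figures for illustration only, not measurements.
PER_ITEM_MS = 0.001
SETUP_MS = 5.0
N_UNITS = 128

for n_items in (1_000, 1_000_000):
    cpu = cpu_time_ms(n_items, PER_ITEM_MS)
    par = parallel_time_ms(n_items, PER_ITEM_MS, SETUP_MS, N_UNITS)
    print(n_items, "parallel wins" if par < cpu else "plain CPU wins")
# prints:
# 1000 plain CPU wins
# 1000000 parallel wins
```

With these made-up numbers, the fixed setup cost swamps the job when there are only a thousand items, but it becomes a rounding error once there are a million.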


MapReduce scored its fame as a buzzword thanks largely to Hadoop, the open-source Java-based big-data distributed computing environment. These days, Hadoop seems to be on the decline, due to the combination of cloud computing and faster alternatives, principally Apache Spark. However, the MapReduce framework is still incredibly useful – and the fact is, thanks to Renderscript, you’re likely carrying it around in your pocket.

BasicRenderScript uses multi-core processing to adjust colour saturation.
You can code GPGPU apps using Renderscript in Android Studio.
RenderScriptIntrinsic offers blur, emboss and hue via multi-core processing.
Renderscript support is built into Android back as far as Gingerbread/2.3.
