MapReduce: the ‘Big Data’ idea inside your Android phone

It’s a common buzzword in ‘big data’, but what is MapReduce, how does it work and why is it in your Android phone?

Darren Yates explains (APC Australia).

It’s a few years old now, but IBM’s oft-quoted statement that 90% of the world’s data was created in the previous two years is mind-boggling when you think about it. Basically, we’re swimming in a sea of data that’s rising at a breathtaking rate. As PC users, we’re used to the idea of multi-core CPUs and multi-threaded apps. However, when it comes to machine learning on this ‘big data’, new processing ideas are needed. What might surprise you is that some of those ideas have made their way into your Android phone.

WHAT IS MAPREDUCE?

Many machine-learning algorithms, like most ‘decision tree’ and ‘forest’ (collections of trees) algorithms, require the data to all fit into a computer’s system memory. Before cloud computing, that wasn’t always possible with big data, so ‘distributed computing’ was born. Here, groups or ‘clusters’ of computers each handle part of the data and the results are recombined at the end. The benefits are speed and cost – by spreading the data over multiple computers, the work is completed faster, and it also allows the use of cheaper hardware. If you’ve tried SETI@home (setiathome.berkeley.edu) or similar projects, they’re perfect examples of this ‘distributed computing’ idea.
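To make the ‘split, process, recombine’ idea concrete, here’s a minimal Python sketch that simulates a cluster on a single machine. The function names and the round-robin split are illustrative assumptions, not any real framework’s API. The subtle point it shows is that each ‘node’ should return a partial result that can be recombined correctly. Here, that partial result is a (sum, count) pair rather than an average.

```python
# Hypothetical sketch: each simulated "node" summarises only its own slice
# of the data, and a final step recombines those summaries into one answer.

def split_into_partitions(data, n_nodes):
    """Deal the data out round-robin, one partition per simulated node."""
    return [data[i::n_nodes] for i in range(n_nodes)]

def node_summary(partition):
    """Work done locally on one node: a (sum, count) pair, not an average."""
    return (sum(partition), len(partition))

def recombine(summaries):
    """Merge the per-node summaries into the overall mean."""
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count

readings = [4, 8, 15, 16, 23, 42, 7]
partitions = split_into_partitions(readings, 3)
summaries = [node_summary(p) for p in partitions]
print(recombine(summaries))
```

With this data, the three partitions hold three, two and two readings. Simply averaging the three per-node averages would therefore give a different (and wrong) answer. Returning sums and counts avoids that.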

MapReduce has been a bit of a ‘big data’ machine-learning buzzword over the last decade or so. The name refers to its two main functions for processing data in distributed environments: ‘map’ and ‘reduce’.

This is a simple ‘drawn with crayons’ view of MapReduce, but imagine your data exists as a typical spreadsheet with rows and columns and a single data point in each cell. The Map function essentially allows a processing task or algorithm to be executed once on each cell, with each result transferred or ‘mapped’ to a new spreadsheet. So, for example, if you start with a spreadsheet of 100 rows and 200 columns, you end up with a second, processed spreadsheet of the same size based on the first. The Reduce function then lets another processing task combine or ‘reduce’ groups of cells into a single result. A simple example often used to describe this is counting the frequency of words in a document: the Map function splits the document into separate words, while the Reduce function counts up the occurrences of each word.
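Here’s the word-count example as a minimal, single-machine Python sketch. The spreadsheet analogy skips one detail. In classic MapReduce frameworks such as Hadoop, the map step emits key/value pairs (here, each word with a count of 1). The framework then groups those pairs by key before the reduce step runs, and the `shuffle` function below models that grouping. All function names are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    """Map: turn the document into (word, 1) pairs, one per word."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group the mapped pairs by key so each word's counts sit together."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each word's list of 1s into a single total."""
    return {word: sum(counts) for word, counts in groups.items()}

text = "the cat sat on the mat the end"
print(reduce_phase(shuffle(map_phase(text))))
# {'the': 3, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1, 'end': 1}
```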

However, the key thing about the MapReduce framework is that its work can be processed in parallel – you can throw as many processor cores at it as you have to speed things up.
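Here’s a rough sketch of that parallel step, again in Python with illustrative names. The input lines are split into one chunk per worker, and each worker counts the words in its own chunk. A final reduce then merges the partial counts. A thread pool keeps the example short. Because of Python’s global interpreter lock, though, CPU-heavy code would need `ProcessPoolExecutor`, which offers the same `map` interface, to actually spread the work across cores.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_chunk(lines):
    """Map plus local reduce for one chunk: count the words in these lines only."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def parallel_word_count(lines, workers=4):
    """Split the input into one chunk per worker, count each chunk
    concurrently, then merge the partial counts in a final reduce step."""
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # adding Counters sums the per-word counts
    return total

lines = ["the cat sat", "on the mat", "the end", "cat nap"]
print(parallel_word_count(lines, workers=2)["the"])  # 3
```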

SO, WHAT ABOUT ANDROID?

Take a look at either the Samsung Galaxy S10 or Google’s new Pixel 4 XL phone, and each has a variant that includes Qualcomm’s new Adreno 640 graphics processing unit (GPU), with its whopping 384 numeric pipelines or ‘arithmetic logic units’ (ALUs). By contrast, the old Galaxy S5 uses an Adreno 330 GPU with only 128 ALUs. That’s still 128 processing units able to work in parallel, yet they’re often employed only when rendering 3D graphics for your favourite games.

The idea of ‘general-purpose computing on graphics processing units’, or ‘GPGPU’, came about to take advantage of the GPU’s ability to process simple mathematical tasks in bulk and apply it to areas other than gaming. The most obvious example in the last five years has been the boom in cryptocurrency mining, where multiple graphics cards are often crammed into PC boxes to crunch the hashing calculations behind blockchains and earn new coins.

However, this highly parallelised processing of relatively simple mathematical tasks isn’t limited to PCs – it’s also available on Android devices, thanks to a little-known framework called ‘Renderscript’. It’s supported in Android versions going back to Android 2.3/Gingerbread. What’s clever about Renderscript is that it lets you develop code that can run on a phone’s CPU or GPU cores without worrying about which core runs it, or when and how. Android takes care of these issues, including deciding whether a CPU core or a GPU core runs your code.

To implement the Renderscript framework, you write an algorithm or ‘kernel’ function that the Android device executes on your data. But here’s the thing: Renderscript supports two standard types of kernel – a ‘mapping’ kernel and a ‘reduction’ kernel (sound familiar?).
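This rough Python sketch shows how a reduction kernel differs from a simple loop. It is loosely modelled on the split Google describes for Renderscript reductions. An ‘accumulator’ folds one element at a time into a running result, and a ‘combiner’ merges the partial results from different workers. This is plain Python with hypothetical names, not Renderscript code. It finds the minimum and maximum of an array.

```python
from functools import reduce

# Hypothetical Python stand-ins for the pieces of a reduction kernel.
def initializer():
    """Starting accumulator value: (smallest seen, largest seen)."""
    return (float("inf"), float("-inf"))

def accumulator(acc, value):
    """Fold one input element into a running (min, max) pair."""
    lo, hi = acc
    return (min(lo, value), max(hi, value))

def combiner(a, b):
    """Merge two partial (min, max) results produced by different workers."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def run_reduction(data, n_workers=4):
    """Each simulated worker reduces its own slice; the partials are then combined."""
    slices = [data[i::n_workers] for i in range(n_workers)]
    partials = [reduce(accumulator, s, initializer()) for s in slices]
    return reduce(combiner, partials, initializer())

print(run_reduction([7, -3, 12, 0, 5, 9]))  # (-3, 12)
```

Splitting the work into accumulate and combine steps is what lets the work be shared out. Each worker never needs to see the others’ data, only their partial results.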

HOW ANDROID USES MAPREDUCE

In Renderscript, a mapping kernel applies a transformation function once to each element in a memory block that Google calls an ‘allocation’. You can think of it as a data array, whether a one-dimensional list or a two-dimensional array like a spreadsheet.
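This Python stand-in gives a rough picture of what a mapping kernel does. An ‘allocation’ is modelled as a plain 2D list, and the kernel as an ordinary function called once per element. The names are hypothetical. Real Renderscript kernels are written in Renderscript’s own C99-based language and scheduled by Android.

```python
def map_kernel_2d(kernel, allocation):
    """Apply `kernel` once to every element of a 2D grid, producing a new grid
    of the same shape. Each call depends only on its own input element, which
    is what allows the calls to run in any order, or in parallel."""
    return [[kernel(value) for value in row] for row in allocation]

def invert(value):
    """Example per-element kernel: invert an 8-bit brightness value."""
    return 255 - value

image = [
    [0, 64, 128],
    [192, 255, 10],
]
print(map_kernel_2d(invert, image))  # [[255, 191, 127], [63, 0, 245]]
```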

Here’s a quick quiz – what data structure does your phone often generate that appears for all the world like a big spreadsheet? If you said ‘photos’, give yourself a prize.

A digital photo is essentially a large two-dimensional spreadsheet where each ‘cell’ is a pixel holding a 24-bit number, combining three eight-bit blocks identifying the red, green and blue colour components.
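The arithmetic behind that 24-bit number is just bit-shifting, as this short Python sketch shows. It assumes red sits in the top eight bits (0xRRGGBB). Real image formats vary in channel order, and many add an extra eight-bit alpha (transparency) channel for 32 bits per pixel. The function names are illustrative.

```python
def pack_rgb(r, g, b):
    """Combine three 8-bit channels into one 24-bit number laid out as 0xRRGGBB."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits (0-255)")
    return (r << 16) | (g << 8) | b

def unpack_rgb(pixel):
    """Split a 24-bit pixel back into its red, green and blue components."""
    return ((pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF)

orange = pack_rgb(255, 165, 0)
print(hex(orange))         # 0xffa500
print(unpack_rgb(orange))  # (255, 165, 0)
```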

In fact, Google’s own Renderscript programming examples use digital images, applying mapping kernels to transform them in real time.

If you’re interested in trying them out, you’ll need an Android device, plus the latest version of Android Studio, which you’ll find at https://developer.android.com/studio.

Google provides these examples at https://github.com/android/renderscript-samples.

The ‘BasicRenderScript’ example lets you change the colour saturation of an image in real time using a slider control. The ‘RenderScriptIntrinsic’ example similarly lets you apply various visual effects to an image, including blur, emboss and hue – again, all in real time, with parallel processing using your phone’s GPU and/or CPU cores.
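Adjusting saturation is a natural fit for a mapping kernel, because each output pixel depends only on the matching input pixel. This minimal Python sketch shows that per-pixel maths. It uses the common approach of blending each colour with its grey (luminance) level. It illustrates that approach under stated assumptions and is not the code from Google’s sample. Pixels are (r, g, b) tuples, and the function names are hypothetical.

```python
def adjust_saturation(pixel, saturation):
    """Per-pixel mapping kernel: blend an (r, g, b) pixel with its grey level.
    saturation = 0 gives greyscale, 1 leaves the pixel unchanged and
    values above 1 exaggerate the colour."""
    r, g, b = pixel
    grey = 0.299 * r + 0.587 * g + 0.114 * b  # luminance, Rec. 601 weights

    def mix(channel):
        value = grey + (channel - grey) * saturation
        return max(0, min(255, round(value)))  # clamp back into 8 bits

    return (mix(r), mix(g), mix(b))

def map_image(image, saturation):
    """Apply the kernel to every pixel independently, as a mapping kernel would."""
    return [[adjust_saturation(p, saturation) for p in row] for row in image]

image = [[(255, 0, 0), (0, 128, 255)]]
print(map_image(image, 0.0))  # greyscale version
print(map_image(image, 1.0))  # unchanged
```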

If you want to find out more about Renderscript, head to the Google Developers’ website at developer.android.com/guide/topics/renderscript/compute.

Renderscript has a setup overhead, meaning it takes a certain amount of time to set up before the parallel processing takes place. That also means it’s not ideal for every application, particularly where only small amounts of data are to be processed (here, normal code running on the CPU would likely be more efficient). Still, GPGPU capability on a phone is pretty cool.
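You can put rough numbers on that trade-off. Suppose setup costs S, and each item takes c on the CPU versus p in parallel. The parallel route only wins once S + n*p < n*c, that is, when n is greater than S / (c - p). The Python sketch below computes that break-even size. The function name is hypothetical, and the timings in the usage line are made-up values for illustration, not measurements of Renderscript.

```python
import math

def break_even_elements(setup_us, cpu_us_per_item, parallel_us_per_item):
    """Smallest number of items for which 'setup + parallel work' is strictly
    faster than plain CPU code. Returns None if parallel is never faster per item."""
    saving_per_item = cpu_us_per_item - parallel_us_per_item
    if saving_per_item <= 0:
        return None  # the setup cost can never be paid back
    # Parallel wins when setup + n*parallel < n*cpu, i.e. n > setup / saving.
    return math.floor(setup_us / saving_per_item) + 1

# Hypothetical timings in microseconds, purely for illustration.
print(break_even_elements(setup_us=2000.0, cpu_us_per_item=1.0, parallel_us_per_item=0.5))  # 4001
```

Below that size, the setup time outweighs the per-item savings. This is why small jobs are usually better left to ordinary CPU code.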

MORE THAN ONE USE

MapReduce earned its fame as a buzzword largely thanks to Hadoop, the open-source, Java-based distributed computing environment for big data. These days, Hadoop seems to be on the decline, due to the combination of cloud computing and faster alternatives, principally Apache Spark. However, the MapReduce framework is still incredibly useful – and thanks to Renderscript, you’re likely carrying it around in your pocket.

BasicRenderScript uses multi-core processing to adjust colour saturation.

You can code GPGPU apps using Renderscript in Android Studio.

RenderScriptIntrinsic offers blur, emboss and hue via multi-core processing.

Renderscript support is built into Android as far back as Gingerbread/2.3.
