Google’s AI chips

The TPUs are faster at neural net inference and excel at performance per watt, reveals Blair Hanley Frank

Four years ago, Google was faced with a conundrum: if all its users hit its voice-recognition services for three minutes a day, it would need to double the number of data centres just to handle all of the requests to the machine learning system powering those services.

Rather than buy a bunch of new real estate and servers just for that purpose, the company embarked on a journey to create dedicated hardware for running machine-learning applications like voice recognition.

The result was the Tensor Processing Unit (TPU), a chip that is designed to accelerate the inference stage of deep neural networks. Google recently published a paper laying out the performance gains the company saw over comparable CPUs and GPUs, both in terms of raw power and the performance per watt of power consumed.
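To make the terminology concrete, here is a minimal sketch of what "inference" means in this context: running inputs through an already-trained network's fixed weights to get a prediction, which in practice reduces to large matrix multiplications, the operation the TPU is built to accelerate. The NumPy code and weights below are illustrative, not Google's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights were produced earlier by training (they are random here).
W1 = rng.standard_normal((256, 128)).astype(np.float32)
W2 = rng.standard_normal((128, 10)).astype(np.float32)

def infer(x):
    """Forward pass only -- two dense layers with a ReLU; no training happens here."""
    h = np.maximum(x @ W1, 0.0)   # hidden activations
    return h @ W2                 # output scores

x = rng.standard_normal(256).astype(np.float32)
print(infer(x).argmax())          # index of the highest-scoring output
```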

A TPU was on average 15 to 30 times faster at the machine learning inference tasks tested than a comparable server-class Intel Haswell CPU or Nvidia K80 GPU. Importantly, the performance per watt of the TPU was 25 to 80 times better than what Google found with the CPU and GPU.
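As a rough illustration of how the two ratios relate (the figures below are invented, not Google's measurements), performance per watt is simply throughput divided by power draw, so a chip that is both faster and less power-hungry can win the efficiency comparison by a wider margin than the raw-speed one:

```python
# Illustrative only -- invented numbers, not Google's measurements.
tpu = {"ops_per_sec": 30.0, "watts": 1.0}   # arbitrary units
gpu = {"ops_per_sec": 2.0,  "watts": 3.0}

speedup = tpu["ops_per_sec"] / gpu["ops_per_sec"]
efficiency_gain = (tpu["ops_per_sec"] / tpu["watts"]) / (gpu["ops_per_sec"] / gpu["watts"])
print(speedup, efficiency_gain)   # 15.0 and 45.0 in this made-up case
```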

Driving this sort of performance increase is important for Google, considering the company’s emphasis on building machine learning applications. The gains validate the company’s focus on building machine learning hardware at a time when it’s harder to get massive performance boosts from traditional silicon.

This is more than just an academic exercise. Google has used TPUs in its data centres since 2015, and they’ve been put to use improving the performance of applications including translation and image recognition. The TPUs are particularly useful when it comes to energy efficiency, which is an important metric related to the cost of using hardware at massive scale.

One of the other key metrics for Google’s purposes is latency, which is where the TPUs excel compared to other silicon options. Norm Jouppi, a distinguished hardware engineer at Google, said that machine learning systems need to respond quickly in order to provide a good user experience.

“The point is, the internet takes time, so if you’re using an internet-based server, it takes time to get from your device to the cloud, it takes time to get back,” Jouppi said. “Networking and various things in the cloud — in the data centre — they take some time. So that doesn’t leave a lot of [time] if you want near-instantaneous responses.”
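A back-of-envelope budget shows why those milliseconds matter; every number below is an assumption for illustration, not a figure from Google:

```python
# Rough latency budget for one interactive request -- all numbers are
# assumptions for illustration, not measurements from Google.
target_response_ms = 200          # what users perceive as near-instant
network_round_trip_ms = 100       # device -> data centre -> device
datacentre_overhead_ms = 30       # load balancing, queuing, serialisation

inference_budget_ms = target_response_ms - network_round_trip_ms - datacentre_overhead_ms
print(f"Time left for the model itself: {inference_budget_ms} ms")   # 70 ms
```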

Google tested the chips on six different neural network inference applications, representing 95 percent of all such applications in Google’s data centres. The applications tested include DeepMind AlphaGo, the system that defeated Lee Sedol at Go in a five-game match in 2016.

Performance

The company tested the TPUs against hardware that was released at roughly the same time to try to get an apples-to-apples performance comparison. It’s possible that newer hardware would at least narrow the performance gap.

There’s still room for TPUs to improve, too. Using the GDDR5 memory that’s present in an Nvidia K80 GPU with the TPU should provide a performance improvement over the existing configuration that Google tested. According to the company’s research, the performance of several applications was constrained by memory bandwidth.
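One standard way to reason about a memory-bandwidth ceiling is the roofline model: attainable throughput is the lesser of the chip’s peak compute rate and the rate at which memory can feed it data. The sketch below uses invented hardware figures purely for illustration:

```python
# Roofline sketch: throughput is capped either by raw compute or by how fast
# memory can deliver operands. All figures are invented for illustration,
# not taken from Google's paper.
peak_compute_gops = 90_000        # peak giga-ops/sec the chip could sustain
memory_bandwidth_gbs = 34         # gigabytes/sec the memory can deliver

def attainable_gops(ops_per_byte):
    """Giga-ops/sec actually sustainable at a given arithmetic intensity."""
    return min(peak_compute_gops, memory_bandwidth_gbs * ops_per_byte)

for intensity in (10, 100, 1000, 5000):
    print(intensity, attainable_gops(intensity))
# Low-intensity workloads sit on the sloped, bandwidth-limited part of the
# roofline, which is why faster memory such as GDDR5 would raise their ceiling.
```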

Furthermore, the authors of Google’s paper claim that there’s room for additional software optimisation to increase performance. The writers called out one of the tested convolutional neural network applications (referred to in the paper as CNN1) as a candidate. However, because of existing performance gains from the use of TPUs, it’s not clear if those optimisations will take place. While neural networks mimic the way neurons transmit information in humans, CNNs are modelled specifically on how the brain processes visual information.

“As CNN1 currently runs more than 70 times faster on the TPU than the CPU, the CNN1 developers are already very happy, so it’s not clear whether or when such optimisations would be performed,” the authors wrote.
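For readers unfamiliar with the term, the core operation of a convolutional network is sliding a small filter across an image, a rough analogue of the local receptive fields in the visual cortex. A minimal sketch, not code from the paper:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most ML libraries)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(8, 8).astype(np.float32)
edge_filter = np.array([[1.0, -1.0], [1.0, -1.0]], dtype=np.float32)  # crude vertical-edge detector
print(conv2d(image, edge_filter).shape)   # (7, 7) feature map
```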

TPUs are what’s known in chip lingo as an application-specific integrated circuit (ASIC). They’re custom silicon built for one task, with an instruction set hard-coded into the chip itself. Jouppi said that he wasn’t overly concerned by that, and pointed out that the TPUs are flexible enough to handle changes in machine learning models. “It’s not like it was designed for one model, and if someone comes up with a new model, we’d have to junk our chips or anything like that,” he said.

Google isn’t the only company focused on using dedicated hardware for machine learning. Jouppi added that he knows of several start-ups working in the space, and Microsoft has deployed a fleet of field-programmable gate arrays in its data centres to accelerate networking and machine learning applications.
