PC Advisor

Google’s AI chips

The TPUs are faster at neural net inference and excel at performance per watt, reveals Blair Hanley Frank


Four years ago, Google was faced with a conundrum: if all its users hit its voice-recognition services for three minutes a day, it would need to double the number of data centres just to handle all of the requests to the machine learning system powering those services.

Rather than buy a bunch of new real estate and servers just for that purpose, the company embarked on a journey to create dedicated hardware for running machine-learning applications like voice recognition.

The result was the Tensor Processing Unit (TPU), a chip that is designed to accelerate the inference stage of deep neural networks. Google published a paper recently laying out the performance gains the company saw over comparable CPUs and GPUs, both in terms of raw power and the performance per watt of power consumed.

A TPU was on average 15 to 30 times faster at the machine learning inference tasks tested than a comparable server-class Intel Haswell CPU or Nvidia K80 GPU. Importantly, the performance per watt of the TPU was 25 to 80 times better than what Google found with the CPU and GPU.
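Performance per watt is simply throughput divided by power draw. A toy calculation shows how a ratio like the paper's is derived; the throughput and wattage figures below are hypothetical placeholders, not Google's measured numbers:

```python
def perf_per_watt(ops_per_sec: float, watts: float) -> float:
    """Performance per watt: useful work delivered per unit of power consumed."""
    return ops_per_sec / watts

# Hypothetical figures for illustration only -- not from Google's paper.
tpu = perf_per_watt(ops_per_sec=90e12, watts=75)   # ~90 trillion ops/sec at 75 W
gpu = perf_per_watt(ops_per_sec=6e12, watts=300)   # ~6 trillion ops/sec at 300 W

ratio = tpu / gpu  # how many times more work the TPU does per watt
print(f"TPU delivers {ratio:.0f}x the performance per watt")  # prints 60x
```

With these made-up inputs the TPU comes out 60 times more efficient; the paper's 25 to 80 times range reflects the same arithmetic applied across the six tested workloads.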

Driving this sort of performance increase is important for Google, considering the company's emphasis on building machine learning applications. The gains validate its bet on custom machine learning hardware at a time when it's harder to get massive performance boosts from traditional silicon.

This is more than just an academic exercise. Google has used TPUs in its data centres since 2015 and they've been put to use improving the performance of applications including translation and image recognition. The TPUs are particularly useful when it comes to energy efficiency, which is an important metric related to the cost of using hardware at massive scale.

One of the other key metrics for Google's purposes is latency, which is where the TPUs excel compared to other silicon options. Norm Jouppi, a distinguished hardware engineer at Google, said that machine learning systems need to respond quickly in order to provide a good user experience.

“The point is, the internet takes time, so if you’re using an internet-based server, it takes time to get from your device to the cloud, it takes time to get back,” Jouppi said. “Networking and various things in the cloud — in the data centre — they take some time. So that doesn’t leave a lot of [time] if you want near-instantaneous responses.”

Google tested the chips on six different neural network inference applications, representing 95 percent of all such applications in Google's data centres. The applications tested include DeepMind AlphaGo, the system that defeated Lee Sedol at Go in a five-game match in 2016.


The company tested the TPUs against hardware released at roughly the same time to try to get an apples-to-apples performance comparison. It's possible that newer hardware would at least narrow the performance gap.

There's still room for TPUs to improve, too. Pairing the TPU with the GDDR5 memory found in an Nvidia K80 GPU should provide a performance improvement over the existing configuration that Google tested. According to the company's research, the performance of several applications was constrained by memory bandwidth.
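The memory-bandwidth constraint follows from a simple roofline-style estimate: when a workload needs data faster than memory can supply it, the compute units sit idle, so faster memory raises the ceiling. A sketch of that reasoning, using illustrative bandwidth and arithmetic-intensity figures rather than TPU measurements:

```python
def attainable_throughput(peak_ops: float, mem_bandwidth: float, ops_per_byte: float) -> float:
    """Roofline model: achievable throughput is the lesser of peak compute
    and memory bandwidth multiplied by arithmetic intensity (ops per byte)."""
    return min(peak_ops, mem_bandwidth * ops_per_byte)

# Illustrative numbers only -- not from Google's paper.
peak = 92e12        # peak ops/sec of the accelerator
bw_slow = 34e9      # slower memory system, bytes/sec
bw_gddr5 = 180e9    # faster GDDR5-class memory, bytes/sec
intensity = 100     # ops performed per byte fetched, for a hypothetical layer

print(attainable_throughput(peak, bw_slow, intensity))   # memory-bound, far below peak
print(attainable_throughput(peak, bw_gddr5, intensity))  # still memory-bound, but ~5x faster
```

With the slow memory the attainable rate is bandwidth times intensity, nowhere near the compute peak, which is why swapping in faster memory alone would speed up the bandwidth-limited applications.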

Furthermore, the authors of Google's paper claim that there's room for additional software optimisation to increase performance. The writers called out one of the tested convolutional neural network applications (referred to in the paper as CNN1) as a candidate. However, because of existing performance gains from the use of TPUs, it's not clear if those optimisations will take place. While neural networks mimic the way neurons transmit information in humans, CNNs are modelled specifically on how the brain processes visual information.

“As CNN1 currently runs more than 70 times faster on the TPU than the CPU, the CNN1 developers are already very happy, so it’s not clear whether or when such optimisations would be performed,” the authors wrote.

TPUs are what’s known in chip lingo as an application-specific integrated circuit (ASIC). They’re custom silicon built for one task, with an instruction set hard-coded into the chip itself. Jouppi said that he wasn’t overly concerned by that, and pointed out that the TPUs are flexible enough to handle changes in machine learning models. “It’s not like it was designed for one model, and if someone comes up with a new model, we’d have to junk our chips or anything like that,” he said.

Google isn't the only company focused on using dedicated hardware for machine learning. Jouppi added that he knows of several start-ups working in the space, and Microsoft has deployed a fleet of field-programmable gate arrays in its data centres to accelerate networking and machine learning applications.

