MEET GOOGLE’S TPU V4, KIND OF
Google is another company that has put a huge effort into AI research. Its search engine remains the best in the world, and its DeepMind subsidiary has created several impressive tools, including AlphaGo, the first AI to beat one of the world's best human players at Go, a feat some thought would remain impossible for the foreseeable future. In 2016, AlphaGo took down top professional Lee Sedol, who retired in 2019 declaring AI to be “an entity that cannot be defeated.”
Google created its first custom Tensor Processing Unit (TPU) back in 2015, with a performance of 23 teraops — Google appears to have focused on INT8 rather than floating-point formats, so it’s teraops instead of teraflops. It used the TPU internally in its data centers to power AlphaGo, AlphaZero, parsing of text for Google Street View, and more. It made later-generation TPUs available via its Google Cloud platform.
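The teraops/teraflops distinction comes down to data format: INT8 inference runs on weights quantized from floating point down to 8-bit integers. A minimal sketch of symmetric INT8 quantization follows; the scaling scheme is illustrative, not a description of the TPU's actual internals:

```python
# Illustrative symmetric INT8 quantization, the kind of mapping that
# lets inference hardware count "teraops" instead of "teraflops".
# This is a sketch of the general technique, not Google's exact scheme.

def quantize_int8(values):
    """Map floats onto the signed 8-bit range [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]

weights = [0.42, -1.3, 0.07, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each recovered value sits within one quantization step of the
# original, which is why INT8 is tolerable for inference.
```

The integer codes are what the hardware multiplies and accumulates; the single float `scale` is applied once at the end.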
Last year at Google I/O, CEO Sundar Pichai spoke about the fourth generation of TPU, stating that it provides twice the performance of TPU v3. In its data centers, Google has deployed TPU v4 pods, each with 4,096 TPU v4 chips, but it has never divulged the low-level specs. TPU v3 reportedly offers 90 teraops of performance, however, so TPU v4 should land at roughly 180 teraops, at less than half the power consumption. That's less powerful than Nvidia's A100, never mind the H100, but again, Google isn't providing concrete details yet.
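Taking the reported figures at face value, the pod-level math is straightforward. The per-chip number below is an extrapolation from TPU v3's reported performance, not a published spec:

```python
# Back-of-the-envelope pod throughput, assuming ~180 teraops per
# TPU v4 chip (double TPU v3's reported 90 teraops). These are
# extrapolated estimates, not official Google specs.

TPU_V3_TERAOPS = 90
TPU_V4_TERAOPS = TPU_V3_TERAOPS * 2    # "twice the performance"
CHIPS_PER_POD = 4096

pod_teraops = TPU_V4_TERAOPS * CHIPS_PER_POD   # 737,280 teraops
pod_exaops = pod_teraops / 1_000_000           # roughly 0.74 exaops
```

On those assumptions, a single pod would deliver on the order of three-quarters of an exaop of INT8 throughput.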
One advantage of the TPU is that it completely omits technologies Nvidia continues to include in its GPUs: there's no support for FP64, or even FP32. The chips are focused on accelerating Google's TensorFlow software and little else. The TPU may not suit every algorithm, but considering Google's expertise in AI research, it appears quite competitive.
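The practical difference between those formats is precision. A quick stdlib illustration of what is lost when a value is squeezed into FP32 (Python floats are FP64 natively):

```python
import struct

def to_fp32(x):
    """Round-trip a Python float (FP64) through 32-bit storage."""
    return struct.unpack('f', struct.pack('f', x))[0]

pi64 = 3.141592653589793   # FP64 keeps ~15-16 significant digits
pi32 = to_fp32(pi64)       # FP32 keeps only ~7
error = abs(pi64 - pi32)
# The FP32 round-trip loses everything beyond about the 7th digit,
# tolerable for neural-network arithmetic, unacceptable for much
# traditional HPC work that relies on FP64.
```

Neural-network training and inference tolerate this loss well, which is why dropping FP64 and FP32 frees die area for more low-precision throughput.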
Google also offers an Edge TPU that draws just 2W and delivers 4 teraops. It's used in the Pixel 4 smartphone and is available in edge computing devices. Rather than training AI models, the Edge TPU focuses on inference: running an already trained model.