A new white paper from Google details the company's use of optical circuit switches in its machine learning training supercomputer. It says that the TPU v4 model, with those switches in place, delivers better performance and greater energy efficiency than general-use processors.
Google's Tensor Processing Units (TPUs), the basic building blocks of the company's AI supercomputing systems, are essentially ASICs (application-specific integrated circuits). This means their functionality is built in at the hardware level, unlike the general-use CPUs and GPUs found in many AI training systems. The white paper details how, by interconnecting more than 4,000 TPUs through optical circuit switching, Google has achieved speeds 10 times faster than previous TPU generations while consuming less than half as much energy.
Aiming for AI performance, price breakthroughs
The key, according to the white paper, lies in the way optical circuit switching (performed here by switches of Google's own design) enables dynamic changes to the system's interconnect topology. Google says its system is cheaper, faster and considerably more energy efficient than InfiniBand, which is commonly used in other HPC areas.
"Two major architectural features of TPU v4 have small cost but outsized advantages," the paper said. "The SparseCore [dataflow processors] accelerates embeddings of [deep learning recommendation] models by 5x-7x by providing a dataflow sea-of-cores architecture that allows embeddings to be placed anywhere in the 128 TiB physical memory of the TPU v4 supercomputer."
According to Peter Rutten, research vice president at IDC, the efficiencies described in Google's paper stem largely from the inherent characteristics of the hardware: well-designed ASICs are, almost by definition, better suited to their specific task than general-use processors trying to do the same job.
“ASICs are very performant and energy efficient,” he said. “If you hook them up to optical circuit switches where you can dynamically configure the network topology, you have a very fast system.”
While the system described in the white paper is for Google's internal use only at this point, Rutten noted that lessons from the technology could apply broadly to machine learning training.
“I would say it has implications in the sense that it offers them a sort of best practices scenario,” he said. “It’s an alternative to GPUs, so in that sense it’s definitely an interesting piece of work.”
Google-Nvidia comparison is unclear
Google also compared TPU v4's performance with that of systems built on Nvidia's A100 GPUs, a common HPC component. However, Rutten noted that Nvidia has since released the much faster H100, which may narrow any performance gap between the systems.
“They’re comparing it to an older-gen GPU,” he said. “But in the end it doesn’t really matter, because it’s Google’s internal process for developing AI models, and it works for them.”