© 2024 AIDIGITALX. All Rights Reserved.

Google Unveils AI Supercomputer Powered by 4th-Gen TPU for Training Large Language Models

On Tuesday, Google, a subsidiary of Alphabet Inc., released new details about the AI supercomputers it uses to train its artificial intelligence models, saying the systems are faster and more power-efficient than comparable systems from Nvidia Corp. Google has designed its own custom chip, the Tensor Processing Unit (TPU), and uses it for more than 90% of its AI training work. Training is the process of feeding data through models so that they become useful for tasks such as generating human-like text responses or images.

Google’s 4th-Gen TPU Chips Connected to Create AI Supercomputer

Google has now released the fourth generation of its TPU. On Tuesday, the company published a scientific paper explaining how it has connected more than 4,000 of these chips into a supercomputer, using custom optical switches it developed to link the individual machines. Improving these connections has become a key point of competition among companies that build AI supercomputers, because large language models such as Google's Bard and OpenAI's ChatGPT have grown far too large to fit on a single chip.
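
To see why a large model cannot live on one chip, it helps to run the numbers. The Python sketch below computes the smallest number of chips whose combined memory could hold a model's weights. The parameter count, the 2 bytes per parameter (16-bit precision), and the 32 GiB of memory per chip are illustrative assumptions, not figures from Google's paper, and the function name `min_chips_for_weights` is hypothetical. Real training needs far more memory than this lower bound, since gradients, optimizer state, and activations must also be stored.

```python
import math

def min_chips_for_weights(num_params: int, bytes_per_param: int, chip_memory_gib: float) -> int:
    """Hypothetical helper: lower bound on chips needed just to hold a model's weights."""
    total_bytes = num_params * bytes_per_param
    chip_bytes = chip_memory_gib * 2**30  # 1 GiB = 2**30 bytes
    return math.ceil(total_bytes / chip_bytes)

# Illustrative assumptions: 500 billion parameters, 16-bit (2-byte) weights, 32 GiB per chip.
print(min_chips_for_weights(500_000_000_000, 2, 32))  # 30
```

Even under these generous assumptions, the weights alone span dozens of chips, which is why the interconnect between chips matters as much as the chips themselves.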

Instead, the models must be split across thousands of chips, which then work together for weeks or longer. PaLM, Google's largest publicly disclosed language model to date, was trained by splitting it across two AI supercomputers with 4,000 chips each, and training took 50 days.
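
One common way to split a model across chips is to shard each large weight matrix so that every chip holds only a slice of it. The sketch below simulates column-wise sharding of a single matrix multiplication in plain Python: each simulated "device" multiplies the input by its own column shard, and the partial outputs are concatenated. This is a generic illustration of tensor (model) parallelism, not the specific partitioning Google used for PaLM, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (unsharded) matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def shard_columns(w: Matrix, num_devices: int) -> List[Matrix]:
    """Split weight matrix w column-wise into num_devices contiguous shards."""
    base, extra = divmod(len(w[0]), num_devices)
    shards, start = [], 0
    for d in range(num_devices):
        width = base + (1 if d < extra else 0)  # spread leftover columns over the first devices
        shards.append([row[start:start + width] for row in w])
        start += width
    return shards

def sharded_matmul(x: Matrix, w: Matrix, num_devices: int) -> Matrix:
    """Each simulated device multiplies x by its own shard; outputs are then gathered."""
    partial = [matmul(x, shard) for shard in shard_columns(w, num_devices)]
    # Gather step: concatenate each device's output columns, row by row.
    return [sum((p[i] for p in partial), []) for i in range(len(x))]

x = [[1.0, 2.0]]
w = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]
assert sharded_matmul(x, w, 2) == matmul(x, w)
print(sharded_matmul(x, w, 2))  # [[1.0, 2.0, 8.0]]
```

On real hardware the shards sit in different chips' memory and the gather is a communication step over the interconnect. At the scale described above, two 4,000-chip systems running for 50 days amount to roughly 2 × 4,000 × 50 = 400,000 chip-days, so every such exchange is repeated an enormous number of times.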

According to Google, its supercomputers make it easy to reconfigure the connections between chips, which helps the system work around problems and lets engineers tune it for better performance.

In a blog post, Google Fellow Norm Jouppi and Google Distinguished Engineer David Patterson wrote that the system's circuit switching makes it easy to route around failed components. They added that this flexibility even allows the topology of the supercomputer's interconnect to be changed to improve the performance of a machine learning model.
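
The benefit of routing around failures can be illustrated with an ordinary graph search. In the sketch below, chips are nodes in a hypothetical ring-shaped interconnect, a breadth-first search finds the shortest path between two chips while skipping any marked as failed, and `add_link` mimics reconfiguring the topology by adding a direct connection. This is only a software analogy: Google's system changes the physical links with optical circuit switches, and the ring topology and all function names here are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Links = Dict[int, List[int]]

def ring_topology(n: int) -> Links:
    """Hypothetical bidirectional ring of n chips: chip i links to i-1 and i+1 (mod n)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def add_link(links: Links, a: int, b: int) -> None:
    """Mimic reconfiguring the interconnect by adding a direct a <-> b link."""
    links[a].append(b)
    links[b].append(a)

def route(links: Links, src: int, dst: int, failed: Set[int]) -> Optional[List[int]]:
    """Breadth-first search for a shortest src -> dst path that avoids failed chips."""
    if src in failed or dst in failed:
        return None
    prev: Dict[int, Optional[int]] = {src: None}  # predecessor of each visited chip
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk predecessors back to src, then reverse into src -> dst order.
            path: List[int] = []
            cur: Optional[int] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in links[node]:
            if nxt not in prev and nxt not in failed:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no surviving path between src and dst

links = ring_topology(8)
print(route(links, 0, 2, failed=set()))  # [0, 1, 2]
print(route(links, 0, 2, failed={1}))    # [0, 7, 6, 5, 4, 3, 2]
add_link(links, 0, 4)
print(route(links, 0, 4, failed=set()))  # [0, 4]
```

The detour around chip 1 shows why failures are survivable, and the shortcut added by `add_link` shows why a reconfigurable topology can speed up traffic patterns that a fixed layout serves poorly.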

Although Google is only now releasing details about its supercomputer, the system has been running inside the company since 2020 at a data center in Mayes County, Oklahoma. Google said the startup Midjourney used the system to train its model, which generates new images from a few words of text.

In its paper, Google said that for comparably sized systems, its chips are up to 1.7 times faster and 1.9 times more power-efficient than a system based on Nvidia's A100 chip that was on the market at the same time as the fourth-generation TPU. Google did not compare its fourth-generation chip with Nvidia's current flagship H100, because the H100 came to market after Google's chip and is built with newer technology.
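
The two headline ratios can be combined into a rough picture of time, power, and energy. The sketch below assumes that "1.9 times more power-efficient" means 1.9 times the performance per watt, and that both "up to" figures apply to the same workload, which the paper's wording does not guarantee. Under those assumptions, a job finishes in 1/1.7 of the time while drawing about 1.7/1.9 of the power, so the energy per job is 1/1.9 of the A100 system's. The function name `implied_ratios` is hypothetical.

```python
def implied_ratios(speedup: float, perf_per_watt_gain: float) -> tuple:
    """Return (time, power, energy) of the faster system relative to the baseline."""
    time_ratio = 1 / speedup                    # same job finishes in this fraction of the time
    power_ratio = speedup / perf_per_watt_gain  # average power draw while running
    energy_ratio = time_ratio * power_ratio     # energy = power x time = 1 / perf_per_watt_gain
    return time_ratio, power_ratio, energy_ratio

t, p, e = implied_ratios(1.7, 1.9)
print(f"time {t:.2f}x, power {p:.2f}x, energy {e:.2f}x")  # time 0.59x, power 0.89x, energy 0.53x
```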

Google hinted that it may be developing a new TPU to compete with Nvidia's H100 but gave no details. Jouppi told Reuters that Google has “a healthy pipeline of future chips.”

Expert

Expert in the AI field. He is the founder of aidigitalx. He loves AI.