Recently, models with billions or trillions of parameters have shown significant advances in machine learning capabilities and accuracy. For example, Google’s LaMDA model is able to engage in a free-flowing conversation with users about a large variety of topics. There is enormous interest within the machine learning research and product communities in leveraging large models to deliver breakthrough capabilities. The high computational demand of these large models requires an increased focus on improving the efficiency of the model training process, and benchmarking is an important means to coalesce the ML systems community towards realizing higher efficiencies.

In the recently concluded MLPerf v1.1 Training round¹, Google submitted two large language model benchmarks into the Open division, one with 480 billion parameters and a second with 200 billion parameters. These submissions make use of publicly available infrastructure, including Cloud TPU v4 Pod slices and the Lingvo open source modeling framework.

Partner with aster.cloud
for your next big idea.
Let us know here.

From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.

CYBERPOGO.COM :: For the Arts, Sciences, and Technology.

DADAHACKS.COM :: Parenting For The Rest Of Us.

ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.

TAKUMAKU.COM :: For The Hearth And Home.

ASTER.CLOUD :: From The Cloud And Beyond.

LIWAIWAI.COM :: Intelligence, Inside and Outside.

GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.

FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.

ASTERCASTER.COM :: Supra Astra. Beyond The Stars.

BARTDAY.COM :: Prosperity For Everyone.

Traditionally, training models at these scales would require building a supercomputer at a cost of tens or even hundreds of millions of dollars – something only a few companies can afford to do. Customers can achieve the same results using exaflop-scale Cloud TPU v4 Pods without incurring the costs of installing and maintaining an on-premise system.

Large model benchmarks

Google’s Open division submissions consist of a 480 billion parameter dense Transformer-based encoder-only benchmark using TensorFlow and a 200 billion-parameter JAX benchmark. These models are architecturally similar to MLPerf’s BERT model but with larger dimensions and number of layers. These submissions demonstrate large model scalability and high performance on TPUs across two distinct frameworks. Notably, these benchmarks, with their stacked transformer architecture, are fairly comparable in terms of their compute characteristics with other large language models.

*Figure 1: Architecture of the Encoder-only model used in Google’s MLPerf 1.1 submissions.*

Our two submissions were benchmarked on 2048-chip and 1024-chip TPU v4 Pod slices, respectively. We were able to achieve an end-to-end training time of ~55 hours for the 480B parameter model and ~40 hours for the 200B parameter model. Each of these runs achieved a computational efficiency of 63%- calculated as a fraction of floating point operations of the model together with compiler rematerialization over the peak FLOPs of the system used².

Next-generation ML infrastructure for large Model training

Achieving these impressive results required a combination of several cutting edge technologies. First, each TPU v4 chip provides more than 2X the compute power of a TPU v3 chip – up to 275 peak TFLOPS. Second, 4,096 TPU v4 chips are networked together into a Cloud TPU v4 Pod by an ultra-fast interconnect that provides 10x the bandwidth per chip at scale compared to typical GPU-based large scale training systems. Large models are very communication intensive: local computation often depends on results from remote computation that are communicated across the network. TPU v4’s ultra-fast interconnect has an outsized impact on computational efficiency of large models by eliminating latency and congestion in the network.

*Figure 2: A portion of one of Google’s Cloud TPU v4 Pods, each of which is capable of delivering in excess of 1 exaflop/s of computing power.*

The performance numbers demonstrated by our submission also rely on our XLA linear algebra compiler and leverage the Lingvo framework. XLA transparently performs a number of optimizations, including GSPMD based automatic parallelization of many of the computation graphs that form the building blocks of the ML model. XLA also allows for reduction in latency by overlapping communication with the computations. Our two submissions demonstrate the versatility and performance of our software stack across two frameworks, TensorFlow and JAX.

Large models in MLPerf

Google’s submissions represent an important class of models that have become increasingly important in ML research and production, but are currently not represented in MLPerf’s Closed division benchmark suite.

We believe that adding these models to the benchmark suite is an important next step and can inspire the ML systems community to focus on addressing the scalability challenges that large models present.

Our submissions demonstrate 63% computational efficiency, cutting edge in the industry. This high computational efficiency enables higher experimentation velocity through faster training. This directly translates into cost savings for Google’s Cloud TPU customers.

Please visit the Cloud TPU homepage and documentation to learn more about leveraging Cloud TPUs using TensorFlow, PyTorch, and JAX.

^{1. The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.}^{2. Computational efficiency and end-to-end training time are not official MLPerf metrics}

By: Aarush Selvan (Product Manager) and Pankaj Kanwar (Technical Program Manager, TPU)
Source: Google Cloud Blog

For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

Google Showcases Cloud TPU V4 Pods For Large Model Training

From our partners:

Large model benchmarks

Next-generation ML infrastructure for large Model training

Large models in MLPerf

For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

aster.cloud

Related Topics

How to create an AWS free tier account

How to configure multiple AWS CLI authentication credentials

Formula E accelerates its work with Google Cloud Storage and Google Workspace

What is database as a service (DBaaS)?

The cloud’s role in PQC migration

Hybrid cloud has hit the mainstream – but firms are still confused about costs

Building secure, scalable AI in the cloud with Microsoft Azure

Turns out OpenAI is the customer behind Oracle’s mysterious $30 billion cloud deal

What is an SBOM (software bill of materials)?

Send SMS texts with Amazon’s SNS simple notification service

Most Popular

A looming hyperscaler exodus? UK IT leaders are thinking of ditching US cloud providers – here’s why

AlphaGenome: AI for better understanding the genome

Host a static website on AWS with Amazon S3 and Route 53

The Summer Adventures : Camping Essentials

6 edge monitoring best practices in the cloud

Google Showcases Cloud TPU V4 Pods For Large Model Training

From our partners:

Large model benchmarks

Next-generation ML infrastructure for large Model training

Large models in MLPerf

For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Related Topics

You May Also Like