aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Computing

Cloud TPU VMs Are Generally Available

  • aster.cloud
  • May 21, 2022
  • 4 minute read

Earlier last year, Cloud TPU VMs on Google Cloud were introduced to make it easier to use the TPU hardware by providing direct access to TPU host machines. Today, we are excited to announce the general availability (GA) of TPU VMs.

With Cloud TPU VMs you can work interactively on the same hosts where the physical TPU hardware is attached. Our rapidly growing TPU user community has enthusiastically adopted this access mechanism, because it not only makes it possible to have a better debugging experience, but it also enables certain training setups such as Distributed Reinforcement Learning which were not feasible with TPU Node (networks accessed) architecture.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

What’s new for the GA release?

Cloud TPUs are now optimized for large-scale ranking and recommendation workloads. We are also thrilled to share that Snap, an early adopter of this new capability, achieved about ~4.65x perf/TCO improvement to their business-critical ad ranking workload. Here are a few highlights from Snap’s blog post on Training Large Scale Recommendation Models:

> TPUs can offer much faster training speed and significantly lower training costs for recommendation system models than the CPUs;
> TensorFlow for cloud TPU provides a powerful API to handle large embedding tables and fast lookups;
> On TPU v3-32 slice, Snap was able to get a ~3x better throughput (-67.3% throughput on A100) with 52.1% lower cost compared to an equivalent A100 configuration (~4.65x perf/TCO)

Ranking and recommendation

With the TPU VMs GA release, we are introducing the new TPU Embedding API, which can accelerate ML Based ranking and recommendation workloads.

Read More  Run Your Fault-Tolerant Workloads Cost-Effectively With Google Cloud Spot VMs, Now GA

Many businesses today are built around ranking and recommendation use-cases, such as audio/video recommendations, product recommendations (apps, e-commerce), and ad ranking. These businesses rely on ranking and recommendation algorithms to serve their users and drive their business goals. In the last few years, the approaches to these algorithms have evolved from being purely statistical to deep neural network-based. These modern DNN-based algorithms offer greater scalability and accuracy, but they can come at a cost. They tend to use large amounts of data and can be difficult and expensive to train and deploy with traditional ML infrastructure.

Embedding acceleration with Cloud TPU can solve this problem at a lower cost. Embedding APIs can efficiently handle large amounts of data, such as embedding tables, by automatically sharding across hundreds of Cloud TPU chips in a pod, all connected to one another via the custom-built interconnect.

To help you get started, we are releasing the TF2 ranking and recommendation APIs, as part of the Tensorflow Recommenders library. We have also open sourced DLRM and DCN v2 ranking models in the TF2 model garden and the detailed tutorials are available here.

Semi-supervised learning with Cloud TPUs

We recently released a technical whitepaper as a joint project between Google Cloud Customer Engineering teams and one of our Global Insurance customers, MunichRE. It highlights how just the Cloud TPU (V3-32) alone enabled them to reduce model training time by 92%, and achieve ~18.75x perf/TCO (in comparison to the A100 GPU) for semi-supervised image classification/segmentation problems. Since Semi-Supervised Learning(SSL)  methods achieve significant performance improvements from ingesting large amounts of inexpensive unlabelled data, Cloud TPUs are the perfect complement to keep model training time and costs from becoming infeasible which extracting maximum performance using SSL methods.

Read More  NVIDIA Announces New Switches Optimized for Trillion-Parameter GPU Computing and AI Infrastructure

Framework support

TPU VM GA Release supports the three major frameworks (TensorFlow, PyTorch and JAX) now offered through three optimized environments for ease of setup with the respective framework. GA release has been validated with TensorFlow v2-tf-stable, PyTorch/XLA v1.11 and JAX [0.3.6].

TPU VMs Specific Features

TPU VMs offer several additional capabilities over TPU Node architecture thanks to the local execution setup, i.e. TPU hardware connected to the same host that users execute the training workload(s).

Local execution of input pipeline 

Input data pipeline executes directly on the TPU hosts. This functionality allows saving precious computing resources earlier used in the form of instance groups for PyTorch/JAX distributed training. In the case of Tensorflow, the distributed training setup required only one user VM and data pipeline executed directly on TPU hosts.

The following study summarizes the cost comparison for Transformer (FairSeq; PyTorch/XLA) training executed for 10 epochs on TPU VM vs TPU Node architecture (Network attached Cloud TPUs):

Google Internal data (published benchmark conducted on Cloud TPU by Google).

 

Distributed Reinforcement Learning with TPU VMs

Local execution on the host with the accelerator, also enables use cases such as Distributed Reinforcement Learning. Canonical works in this domain such as seed-RL, IMPALA and Podracer have been developed using Cloud TPUs.


“…, we argue that the compute requirements of large scale reinforcement learning systems are particularly well suited for making use of Cloud TPUs , and specifically TPU Pods: special configurations in a Google data center that feature multiple TPU devices interconnected by low latency communication channels. “—Podracer, DeepMind


Custom Ops Support for TensorFlow

Read More  Google Cloud-Nokia Venture Highlights Battle For 5G Network Edge

With direct execution on TPU VM, users can now build their own custom ops such as TensorFlow Text. With this feature, the users are no longer bound to TensorFlow runtime release versions.

What are our customers saying?

“Over the last couple of years, Kakao Brain has developed numerous groundbreaking AI services and models, including minDALL-E, KoGPT and, most recently, RQ-Transformer. We’ve been using TPU VM architecture since its early launch on Google Cloud, and have experienced significant performance improvements compared to the original TPU node set up. We are very excited about the new features added in the Generally Available version of TPU VM, such as Embeddings API, and plan to continue using TPUs to solve some of the globe’s biggest ‘unthinkable questions’ with solutions enabled by its lifestyle-transforming AI technologies”—Kim Il-doo, CEO of Kakao Brain

Additional Customers’ testimonials are available here.

How to get started?

To start using TPU VM, you can follow one of our quick starts or tutorials. If you are new to TPUs you can explore our concepts deep-dives and system architecture. We strive to make Cloud TPUs – Google’s advanced AI infrastructure – universally useful and accessible.

 

 

By: Vaibhav Singh (Outbound Product Manager, Cloud TPU) and Max Sapoznikov (Product Manager, Cloud TPU)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • Cloud TPU
  • Compute
  • Google Cloud
  • Virtual Machines
You May Also Like
Getting things done makes her feel amazing
View Post
  • Computing
  • Data
  • Featured
  • Learning
  • Tech
  • Technology

Nurturing Minds in the Digital Revolution

  • April 25, 2025
View Post
  • Computing
  • Public Cloud
  • Technology

United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services

  • April 15, 2025
Microsoft’s Majorana 1 chip carves new path for quantum computing
View Post
  • Computing
  • Technology

Microsoft’s Majorana 1 chip carves new path for quantum computing

  • February 19, 2025
View Post
  • Computing
  • Engineering

Why a decades old architecture decision is impeding the power of AI computing

  • February 19, 2025
CES 2025: Intel Shows Off Its AI Tech
View Post
  • Computing
  • Technology

CES 2025: Intel Shows Off Its AI Tech

  • January 23, 2025
View Post
  • Computing
  • Design
  • Engineering
  • Technology

Here’s why it’s important to build long-term cryptographic resilience

  • December 24, 2024
Cloud platforms among the clouds
View Post
  • Computing
  • Learning
  • Public Cloud

Best Cloud Platforms Offering Free Trials for Cloud Mastery

  • December 23, 2024
View Post
  • Computing

Azure Cobalt 100-based Virtual Machines are now generally available

  • October 22, 2024

Stay Connected!
LATEST
  • college-of-cardinals-2025 1
    The Definitive Who’s Who of the 2025 Papal Conclave
    • May 7, 2025
  • conclave-poster-black-smoke 2
    The World Is Revalidating Itself
    • May 6, 2025
  • oracle-ibm 3
    IBM and Oracle Expand Partnership to Advance Agentic AI and Hybrid Cloud
    • May 6, 2025
  • 4
    Conclave: How A New Pope Is Chosen
    • April 25, 2025
  • Getting things done makes her feel amazing 5
    Nurturing Minds in the Digital Revolution
    • April 25, 2025
  • 6
    AI is automating our jobs – but values need to change if we are to be liberated by it
    • April 17, 2025
  • 7
    Canonical Releases Ubuntu 25.04 Plucky Puffin
    • April 17, 2025
  • 8
    United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services
    • April 15, 2025
  • 9
    Tokyo Electron and IBM Renew Collaboration for Advanced Semiconductor Technology
    • April 2, 2025
  • 10
    IBM Accelerates Momentum in the as a Service Space with Growing Portfolio of Tools Simplifying Infrastructure Management
    • March 27, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    Tariffs, Trump, and Other Things That Start With T – They’re Not The Problem, It’s How We Use Them
    • March 25, 2025
  • 2
    IBM contributes key open-source projects to Linux Foundation to advance AI community participation
    • March 22, 2025
  • 3
    Co-op mode: New partners driving the future of gaming with AI
    • March 22, 2025
  • 4
    Mitsubishi Motors Canada Launches AI-Powered “Intelligent Companion” to Transform the 2025 Outlander Buying Experience
    • March 10, 2025
  • PiPiPi 5
    The Unexpected Pi-Fect Deals This March 14
    • March 13, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.