
Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials

  • aster.cloud
  • March 9, 2025
  • 5 minute read

AI Hypercomputer is a fully integrated supercomputing architecture for AI workloads – and it’s easier to use than you think. In this blog, we break down four common use cases, including reference architectures and tutorials, representing just a few of the many ways you can use AI Hypercomputer today. 

Short on time? Here’s a quick summary.


  • Affordable inference. JAX, Google Kubernetes Engine (GKE) and NVIDIA Triton Inference Server are a winning combination, especially when you pair them with Spot VMs for up to 90% cost savings. We have several tutorials, like this one on how to serve LLMs like Llama 3.1 405B on GKE.
  • Large and ultra-low latency training clusters. Hypercompute Cluster gives you physically co-located accelerators, targeted workload placement, advanced maintenance controls to minimize workload disruption, and topology-aware scheduling. You can get started by creating a cluster with GKE or try this pretraining NVIDIA GPU recipe.
  • High-reliability inference. Pair new cloud load balancing capabilities like custom metrics and service extensions with GKE Autopilot, which includes features like node auto-repair to automatically replace unhealthy nodes, and horizontal pod autoscaling to adjust resources based on application demand. 
  • Easy cluster setup. The open-source Cluster Toolkit offers pre-built blueprints and modules for rapid, repeatable cluster deployments. You can get started with one of our AI/ML blueprints.
  • If you want to see a broader set of reference implementations, benchmarks and recipes, go to the AI Hypercomputer GitHub.

Why it matters
Deploying and managing AI applications is hard: you need to choose the right infrastructure, control costs, and reduce delivery bottlenecks. AI Hypercomputer helps you deploy AI applications quickly, easily, and more efficiently than assembling raw hardware and chips yourself.

Take Moloco, for example. Using the AI Hypercomputer architecture, Moloco achieved 10x faster model training and cut costs by 2-4x.

Let’s dive deeper into each use case.


1. Reliable AI inference

According to Futurum, in 2023 Google had roughly 3x fewer outage hours than Azure, and roughly 3x fewer than AWS. Those numbers fluctuate over time, but maintaining high availability is a challenge for everyone. The AI Hypercomputer architecture offers fully integrated capabilities for high-reliability inference.

Many customers start with GKE Autopilot because of its 99.95% pod-level uptime SLA. Autopilot enhances reliability by automatically managing nodes (provisioning, scaling, upgrades, repairs) and applying security best practices, freeing you from manual infrastructure tasks. This automation, combined with resource optimization and integrated monitoring, minimizes downtime and helps your applications run smoothly and securely.
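The horizontal pod autoscaling behavior that Autopilot relies on follows the documented Kubernetes HPA rule: scale the replica count in proportion to how far the observed metric sits from its target. A minimal sketch of that formula:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Kubernetes HPA scaling rule: desired = ceil(current * observed/target)."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# e.g. 4 pods averaging 90% CPU against a 60% target -> scale up to 6
print(desired_replicas(4, 0.90, 0.60))  # 6
```

The same rule scales down when the observed metric falls below target, which is how demand-driven resource adjustment keeps you from paying for idle pods.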

There are several configurations available, but in this reference architecture we use TPUs with the JetStream engine to accelerate inference, plus JAX, Cloud Storage FUSE, and fast block storage such as Hyperdisk ML to speed up loading of model weights. Two notable additions to the stack get us to high reliability: service extensions and custom metrics.

Custom metrics, utilizing the Open Request Cost Aggregation (ORCA) protocol, allow applications to send workload-specific performance data (like model serving latency) to Cloud Load Balancer, which then uses this information to make intelligent routing and scaling decisions.
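Conceptually, an ORCA load report is a small structure of per-backend utilization signals plus application-defined metrics. The sketch below builds one as a plain dict; the field names mirror the OrcaLoadReport proto, but treat them as an assumption and check the ORCA spec before relying on them:

```python
def orca_load_report(cpu_utilization: float, mem_utilization: float,
                     named_metrics: dict) -> dict:
    """Assemble an ORCA-style load report. The load balancer reads these
    per-backend signals to make weighted routing and scaling decisions."""
    return {
        "cpu_utilization": cpu_utilization,   # fraction of CPU capacity in use
        "mem_utilization": mem_utilization,   # fraction of memory in use
        "named_metrics": named_metrics,       # app-specific signals, e.g. serving latency
    }

report = orca_load_report(0.72, 0.40, {"model_latency_ms": 38.5})
print(report["named_metrics"]["model_latency_ms"])  # 38.5
```

The `named_metrics` map is where workload-specific data like model serving latency travels, which is what lets the load balancer route on signals the infrastructure alone cannot see.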

Service extensions allow you to customize the behavior of Cloud Load Balancer by inserting your own code (written as plugins) into the data path, enabling advanced traffic management and manipulation.
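The data-path idea behind service extensions is a chain of plugins that each see and may rewrite the request before it is forwarded. Real service extension plugins are compiled to WebAssembly, so the Python below is purely a toy of the concept, with hypothetical field names:

```python
def add_served_by_header(request: dict) -> dict:
    """Toy data-path plugin: tag each request before it is forwarded."""
    headers = dict(request.get("headers", {}))
    headers["x-served-by"] = "plugin-demo"
    return {**request, "headers": headers}

def pipeline(request: dict, plugins) -> dict:
    # apply each plugin in order, mimicking the data-path insertion point
    for plugin in plugins:
        request = plugin(request)
    return request

out = pipeline({"path": "/v1/predict", "headers": {}}, [add_served_by_header])
print(out["headers"]["x-served-by"])  # plugin-demo
```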

Try it yourself. Start by defining your load balancing custom metrics, create a plugin using Service Extensions, or spin up a fully managed Kubernetes cluster with Autopilot. For more ideas, check out this blog on the latest networking enhancements for generative AI applications.


2. Large scale AI training

Training large AI models demands massive, efficiently scaled compute. Hypercompute Cluster is a supercomputing solution built on AI Hypercomputer that lets you deploy and manage a large number of accelerators as a single unit, using a single API call. Here are a few things that set Hypercompute Cluster apart:

  • Clusters are densely physically co-located for ultra-low-latency networking. They come with pre-configured and validated templates for reliable and repeatable deployments, and with cluster-level observability, health monitoring, and diagnostic tooling.
  • To simplify management, Hypercompute Clusters are designed to integrate with orchestrators like GKE and Slurm, and are deployed via the Cluster Toolkit. GKE supports over 50,000 TPU chips for training a single ML model.
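Topology-aware scheduling, at its core, prefers accelerators that share a network block so traffic stays local. The toy placement below is not the actual scheduler, just an illustration of that preference, with made-up node and block names:

```python
from collections import defaultdict

def pick_co_located(nodes: list[tuple[str, str]], needed: int) -> list[str]:
    """Greedy toy placement: group free nodes by topology block and
    satisfy the request from the fewest blocks, largest block first."""
    blocks = defaultdict(list)
    for name, block in nodes:
        blocks[block].append(name)
    chosen: list[str] = []
    for block in sorted(blocks, key=lambda b: -len(blocks[b])):
        for name in blocks[block]:
            if len(chosen) == needed:
                return chosen
            chosen.append(name)
    return chosen

nodes = [("n1", "blockA"), ("n2", "blockB"), ("n3", "blockA"), ("n4", "blockA")]
print(pick_co_located(nodes, 2))  # both picks come from blockA
```

Keeping a job inside one block is what makes the ultra-low-latency networking above pay off for synchronous, communication-heavy training steps.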

In this reference architecture, we use GKE Autopilot and A3 Ultra VMs.

A3 Ultra VMs use NVIDIA H200 GPUs with twice the GPU-to-GPU network bandwidth and twice the high-bandwidth memory (HBM) of A3 Mega VMs. They are built with our new Titanium ML network adapter and incorporate NVIDIA ConnectX-7 network interface cards (NICs) to deliver a secure, high-performance cloud experience that is well suited to large multi-node GPU workloads.

GKE supports up to 65,000 nodes — we believe this is more than 10x the scale offered by the other two largest public cloud providers.

Try it yourself: Create a Hypercompute Cluster with GKE or try this pretraining NVIDIA GPU recipe.

3. Affordable AI inference

Serving AI, especially large language models (LLMs), can become prohibitively expensive. AI Hypercomputer combines open software, flexible consumption models and a wide range of specialized hardware to minimize costs.

  • Cost savings are everywhere, if you know where to look. Beyond the tutorials, there are two cost-efficient deployment models you should know. GKE Autopilot reduces the cost of running containers by up to 40% compared to standard GKE by automatically scaling resources based on actual needs, while Spot VMs can save up to 90% on batch or fault-tolerant jobs. You can combine the two to save even more — “Spot Pods” are available in GKE Autopilot to do just that.
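As a rough illustration of how those discounts compound when you run Spot Pods on Autopilot: both the 40% and 90% figures above are "up to" numbers, so treat this as best-case arithmetic, not a quote:

```python
def effective_hourly_cost(on_demand: float, autopilot_saving: float = 0.40,
                          spot_saving: float = 0.90) -> float:
    """Best-case compounding of the two discounts: Autopilot right-sizing
    trims what you're billed for, and Spot pricing discounts what remains.
    Real savings vary by workload and region."""
    return on_demand * (1 - autopilot_saving) * (1 - spot_saving)

# $10/hr of on-demand capacity -> $0.60/hr in the best case
print(round(effective_hourly_cost(10.0), 2))  # 0.6
```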

In this reference architecture, after training with JAX, we convert the model into NVIDIA's FasterTransformer format for inference. Optimized models are served with NVIDIA Triton Inference Server on GKE Autopilot. Triton's multi-model support allows for easy adaptation to evolving model architectures, and a pre-built NeMo container simplifies setup.

Try it yourself: Start by learning how to serve a model with a single NVIDIA GPU in GKE. You can also serve Gemma open models with Hugging Face TGI, or LLMs like DeepSeek-R1 671B and Llama 3.1 405B.

4. Easy cluster setup and deployment

You need tools that simplify, not complicate, your infrastructure setup. The open-source Cluster Toolkit offers pre-built blueprints and modules for rapid, repeatable cluster deployments. You get easy integration with JAX, PyTorch, and Keras. Platform teams get simplified management with Slurm, GKE, and Google Batch, plus flexible consumption models like Dynamic Workload Scheduler and a wide range of hardware options. In this reference architecture, we set up an A3 Ultra cluster with Slurm:

Try it yourself. You can select one of our easy-to-use AI/ML blueprints, available through our GitHub repo, and use it to set up a cluster. We also offer a variety of resources to help you get started, including documentation, quickstarts, and videos.

By: Duncan Campbell (Developer Advocate, Google Cloud) and Jarrad Swain (Product Marketing, Google Cloud)
Originally published at: Google Cloud Blog

Source: zedreviews.com



Related Topics
  • AI
  • Google Cloud
  • Hypercomputer