
PyTorch/XLA 2.5: vLLM support and an improved developer experience

  • aster.cloud
  • October 31, 2024
  • 3 minute read

Machine learning engineers are bullish on PyTorch/XLA, a Python package that uses the XLA deep learning compiler to connect the PyTorch deep learning framework and Cloud TPUs. And now, PyTorch/XLA 2.5 is here, along with a set of improvements to add support for vLLM and enhance the overall developer experience. Featured in this release are:

  • A clarified proposal for deprecating the older torch_xla API in favor of the existing PyTorch API, simplifying the developer experience. One example is the migration of the existing distributed API.
  • A series of improvements to the torch_xla.compile function that make debugging easier during development.
  • Experimental support in vLLM for TPUs, letting you extend your existing deployments while keeping the same vLLM interface across your TPUs.

Let’s take a look at each of these enhancements.


Streamlining the torch_xla API

With PyTorch/XLA 2.5, we’re taking a significant step towards making the API more consistent with upstream PyTorch. Our north star is to minimize the learning curve for developers already familiar with PyTorch, making it easier to use XLA devices. This means gradually deprecating custom PyTorch/XLA calls whose functionality is mature enough upstream and migrating them over to their PyTorch counterparts; features without a stable upstream equivalent remain in the existing torch_xla module until they can be migrated.

In the spirit of a simpler developer experience for PyTorch/XLA, in this release we migrated to existing PyTorch distributed API functions when running models on top of PyTorch/XLA. Historically, the distributed API calls lived under the torch_xla module; in this update we migrated most of them to torch.distributed.

# With PyTorch/XLA 2.4
import torch_xla.core.xla_model as xm
xm.all_reduce(xm.REDUCE_SUM, [tensor])  # XLA-specific collective

# Supported after PyTorch/XLA 2.5
import torch.distributed as dist
dist.all_reduce(tensor)  # standard PyTorch distributed API
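Note that the torch.distributed path assumes a process group initialized with the XLA backend. Here is a minimal sketch of that setup (the tensor shape and print are illustrative):

import torch
import torch.distributed as dist
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_backend  # registers the "xla" process-group backend
import torch_xla.distributed.xla_multiprocessing as xmp

def _mp_fn(index):
    # One process per XLA device; init_method="xla://" discovers peers automatically
    dist.init_process_group("xla", init_method="xla://")
    tensor = torch.ones(2, 2, device=xm.xla_device())
    dist.all_reduce(tensor)  # sums the tensor across all devices
    print(index, tensor)

if __name__ == "__main__":
    xmp.spawn(_mp_fn)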

Improvements to ‘torch_xla.compile’


We’ve also added a few new compilation features to help you debug or notice potential issues within your model code. For example, a ‘full_graph’ mode emits an error message when there’s more than one compilation graph. This helps you discover potential issues caused by multiple compilation graphs early on (during compilation).
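A minimal sketch of what that looks like (the function body is illustrative):

import torch
import torch_xla

# Raise an error if tracing this function produces more than one
# compilation graph, surfacing unintended graph breaks early
@torch_xla.compile(full_graph=True)
def step_fn(tensor):
    return torch.cos(torch.sin(tensor))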

Additionally, you can now specify an expected number of recompilations for compiled functions. This can help you debug performance issues in which a function might be getting recompiled more times than expected, for example, when it has unexpected dynamism.
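A sketch of what that could look like, assuming the num_different_graphs_allowed parameter from the 2.5 torch_xla.compile signature (verify against your installed version):

import torch
import torch_xla

# Fail once this function has been compiled into more distinct graphs
# than expected, e.g. due to unexpected dynamism in its inputs
@torch_xla.compile(num_different_graphs_allowed=1)
def step_fn(tensor):
    return torch.cos(torch.sin(tensor))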

You can now also give compiled functions an understandable name instead of an automatically generated one. Naming compiled targets gives you more context in debug messages, making it easier to figure out where a problem may be. Here’s an example of what that looks like in practice:

# named code
@torch_xla.compile
def dummy_cos_sin_decored(self, tensor):
    return torch.cos(torch.sin(tensor))

# target dumped HLO renamed with named code function name
...
module_0021.SyncTensorsGraph.4.hlo_module_config.txt
module_0021.SyncTensorsGraph.4.target_arguments.txt
module_0021.SyncTensorsGraph.4.tpu_comp_env.txt
module_0024.dummy_cos_sin_decored.5.before_optimizations.txt
module_0024.dummy_cos_sin_decored.5.execution_options.txt
module_0024.dummy_cos_sin_decored.5.flagfile
module_0024.dummy_cos_sin_decored.5.hlo_module_config.txt
module_0024.dummy_cos_sin_decored.5.target_arguments.txt
module_0024.dummy_cos_sin_decored.5.tpu_comp_env.txt
...

Looking at the output above, you can see both naming schemes for dumps generated from the same file: ‘SyncTensorsGraph’ is the automatically generated name, while the files below it are named after the dummy_cos_sin_decored function from the code example above.

vLLM on TPU (experimental)

If you use vLLM to serve models on GPUs, you can now switch to TPUs as a backend. vLLM is a high-throughput, memory-efficient inference and serving engine for LLMs. vLLM on TPU retains the same vLLM interface that developers love, including direct integration with the Hugging Face Model Hub, to simplify model experimentation on TPUs.


Switching your vLLM endpoint to TPU is a matter of a few config changes. Aside from the TPU image, everything else remains the same: request payload, metrics used for autoscaling, load balancing, model source code, etc. For details, see the installation guide. 
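Once you have a TPU build of vLLM installed, the offline-inference interface is the standard one; a rough sketch (the model name here is illustrative):

from vllm import LLM, SamplingParams

# Same vLLM API as on GPU; a TPU build selects the TPU backend
# through the PyTorch/XLA integration
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", max_model_len=1024)
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What is a TPU?"], params)
print(outputs[0].outputs[0].text)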

Other vLLM features we’ve extended to TPU include Pallas kernels such as paged attention and flash attention, as well as performance optimizations in the dynamo bridge, all of which are now part of the PyTorch/XLA repository (code). While vLLM is available to PyTorch TPU users, this work is still ongoing, and we look forward to rolling out additional features and optimizations in future releases.

Start using PyTorch/XLA 2.5

You can start taking advantage of these latest features by downloading the latest release through your Python package manager. Or, if this is your first time hearing about PyTorch/XLA, check out the project’s GitHub page for installation instructions and more detailed information.
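As a sketch, the install command from the project README around the 2.5 release looked like this (check the GitHub page for current instructions):

# Install PyTorch/XLA 2.5 with TPU support
pip install torch~=2.5.0 'torch_xla[tpu]~=2.5.0' -f https://storage.googleapis.com/libtpu-releases/index.html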

For a full list of changes, check out the release notes!

By: Manfei Bai (Software Engineer) and Duncan Campbell (Developer Advocate)
Originally published at: Google Cloud Blog


Related Topics
  • Cloud TPU
  • Deep Learning
  • Python
  • PyTorch