aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Public Cloud

Clouds In The Cloud: Weather forecasting In Google Cloud

  • aster.cloud
  • March 31, 2022
  • 6 minute read

Weather forecasting and climate modeling are two of the world’s most computationally complex and demanding tasks. Further, they’re extremely time-sensitive and in high demand — everyone from weekend travelers to large-scale industrial farming operators wants up-to-date weather predictions. To provide timely and meaningful predictions, weather forecasters usually rely on high performance computing (HPC) clusters hosted in an on-premises data center. These on-prem HPC systems require significant capital investment and have high long-term operational costs. They consume a lot of electricity, have largely fixed configurations, and the underlying computer hardware is replaced infrequently.

Using the cloud instead offers increased flexibility, constantly refreshed hardware, high reliability, geo-distributed compute and networking, and a “pay for what you use” pricing model. Ultimately, cloud computing allows forecasters and climate modelers to provide timely and accurate results on a flexible platform using the latest hardware and software systems, in a cost effective manner. This is a big shift compared with traditional approaches to weather forecasting, and can appear challenging. To help, weather forecasters can now run the Weather Research and Forecasting (WRF) modeling system easily on Google Cloud using the new WRF VM image from Fluid Numerics, and achieve the performance of an on-premises supercomputer for a fraction of the price. With this solution, weather forecasters can get a WRF simulation up and running on Google Cloud in less than an hour!


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

A closer look at WRF

Weather Research and Forecasting (WRF) is a popular open-source numerical weather prediction modeling system used by both researchers and operational organizations. While WRF is primarily used for weather and climate simulation, teams have extended it to support interactions with chemistry, forest fire modeling, and other use cases. WRF development began in the late 1990s through a collaboration between the National Center for Atmospheric Research (NCAR), National Oceanic and Atmospheric Administration (NOAA), U.S. Air Force, Naval Research Laboratory, University of Oklahoma, and the Federal Aviation Administration. The WRF community comprises more than 48,000 users spanning over 160 countries, with the shared goal of supporting atmospheric research and operational forecasting.

The Google Cloud WRF image is built using Google’s MPI best practices for HPC, with the exception that hyperthreading is not disabled by default, and is easily integrated with other HPC solutions on Google Cloud, including SchedMD’s Slurm-GCP. Normally, installing WRF and its dependencies is a time consuming process. With these new WRF VM images, deploying a scalable HPC cluster with WRF v4.2 pre-installed is quick and easy with our Codelab. OpenMPI 4.0.2 was used throughout this work. Google has had good success with Intel MPI, and we intend to study whether further performance gains can be achieved in this context.

Read More  Better Kubernetes Application Monitoring With GKE Workload Metrics

Optimizing WRF

Determining the optimal architecture and build settings for performance and cost was a key part of the process in developing the WRF images. We evaluated how to select the ideal compiler, right CPU platform, and the best file system for handling file IO, so you don’t have to. As a test case for assessing performance, we used the CONUS 2.5km benchmark.

Below, the CONUS 2.5km runtime and cost figure shows the run time required for simulating WRF over a two-hour forecast using 480 MPI ranks (a way of numbering processes) for different machine types available on Google Cloud. For each machine type, we’re showing the lowest measured run time from a suite of tests that varied compiler, compiler optimizations, and task affinity.

1 Weather forecasting.jpg

We found that compute-optimized c2 instances provided the shortest run time. The Slurm job scheduler allows you to map the MPI tasks to compute hardware using task affinity flags. When optimizing the runtime and cost for each machine type, we compared using srun –map-by core –bind-to core to launch WRF, which maps each MPI process to a physical core (two vCPU per MPI rank), and srun –map-by thread –bind-to thread, which maps each MPI process to a single vCPU. Mapping by core and binding MPI ranks to cores is akin to disabling hyperthreading.

2 Weather forecasting.jpg

The ideal simulation cost and runtime for CONUS 2.5km for each platform is found when each MPI rank is subscribed to each vCPU. When binding to vCPUs, half as many compute resources are needed when compared to binding to physical cores lowering the per-second cost for the simulation. For CONUS 2.5km, we also found that although mapping MPI ranks to cores results in reduced runtime for the same number of MPI ranks, the performance gains are not significant enough to outweigh the cost savings. For this reason, the WRF-GCP solution does not disable hyperthreading by default.

Runtime and simulation cost can be further reduced by selecting an ideal compiler: the figure below (CONUS 2.5km Compiler Comparisons) shows the simulation runtime for the WRF CONUS 2.5km benchmark on eight c2-standard-60 instances, using GCC 10.30, GCC 11.2.0 and the Intel® OneAPI® compilers (v2021.2.0). In all cases, WRF is built using level 3 compiler optimizations and Cascade Lake target architecture flags. By compiling WRF with the Intel® OneAPI® compilers, the WRF simulation runs about 47% faster than the GCC builds, and at about 68% of the cost, on the same hardware. We’ve used OpenMPI 4.0.2 with each of the compilers as the MPI implementation in this work. With other applications, Google has seen good performance with Intel MPI 2018, and we intend to investigate performance comparisons with this and other MPI implementations.

Read More  Three Reasons To Peer SAP-Managed Apps With Your Google Cloud Services
3 Weather forecasting.jpg

File IO in WRF can become a significant bottleneck as the number of MPI ranks increases. Obtaining the optimal file IO performance requires using parallel file IO in WRF and leveraging a parallel file system such as Lustre.

Below, we show the speedup in file IO activities relative to serial IO on an NFS file system. For this example, we are running the CONUS 2.5km benchmark on c2-standard-60 instances with 960 MPI ranks. By changing WRF’s file IO strategy to parallel IO, we accelerate file IO time by a factor of 60.

We further speed up IO and reduce simulation costs by using a Lustre parallel file system deployed from open-source Lustre Terraform infrastructure-as-code from Fluid Numerics. Lustre is also available with support from DDN’s EXAScaler solution in the Google Cloud Marketplace. In this case, we use four n2-standard-16 instances for the Lustre Object Storage Server (OSS) instances, each with 3TB of Local SSD. The Lustre Metadata Server (MDS) is an n2-standard-16 instance with a 1TB PD-SSD disk. After mounting the Lustre file system to the cluster, we set the Lustre stripe count to 4 so that file IO can be distributed across the four OSS instances. By switching to the Lustre file system for IO, we speed up file IO by an additional factor of 193, which is orders of magnitude faster than a single NFS server with serial IO.

4 Weather forecasting.jpg

Adding compute resources and increasing the number of MPI ranks reduces the simulation run time. Ideally, with perfect linear scaling, doubling the number of MPI ranks would cut the simulation time in half. However, adding MPI ranks also increases communication overhead, which can increase the cost per simulation. The communication overhead is due to the increased amount of communication necessitated by splitting the problem more finely across more machines.

Read More  How A Top Gaming Company Transformed Its Approach To Security With Splunk And Google Cloud

To assess the scalability of WRF for the CONUS 2.5km benchmark, we can execute a series of model forecasts where we successively double the number of MPI ranks. Below, we show two- hour forecasts on the c2-standard-60 instances with the Lustre file system, varying the number of MPI ranks from 480 to 1920. In all of these runs, MPI ranks are bound to vCPUs so that the number of vCPUs dedicated to each simulation increases with the increase in MPI ranks. While many HPC workloads run best with simultaneous multithreading (SMT) disabled, we find the best performance for CONUS 2.5km with SMT enabled. Thus, the number of MPI ranks in our runs equals the total number of vCPUs.

5 Weather forecasting.jpg

As you can see, the CONUS 2.5km Runtime & Cost Scaling figure shows that the run time (blue bars) decreases as the number of MPI ranks and the amount of compute resources increases, at least up to 1920 ranks. When transitioning from 480 to 960 MPI ranks, the run time drops, yielding a speedup of about 1.8x. Doubling again to 1920 MPI ranks, though, we obtained an additional speedup of just 1.5x. This declining trend in the speedup with increasing MPI ranks is a signature of MPI overhead, which increases with more MPI ranks.

Determining your best fit

Most tightly-coupled MPI applications such as WRF exhibit this kind of scaling behavior, where scaling efficiency decreases with increasing MPI ranks. This makes assessing cost-scaling alongside performance-scaling critical when considering Total Cost of Ownership (TCO). Thankfully, per-second billing on Google Cloud makes this kind of analysis a little bit easier. As shown above, a second doubling of the count from 960 cores to 1920 cores can provide an additional 1.5x speedup, but at a 32% higher cost. In some circumstances, this faster turnaround may be needed and worth the extra cost.

If you want to get started with WRF quickly and experiment with the CONUS 2.5km benchmark, we’ve encapsulated this deployment in Terraform scripts and prepared an accompanying codelab.

You can learn more about Google Cloud’s high performance computing offerings at https://cloud.google.com/hpc, and you can find out more about Google’s partner Fluid Numerics at https://www.fluidnumerics.com.

 

By Wyatt Gorman, HPC Solutions Manager, Google Cloud | Joe Schoonover, Founder & Director of Operations, Fluid Numerics
Source Google Cloud


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • Google Cloud
  • Weather Research and Forecasting
  • WRF
You May Also Like
Data center
View Post
  • Data
  • Public Cloud

Data Sovereignty in Spain. It’s Not Just About the Law, It’s About Efficiency

  • June 3, 2026
View Post
  • Data
  • Platforms
  • Public Cloud

PayPal’s historically large data migration is the foundation for its gen AI innovation

  • March 4, 2026
Google Cloud and ElevenLabs
View Post
  • Public Cloud
  • Technology

ElevenLabs Partners with Google Cloud for Cloud Services and the Latest NVIDIA Blackwell GPUs

  • February 26, 2026
View Post
  • Public Cloud

Delivering a secure, open, and sovereign digital world

  • February 12, 2026
View Post
  • Public Cloud

Formula E and Google Cloud Announce Multi-Year ‘Principal Partnership’

  • January 26, 2026
View Post
  • Public Cloud

Sawasdee Thailand! Google Cloud launches new region in Bangkok

  • January 23, 2026
View Post
  • Public Cloud

Retailers Help Mitigate Risk with Oracle’s AI-Driven Supply Chain Collaboration

  • January 11, 2026
View Post
  • Public Cloud

Cognizant acquires 3Cloud to boost global Azure expertise

  • January 8, 2026

Stay Connected!
LATEST
  • Data center 1
    Data Sovereignty in Spain. It’s Not Just About the Law, It’s About Efficiency
    • June 3, 2026
  • 2
    Ink vs Pixels. What you miss versus what you are actually missing.
    • June 1, 2026
  • 3
    Banks race to patch new cyber vulnerabilities, and other cybersecurity news
    • May 25, 2026
  • pope-leo-xiv-cq5dam-1500.844 4
    Pope Leo XIV to Publish First Encyclical on Artificial Intelligence and Human Dignity on 25 May
    • May 22, 2026
  • 5
    Portfolio to Clients, and is Strengthened by Ongoing Project Glasswing Work
    • May 20, 2026
  • reMarkable Paper Pure 6
    Everything The reMarkable Paper Pure Actually Does
    • May 14, 2026
  • 7
    Scaling cloud and AI: Microsoft Azure’s commitment to Europe’s digital future
    • May 11, 2026
  • reMarkable Paper Pure 8
    The Quiet Revolution You Did Not Know You Needed
    • May 9, 2026
  • spain-qNO3XMQILTA-unsplash 9
    When the World Feels Unstable, Spain Remains the Calm. Here’s How to Get There Safely.
    • May 2, 2026
  • 10
    Why The CLOUD Act And Geopolitics Are Forcing A Data Sovereignty Reckoning In Europe
    • May 2, 2026
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • Anthropic Institute 1
    Introducing The Anthropic Institute
    • March 11, 2026
  • Red Hat OpenShift 2
    Red Hat Further Drives Digital Sovereignty for the AI Era with Red Hat OpenShift on Google Cloud Dedicated
    • April 21, 2026
  • Illustration of data storage 3
    The Splinternet Comes for European Supply Chains Why Fragmentation Is Now a Boardroom Problem
    • April 20, 2026
  • 4
    “A lot of other cloud vendors have been let off the hook”: Oracle leans hard on one-size-fits-all appeal of OCI for enterprises
    • March 30, 2026
  • 5
    Why channel partners must design for tech sovereignty
    • April 7, 2026
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.