aster.cloud
  • Engineering
  • Research

Architecting The Future Of Supercomputing

  • Dean Marc
  • August 23, 2023

As chief architect and principal investigator for the Aurora supercomputer at Argonne National Laboratory in Illinois, Olivier Franza plays a leading role in bringing one of the most ambitious scientific instruments – not to mention the world’s largest GPU cluster – into existence.

Aurora is among the most anticipated and highly visible projects Intel has been a part of in recent memory – a bold bet on Intel’s entire system portfolio. The machine is expected to be the first supercomputer with a peak performance reaching 2 exaflops, or 2×10¹⁸ floating point operations per second.


That puts a bit of pressure on Franza, a 22-year Intel veteran who joined the Aurora project as system hardware architect in 2016, oversaw the pivot to a GPU-based machine and became chief architect in 2021.

“The chief architect is responsible for defining the overall system architecture of the supercomputer, according to the customer’s high-level requirements,” Franza explains. “There are fundamental ones like general performance metrics and power envelope, but also inherent features like RAS – reliability, availability, serviceability – that are essential to building a scalable system.”

His responsibilities also encompass the details of the system topology from a node to a rack to the complete system, including its networking fabric and storage components.

A Roadmap Pivot Opens Opportunity to Shape Future Products

When initial planning began for Aurora, a U.S. Department of Energy-sponsored system, the design consisted of a collection of Intel technologies. However, changes to Intel’s product roadmap, notably the end of the Xeon Phi and Omni-Path product families, required a restart. As Intel made plans to build data center GPUs, Franza became enmeshed in discussions on the design of the Intel® Data Center GPU Max Series (code-named Ponte Vecchio).

In this way, Aurora isn’t just a one-off system. Rather, it helped inform the Intel-wide strategy and product portfolio to address scale and performance at the highest level.

“We infused all the Aurora system-level requirements down to the components’ level,” Franza says.

The architecture and concept for the Intel® Xeon® CPU Max Series with high bandwidth memory, for instance, grew out of features from the Intel Xeon Phi platform, the first product to integrate an innovative on-package memory architecture for high bandwidth and high capacity.

Additionally, the need for high performance drove further advances across all subsystems, from the compute blade’s thermo-mechanical solution to its dense physical integration, to storage.

“Intel ended up architecting a completely new storage concept, DAOS (distributed asynchronous object storage),” Franza says. It’s an open source software ecosystem to enable high-speed storage on traditional hardware. “Aurora will be among the first systems to use it, and by far the largest.”

From Designing Components to Bolting Together Thousands of Systems

The Aurora project drove system-level thinking and broad collaboration across various business units inside Intel, as well as with Argonne scientists and with engineers at Hewlett Packard Enterprise, the project’s other main partner.

“Getting the whole team to align and deliver a machine like Aurora is, for many of us, a once-in-a-lifetime experience,” Franza says.

Although engineers installed the final blade in June, the project continues to keep Franza up at night as the system passes through the stages of testing, stabilization and validation at scale.

He guides a large team working on system bring-up, validation, stabilization, optimization and enablement of full-system performance workloads. Most notable among these is High Performance Linpack (HPL), the benchmark used to rank the world’s fastest systems on the twice-yearly Top500 list.

Each morning, Franza joins the daily standup meeting to scrutinize nightly runs on every single node and makes a game plan for the next day’s work and beyond. Each afternoon, a daily closeout meeting summarizes progress and hurdles. The work never stops; the machine always runs.

“We have a step-by-step approach to methodically validate and stabilize at scale,” he explains. “You start with the blade, then move to the rack, then multiple racks, and you scale from there.”
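The blade-to-rack-to-system progression Franza describes can be sketched roughly in code. This is only an illustration, not the Aurora team's actual tooling: the function names `stages` and `validate`, the doubling schedule between one rack and the full system, and the `check` callback are all hypothetical. The 64 blades per rack is derived from the article's figures of 10,624 blades across 166 racks.

```python
from typing import Callable, Iterator, Optional

# Derived from the article's figures: 10,624 blades / 166 racks = 64.
BLADES_PER_RACK = 64
TOTAL_RACKS = 166


def stages(total_racks: int = TOTAL_RACKS) -> Iterator[tuple[str, int]]:
    """Yield (label, blade_count) pairs from one blade up to the full system."""
    yield "single blade", 1
    racks = 1
    while racks < total_racks:
        yield f"{racks} rack(s)", racks * BLADES_PER_RACK
        racks *= 2  # assumed doubling schedule; the article does not give one
    yield f"full system ({total_racks} racks)", total_racks * BLADES_PER_RACK


def validate(check: Callable[[int], bool]) -> Optional[str]:
    """Run `check` at each scale in order.

    Returns the label of the first stage that fails, or None if every
    stage passes. `check` is a hypothetical stand-in for whatever
    validation suite is run at a given blade count.
    """
    for label, blades in stages():
        if not check(blades):
            return label
    return None


if __name__ == "__main__":
    for label, blades in stages():
        print(f"{label}: {blades} blades")
```

Stopping at the first failing stage reflects the idea that a problem is cheaper to diagnose at the smallest scale where it appears.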

Aurora is made up of 10,624 compute blades, boasting 63,744 Intel Max Series GPUs – more GPUs than any other system in the world – and 21,248 Intel Xeon Max CPUs across 166 racks.
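The quoted totals also imply a per-blade and per-rack layout. The short check below divides the article's figures, assuming every blade is configured identically and every rack is filled equally (the article does not state this outright). The helper name `per_unit` is made up for the example.

```python
# Totals quoted in the article.
BLADES = 10_624
GPUS = 63_744
CPUS = 21_248
RACKS = 166


def per_unit(total: int, units: int) -> tuple[int, int]:
    """Return (items per unit, remainder) for an even split of total over units."""
    return divmod(total, units)


gpus_per_blade, gpu_rem = per_unit(GPUS, BLADES)
cpus_per_blade, cpu_rem = per_unit(CPUS, BLADES)
blades_per_rack, rack_rem = per_unit(BLADES, RACKS)

if __name__ == "__main__":
    # A zero remainder means the totals are consistent with a uniform layout.
    print(f"GPUs per blade:  {gpus_per_blade} (remainder {gpu_rem})")
    print(f"CPUs per blade:  {cpus_per_blade} (remainder {cpu_rem})")
    print(f"Blades per rack: {blades_per_rack} (remainder {rack_rem})")
```

Under that uniform-layout assumption, the figures work out to six GPUs and two CPUs per blade, and 64 blades per rack, with nothing left over.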

“It’s the size of four tennis courts, which sounds like a lot, right?” he says. “But it’s only when you actually go see it that you just realize the sheer magnitude of the project.”

Franza must ensure the vast system is stable, functional and performing. It’s a daunting task, but the end is within reach.

“Walking through the aisles, with all the lights on, and feeling that the machine is running is impressive and obviously extremely rewarding,” he says. “It’s a very tangible achievement that speaks for itself.”

A ‘Once-in-a-Lifetime’ Effort, a Science-Shaping Supercomputer

What keeps him going, through engineering hurdles and unexpected roadblocks, is the opportunity to build “an extraordinary machine” that will power impactful research. He cites Aurora’s enormous potential for cancer research as an area where the project will benefit us all.

“I think that’s something that is going to make us very proud,” he says.

Not only will Aurora work on solving some of the most complex scientific and engineering problems in the world, it will also be an ideal platform for running generative AI and applying it to research. “It will enable one of the biggest large language models planned to date, the 1 trillion parameter Aurora GenAI project, enhancing, enabling and easing the lives of scientists,” Franza says.

But it’s the teamwork and camaraderie he enjoys more than anything else.

“It’s an extended effort, and it requires a lot of perseverance,” he says. “The core team has maintained a marathon mentality where it’s not over until it’s over. We needed the kind of people that can effectively focus for a long time on something immensely challenging. And in the end, the accomplishment is something that very few can say they have achieved.”

Source: cyberpogo.com


Dean Marc

Part of the more nomadic tribe of humanity, Dean believes a boat anchored ashore, while safe, is a tragedy, as this denies the boat its purpose. Dean normally works as a strategist, advisor, operator, mentor, coder, and janitor for several technology companies, open-source communities, and startups. Otherwise, he's on a hunt for some good bean or leaf to enjoy a good read on some newly (re)discovered city or walking roads less taken with his little one.

Related Topics
  • AI
  • Argonne National Laboratory
  • Artificial Intelligence
  • Aurora supercomputer
  • Data Center
  • Generative AI
  • GPU
  • High Performance Computing
  • Intel
  • Olivier Franza
  • Supercomputer
  • Supercomputing
  • U.S. Department of Energy