
Why a decades-old architecture decision is impeding the power of AI computing

  • aster.cloud
  • February 19, 2025
  • 7 minute read

AI computing has a reputation for consuming epic quantities of energy. This is partly because of the sheer volume of data being handled. Training often requires billions or trillions of pieces of information to create a model with billions of parameters. But that’s not the whole reason — it also comes down to how most computer chips are built.

Modern computer processors are quite efficient at performing the discrete computations they’re usually tasked with. Though their efficiency nosedives when they must wait for data to move back and forth between memory and compute, they’re designed to quickly switch over to work on some unrelated task. But for AI computing, almost all the tasks are interrelated, so there often isn’t much other work that can be done when the processor gets stuck waiting, said IBM Research scientist Geoffrey Burr.



In that scenario, processors hit what is called the von Neumann bottleneck, the lag that happens when data moves slower than computation. It’s the result of von Neumann architecture, found in almost every processor over the last six decades, wherein a processor’s memory and computing units are separate, connected by a bus. This setup has advantages, including flexibility, adaptability to varying workloads, and the ability to easily scale systems and upgrade components. That makes this architecture great for conventional computing, and it won’t be going away any time soon.
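One way to see why the bus becomes the limiting factor is to compare how much arithmetic a workload performs per byte it moves across that bus. The sketch below is a back-of-envelope illustration with assumed sizes and precisions, not a measurement of any particular chip:

```python
# Rough arithmetic-intensity sketch (all numbers are illustrative
# assumptions, not measurements of any specific processor).
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte moved between memory and compute."""
    return flops / bytes_moved

# A matrix-vector multiply y = W @ x with an n x n weight matrix does
# ~2*n*n FLOPs (one multiply + one add per weight), while the weights
# (n*n values, 4 bytes each in fp32) must cross the bus every time
# they cannot stay resident near the compute units.
n = 4096
flops = 2 * n * n
bytes_moved = 4 * n * n
print(arithmetic_intensity(flops, bytes_moved))  # 0.5 FLOPs per byte
```

An intensity of only half a floating-point operation per byte means the processor spends most of its time waiting on the bus rather than computing, which is the bottleneck in a nutshell.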

But for AI computing, whose operations are simple, numerous, and highly predictable, a conventional processor ends up working below its full capacity while it waits for model weights to be shuttled back and forth from memory. Scientists and engineers at IBM Research are working on new processors, like the AIU family, which use various strategies to break down the von Neumann bottleneck and supercharge AI computing.


Why does the von Neumann bottleneck exist?

The von Neumann bottleneck is named for mathematician and physicist John von Neumann, who first circulated a draft of his idea for a stored-program computer in 1945. In that paper, he described a computer with a processing unit, a control unit, memory that stored data and instructions, external storage, and input/output mechanisms. His description didn't name any specific hardware, likely to avoid security clearance issues with the US Army, for whom he was consulting. Almost no scientific discovery is made by one individual, though, and von Neumann architecture is no exception. Von Neumann's work built on that of J. Presper Eckert and John Mauchly, who invented the Electronic Numerical Integrator and Computer (ENIAC), the world's first programmable, general-purpose electronic digital computer. In the decades since that paper was written, von Neumann architecture has become the norm.

“The von Neumann architecture is quite flexible, that’s the main benefit,” said IBM Research scientist Manuel Le Gallo-Bourdeau. “That’s why it was first adopted, and that’s why it’s still the prominent architecture today.”

Discrete memory and computing units mean you can design them separately and configure them more or less any way you want. Historically, this has made it easier to design computing systems because the best components can be selected and paired, based on the application.

Even the cache memory, which is integrated into a single chip with the processor, can still be individually upgraded. “I’m sure there are implications for the processor when you make a new cache memory design, but it’s not as difficult as if they were coupled together,” Le Gallo-Bourdeau said. “They’re still separate. It allows some freedom in designing the cache separately from the processor.”


How the von Neumann bottleneck reduces efficiency

For AI computing, the von Neumann bottleneck creates a twofold efficiency problem: the number of model parameters (or weights) to move, and how far they need to move. More model weights mean larger storage, which usually means more distant storage, said IBM Research scientist Hsinyu (Sidney) Tsai. “Because the quantity of model weights is very large, you can’t afford to hold them for very long, so you need to keep discarding and reloading,” she said.

The main energy expenditure during AI runtime is on data transfers: bringing model weights back and forth between memory and compute. By comparison, the energy spent doing computations is low. In deep learning models, for example, the operations are almost all relatively simple matrix-vector multiplications. Compute still accounts for around 10% of the energy in modern AI workloads, so it isn't negligible, said Tsai. "It is just found to be no longer dominating energy consumption and latency, unlike in conventional workloads," she added.
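Those matrix-vector multiplications are easy to see in code. The minimal sketch below shows a single fully connected layer's forward pass, with arbitrary example dimensions; the point is that the dominant work is one product between static weights and the incoming activations:

```python
import numpy as np

# A single fully connected layer: the dominant work is one
# matrix-vector product between static weights and the activations.
# Dimensions here are arbitrary illustrative choices.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 768)).astype(np.float32)  # static model weights
b = np.zeros(512, dtype=np.float32)                     # bias vector
x = rng.standard_normal(768).astype(np.float32)         # input activations

y = np.maximum(W @ x + b, 0.0)  # matvec + bias + ReLU
print(y.shape)  # (512,)
```

On von Neumann hardware, every such layer requires `W` to travel from memory to the compute units before this one cheap product can run.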

About a decade ago, the von Neumann bottleneck wasn’t a significant issue because processors and memory weren’t so efficient, at least compared to the energy that was spent to transfer data, said Le Gallo-Bourdeau. But data transfer efficiency hasn’t improved as much as processing and memory have over the years, so now processors can complete their computations much more quickly, leaving them sitting idle while data moves across the von Neumann bottleneck.

The farther the memory is from the processor, the more energy it costs to move data. On a basic physical level, an electrical copper wire is charged to propagate a 1 and discharged to propagate a 0. The energy spent charging and discharging a wire is proportional to its length: the longer the wire, the more energy you spend. Longer wires also mean greater latency, since it takes more time for the charge to propagate or dissipate.
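The proportionality can be made concrete with the standard capacitor-charging relation E = ½CV², where the wire's capacitance C grows with its length. The constants below are assumed placeholder values chosen only to show the scaling, not datasheet figures for any real process:

```python
# Illustrative back-of-envelope constants (assumptions, not datasheet values):
CAP_PER_MM = 0.2e-12   # assumed wire capacitance, farads per millimetre
V_SWING = 1.0          # assumed logic voltage swing, volts

def bit_energy(length_mm, cap_per_mm=CAP_PER_MM, v=V_SWING):
    """Energy (joules) to charge a wire of the given length for one bit.
    E = 0.5 * C * V^2, with C proportional to wire length."""
    return 0.5 * cap_per_mm * length_mm * v ** 2

# Doubling the wire length doubles the per-bit energy:
print(bit_energy(10.0) / bit_energy(5.0))  # 2.0
```

Whatever the exact constants, the linear dependence on length is what makes distant memory expensive.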


Admittedly, the time and energy cost of each data transfer is low, but every time you want to propagate data through a large language model, you need to load up to billions of weights from the memory. This could mean using the DRAM from one or more other GPUs, because one GPU doesn’t have enough memory to store them all. After they’re downloaded to the processor, it performs its computations and sends the result to another memory location for further processing.
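The scale of that weight traffic is worth a rough estimate. The numbers below are assumptions for a hypothetical model (a 7-billion-parameter LLM in 16-bit precision), purely to show the order of magnitude:

```python
# Order-of-magnitude sketch; the parameter count and precision are
# assumptions for a hypothetical model, not any specific LLM.
params = 7e9          # hypothetical 7B-parameter model
bytes_per_weight = 2  # fp16: 2 bytes per weight

bytes_per_pass = params * bytes_per_weight  # weights read once per token
gb_per_token = bytes_per_pass / 1e9
print(f"{gb_per_token:.0f} GB of weight traffic per generated token")

# Generating 1,000 tokens re-reads the full weight set 1,000 times:
print(f"{gb_per_token * 1000 / 1e3:.0f} TB for 1,000 tokens")
```

Even if each individual transfer is cheap, re-reading tens of gigabytes of weights for every token is where the runtime energy goes.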

Short of eliminating the von Neumann bottleneck entirely, one solution is to close that distance. "The entire industry is working to try to improve data localization," Tsai said. IBM Research scientists recently announced such an approach: a polymer optical waveguide for co-packaged optics. This module brings the speed and bandwidth density of fiber optics to the edge of chips, supercharging their connectivity and hugely reducing model training time and energy costs.

With currently available hardware, though, the result of all these data transfers is that training an LLM can easily take months, consuming more energy than a typical US home does in that time. And AI doesn’t stop needing energy after model training. Inferencing has similar computational requirements, meaning that the von Neumann bottleneck slows it down in a similar fashion.

Figure: a. In a conventional computing system, when an operation f is performed on data D, D has to be moved into a processing unit, leading to significant costs in latency and energy. b. In in-memory computing, f(D) is performed within a computational memory unit by exploiting the physical attributes of the memory devices, obviating the need to move D to the processing unit. The computational tasks are performed within the confines of the memory array and its peripheral circuitry, albeit without deciphering the content of the individual memory elements. Both charge-based memory technologies (such as SRAM, DRAM, and flash memory) and resistance-based memory technologies (such as RRAM, PCM, and STT-MRAM) can serve as elements of such a computational memory unit. Source: Nature Nanotechnology

Getting around the bottleneck

For the most part, model weights are stationary, and AI computing is memory-centric rather than compute-heavy, said Le Gallo-Bourdeau. "You have a fixed set of synaptic weights, and you just need to propagate data through them."

This quality has enabled him and his colleagues to pursue analog in-memory computing, which integrates memory with processing, using the laws of physics to store weights. One such approach is phase-change memory (PCM), which stores model weights in the resistivity of a chalcogenide glass, changed by applying an electrical current.
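The idea behind such analog crossbars can be sketched numerically. In an idealized device, weights are stored as conductances and inputs applied as voltages, so Ohm's and Kirchhoff's laws compute the matrix-vector product in place; the simulation below is a heavily simplified model (no noise, drift, or quantization), with made-up values:

```python
import numpy as np

# Idealized analog crossbar: weights stored as conductances G (siemens),
# inputs applied as row voltages V. By Ohm's law each cell passes current
# G[i, j] * V[i], and Kirchhoff's current law sums each column, so the
# column currents I = G.T @ V ARE the matrix-vector product, computed
# where the weights live. (Real PCM adds noise, drift, and limited
# precision; all ignored in this sketch.)
rng = np.random.default_rng(1)
G = rng.uniform(0.0, 1.0, size=(4, 3))  # 4 input rows x 3 output columns
V = np.array([0.2, 0.5, 0.1, 0.9])      # input voltages on the rows

I = G.T @ V  # column currents: the matvec result, with no weight movement
print(I.shape)  # (3,)
```

Because the weights never leave the array, the dominant data transfer of the von Neumann design simply disappears for this operation.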

“This way we can reduce the energy that is spent in data transfers and mitigate the von Neumann bottleneck,” said Le Gallo-Bourdeau. In-memory computing isn’t the only way to work around the von Neumann bottleneck, though.

The AIU NorthPole is a processor that stores memory in digital SRAM, and while its memory isn't intertwined with compute in the same way as in analog chips, each of its numerous cores has access to local memory, making it an extreme example of near-memory computing. Experiments have already demonstrated the power and promise of this architecture. In recent inference tests run on a 3-billion-parameter LLM developed from IBM's Granite-8B-Code-Base model, NorthPole was 47 times faster than the next most energy-efficient GPU and 73 times more energy efficient than the next lowest-latency GPU.

It’s also important to note that models trained on von Neumann hardware can be run on non-von Neumann devices. In fact, for analog in-memory computing, it’s essential. PCM devices aren’t durable enough to have their weights changed over and over, so they’re used to deploy models that have been trained on conventional GPUs. Durability is a comparative advantage of SRAM memory in near-memory or in-memory computing, as it can be rewritten infinitely.

Why von Neumann computing isn’t going away

While von Neumann architecture creates a bottleneck for AI computing, it's perfectly suited to other applications. It causes issues in model training and inference, but it excels at processing computer graphics and other compute-heavy workloads. And when 32- or 64-bit floating-point precision is called for, the low precision of in-memory computing isn't up to the task.

“For general purpose computing, there’s really nothing more powerful than the von Neumann architecture,” said Burr. Under these circumstances, bytes are either operations or operands that are moving on a bus from a memory to a processor. “Just like an all-purpose deli where somebody might order some salami or pepperoni or this or that, but you’re able to switch between them because you have the right ingredients on hand, and you can easily make six sandwiches in a row.” Special-purpose computing, on the other hand, may involve 5,000 tuna sandwiches for one order — like AI computing as it shuttles static model weights.

Even when building their in-memory AIU chips, IBM researchers include some conventional hardware for the necessary high-precision operations.

Even as scientists and engineers work on new ways to eliminate the von Neumann bottleneck, experts agree that the future will likely include both hardware architectures, said Le Gallo-Bourdeau. “What makes sense is some mix of von Neumann and non-von Neumann processors to each handle the operations they are best at.”

Source: zedreviews.com

