
Colocated VMs Get In Each Other’s Way

  • aster.cloud
  • July 1, 2022
  • 5 minute read

TL;DR:

  • Cloud providers can place multiple VMs of the same customer on a shared physical host, a situation that is hard to detect by conventional means but readily apparent with Clockwork's high-precision measurements
  • VM colocation has performance implications that are much more severe than those of ordinary shared tenancy
  • When three or more VMs share the same physical host, the available network bandwidth per VM is impaired

When workloads move to the cloud, cloud operators use proprietary placement algorithms to map the requested virtual resources onto specific physical hardware in their data centers. While the customer has some coarse control over placement (for example, through placement groups), the provider makes the final placement decision to serve its own operational objectives.



Animation: VM colocation is opaque to the cloud customer.

This means that some virtual machines end up running on the same physical host, making them next-door neighbors in the cloud. This VM colocation is invisible to cloud customers, creating the impression that all VMs are equal. Clockwork's high-accuracy clock synchronization technology makes it possible to infer VM colocation, that is, to reveal which VMs are running on the same physical host.
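The post does not describe how Clockwork infers colocation. Purely as a hypothetical illustration, suppose that VMs on one physical host share a hardware oscillator, so their clock drift rates measured against a common reference are nearly identical. Under that assumption, grouping VMs by drift rate yields candidate colocation groups. The function name `group_by_drift`, the drift values (in parts per million) and the tolerance are all made up for this sketch; it is not Clockwork's method.

```python
def group_by_drift(drift_ppm: dict[str, float], tol_ppm: float = 0.01) -> list[list[str]]:
    """Group VM names whose drift rates lie within tol_ppm of their sorted neighbor.

    Hypothetical assumption: colocated VMs share one oscillator, so their
    drift rates relative to a reference clock nearly match.
    """
    # Sort VMs by measured drift so near-identical values become adjacent.
    ordered = sorted(drift_ppm.items(), key=lambda item: item[1])
    groups: list[list[str]] = []
    prev_drift = None
    for name, drift in ordered:
        # Start a new group when the gap to the previous VM exceeds the tolerance.
        if prev_drift is None or drift - prev_drift > tol_ppm:
            groups.append([])
        groups[-1].append(name)
        prev_drift = drift
    return groups


# Hypothetical drift measurements (ppm relative to a reference clock).
drifts = {"vm-a": 3.212, "vm-b": -1.804, "vm-c": 3.215, "vm-d": 7.450, "vm-e": -1.801}
colocated = [g for g in group_by_drift(drifts) if len(g) > 1]
print(colocated)
```

Splitting a sorted list on gaps is one-dimensional single-linkage clustering: with a loose tolerance, a chain of slightly different drifts could merge VMs on different hosts, so the tolerance would need to sit well below the typical spread between hosts.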

How does VM colocation affect the performance of the cloud system? While cloud providers isolate CPU and memory resources very well between VMs, the networking resources are more likely to be oversubscribed. Clockwork offers Latency Sensei, a service that measures your cloud system’s performance, including the impact of VM colocation.

In this two-part blog post series, we take a deep look at the impact of VM colocation on key network performance metrics. This post covers network bandwidth; the next one examines how VM colocation affects other network performance metrics.


The results below aggregate thousands of Latency Sensei audits on 50-node Kubernetes clusters in Amazon Web Services (EKS), Google Cloud Platform (GKE), and Microsoft Azure (AKS).

Network Bandwidth

It is well known that the cloud is fundamentally a shared-tenancy environment in which some fair sharing of resources is expected. But there is a key difference between next-door neighbors that belong to other cloud customers (shared tenancy) and those that belong to the same cloud customer (VM colocation).

When next-door neighbors belong to different cloud customers, the cloud typically works well. This is because the VMs owned by different customers often don’t need the exact same resource at the same time. Each VM’s load peaks at different points in time, and noisy neighbors average out.

In contrast, when your next-door neighbors are your own VMs, they are serving the same workload and competing for the same resources at the same time. Their peak loads are very likely to coincide. Can the cloud deliver maximal performance to all of them at once?
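The difference between averaging-out neighbors and coinciding peaks can be made concrete with a toy simulation. It assumes a hypothetical host NIC that can serve the peak demand of only three VMs at once, and six VMs that each peak in 20% of time steps. Independent VMs (different customers) rarely peak together, while VMs running the same workload (colocated VMs of one customer) peak together. All parameters are illustrative, not measured.

```python
import random


def overload_fraction(n_vms: int, peak_prob: float, capacity_peaks: int,
                      correlated: bool, steps: int = 10_000, seed: int = 0) -> float:
    """Fraction of time steps in which more VMs peak than the host NIC can serve.

    capacity_peaks: how many simultaneously peaking VMs the (hypothetical)
    physical NIC can carry at full speed.
    """
    rng = random.Random(seed)
    overloaded = 0
    for _ in range(steps):
        if correlated:
            # Same workload: all VMs hit their peak in the same time step.
            peaking = n_vms if rng.random() < peak_prob else 0
        else:
            # Different customers: each VM peaks independently.
            peaking = sum(rng.random() < peak_prob for _ in range(n_vms))
        if peaking > capacity_peaks:
            overloaded += 1
    return overloaded / steps


independent = overload_fraction(n_vms=6, peak_prob=0.2, capacity_peaks=3, correlated=False)
same_workload = overload_fraction(n_vms=6, peak_prob=0.2, capacity_peaks=3, correlated=True)
print(f"independent: {independent:.3f}, same workload: {same_workload:.3f}")
```

With independent peaks, overload requires four or more of six VMs to peak together, which happens in under 2% of steps; with a shared workload, every peak overloads the host, so overload tracks the 20% peak probability.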

Animation: shared tenancy vs. VM colocation.

During a Clockwork Latency Sensei audit, the bandwidth of each VM's network connection is measured by exchanging long flows of synthetic data among the VMs at maximum speed, saturating the virtual network links. This traffic pattern resembles real workloads, such as the broadcast phase of large-scale machine learning model training.
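Latency Sensei's measurement code is not shown in the post. A minimal sketch of the general idea, measuring one sender's egress throughput by pushing bulk synthetic data over a TCP connection for a fixed time, could look like this. The helper names, buffer size and duration are assumptions, and the demo runs over loopback; a real audit would run flows between separate VMs, many pairs at once.

```python
import socket
import threading
import time


def start_sink(host: str = "127.0.0.1") -> int:
    """Start a background TCP server that reads and discards data; return its port.

    Hypothetical helper for illustration, not part of Latency Sensei.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # Port 0 lets the OS pick a free port.
    server.listen(1)

    def drain() -> None:
        conn, _ = server.accept()
        with conn, server:
            while conn.recv(1 << 20):  # Discard data until the sender closes.
                pass

    threading.Thread(target=drain, daemon=True).start()
    return server.getsockname()[1]


def measure_egress_gbps(host: str, port: int, seconds: float = 1.0) -> float:
    """Send bulk data for roughly `seconds` and return the achieved rate in Gbit/s."""
    payload = b"\x00" * (1 << 20)  # 1 MiB chunks of synthetic data.
    sent = 0
    with socket.create_connection((host, port)) as sock:
        start = time.perf_counter()
        while time.perf_counter() - start < seconds:
            sock.sendall(payload)
            sent += len(payload)
        elapsed = time.perf_counter() - start
    return sent * 8 / elapsed / 1e9


port = start_sink()
print(f"{measure_egress_gbps('127.0.0.1', port, seconds=0.5):.2f} Gbit/s over loopback")
```

`sendall` returns once data is handed to the kernel, so the count includes bytes still in socket buffers; over a long flow that error becomes negligible, which is one reason such measurements use long flows.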

At each VM, cloud providers apply a rate limiter in the virtual network interface card (vNIC) to enforce the promised virtual link speed, while attempting to share the physical network resources fairly between VMs. But if a cluster includes highly colocated VMs, throughput is instead limited by the capacity of the physical NIC that underlies the virtualization.
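Providers do not publish how their vNIC rate limiters work. A token bucket is one common textbook mechanism for enforcing a rate such as a promised virtual link speed, so the sketch below uses it purely as an illustration; the class, the 10 Gbit/s rate and the burst size are assumptions. Such a limiter only caps each VM; it cannot create capacity that the shared physical NIC lacks.

```python
class TokenBucket:
    """Token-bucket limiter: tokens (bits) refill at `rate_bps`, up to `burst_bits`.

    Illustrative only: providers' actual vNIC limiters are not public.
    """

    def __init__(self, rate_bps: float, burst_bits: float) -> None:
        self.rate_bps = rate_bps
        self.burst_bits = burst_bits
        self.tokens = burst_bits  # Start with a full bucket.
        self.last = 0.0           # Time of the last refill, in seconds.

    def allow(self, packet_bits: float, now: float) -> bool:
        """Return True if a packet may be sent at time `now` (seconds)."""
        # Refill for the time elapsed since the last call, capped at the burst size.
        self.tokens = min(self.burst_bits, self.tokens + (now - self.last) * self.rate_bps)
        self.last = now
        if self.tokens >= packet_bits:
            self.tokens -= packet_bits
            return True
        return False


# Offer 1500-byte packets every microsecond (12 Gbit/s) to a 10 Gbit/s limiter for 10 ms.
bucket = TokenBucket(rate_bps=10e9, burst_bits=1e6)
packet_bits = 1500 * 8
sent_bits = sum(packet_bits for t in range(10_000) if bucket.allow(packet_bits, t * 1e-6))
print(f"achieved about {sent_bits / 0.01 / 1e9:.2f} Gbit/s")
```

The achieved rate comes out slightly above 10 Gbit/s because the bucket starts full and permits a short initial burst; after that, the excess offered load is dropped.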


For example, consider this case of a 50-node cluster on Google Cloud Platform in the us-east4-c zone with n1-standard-4 VMs (see full audit report).

The bubble diagram shows which VMs are hosted on the same physical machine.


The bar chart shows the egress bandwidth of each of the VMs.


For most non-colocated VMs, the nominal bandwidth of 10 Gbps is actually achieved (purple bars). For colocated VMs, bandwidth falls well below the promised amount (red bars). The 7 VMs that share one physical host achieve only 2.5 Gbps per host.

We have run bandwidth measurements for thousands of Kubernetes clusters in Google Cloud Platform (GKE), Amazon Web Services (EKS), and Microsoft Azure (AKS). Let’s see how severe this effect is on average over many examples of cloud clusters.
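Averaging over many clusters requires knowing, for each VM, how many VMs share its host. Assuming each audit yields a VM-to-host mapping and a per-VM egress measurement (the record format, function name and numbers below are hypothetical), the aggregation by colocation degree could be sketched like this:

```python
from collections import Counter, defaultdict
from statistics import mean


def mean_bandwidth_by_degree(host_of: dict[str, str],
                             gbps_of: dict[str, float]) -> dict[int, float]:
    """Average per-VM bandwidth, keyed by how many VMs share that VM's host.

    Hypothetical input format: VM name -> host ID, and VM name -> measured Gbit/s.
    """
    vms_per_host = Counter(host_of.values())
    samples: dict[int, list[float]] = defaultdict(list)
    for vm, host in host_of.items():
        # Bucket each measurement by the colocation degree of its host.
        samples[vms_per_host[host]].append(gbps_of[vm])
    return {degree: mean(values) for degree, values in sorted(samples.items())}


# Hypothetical audit: vm1 alone, vm2-vm3 share host h2, vm4-vm6 share host h3.
host_of = {"vm1": "h1", "vm2": "h2", "vm3": "h2", "vm4": "h3", "vm5": "h3", "vm6": "h3"}
gbps_of = {"vm1": 10.0, "vm2": 9.8, "vm3": 10.0, "vm4": 6.5, "vm5": 6.0, "vm6": 7.0}
print(mean_bandwidth_by_degree(host_of, gbps_of))
```

Pooling the per-degree samples from many audits before averaging, rather than averaging per-audit averages, keeps clusters with more colocated VMs from being underweighted.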

Google Cloud Platform (GKE)

For VM types with a nominal bandwidth of 10 Gbps on Google Cloud, bandwidth limitations start to appear with as few as three colocated VMs, and grow progressively more severe as 4, 5, or even 6 VMs are hosted on the same machine.

For n1-standard-4 VMs on Google Cloud Platform, bandwidth limitations appear when three or more VMs are hosted on the same physical machine.
For n2-standard-4 VMs on Google Cloud Platform, bandwidth limitations kick in at slightly higher levels of VM colocation.
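One simple way to read these curves is a fair-share model: each VM is capped at its nominal rate, and the colocated VMs split the physical NIC evenly, so each gets min(nominal, capacity / k) for k VMs on a host. The 20 Gbit/s physical capacity below is a hypothetical figure, chosen only so that the cap first bites at three colocated VMs as in the Google Cloud measurements; it is not a published number, and real sharing is not perfectly even.

```python
def per_vm_gbps(nominal_gbps: float, physical_nic_gbps: float, colocated: int) -> float:
    """Per-VM bandwidth under an idealized even split of one physical NIC (a model, not measured)."""
    return min(nominal_gbps, physical_nic_gbps / colocated)


# Hypothetical 20 Gbit/s physical NIC shared by VMs with a 10 Gbit/s nominal rate.
for k in range(1, 7):
    print(k, round(per_vm_gbps(10.0, 20.0, k), 2))
```

Under this model, bandwidth stays flat at the nominal rate and then decays as 1/k once colocation passes a threshold; a later onset, such as the five-VM threshold reported for Azure below, would correspond to a larger effective per-host capacity.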

Amazon Web Services (EKS)

By policy, AWS EC2 places compute instances on separate hardware whenever possible to minimize the probability of simultaneous failures. As a result, colocation is rarer than in the other clouds. In fact, for m5-family compute instances, only 0.3% of the 600+ test clusters of 50 VMs each that we ran included a physical machine hosting four or more VMs. Typically, no more than three VMs share a physical host, and these VMs incur no bandwidth loss: they all achieve their nominal bandwidth of 10 Gbps.

AWS’s m5.xlarge instances are usually not packed aggressively, and do not suffer bandwidth loss due to colocation.

The m4.xlarge EC2 instance type limits bandwidth severely, to about 770 Mbps. AWS places these instances slightly more aggressively than m5 instances, but colocation does not further impair this relatively low baseline.
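The 0.3% figure for m5 instances is the share of test clusters whose most crowded host holds four or more VMs. Assuming one VM-to-host mapping per cluster (the data format, function names and placements below are hypothetical), that statistic can be sketched as:

```python
from collections import Counter


def max_vms_per_host(host_of: dict[str, str]) -> int:
    """Largest number of this cluster's VMs placed on any single physical host."""
    return max(Counter(host_of.values()).values(), default=0)


def share_of_dense_clusters(clusters: list[dict[str, str]], threshold: int = 4) -> float:
    """Fraction of clusters with at least one host holding `threshold` or more VMs."""
    dense = sum(max_vms_per_host(c) >= threshold for c in clusters)
    return dense / len(clusters)


# Hypothetical placements (VM name -> host ID) for three small clusters.
clusters = [
    {"a": "h1", "b": "h2", "c": "h3"},
    {"a": "h1", "b": "h1", "c": "h1", "d": "h1"},
    {"a": "h1", "b": "h1", "c": "h2"},
]
print(f"{share_of_dense_clusters(clusters):.1%}")
```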

Microsoft Azure (AKS)

In Azure clusters, per-VM bandwidth falls below the nominal value when five or more VMs are hosted on the same physical machine. On the plus side, Azure lets VMs go well beyond their nominal bandwidth during periods of low overall cloud load.

For the Standard_D4s_v4 instance type on Microsoft Azure, colocation has a sizable effect if five or more VMs are hosted by the same physical machine. The bandwidth degradation is less severe than in Google Cloud.
During times of low aggregate load, Microsoft Azure’s networking stack allows VMs to go faster than their nominal bandwidth. In this case, 16 out of 50 VMs of type Standard_D4s_v2 achieve significantly higher bandwidth than the nominal 10Gbps (see full audit report).

See for yourself

Colocation affects the network performance of cloud-based VMs. When VMs are packed densely, with three or more on the same physical host, the available network bandwidth can fall significantly below the nominal value you are paying for.

Clockwork Latency Sensei provides visibility into VM colocation and its impact on network bandwidth. Browse the Latency Sensei Audit report gallery to see example reports, and download Latency Sensei to run audit reports on your own cloud deployment.

How does VM colocation affect other network performance metrics? Read our follow-up blog post to find out more.

Interested in solving challenging engineering problems and building the platform that powers the next generation of time-sensitive applications? Join our world-class engineering team.

Guest post originally published on the Clockwork blog
Source CNCF

