  • Cloud-Native

GPU Partitioning: Fair Share Scheduling

  • aster.cloud
  • July 20, 2022
  • 3 minute read

GPU computation is asynchronous to the POD itself. Typically, the process running in the POD copies data to GPU memory and issues a CUDA instruction telling the GPU to execute the calculation (known as a GPU kernel). When the GPU kernel finishes the computation, it issues a sync request to wake up the POD, which then copies the results back to main memory.
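The copy, launch, sync, copy-back sequence can be illustrated with a minimal CPU-only simulation. This is a sketch, not real CUDA code: a single worker thread stands in for the GPU, and `fake_gpu_kernel` and `run_on_gpu` are hypothetical names introduced for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_gpu_kernel(device_buffer):
    """Stand-in for a GPU kernel: squares every element after a short delay."""
    time.sleep(0.05)  # pretend the GPU is busy
    return [x * x for x in device_buffer]

def run_on_gpu(host_data, gpu):
    device_buffer = list(host_data)                       # 1. copy host data to "GPU memory"
    pending = gpu.submit(fake_gpu_kernel, device_buffer)  # 2. launch; returns immediately
    # The host process is free to do other work here while the kernel runs.
    result = pending.result()                             # 3. sync: block until the kernel finishes
    return list(result)                                   # 4. copy results back to host memory

# One worker thread plays the role of one physical GPU (an assumption of this sketch).
with ThreadPoolExecutor(max_workers=1) as gpu:
    output = run_on_gpu([1, 2, 3], gpu)
```

The point of the sketch is step 2: the launch returns before the work is done, so the host cannot tell how long the kernel occupies the device until the sync in step 3.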

GPU kernels are non-preemptable and cannot be interrupted. Therefore, even after GPU partitioning, the actual GPU usage of each POD is still unpredictable, which can lead to underutilization or performance delays. For this reason, we need the scheduler front-end, a device manager, and a scheduler backend to collaborate to achieve fair share scheduling.



Figure 5: Fair share scheduling for Gemini GPU Partitioning

Figure 5 shows how the Gemini scheduler achieves fair share scheduling for ML workloads. An event-driven monitoring subsystem (#1) collects GPU utilization for the Device Manager. The Gemini scheduler then calculates, in real time, which POD should be scheduled next. The scheduler needs to determine 2 pieces of information:

  1. The POD that’s currently furthest away from its target GPU % utilization.
  2. The amount of time this POD should be given to run on the target GPU.

This information is encoded in a token and dispatched to the target worker node (#2 & #3). As this process repeats, each POD should move closer to its target GPU quota. If a POD exceeds its quota, its token is revoked (#4) and the POD will not be eligible to be scheduled.
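The selection step can be sketched as follows. This is a minimal illustration, not Gemini's actual code: `PodUsage`, `Token`, `next_token`, and the rule that sizes the time slice from the target share are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class PodUsage:
    name: str
    target_share: float   # requested fraction of the GPU, 0.0 to 1.0
    used_share: float     # measured fraction over the monitoring window

@dataclass
class Token:
    pod: str
    gpu_id: str
    quota_ms: float       # how long the POD may run before the token expires

def next_token(pods, gpu_id, base_quota_ms=100.0):
    """Pick the POD furthest below its target and size its time slice."""
    eligible = [p for p in pods if p.used_share < p.target_share]
    if not eligible:
        return None  # every POD has met or exceeded its quota
    chosen = max(eligible, key=lambda p: p.target_share - p.used_share)
    # Hypothetical sizing rule: scale the slice by the POD's target share.
    return Token(chosen.name, gpu_id, base_quota_ms * chosen.target_share)

pods = [
    PodUsage("train-a", target_share=0.5, used_share=0.45),
    PodUsage("train-b", target_share=0.25, used_share=0.10),
    PodUsage("infer-c", target_share=0.25, used_share=0.30),  # over quota, not eligible
]
token = next_token(pods, gpu_id="gpu-0")
```

Here `train-b` has the largest gap between target and measured usage, so it receives the next token, while `infer-c` is skipped because it has already exceeded its quota.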


We will briefly explain each of these 3 subsystems:

Event-driven Monitoring

  • As mentioned above, GPU kernels are not preemptable. To capture and measure runtime kernel execution behavior without introducing extra synchronization points between the CPU and GPU, our event-driven monitoring subsystem piggybacks on the SYNC event issued by the GPU kernel to record the amount of GPU time used by a sharePOD, and stores the utilization statistics with the backend device manager.
  • The goal of the monitor is to identify kernel bursts from applications and to correctly record the actual start and end time of their execution.
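A minimal sketch of this kind of burst accounting is shown below. The class name `BurstMonitor` and its two callbacks are hypothetical, and timestamps are passed in explicitly so the example is deterministic; a real hook library would take them from intercepted CUDA calls.

```python
from collections import defaultdict

class BurstMonitor:
    """Accumulates GPU time per sharePOD from launch and sync events."""

    def __init__(self):
        self._burst_start = {}               # pod -> start time of the open burst
        self.gpu_time = defaultdict(float)   # pod -> total GPU time used
        self.bursts = defaultdict(list)      # pod -> [(start, end), ...]

    def on_kernel_launch(self, pod, now):
        # Only the first launch after a sync opens a new burst.
        self._burst_start.setdefault(pod, now)

    def on_sync(self, pod, now):
        # The existing sync event closes the burst; no extra sync point is added.
        start = self._burst_start.pop(pod, None)
        if start is None:
            return  # sync without a preceding launch: nothing to record
        self.bursts[pod].append((start, now))
        self.gpu_time[pod] += now - start

monitor = BurstMonitor()
monitor.on_kernel_launch("pod-a", now=0.0)
monitor.on_kernel_launch("pod-a", now=1.0)   # second kernel in the same burst
monitor.on_sync("pod-a", now=3.0)
monitor.on_kernel_launch("pod-a", now=5.0)
monitor.on_sync("pod-a", now=6.0)
```

Consecutive launches between two syncs are treated as one burst, so the monitor records two bursts for `pod-a` rather than three kernels.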

Token-based time-sharing scheduler

  • Once a sharePOD’s GPU kernel is completed, the physical GPU becomes available. Our backend must then schedule the next sharePOD to run on the worker node of the corresponding GPU.
  • We implemented a dynamic quota strategy based on the estimated kernel burst time to adapt to dynamic workload patterns. Our approach is to let the API hook provide the scheduler with statistics on its client's kernel bursts. The scheduler then uses a smoothing function to gradually adjust the client's token quota according to its estimated burst time.
  • The target physical GPU and the dynamic quota are embedded in a token and dispatched to the corresponding worker node. Thus, the token serves as a mechanism for front-end load distribution and for fair sharing of GPU resources among the sharePODs.
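The article does not say which smoothing function is used. As one possible illustration, the sketch below uses an exponential moving average, which is an assumption, as are the function name `smooth_quota`, the weight `alpha`, and the clamping bounds.

```python
def smooth_quota(current_quota_ms, observed_burst_ms, alpha=0.25,
                 min_quota_ms=10.0, max_quota_ms=500.0):
    """Move the quota a fraction `alpha` of the way toward the observed burst."""
    updated = (1 - alpha) * current_quota_ms + alpha * observed_burst_ms
    # Clamp so a single outlier cannot starve or monopolize the GPU.
    return max(min_quota_ms, min(max_quota_ms, updated))

quota = 100.0
for burst in [200.0, 200.0, 200.0]:   # the client keeps reporting 200 ms bursts
    quota = smooth_quota(quota, burst)
```

With these values the quota rises from 100 ms to 125 ms, then 143.75 ms, then 157.8125 ms, approaching the reported burst time gradually instead of jumping to it.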

Token Revocation

  • To minimize context switches and interruptions, we allow each sharePOD to execute multiple GPU kernels per token. However, because GPU kernels are non-preemptable, we want to prevent runaway sharePODs. The token revocation scheme is designed to keep non-preemptable kernels from exceeding their scheduling time quota. A token becomes invalid when its quota expires, and the sharePOD's hook library must then request a new token from the scheduler before executing further kernels.
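The hook-side check can be sketched as follows. This is a simplified simulation under stated assumptions: `Token`, `FakeScheduler`, and `run_kernels` are hypothetical names, time is simulated, and a new token is granted immediately, whereas a real scheduler may make the sharePOD wait its turn.

```python
from dataclasses import dataclass

@dataclass
class Token:
    pod: str
    expires_at: float   # simulated time at which the quota runs out

    def is_valid(self, now):
        return now < self.expires_at

class FakeScheduler:
    """Hypothetical scheduler that hands out fixed-length tokens."""

    def __init__(self, quota):
        self.quota = quota
        self.issued = 0

    def request_token(self, pod, now):
        self.issued += 1
        return Token(pod, expires_at=now + self.quota)

def run_kernels(pod, kernel_durations, scheduler):
    """Hook-side loop: check the token before each kernel launch."""
    now = 0.0
    token = scheduler.request_token(pod, now)
    for duration in kernel_durations:
        if not token.is_valid(now):
            # Token expired: request a new one before launching.
            token = scheduler.request_token(pod, now)
        now += duration   # the kernel runs to completion; it cannot be preempted
    return now

scheduler = FakeScheduler(quota=10.0)
finished_at = run_kernels("pod-a", [4.0, 4.0, 4.0, 4.0], scheduler)
```

The third kernel starts at time 8 with a valid token and finishes at time 12, past the expiry at 10. That overrun cannot be interrupted, but the expired token stops the fourth kernel from launching until a new token is requested.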

Summary

In summary, we have explained how we customize the default Kube scheduler to allow a physical GPU to be shared by multiple PODs, and how we collect their GPU utilization to dynamically adjust the time slice allocated to PODs running ML workloads.

K8s Scheduler Series Reference

  • Kubernetes worker nodes
  • Kube scheduler Framework
  • Creating a Kube schedule plugin
  • Sample Scheduler framework Plugins
  • Gang Scheduling
  • Capacity Scheduling
  • GPU Binpacking

Gemini Open Cloud is a CNCF member and a CNCF-certified Kubernetes service provider. With more than ten years of experience in cloud technology, Gemini Open Cloud is an early leader in cloud technology in Taiwan.

 

 

Guest post originally published on the Gemini Open Cloud blog by Patrick Fu, CEO of Gemini Open Cloud
Source CNCF


