
GPU Partitioning: Fair Share Scheduling

  • aster_cloud
  • July 20, 2022
  • 3 minute read

GPU computation is asynchronous with respect to the POD itself. Typically, the process running in the POD copies data to GPU memory and issues a CUDA call that launches the computation on the GPU (known as a GPU kernel). When the GPU kernel finishes, a synchronization event wakes up the POD's process, which copies the results back to main memory.
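This host-to-device round trip can be sketched in a few lines of CUDA. This is a minimal illustrative example, not Gemini code; the scale kernel and buffer names are hypothetical:

    #include <cuda_runtime.h>

    // Hypothetical kernel: scales every element in place.
    __global__ void scale(float *data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int n = 1 << 20;
        float *host, *dev;
        cudaMallocHost(&host, n * sizeof(float)); // pinned memory for async copies
        cudaMalloc(&dev, n * sizeof(float));

        cudaStream_t stream;
        cudaStreamCreate(&stream);

        // 1. Copy input to GPU memory; returns immediately on the CPU side.
        cudaMemcpyAsync(dev, host, n * sizeof(float), cudaMemcpyHostToDevice, stream);
        // 2. Launch the GPU kernel; the CPU (the POD's process) does not wait here.
        scale<<<(n + 255) / 256, 256, 0, stream>>>(dev, n, 2.0f);
        // 3. Queue the copy back, then block until the stream drains -- this is
        //    the synchronization point that "wakes up" the POD's process.
        cudaMemcpyAsync(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);

        cudaStreamDestroy(stream);
        cudaFree(dev);
        cudaFreeHost(host);
        return 0;
    }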

GPU kernels are non-preemptable: once launched, they cannot be interrupted. Therefore, even after GPU partitioning, the actual amount of GPU usage by each POD is still unpredictable, which may still lead to underutilization or performance delays. For this reason, we need collaboration between the scheduler front-end, a device manager, and a scheduler backend to achieve fair share scheduling.



Figure 5: Fair share scheduling for Gemini GPU Partitioning

Figure 5 shows how the Gemini scheduler achieves fair share scheduling for ML workloads. It consists of an event-driven monitoring subsystem (#1) that collects GPU utilization for the Device Manager. The Gemini scheduler calculates, in real time, the next POD that should be scheduled. The scheduler needs two pieces of information to make this decision:

  1. The POD that is currently furthest below its target GPU utilization.
  2. The amount of time this POD should be given to run on the target GPU.

This information is encoded in a token and dispatched to the target worker node (#2 & #3). As this process iterates, each POD should converge toward its target GPU quota. If a POD exceeds its quota, its token is revoked (#4) and the POD is no longer eligible to be scheduled.
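A minimal sketch of this selection step follows; it is illustrative only, and the type and function names are ours, not Gemini's:

    #include <cassert>
    #include <string>
    #include <vector>

    struct SharePod {
        std::string name;
        double targetShare;   // GPU fraction this POD is entitled to
        double observedShare; // GPU fraction actually consumed (from monitoring)
    };

    struct Token {
        std::string pod;      // which sharePOD may run next
        double quotaMs;       // how long it may occupy the physical GPU
    };

    // Pick the sharePOD furthest below its target share and grant it a quota.
    Token scheduleNext(const std::vector<SharePod> &pods, double baseQuotaMs) {
        assert(!pods.empty());
        const SharePod *next = &pods[0];
        for (const auto &p : pods) {
            // "distance from target" = entitled share minus observed share
            if (p.targetShare - p.observedShare >
                next->targetShare - next->observedShare)
                next = &p;
        }
        return Token{next->name, baseQuotaMs}; // both decisions ride in the token
    }

The token then travels to the worker node, where the hook library enforces the quota (see Token Revocation below).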


We will briefly explain each of these three subsystems:

Event-Driven Monitoring

  • As mentioned above, GPU kernels are not preemptable. To capture and measure runtime kernel execution behavior without introducing extra synchronization points between the CPU and GPU, our event-driven monitoring subsystem piggybacks on the SYNC event issued when a GPU kernel completes to record the amount of GPU time used by a sharePOD, and stores the utilization statistics with the backend device manager.
  • The goal of the monitor is to identify kernel bursts from applications and correctly record their actual start and end times of execution (see the sketch after this list).
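One way to time a burst without adding a synchronization point is to record CUDA events around it and read them at the application's own sync call. Continuing the CUDA sketch above (stream, dev, n, and scale come from that snippet; reportToDeviceManager is a hypothetical upload call, not a real API):

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);                           // burst begins
    scale<<<(n + 255) / 256, 256, 0, stream>>>(dev, n, 2.0f);
    cudaEventRecord(stop, stream);                            // burst ends

    // The application synchronizes here anyway, so reading the elapsed time
    // introduces no extra CPU/GPU synchronization point.
    cudaStreamSynchronize(stream);
    float burstMs = 0.0f;
    cudaEventElapsedTime(&burstMs, start, stop);
    // reportToDeviceManager(podId, burstMs);                 // hypothetical upload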

Token-Based Time-Sharing Scheduler

  • Once a sharePOD’s GPU kernel completes, the physical GPU becomes available. Our backend must then schedule the next sharePOD to run on the worker node of the corresponding GPU.
  • We implemented a dynamic quota strategy based on estimated kernel burst time to adapt to dynamic workload patterns. Our approach is to let the API hook report statistics about its client’s kernel bursts to the scheduler. The scheduler then uses a smooth function to gradually adjust each client’s token quota according to that client’s estimated burst time (a sketch follows this list).
  • The target physical GPU and the dynamic quota are embedded in a token and dispatched to the corresponding worker node. The token thus serves as the mechanism for front-end load distribution and fair sharing of GPU resources among the sharePODs.
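The smooth function could be as simple as an exponential moving average; a sketch under that assumption (the smoothing factor is our choice for illustration, not a documented Gemini value):

    // Nudge a client's token quota toward its estimated kernel burst time.
    double adjustQuota(double currentQuotaMs, double estimatedBurstMs) {
        const double alpha = 0.2; // smoothing factor (assumed for illustration)
        return alpha * estimatedBurstMs + (1.0 - alpha) * currentQuotaMs;
    }

A small smoothing factor keeps quotas stable when burst times are noisy, while still tracking genuine shifts in the workload.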

Token Revocation

  • To minimize context switches and interruptions, we allow each sharePOD to execute multiple GPU kernels. However, because GPU kernels are non-preemptable, we want to prevent runaway sharePODs. The token revocation scheme keeps non-preemptable kernels from exceeding their scheduling time quota: a token becomes invalid once its quota expires, and the hook library must request a new token from the scheduler before any further kernel execution (see the sketch below).
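A minimal sketch of the check a worker-node hook might perform before each launch (illustrative names, not the Gemini hook API):

    #include <chrono>

    struct LeaseToken {
        std::chrono::steady_clock::time_point expiresAt; // issue time + quota
    };

    // The hook library gates every kernel launch on token validity; a token
    // past its quota is invalid and a fresh one must be requested.
    bool canLaunchKernel(const LeaseToken &t) {
        return std::chrono::steady_clock::now() < t.expiresAt;
    }

    // if (!canLaunchKernel(token)) token = requestNewToken(); // hypothetical RPC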

Summary

In summary, we have explained how we customize the default Kube scheduler to allow a physical GPU to be shared by multiple PODs, and how we collect their GPU utilization to dynamically adjust the time slices we allocate to the PODs running ML workloads.

K8s Scheduler Series Reference

  • Kubernetes worker nodes
  • Kube scheduler Framework
  • Creating a Kube scheduler plugin
  • Sample Scheduler framework Plugins
  • Gang Scheduling
  • Capacity Scheduling
  • GPU Binpacking

Gemini Open Cloud is a CNCF member and a CNCF-certified Kubernetes service provider. With more than ten years of experience in cloud technology, Gemini Open Cloud is an early leader in Taiwan’s cloud industry.

 

 

Guest post originally published on the Gemini Open Cloud blog by Patrick Fu, CEO of Gemini Open Cloud
Source: CNCF

