
Volcano Releases v1.6.0

  • aster.cloud
  • June 19, 2022
  • 4 minute read

CNCF Volcano 1.6.0 is now available with new features such as elastic job management, dynamic scheduling and rescheduling based on actual resource utilization, and an MPI job plugin.


Volcano is the first cloud native batch computing project in CNCF. It was open sourced at Shanghai KubeCon in June 2019 and accepted as a CNCF project in April 2020. In April 2022, Volcano was promoted to a CNCF incubating project. To date, more than 400 developers worldwide have committed code to the project, and the community is growing in popularity among developers, partners, and users.



Key Features

Scheduling for elastic jobs

This feature, working with Volcano Jobs or PyTorch Jobs, accelerates AI training and big data analytics and reduces costs by using spot instances on the cloud.

The number of replicas allowed for an elastic job falls within the range [min, max], where min corresponds to the job's minAvailable and max is the job's number of replicas. The elastic scheduling module preferentially allocates resources to the minAvailable pods to ensure that their minimum resource requests are met.

When resources are idle, the scheduler allocates them to elastic pods to accelerate computing. When the cluster is resource-starved, however, the scheduler preferentially preempts the resources of elastic pods, which triggers scale-in. The scheduler also balances resource allocation based on priorities. For example, a high-priority job can preempt the resources of an elastic pod that belongs to a low-priority job.
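
The allocation order described above can be sketched in a few lines of Python. This is a simplified model, not Volcano's scheduler code: it assumes every pod needs exactly one resource unit, and the names ElasticJob and plan_replicas are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ElasticJob:
    name: str
    priority: int
    min_available: int  # lower bound of the replica range (minAvailable)
    replicas: int       # upper bound of the replica range


def plan_replicas(jobs, capacity):
    """Return {job name: pods granted}, assuming each pod needs one resource unit."""
    ordered = sorted(jobs, key=lambda j: j.priority, reverse=True)
    plan = {}

    # Pass 1: satisfy minAvailable first, highest priority first.
    # A job whose minimum cannot be met gets nothing (all-or-nothing).
    for job in ordered:
        if job.min_available <= capacity:
            plan[job.name] = job.min_available
            capacity -= job.min_available
        else:
            plan[job.name] = 0

    # Pass 2: hand the remaining idle capacity to elastic pods,
    # again in priority order, up to each job's replica count.
    for job in ordered:
        if plan[job.name] < job.min_available:
            continue  # the job is not running, so it has no elastic pods
        extra = min(job.replicas - job.min_available, capacity)
        plan[job.name] += extra
        capacity -= extra

    return plan
```

For two jobs, train (priority 10, range [2, 5]) and etl (priority 1, range [2, 4]), a capacity of 10 units gives them 5 and 4 pods. Re-planning with a capacity of 6 scales the elastic pods in to 4 and 2, while both minAvailable guarantees still hold.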


Documentation: https://github.com/volcano-sh/volcano/blob/master/docs/design/elastic-scheduler.md

Issue: https://github.com/volcano-sh/volcano/issues/1876

Dynamic scheduling

The current scheduling mechanism, which is based on resource requests and allocations, may cause unbalanced node resource utilization. For example, a pod may be scheduled to a node with extremely high resource usage and cause a node exception, while other nodes in the cluster are not heavily used.

In version 1.6.0, Volcano collaborates with Prometheus to make scheduling decisions. Prometheus collects data about cluster node resource use, and Volcano uses this data to balance node resource usage as much as possible. You can also configure the limits of CPUs and memory of each node. This prevents node exceptions caused by pods using too many resources.

Example scheduling policy:

actions: "enqueue, allocate, backfill"
tiers:
  - plugins:
      - name: priority
      - name: gang
      - name: conformance
      - name: usage  # usage based scheduling plugin
        arguments:
          thresholds:
            CPUUsageAvg.5m: 90 # Nodes whose average CPU usage over 5 minutes is higher than 90% are filtered out in the predicate stage
            MEMUsageAvg.5m: 80 # Nodes whose average memory usage over 5 minutes is higher than 80% are filtered out in the predicate stage
  - plugins:
      - name: overcommit
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
metrics:                             # Metrics Server-related configuration
  address: http://192.168.0.10:9090  # (mandatory) Prometheus server address
  interval: 30s                      # (optional) The scheduler pulls metrics from Prometheus at this interval. 5s by default.

Documentation: https://github.com/volcano-sh/volcano/blob/master/docs/design/usage-based-scheduling.md
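
To illustrate the idea behind the thresholds in the example policy, the following Python sketch filters out nodes whose measured usage exceeds the limits and then prefers the least-loaded candidate. It is an assumption-laden model, not the usage plugin's code: the function names, the usage dictionary layout, and the "lowest combined usage" scoring rule are all hypothetical.

```python
def filter_nodes(usage_by_node, cpu_threshold=90.0, mem_threshold=80.0):
    """Keep nodes whose average CPU and memory usage (in percent) do not exceed the thresholds."""
    return [
        name
        for name, usage in usage_by_node.items()
        if usage["cpu"] <= cpu_threshold and usage["mem"] <= mem_threshold
    ]


def pick_least_loaded(usage_by_node, candidates):
    """Among the candidates, prefer the node with the lowest combined usage.

    The scoring rule is an illustrative assumption; return None if no node qualifies.
    """
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda name: usage_by_node[name]["cpu"] + usage_by_node[name]["mem"],
    )
```

In a real deployment the usage figures would come from Prometheus; here they are passed in as a plain dictionary such as {"node-a": {"cpu": 95, "mem": 40}} so that the filtering logic can be read in isolation.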

Issue: https://github.com/volcano-sh/volcano/issues/1777

Rescheduling

Improper scheduling policies and dynamic job lifecycles lead to unbalanced node resource utilization. In version 1.6.0, Volcano allows you to add rescheduling policies based on actual resource utilization or custom metrics. Volcano periodically checks the resource utilization of all nodes and evicts pods from high-load nodes so that they can be rescheduled onto low-load nodes.

Rescheduling further balances the loads of each node and improves the cluster resource utilization.

## Configuration Option
actions: "enqueue, allocate, backfill, shuffle"  ## Add 'shuffle' at the end of the actions
tiers:
  - plugins:
      - name: priority
      - name: gang
      - name: conformance
      - name: rescheduling       ## Rescheduling plugin
        arguments:
          interval: 5m           ## (optional) The strategies will be called in this duration periodically. 5 minutes by default.
          strategies:            ## (mandatory) The strategies work in order.
            - name: offlineOnly
            - name: lowPriorityFirst
            - name: lowNodeUtilization
              params:
                thresholds:
                  "cpu": 20
                  "memory": 20
                  "pods": 20
                targetThresholds:
                  "cpu": 50
                  "memory": 50
                  "pods": 50
          queueSelector:         ## (optional) Select workloads in specified queues as potential evictees. All queues by default.
            - default
            - test-queue
          labelSelector:         ## (optional) Select workloads with specified labels as potential evictees. All labels by default.
            business: offline
            team: test
  - plugins:
      - name: overcommit
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack

Documentation: https://github.com/volcano-sh/volcano/blob/master/docs/design/rescheduling.md
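
The post does not spell out how the thresholds and targetThresholds of lowNodeUtilization are interpreted. The Python sketch below assumes semantics similar to the Kubernetes descheduler strategy of the same name: a node is underutilized only if all of its resources are below thresholds, and overutilized if any resource is above targetThresholds. The function name and data layout are hypothetical.

```python
THRESHOLDS = {"cpu": 20, "memory": 20, "pods": 20}
TARGET_THRESHOLDS = {"cpu": 50, "memory": 50, "pods": 50}


def classify_nodes(usage_by_node, thresholds=THRESHOLDS, targets=TARGET_THRESHOLDS):
    """Split nodes into (underutilized, overutilized) lists by usage percentage.

    Assumed semantics, not confirmed by the post:
      - underutilized: every resource is below its threshold
      - overutilized:  at least one resource is above its target threshold
    Nodes in between are left alone.
    """
    low, high = [], []
    for name, usage in usage_by_node.items():
        if all(usage[res] < thresholds[res] for res in thresholds):
            low.append(name)
        elif any(usage[res] > targets[res] for res in targets):
            high.append(name)
    return low, high
```

Under this reading, pods would be evicted from the overutilized nodes only while underutilized nodes exist to receive them, which is why both lists are returned together.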

Issue: https://github.com/volcano-sh/volcano/issues/1777

MPI plugin

You can use Volcano Jobs to run MPI jobs. Volcano Job built-in plugins such as svc, env, and ssh automatically configure password-free communication and environment variable injection for the masters and workers of MPI jobs.

The new version of Volcano further eases your running of MPI jobs by providing the MPI plugin. No more worries about the shell syntax, the communications between masters and workers, or manual SSH authentication. You can start an MPI job in a simple and graceful manner.

Example configuration:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: lm-mpi-job
spec:
  minAvailable: 1
  schedulerName: volcano
  plugins:
    mpi: ["--master=mpimaster","--worker=mpiworker","--port=22"]  ## MPI plugin register
  tasks:
    - replicas: 1
      name: mpimaster
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
                  mpiexec --allow-run-as-root --host ${MPI_HOST} -np 2 mpi_hello_world;
              image: volcanosh/example-mpi:0.0.1
              name: mpimaster
              workingDir: /home
          restartPolicy: OnFailure
    - replicas: 2
      name: mpiworker
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
              image: volcanosh/example-mpi:0.0.1
              name: mpiworker
              workingDir: /home
          restartPolicy: OnFailure

Documentation: https://github.com/volcano-sh/volcano/blob/master/docs/design/distributed-framework-plugins.md
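
The example passes ${MPI_HOST} to mpiexec, but the post does not say what that variable contains. As a hedged illustration only, the sketch below builds a comma-separated worker host list under the assumption that worker pods are named <job>-<task>-<index> and are resolvable under a subdomain named after the job. The function name and the exact host format are assumptions, so check the plugin documentation before relying on them.

```python
def worker_hosts(job_name, worker_task, replicas):
    """Build a comma-separated host list of the assumed form <job>-<task>-<index>.<job>."""
    return ",".join(
        f"{job_name}-{worker_task}-{index}.{job_name}" for index in range(replicas)
    )
```

For the job above, worker_hosts("lm-mpi-job", "mpiworker", 2) would produce two entries, one per worker replica, matching the -np 2 argument given to mpiexec.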

Issue: https://github.com/volcano-sh/volcano/pull/2194

Links:

Release note: https://github.com/volcano-sh/volcano/releases/tag/v1.6.0

Branch: https://github.com/volcano-sh/volcano/tree/release-1.6

About Volcano

Website: https://volcano.sh

GitHub: https://github.com/volcano-sh/volcano

Volcano is designed for high-performance batch computing workloads such as AI, big data, gene sequencing, and rendering jobs. The project has more than 2,400 stars and 550 forks on GitHub, and 26,000 developers around the world have joined the community. Contributing enterprises include Huawei, AWS, Baidu, Tencent, JD.com, and Xiaohongshu.

Volcano supports mainstream computing frameworks, including Spark, Flink, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, and KubeGene. A comprehensive, robust ecosystem has been developed.

Project post by Volcano project maintainers
Source: CNCF

