Volcano Releases v1.6.0

  • aster.cloud
  • June 19, 2022

CNCF Volcano 1.6.0 is now available with new features such as elastic job management, dynamic scheduling and rescheduling based on actual resource utilization, and an MPI job plugin.

Volcano v1.6.0 is now available

Volcano is the first cloud native batch computing project in CNCF. It was open sourced at Shanghai KubeCon in June 2019 and accepted as a CNCF project in April 2020. In April 2022, Volcano was promoted to a CNCF incubating project. To date, more than 400 developers worldwide have committed code to the project, and the community continues to grow in popularity among developers, partners, and users.


Key Features

Scheduling for elastic jobs

This feature, working with Volcano Jobs or PyTorch Jobs, accelerates AI training and big data analytics and reduces costs by using spot instances on the cloud.

The number of replicas allowed for an elastic job falls within [min, max]. min corresponds to minAvailable of the job, and max indicates the number of replicas of the job. The elastic scheduling module preferentially allocates resources to the minAvailable pods to ensure that their minimum resource requests are met.

When resources are idle, the scheduler allocates them to elastic pods to accelerate computing. When the cluster is resource-starved, however, the scheduler preempts the resources of elastic pods first, which triggers scale-in. The scheduler also balances resource allocation based on priorities. For example, a high-priority job can preempt resources of an elastic pod of a low-priority job.
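The allocation order described above can be sketched in a few lines of Python. This is an illustrative model only, not Volcano's scheduler code: the `ElasticJob` class, the `allocate` function, and the idea of counting capacity in whole pod slots are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ElasticJob:
    name: str
    priority: int       # higher value = more important (assumed convention)
    min_available: int  # "min": replicas that must run
    replicas: int       # "max": upper bound on replicas


def allocate(jobs, capacity):
    """Return {job name: replica count} for a cluster with `capacity` pod slots."""
    by_priority = sorted(jobs, key=lambda j: j.priority, reverse=True)
    granted = {j.name: 0 for j in jobs}

    # Pass 1: satisfy minAvailable first, highest priority first.
    for job in by_priority:
        if job.min_available <= capacity:
            granted[job.name] = job.min_available
            capacity -= job.min_available

    # Pass 2: hand idle slots to elastic pods, highest priority first.
    for job in by_priority:
        if granted[job.name] == 0 and job.min_available > 0:
            continue  # the job could not start, so it gets no elastic pods
        extra = min(job.replicas - granted[job.name], capacity)
        granted[job.name] += extra
        capacity -= extra
    return granted
```

Calling `allocate` again with a smaller `capacity` models scale-in: the elastic pods of the lowest-priority job are given up first, while every job that fits keeps its `min_available` replicas.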


Documentation: https://github.com/volcano-sh/volcano/blob/master/docs/design/elastic-scheduler.md

Issue: https://github.com/volcano-sh/volcano/issues/1876

Dynamic scheduling

The current scheduling mechanism, based on resource requests and allocation, may cause unbalanced node resource utilization. For example, a pod may be scheduled to a node with extremely high resource usage and cause a node exception, while other nodes in the cluster are only lightly used.

In version 1.6.0, Volcano works with Prometheus to make scheduling decisions. Prometheus collects data about cluster node resource usage, and Volcano uses this data to balance node resource usage as much as possible. You can also configure CPU and memory usage limits for each node. This prevents node exceptions caused by pods using too many resources.
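As a rough illustration of usage-based filtering, the Python sketch below drops nodes whose measured usage exceeds a limit and then prefers the least loaded of the remaining nodes. It is not the plugin's implementation: the function names, the shape of the usage dictionary, and the "lowest combined usage" tie-break are assumptions for the example. The 90% and 80% limits match the example policy in this section.

```python
# Limits in percent, averaged over 5 minutes (values from the example policy).
THRESHOLDS = {"cpu": 90.0, "mem": 80.0}


def filter_nodes(usage_by_node, thresholds=THRESHOLDS):
    """Keep nodes whose average usage does not exceed any threshold."""
    return [
        node
        for node, usage in usage_by_node.items()
        if all(usage[res] <= limit for res, limit in thresholds.items())
    ]


def pick_least_loaded(usage_by_node, thresholds=THRESHOLDS):
    """Pick the eligible node with the lowest combined usage, or None."""
    eligible = filter_nodes(usage_by_node, thresholds)
    if not eligible:
        return None
    return min(eligible, key=lambda n: sum(usage_by_node[n].values()))
```

In a real cluster the usage figures would come from Prometheus; here they are passed in as a plain dictionary so the filtering logic can be read on its own.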

Example scheduling policy:

actions: "enqueue, allocate, backfill"
tiers:
  - plugins:
      - name: priority
      - name: gang
      - name: conformance
      - name: usage  # usage-based scheduling plugin
        arguments:
          thresholds:
            CPUUsageAvg.5m: 90 # Nodes whose average CPU usage over 5 minutes is higher than 90% are filtered out in the predicate stage
            MEMUsageAvg.5m: 80 # Nodes whose average memory usage over 5 minutes is higher than 80% are filtered out in the predicate stage
  - plugins:
      - name: overcommit
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
metrics:                               # Metrics server-related configuration
  address: http://192.168.0.10:9090    # (mandatory) Prometheus server address
  interval: 30s                        # (optional) The scheduler pulls metrics from Prometheus at this interval. 5s by default.

Documentation: https://github.com/volcano-sh/volcano/blob/master/docs/design/usage-based-scheduling.md

Issue: https://github.com/volcano-sh/volcano/issues/1777

Rescheduling

Improper scheduling policies and dynamic job lifecycles lead to unbalanced node resource utilization. In version 1.6.0, Volcano allows you to add rescheduling policies based on actual resource utilization or custom metrics. The resource utilization of all nodes is checked periodically, and pods are evicted from high-load nodes so that they can be rescheduled onto low-load nodes.

Rescheduling further balances the load across nodes and improves cluster resource utilization.
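The Python sketch below shows one plausible reading of a low-node-utilization strategy. It is an assumption, not Volcano's code: a node counts as underutilized when every resource is below `thresholds`, as overutilized when any resource is above `targetThresholds`, and evictions are planned only when an underutilized node exists to receive the pods. The numbers are taken from the example configuration in this section, and the function names are hypothetical.

```python
# Percent values from the example configuration (interpretation is assumed).
THRESHOLDS = {"cpu": 20, "memory": 20, "pods": 20}         # below all: underutilized
TARGET_THRESHOLDS = {"cpu": 50, "memory": 50, "pods": 50}  # above any: overutilized


def classify_nodes(usage_by_node):
    """Split nodes into (underutilized, overutilized) lists."""
    under, over = [], []
    for node, usage in usage_by_node.items():
        if all(usage[r] < THRESHOLDS[r] for r in THRESHOLDS):
            under.append(node)
        elif any(usage[r] > TARGET_THRESHOLDS[r] for r in TARGET_THRESHOLDS):
            over.append(node)
    return under, over


def plan_evictions(usage_by_node):
    """Return nodes to evict from, only if a low-load node can take the pods."""
    under, over = classify_nodes(usage_by_node)
    return over if under else []
```

Nodes that fall between the two threshold sets are left alone, which keeps the strategy from moving pods back and forth between moderately loaded nodes.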

## Configuration Option
actions: "enqueue, allocate, backfill, shuffle"  ## Add 'shuffle' at the end of the actions
tiers:
  - plugins:
      - name: priority
      - name: gang
      - name: conformance
      - name: rescheduling       ## Rescheduling plugin
        arguments:
          interval: 5m           ## (optional) The strategies will be called in this duration periodically. 5 minutes by default.
          strategies:            ## (mandatory) The strategies work in order.
            - name: offlineOnly
            - name: lowPriorityFirst
            - name: lowNodeUtilization
              params:
                thresholds:
                  "cpu": 20
                  "memory": 20
                  "pods": 20
                targetThresholds:
                  "cpu": 50
                  "memory": 50
                  "pods": 50
          queueSelector:         ## (optional) Select workloads in specified queues as potential evictees. All queues by default.
            - default
            - test-queue
          labelSelector:         ## (optional) Select workloads with specified labels as potential evictees. All labels by default.
            business: offline
            team: test
  - plugins:
      - name: overcommit
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack

Documentation: https://github.com/volcano-sh/volcano/blob/master/docs/design/rescheduling.md

Issue: https://github.com/volcano-sh/volcano/issues/1777

MPI plugin

You can use Volcano Jobs to run MPI jobs. Built-in Volcano Job plugins such as svc, env, and ssh automatically configure password-free communication and environment variable injection for the masters and workers of MPI jobs.

The new version of Volcano makes running MPI jobs even easier by providing the MPI plugin. You no longer need to worry about shell syntax, communication between masters and workers, or manual SSH authentication, and you can start an MPI job with a short, simple configuration.

Example configuration:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: lm-mpi-job
spec:
  minAvailable: 1
  schedulerName: volcano
  plugins:
    mpi: ["--master=mpimaster","--worker=mpiworker","--port=22"]  ## MPI plugin register
  tasks:
    - replicas: 1
      name: mpimaster
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
                  mpiexec --allow-run-as-root --host ${MPI_HOST} -np 2 mpi_hello_world;
              image: volcanosh/example-mpi:0.0.1
              name: mpimaster
              workingDir: /home
          restartPolicy: OnFailure
    - replicas: 2
      name: mpiworker
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
              image: volcanosh/example-mpi:0.0.1
              name: mpiworker
              workingDir: /home
          restartPolicy: OnFailure

Documentation: https://github.com/volcano-sh/volcano/blob/master/docs/design/distributed-framework-plugins.md

Pull request: https://github.com/volcano-sh/volcano/pull/2194

Links:

Release note: https://github.com/volcano-sh/volcano/releases/tag/v1.6.0

Branch: https://github.com/volcano-sh/volcano/tree/release-1.6

About Volcano

Website: https://volcano.sh

GitHub: https://github.com/volcano-sh/volcano

Volcano is designed for high-performance batch computing such as AI, big data, gene sequencing, and rendering jobs. The project has more than 2,400 stars and 550 forks on GitHub, and 26,000 developers around the world have joined the community. Contributing enterprises include Huawei, AWS, Baidu, Tencent, JD.com, and Xiaohongshu.

Volcano supports mainstream computing frameworks, including Spark, Flink, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, and KubeGene. A comprehensive, robust ecosystem has been developed.

 

 

Project post by Volcano project maintainers
Source: CNCF

