
Simplify Creating Data Pipelines For Media With Spotify’s Klio

  • aster.cloud
  • December 3, 2020
  • 4 minute read

On any given day, music streaming service Spotify might process an audio file a hundred different ways—identifying a track’s rhythm and tempo, timestamping beats, and measuring loudness—as well as more sophisticated processing, such as detecting languages and separating vocals from instruments. This might be done to develop a new feature, to help inform playlists and recommendations, or for pure research.

Doing this kind of processing on a single audio file is one thing. But Spotify’s music library is over 60 million songs, growing by 40,000 tracks a day, not including the rapidly expanding podcast catalog. Then, factor in that hundreds of product teams are processing these tracks at the same time, all around the world, and for different use cases. This scale and complexity—plus, the difficulty of handling large binary files to begin with—can hinder collaboration and efficiency, bringing product development to a grinding halt. That’s unless you have Klio.



What is Klio?

In order to productionize audio processing, Spotify created Klio—a framework built on top of Apache Beam for Python that helps researchers and engineers alike run large-scale data pipelines for processing audio and other media files (such as video and images). Spotify originally created Klio after realizing that ML and audio researchers across the company were performing similar audio processing tasks, but were struggling to deploy and maintain them. Spotify saw an opportunity to produce a flexible, managed process that would support a variety of audio processing use cases over time—efficiently and at scale—and got to work.

At a high level, Klio allows a user to provide a media file as input, perform the necessary processing, and output intelligent features and data. There are a multitude of possible use cases for audio alone, from standardizing common audio-processing tasks with ffmpeg or librosa to running custom machine learning models.


Klio simplifies and standardizes pipeline creation for these tasks, increasing efficiency and letting users focus on their business objectives rather than maintaining the processing infrastructure. Now that Klio has been released as open source, anyone can use the framework to build their own scalable and efficient media processing workflows.

 

How does Klio work?

[Figure: Klio job overview]

Klio currently builds a pipeline from a few key steps. First, it assumes that the pipeline accepts a large binary file as input: audio, images, or video. This file is stored in Cloud Storage. As part of the upload, the job sends a unique message to Pub/Sub announcing that a file has been uploaded. Klio then reads this message and downloads the file to begin processing. At this step, Klio runs whatever logic the particular use case requires, such as language extraction. Once processing is complete, it uploads its output artifact to another Cloud Storage bucket. Apache Beam orchestrates the whole pipeline, giving audio and ML users a familiar Python interface on top of traditional pipeline execution.
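The flow described above can be sketched in plain Python. This is only an illustration, not Klio's API: the dictionaries stand in for the two Cloud Storage buckets, the deque stands in for the Pub/Sub subscription, and the names `upload`, `extract_features`, and `run_job` are hypothetical. It also assumes the message carries just the file's identifier.

```python
from collections import deque


def upload(file_id, data, input_bucket, queue):
    """Store a file and announce it (stand-in for the upload step)."""
    input_bucket[file_id] = data
    queue.append(file_id)  # assumption: the message holds only the file ID


def extract_features(data):
    """Placeholder for the real processing, e.g. language extraction."""
    return {"size_bytes": len(data)}


def run_job(input_bucket, output_bucket, queue):
    """Read each message, download the file, process it, upload the result."""
    handled = []
    while queue:
        file_id = queue.popleft()        # read the message
        data = input_bucket[file_id]     # download the input file
        output_bucket[file_id] = extract_features(data)  # process and upload
        handled.append(file_id)
    return handled
```

In a real deployment Apache Beam, not a `while` loop, drives these steps; the sketch only shows the order in which they happen.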

One of Klio’s key benefits is its support for directed acyclic graphs (DAGs), which allow users to configure dependent jobs and their order of execution so that a parent job can trigger its corresponding child jobs.

[Figure: directed acyclic graphs]

In this example, three teams all rely on the same parent job, called Downsample. Downsampling adjusts the number of samples in an audio file, essentially compressing the file to a specified rate that later jobs may require. Once it finishes, the jobs owned by Teams A, B, and C can launch their own processing. This might be detecting “speechiness” (the amount of spoken word), “instrumentalness” (the lack of vocals), and much more.
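One way to picture this dependency graph is with Python's standard `graphlib` module (Python 3.9+). This is a minimal sketch under stated assumptions, not how Klio itself is configured: the job names and the helper functions `execution_order` and `children_of` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each job maps to the set of parent jobs it needs.
DEPENDS_ON = {
    "downsample": set(),
    "team_a_job": {"downsample"},
    "team_b_job": {"downsample"},
    "team_c_job": {"downsample"},
}


def execution_order(depends_on):
    """Return one valid order in which every parent runs before its children."""
    return list(TopologicalSorter(depends_on).static_order())


def children_of(job, depends_on):
    """Jobs that a finished parent job would trigger next."""
    return sorted(child for child, parents in depends_on.items() if job in parents)
```

Here `execution_order(DEPENDS_ON)` always places `downsample` first, while the three team jobs may appear in any order because they do not depend on one another.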


Another key feature of Klio is its ability to optimize the order of execution. It’s not always efficient or necessary to run every Klio job in the graph for a given file. Maybe you want to iterate on your own job without triggering sibling or downstream jobs. Or you have a subset of your media catalogue that requires some backfill processing. Sometimes this means running the parent Klio jobs to fill in missing dependencies. With that, Klio supports bottom-up processing when needed, like this:

[Figure: optimize the order of execution]

A Klio job will first check to see if work has already been processed for a given file. If so, work is skipped for that job. However, if the job’s input data is not available (e.g., if the Energy job does not have the output from the Beat Tracking job for a given audio file), Klio will recursively trigger jobs within its direct line of execution without triggering work for sibling jobs.
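The bottom-up behavior can be sketched as a short recursive function. This is an illustration only, not Klio's implementation: the graph layout, the `done` set standing in for "output already exists", and the name `run_bottom_up` are all assumptions made for the example.

```python
# Hypothetical graph: each job maps to the parent whose output it needs
# (None for a root job). "speechiness" is a sibling branch of "beat_tracking".
PARENT = {
    "downsample": None,
    "beat_tracking": "downsample",
    "energy": "beat_tracking",
    "speechiness": "downsample",
}


def run_bottom_up(job, file_id, done, parent=PARENT):
    """Run `job` for `file_id`, first filling in missing upstream outputs.

    `done` is a set of (job, file_id) pairs whose output already exists.
    Returns the jobs that actually did work, in execution order.
    """
    if (job, file_id) in done:
        return []  # work already processed for this file: skip
    ran = []
    upstream = parent[job]
    if upstream is not None and (upstream, file_id) not in done:
        # Input is missing, so trigger the parent first (recursively).
        ran += run_bottom_up(upstream, file_id, done, parent)
    done.add((job, file_id))  # stand-in for the real processing step
    ran.append(job)
    return ran
```

Starting from `energy` with only the `downsample` output present, the function runs `beat_tracking` and then `energy`, and never touches the sibling `speechiness` job.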

 

What’s next for Klio?

This initial release of Klio represents two years of building, testing, and practical application by different teams all across Spotify. From the beginning, Klio was made with open source in mind.

With this overall architecture, users are free to add customizations to suit their requirements. Klio is cloud-agnostic, meaning that it can support a variety of runners, both locally and in the cloud. In Spotify’s case, this meant Google Cloud, using Apache Beam to call the Dataflow Runner, but Klio can be extended to other runners as well. If you’re interested in contributing, Spotify welcomes more collaboration with the open source community.


While Klio was initially built for audio, it is capable of serving all types of media. At Spotify, it has already seen success in a variety of internal use cases. Specifically, it separates vocals from instruments to enable Sing Along functionality in Japan, and it fingerprints common audio attributes, such as “danceability” and “tempo,” for the Audio Features API. Based on the early success of these use cases, it will be exciting to see what other media processing problems Klio can help solve, whether that is enabling large-scale content moderation or performing object detection across large video streams.

How to get started

To learn more, read the rest of the Klio story on the Spotify Engineering blog. Or jump in and get started with Klio now.

By Kaitlin Ardiff, Strategic Cloud Engineer | Lynn Root, Staff Engineer, Spotify

Source: Google Cloud Blog

