aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Data
  • Engineering
  • Work & Jobs

Simplify Data Processing And Data Science Jobs With Serverless Spark, Now Available On Google Cloud

  • aster.cloud
  • February 13, 2022
  • 3 minute read

Most businesses today choose Apache Spark for their data engineering, data exploration and machine learning use cases because of its speed, simplicity and programming language flexibility. However, managing clusters and tuning infrastructure have been highly inefficient, and a lack of an integrated experience for different use cases are draining productivity gains, opening up governance risks and reducing the potential value that businesses could achieve with Spark.

Today, we are announcing the general availability of Serverless Spark, industry’s first autoscaling serverless Spark. We are also announcing the private preview of Spark through BigQuery, empowering BigQuery users to use serverless Spark for their data analytics, along with BigQuery SQL. With these, you can effortlessly power ETL, data science, and data analytics use cases at scale.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

Dataproc Serverless for Spark (GA)

Per IDC, developers spend 40% time writing code, and 60% of the time tuning infrastructure and managing clusters. Furthermore, not all Spark developers are infrastructure experts, resulting in higher costs and productivity impact.

Serverless Spark, now in GA, solves for these by providing the following benefits:

  • Developers can focus on code and logic. They do not need to manage clusters or tune infrastructure. They submit Spark jobs from their interface of choice, and processing is auto-scaled to match the needs of the job.
  • Data engineering teams do not need to manage and monitor infrastructure for their end users. They are freed up to work on higher value data engineering functions.
  • Pay only for the job duration, vs paying for infrastructure time.

How OpenX is using Serverless Spark: From heavy cluster management to Serverless

OpenX, the largest independent ad exchange network for publishers and demand partners, was in the process of migrating an old MapReduce pipeline. The previous MapReduce cluster was shared among multiple jobs, which required frequent cluster size updates, autoscaler management and also, from time to time, cluster recreation due to upgrades as well as other unrecoverable reasons.

Read More  Introducing Active Assist Recommendations For BigQuery Capacity Planning

OpenX used Google Cloud’s serverless Spark to abstract away all the cluster resources and just focus on the job itself. This significantly helped to boost the team’s productivity, while reducing infrastructure costs.  It turned out that for OpenX, serverless Spark is cheaper from the resource perspective, not to mention maintenance costs of a cluster lifecycle management.

During implementation, OpenX first evaluated the existing pipeline characteristics. It’s hourly batch workload with input size varies from 1.7 TB to 2.8 TB compressed Avro per hour. The pipelines were run on the shared cluster along with other jobs and do not require data shuffling.

 

To set up the flow, Airflow Kubernetes worker is assigned to a workload identity, which submits a job to the Dataproc serverless batch service using Airflow’s plugin DataprocCreateBatchOperator. The batch job itself is assigned to a tailored service account and Spark application jar is released to the Google Cloud Storage using gradle plugin as Dataproc batches can load a jar directly from there. Another job dependency (Spark-Avro) is loaded from the driver or worker filesystem. The batch job is then started with the initial size of 100 executors (1vCPU + 4GB RAM) and autoscaler adds resources as needed, as the hourly data sizes vary during the day.

 

Serverless Spark through BigQuery (Preview)

We announced at NEXT that BigQuery is adding a unified interface for data analysts to write SQL or PySpark. That Preview is now live, and you can request access through the signup form.

You can write PySpark code in the BigQuery editor, and the code is executed using serverless Spark seamlessly, without the need for infrastructure provisioning. BigQuery has been the pioneer for serverless data warehousing, and now supports serverless Spark for Spark-based analytics.

Read More  An Update On Google Cloud’s Commitments To E.U. Businesses In Light Of The New E.U.-U.S. Data Transfer Framework

 

What’s next?

We are working on integrating serverless Spark with the interfaces different users use, for enabling Spark without any upfront infrastructure provisioning. Watch for the availability of serverless Spark through Vertex AI workbench for data scientists, and Dataplex for data analysts, in the coming months. You can use the same signup form to express interest.

Getting started with Serverless Spark and Spark through BigQuery today

You can try serverless Spark using this tutorial, or use one of templates for common ETL tasks. To sign up for Spark through BigQuery preview, please use this form.

 

By: Abhishek Kashyap (Sr. Product Manager, Google Cloud) and Marek Wolczanski (Data Platform Engineer, OpenX)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • BigQuery;
  • Data Analytics
  • Dataproc
  • Google Cloud
  • OpenX
  • Serverless Spark
You May Also Like
View Post
  • Engineering

Just make it scale: An Aurora DSQL story

  • May 29, 2025
Getting things done makes her feel amazing
View Post
  • Computing
  • Data
  • Featured
  • Learning
  • Tech
  • Technology

Nurturing Minds in the Digital Revolution

  • April 25, 2025
View Post
  • Engineering
  • Technology

Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials

  • March 9, 2025
View Post
  • Computing
  • Engineering

Why a decades old architecture decision is impeding the power of AI computing

  • February 19, 2025
View Post
  • Engineering
  • Software Engineering

This Month in Julia World

  • January 17, 2025
View Post
  • Engineering
  • Software Engineering

Google Summer of Code 2025 is here!

  • January 17, 2025
View Post
  • Data
  • Engineering

Hiding in Plain Site: Attackers Sneaking Malware into Images on Websites

  • January 16, 2025
View Post
  • Computing
  • Design
  • Engineering
  • Technology

Here’s why it’s important to build long-term cryptographic resilience

  • December 24, 2024

Stay Connected!
LATEST
  • 1
    Just make it scale: An Aurora DSQL story
    • May 29, 2025
  • 2
    Reliance on US tech providers is making IT leaders skittish
    • May 28, 2025
  • Examine the 4 types of edge computing, with examples
    • May 28, 2025
  • AI and private cloud: 2 lessons from Dell Tech World 2025
    • May 28, 2025
  • 5
    TD Synnex named as UK distributor for Cohesity
    • May 28, 2025
  • Weigh these 6 enterprise advantages of storage as a service
    • May 28, 2025
  • 7
    Broadcom’s ‘harsh’ VMware contracts are costing customers up to 1,500% more
    • May 28, 2025
  • 8
    Pulsant targets partner diversity with new IaaS solution
    • May 23, 2025
  • 9
    Growing AI workloads are causing hybrid cloud headaches
    • May 23, 2025
  • Gemma 3n 10
    Announcing Gemma 3n preview: powerful, efficient, mobile-first AI
    • May 22, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • Understand how Windows Server 2025 PAYG licensing works
    • May 20, 2025
  • By the numbers: How upskilling fills the IT skills gap
    • May 21, 2025
  • 3
    Cloud adoption isn’t all it’s cut out to be as enterprises report growing dissatisfaction
    • May 15, 2025
  • 4
    Hybrid cloud is complicated – Red Hat’s new AI assistant wants to solve that
    • May 20, 2025
  • 5
    Google is getting serious on cloud sovereignty
    • May 22, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.