aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Programming
  • Public Cloud

Debunking Myths About Python On Dataflow

  • aster.cloud
  • November 19, 2021
  • 4 minute read

For many developers that come to Dataflow, Google Cloud’s fully managed data processing service, the first decision they have to make is which programming language to use. Dataflow developers use the open-source Apache Beam SDK to author their pipelines, and have several choices for language to use: Java, Python, Go, SQL, Scala, and Kotlin. In this post, we’ll focus on one of our fastest growing languages: Python.

The Python SDK for Apache Beam was introduced shortly after Dataflow’s general availability was announced in 2015, and is the primary choice for several of our largest customers. However, it has suffered from a reputation for being incomplete and inferior to its predecessor, the Java SDK. While historically there was some truth to this perception, Python’s feature set has caught up to the Java SDK and offers new capabilities that are specifically catered to Python developers. We’ll take the rest of this blog to inspect some popular myths, and conclude with a brief review of the latest & greatest for the Python SDK.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

Myth 1: Python doesn’t support streaming pipelines.

BUSTED. Streaming support for Python has been available for more than two years, released as part of Beam 2.16 in October 2019. This means all of the unique capabilities of streaming Dataflow, including Streaming Engine, update, drain, and snapshots are all available for Python users.

Myth 2: SqlTransform isn’t available for Python.

BUSTED. Tired of writing tedious code to join together datastreams? Use SqlTransform for Python. Apache Beam introduced support for SqlTransform to the Python SDK last year as part of our advancements with multi-language pipelines (more on that later). Take a look at this example to get started.

Read More  Flux Project Update – July 2021

Myth 3: State and Timer APIs aren’t available in Python.

BUSTED. Two of the most powerful features of the Beam SDK are the State and Timer APIs, which allow for more fine-grained control over aggregations than windows and triggers do. These APIs are available in Python, and offer parity with the Java SDK for the most common use cases. Reference the Beam programming guide for some examples of what you can do with these APIs.

Myth 4: There is support for a limited set of I/O’s in Python.

BUSTED. The most glaring disparity between the Java and Python SDKs is the discrepancy between I/O connectors, which facilitate read & write operations for Dataflow pipelines. Our support for multi-language pipelines puts this myth to rest. With cross-language transformations, Python pipelines can invoke a Java transformation underneath the hood to provide access to the entire library of Java-based I/O connectors. In fact, that’s how we implemented the KafkaIO module for Python (see example). Developers can invoke their own cross-language transformations using the instructions in the patterns library.

Myth 5: There are fewer code samples in Python.

PLAUSIBLE: Apache Beam maintains several repos for Python examples: one for snippets, one for notebook samples, and one for complete examples. However, there are a couple of notable exceptions where Python is missing, namely our Dataflow Templates repository. This is attributable to the fact that most of Dataflow’s initial users were Java developers. But this quick observation ignores two key factors: 1) the unique assets that are only available for Python developers, and 2) the tremendous momentum behind the Beam Python community.

Read More  Vertex Forecast: An Overview

Python developers love writing exploratory code in JupyterLab notebook environments. Beam offers an interactive module that allows you to interactively build and run your Beam Python pipelines in a Jupyter Notebook. We make deploying these notebooks really simple with Beam Notebooks, which spins up a managed Notebook that contains all the required Beam libraries to prototype your pipelines. We also have a number of helpful examples & tutorials that show how you can sample data from a streaming source, or attach GPUs to your notebooks to accelerate your processing. The notebook also contains a learning track for new Beam developers that cover everything from basic operations, aggregations, and streaming concepts. You can review the documentation here.

Over the past few years, we have seen a number of extensions built on top of the Beam Python SDK. Cruise Automation published the Terra library, which enables 70+ Cruise data scientists to submit jobs without having to understand the underlying infrastructure. Spotify open-sourced Klio, a framework built on top of Beam Python that simplifies common tasks required for processing audio and media files. I have even pointed customers to beam-nuggets, a lightweight collection of Beam Python transformations used for reading/writing from/to relational databases. Open-source developers and large organizations are doubling down on Beam Python, and these brief examples underscore that trend.

What’s new:

The Dataflow team has a slew of new capabilities that will help Python developers advance their use case. Here’s a quick run-down of the newest features:

  • Custom containers: Users can now specify their own container image when they launch their Dataflow job. This is a common ask from our Python audience, who like to package their pipeline code with their own libraries and dependencies. We’re excited to announce that this feature is generally available—take a look at the documentation so you can try for yourself!
  • GPUs: Dataflow recently announced the general availability of GPU support on Dataflow. You can now accelerate your data processing by provisioning GPUs on your Dataflow job, another common request from machine learning practitioners on Dataflow. You can review the details of the launch here.
  • Beam DataFrames: Beam DataFrames brings the magic of pandas to the Beam SDK, allowing developers to convert PCollections to DataFrames and use the standard methods available with the pandas DataFrame API. DataFrames gives developers a more natural way to interact with their datasets and create their pipelines, and will be a stepping stone to future efficiency improvements. Beam DataFrames are generally available starting Beam 2.32, which was released in August. Learn more about Beam DataFrames here.
Read More  Go 1.20 Is Released!

We invite you to try out our new features using Beam Notebooks today!

Do you have an interesting idea that you want to share with our Beam community? You can reach out to us through various modes, all found here. We are excited to see what’s next for Beam Python.

 

 

By Mehran Nazir Product Manager, Dataflow
Source Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • Dataflow
  • Google Cloud
  • Python
You May Also Like
View Post
  • Computing
  • Public Cloud
  • Technology

United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services

  • April 15, 2025
DeepSeek R1 is now available on Azure AI Foundry and GitHub
View Post
  • Public Cloud
  • Technology

DeepSeek R1 is now available on Azure AI Foundry and GitHub

  • February 2, 2025
Cloud platforms among the clouds
View Post
  • Computing
  • Learning
  • Public Cloud

Best Cloud Platforms Offering Free Trials for Cloud Mastery

  • December 23, 2024
Vehicle Manufacturing
View Post
  • Hybrid Cloud
  • Public Cloud

Toyota shifts into overdrive: Developing an AI platform for enhanced manufacturing efficiency

  • December 10, 2024
IBM and AWS
View Post
  • Public Cloud

IBM and AWS Accelerate Partnership to Scale Responsible Generative AI

  • December 2, 2024
COP29 AI and Climate Change
View Post
  • Public Cloud
  • Technology

How Cloud And AI Are Bringing Scale To Corporate Climate Mitigation And Adaptation

  • November 18, 2024
Cloud Workstations
View Post
  • Public Cloud

FEDRAMP High Development in the Cloud: Code with Cloud Workstations

  • November 8, 2024
View Post
  • Public Cloud

PyTorch/XLA 2.5: vLLM support and an improved developer experience

  • October 31, 2024

Stay Connected!
LATEST
  • 1
    Advanced audio dialog and generation with Gemini 2.5
    • June 15, 2025
  • 2
    A Father’s Day Gift for Every Pop and Papa
    • June 13, 2025
  • 3
    Global cloud spending might be booming, but AWS is trailing Microsoft and Google
    • June 13, 2025
  • Google Cloud, Cloudflare struck by widespread outages
    • June 12, 2025
  • What is PC as a service (PCaaS)?
    • June 12, 2025
  • 6
    Apple services deliver powerful features and intelligent updates to users this autumn
    • June 11, 2025
  • By the numbers: Use AI to fill the IT skills gap
    • June 11, 2025
  • 8
    Crayon targets mid-market gains with expanded Google Cloud partnership
    • June 10, 2025
  • Apple-WWDC25-Apple-Intelligence-hero-250609 9
    Apple Intelligence gets even more powerful with new capabilities across Apple devices
    • June 9, 2025
  • Apple-WWDC25-Liquid-Glass-hero-250609_big.jpg.large_2x 10
    Apple introduces a delightful and elegant new software design
    • June 9, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    Apple supercharges its tools and technologies for developers to foster creativity, innovation, and design
    • June 9, 2025
  • Robot giving light bulb to businessman. Man sitting with laptop on money coins flat vector illustration. Finance, help of artificial intelligence concept for banner, website design or landing web page 2
    FinOps X 2025: IT cost management evolves for AI, cloud
    • June 9, 2025
  • 3
    AI security and compliance concerns are driving a private cloud boom
    • June 9, 2025
  • 4
    It’s time to stop debating whether AI is genuinely intelligent and focus on making it work for society
    • June 8, 2025
  • person-working-html-computer 5
    8 benefits of AI as a service
    • June 6, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.