
BigQuery Write API Explained: An Overview Of The Write API

  • aster.cloud
  • February 10, 2022
  • 4 minute read

The Google BigQuery Write API was released to general availability in 2021 and is BigQuery’s preferred data ingestion path, offering high-performance batching and streaming in one unified API. Since its release, numerous features and improvements have made it faster and easier to use, so that users can ingest data directly into BigQuery. Beyond unification, some exciting capabilities include:

  • Ingesting data directly into BigQuery without having to stage it in Google Cloud Storage, which simplifies your workflows.
  • Stream processing data and immediately reading it, which enables you to build low-latency, fast-response data applications.
  • Guaranteeing exactly-once delivery, which means that you don’t have to write custom deduplication logic.
  • Supporting row-batch-level transactions, which allow for safe retries and schema update detection.

In this first post, we will delve into these new features, explore how the BigQuery Write API compares to the other data ingestion methods in BigQuery, and show how you can quickly get started with it.



Ingesting data into BigQuery

There are several ways to ingest data into BigQuery’s managed storage. The specific ingestion method depends on your workload. Generally, for one-time load jobs and recurring batch jobs, where batch latency is not a concern, you can use the BigQuery Data Transfer Service or BigQuery Load Jobs. Otherwise, the BigQuery Write API is the recommended way to ingest data.
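The rule of thumb above can be written down as a small decision helper. This is only a sketch of the guidance in this post: the class, enum, and parameter names are hypothetical, and a real decision would also weigh factors such as data source and cost.

```java
/** Hypothetical helper that encodes the rule of thumb from this post. */
class IngestionChoiceSketch {
    enum Method { LOAD_JOB_OR_DATA_TRANSFER_SERVICE, WRITE_API }

    /**
     * batchWorkload: a one-time load or a recurring batch job.
     * latencySensitive: whether batch latency is a concern for the workload.
     */
    static Method choose(boolean batchWorkload, boolean latencySensitive) {
        if (batchWorkload && !latencySensitive) {
            return Method.LOAD_JOB_OR_DATA_TRANSFER_SERVICE;
        }
        // Everything else is steered to the Write API.
        return Method.WRITE_API;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false));  // nightly batch, latency not a concern
        System.out.println(choose(false, true));  // real-time event stream
    }
}
```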


Before the BigQuery Write API, there were two ways to ingest data into BigQuery: via a BigQuery Load job or the legacy Streaming API.

Load

BigQuery Load jobs are primarily suited for batch-only workloads that ingest data from Google Cloud Storage into BigQuery. BigQuery Data Transfer Service uses Load under the hood, but it also lets you transfer data from sources other than Google Cloud Storage and run batch loads on a schedule. However, because Load is a batch-mode insert, jobs can take a long time to run, and ingestion performance is proportional to the allocated compute resources (see BigQuery Reservations). In other words, Load jobs don’t allow you to ingest data directly from your data source into BigQuery with low latency.


Compared to BigQuery Load jobs, the BigQuery Write API provides:

  • Stream-level transactions: A stream can be committed only once, which enables safe retries.
  • Simpler workflows: By writing directly to BigQuery storage, you can avoid exporting your data to Google Cloud Storage and then loading it into BigQuery.
  • SLO: The BigQuery Write API has the same level of SLO as other existing BigQuery APIs such as Query Jobs and the legacy Streaming API.
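To make the commit-once idea concrete, here is a minimal in-memory sketch. It is a model of the behavior, not the BigQuery client library: the class and method names are hypothetical, and buffering rows until commit is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of a stream that can be committed only once. */
class CommitOnceStreamSketch {
    private final List<String> buffered = new ArrayList<>();
    private boolean committed = false;

    /** Buffers a row; in this model, rows are not visible in the table until commit. */
    void append(String row) {
        if (committed) {
            throw new IllegalStateException("stream already committed");
        }
        buffered.add(row);
    }

    /**
     * Moves all buffered rows into the table in one step. Only the first call
     * has an effect; a retried commit returns false and changes nothing.
     */
    boolean commit(List<String> table) {
        if (committed) {
            return false;
        }
        table.addAll(buffered);
        committed = true;
        return true;
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>();
        CommitOnceStreamSketch stream = new CommitOnceStreamSketch();
        stream.append("row-1");
        stream.append("row-2");
        System.out.println(table.size()); // 0: nothing is visible before commit
        stream.commit(table);
        stream.commit(table);             // a retry after a timeout is harmless
        System.out.println(table.size()); // 2
    }
}
```

Because the second commit is a no-op, a caller that is unsure whether its first commit succeeded can simply retry without risking duplicate rows.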

Legacy Streaming API

The legacy streaming API lets you ingest data in real time with very low latency, but you are responsible for tracking insert status, and a retry may leave you with duplicate records.

Compared to the legacy streaming API, the BigQuery Write API provides:

  • Write idempotency: The legacy streaming API only supports best-effort deduplication for a short period of time (on the order of a few minutes). The BigQuery Write API, however, ensures that an append can happen only once at a given offset on the same stream, which guarantees write idempotency.
  • Higher throughput: The Write API has three times the default quota (3 GB/s) of the legacy streaming API (1 GB/s), allowing higher throughput in data ingestion. Additional quota can be provisioned upon request.
  • Lower cost: The per-GB cost is 50% lower than that of the legacy streaming API.
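The offset-based idempotency described in the list above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the client library: the class and method names are hypothetical, and the "offset must equal the current end of the stream" check is a simplification of what the service does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of a write stream; not the BigQuery client API. */
class OffsetStreamSketch {
    private final List<String> rows = new ArrayList<>();

    /**
     * Applies the append only if the offset equals the current end of the
     * stream, so a retried request at the same offset cannot be applied twice.
     * Returns true if the rows were applied, false otherwise.
     */
    boolean appendAt(long offset, List<String> newRows) {
        if (offset != rows.size()) {
            return false; // duplicate retry or out-of-order append
        }
        rows.addAll(newRows);
        return true;
    }

    /** Append without an offset: every call is applied, including retries. */
    void appendNoOffset(List<String> newRows) {
        rows.addAll(newRows);
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        List<String> batch = List.of("r0", "r1");

        OffsetStreamSketch withOffsets = new OffsetStreamSketch();
        withOffsets.appendAt(0, batch);
        withOffsets.appendAt(0, batch); // retry after a lost acknowledgement: rejected
        System.out.println(withOffsets.size()); // 2

        OffsetStreamSketch withoutOffsets = new OffsetStreamSketch();
        withoutOffsets.appendNoOffset(batch);
        withoutOffsets.appendNoOffset(batch); // the same retry is applied again
        System.out.println(withoutOffsets.size()); // 4
    }
}
```

The second half of `main` shows why writes without offsets give at-least-once rather than exactly-once behavior: the writer cannot tell a lost acknowledgement from a lost request, so a retry may duplicate rows.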

In addition, since the Write API supports unified batch and streaming, you will no longer need to use separate APIs to handle all of your workloads at scale.

Unified batch and streaming API powered by the new streaming backend

The Write API is backed by a new streaming backend that can handle much larger throughput with better data reliability than the old backend. The new backend is an exabyte-scale structured storage system behind BigQuery, built to support stream-based processing for scalable streaming analytics across all analytic engines in GCP. Unlike its predecessor, which is optimized for batch-mode processing, the new streaming backend treats streaming as a first-class workload and supports high-throughput real-time streaming and processing. It is a stream-oriented storage system that allows exactly-once semantics and immediate data availability for queries.


How can you get started with the BigQuery Write API?

You can start using the BigQuery Write API to stream data with high throughput by creating a StreamWriter and calling its append method. Here is an example of sending data in binary format over the wire in Java:

 

    // Default stream name. FooType is an example Protobuf-generated message class,
    // and bigquery is assumed to be an already initialized BigQuery client.
    String defaultStreamName = "projects/p/datasets/d/tables/t/streams/_default";
    try (StreamWriter streamWriter = StreamWriter.newBuilder(defaultStreamName).build()) {
      // Use the same stream to write 100,000 Protobuf rows
      for (int i = 0; i < 100000; i++) {
        // Create Protobuf rows with the example Protobuf type
        FooType fooType = FooType.newBuilder().setFoo(String.format("message %03d", i)).build();
        ProtoRows protoRows =
            ProtoRows.newBuilder().addSerializedRows(fooType.toByteString()).build();
        // Append the row; the call is asynchronous and returns a future for the response
        streamWriter.append(protoRows);
      }
      // The default stream does not support finalization, so there is no
      // finalizeWriteStream call here.
    } catch (IOException e) {
      System.out.println("Failed to append records. \n" + e.getMessage());
    }

    // Verify that data has been correctly ingested by calling the BigQuery query API
    TableResult result =
        bigquery.query(
            QueryJobConfiguration.newBuilder(
                    "SELECT * FROM `MY_PROJECT_NAME.MY_DATASET_NAME.MY_TABLE_NAME`")
                .build());
    // Each element of iterateAll() is one row; each element of a row is one field value
    result.iterateAll().forEach(row -> row.forEach(field -> System.out.println(field.getValue())));

 

If your data is not stored in Protobuf format, you can use the client library’s JsonStreamWriter (currently provided in Java) to stream JSON data directly into BigQuery. Note that this example provides at-least-once (instead of exactly-once) semantics because it uses the default stream. Stay tuned to see how to achieve exactly-once semantics with Dataflow and the client library in later posts.

What’s Next?

In this article, we reviewed where the BigQuery Write API fits among the data ingestion paths to BigQuery, what makes it fast and cheap, and how to get started with it. In upcoming posts, we will look at how to easily incorporate the BigQuery Write API into your data pipeline, some new key concepts that are critical to using the API, and new features!


Stay tuned. Thank you for reading! Have a question or want to chat? Find me on Twitter or LinkedIn.

Thanks to Gaurav Saxena, Yiru Tang, Pavan Edara, and Veronica Wasson for helping with the post.

 

 

By: Stephanie Wang (Developer Relations Engineer)
Source: Google Cloud Blog

