aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Engineering

Learn How To Stream JSON Data Into BigQuery Using The New BigQuery Storage Write API

  • aster.cloud
  • March 24, 2022
  • 4 minute read

The Google BigQuery Write API offers high-performance batching and streaming in one unified API. The previous post in this series introduced the BigQuery Write API. In this post, we’ll show how to stream JSON data to BigQuery by using the Java client library.

The Write API expects binary data in protocol buffer format. This makes the API very efficient for high-throughput streaming. However, protocol buffers can also be somewhat difficult to work with. For many applications, JSON data is a more convenient data format. The BigQuery client library for Java provides the best of both worlds, through the JsonStreamWriter. The JsonStreamWriter accepts data in the form of JSON records, and automatically converts the JSON objects into binary protocol buffers before sending them over the wire.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

Let’s see how it works.


The scenario

Let’s consider a scenario where you are streaming GitHub commit data to BigQuery. You can use this data to get real-time insights about the commit activity.

For the purpose of the example, we’ll read the data from a local file. However, you can imagine an application that receives this data in the form of events or streamed from a log file. In fact, one advantage of the Write API is that it doesn’t matter where the data comes from, as long as you can get it into a compatible format.

Each line in the source file has the following structure:

 

{
    "commit":"commit ID",
    "parent":["parent commit ID"],
    "author":"author",
    "committer":"committer",
    "commit_date":"YYYY-MM-DDTHH:MM:SS"
    "commit_msg": {
        "subject":"Commit subject",
        "message":"Commit message", 
    }
    "repo_name":"Repo name"
}

 

Create the destination table

 

First, we need to create a table in BigQuery to receive the streamed data. There are several ways to create a table in BigQuery, but one of the easiest is by running a CREATE TABLE query:

Read More  Cloud SQL - SQL Server Performance Analysis And Query Tuning

 

CREATE TABLE `myproject.mydataset.github`
(
  commit STRING,
  parent ARRAY<STRING>,
  author STRING,
  committer STRING,
  commit_date DATETIME,
  commit_msg STRUCT<subject STRING, message STRING>,
  repo_name STRING
);

 

Stream data to the table

Now that we have a table, we can write data to it.

The Write API supports several modes, including committed mode for streaming applications that require exactly-once delivery, and pending mode for batch writes with stream-level transactions. For this example, we’ll use the Write API’s default stream. The default stream is suitable for streaming applications that require at-least-once semantics but don’t need the stronger exactly-one guarantee. Git commits have a unique commit ID, so you can identify duplication in the destination table if needed.

Start by initializing the JsonStreamWriter, passing in the name of the destination table and the table schema:

 

BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Get the schema of the destination table and convert to the equivalent BigQueryStorage type.
    Table table = bigquery.getTable(datasetName, tableName);
    Schema schema = table.getDefinition().getSchema();
    TableSchema tableSchema = BqToBqStorageSchemaConverter.convertTableSchema(schema);

    // Use the JSON stream writer to send records in JSON format.
    TableName parentTable = TableName.of(projectId, datasetName, tableName);
    try (JsonStreamWriter writer =
        JsonStreamWriter.newBuilder(parentTable.toString(), tableSchema).build()) {

 

Now we’re ready to read the data file and send the data to the Write API. As a best practice, you should send the data in batches rather than one row at a time. Read each JSON record into a JSONObject and collect a batch of them into a JSONArray:

 

BufferedReader reader = new BufferedReader(new FileReader(dataFile));
      String line = reader.readLine();
      while (line != null) {
        JSONArray jsonArr = new JSONArray();
        for (int i = 0; i < 100; i++) {
          JSONObject record = new JSONObject(line);
          jsonArr.put(record);
          line = reader.readLine();
          if (line == null) {
            break;
          }
        } // batch

 

Read More  Enhance Your Analysis With New International Google Trends Datasets In BigQuery

To write each batch of records, call the JsonStreamWriter.append method. This method is asynchronous and returns an ApiFuture. For best performance, don’t block waiting for the future to complete. Instead, continue to call append and handle the result asynchronously.

 

ApiFuture<AppendRowsResponse> future = writer.append(jsonArr);
        ApiFutures.addCallback(
            future, new AppendCompleteCallback(), MoreExecutors.directExecutor());

 

This example registers a completion callback. Inside the callback, you can check whether the append succeeded:

 

class AppendCompleteCallback implements ApiFutureCallback<AppendRowsResponse> {

  private static int batchCount = 0;
  private static final Object lock = new Object();

  public void onSuccess(AppendRowsResponse response) {
    synchronized (lock) {
      if (response.hasError()) {
        System.out.format("Error: %s\n", response.getError().toString());
      } else {
        ++batchCount;
        System.out.format("Wrote batch %d\n", batchCount);
      }
    }
  }

  public void onFailure(Throwable throwable) {
    System.out.format("Error: %s\n", throwable.toString());
  }
}

 

Handling table schema updates

BigQuery lets you modify the schema of an existing table in certain constrained ways. For example, you might decide to add a field named email that contains the commit author’s email. Data in the original schema will have a NULL value for this field.

If the schema changes while you are streaming data, the Java client library automatically reconnects with the updated schema. In our example scenario, the application passes the data directly to the Write API without any intermediate processing. As long as the schema change is backward compatible, the application can continue streaming without interruption. After the table schema is updated, you can start sending data with the new field.

Note: Schema updates aren’t immediately visible to the client library, but are detected on the order of minutes.

To check for schema changes programmatically, call AppendRowsResponse.hasUpdatedSchema after the append method completes. For more information, see Working with Schemas in the Write API documentation.

Read More  Support For 100 Large-Scale Clusters

Query the data

Our data is available for analysis as soon as it’s ingested into BigQuery. For example, we can now run a query to find out which days of the week have the most commits:

 

SELECT
  EXTRACT(DAYOFWEEK FROM ts) AS day_of_week,
  FORMAT_DATETIME("%A",commit_date) AS day,
  COUNT(*) AS count
FROM `mydataset.github`
GROUP BY day_of_week, day
ORDER BY count DESC;

+-------------+-----------+-------+
| day_of_week |    day    | count |
+-------------+-----------+-------+
|           3 | Tuesday   |   283 |
|           4 | Wednesday |   281 |
|           5 | Thursday  |   263 |
|           2 | Monday    |   201 |
|           6 | Friday    |   201 |
|           7 | Saturday  |    39 |
|           1 | Sunday    |    31 |
+-------------+-----------+-------+

 

It turns out that the busiest day is Tuesday. Not surprisingly, Saturday and Sunday have the fewest commits.

 

Conclusion

In this article, you learned how the Java client library makes it easy to stream JSON data into BigQuery. You can view the complete source code on GitHub. For more information about the Write API, including how to use committed and pending modes, see the BigQuery Storage Write API documentation.

 

 

By: Veronica Wasson (Technical Writer)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • API
  • BigQuery;
  • Google Cloud
  • JSON
  • SQL
You May Also Like
View Post
  • Engineering
  • Technology

Apple supercharges its tools and technologies for developers to foster creativity, innovation, and design

  • June 9, 2025
View Post
  • Engineering

Just make it scale: An Aurora DSQL story

  • May 29, 2025
View Post
  • Engineering
  • Technology

Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials

  • March 9, 2025
View Post
  • Computing
  • Engineering

Why a decades old architecture decision is impeding the power of AI computing

  • February 19, 2025
View Post
  • Engineering
  • Software Engineering

This Month in Julia World

  • January 17, 2025
View Post
  • Engineering
  • Software Engineering

Google Summer of Code 2025 is here!

  • January 17, 2025
View Post
  • Data
  • Engineering

Hiding in Plain Site: Attackers Sneaking Malware into Images on Websites

  • January 16, 2025
View Post
  • Computing
  • Design
  • Engineering
  • Technology

Here’s why it’s important to build long-term cryptographic resilience

  • December 24, 2024

Stay Connected!
LATEST
  • aster-cloud-erp-bill_of_materials_2 1
    What is an SBOM (software bill of materials)?
    • July 2, 2025
  • aster-cloud-sms-pexels-tim-samuel-6697306 2
    Send SMS texts with Amazon’s SNS simple notification service
    • July 1, 2025
  • Camping 3
    The Summer Adventures : Camping Essentials
    • June 27, 2025
  • aster-cloud-website-pexels-goumbik-574069 4
    Host a static website on AWS with Amazon S3 and Route 53
    • June 27, 2025
  • Prioritize security from the edge to the cloud
    • June 25, 2025
  • 6 edge monitoring best practices in the cloud
    • June 25, 2025
  • Genome 7
    AlphaGenome: AI for better understanding the genome
    • June 25, 2025
  • 8
    Pure Accelerate 2025: All the news and updates live from Las Vegas
    • June 18, 2025
  • 9
    ‘This was a very purposeful strategy’: Pure Storage unveils Enterprise Data Cloud in bid to unify data storage, management
    • June 18, 2025
  • What is cloud bursting?
    • June 18, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    There’s a ‘cloud reset’ underway, and VMware Cloud Foundation 9.0 is a chance for Broadcom to pounce on it
    • June 17, 2025
  • Oracle adds xAI Grok models to OCI
    • June 17, 2025
  • What is confidential computing?
    • June 17, 2025
  • Fine-tune your storage-as-a-service approach
    • June 16, 2025
  • 5
    Advanced audio dialog and generation with Gemini 2.5
    • June 15, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.