aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Multi-Cloud
  • Public Cloud

Hidden Gems Of Google BigQuery

  • aster.cloud
  • August 10, 2022
  • 4 minute read

BigQuery is amazing. It is one of my favorite tools within Google Cloud. Luckily, it looks like Google feels the same and, to the joy of BigQuery fans, keeps adding new features there.

No matter how long I have been working with BigQuery, there is always something new I discover once in a while. Today I want to share with you the following four things:


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

  • the AUTO column,
  • multi-statement transactions,
  • clustering, and
  • indexes.

These are great features that, in my opinion, are not paid enough attention to in documentation or talks.

AUTO column

This one is truly a “hidden gem”. You won’t find it in the official documentation (at least I didn’t manage to do so). It is only mentioned in Google Issue Tracker.

Let’s say you push some data into BigQuery, and then another system wants to run a scheduled job to process the newly arrived data. For example, a system can try to pull data from BigQuery to another storage, or this system needs to run hourly reports based on the data, etc. In each of those cases, you would prefer to avoid processing the same records multiple times . As a result, you need a way to know which records are already processed and which were added after the processing took place.

Unlike traditional OLTP DBs, BigQuery does not support auto-increment columns or columns that are automatically filled out with the current date-time. This is true unless you are using the Streaming API and the AUTO feature! In the latter case, you can add the TIMESTAMP column to the table schema and assign “AUTO” in your JSON payload. This allows BigQuery to auto-populate the mentioned column with the timestamp indicating when the record reached BigQuery.

Read More  User Survey Shows Cloud Foundry Substantially Reduces Time For Development And Deployment Of Cloud Apps

Multi-statement transaction

While not being an OLTP DB, BigQuery supports multi-statement transactions!

Technically speaking, when I am writing this post, multi-statement transactions are in preview, but I hope this feature will become GA soon.

There is probably no need to explain what transactions are and how they can be used. More details on multi-statement transactions can be found in the official documentation.

There are however a couple of interesting things you need to pay attention to:

  • If a transaction mutates (updates or deletes) rows in a table, then other transactions or DML statements that mutate rows in the same table cannot run concurrently. Conflicting transactions are canceled.
  • Transactions cannot use DDL statements that affect permanent entities.
  • The CURRENT_TIMESTAMP, CURRENT_DATE, and CURRENT_TIME functions return the timestamp of the transaction start time.
  • You cannot use the FOR SYSTEM_TIME AS OF clause to read a table beyond the timestamp of the transaction start time. Doing so returns an error.
  • Within a transaction, materialized views are interpreted as logical views. You can still query a materialized view inside a transaction, but it doesn’t result in any performance improvement or cost reduction compared with the equivalent logical view.

Clustering

When it comes to query optimization in BigQuery (which, in the case of BigQuery, is mostly about cost optimization) the first approach that comes to mind is table partitioning.

You may have also heard about table sharding. Let’s say a few words about sharding and partitioning then. You can find information about table partitioning in the official documentation. As for table sharding, you can find it… nowhere, except for the Partitioning versus sharding section of the partitioning documentation. This is a hint that sharding is a legacy feature and, by default, partitioning should be used.

Read More  Oracle Fusion Cloud Provides Real-Time Insights To One Of Nation’s Largest Energy Infrastructure Companies

Partitioning, however, is not the only option available. There is also clustering. While partitioning divides data into partitions (allowing the query to read only specific partitions when needed), clustering works by colocating related data and allowing queries to access particular segments of the data if a filter for clustering columns is used.

So, which one is better — partitioning or clustering?

  • First, there is the official documentation on this topic that gives you a lot of useful recommendations.
  • Second, please note that partitioning is only possible based on the Time-unit and Integer columns as well as based on the ingestion time.
    Clustering, in its turn, supports many other types like BOOL, GEOGRAPHY, STRING, etc. Additionally, clustering supports groups of up to 4 columns.
  • Finally, why not use both? You can apply both partitioning and clustering to the same table either to the same field or different fields.

It is important to note that clustering is not the same as indexes (the latter we will discuss in detail later). Applying clustering based on a specific field does not mean point lookups. It just means that records will be clustered together, and you will only be charged for querying a specific cluster.

How big are those clusters? Documentation does not specify it. If you use both partitioning and clustering based on the same field, then, of course, each cluster will not exceed the respective partition. However, if your partitions are small, then, most likely, each partition will only consist of one cluster. As a result, you won’t see any benefits from clustering.

Read More  Meet Our Data Champions: Emily Bobis, Driving Road Intelligence In Australia

What is the threshold for small/big partitions? Based on the documentation, it looks like this is around 1 GB. If your whole table (or each partition) is smaller than that, then,  probably, clustering won’t bring any benefits.

Indexes

This is another amazing feature that is currently in preview. Official documentation can be found here.

As you may guess from the name, it is designed for point lookups, but not over any field. Currently, indexes can be used to easily find unique data elements that are buried in unstructured text or semi-structured JSON data.

Indexes are only used when the SEARCH query is executed.

You can create an index over a specified list of fields. You can also specify ALL FIELDS, which will result in indexing all STRING or JSON data in the table.

Regardless of how your index was created, the SEARCH function provides a lot of flexibility that allows for selecting (or excluding) specific fields for participation in each specific query.

As with multi-statement transactions, there is not much to say about indexes. The moment you find out about their existence in BigQuery and understand their limitations, you will immediately know how to use them within your specific scenario.

Summary

Please, let me know about your favorite BigQuery features that you think have lacked attention so far. Twitter: @ArtemNikulchen1

 

 

 

By Artem Nikulchenko, Chief Software Architect and Champion Innovator
Source Google Cloud


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • BigQuery;
  • Features
  • Google Cloud
You May Also Like
View Post
  • Computing
  • Public Cloud
  • Technology

United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services

  • April 15, 2025
DeepSeek R1 is now available on Azure AI Foundry and GitHub
View Post
  • Public Cloud
  • Technology

DeepSeek R1 is now available on Azure AI Foundry and GitHub

  • February 2, 2025
Cloud platforms among the clouds
View Post
  • Computing
  • Learning
  • Public Cloud

Best Cloud Platforms Offering Free Trials for Cloud Mastery

  • December 23, 2024
Vehicle Manufacturing
View Post
  • Hybrid Cloud
  • Public Cloud

Toyota shifts into overdrive: Developing an AI platform for enhanced manufacturing efficiency

  • December 10, 2024
IBM and AWS
View Post
  • Public Cloud

IBM and AWS Accelerate Partnership to Scale Responsible Generative AI

  • December 2, 2024
COP29 AI and Climate Change
View Post
  • Public Cloud
  • Technology

How Cloud And AI Are Bringing Scale To Corporate Climate Mitigation And Adaptation

  • November 18, 2024
Cloud Workstations
View Post
  • Public Cloud

FEDRAMP High Development in the Cloud: Code with Cloud Workstations

  • November 8, 2024
View Post
  • Public Cloud

PyTorch/XLA 2.5: vLLM support and an improved developer experience

  • October 31, 2024

Stay Connected!
LATEST
  • college-of-cardinals-2025 1
    The Definitive Who’s Who of the 2025 Papal Conclave
    • May 7, 2025
  • conclave-poster-black-smoke 2
    The World Is Revalidating Itself
    • May 6, 2025
  • 3
    Conclave: How A New Pope Is Chosen
    • April 25, 2025
  • Getting things done makes her feel amazing 4
    Nurturing Minds in the Digital Revolution
    • April 25, 2025
  • 5
    AI is automating our jobs – but values need to change if we are to be liberated by it
    • April 17, 2025
  • 6
    Canonical Releases Ubuntu 25.04 Plucky Puffin
    • April 17, 2025
  • 7
    United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services
    • April 15, 2025
  • 8
    Tokyo Electron and IBM Renew Collaboration for Advanced Semiconductor Technology
    • April 2, 2025
  • 9
    IBM Accelerates Momentum in the as a Service Space with Growing Portfolio of Tools Simplifying Infrastructure Management
    • March 27, 2025
  • 10
    Tariffs, Trump, and Other Things That Start With T – They’re Not The Problem, It’s How We Use Them
    • March 25, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    IBM contributes key open-source projects to Linux Foundation to advance AI community participation
    • March 22, 2025
  • 2
    Co-op mode: New partners driving the future of gaming with AI
    • March 22, 2025
  • 3
    Mitsubishi Motors Canada Launches AI-Powered “Intelligent Companion” to Transform the 2025 Outlander Buying Experience
    • March 10, 2025
  • PiPiPi 4
    The Unexpected Pi-Fect Deals This March 14
    • March 13, 2025
  • Nintendo Switch Deals on Amazon 5
    10 Physical Nintendo Switch Game Deals on MAR10 Day!
    • March 9, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.