Overview
This guide provides an introduction to Google BigQuery on GCP and how it can be used for data analysis. It also shows how to query the data using Python.
Prerequisites
- Access to Google Cloud Console
- Python installed on your machine. Refer to the following guides for Ubuntu and Windows installation.
- Google Cloud SDK installed. Refer to the following guide on how to install it for Ubuntu.
Reviewing the Data Structure
For this guide we’ll be using the “Predicted hourly heights of tides in 2019 – Quarry Bay” data set (Hong Kong). You can get the data at this link. Download the CSV file and save it to your local storage as predicted_hourly_tide_2019.csv.
The CSV has 26 columns: the first 2 are the month and day, and the next 24 are the hours of the day. It has 365 records, one prediction for every day of the year.
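If you want to sanity-check the downloaded file before importing it, a minimal sketch like the following (assuming the CSV has a single header row and the layout described above) prints the column and row counts:

import csv

# Quick local check of predicted_hourly_tide_2019.csv (assumes one header row)
with open("predicted_hourly_tide_2019.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)          # MM, DD, then the 24 hourly columns
    rows = list(reader)

print("Columns:", len(header))     # expected: 26
print("Rows:", len(rows))          # expected: 365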
BigQuery is a service designed for data warehouse and analytics applications. While our sample data set has fewer than 500 records, BigQuery can handle far larger volumes; it is designed to store data at the petabyte scale. Note also that BigQuery queries are billed based on the amount of data scanned.
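Because billing depends on the bytes a query scans, you can estimate a query's cost before running it. The sketch below uses the google-cloud-bigquery dry-run option; the project, dataset, and table names are the examples used later in this guide.

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

# A dry run validates the query and reports how many bytes it would scan,
# without actually running it (and without being billed).
query_job = client.query(
    "SELECT * FROM `zo-geek-project.analysis_dataset.predicted_hourly_tide_2019`",
    job_config=job_config,
)
print("This query would process {} bytes.".format(query_job.total_bytes_processed))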
Setting up BigQuery
This example uses the GCP console to load the CSV data directly into BigQuery. There are other ways of importing data; see here.
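As one example of those other import paths, the sketch below loads a CSV that already sits in a Google Cloud Storage bucket using the Python client. The bucket name and object path are placeholders, and the dataset and table names match the examples used in this guide.

from google.cloud import bigquery

client = bigquery.Client()
table_id = "zo-geek-project.analysis_dataset.predicted_hourly_tide_2019"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the header row
    autodetect=True,      # let BigQuery infer the schema
)

# gs://{{bucket-name}}/... is a placeholder; point it at your own bucket.
load_job = client.load_table_from_uri(
    "gs://{{bucket-name}}/predicted_hourly_tide_2019.csv",
    table_id,
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish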
01. Log in to the GCP console at https://console.cloud.google.com.
02. On the sidebar menu, under “Big Data” select “BigQuery”.
03. Select the project in which to create the BigQuery dataset and table. On the lower-right part of the screen, you will see the “CREATE DATASET” button. A dataset is the equivalent of a database in an RDBMS such as MySQL or PostgreSQL. If the button does not appear, select the project from the “Resources” section on the left side of the page.
04. A form with the following fields will appear on the right side of the page.
Dataset ID | Name of the dataset. Only numbers, letters, and underscores are allowed. In SQL terms, this is like the database name. Example: analysis_dataset
Data location | Determines the location where the data is stored. Example: Singapore (asia-southeast1)
Default table expiration | Either “Never” or a number of days after table creation. If you plan to use the dataset for only a short time, it is recommended to set an expiration in days, since BigQuery bills for both storage and queries. Example: 1 (day)
05. Then select the “Create dataset” button once ready.
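If you prefer not to click through the console, the dataset above can also be created with the Python client. This is a minimal sketch assuming the example names and settings from the table (dataset analysis_dataset, location asia-southeast1, 1-day default table expiration):

from google.cloud import bigquery

client = bigquery.Client(project="zo-geek-project")

dataset = bigquery.Dataset("zo-geek-project.analysis_dataset")
dataset.location = "asia-southeast1"
dataset.default_table_expiration_ms = 24 * 60 * 60 * 1000  # 1 day, in milliseconds

dataset = client.create_dataset(dataset, timeout=30)  # API request
print("Created dataset {}.{}".format(client.project, dataset.dataset_id))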
06. In the “Resources” section on the left, select the project and then the dataset. The details of the dataset will now be visible, along with the “Create table” button. Select it.
07. A new form will appear on the right side of the page.
08. Under “Create table from”, select “Upload”. There are other ways to import data into BigQuery, but for this example we are uploading a CSV, so select the file from your local machine and complete the following details. A programmatic equivalent of this upload is sketched after the table.
Create table from | Options: Empty table, Google Cloud Storage, Upload, Drive, Google Cloud Bigtable. It is possible to load data from other sources, but for this example we select the “Upload” option, which allows us to choose a CSV file from our local machine. Example: Upload
Select file | The CSV file to import. This is the data that will be created as a table in BigQuery. We will be using the data reviewed in the previous section. Example: predicted_hourly_tide_2019.csv
File format | The format of the file being loaded. It is auto-detected based on the selected file. BigQuery supports several file formats, including Avro, which is useful for data serialization. Example: CSV
Project name | The name of the GCP project to load the data into. Example: zo-geek-project
Dataset name | Name of the dataset to place the new table in. Example: analysis_dataset
Table type | This will be automatically set to “Native table”.
Table name | Name of the table. Hierarchically, this is the same as the table name in SQL. Example: predicted_hourly_tide_2019
Schema > Auto detect | If your CSV file has a fixed structure, you can let BigQuery derive the schema from it; otherwise you can add the columns manually.
Partitioning | “No partitioning”. Partitioning can be used to divide the table into smaller segments; partitioned tables can then be queried using filters on the partitioning column.
Clustering order | (leave empty)
ADVANCED OPTIONS |
Header rows to skip | If your CSV data has a header, specify the number of header rows to skip before the data starts. Example: 1
Encryption | “Google-managed key”. You can supply your own key if required by compliance rules or policies set by your company or organization. For this example, we will let Google create and manage the key.
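The same upload can be done from Python instead of the console. The sketch below mirrors the settings above (CSV format, auto-detected schema, one header row skipped) and assumes the example project, dataset, and table names.

from google.cloud import bigquery

client = bigquery.Client(project="zo-geek-project")
table_id = "zo-geek-project.analysis_dataset.predicted_hourly_tide_2019"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # "Header rows to skip" = 1
    autodetect=True,      # "Schema > Auto detect"
)

with open("predicted_hourly_tide_2019.csv", "rb") as source_file:
    load_job = client.load_table_from_file(source_file, table_id, job_config=job_config)

load_job.result()  # wait for the load job to finish
table = client.get_table(table_id)
print("Loaded {} rows into {}".format(table.num_rows, table_id))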
09. After data import, you can query the content using the Query Editor input field.
# FORMAT
SELECT * FROM `{{project-name}}.{{dataset-name}}.{{table-name}}` LIMIT 1000

# SAMPLE
SELECT * FROM `zo-geek-project.analysis_dataset.predicted_hourly_tide_2019` LIMIT 1000
Execute the query and it will show the result.
Querying Data
01. You will need a service account in order to access BigQuery using the Python SDK. Use an existing service account or create one before proceeding. Navigate to the menu and select “IAM & admin”.
02. Select “Service Accounts” from the menu. If you already have a service account with access to BigQuery, you can use that; otherwise, create a new one by selecting the “Create Service Account” button.
03. A new form will be shown. Enter the Service Account Name and an optional description. The Service Account ID is automatically generated based on the name, but you can still override it if you want. Then select the “Create” button.
04. Next, specify the permissions. It is recommended to grant a service account only the permissions it needs. In this case we select the predefined roles “BigQuery Data Viewer” and “BigQuery User”, which allow reading the content of a BigQuery table and creating query jobs. Then select the “Continue” button.
05. In the next section, we will create the key. Do not lose this key, as it cannot be recovered. Select the “Create key” button and a new form will appear.
Select “JSON”.
06. A JSON file will be downloaded to your local machine. This will be used in the Python script. Close the popup, then select the “Done” button.
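As an alternative to exporting GOOGLE_APPLICATION_CREDENTIALS (the approach used in the script below), the key file can also be passed to the client directly. A minimal sketch, with the key path kept as a placeholder:

from google.cloud import bigquery

# Build a client straight from the downloaded key file instead of relying on
# the GOOGLE_APPLICATION_CREDENTIALS environment variable.
client = bigquery.Client.from_service_account_json(
    "{{sa-account-path}}/{{service-account}}.json"
)
print("Authenticated against project:", client.project)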
07. You must have Python installed and configured; if not, follow the guides for Windows and Ubuntu. It is also recommended to create a virtual environment to isolate your configuration. Create the following Python script.
Change the values of the project name, dataset, and table, and the path to the service account file.
import os

from google.cloud import bigquery


def perform_query():
    client = bigquery.Client()
    query_job = client.query(
        "SELECT * FROM `{{gcp-project-name}}.{{bigquery-dataset-name}}.{{bigquery-table-name}}` LIMIT 1000"
    )
    results = query_job.result()  # waits for the query job to complete

    for row in results:
        # Prints out the predicted tide for every day at the 1st hour
        print("Month: {0}, Day: {1}, 1 AM: {2}".format(row.MM, row.DD, row._01))


if __name__ == '__main__':
    # Point the client to the downloaded service account key
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '{{sa-account-path}}/{{service-account}}.json'
    perform_query()
08. Before running the Python script, make sure you have installed the necessary library in your global environment or virtual environment.
$ pip install --upgrade google-cloud-bigquery
09. When the Python script is executed, it will output something like the following.