Thirty-five years ago, SQL-86, the first SQL standard, came into our world, published as an ANSI standard in 1986 and adopted by the International Organization for Standardization (ISO) in 1987. On this Valentine’s Day, we at BigQuery reaffirm our love for and commitment to user-friendly SQL with a whole slew of new SQL features that we’re pleased to share with you, our beloved BigQuery users.
They say time and tide wait for no man. Now, thanks to the <a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#interval_type" target="_blank" rel="noopener">INTERVAL</a> data type, you can measure durations of time within BigQuery. This data type lets you store the difference between a start and an end timestamp natively, as a signed value in units ranging from years to fractions of a second.
# This example creates and queries a table with a column of INTERVAL type.
CREATE TABLE dataset.table(i INTERVAL) AS (
SELECT * FROM UNNEST([
INTERVAL 3 DAY,
INTERVAL 2 MONTH,
INTERVAL -2 MONTH,
INTERVAL 5 MINUTE,
INTERVAL 3 DAY + INTERVAL 1.234 SECOND
])
);
SELECT * FROM dataset.table;
# You can now add an INTERVAL to, or subtract one from, a DATE or DATETIME value to perform calendar arithmetic.
SELECT DATETIME '2021-06-01 04:00:00' + i
FROM dataset.table;
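As a minimal sketch of the duration use case described above, the query below subtracts one timestamp from another to obtain an INTERVAL and then reads individual parts back out with EXTRACT. The literal timestamps and the column aliases are illustrative assumptions, not values from the original post.

```sql
# Illustrative values only: a hypothetical job that starts at 09:00 and ends at 12:30 UTC.
SELECT
  end_ts - start_ts AS elapsed,                       # TIMESTAMP - TIMESTAMP yields an INTERVAL
  EXTRACT(HOUR FROM end_ts - start_ts) AS hours,      # hour part of the interval
  EXTRACT(MINUTE FROM end_ts - start_ts) AS minutes   # minute part of the interval
FROM (
  SELECT
    TIMESTAMP '2022-02-14 09:00:00+00' AS start_ts,
    TIMESTAMP '2022-02-14 12:30:00+00' AS end_ts
);
```

Computing the difference in the query and storing it in an INTERVAL column keeps the duration signed, so a negative value would flag an end timestamp that precedes its start.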
Change column datatype
In a prior BigQuery user-friendly SQL update, we announced support for parameterized data types in BigQuery. Building on this, BigQuery now supports changing the data type of an existing column to make it less restrictive. Using the SET DATA TYPE clause, a NUMERIC column can be changed to BIGNUMERIC, or the length or the precision and scale of a parameterized data type column can be increased. For a table of valid data type coercions, compare the “From Type” column to the “Coercion To” column on the Conversion rules in Standard SQL page.
# The following example changes the data type of column c1 from an INT64 to NUMERIC:
CREATE TABLE dataset.table(c1 INT64);
ALTER TABLE dataset.table ALTER COLUMN c1 SET DATA TYPE NUMERIC;
# The following example changes the data type of one of the fields in the s1 column:
CREATE TABLE dataset.table(s1 STRUCT<a INT64, b STRING>);
ALTER TABLE dataset.table ALTER COLUMN s1
SET DATA TYPE STRUCT<a NUMERIC, b STRING>;
# The following example changes the precision of a parameterized data type column:
CREATE TABLE dataset.table (pt NUMERIC(7,2));
ALTER TABLE dataset.table
ALTER COLUMN pt
SET DATA TYPE NUMERIC(8,2);
Expanded SQL Expressions and Scripting Control Statements
WITH RECURSIVE common table expression
A common table expression (CTE), introduced with a WITH clause, lets you break up a complex query by defining a temporary named result set from the CTE subquery, which can then be referenced as a table elsewhere in the same query. A recursive CTE, introduced with a WITH RECURSIVE clause and containing a UNION ALL operation, has the following parts:
- base_term: Runs the initial iteration of the recursive operation.
- recursive_term: Runs the remaining iterations until the recursion terminates.
- union_operator: The UNION operator returns the union of the rows produced by the base term and the recursive term.
Recursive CTEs are especially useful for querying hierarchical data in tables, such as employees and their supervisors in a large multi-level organization, or the bill of materials of a complex product defined by its subcomponents and their associated parts.
# The most common use case for WITH RECURSIVE is querying hierarchical data,
# where there are relationships among the rows of the table.
# Below is a regular CTE with two columns,
# employee_name and manager_name;
# each employee has exactly one manager.
WITH RECURSIVE
EmployeeInfo AS (
SELECT 'Thomas' AS employee_name, 'Alex' AS manager_name UNION ALL
SELECT 'Jim', 'Alex' UNION ALL
SELECT 'Nikola', 'Thomas' UNION ALL
SELECT 'John', 'Thomas' UNION ALL
SELECT 'Isaac', 'Jim' UNION ALL
SELECT 'Carl', 'Nikola' UNION ALL
SELECT 'Will', 'Nikola' UNION ALL
SELECT 'Lucy', 'John' UNION ALL
SELECT 'Charles', 'Carl' UNION ALL
SELECT 'James', 'Will' UNION ALL
SELECT 'Amanda', 'Lucy'
),
# Below is a recursive CTE which contains all the people
# that directly or indirectly report to Thomas.
ThomasReports AS (
# Below is the base term, which contains all the people
# that directly report to Thomas.
SELECT employee_name FROM EmployeeInfo WHERE manager_name = 'Thomas'
UNION ALL
# Below is the recursive term, which recursively includes
# those people that directly report to Thomas's known reports.
SELECT e.employee_name FROM EmployeeInfo AS e JOIN ThomasReports AS t ON e.manager_name = t.employee_name
)
# Output the total number of Thomas's reports.
SELECT COUNT(*) AS total FROM ThomasReports;
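The bill-of-materials case mentioned earlier follows the same pattern. The sketch below is a hypothetical illustration: the Components data, the BikeParts name, and the depth column are assumptions made up for this example, not part of the original post. It shows how the recursive term can carry a counter so each part is reported with its nesting level.

```sql
WITH RECURSIVE
  # Hypothetical parts list: each row says that `part` is a direct component of `assembly`.
  Components AS (
    SELECT 'bike' AS assembly, 'wheel' AS part UNION ALL
    SELECT 'bike', 'frame' UNION ALL
    SELECT 'wheel', 'rim' UNION ALL
    SELECT 'wheel', 'spoke' UNION ALL
    SELECT 'rim', 'valve'
  ),
  BikeParts AS (
    # Base term: direct components of the bike, at depth 1.
    SELECT part, 1 AS depth FROM Components WHERE assembly = 'bike'
    UNION ALL
    # Recursive term: components of parts already found, one level deeper.
    SELECT c.part, b.depth + 1
    FROM Components AS c
    JOIN BikeParts AS b ON c.assembly = b.part
  )
SELECT part, depth FROM BikeParts ORDER BY depth, part;
```

The recursion stops on its own once an iteration produces no new rows, which here happens after the deepest component has been reached.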
Control statements in Scripting
As business logic to analyze data becomes more complex, control statements in scripting allow data analysts to apply conditional logic to execute different workflows based on specific conditions encountered during script execution. BigQuery is pleased to support the following additional control statements in scripting:
- <a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting#for-in" target="_blank" rel="noopener">FOR…IN</a>: Loops over every row in a table expression. This offers a succinct way to iterate through query results that other loops do not.
- <a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting#repeat" target="_blank" rel="noopener">REPEAT</a>: Repeatedly executes a list of SQL statements until the boolean condition at the end of the list is TRUE.
- <a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting#case" target="_blank" rel="noopener">CASE</a>: Provides a more efficient way to express conditional logic than the previously supported IF…ELSE IF statements. It executes the first list of SQL statements where a boolean expression is TRUE.
- <a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting#case_search_expression" target="_blank" rel="noopener">CASE <i><search expression></i></a>: The CASE statement with the search expression executes the first list of SQL statements where the search expression matches a WHEN expression.
- <a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting#labels" target="_blank" rel="noopener">Labels</a>: Provide an unconditional jump to the end of the block or loop associated with a label. With labeled CONTINUE, users now have more control over nested loops or statement bodies by skipping to specific (named) locations in the script instead of continuing with sequential execution.
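# Example using FOR…IN (assumes the public samples.shakespeare table as the source)
FOR record IN
(SELECT word, word_count
FROM `bigquery-public-data.samples.shakespeare`
LIMIT 5)
DO
SELECT record.word, record.word_count;
END FOR;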
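The REPEAT and CASE statements listed above can be sketched in a similar way. This is a minimal illustration under stated assumptions: the attempts variable and the outcome messages are hypothetical names chosen for the example.

```sql
DECLARE attempts INT64 DEFAULT 0;  # hypothetical counter

# REPEAT runs its body at least once and stops when the UNTIL condition is TRUE.
REPEAT
  SET attempts = attempts + 1;
  UNTIL attempts >= 3
END REPEAT;

# CASE with a search expression runs the first branch whose WHEN value matches.
CASE attempts
  WHEN 1 THEN
    SELECT 'stopped after one attempt' AS outcome;
  WHEN 3 THEN
    SELECT 'stopped after three attempts' AS outcome;
  ELSE
    SELECT 'unexpected attempt count' AS outcome;
END CASE;
```

Because the condition is checked at the end of each pass, REPEAT suits logic that must run at least once, whereas a WHILE loop checks its condition before the first pass.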
Table copy DDL
CREATE TABLE LIKE and COPY
Analysts and data engineers often need to copy a table schema (without data) or a full table (with data) from a production environment into a test or development environment. The CREATE TABLE LIKE statement copies only the metadata of the source table, while the CREATE TABLE COPY statement copies both the metadata and the data from the source table into the new table. In both cases, the new table has no relationship to the source table after creation, so later modifications to the source table do not propagate to the new table.
# The following example creates a new table named newtable
# in mydataset with the same metadata as sourcetable
# and the data from the SELECT statement:
CREATE TABLE mydataset.newtable
LIKE mydataset.sourcetable
AS SELECT * FROM mydataset.myothertable;
# The following example creates a copy of the mydataset.sourcetable table
# named newtable in mydataset:
CREATE TABLE mydataset.newtable
COPY mydataset.sourcetable;
Expanded INFORMATION_SCHEMA views
INFORMATION_SCHEMA for streaming data
If you stream data into BigQuery, you can now monitor your data streams using INFORMATION_SCHEMA streaming views, which provide historical and real-time information about data streaming into BigQuery. These views contain per-minute aggregated statistics for every table that has data streamed into it.
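# The following example calculates the per-minute breakdown of total failed requests for all tables in the project in the last 30 minutes, split by error code. The region-us qualifier is an assumption; use your own region.
SELECT start_timestamp, error_code, SUM(total_requests) AS num_failed_requests
FROM `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
WHERE error_code IS NOT NULL
AND start_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 30 MINUTE)
GROUP BY start_timestamp, error_code
ORDER BY start_timestamp DESC;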
Expanded DDL column support in INFORMATION_SCHEMA views
Last year, we announced DDL column support in INFORMATION_SCHEMA views, an innovative approach that allows data administrators to generate object creation DDL for one, multiple, or all tables and views directly from the TABLES INFORMATION_SCHEMA view. BigQuery now supports generating object creation DDL for other object types, such as <a href="https://cloud.google.com/bigquery/docs/information-schema-datasets#schemata_view" target="_blank" rel="noopener">schemata</a> (datasets) and routines (functions, table functions and procedures).
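A minimal sketch of how the ddl column might be queried for each object type follows. The dataset name mydataset and the region-us qualifier are placeholder assumptions; substitute your own dataset and region.

```sql
# Creation DDL for every table and view in a dataset (mydataset is a placeholder).
SELECT table_name, ddl
FROM mydataset.INFORMATION_SCHEMA.TABLES;

# Creation DDL for every dataset in a region (region-us is a placeholder).
SELECT schema_name, ddl
FROM `region-us`.INFORMATION_SCHEMA.SCHEMATA;

# Creation DDL for every function, table function and procedure in a dataset.
SELECT routine_name, ddl
FROM mydataset.INFORMATION_SCHEMA.ROUTINES;
```

Adding a WHERE clause on table_name, schema_name or routine_name narrows the output to a single object when only one definition is needed.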
By: Jagan R. Athreya (Product Manager, Google Cloud)
Source: Google Cloud Blog