aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Data
  • Engineering
  • Tools

Unifying Data And AI To Bring Unstructured Data Analytics To BigQuery

  • aster.cloud
  • November 12, 2022
  • 4 minute read
Over one third of organizations believe that data analytics and machine learning have the most potential to significantly alter the way they run business over the next 3 to 5 years. However, only 26% of organizations are data driven. One of the biggest reasons for this gap is that a major portion of the data generated today is unstructured, which includes images, documents, and videos. It is estimated to cover roughly up to 80% of all data, which has so far remained untapped by organizations.One of the goals of Google’s data cloud is to help customers realize value from data of all types and formats. Earlier this year, we announced BigLake, which unifies data lakes and warehouses under a single management framework, enabling you to analyze, search, secure, govern and share unstructured data using BigQuery.At Next ‘22, we announced the preview of object tables, a new table type in BigQuery that provides a structured record interface for unstructured data stored in Google Cloud Storage. This enables you to directly run analytics and machine learning on images, audio, documents and other file types using existing frameworks like SQL and remote functions natively in BigQuery itself. Object tables also extend our best practices of securing, sharing and governing structured data to unstructured, without needing to learn or deploy new tools.

Directly process unstructured data using BigQuery ML

Object tables contain metadata such as URI (Uniform Resource Identifier), content type, and size that can be queried just like other BigQuery tables. You can then derive inferences using machine learning models on unstructured data with BigQuery ML. As part of preview, you can import open source TensorFlow Hub image models, or your own custom models to annotate the images. Very soon, we plan to enable this for audio, video, text and many other formats, and pre-trained models to enable out-of-the box analysis. Check out this video to learn more and watch a demo.

Read More  Convert Your Windows Install Into A VM On Linux
# Create an object table
CREATE EXTERNAL TABLE my_dataset.object_table
WITH CONNECTION us.my_connection 
OPTIONS(uris=["gs://mybucket/images/*.jpg"],
        object_metadata="SIMPLE", metadata_cache_mode="AUTOMATIC");

     # Generate inferences with BQML
SELECT * FROM ML.PREDICT(
    MODEL my_dataset.vision_model, 
    (SELECT ML.DECODE_IMAGE(data) AS img FROM my_dataset.object_table)
);

By analyzing unstructured data natively in BigQuery, businesses can


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

  • Eliminate manual effort as pre-processing steps such as tuning image sizes to model requirements are automated
  • Leverage the simple and familiar SQL interface to quickly gain insights
  • Save costs by utilizing existing BigQuery slots without needing to provision new forms of compute

Adswerve is a leading Google Marketing, Analytics and Cloud partner on a mission to humanize data. Twiddy & Co. is Adswerve’s client – a vacation rental company in North Carolina. By combining structured and unstructured data, Twiddy and Adswerve used BigQuery ML to analyze images of rental listings and predict the click-through rate, enabling data-driven photo editorial decisions.

“Twiddy now has the capability to use advanced image analysis to stay competitive in an ever changing landscape of vacation rental providers – and can do this using their in-house SQL skills.” said Pat Grady, Technology Evangelist, Adswerve

Process unstructured data using remote functions

Customers today use remote functions (UDFs) to process structured data for languages and libraries that are not supported in BigQuery. We are extending this capability to process unstructured data using object tables.

Object tables provide signed URLs to allow remote UDFs running on Cloud Functions or Cloud Run to process the object table content. This is particularly useful for running Google’s pre-trained AI models, including Vision AI, Speech-to-Text, Document AI, open source libraries such as Apache Tika, or deploying your own custom models where performance SLAs are important.

Read More  Bi-Directional Replication For Cloud SQL For PostgreSQL Using Logical Replication

Here’s an example of an object table being created over PDF files that are parsed using an open source library running as a remote UDF.

SELECT uri, extract_title(samples.parse_tika(signed_url)) AS title
FROM EXTERNAL_OBJECT_TRANSFORM(TABLE pdf_files_object_table, 
                               ["SIGNED_URL"]);

Extending more BigQuery capabilities to unstructured data

Business intelligence – The results of analyzing unstructured data either directly in BigQuery ML or via UDFs can be combined with your structured data to build unified reports using Looker Studio (at no charge), Looker or any of your preferred BI solutions. This allows you to gain more comprehensive business insights. For example, online retailers can analyze product return rates by correlating them with the images of defective products. Similarly, digital advertisers can correlate ad performance with various attributes of ad creatives to make more informed decisions.

BigQuery search index – Customers are increasingly using the search functionality of BigQuery to power search use cases. These capabilities now extend to unstructured data analytics as well. Whether you use BigQueryML to produce inference on images or use remote UDFs with Doc AI to produce document extraction, the results can now be search indexed and used to support search access patterns.

Here’s an example of search index on data that is parsed from PDF files:

CREATE SEARCH INDEX my_index ON pdf_text_extract(ALL COLUMNS);

SELECT * FROM pdf_text_extract WHERE SEARCH(pdf_text, "Google");

Security and governance – We are extending BigQuery’s row-level security capabilities to help you secure objects in Google Cloud Storage. By securing specific rows in an object table, you can restrict the ability of end users to retrieve the signed URLs of corresponding URIs present in the table. This is a shared responsibility security model, for which administrators need to ensure that end users don’t have direct access to Google Cloud Storage, and use signed URLs from object tables as the only access mechanism.Here’s an example of a policy for PII images that are secured to be first processed through a blur pipeline:
CREATE ROW ACCESS POLICY pii_data ON object_table_images
GRANT TO ("group:[email protected]") 
FILTER USING (ARRAY_LENGTH(metadata)=1 AND   
              metadata[OFFSET(0)].name="face_detected")

Read More  Performance Considerations For Loading Data Into BigQuery
Soon, Dataplex will support object tables, allowing you to automatically create object tables in BigQuery and manage and govern unstructured data at scale.Data sharing – You can now use Analytics Hub to share unstructured data with partners, customers and suppliers while not compromising on security and governance. Subscribers can consume the rows of object tables that are shared with them, and use signed URLs for unstructured data objects.

Getting Started

Submit this form to try these new capabilities that unlock the power of your unstructured data in BigQuery. Watch this demo to learn more about these new capabilities.


Special thanks to engineering leaders Amir Hormati, Justin Levandoski and Yuri Volobuev for contributing to this post.

 

 

By: Gaurav Saxena (Sr. Product Manager, Google Cloud) and Thibaud Hottelier (Software Engineer, Google Cloud)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • Artificial Intelligence
  • BigQuery;
  • Data Analytics
  • Google Cloud
  • SQL
  • Tutorials
You May Also Like
Getting things done makes her feel amazing
View Post
  • Computing
  • Data
  • Featured
  • Learning
  • Tech
  • Technology

Nurturing Minds in the Digital Revolution

  • April 25, 2025
View Post
  • Engineering
  • Technology

Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials

  • March 9, 2025
View Post
  • Computing
  • Engineering

Why a decades old architecture decision is impeding the power of AI computing

  • February 19, 2025
View Post
  • Engineering
  • Software Engineering

This Month in Julia World

  • January 17, 2025
View Post
  • Engineering
  • Software Engineering

Google Summer of Code 2025 is here!

  • January 17, 2025
View Post
  • Data
  • Engineering

Hiding in Plain Site: Attackers Sneaking Malware into Images on Websites

  • January 16, 2025
View Post
  • Computing
  • Design
  • Engineering
  • Technology

Here’s why it’s important to build long-term cryptographic resilience

  • December 24, 2024
IBM and Ferrari Premium Partner
View Post
  • Data
  • Engineering

IBM Selected as Official Fan Engagement and Data Analytics Partner for Scuderia Ferrari HP

  • November 7, 2024

Stay Connected!
LATEST
  • college-of-cardinals-2025 1
    The Definitive Who’s Who of the 2025 Papal Conclave
    • May 7, 2025
  • conclave-poster-black-smoke 2
    The World Is Revalidating Itself
    • May 6, 2025
  • 3
    Conclave: How A New Pope Is Chosen
    • April 25, 2025
  • Getting things done makes her feel amazing 4
    Nurturing Minds in the Digital Revolution
    • April 25, 2025
  • 5
    AI is automating our jobs – but values need to change if we are to be liberated by it
    • April 17, 2025
  • 6
    Canonical Releases Ubuntu 25.04 Plucky Puffin
    • April 17, 2025
  • 7
    United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services
    • April 15, 2025
  • 8
    Tokyo Electron and IBM Renew Collaboration for Advanced Semiconductor Technology
    • April 2, 2025
  • 9
    IBM Accelerates Momentum in the as a Service Space with Growing Portfolio of Tools Simplifying Infrastructure Management
    • March 27, 2025
  • 10
    Tariffs, Trump, and Other Things That Start With T – They’re Not The Problem, It’s How We Use Them
    • March 25, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    IBM contributes key open-source projects to Linux Foundation to advance AI community participation
    • March 22, 2025
  • 2
    Co-op mode: New partners driving the future of gaming with AI
    • March 22, 2025
  • 3
    Mitsubishi Motors Canada Launches AI-Powered “Intelligent Companion” to Transform the 2025 Outlander Buying Experience
    • March 10, 2025
  • PiPiPi 4
    The Unexpected Pi-Fect Deals This March 14
    • March 13, 2025
  • Nintendo Switch Deals on Amazon 5
    10 Physical Nintendo Switch Game Deals on MAR10 Day!
    • March 9, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.