aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Data
  • Tools

Using BigQuery With Data Sources In Google Cloud VMware Engine

  • aster.cloud
  • December 8, 2021
  • 6 minute read

This blog is intended for customers who have migrated on-premises data sources to Google Cloud VMware Engine and want to utilize data and analytics services provided by Google Cloud. One of the objectives of customers who choose Google Cloud is to leverage Google Cloud analytics with their datasets. If you are an IT decision maker or a data architect who wants to quickly use the power of your data with Google analytics, this blog describes approaches to access your data within BigQuery, where advanced analytics and machine learning on your datasets is possible.

Why?

Data consumption and analytics is at the forefront of technology. Customers today consume and manage large amounts of data and resource pools. These challenges create an opportunity for Google Cloud to assist in managing and understanding your existing databases without having the need to undergo costly re-architecting of your source material or data location. This blog presents approaches to access Google Cloud data and analytics services with your existing data without having to re-architect your databases. Once your data sources are in Google Cloud VMware Engine, Google’s highly available and fault tolerant infrastructure can be leveraged to enhance the performance of data pipelines. These solutions aim to reduce time to value extraction from your datasets with cloud native analytics available via BigQuery.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

This solution of migrating via Google Cloud VMware Engine offers advantages to all parts of data operations. The database administrator (DBA) and virtual infrastructure/cloud admins can use familiar environments similar to on-premises on the cloud. The on-premises infrastructure team can enable the data scientist/AI/machine learning (ML) teams using familiar toolsets. These teams now have access to Google Cloud AI/ML/data analytics capabilities for their on-premises data.

 

 

For example, if you want to uncover cross-sell opportunities within your products, the first step is to ensure that product usage and billing datasets across your products are connected for analytics. The DBA team will identify these datasets and the infrastructure team will enable access to these sources. The application team will then replicate this data to BigQuery and use approaches such as BigQuery ML recommendations to uncover cross-sell opportunities. Another example of a use case is forecasting usage growth for operations and growth planning. Once your sales data is replicated within BigQuery, approaches for advanced time-series forecasting become available with your datasets.

Read More  Change Streams For Cloud Spanner: Now Generally Available

What does this cover?

We present approaches to replicate your relational datasets within BigQuery in a private and secure way utilizing either Google Cloud Data Fusion or Google Cloud Datastream. Datafusion is an ETL tool that supports various kinds of data pipelines. Datastream is a service for change-data-capture and replication. Using both these services, data is always within your projects in Google Cloud and internal IP is used to access data. We will focus on real-time replication, so that you can access your data continuously from operational data stores, such as SQL Server, MySQL, and Oracle within BigQuery.

Moving data from your data sources to the cloud and maintaining data pipelines to your data warehouses via Extract Transform Load (ETL) is a time consuming activity. An alternate approach is ELT (Extract Load Transform). The ELT approach loads data into the target system (e.g., BigQuery) before transforming the data. The ELT process is frequently preferred over the traditional ETL process because it’s simpler to realize and loads the data faster.

With your datasets now residing in the Google Cloud, data teams can utilize Cloud Data Fusion and Datastream over the high speed, low latency Google Cloud network to replicate or move data from your VMware infrastructure to various destinations in Google Cloud such as Google Cloud native storage buckets or BigQuery.

For simplicity, we will assume that all services are consumed within the same project. We will also discuss some pricing implications when moving data from Google Cloud VMware Engine from on-premises or another virtual private cloud (VPC).

 

 

Cloud Data Fusion:

Cloud Data Fusion provides a visual point-and-click interface that enables code-free deployment of ETL/ELT data pipeline. Cloud Data Fusion also provides a replication accelerator that allows you to replicate your tables into BigQuery.

Cloud Data Fusion internally sets up a tenant project with its own VPCs to manage Cloud Data Fusion resources. To access data sources within Google Cloud VMware Engine using Cloud Data Fusion, we use a reverse proxy on the main VPC. This is described in the image below.

Read More  Know More, Spend Less: How GKE Cost Optimization Insights Help You Optimize Kubernetes

 

 

In this scenario, we have our data workloads running on the Google Cloud VMware Engine instance within the project. The Google Cloud VMware Engine environment is accessed via a project level VPC peered with Google Cloud VMware Engine. A Google Compute Engine instance on the project level VPC exposes reverse proxy to the Google Cloud VMware Engine database to services that are unable to access the Google Cloud VMware Engine instance directly. A Cloud Data Fusion instance is enabled with private IP access and network peering to the main VPC and is able to access the data via the reverse proxy instance. This process to set up internal IP access and network peering on Cloud Data Fusion is described in this documentation.

Once this peering is complete, we use a Java Database Connectivity connector within Cloud Data Fusion to access our databases either for replication or for advanced ETL operations. To enable change data capture, we need to enable the database within Google Cloud VMware Engine to track and capture the changes to the databases. This entire process setup and replication are described in the documentation for MySQL and for SQL Server.

Google Cloud Datastream:

Datastream is a serverless change data capture and replication service. You can access streaming, low-latency data from Oracle, and MySQL databases on Google Cloud VMware Engine. This approach offers more flexibility in managing data flow pipelines. This solution is currently in pre-general availability and is only available in select regions.

This option also requires a reverse proxy configured within a Google Compute Engine instance. This reverse proxy is used to access data sources within Google Cloud VMware Engine. This option is described in this documentation.

 

 

 

 

The complete setup to use Datastream can be found in this how-to guide. To enable replication, we need a stream configured on Datastream, this stream accesses data from the database and pipes the data to the cloud storage sink. Datastream accesses data using  a reverse proxy which needs to be exposed on the customer’s VPC. To pipe the data to BigQuery, we use a pre-configured Datastream to BigQuery template within Dataflow.

Read More  How To Install Ansible In Ubuntu

How to get started?

First step is to migrate workloads to Google Cloud VMware Engine. Your cloud admin/architect will typically drive this. If not already identified during the migration phase, the next step is to identify databases residing on virtual machines hosted within Google Cloud VMware Engine, and recreate existing reports using BigQuery. In most organizations there will be multiple personas involved with this process. For example, a data architect might be the best source for info on data sources, a solutions architect will have insights on the cost/performance implications, and the infrastructure inputs will be needed for network interfaces. The steps below outline one possible approach to enable this motion.

  1. Identify datasets residing on virtual machines migrated to Google Cloud VMware Engine that are used for reports.
  2. Select the right pipeline (Datastream vs. Data Fusion) based on the database type and the pipeline requirements (price/performance trade offs and ease of use).
  3. Based on the data pipeline, select the appropriate region. There are no data egress charges within the same region.
  4. Setup the reverse proxy to the Google Cloud VMware Engine dataset.
  5. Setup the replication service with performance parameters based on the replication performance needed.
  6. Enable analytics and visualization based on the business requirements on the dataset.

Conclusion:

The Google Cloud VMware Engine service is a fast and easy way to enable data and analytics visualization using your existing data sets. You can now leverage your existing infrastructure operational posture on VMware to enable cloud analytics without having to undergo time consuming re-architecting of your databases. These approaches enable you to leverage the performance benefits of dedicated hardware on Google Cloud, connecting with the world’s most advanced data capabilities.


Acknowledgements:

The authors would like to thank Manoj Sharma and Sai Gopalan regarding their inputs on this blog.

 

 

By: Nitish Murthy (Product Manager, Google Cloud) and Wade Holmes (Solutions Manager – VMware-aaS Global Lead)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • BigQuery;
  • Cloud Data Fusion
  • Cloud Datastream
  • Google Cloud
  • VMware
You May Also Like
Getting things done makes her feel amazing
View Post
  • Computing
  • Data
  • Featured
  • Learning
  • Tech
  • Technology

Nurturing Minds in the Digital Revolution

  • April 25, 2025
View Post
  • Data
  • Engineering

Hiding in Plain Site: Attackers Sneaking Malware into Images on Websites

  • January 16, 2025
IBM and Ferrari Premium Partner
View Post
  • Data
  • Engineering

IBM Selected as Official Fan Engagement and Data Analytics Partner for Scuderia Ferrari HP

  • November 7, 2024
dotlah-smartnation-singapore-lawrence-wong
View Post
  • Data
  • Enterprise
  • Technology

Growth, community and trust the ‘building blocks’ as Singapore refreshes Smart Nation strategies: PM Wong

  • October 8, 2024
nobel-prize-popular-physics-prize-2024-figure1
View Post
  • Data
  • Featured
  • Technology

They Used Physics To Find Patterns In Information

  • October 8, 2024
zedreviews-Apple-iPhone-16-Pro-finish-lineup-240909
View Post
  • Featured
  • Gears
  • Tech
  • Technology
  • Tools

Apple debuts iPhone 16 Pro and iPhone 16 Pro Max

  • September 10, 2024
zedreviews-Apple-AirPods-Active-Noise-Cancellation-240909
View Post
  • Featured
  • Gears
  • Tech
  • Technology
  • Tools

Apple introduces AirPods 4 and the world’s first all-in-one hearing health experience with AirPods Pro 2

  • September 10, 2024
goswifties_number-crunching_202405_wm
View Post
  • Data
  • Featured

Of Nuggets And Tenders. To Know Or Not To Know, Is Not The Question. How To Become, Is.

  • May 25, 2024

Stay Connected!
LATEST
  • college-of-cardinals-2025 1
    The Definitive Who’s Who of the 2025 Papal Conclave
    • May 7, 2025
  • conclave-poster-black-smoke 2
    The World Is Revalidating Itself
    • May 6, 2025
  • 3
    Conclave: How A New Pope Is Chosen
    • April 25, 2025
  • Getting things done makes her feel amazing 4
    Nurturing Minds in the Digital Revolution
    • April 25, 2025
  • 5
    AI is automating our jobs – but values need to change if we are to be liberated by it
    • April 17, 2025
  • 6
    Canonical Releases Ubuntu 25.04 Plucky Puffin
    • April 17, 2025
  • 7
    United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services
    • April 15, 2025
  • 8
    Tokyo Electron and IBM Renew Collaboration for Advanced Semiconductor Technology
    • April 2, 2025
  • 9
    IBM Accelerates Momentum in the as a Service Space with Growing Portfolio of Tools Simplifying Infrastructure Management
    • March 27, 2025
  • 10
    Tariffs, Trump, and Other Things That Start With T – They’re Not The Problem, It’s How We Use Them
    • March 25, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    IBM contributes key open-source projects to Linux Foundation to advance AI community participation
    • March 22, 2025
  • 2
    Co-op mode: New partners driving the future of gaming with AI
    • March 22, 2025
  • 3
    Mitsubishi Motors Canada Launches AI-Powered “Intelligent Companion” to Transform the 2025 Outlander Buying Experience
    • March 10, 2025
  • PiPiPi 4
    The Unexpected Pi-Fect Deals This March 14
    • March 13, 2025
  • Nintendo Switch Deals on Amazon 5
    10 Physical Nintendo Switch Game Deals on MAR10 Day!
    • March 9, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.