aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Big Data
  • Data
  • Design
  • Engineering

Intro To Data Science On Google Cloud

  • aster.cloud
  • February 8, 2022
  • 7 minute read

While you likely know that data science is the practice of making data useful, you may not have a clear landscape around the tools that can aid each stage of the data science workflow as you use machine learning to tackle your challenges.

Click to enlarge

 


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

Read on to discover the six broad areas that are critical to the process of making data useful, and some corresponding Google Cloud products and services for those areas.

 

Data engineering

Perhaps the greatest missed opportunities in data science stem from  data that exists somewhere, but hasn’t been made accessible for use in further analysis. Laying the critical foundation for downstream systems, data engineering involves the transporting, shaping, and enriching of data for the purposes of making it available and accessible.

Data ingestion and data preprocessing on Google Cloud

Here we consider data ingestion as moving data from one place to another, and data preparation the process of transformation, augmentation, or enrichment prior to consumption. Global scalability, high throughput, real-time access, and robustness are common challenges in this stage. For scalable, real-time, and batch data processing, look into building data ingestion and preprocessing pipelines with Dataflow, a managed Apache Beam service. There’s a reason why Dataflow is called the backbone of analytics on Google Cloud.
If you’re looking for a scalable messaging system to help you ingest data, consider Cloud Pub/Sub, a global, horizontally scalable messaging infrastructure. Cloud Pub/Sub was built using the same infrastructure component that enabled Google products, including Ads, Search, and Gmail, to handle hundreds of millions of events per second.
If you want an easy way to automate data movement to BigQuery, a serverless data warehouse on Google Cloud, look into the BigQuery Data Transfer Service. For transferring data to Cloud Storage, take a look at the Storage Transfer Service. Or, for a no-code data ingestion and transformation tool, check out Data Fusion, which has over 150 preconfigured connectors and transformations. In addition to Dataflow and Data Fusion for data preparation, Spark users may want to look at related products and features for Spark on Google Cloud.

Data storage and data cataloging on Google Cloud

For structured data, consider a data warehouse like BigQuery, or any of the Cloud Databases (relational ones like Cloud SQL and NoSQL ones like Cloud BigTable and Cloud Firestore). For unstructured data, you can always use Cloud Storage. You may also want to consider a data lake. For data discovery, cataloging, and metadata management, consider Data Catalog. For a unified solution, take a look at Dataplex, which integrates a unified data management solution with an integrated analytics experience.

Learn more about data engineering on Google Cloud

  • Explore the data engineering learning path
  • Discover reference patterns
  • Get certified by Google Cloud as a Professional Data Engineer
Click to enlarge

 

Data Analysis

From descriptive statistics to visualizations, data analysis is where the value of data starts to appear.

Read More  Why Managed Container Services Help Startups And Tech Companies Build Smarter

Data exploration, data preprocessing, and data insights

Data exploration, a highly iterative process, involves slicing and dicing data via data preprocessing before data insights can start to manifest through visualizations or simply via simple group-by, order-by operations. One hallmark of this phase is that the data scientist may not yet know which questions to ask about the data. In this somewhat ephemeral phase, a data analyst or scientist has likely uncovered some aha-moments, but hasn’t shared them yet. Once insights are shared, the flow enters the Insights Activation stage, where those insights become used to guide business decisions, influence consumer choices, or become embedded in other applications or services.

On Google Cloud, there are many ways to explore, preprocess, and uncover insights in your data. If you are looking for a notebook-based end-to-end data science environment, check out Vertex AI Workbench, which enables you to access, analyze, and visualize your entire data estate: from structured data at the petabyte-scale in SQL with BigQuery, to processing data with Spark on Google Cloud and its serverless, auto-scaling, and GPU acceleration capabilities. As a unified data science environment, Vertex AI Workbench also makes it easy to do machine learning with TensorFlow, PyTorch, and Spark, with built-in MLOps capabilities.

Finally, if your focus is on analyzing structured data from data warehouses and insight activation for business intelligence, you may want to also consider using Looker, with its rich interactive analytics, visualizations, dashboarding tools, and Looker Blocks to help you accelerate your time-to-insight.

Learn more about data analysis on Google Cloud

  • Learn about Vertex AI Workbench for a Jupyter-based fully managed notebook environment
  • Learn about how you can use BigQuery for petabyte-scale data analysis
  • Learn about Spark on Google Cloud
  • Discover the data analyst learning path
  • Explore reference patterns for common analytics use cases

Model development

From linear regression to XGBoost, from TensorFlow to PyTorch, the model development stage is where machine learning starts to provide new ways of unlocking value from your data. Experimentation is a strong theme here, with data scientists looking to accelerate iteration speed between models without worrying about infrastructure overhead or context-switching between tools for data analysis and tools for productionizing models with MLOps.

To solve these challenges, once again, as a Jupyter-based fully managed, scalable, and enterprise-ready environment, Vertex AI Workbench makes it easy as the one-stop-shop for data science, combining analytics and machine learning, including Vertex AI services. Apache Spark, XGBoost, TensorFlow, and PyTorch are just some of the frameworks supported on Vertex AI Workbench. Vertex AI Workbench makes managing the underlying compute infrastructure needed for model training easy with the ability to scale vertically and horizontally, and with idle timeouts and auto shutdown capabilities to reduce unnecessary costs. Notebooks themselves can be used for distributed training and hyperparameter optimization, and they include Git integration for version control. Due to the significant reduction in context switching required, data scientists can build and train models 5x faster using Vertex AI Workbench than when using traditional notebooks.

Read More  Getting Started With Google Cloud Logging Python V3.0.0

With Vertex AI, custom models can be trained and deployed using containers. You can take advantage of pre-built containers or custom containers to train and deploy your models.

For low-code model development, data analysts and data scientists can use SQL with BigQuery ML to train and deploy models (including XGBoost, deep neural networks, and PCA),  directly using BigQuery’s built-in serverless, autoscaling capabilities. Behind-the-scenes, BigQuery ML leverages Vertex AI to enable automated hyperparameter tuning, and explainable AI. For no-code model development, Vertex AI Training provides a point-and-click interface to train powerful models using AutoML, which comes in multiple flavors: AutoML Tables, AutoML Image, AutoML Text, AutoML Video, and AutoML Translation.

Learn more about model development on Google Cloud

  • Learn about Vertex AI Workbench for a Jupyter-based fully managed notebook environment
  • Learn more about Vertex AI

ML engineering

Once a satisfactory model is developed, the next step is to incorporate all the activities of a well-engineered application lifecycle, including testing, deployment, and monitoring. And all of those activities should be as automated and robust as possible.

Managed datasets and Feature Store on Vertex AI provide shared repositories for datasets and engineered features, respectively, which provide a single source of truth for data and promote reuse and collaboration within and across teams. Vertex AI’s model serving capability enables deployment of models with multiple versions, automatic capacity scaling, and user-specified load balancing. Finally, Vertex AI Model Monitoring provides the ability to monitor prediction requests flowing into a deployed model and automatically alert model owners whenever the production traffic deviates beyond user-defined thresholds and previous historical prediction requests.

MLOps is the industry term for modern, well engineered ML services, with scalability, monitoring, reliability, automated CI/CD, and many other characteristics and functions that are now taken for granted in the application domain. The ML engineering features provided by Vertex AI are informed by Google’s extensive experience deploying and operating internal ML services. Our goal with Vertex AI is to provide everyone with easy access to essential MLOps services and best practices.

Learn more about ML engineering and MLOps on Google Cloud

  • Follow the guides, tutorials and documentation for Vertex AI
  • Watch this video to learn more about Vertex AI
  • Discover the data scientist/machine learning engineer learning path
  • Get certified as a Professional ML Engineer

Insights activation

The insights activation stage is where your data has now become useful to other teams and processes. You can use Looker and Data Studio to enable use cases in which data is used to influence business decisions with charts, reports, and alerts.

Read More  Understanding Transactional Locking In Cloud Spanner

Data can also influence customer decisions and as a result increase usage or decrease churn, for example. Finally, the data can also be used by other services to drive insights; these services can run outside Google Cloud, inside Google Cloud on Cloud Run or Cloud Functions, and/or using Apigee API Management as an interface.

Learn more about insights activation on Google Cloud

  • Watch this video to learn about building interactive ML apps using Looker and Vertex AI
  • Learn about Looker, and Looker solutions for eCommerce, Digital Media and more
  • Discover a gallery of interactive dashboards created with Data Studio
  • Watch this video to understand the difference between Cloud Run and Cloud Functions

Orchestration

All of the capabilities discussed above provide the key building blocks to a modern data science solution, but a practical application of those capabilities requires orchestration to automatically manage the flow of data from one service to another. This is where a combination of data pipelines, ML pipelines, and MLOps comes into play.  Effective orchestration reduces the amount of time that it takes to reliably go from data ingestion to deploying your model in production, in a way that lets you monitor and understand your ML system.

For data pipeline orchestration, Cloud Composer and Cloud Scheduler are both used to kick off and maintain the pipeline.

For ML pipeline orchestration, Vertex AI Pipelines is a managed machine learning service that enables you to increase the pace at which you experiment with and develop machine learning models and the pace at which you transition those models to production. Vertex Pipelines is serverless, which means that you don’t need to deal with managing an underlying GKE cluster or infrastructure. It scales up when you need it to, and you pay only for what you use. In short, it lets you just focus on building your data science pipelines.

Learn more about orchestration on Google Cloud

  • Read more about Cloud Composer for Airflow-based pipelines
  • Try some example notebooks on Github with Vertex AI Pipelines
  • Learn different ways to trigger Vertex AI Pipeline runs
  • Read the whitepaper on Practitioners Guide to MLOps: A framework for continuous delivery and automation of machine learning

Summary

Google Cloud offers a complete suite of data management, analytics, and machine learning tools to generate insights from data. Want to learn more? Check out the following resources:

  • Building the data science driven organization from the first principles

Special thanks to the following contributors to this blogpost: Alok Pattani, Brad Miro, Saeed Aghabozorgi, Diptiman Raichaudhuri, Reza Rokni.

 

 

By: Priyanka Vergadia (Developer Advocate, Google), Polong Lin (Developer Advocate) and Marc Cohen (Developer Advocate)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • Data Analysis
  • Data Engineering
  • Data Science
  • Design
  • Google Cloud
  • Machine Learning
You May Also Like
Getting things done makes her feel amazing
View Post
  • Computing
  • Data
  • Featured
  • Learning
  • Tech
  • Technology

Nurturing Minds in the Digital Revolution

  • April 25, 2025
View Post
  • Engineering
  • Technology

Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials

  • March 9, 2025
View Post
  • Computing
  • Engineering

Why a decades old architecture decision is impeding the power of AI computing

  • February 19, 2025
View Post
  • Engineering
  • Software Engineering

This Month in Julia World

  • January 17, 2025
View Post
  • Engineering
  • Software Engineering

Google Summer of Code 2025 is here!

  • January 17, 2025
View Post
  • Data
  • Engineering

Hiding in Plain Site: Attackers Sneaking Malware into Images on Websites

  • January 16, 2025
View Post
  • Computing
  • Design
  • Engineering
  • Technology

Here’s why it’s important to build long-term cryptographic resilience

  • December 24, 2024
IBM and Ferrari Premium Partner
View Post
  • Data
  • Engineering

IBM Selected as Official Fan Engagement and Data Analytics Partner for Scuderia Ferrari HP

  • November 7, 2024

Stay Connected!
LATEST
  • college-of-cardinals-2025 1
    The Definitive Who’s Who of the 2025 Papal Conclave
    • May 7, 2025
  • conclave-poster-black-smoke 2
    The World Is Revalidating Itself
    • May 6, 2025
  • oracle-ibm 3
    IBM and Oracle Expand Partnership to Advance Agentic AI and Hybrid Cloud
    • May 6, 2025
  • 4
    Conclave: How A New Pope Is Chosen
    • April 25, 2025
  • Getting things done makes her feel amazing 5
    Nurturing Minds in the Digital Revolution
    • April 25, 2025
  • 6
    AI is automating our jobs – but values need to change if we are to be liberated by it
    • April 17, 2025
  • 7
    Canonical Releases Ubuntu 25.04 Plucky Puffin
    • April 17, 2025
  • 8
    United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services
    • April 15, 2025
  • 9
    Tokyo Electron and IBM Renew Collaboration for Advanced Semiconductor Technology
    • April 2, 2025
  • 10
    IBM Accelerates Momentum in the as a Service Space with Growing Portfolio of Tools Simplifying Infrastructure Management
    • March 27, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    Tariffs, Trump, and Other Things That Start With T – They’re Not The Problem, It’s How We Use Them
    • March 25, 2025
  • 2
    IBM contributes key open-source projects to Linux Foundation to advance AI community participation
    • March 22, 2025
  • 3
    Co-op mode: New partners driving the future of gaming with AI
    • March 22, 2025
  • 4
    Mitsubishi Motors Canada Launches AI-Powered “Intelligent Companion” to Transform the 2025 Outlander Buying Experience
    • March 10, 2025
  • PiPiPi 5
    The Unexpected Pi-Fect Deals This March 14
    • March 13, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.