Analytics data is growing exponentially and so is the dependence on the data in making critical business and product decisions. In fact, the best decisions are said to be the ones which are backed by data. In data, we trust!

But do we trust the data?

As data volumes have grown, one of the key challenges organizations face is how to maintain data quality in a scalable and consistent way across the organization. Data quality is not a new need, but it was easier to contain when the data footprint was small and data consumers were few. In such a world, data consumers knew who the producers were and producers knew what the consumers needed. Today, data ownership is becoming distributed and data consumption is finding new users and use cases. Existing data quality approaches are therefore limited and isolated to certain pockets of the organization. This often exposes data consumers to inconsistent and inaccurate data, which ultimately affects the decisions made from that data. As a result, organizations today are losing tens of millions of dollars due to low-quality data.

These organizations are looking for solutions that empower their data producers to consistently create high-quality data at cloud scale.

Building Trust with Dataplex data quality

Earlier this year, at Google Cloud, we launched Dataplex, an intelligent data fabric that enables governance and data management across distributed data at scale. One of the key things Dataplex enables out of the box is for data producers to build trust in their data with built-in data quality.

The Dataplex data quality task delivers a declarative, DataOps-centric experience for validating data across BigQuery and Google Cloud Storage. Producers can now build and publish quality reports, or include data validations as part of their data production pipeline. Reports can be aggregated across various data quality dimensions, and the execution is entirely serverless.

The Dataplex data quality task provides:

A declarative approach for defining “what good looks like” that can be managed as part of a CI/CD workflow.
A serverless and managed execution with no infrastructure to provision.
Ability to validate across data quality dimensions like freshness, completeness, accuracy and validity.
Flexibility in execution: either by using the Dataplex serverless scheduler (at no extra cost) or by executing the data validations as part of a pipeline (e.g. Apache Airflow).
Incremental execution – so you save time and money by validating new data only.
Secure and performant execution with zero data-copy from BigQuery environments and projects.
Programmatic consumption of quality metrics for DataOps workflows.
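
To make the declarative idea more concrete, here is a minimal Python sketch. It is not the Dataplex or Cloud Data Quality Engine rule format: the rule names, the `RULES` and `BINDINGS` structures, the `validate` function and the sample rows are all hypothetical. It only illustrates describing "what good looks like" as data, tagging each rule with a quality dimension, and optionally validating only rows newer than a given timestamp, which is one simple way to think about incremental execution.

```python
import re
from datetime import datetime

# Hypothetical rule definitions: each rule is plain data plus a small check,
# tagged with the data quality dimension it reports under.
RULES = {
    "not_null": {
        "dimension": "completeness",
        "check": lambda v: v is not None,
    },
    "valid_sku": {
        "dimension": "validity",
        "check": lambda v: v is not None
        and re.fullmatch(r"[A-Z]{3}-\d{4}", v) is not None,
    },
    "non_negative": {
        "dimension": "accuracy",
        "check": lambda v: v is not None and v >= 0,
    },
}

# Hypothetical bindings: which rules apply to which column.
BINDINGS = [
    {"column": "sku", "rules": ["not_null", "valid_sku"]},
    {"column": "price", "rules": ["not_null", "non_negative"]},
]


def validate(rows, bindings, rules, updated_after=None, time_column="updated_at"):
    """Return one summary dict per (column, rule) pair.

    If updated_after is given, only rows whose time_column value is newer
    than it are validated.
    """
    if updated_after is not None:
        rows = [r for r in rows if r[time_column] > updated_after]
    results = []
    for binding in bindings:
        column = binding["column"]
        for rule_id in binding["rules"]:
            rule = rules[rule_id]
            failed = sum(1 for r in rows if not rule["check"](r.get(column)))
            results.append({
                "column": column,
                "rule_id": rule_id,
                "dimension": rule["dimension"],
                "rows_validated": len(rows),
                "failed_count": failed,
            })
    return results


# Made-up sample rows, for illustration only.
ROWS = [
    {"sku": "ABC-1234", "price": 19.99, "updated_at": datetime(2022, 1, 1)},
    {"sku": "bad", "price": -5.0, "updated_at": datetime(2022, 1, 2)},
    {"sku": None, "price": 7.5, "updated_at": datetime(2022, 1, 3)},
]

full_run = validate(ROWS, BINDINGS, RULES)
incremental_run = validate(ROWS, BINDINGS, RULES, updated_after=datetime(2022, 1, 1))
```

Because the rules and bindings are plain data, they could be kept in version control and reviewed like any other change, which is the property the CI/CD point above relies on.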

Users can also execute these checks on data that is stored in BigQuery and Google Cloud Storage but is not yet organized with Dataplex. For Google Cloud Storage data that is managed by Dataplex, Dataplex auto-detects and auto-creates tables for structured and semi-structured data. These tables can be referenced with the Dataplex data quality task as well.

Behind the scenes, Dataplex uses an open source data quality engine, Cloud Data Quality Engine, to run these checks. Providing an open platform is one of our key goals, and we have contributed to this engine so that it integrates seamlessly with Dataplex’s metadata and serverless environment.

You can learn more about this in our product documentation.

Building enterprise trust at American Eagle Outfitters

One of our enterprise customers, American Eagle Outfitters (AEO), is continuing to build trust in their critical data using the Dataplex data quality task. Kanhu Badtia, lead data engineer at AEO, shares their rationale and experience with it:

“AEO is a leading global specialty retailer offering high-quality & on-trend clothing under its American Eagle® and Aerie® brands. Our company operates stores in the United States, Canada, Mexico, and Hong Kong, and ships to 81 countries worldwide through its websites.

We are a data-driven organization that utilizes data from physical and digital store fronts, from social media channels, from logistics/delivery partners and many other sources through established compliant processes. We have a team of data scientists and analysts who create models, reports and dashboards that inform responsible business decision-making on such matters as inventory, promotions, new product launches and other internal business reviews. As the data engineering team at AEO, our goal is to provide highly trusted data for our internal data consumers.

Before Dataplex, AEO had methods for maintaining data quality that were effective for their purpose. However, those methods did not scale with the continual expansion of data volume and the demand for quality results from our data consumers. Internal data consumers identified and reported quality issues where ‘bad data’ was impacting business-critical dashboards and reports. As a result, our teams were often in “fire-fighting” mode, finding and fixing bad data. We were looking for a solution that would standardize and scale data quality across the production data pipelines.

The majority of AEO’s business data is in Google’s BigQuery or in Google Cloud Storage (GCS). When Dataplex launched the data quality capabilities, we immediately started a proof-of-concept. After a careful evaluation, we decided to use it as the central data quality framework for production pipelines. We liked that:

It provides an easy declarative (YAML) & flexible way of defining data quality. We were able to parameterize it to use across multiple tables.
It allows validating data in any BigQuery table with a completely serverless and native execution using existing slot reservations.
It allows executing these checks as part of the ETL pipelines using Dataplex Airflow operators. This is a huge win, as pipelines can now pause further processing if critical rules do not pass.
Data quality checks are executed in parallel which gives us the required execution efficiency in pipelines.
Data quality results are stored centrally in BigQuery & can be queried to identify which rules failed/succeeded and how many rows failed. This enables defining custom thresholds for success.
Organizing data in Dataplex Lakes is optional when using Dataplex data quality.

Our team truly believes that data quality is an integral part of any data-driven organization and Dataplex DQ capabilities align perfectly with that fundamental principle.

For example, here is a sample Google Cloud Composer / Airflow DAG that loads & validates the “item_master” table and stops downstream processing if the validation fails.

It includes simple rules for uniqueness, completeness and more complex rules for referential integrity or business rules such as checking daily price variance. We publish all data quality results centrally to a BigQuery table, such as this:

We query this output table for data quality issues & fail the pipeline in case of critical rule failure. This stops low quality data from flowing downstream.

We now have a repeatable process for data validation that can be used across the key data production pipelines. It standardizes the data production process and effectively ensures that bad data doesn’t break downstream reports and analytics.”
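
The gating step described in the quote above can be sketched in Python as follows. This is an illustration, not AEO's pipeline code: the field names (`rule_id`, `rows_validated`, `failed_count`), the rule names, the `CRITICAL_THRESHOLDS` mapping and the `DataQualityError` exception are assumptions, and the summary rows are hard-coded where a real pipeline would read them from the BigQuery results table.

```python
class DataQualityError(Exception):
    """Raised when a critical rule exceeds its allowed failure rate."""


# Hypothetical thresholds: the maximum tolerated fraction of failed rows
# for each rule treated as critical. Rules not listed never fail the run.
CRITICAL_THRESHOLDS = {
    "item_id_unique": 0.0,
    "price_variance": 0.01,
}


def failure_rate(row):
    """Fraction of validated rows that failed; 0.0 when nothing was validated."""
    if row["rows_validated"] == 0:
        return 0.0
    return row["failed_count"] / row["rows_validated"]


def find_breaches(summary_rows, thresholds):
    """Return (rule_id, rate, limit) for each critical rule over its limit."""
    breaches = []
    for row in summary_rows:
        limit = thresholds.get(row["rule_id"])
        if limit is None:
            continue  # not a critical rule
        rate = failure_rate(row)
        if rate > limit:
            breaches.append((row["rule_id"], rate, limit))
    return breaches


def enforce(summary_rows, thresholds):
    """Raise DataQualityError if any critical rule breaches its threshold."""
    breaches = find_breaches(summary_rows, thresholds)
    if breaches:
        details = ", ".join(
            f"{rule_id}: {rate:.2%} failed (limit {limit:.2%})"
            for rule_id, rate, limit in breaches
        )
        raise DataQualityError(f"Critical data quality rules failed: {details}")


# Made-up summary rows standing in for the query result.
SUMMARY = [
    {"rule_id": "item_id_unique", "rows_validated": 1000, "failed_count": 0},
    {"rule_id": "price_variance", "rows_validated": 1000, "failed_count": 25},
    {"rule_id": "description_not_blank", "rows_validated": 1000, "failed_count": 400},
]

breaches = find_breaches(SUMMARY, CRITICAL_THRESHOLDS)
```

In an Airflow task, an uncaught exception marks the task as failed, so calling something like `enforce` from a Python task would keep downstream tasks from running under the default trigger rule.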

Learn more

Here at Google, we are excited to enable our customers’ journey to high-quality, trusted data. To learn more about our current data quality capabilities, please refer to the Dataplex product documentation.

By: Sandeep Karmarkar (Product Manager, Google Cloud) and Kanhu Badtia (Sr. Data Engineer, American Eagle Outfitters)
Source: Google Cloud Blog

Building Trust In The Data With Dataplex
