aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • People

How To Use Google Cloud To Find And Protect PII

  • aster_cloud
  • September 27, 2022
  • 4 minute read

BigQuery is a leading data warehouse solution in the market today, and is valued by customers who need to gather insights and advanced analytics on their data. Many common BigQuery use cases involve the storage and processing of Personal Identifiable Information (PII)—data that needs to be protected within Google Cloud from unauthorized and malicious access.

Too often, the process of finding and identifying PII in BigQuery data relies on manual PII discovery and duplication of that data. One common way to do this is by taking an extract of columns used for PII and copying them into a separate table with restricted access. However, creating unnecessary copies of this data and processing it manually to identify PII increases the risks of failure and subsequent security events.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

In addition, the security of PII data is often mandated by multiple regulations and failure to apply appropriate safeguards may result in heavy penalties. To address this issue, customers need solutions that 1) identify PII in BigQuery and 2) automatically implement access control on that data to prevent unauthorized access and misuse, all without having to duplicate it.

This blog will discuss a solution developed by Google Professional Services for leveraging Google Cloud DLP to inspect and classify sensitive data and suggest a solution for using these insights to automatically tag and protect data in BigQuery tables.

BigQuery Auto Tagging solution overview

Automatic DLP can help to identify sensitive data, such as PII, in BigQuery. Organizations can leverage Automatic DLP to automatically search across their entire BigQuery data warehouse for tables that contain sensitive data fields and report detailed findings in the console (see Figure 1 below,) in Data Studio, and in a structured format (such as a BigQuery results table.) Newly created and updated tables can be discovered, scanned, and classified automatically in the background without a user needing to invoke or schedule it. This way you have an ongoing view into your sensitive data.

Read More  Small Footprint, Big Impact: Running Cloud-Connected Kubernetes At The Edge
Figure 1: Visibility of sensitive data fields in BigQuery

 

In this blog, we show how a new open source solution called BigQuery Auto Tagging Solution solves our second goal—automating access control on data. This solution sits as a layer on top of Automatic DLP and automatically enforces column-level access controls to restrict access to specific sensitive data types based on user-defined data classification taxonomies (such as high confidentiality or low confidentiality) and domains (such as Marketing, Finance, or ERP System.) This solution minimizes the risk of unrestricted access to PII and ensures that there is only one copy of data maintained with appropriate access control applied down to the column level.

The code for this solution is available on Github at GoogleCloudPlatform/bq-pii-classifier. Please note that while Google does not maintain this code, you can reach out to your Sales Representative to get in contact with our Professional Services team for guidance on how to implement it.

BigQuery and Data Catalog Policy Tags (now Dataplex) have some limitations that you should be aware of before implementing this solution to ensure that it will work for your organization:

  • Taxonomies and Policy Tags are not shared across regions: If you have data in multiple regions you will need to create or replicate your taxonomy in each region that you want to apply policy tags.
  • Maximum number of 40 taxonomies per project: If you require different taxonomies for different business domains or have replications to support multiple Cloud regions those will count against this quota.
  • Maximum number of 100 policy tags per taxonomy: Cloud DLP supports up to 150 infoTypes for classification, however, a single policy taxonomy can only support up to 100 including any nested categories. If you need to support more than 100 data types, you may need to split these across more than one taxonomy.
Read More  Pro Tools For Pros: Industry Leading Observability Capabilities For Dataflow

High-level overview of the solution

Figure 2: High level architecture of solution

 

The solution is composed mainly of the following components: Dispatcher Requests topic, Dispatcher service, BigQuery Policy Tagger Requests topic, and BigQuery Policy Tagger service and logging components.

The Dispatcher Service is a Cloud Run service that expects a BigQuery scope to be expressed as inclusion and exclusion lists of projects, datasets, and tables. This Dispatcher service will query Automatic DLP Data Profiles to check if the tables in-scope have data profiles generated. For these tables, it will publish one request per table to the “BigQuery Policy Tagger Requests” PubSub topic. This topic enables rate limiting of BigQuery column tagging operations and apply auto-retries with backoffs.

The “BigQuery Policy Tagger” Service is also a Cloud Run service that receives the information of the DLP scan results of a BigQuery table. This service will determine the final InfoType of each column and apply the appropriate Policy Tags as defined in the InfoType – Policy Tags mapping. Only-one INFO_TYPE is selected and the function assigns the associated policy tag.

Lastly, all Cloud Run services maintain structured logs that are exported by a log sink to BigQuery. There are multiple BigQuery views that help with monitoring and debugging Cloud Run call chains and tagging actions on columns.

Deployment options

After deploying the solution, it can be used in two different ways:

[Option 1] Automatic DLP-triggered immediate tagging:

Figure 3: Deployment option 1 – Automatic DLP triggered tagging and inspection

 

Automatic DLP is configured to send a Pub/Sub notification on each inspection job completion. The Pub/Sub notification includes the resource name and it triggers the Tagger service directly.

Read More  Twilio Welcomes Deloitte Digital As A Premier Global Systems Integrator Focused On Elevating Human Experience And Accelerating Digital Transformation

[Option 2] Scheduled tagging:

Figure 4: Deployment option 2 – Scheduled tagging and inspection

 

In this scenario, the Dispatcher service is invoked on a schedule with a payload representing a BigQuery scope to list inspected tables by Automatic DLP and create a tagging request per table. You could use Cloud Scheduler (or any Orchestration tool) to invoke the Dispatcher service. If the solution is deployed within a VPC-SC perimeter, other schedulers that support VPC-SC should be used (such as Composer or Custom App.)

In addition, more than one Cloud Scheduler/Trigger could be defined to group projects/datasets/tables that have the same tagging schedule (such as daily or monthly.)

To learn more about Automatic DLP, see our webpage. To learn more about the BQ classifier, see the open source project on Github: GoogleCloudPlatform/bq-pii-classifier and get started today!

 

 

By: Karim Wadie (Strategic Cloud Engineer) and Nikhiel Sawhney (Head of EMEA Security Practice)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster_cloud

Related Topics
  • Certifications
  • Education
  • Google Cloud
  • Training
You May Also Like
Coffee | Laptop | Notebook | Work
View Post
  • People

First HP Work Relationship Index Shows Majority of People Worldwide Have an Unhealthy Relationship with Work

  • September 20, 2023
japanese-zen-garden-jennifer-goolsby-d9hhl8JXySg-unsplash
View Post
  • Featured
  • People

Overcome Laziness With These 7 Japanese Productivity Hacks

  • August 4, 2023
View Post
  • People
  • Technology

Twitter Is Now X. Here’s What That Means.

  • July 25, 2023
bench-aaron-burden-2bg1jPty490-unsplash
View Post
  • People

The 7 Types of Rest You Need to Recharge

  • July 21, 2023
View Post
  • Engineering
  • People
  • Technology

Oppenheimer: July 28 Panel Discussion Focuses On The Man Behind The Movie

  • July 19, 2023
Man walking on pedestrian by Ryoji Iwata
View Post
  • People

Behind The AI Revolution – Then, Now, And The Future – Are People.

  • July 17, 2023
israel-palestine-conflict-20230705
View Post
  • People

Empire’s Echoes. Unravelling The Gordian Knot Of The Israeli-Palestinian Conflict.

  • July 5, 2023
strikes-protest-communists_wins_strikes_against_police_1e5e4a54-aa2b-43ba-8b88-fc71c1a65387
View Post
  • People

Balancing Act. Understanding The Pros And Cons Of Persistent Industrial Action.

  • July 5, 2023

Stay Connected!
LATEST
  • 1
    Nvidia H100 Tensor Core GPUs Come To Oracle Cloud
    • September 24, 2023
  • 2
    Combining AI With A Trusted Data Approach On IBM Power To Fuel Business Outcomes
    • September 21, 2023
  • 3
    Start Your Ubuntu Confidential VM With Intel® TDX On Google Cloud
    • September 20, 2023
  • Microsoft and Adobe 4
    Microsoft And Adobe Partner To Deliver Cost Savings And Business Benefits
    • September 20, 2023
  • Coffee | Laptop | Notebook | Work 5
    First HP Work Relationship Index Shows Majority of People Worldwide Have an Unhealthy Relationship with Work
    • September 20, 2023
  • 6
    Oracle Expands Distributed Cloud Offerings to Help Organizations Innovate Anywhere
    • September 20, 2023
  • 7
    Huawei Connect 2023: Accelerating Intelligence For Shared Success
    • September 20, 2023
  • 8
    Huawei Releases Data Center 2030, Leading Innovation and Development of New Data Centers
    • September 20, 2023
  • Penguin 9
    How To Find And Fix Broken Packages On Linux
    • September 19, 2023
  • Volkswagen 10
    Volkswagen Races Toward Next-Gen Automotive Manufacturing Leadership With Google Cloud And T-Systems
    • September 19, 2023
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    VMware Scales Multi-Cloud Security With Workforce Identity Federation
    • September 18, 2023
  • Intel Innovation 2
    Intel Innovation 2023
    • September 15, 2023
  • 3
    Microsoft And Oracle Expand Partnership To Deliver Oracle Database Services On Oracle Cloud Infrastructure In Microsoft Azure
    • September 14, 2023
  • 4
    Real-Time Ubuntu Is Now Available In AWS Marketplace
    • September 12, 2023
  • 5
    IBM Brings Watsonx To ESPN Fantasy Football With New Waiver Grades And Trade Grades
    • September 13, 2023
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.