- as Intent or Form Parameters extracted from end users during the conversation,
- as Session Parameters set by upstream systems calling the Dialogflow CX API, set by Webhooks, or set as part of the design of a route, event handler, or form reprompt, and
- as payload data supplied by Webhooks interacting with downstream services.
Ideally, sensitive information should be identified and redacted at source so that it does not propagate into downstream logs, data warehouses, data lakes, analytics, or reporting systems. Below, we describe an approach to redaction used in production by large enterprises deploying Google Contact Center AI (CCAI).
Redacting Intent and Form Parameters
For Intent or Form Parameters, redaction is built-in. Simply select the checkbox “Redact in logs” in the Parameter section within the Intent or Page Parameter settings of the console.
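For teams that manage agents as code, the same setting is exposed in the Dialogflow CX v3 REST API as the boolean `redact` field on form parameters (`Form.Parameter`) and intent parameters (`Intent.Parameter`). Below is a minimal sketch of a page form parameter with redaction turned on; the helper function, parameter name, and prompt are hypothetical, and the resulting object would be sent as part of a page create or patch request.

```python
import json


def form_parameter(display_name: str, entity_type: str, prompt: str,
                   redact: bool = True) -> dict:
    """Build a Page form parameter in the Dialogflow CX v3 REST shape.

    `redact` is the API counterpart of the console's "Redact in logs" checkbox.
    """
    return {
        "displayName": display_name,
        "entityType": entity_type,
        "required": True,
        "redact": redact,
        "fillBehavior": {
            "initialPromptFulfillment": {
                "messages": [{"text": {"text": [prompt]}}]
            }
        },
    }


# Hypothetical example using the @sys.address system entity by its API name.
param = form_parameter(
    "new_address",
    "projects/-/locations/-/agents/-/entityTypes/sys.address",
    "What is your new street address?",
)
print(json.dumps(param, indent=2))
```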
Redacting Session Parameters, Webhook data, and Response Messages
For Session Parameters, Webhook data, and other data logged by Dialogflow CX, including Fulfillment Response Messages, the approach to redact such information relies on Cloud Data Loss Prevention (DLP) inspection templates.
Session Parameters are often used to personalize the conversation with user data from an upstream system. For example, an upstream contact center platform may fetch the user’s profile from a CRM, and pass in the first name, demographic data, and market segment information into Dialogflow CX. A conversation designer may then tailor the Flow design by changing Intent training phrases, Entity synonyms, and responses (e.g. different durations, volume, pitch, or rate of speech) to fit the user’s unique requirements.
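As an illustration, an upstream system passes such profile data through the `queryParams.parameters` field of a Dialogflow CX v3 `detectIntent` request. The sketch below assumes the upstream system wraps sensitive values in the same SSML mark tags used for Webhook data in this post, so the DLP template can also find them; the CRM field names and helper functions are hypothetical.

```python
import json

# Tag names follow the console example in this post; adjust to your own convention.
REDACT_START = '<mark name="redact-start"/>'
REDACT_END = '<mark name="redact-end"/>'


def wrap(value: str) -> str:
    """Surround a sensitive value with SSML mark tags so DLP can find it in logs."""
    return f"{REDACT_START}{value}{REDACT_END}"


def detect_intent_body(user_text: str, profile: dict, sensitive: set) -> dict:
    """Build a detectIntent request body that carries session parameters.

    `profile` stands in for data fetched from a CRM; keys in `sensitive` are wrapped.
    """
    parameters = {
        key: wrap(value) if key in sensitive else value
        for key, value in profile.items()
    }
    return {
        "queryInput": {"text": {"text": user_text}, "languageCode": "en-ca"},
        "queryParams": {"parameters": parameters},
    }


body = detect_intent_body(
    "I'm moving next month",
    {"first_name": "Alex", "market_segment": "premium"},  # hypothetical CRM fields
    sensitive={"first_name"},
)
print(json.dumps(body, indent=2))
```

Only the first name is wrapped here; non-sensitive attributes such as the market segment stay readable in the logs.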
Similarly, Webhook data is important in conversation design because it enables rich, dynamic responses to the user supported by backend systems. For example, let’s say a customer is moving to a new apartment, so your Dialogflow CX Virtual Agent asks the user to say their new street address. A Webhook would be used to validate the captured address against an external service like the Google Maps Places API, which may also autocomplete the city, state / province, zip / postal code, and country fields. Capturing the wrong address is risky, so the Virtual Agent reads the full address back to the end user for confirmation.
In both examples above, PII data is stored as one or more Session Parameters and Webhook payloads. Additionally, the Response Messages played back to the user are logged. If we don’t take action to identify and redact this data, it will make its way into Google Cloud Logging (formerly Stackdriver) and any listeners subscribing to the log stream.
Below, we demonstrate how we can configure security settings in Dialogflow CX to use a Cloud Data Loss Prevention Inspection Template to redact sensitive information before it gets into downstream logging systems (i.e. redaction at source). This ensures sensitive information will be unavailable downstream while still allowing the information to be used in the design of the Virtual Agent.
Key Components
Data Loss Prevention (DLP) Inspection Templates
Our solution uses Google Cloud Data Loss Prevention (DLP), which is a service that can identify, mask, obfuscate, de-identify, transform, or tokenize sensitive information in text using NLP- and rules-based methods. To leverage DLP to redact all log data from Dialogflow CX at source, we create configurations (also known as Inspection Templates) that can identify and transform unstructured text information in a document. In our case, the documents are the log messages that contain the Session Parameters, Webhook data, Fulfillment Response Messages and any other interaction data. To identify PII, PCI, PHI, or CI, we can set the configuration to use a pre-trained machine learning model (i.e. built-in infoTypes) or a custom string search (i.e. word lists or regex).
Speech Synthesis Markup Language (SSML)
Our solution uses Speech Synthesis Markup Language (SSML). A brief explanation of SSML is included in the paragraph below:
When working with Text-to-Speech (TTS) systems, it is difficult to know how the system will say the final utterance to a user. This is where SSML is useful. SSML is a W3C standard that uses XML tags to describe, at various points, how the TTS system must say the phrase. You can change the pitch, pronunciation, speaking rate, and volume, among many other properties. For example, if you have a phone number written as “555-6666”, you likely want it said as “five five five six six six six” instead of “five hundred and fifty five minus six thousand six hundred and sixty six”. You can give the TTS system these precise instructions by adding the following SSML:
<say-as interpret-as="telephone">555-6666</say-as>
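A minimal sketch of building that SSML string in code. It wraps the output in the `<speak>` root element that SSML documents use, and XML-escapes the value so characters such as `&` cannot break the markup. The helper names are hypothetical.

```python
from xml.sax.saxutils import escape


def say_as_telephone(number: str) -> str:
    """Wrap a phone number so the TTS engine reads it as a telephone number."""
    return f'<say-as interpret-as="telephone">{escape(number)}</say-as>'


def speak(*fragments: str) -> str:
    """Join SSML fragments with spaces inside the <speak> root element."""
    return "<speak>" + " ".join(fragments) + "</speak>"


ssml = speak("Your new number is", say_as_telephone("555-6666") + ".")
print(ssml)
# <speak>Your new number is <say-as interpret-as="telephone">555-6666</say-as>.</speak>
```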
Contact Center AI (CCAI) Security Settings
CCAI Security Settings allow you to apply a DLP Inspection Template between Dialogflow CX and Google Cloud Logging. DLP then finds and redacts the sensitive information before it is published to Cloud Logging.
Solution
The required security settings can be applied in various ways, such as through the Google Cloud Console, using Google Cloud APIs, or using Terraform.
Below, we outline two approaches: 1) using the Google Cloud Console and 2) using Terraform.
Important Considerations
The first, seemingly obvious but flawed, solution is to use DLP or a similar system to redact sensitive information in the first downstream system that consumes the Dialogflow CX log messages. Perhaps there is a log sink flowing to a Cloud Storage bucket, BigQuery table, Pub/Sub topic, or other destination (e.g. Splunk) where such redaction will occur before any other consumers have access to the data. In practice, however, data in Cloud Logging is easily viewable and propagates to other monitoring applications, which increases the surface area for unintentional or intentional privacy breaches by both internal and external parties. Please consider this an anti-pattern.
Another important note is that the solution we select should still enable sensitive information, including PII data, to be usable in responses to the end user and should remain compatible with SSML.
Instructions – Google Cloud Console
Now that we understand the requirements and all the components involved, the first step is to wrap all Session Parameter and Webhook data that should be redacted in the SSML mark tags shown below. This is configured at the webhook level.
<mark name="redact-start"/>123 Main Street<mark name="redact-end"/>
The mark element is chosen because it is defined in the W3C SSML specification and does not affect the speech output of TTS systems, so the data can still be used in Response Messages by the Dialogflow CX Agent. The value of the name attribute can be anything that fits your naming convention, as long as it matches the pattern in your DLP inspection template.
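A minimal sketch of the webhook side, assuming a Python webhook that returns the Dialogflow CX `WebhookResponse` JSON. The validated address is wrapped in mark tags before it is written to a Session Parameter, so designers can still interpolate it (for example as `$session.params.new_address`) while the DLP template can find it in the logs. The parameter name and the `validate_address` stub are hypothetical; a real webhook would call an address service such as the Places API.

```python
import json

REDACT_START = '<mark name="redact-start"/>'
REDACT_END = '<mark name="redact-end"/>'


def wrap(value: str) -> str:
    """Mark a sensitive value for redaction without changing how TTS speaks it."""
    return f"{REDACT_START}{value}{REDACT_END}"


def validate_address(raw: str) -> str:
    """Hypothetical stand-in for an address validation call; only normalizes case."""
    return raw.strip().title()


def webhook_response(request: dict) -> dict:
    """Build a WebhookResponse that stores the wrapped address as a Session Parameter."""
    raw = request["sessionInfo"]["parameters"]["new_address"]
    address = wrap(validate_address(raw))
    return {
        "sessionInfo": {"parameters": {"new_address": address}},
        "fulfillmentResponse": {
            "messages": [{"text": {"text": [f"I heard {address}. Is that correct?"]}}]
        },
    }


request = {"sessionInfo": {"parameters": {"new_address": "123 main street"}}}
print(json.dumps(webhook_response(request), indent=2))
```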
Next, in a DLP inspection template, define a custom infoType with a regular expression that searches for these tags: <mark name="redact-start"/>.*?<mark name="redact-end"/>. The non-greedy .*? ends each match at the first closing tag, so text between two separately tagged values in the same log entry is not redacted.
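To check a pattern before saving it in the template, you can approximate the DLP finding locally with Python's `re` module. This is only a sketch: DLP uses RE2 syntax and applies its own replacement, so the `[REDACTED]` placeholder is an assumption. It also shows why a greedy `.*` is risky when one log line contains two tagged values.

```python
import re

# Python's re and RE2 treat these simple patterns the same way.
GREEDY = re.compile(r'<mark name="redact-start"/>.*<mark name="redact-end"/>')
LAZY = re.compile(r'<mark name="redact-start"/>.*?<mark name="redact-end"/>')

log_line = (
    'Name: <mark name="redact-start"/>Alex<mark name="redact-end"/>, '
    'segment: premium, '
    'address: <mark name="redact-start"/>123 Main Street<mark name="redact-end"/>'
)

# Greedy: one match from the first start tag to the last end tag,
# so the non-sensitive "segment: premium" is redacted too.
print(GREEDY.sub("[REDACTED]", log_line))
# Name: [REDACTED]

# Non-greedy: each tagged value is redacted on its own.
print(LAZY.sub("[REDACTED]", log_line))
# Name: [REDACTED], segment: premium, address: [REDACTED]
```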
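Instructions – Terraform
The Terraform configuration below creates the DLP inspection template with the google_data_loss_prevention_inspect_template resource, creates the Dialogflow CX security settings with the restapi_object resource, and applies them to the Dialogflow CX Agent. It assumes that the Google Cloud provider has already been correctly configured.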
First we create the DLP inspection template:
```hcl
# DLP inspection template that matches values wrapped in the SSML redaction tags.
# It must be in the same location as the agent's security settings.
resource "google_data_loss_prevention_inspect_template" "dialogflow-cx-inspection-regex" {
  provider     = google.global
  parent       = "projects/${var.project_id}/locations/${var.your_agent_location}"
  description  = "Redacts data coming from webhooks to dialogflow cx"
  display_name = "dialogflow-cx-inspection-regex"

  inspect_config {
    custom_info_types {
      info_type {
        name = "DIALOGFLOW_CX_INSPECTION_REGEX"
      }
      likelihood = "VERY_UNLIKELY"
      regex {
        # Non-greedy .*? so two tagged values in one log entry are redacted separately.
        pattern = "<mark name=\"dfc-redact-start\"\\/>.*?<mark name=\"dfc-redact-end\"\\/>"
      }
    }
    min_likelihood  = "VERY_UNLIKELY"
    content_options = ["CONTENT_TEXT"]
  }
}
```
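The same template can also be created through the Cloud DLP REST API (`projects.locations.inspectTemplates.create`), one of the Google Cloud API options mentioned earlier. A minimal sketch of the request URL and body, mirroring the Terraform fields in camelCase; the project ID, location, and template ID are placeholders, and the pattern uses a non-greedy `.*?` so text between two tagged values is not matched.

```python
import json
import re

PATTERN = r'<mark name="dfc-redact-start"/>.*?<mark name="dfc-redact-end"/>'


def inspect_template_request(project_id: str, location: str) -> tuple:
    """Return (url, body) for creating the DLP inspect template via the v2 REST API."""
    parent = f"projects/{project_id}/locations/{location}"
    url = f"https://dlp.googleapis.com/v2/{parent}/inspectTemplates"
    body = {
        "templateId": "dialogflow-cx-inspection-regex",
        "inspectTemplate": {
            "displayName": "dialogflow-cx-inspection-regex",
            "description": "Redacts data coming from webhooks to Dialogflow CX",
            "inspectConfig": {
                "customInfoTypes": [{
                    "infoType": {"name": "DIALOGFLOW_CX_INSPECTION_REGEX"},
                    "likelihood": "VERY_UNLIKELY",
                    "regex": {"pattern": PATTERN},
                }],
                "minLikelihood": "VERY_UNLIKELY",
                "contentOptions": ["CONTENT_TEXT"],
            },
        },
    }
    return url, body


re.compile(PATTERN)  # sanity check: compiles in Python (DLP itself uses RE2)
url, body = inspect_template_request("my-project", "northamerica-northeast1")
print(url)
print(json.dumps(body, indent=2))
```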
Next, we use restapi_object, which will create the security settings for us. We declare the provider along with the resource configuration:
```hcl
terraform {
  required_providers {
    restapi = {
      source  = "fmontezuma/restapi"
      version = "1.14.1"
    }
  }
}

# Generic REST provider pointed at the regional Dialogflow CX endpoint.
# The region in the URI must match var.your_agent_location.
provider "restapi" {
  alias                = "security-settings"
  uri                  = "https://northamerica-northeast1-dialogflow.googleapis.com/v3"
  id_attribute         = "name"
  write_returns_object = true
  headers = {
    Authorization = "Bearer ${var.google_rest_access_token}"
  }
}

resource "restapi_object" "dialogflow-cx-security-settings" {
  provider = restapi.security-settings
  # Can be left blank: the object's id (its full resource name,
  # /projects/${var.project_id}/locations/${var.your_agent_location}/securitySettings/<security settings ID>) is substituted.
  path        = ""
  create_path = "/projects/${var.project_id}/locations/${var.your_agent_location}/securitySettings"
  data = jsonencode({
    displayName       = "dialogflow-cx-parameters-security"
    redactionStrategy = "REDACT_WITH_SERVICE"
    redactionScope    = "REDACT_DISK_STORAGE"
    inspectTemplate   = google_data_loss_prevention_inspect_template.dialogflow-cx-inspection-regex.id
    purgeDataTypes    = ["DIALOGFLOW_HISTORY"] # a repeated field in the API, so a list
  })
}
```
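Without the restapi provider, the same security settings can be created with a single POST to the Dialogflow CX v3 API. A minimal sketch using only the standard library: it builds the request but does not send it. The access token, project, location, and template name are placeholders, and the regional host must match the agent's location.

```python
import json
import urllib.request


def dialogflow_host(location: str) -> str:
    """Regional agents use a regional host; the global location uses the bare host."""
    if location == "global":
        return "dialogflow.googleapis.com"
    return f"{location}-dialogflow.googleapis.com"


def security_settings_request(project_id, location, inspect_template, token):
    """Build (but do not send) a securitySettings.create request."""
    url = (f"https://{dialogflow_host(location)}/v3/projects/{project_id}"
           f"/locations/{location}/securitySettings")
    body = {
        "displayName": "dialogflow-cx-parameters-security",
        "redactionStrategy": "REDACT_WITH_SERVICE",
        "redactionScope": "REDACT_DISK_STORAGE",
        "inspectTemplate": inspect_template,
        "purgeDataTypes": ["DIALOGFLOW_HISTORY"],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


loc = "northamerica-northeast1"
req = security_settings_request(
    "my-project", loc,
    f"projects/my-project/locations/{loc}/inspectTemplates/dialogflow-cx-inspection-regex",
    token="ACCESS_TOKEN",  # placeholder, e.g. from `gcloud auth print-access-token`
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the `name` field of the response is the value to attach to the agent.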
Finally, we apply the dialogflow-cx-security-settings to the Dialogflow CX agent by referencing the security settings from above:
```hcl
resource "google_dialogflow_cx_agent" "my_dialogflow_cx_agent_resource" {
  provider                   = google.your_gcp_provider
  project                    = var.project_id
  display_name               = "My Agent Name"
  location                   = var.your_agent_location
  default_language_code      = var.agent_default_language
  supported_language_codes   = ["en-ca"]
  time_zone                  = "America/New_York"
  description                = "Your Agent Description"
  avatar_uri                 = "https://cloud.google.com/_static/images/cloud/icons/favicons/onecloud/super_cloud.png"
  enable_stackdriver_logging = true
  # Full resource name of the security settings created above
  security_settings          = restapi_object.dialogflow-cx-security-settings.id
}
```
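If the agent already exists and is not managed by Terraform, the security settings could instead be attached with a PATCH on the agent resource that updates only the `securitySettings` field. A minimal sketch, again building but not sending the request; the agent ID, settings ID, and token are placeholders.

```python
import json
import urllib.parse
import urllib.request


def attach_security_settings(agent_name: str, location: str,
                             settings_name: str, token: str):
    """Build (but do not send) an agents.patch request that sets securitySettings only."""
    host = ("dialogflow.googleapis.com" if location == "global"
            else f"{location}-dialogflow.googleapis.com")
    query = urllib.parse.urlencode({"updateMask": "securitySettings"})
    url = f"https://{host}/v3/{agent_name}?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps({"securitySettings": settings_name}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )


loc = "northamerica-northeast1"
req = attach_security_settings(
    f"projects/my-project/locations/{loc}/agents/my-agent-id",
    loc,
    f"projects/my-project/locations/{loc}/securitySettings/my-settings-id",
    token="ACCESS_TOKEN",
)
print(req.get_method(), req.full_url)
```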
Conclusion
In this blog post, we demonstrated how to redact sensitive information using CCAI Security Settings and DLP, configured either through the Google Cloud Console or with Terraform. For a Dialogflow CX developer, this solution makes redaction easy to configure. Remember that before data is applied to a Session Parameter, it should be surrounded by the <mark name="redact-start"/> and <mark name="redact-end"/> tags. Conversation designers can still interpolate the parameter as expected without affecting the TTS speech output, and sensitive information will be redacted from the logs without losing any of the other log data, including the non-sensitive parts of the conversation responses.
Deloitte is a Premier Partner for Contact Center AI
This post was written by Deloitte Canada’s Conversational AI practice and Google Cloud. Deloitte is a Premier Partner of Google Cloud and has been recognized as Google Cloud’s Global Services Partner of the Year for four consecutive years (2017-2020), and the Global Industry Solution Partner of the Year in 2021.
Deloitte is a global leader in Contact Center AI (CCAI) strategy, implementation, and operations, bringing end-to-end expertise in strategy, transformation, architecture, design, software engineering, data science, machine learning, analytics, cloud ops, and security. Deloitte is partnered with Google Cloud to deliver complex transformations of your digital channels and service operations with AI and Natural Language Processing (NLP).
Learn More
Want to try out DLP for yourself? Try this tutorial. If you are interested in learning more about the above approach or want to discuss Google Contact Center AI, please reach out to the authors at Deloitte Canada or on LinkedIn.
Special thanks to Miguel Mendez, Conversational AI Architect, Deloitte, for contributing to this post.
By: Vijul Patel (Partner Engineer, Google) and Anand Nimkar (Conversational AI Practice Leader, Deloitte)
Source: Google Cloud Blog