aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Public Cloud
  • Solutions

How Spam Detection Taught Us Better Tech Support

  • aster.cloud
  • November 15, 2021
  • 5 minute read

Information Technology teams, especially in help desk and support, need a way to track what problems people are having. Ideally they also can know how those problems change over time, especially when technology or policy shifts.

Imagine you are in charge of sending a newspaper delivery team to different neighborhoods. Each person has a bicycle, so you give them a route, and they leave the papers at the right doors. But the roads change. Every day they change. It’s chaos.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

What do you do when routes are changing constantly?  How do you provide the information needed when the context is shifting all the time?

In an IT context we run into similar challenges with traditional problem management frameworks such as ITIL 4, which tend to always assume a fixed, well-defined catalog of services. That way every issue that the IT folks solve is tracked and accounted for. That connection back to the catalog allows insight into what’s causing issues, or where outages or incidents may be impacting a large group of employees.

At Google we don’t have that. In part because we focus on putting the user first, and so we focus on getting people back to a productive state as job #1. Also because products, services and issues are always shifting–just like the roads, the route is never the same, even if the goal remains consistent. That means users come into our IT service desk with new problem types everyday.

Our tech support team, called Techstop, acts as the one-stop shop for all IT issues, and supports people across chat, email and video channels. They need to remain adaptable to new problems Googlers experience and new products they use. In order to track what problems might be on the rise, the Techstop team needs a way to catalog what tools, applications and services are in use at Google.

Thinking back to the newspaper delivery routes, we used a rough approximate map, rather than a very detailed one, giving us a taxonomy of services that was “good enough” for most of our use cases. We got some useful data out of it, but it didn’t give us very granular insight.

Read More  LearningMate Partners With Google Cloud To Bring Student Success Services To More Learners

Need for innovation

Covid-19 put a new focus on scalable problem understanding, specifically for everyday employee IT issues. With so much of the workforce moved to a work-from-home model, we really needed to know where employees were experiencing technology pain. It’s as if whole new neighborhoods popped into existence overnight, but our newspaper delivery crew was the same. More ground to cover, with totally novel street maps.

Furthermore, products used everyday for productivity, such as Google Meet, began to see exponential growth in usage, causing scaling issues and outages. These product teams looked to the Techstop organization to help them prioritize the ever increasing list of feature requests and bugs being filled every day.

Ultimately the “good enough” problem taxonomy failed to produce truly helpful insights. We could find out which products were being affected the most, but not what issues people were having with those products. Even worse, new issues that were unique to the work-from-home model were being hidden by the fact that the catalog could not update in time to catch the rapidly changing problem space underneath it.

Borrowing spam tech

Taking a look around other efforts at Google, the Techstop team found examples of solving a similar problem: detecting new patterns quickly in rapidly changing data.

Gmail handles spam filtering for over a billion people. Those engineers had thought through “how do we detect a new spam campaign quickly?”  Spammers rapidly send bulk messages with slight variations in content (noise, misspellings, etc.) Most classification attempts would become a game of cat and mouse since it takes classifiers some time to learn about new patterns.

classifier

Invoking a trend identification engine using unsupervised density clustering on unstructured text unlocked the ability for Gmail to detect ephemeral spam campaigns more quickly.

The Techstop problem had a similar shape to it. Issues caused by rapidly changing products caused highly dynamic user journeys for both employees and the IT professionals troubleshooting these issues. The tickets filed — like the spam emails — were similar, with slight differences in spelling and word choice.

Read More  Technology And Media Entities Join Forces To Create Standards Group Aimed At Building Trust In Online Content

Density clustering

In contrast to more rigid approaches, such as centroid-based algorithms like k-means, density-based clustering is better suited to large and highly heterogeneous data sets, which may contain clusters of drastically variant size. This flexibility helps us tackle the task of problem identification across the entire scope of the company, which requires the ability to detect and distinguish small-but-significant perturbations in the presence of large-but-stable patterns.

Our implementation uses ClustOn, an in-house technology with a hybrid approach that incorporates density-based clustering. But a more time-tested algorithm such as DBSCAN — an open-source implementation of which is available via scikit-learn’s clustering module — could be leveraged to similar effect.

Middle of the road solution using ML

Piggy-backing off of what Gmail was able to do using density clustering techniques, the Techstop team built a robust solution to tracking problems in a way that solved the rigid taxonomy problem. With density clustering, the taxonomy buckets are redefined as trending clusters and provide an index of issues happening in real-time within the company. Importantly, these buckets emerge naturally, rather than being defined ahead of time by the Engineering or Tech Support teams.

By using the technology built for billions of email accounts, we knew we could handle the scale of Google’s support requests. And the solutions would be more flexible than a tightly defined taxonomy, without compromising on relevance or granularity.

The team took it one step further by modeling cluster behavior using Poisson regression and implemented anomaly detection measures to alert operations teams in real time about ongoing outages, or poorly executed changes. With a lightweight operations team and this new technology, Techstop was able to find granular insights that would have taken an entire dedicated team to manually comb through and aggregate each incident.

Read More  Vimeo Builds A Fully Responsive Video Platform On Google Cloud

The combination of ML and Operations transformed Techstop data into a valuable reference for product managers and engineering teams looking to understand the issues users face with their products in an enterprise environment.

How it works

To bring it all together, we built a ML pipeline that we call Support Insights, so we could automatically distill summary data from the many interactions and tickets we received. The Support Insights Pipeline combines machine learning, human validation and probabilistic analysis  together in a single systems dynamics approach.

As data moves through this pipeline, they are:

  1. Extracted – Uses the BigQuery API to store and extract, train and load support data. To ingest the 1M+ amount of IT related support data.
  2. Processed Part-of-speech tagging, PII Redaction and TF-IDF transformations to model support data for our clustering algorithms
  3. Clustered Centroid-based clustering runs in timed batches with persistent snaphotting of previous run states to maintain cluster ids and track behavior of clusters over time.
  4. Scored Uses Poisson Regression to model both long-term and short-term behavior of cluster trends and calculates the difference between the two to measure deviation. This deviation score is used to detect anomalous behavior within a trend.
  5. Operationalized Trends with an anomalous score over a certain threshold trigger an IssueTracker API bug. This bug is then picked up by operations teams for relevant deep dive and incident tracking.
  6. Resampled – Uses statistical methods to estimate proportions of customer user journeys (CUJs) within trends
  7. Categorized/mapped – We work with the Operations teams to map trend proportions to User Journey Segments
how it works

In our next post we’ll detail what technologies and methods we used for these seven steps, and walk through how you could use a similar pipeline yourself. To get started start by loading your data into BigQuery and use BigQuery ML to cluster your support data.

 

 

By Nicholaus Jackson, Efficiency Solutions Engineer | Max Saltonstall Senior Developer Relations Engineer, Google Cloud
Source Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • BigQuery;
  • Google Cloud
  • Spam
You May Also Like
View Post
  • Computing
  • Public Cloud
  • Technology

United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services

  • April 15, 2025
DeepSeek R1 is now available on Azure AI Foundry and GitHub
View Post
  • Public Cloud
  • Technology

DeepSeek R1 is now available on Azure AI Foundry and GitHub

  • February 2, 2025
Cloud platforms among the clouds
View Post
  • Computing
  • Learning
  • Public Cloud

Best Cloud Platforms Offering Free Trials for Cloud Mastery

  • December 23, 2024
Vehicle Manufacturing
View Post
  • Hybrid Cloud
  • Public Cloud

Toyota shifts into overdrive: Developing an AI platform for enhanced manufacturing efficiency

  • December 10, 2024
IBM and AWS
View Post
  • Public Cloud

IBM and AWS Accelerate Partnership to Scale Responsible Generative AI

  • December 2, 2024
COP29 AI and Climate Change
View Post
  • Public Cloud
  • Technology

How Cloud And AI Are Bringing Scale To Corporate Climate Mitigation And Adaptation

  • November 18, 2024
Cloud Workstations
View Post
  • Public Cloud

FEDRAMP High Development in the Cloud: Code with Cloud Workstations

  • November 8, 2024
View Post
  • Public Cloud

PyTorch/XLA 2.5: vLLM support and an improved developer experience

  • October 31, 2024

Stay Connected!
LATEST
  • college-of-cardinals-2025 1
    The Definitive Who’s Who of the 2025 Papal Conclave
    • May 7, 2025
  • conclave-poster-black-smoke 2
    The World Is Revalidating Itself
    • May 6, 2025
  • 3
    Conclave: How A New Pope Is Chosen
    • April 25, 2025
  • Getting things done makes her feel amazing 4
    Nurturing Minds in the Digital Revolution
    • April 25, 2025
  • 5
    AI is automating our jobs – but values need to change if we are to be liberated by it
    • April 17, 2025
  • 6
    Canonical Releases Ubuntu 25.04 Plucky Puffin
    • April 17, 2025
  • 7
    United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services
    • April 15, 2025
  • 8
    Tokyo Electron and IBM Renew Collaboration for Advanced Semiconductor Technology
    • April 2, 2025
  • 9
    IBM Accelerates Momentum in the as a Service Space with Growing Portfolio of Tools Simplifying Infrastructure Management
    • March 27, 2025
  • 10
    Tariffs, Trump, and Other Things That Start With T – They’re Not The Problem, It’s How We Use Them
    • March 25, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    IBM contributes key open-source projects to Linux Foundation to advance AI community participation
    • March 22, 2025
  • 2
    Co-op mode: New partners driving the future of gaming with AI
    • March 22, 2025
  • 3
    Mitsubishi Motors Canada Launches AI-Powered “Intelligent Companion” to Transform the 2025 Outlander Buying Experience
    • March 10, 2025
  • PiPiPi 4
    The Unexpected Pi-Fect Deals This March 14
    • March 13, 2025
  • Nintendo Switch Deals on Amazon 5
    10 Physical Nintendo Switch Game Deals on MAR10 Day!
    • March 9, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.