
Add Severity Levels To Your Alert Policies In Cloud Monitoring

  • aster.cloud
  • April 10, 2022
  • 5 minute read

When you are dealing with a situation that fires a bevy of alerts, do you instinctively know which alerts are the most pressing? Severity levels are an important alerting concept that helps you and your team assess which notifications to prioritize. You can use these levels to focus on the issues most critical to your operations and triage through the noise. Today, we’re happy to announce that you can create custom severity levels on your alert policies and have this data included in your notifications, for more effective alerting and better integration with downstream third-party services (e.g., webhooks, Cloud Pub/Sub, PagerDuty).

Several notification channels now accept this data, including email, webhooks, Cloud Pub/Sub, and PagerDuty, and support for Slack is planned for a later release. This enables further automation and customization based on severity wherever the notifications are consumed.



Below, we’ll walk through examples of how to add static and dynamic severity levels to an Alert Policy.

Create user labels to support static severity levels

When you add user labels on an alert policy, they will appear on every notification and incident generated by that alert policy. Refer to the documentation to see how to add user labels to alert policies via the Alert Policy API.

Let’s walk through an example: suppose you want to configure Alert Policies that notify you when the CPU utilization crosses a particular threshold. Further, you want the notifications to indicate the following severity levels:

  • info when CPU utilization is between 70% and 80%
  • warning when CPU utilization is between 80% and 90%
  • critical when CPU utilization is above 90%

To accomplish this, you can create three separate alert policies with user labels defined as follows:

Create alert policy (A), which triggers when CPU utilization is above 90%, with the following user labels. Any incident generated by this policy will include the label severity with the value critical.

"userLabels": {
  "severity": "critical"
}

Create a second policy (B), which triggers when CPU utilization is above 80%, with the following user labels. Any incident generated by this policy will include the label severity with the value warning.

"userLabels": {
  "severity": "warning"
}

Create a third policy (C), which triggers when CPU utilization is above 70%, with the following user labels. Any incident generated by this policy will include the label severity with the value info.

"userLabels": {
  "severity": "info"
}
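
Because the three policies differ only in threshold and label, their request bodies can be generated from a small table. The sketch below is an illustration, not an official client: it assumes the Cloud Monitoring v3 REST field names for an alert policy (displayName, combiner, conditions, conditionThreshold, userLabels), a hypothetical display-name scheme, a 60-second duration, and that the CPU utilization metric is reported as a fraction (0.9 means 90%). Notification channels are omitted.

```python
import json
from typing import Dict, List, Tuple

# (policy id, severity label, threshold as a fraction of 1.0)
POLICY_SPECS: List[Tuple[str, str, float]] = [
    ("A", "critical", 0.90),
    ("B", "warning", 0.80),
    ("C", "info", 0.70),
]

# Monitoring filter selecting Compute Engine CPU utilization.
CPU_FILTER = (
    'metric.type="compute.googleapis.com/instance/cpu/utilization" '
    'AND resource.type="gce_instance"'
)


def build_policy(policy_id: str, severity: str, threshold: float) -> Dict:
    """Return an alert policy body carrying a static severity user label."""
    return {
        # Hypothetical naming scheme, e.g. "CPU above 90% (A)".
        "displayName": f"CPU above {threshold:.0%} ({policy_id})",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"CPU utilization > {threshold:.0%}",
                "conditionThreshold": {
                    "filter": CPU_FILTER,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    "duration": "60s",  # assumed value
                },
            }
        ],
        # Static label copied onto every incident and notification.
        "userLabels": {"severity": severity},
    }


policies = [build_policy(*spec) for spec in POLICY_SPECS]
for policy in policies:
    print(json.dumps(policy["userLabels"]))
```

Each dictionary could then be submitted through the Monitoring API's alertPolicies.create method, for example with the google-cloud-monitoring client library; that step is not shown here.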

In this scenario, when CPU utilization crosses 90%, policies A, B, and C all trigger alerts. If utilization falls back to 85%, the incident from policy A closes, but the incidents from policies B and C remain open. If it falls further to 75%, the incident from policy B closes, and the incident from policy C remains open. If it drops to 40%, the incidents generated by all three policies close automatically.
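
To check the open/close sequence above, the sketch below models each static policy as a plain threshold test. It is a simplification that assumes an incident opens the moment utilization exceeds the threshold and closes the moment it no longer does, ignoring the condition's duration window and evaluation delays.

```python
from typing import Dict, List

# Thresholds (in percent) for the three static policies described above.
POLICIES: Dict[str, float] = {"A": 90, "B": 80, "C": 70}


def open_incidents(utilization_pct: float) -> List[str]:
    """Return the policies whose condition is violated at this utilization."""
    return sorted(
        name for name, threshold in POLICIES.items() if utilization_pct > threshold
    )


for cpu in (92, 85, 75, 40):
    print(cpu, open_incidents(cpu))
```

Because the policies are independent, every policy whose threshold is exceeded has its own open incident, which is why three incidents are open at 92%.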

Use MQL to create dynamic severity levels

Alert policy user labels are static, meaning you cannot apply them dynamically based on a changing value. As shown earlier, you need three separate alert policies to generate notifications whose user label severity has the value:

  • info when utilization is between 70% and 80%,
  • warning when it is between 80% and 90%, and
  • critical when it is above 90%.

If you’d like to apply the severity level dynamically, based on the threshold crossed, within a single alert policy, you can use MQL (Monitoring Query Language). With MQL, you can create alert policies that add custom metric labels dynamically, and those labels are embedded in the incident. Using the MQL map operation, you can specify which threshold range results in which severity label. This means you can implement the three-level scenario above with only one alert policy.

Take the sample MQL query below:

 

fetch gce_instance
| metric 'compute.googleapis.com/instance/cpu/utilization'
| filter (metadata.user_labels.env == 'prod') && (resource.zone =~ 'asia.*')
| group_by sliding(5m), [value_utilization_mean: mean(value.utilization)]
| map
   add[
     severity:
       if(val() > 90 '%', 'critical',
         if(val() >= 80 '%' && val() <= 90 '%', 'warning', 'info'))]
| condition val() > 70 '%'

 

In this example, an incident is created whenever CPU utilization is above 70%. If the value is above 70% but below 80%, the incident will contain a metric label called severity with the value info. If the value is between 80% and 90% inclusive, the label severity will have the value warning, and if the value is above 90%, it will have the value critical.
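
The nested if() in the query can be mirrored in a small Python function, which makes the boundaries explicit: exactly 80% and exactly 90% both map to warning, and anything at or below 70% opens no incident. This is only a local illustration of the query's logic, not something Cloud Monitoring executes.

```python
from typing import Optional


def severity_for(utilization_pct: float) -> Optional[str]:
    """Mirror the MQL query: condition val() > 70 '%', then the nested if()."""
    if utilization_pct <= 70:
        # condition val() > 70 '%' is not met, so no incident is opened
        return None
    if utilization_pct > 90:
        return "critical"
    if 80 <= utilization_pct <= 90:
        return "warning"
    return "info"


for value in (92, 90, 85, 80, 73, 70):
    print(value, severity_for(value))
```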

Because the severity label is part of a time series' identity, a change in severity produces a new time series, and therefore a new incident. In the above scenario, if CPU utilization starts at 92%, incident A is created with severity level critical. If utilization then drops to 73%, a new incident B opens with severity level info. Incident A, however, remains open, because its time series (labeled critical) simply stops receiving data rather than reporting a non-violating value. If the value then jumps to 82%, a new incident C opens with severity level warning, and incidents A and B remain open. If auto-close is configured in your policy with a duration of 30 minutes, incident A will auto-close 30 minutes after incident B starts, and incident B will auto-close 30 minutes after incident C starts. If the value drops below 70%, all incidents will close.

To ensure that the alert policy has only one incident open at a time with the correct label, and to avoid waiting for incidents to auto-close as in the example above, set evaluationMissingData to EVALUATION_MISSING_DATA_INACTIVE in your API request. This field tells the alert policy how to treat a metric stream with sparse or missing data, so that incidents can be closed as appropriate. If you are creating your MQL alert policy in the UI, open the Advanced Options dropdown in the Configure Trigger section and select Missing data points treated as values that do not violate the policy condition.
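
For API users, the setting lives on the MQL condition itself. The sketch below assembles an alert policy body as a Python dictionary; it assumes the Cloud Monitoring v3 REST field names (conditionMonitoringQueryLanguage with query, duration, and evaluationMissingData, plus alertStrategy.autoClose), while the display names and the 60-second duration are illustrative choices.

```python
import json

# The MQL query from the example above.
MQL_QUERY = """\
fetch gce_instance
| metric 'compute.googleapis.com/instance/cpu/utilization'
| filter (metadata.user_labels.env == 'prod') && (resource.zone =~ 'asia.*')
| group_by sliding(5m), [value_utilization_mean: mean(value.utilization)]
| map
   add[
     severity:
       if(val() > 90 '%', 'critical',
         if(val() >= 80 '%' && val() <= 90 '%', 'warning', 'info'))]
| condition val() > 70 '%'
"""

policy = {
    "displayName": "CPU utilization with dynamic severity",  # hypothetical name
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "CPU utilization above 70%",
            "conditionMonitoringQueryLanguage": {
                "query": MQL_QUERY,
                "duration": "60s",  # assumed value
                # Treat missing points as non-violating, so the incident for a
                # series whose severity label changed (and stopped reporting)
                # closes instead of waiting for auto-close.
                "evaluationMissingData": "EVALUATION_MISSING_DATA_INACTIVE",
            },
        }
    ],
    # Optional: the 30-minute auto-close from the scenario above.
    "alertStrategy": {"autoClose": "1800s"},
}

condition = policy["conditions"][0]["conditionMonitoringQueryLanguage"]
print(condition["evaluationMissingData"])
```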

 

When EVALUATION_MISSING_DATA_INACTIVE is specified in the above scenario, incident A will close once incident B is created, and incident B will close once incident C is created.
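
The difference between the two behaviors can be reproduced with a toy model. The sketch below assumes that each severity label identifies its own time series with at most one open incident, and it ignores auto-close; it is meant to illustrate the scenario above, not to replicate Cloud Monitoring's evaluator.

```python
from typing import List, Set


def label_for(utilization_pct: float) -> str:
    """Severity label added by the MQL map step (same nested if as the query)."""
    if utilization_pct > 90:
        return "critical"
    if 80 <= utilization_pct <= 90:
        return "warning"
    return "info"


def simulate(trace: List[float], missing_data_inactive: bool) -> List[Set[str]]:
    """Return the open incidents, keyed by severity label, after each data point.

    Simplified model (an assumption): each severity label is a separate time
    series with at most one open incident, a point above 70% violates the
    condition, and auto-close is not modeled.
    """
    open_incidents: Set[str] = set()
    history: List[Set[str]] = []
    for value in trace:
        current = label_for(value)
        if value > 70:
            open_incidents.add(current)  # violating point opens (or keeps) an incident
        else:
            open_incidents.discard(current)  # non-violating point closes it
        if missing_data_inactive:
            # Every other series received no data; INACTIVE treats that as
            # non-violating, so their incidents close.
            open_incidents &= {current}
        history.append(set(open_incidents))
    return history


trace = [92, 73, 82]  # the utilization sequence from the scenario above
print("default: ", [sorted(s) for s in simulate(trace, False)])
print("inactive:", [sorted(s) for s in simulate(trace, True)])
```

With the default behavior, the critical and info incidents stay open (until auto-close, which this model omits), while with missing_data_inactive=True only the incident for the current severity remains open.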

Severity Labels in Notification Channels

If you send notifications to a third-party service such as PagerDuty, a webhook, or Pub/Sub, you can parse the JSON payload and route each notification according to its severity so that your team does not miss critical information.
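
A routing step can be as small as a lookup table. In the sketch below, the destination names are hypothetical placeholders, and unknown or missing severities fall back to a ticket rather than a page, which is a design choice you may want to make differently.

```python
from typing import Dict, Optional

# Hypothetical destinations; substitute your own paging, ticketing, or logging targets.
ROUTES: Dict[str, str] = {
    "critical": "page-on-call",
    "warning": "open-ticket",
    "info": "log-only",
}

# Where to send notifications whose severity is missing or unrecognized.
DEFAULT_ROUTE = "open-ticket"


def route(severity: Optional[str]) -> str:
    """Pick a destination for a notification based on its severity label."""
    if severity is None:
        return DEFAULT_ROUTE
    # Normalize case so "CRITICAL" and "critical" route the same way.
    return ROUTES.get(severity.strip().lower(), DEFAULT_ROUTE)


for sev in ("critical", "WARNING", "info", None, "unknown"):
    print(sev, "->", route(sev))
```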

If you use alert policy user labels, they appear on the notification as an object with the key policy_user_labels, for example:

 

"policy_user_labels": {
    "severity": "critical"
}

 

If you use metric labels via MQL, they appear in an object with the key labels, nested inside an object with the key metric, for example:

 

"metric": {
    "displayName": "Some Display Name",
    "labels": {
      "instance_name": "some_instance_name",
      "severity": "critical"
    }
}
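
A receiver typically needs to read the severity from either location. The sketch below checks policy_user_labels first and then metric.labels; it assumes these objects sit inside the top-level incident object of a webhook or Pub/Sub notification, and falls back to the payload itself when no incident key is present. Verify the structure against payloads you actually receive.

```python
import json
from typing import Any, Dict, Optional


def extract_severity(payload: Dict[str, Any]) -> Optional[str]:
    """Return the severity from a notification payload, or None if absent.

    Checks policy user labels first (static severity), then MQL metric
    labels (dynamic severity).
    """
    # Assumption: labels sit inside a top-level "incident" object; fall back
    # to the payload itself if that key is missing.
    incident = payload.get("incident", payload)
    static = incident.get("policy_user_labels") or {}
    if "severity" in static:
        return static["severity"]
    dynamic = (incident.get("metric") or {}).get("labels") or {}
    return dynamic.get("severity")


raw = """
{
  "incident": {
    "metric": {
      "displayName": "Some Display Name",
      "labels": {"instance_name": "some_instance_name", "severity": "critical"}
    }
  }
}
"""
print(extract_severity(json.loads(raw)))
```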

 

Get Started Today

Alerts can be configured on nearly any metric, log, or trace (or the absence of that data) that is captured in Google Cloud’s operations suite. Severity levels give you and your teams an additional way to cut through noise and find the issues that will have the most positive impact when resolved. Check out this video on log alerts, part of our Observability in-depth video series. If you have questions or feature requests, or just want to read discussions from other customers who are using Cloud Alerting, visit our Google Cloud Community site.

By: Alizah Lalani (Software Engineer)
Source: Google Cloud Blog

