
Add Severity Levels To Your Alert Policies In Cloud Monitoring

  • aster_cloud
  • April 10, 2022
  • 5 minute read

When you’re dealing with a situation that fires a bevy of alerts, do you instinctively know which ones are the most pressing? Severity levels are an important alerting concept that helps you and your team assess which notifications should be prioritized. You can use these levels to focus on the issues deemed most critical for your operations and to triage through the noise. Today, we’re happy to announce that you can create custom severity levels on your alert policies and have this data included in your notifications, for more effective alerting and integration with downstream third-party services (e.g., Webhooks, Cloud Pub/Sub, PagerDuty).

The notification channels have been enhanced to accept this data – including Email, Webhooks, Cloud Pub/Sub, and PagerDuty – with planned support for Slack at a later time. This enables further automation/customization based on importance wherever the notifications are consumed.


Below, we’ll walk through examples of how to add static and dynamic severity levels to an Alert Policy.

Create user labels to support static severity levels

When you add user labels on an alert policy, they will appear on every notification and incident generated by that alert policy. Refer to the documentation to see how to add user labels to alert policies via the Alert Policy API.
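
For instance, a complete policy body carrying a userLabels block might look like the sketch below; the display name, filter, and duration are illustrative placeholders, and note that compute.googleapis.com/instance/cpu/utilization is reported as a fraction, so 90% is 0.9:

{
  "displayName": "CPU utilization above 90% (critical)",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "VM CPU utilization above 90%",
      "conditionThreshold": {
        "filter": "metric.type=\"compute.googleapis.com/instance/cpu/utilization\" AND resource.type=\"gce_instance\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.9,
        "duration": "300s"
      }
    }
  ],
  "userLabels": {
    "severity": "critical"
  }
}

A body like this can be passed to the alertPolicies.create method, or saved to a file and created with gcloud alpha monitoring policies create --policy-from-file=FILE (FILE being a placeholder).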

Let’s walk through an example: suppose you want to configure Alert Policies that notify you when the CPU utilization crosses a particular threshold. Further, you want the notifications to indicate the following severity levels:

  • info when CPU utilization is between 70% and 80%
  • warning when CPU utilization is between 80% and 90%
  • critical when CPU utilization is above 90%

To accomplish this, you can create three separate alert policies with user labels defined as below:

Create alert policy (A) which triggers when the CPU utilization is above 90%, and includes the following user labels: any incident generated by this policy will include a label severity with value critical.

"userLabels": {
“severity”: “critical”,
}

Create a second policy (B) which triggers when resource CPU utilization is above 80%, and includes the following user labels: any incident generated on this policy will include a label severity with value warning.

"userLabels": {
“severity”: “warning”,
}

Create a third policy (C) which triggers when resource CPU utilization is above 70%, and includes the following user labels: any incident generated on this policy will include a label severity with value info.

"userLabels": {
“severity”: “info”,
}
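
As a rough sketch (not from the original post), the same three policies could also be created with the google-cloud-monitoring Python client; the project ID, display names, and filter below are assumptions, and again the utilization metric is a fraction, so 90% is 0.9:

from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

client = monitoring_v3.AlertPolicyServiceClient()
project_name = "projects/my-project-id"  # placeholder project

# One (threshold, severity) pair per policy: A, B, and C from above.
for threshold, severity in [(0.90, "critical"), (0.80, "warning"), (0.70, "info")]:
    policy = monitoring_v3.AlertPolicy(
        display_name=f"CPU utilization above {int(threshold * 100)}%",
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
        conditions=[
            monitoring_v3.AlertPolicy.Condition(
                display_name=f"VM CPU above {int(threshold * 100)}%",
                condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                    filter=(
                        'metric.type = "compute.googleapis.com/instance/cpu/utilization"'
                        ' AND resource.type = "gce_instance"'
                    ),
                    comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                    threshold_value=threshold,
                    # How long the threshold must be breached before an incident opens.
                    duration=duration_pb2.Duration(seconds=300),
                ),
            )
        ],
        # The static severity label copied onto every incident and notification.
        user_labels={"severity": severity},
    )
    client.create_alert_policy(name=project_name, alert_policy=policy)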

In this scenario, when the CPU utilization crosses the 90% threshold, policies A, B, and C will all trigger alerts. If the CPU utilization falls back down to 85%, the incident from policy A will close, but the incidents from policies B and C will remain open. If the CPU utilization falls further, down to 75%, the incident from policy B will close and the incident from policy C will remain open. If the CPU utilization drops to 40%, the incidents generated by all three policies will close automatically.

Use MQL to create dynamic severity levels

Alert policy user labels are static in nature, meaning you cannot dynamically apply user labels based on a changing threshold. As shown earlier, you need to create three separate alert policies to generate notifications that contain user label severity with value:

  • info below a threshold of 80%,
  • warning below a threshold of 90%, and
  • critical above a threshold of 90%.

If you’d like to apply the severity level dynamically based on the threshold within a single alert policy, you can use Monitoring Query Language (MQL). MQL lets you create alert policies with dynamic custom metric labels that are embedded in the incident. Via the MQL map operation, you can specify which threshold range should result in which severity label. This means you can accomplish the above scenario of three severity levels with just one alert policy.

Take the sample MQL query below:

fetch gce_instance
| metric 'compute.googleapis.com/instance/cpu/utilization'
| filter (metadata.user_labels.env == 'prod') && (resource.zone =~ 'asia.*')
| group_by sliding(5m), [value_utilization_mean: mean(value.utilization)]
| map
   add[
     severity:
       if(val() > 90 '%', 'critical',
         if(val() >= 80 '%' && val() <= 90 '%', 'warning', 'info'))]
| condition val() > 70 '%'

In this example, an incident will be created any time CPU utilization is above the 70% threshold. If the value is between 70% and 80%, the incident will contain a metric label called severity with value info. If the value is between 80% and 90%, the metric label severity will have value warning, and if the value is above 90%, the label severity will have value critical.

In the above scenario, if the CPU utilization value starts at 92%, incident A will be created with severity level critical. If the utilization value then drops down to 73%, a new incident B will be opened with severity level info; incident A, however, will remain open. If the value jumps to 82%, a new incident C will open with severity level warning, and incidents A and B will remain open. If auto-close is configured in your policy with a duration of 30 minutes, incident A will auto-close 30 minutes after incident B starts, and incident B will auto-close 30 minutes after incident C starts. If the value drops below 70%, all incidents will close.

To ensure the alert policy has only one incident open at a time with the correct corresponding label, and to avoid waiting for incidents to auto-close as in the example above, set evaluationMissingData to EVALUATION_MISSING_DATA_INACTIVE in your API request. This field tells the alert policy how to handle periods when the metric stream has sparse or missing data, so incidents can be closed appropriately. If you are building your MQL alert policy in the UI, select the Missing data points treated as values that do not violate the policy condition option in the Advanced Options dropdown of the Configure Trigger section.

When EVALUATION_MISSING_DATA_INACTIVE is specified in the above scenario, incident A will close once incident B is created, and incident B will close once incident C is created.
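
For illustration (a sketch, not from the original post), the condition portion of such an API request might look like this; the display name and duration are placeholders, and the query string abbreviates the MQL shown above:

"conditions": [
  {
    "displayName": "CPU utilization with dynamic severity",
    "conditionMonitoringQueryLanguage": {
      "query": "fetch gce_instance | metric 'compute.googleapis.com/instance/cpu/utilization' | ... | condition val() > 70 '%'",
      "duration": "0s",
      "evaluationMissingData": "EVALUATION_MISSING_DATA_INACTIVE"
    }
  }
]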

Severity Labels in Notification Channels

If you send notifications to a third-party service like PagerDuty, Webhooks, or Pub/Sub, then you can parse the JSON payload and route the notification according to its severity so that critical information is not missed by your team.
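
For instance, a minimal webhook receiver (a sketch, not part of the original post; the endpoint path and handler functions are invented placeholders) could read the severity from either of the payload shapes shown below and dispatch accordingly:

from flask import Flask, request

app = Flask(__name__)

def page_on_call(incident):
    # Placeholder: page the on-call engineer immediately.
    print("PAGING:", incident.get("summary"))

def notify_team(incident):
    # Placeholder: send a lower-urgency team notification.
    print("NOTIFY:", incident.get("summary"))

@app.route("/monitoring-webhook", methods=["POST"])
def route_by_severity():
    incident = request.get_json(force=True).get("incident", {})

    # Static user labels arrive under policy_user_labels; dynamic MQL
    # metric labels arrive under metric.labels (see the shapes below).
    severity = (
        incident.get("policy_user_labels", {}).get("severity")
        or incident.get("metric", {}).get("labels", {}).get("severity")
    )

    if severity == "critical":
        page_on_call(incident)
    else:
        notify_team(incident)
    return "", 200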

If you utilize alert policy user labels, these will appear on the notification as an object with the key policy_user_labels, e.g.:

"policy_user_labels": {
    "severity": "critical"
}

If you utilize metric labels via MQL, these will appear as an object with key labels nested in an object with key metric, e.g.:

"metric": {
    "displayName": "Some Display Name",
    "labels": {
        "instance_name": "some_instance_name",
        "severity": "critical"
    }
}

Get Started Today

Alerts can be configured on nearly any metric, log, or trace (or the absence of that data) captured in Google Cloud’s operations suite. Severity levels give you and your teams an additional way to cut through the noise and find the issues whose resolution will have the most positive impact. Check out this video on log alerts, part of our Observability in-depth video series, and if you have questions, feature requests, or just want to read topics from other customers who are using Cloud Alerting, visit our Google Cloud Community site.

By: Alizah Lalani (Software Engineer)
Source: Google Cloud Blog

