aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Programming
  • Software Engineering

Improving Workload Alerts With The VMware Reliability Scanner

  • aster.cloud
  • March 12, 2021
  • 4 minute read

With the recent release of the VMware Customer Reliability Engineering (CRE) team’s Reliability Scanner, we wanted to take some time to expand on the namespace label check, including how labelling can be used for alert routing.

The Reliability Scanner is a Sonobuoy plugin that allows an end user to include and configure a suggestive set of checks to be executed against a cluster. We can set a check’s configuration to look for a specific label key on a namespace (which will fail if the key does not exist). There is also an option to pull through namespace-configured labels using the `include_labels` configuration option. Doing so will ensure this detail is included in the scans report.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

With the ability to run many different applications, delivered by various teams within a single cluster, new challenges arise in this model of shared responsibility. Having a namespace owner present can help identify the teams responsible for components or services within a cluster and provide them with timely signals.

Below is an example of a namespace label check configured to look for an owner key.

In this post, we’ll take a look at installing VMware Tanzu Kubernetes Grid extensions for monitoring into our existing Tanzu Kubernetes Grid cluster, so that respective namespace owners can be notified when alerts are generated for components in a particular namespace.

 

Installing Tanzu Kubernetes Grid extensions for monitoring

To make this happen, we will need to first put a few components designed for installing and lifecycling our extensions (and applications) into our workload cluster.

Let’s now create some namespaces and deploy our example application to each of them.

Read More  Why 2021 Will Be The Year Of Low-Code

We can now use CURL to verify that our services are up and running.

Next, we will install our Tanzu Kubernetes Grid monitoring extension into the cluster. We’ll start by creating our namespace and any associated roles.

We now have a namespace and the necessary permissions available, so let’s apply our monitoring configuration to the cluster.

First, we need to configure our monitoring values to include our ingress and a trivial Prometheus alert for our service example. We’ll do this by creating a secret with our monitoring configuration data in the target namespace.

We can now install our extension, which our extension manager will handle.

We can use the kubectl watch flag to follow along and watch the installation of the application into our target namespace.

Let’s send some requests to our applications so we can see our metrics coming through. We’ll use `Hey`, a small HTTP load generator, to generate some load to our applications over a 10-second period.

Let’s take a look at our Prometheus UI to observe our metrics as they come through.

We can now update our Prometheus extension to include an alert configuration. The below configuration will create an alerting rule named “example-alerts”. An alert will be created in Prometheus when there is an increase in error rates.

Let’s watch the extension manager pick up on this and trigger a new application deployment.

If we navigate to Status > Rules, we can see our example alerting rule listed.

Based on our defined example alerting rule deployed in the previous step, we are stating that for each namespace, if our `example_operations_failed_count` counter exceeds three failed operations, trigger an alert.

Read More  Announcing Glance: Tiles For Wear OS Made simple

Let’s break team-b’s application and send some requests so we can watch that happen.

We can see, by way of the Prometheus UI, that the number of failed operations has increased to four based on the counter coming from the team-b namespace.

And if we query for `ALERTS`, we will see our team-b namespace alert firing.

Of course, we would not want to use our example alert in a typical scenario, as our counter would continue to grow endlessly. Rather, in most cases, we should use a recording rule in order to define and observe a particular rate sampled at a particular interval.

Now imagine a cluster operator receives one of these alerts for an application’s failed operations. While it’s great information to have, our cluster operator may not have an in-depth understanding of the failing application, and in that case would have to route the alert to the appropriate team for resolution.

Let’s save the cluster operator the unnecessary stress and label our team-a namespace with the appropriate owner details. We can use this metadata later, to create a mapping between any alerts that arise from a particular namespace and route them accordingly.

If we wait the 1-minute interval for the next scrape of our default configured `kube-state-metrics` job, we will see our namespace annotation come through as a label for the metric. We can also see that our alert now has the owner label defined.

Enriching our alert with metadata can be extremely useful, as we are now able to route signals based on priority and/or ownership to the correct team. For example, a cluster operator many find such tailored information useful. That said, depending on their role, they may want to be made aware of failures across a broader set of services within a cluster.

Read More  VMware Tanzu Mission Control Expands Multi-Cloud Data Protection Capabilities

What’s next?

Having the ability to route specific alerts can assist teams managing clusters with getting the right signal to the appropriate owner and reduce alert fatigue for the wider audience. Stay tuned for the next blog post in this series, in which we will walk through how to configure Alertmanager for alert routing.

VMware CRE is a team of Site Reliability Engineers and Program Managers who work together with Tanzu customers and partner teams to learn and apply reliability engineering practices using our Tanzu portfolio of services. As part of our product engineering organization, VMware CRE is responsible for some reliability engineering-related features for Tanzu. We are also in the escalation path for our technical support teams to help our customers meet their reliability goals. 

Alexandra McCoy, Peri Thompson, and Peter Grant contributed to this post.
Source VMware Tanzu Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • Reliability Scanner
  • Tanzu
  • VMware
  • VMware CRE
  • VMware Tanzu Kubernetes Grid
You May Also Like
View Post
  • Software Engineering
  • Technology

Claude 3.7 Sonnet and Claude Code

  • February 25, 2025
View Post
  • Engineering
  • Software Engineering

This Month in Julia World

  • January 17, 2025
View Post
  • Engineering
  • Software Engineering

Google Summer of Code 2025 is here!

  • January 17, 2025
View Post
  • Software Engineering

5 Books Every Beginner Programmer Should Read

  • July 25, 2024
Ruby
View Post
  • Software Engineering

How To Get Started With A Ruby On Rails Project – A Developer’s Guide

  • January 27, 2024
View Post
  • Engineering
  • Software Engineering

5 Ways Platform Engineers Can Help Developers Create Winning APIs

  • January 25, 2024
Clouds
View Post
  • Cloud-Native
  • Platforms
  • Software Engineering

Microsoft Releases Azure Migrate Assessment Tool For .NET Application

  • January 14, 2024
View Post
  • Software Engineering
  • Technology

It’s Time For Developers And Enterprises To Build With Gemini Pro

  • December 21, 2023

Stay Connected!
LATEST
  • college-of-cardinals-2025 1
    The Definitive Who’s Who of the 2025 Papal Conclave
    • May 7, 2025
  • conclave-poster-black-smoke 2
    The World Is Revalidating Itself
    • May 6, 2025
  • 3
    Conclave: How A New Pope Is Chosen
    • April 25, 2025
  • Getting things done makes her feel amazing 4
    Nurturing Minds in the Digital Revolution
    • April 25, 2025
  • 5
    AI is automating our jobs – but values need to change if we are to be liberated by it
    • April 17, 2025
  • 6
    Canonical Releases Ubuntu 25.04 Plucky Puffin
    • April 17, 2025
  • 7
    United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services
    • April 15, 2025
  • 8
    Tokyo Electron and IBM Renew Collaboration for Advanced Semiconductor Technology
    • April 2, 2025
  • 9
    IBM Accelerates Momentum in the as a Service Space with Growing Portfolio of Tools Simplifying Infrastructure Management
    • March 27, 2025
  • 10
    Tariffs, Trump, and Other Things That Start With T – They’re Not The Problem, It’s How We Use Them
    • March 25, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    IBM contributes key open-source projects to Linux Foundation to advance AI community participation
    • March 22, 2025
  • 2
    Co-op mode: New partners driving the future of gaming with AI
    • March 22, 2025
  • 3
    Mitsubishi Motors Canada Launches AI-Powered “Intelligent Companion” to Transform the 2025 Outlander Buying Experience
    • March 10, 2025
  • PiPiPi 4
    The Unexpected Pi-Fect Deals This March 14
    • March 13, 2025
  • Nintendo Switch Deals on Amazon 5
    10 Physical Nintendo Switch Game Deals on MAR10 Day!
    • March 9, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.