aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • DevOps
  • Platforms

Tools For Debugging Apps On Google Kubernetes Engine

  • aster.cloud
  • June 3, 2020
  • 4 minute read

Running containerized apps on Google Kubernetes Engine (GKE) is a way for a DevOps team to focus on developing apps, rather than on the operational tasks required to run a secure, scalable and highly available Kubernetes cluster. Cloud Logging and Cloud Monitoring are two of several services integrated into GKE that provide DevOps teams with better observability into applications and systems, for easier troubleshooting in the event of a problem.

Using Cloud Logging

Let’s look at a simple, yet common use case. As a member of the DevOps team, you have received an alert from Cloud Monitoring about an application error in your production Kubernetes cluster. You need to diagnose this error. To use a concrete example, we will work through a scenario based on a sample microservices demo app deployed to a GKE cluster. In this demo app, there are many microservices and dependencies among them.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

1 using cloud logging.jpg

For this example, consider the demo app running in your staging environment shared by multiple teams or a production environment running multiple workloads. Let’s see how you can work through troubleshooting a simple error scenario.

Let’s start this example from an alert triggered by a large number of HTTP 500 errors. You can create a logs-based metric based on the number of log events or the content of the log entries which you can also use for alerting purposes. Cloud Monitoring provides Alerting which can be set-up to send emails, SMS or generate notifications in third-party apps.

In our example, let’s say there are HTTP 500 errors with the following stack trace.

2 HTTP 500 errors.jpg

If you have already created the alerting policy in Cloud Monitoring, you will receive notifications like the following one:

Read More  The Key Role ‘Visibility’ Plays In Healthcare’s Cybersecurity Resilience
3 Cloud Monitoring.jpg

You can view the incident details by clicking the ‘VIEW INCIDENT’ link. Following the Policy link from the alert notification opens the alerting section of the Monitoring UI.

4 error rate.jpg

One of the first places that you can look for information on the errors is the Kubernetes Engine section of the Monitoring console. Using the workload view, you can select your cluster and easily see the usage resources for the pods and containers running in the cluster. In this case, you can see that the pod and container for the recommendationservice have very high CPU utilization. This could mean that the recommendationservice is overloaded and not able to respond to requests from the frontend. Ideally, you also have an alert set up for the CPU and memory utilization for the container, which would also generate alerts.

5 workload views.jpg

Opening the link to the server container under the recommendationservice service/pod displays the details about the container. The details include metrics like memory and CPU, logs and details about the container. You can also click the MANAGE link to navigate directly to the pod details in the GKE console.

Because Monitoring is integrated into the GKE console, you can view monitoring graphs for the pod. Using the CPU graph, you can see that the CPU is regularly exceeding the requested amount of CPU. Notice the purple line crossing the solid blue line in the lower left graph. You can also easily see that the memory and disk space are not highly utilized, eliminating them from a list of possible issues. In this case, the CPU could be the issue.

Read More  Minimalizing The Mainframe
6 gke monitoring.jpg

Clicking on the container, you can see the requested CPU, memory, and the deployment details.

7 resources.jpg

You can also click on the Revision history link to review the history of the container. You can see that there was a recent deployment.

8 revision .jpg

It’s worth looking at the logs to see if there is any information about why additional CPU power is suddenly in demand. Since the original error was a 500 error served through the frontend pod, you can navigate to the frontend entry under Workloads. To view the frontend logs, click on the Container logs link. This opens the Cloud Logging UI with a specific pre-constructed filter for the logs of this container.

9 historgram.jpg

In the Logs Viewer, you can see the detailed query, a histogram of the logs, and the individual log entries. The histogram feature provides context for how often log entries are observed over the given time window and can be a powerful tool to help identify application issues. In this case, you can see that the error entries started increasing at around 4:50PM.

By expanding the error entries, you can see the log message below.

“failed to get product recommendations: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.55.247.125:8080: connect: connection refused"

This matches the original HTTP 500 error served through the frontend pod. Now, take a look at the recommendationservice pod logs by adjusting the logging filter to surface error entries with the recommendations name. The filter below restricts the entries to errors from the containers in the pod with a prefix of “recommendations”.

Read More  Graph Data Science On Google Cloud: Neo4j Aurads And Vertex AI
resource.type="k8s_container"
resource.labels.project_id="YOUR_PROJECT"
resource.labels.location="us-east4-a"
resource.labels.cluster_name="shop-cluster"
resource.labels.namespace_name="default"
resource.labels.pod_name:"recommendations"
severity=ERROR

Now, adjust the filter to look at the non-error log entries.

resource.type="k8s_container"
resource.labels.project_id="YOUR_PROJECT"
resource.labels.location="us-east4-a"
resource.labels.cluster_name="shop-cluster"
resource.labels.namespace_name="default"
resource.labels.pod_name:"recommendations"
severity=!ERROR

You can see in the logs histogram that there are log entries being generated from the service, which likely means that the service is still receiving and responding to some requests.

10 log historgram.jpg

Since no errors were generated by the recommendationservice in the logs, this helps to confirm the suspicion that there is an issue with the latest code deployment causing it to use more CPU than before. With this information, you can take action. You could either increase the CPU request in the container YAML or roll back the recent update to the recommendationservice and contact the developer responsible for the service to review the increase in CPU utilization. The specific action depends on your understanding of the code and recent deployments, your organization and policies. Whichever option you take, you can continue monitoring your cluster for adverse events using Cloud Logging and Monitoring.

Learn more about Cloud Logging, Monitoring and GKE

We built our logging and monitoring capabilities for GKE into Cloud Operations to make it easy for you to monitor, alert and analyze your apps. If you haven’t already, get started with Cloud Logging on GKE and join the discussion on our mailing list. As always, we welcome your feedback.

Charles Baer
Product Manager, Google Cloud
Xiang Shen
Solutions Architect
Source: Google Cloud Blog

For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • Cloud Logging
  • Cloud Monitoring
  • Google Cloud
  • Google Kubernetes Engine
  • Kubernetes Engine
You May Also Like
Google Cloud and Smart Communications
View Post
  • Platforms
  • Technology

Smart Communications, Inc. Dials into Google Cloud AI to Help Personalize Digital Services for Filipinos

  • October 25, 2024
View Post
  • Platforms
  • Public Cloud

Empowering builders with the new AWS Asia Pacific (Malaysia) Region

  • August 30, 2024
Red Hat and Globe Telecoms
View Post
  • Platforms
  • Technology

Globe Collaborates with Red Hat Open Innovation Labs to Modernize IT Infrastructure for Greater Agility and Scalability

  • August 19, 2024
Huawei Cloud Cairo Region Goes Live
View Post
  • Cloud-Native
  • Computing
  • Platforms

Huawei Cloud Goes Live in Egypt

  • May 24, 2024
Asteroid
View Post
  • Computing
  • Platforms
  • Technology

Asteroid Institute And Google Cloud Identify 27,500 New Asteroids, Revolutionizing Minor Planet Discovery With Cloud Technology

  • April 30, 2024
IBM
View Post
  • Hybrid Cloud
  • Platforms

IBM To Acquire HashiCorp, Inc. Creating A Comprehensive End-to-End Hybrid Cloud Platform

  • April 24, 2024
View Post
  • Platforms
  • Technology

Canonical Delivers Secure, Compliant Cloud Solutions for Google Distributed Cloud

  • April 9, 2024
Redis logo
View Post
  • Platforms
  • Software

Redis Moves To Source-Available Licenses

  • April 2, 2024

Stay Connected!
LATEST
  • 1
    Just make it scale: An Aurora DSQL story
    • May 29, 2025
  • 2
    Reliance on US tech providers is making IT leaders skittish
    • May 28, 2025
  • Examine the 4 types of edge computing, with examples
    • May 28, 2025
  • AI and private cloud: 2 lessons from Dell Tech World 2025
    • May 28, 2025
  • 5
    TD Synnex named as UK distributor for Cohesity
    • May 28, 2025
  • Weigh these 6 enterprise advantages of storage as a service
    • May 28, 2025
  • 7
    Broadcom’s ‘harsh’ VMware contracts are costing customers up to 1,500% more
    • May 28, 2025
  • 8
    Pulsant targets partner diversity with new IaaS solution
    • May 23, 2025
  • 9
    Growing AI workloads are causing hybrid cloud headaches
    • May 23, 2025
  • Gemma 3n 10
    Announcing Gemma 3n preview: powerful, efficient, mobile-first AI
    • May 22, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    Cloud adoption isn’t all it’s cut out to be as enterprises report growing dissatisfaction
    • May 15, 2025
  • 2
    Hybrid cloud is complicated – Red Hat’s new AI assistant wants to solve that
    • May 20, 2025
  • 3
    Google is getting serious on cloud sovereignty
    • May 22, 2025
  • oracle-ibm 4
    Google Cloud and Philips Collaborate to Drive Consumer Marketing Innovation and Transform Digital Asset Management with AI
    • May 20, 2025
  • notta-ai-header 5
    Notta vs Fireflies: Which AI Transcription Tool Deserves Your Attention in 2025?
    • May 16, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.