
Tools For Debugging Apps On Google Kubernetes Engine

  • aster_cloud
  • June 3, 2020
  • 4 minute read

Running containerized apps on Google Kubernetes Engine (GKE) is a way for a DevOps team to focus on developing apps, rather than on the operational tasks required to run a secure, scalable and highly available Kubernetes cluster. Cloud Logging and Cloud Monitoring are two of several services integrated into GKE that provide DevOps teams with better observability into applications and systems, for easier troubleshooting in the event of a problem.

Using Cloud Logging

Let’s look at a simple, yet common use case. As a member of the DevOps team, you have received an alert from Cloud Monitoring about an application error in your production Kubernetes cluster. You need to diagnose this error. To use a concrete example, we will work through a scenario based on a sample microservices demo app deployed to a GKE cluster. In this demo app, there are many microservices and dependencies among them.



For this example, consider the demo app running in a staging environment shared by multiple teams, or in a production environment running multiple workloads. Let’s see how you can work through troubleshooting a simple error scenario.

Let’s start this example from an alert triggered by a large number of HTTP 500 errors. You can create a logs-based metric from the number of log events or from the content of the log entries, and then use that metric for alerting. Cloud Monitoring provides alerting policies that can be set up to send emails or SMS messages, or to generate notifications in third-party apps.
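For instance, the alert in this scenario could be backed by a logs-based metric that counts error-level entries from the frontend container. The filter below is only a sketch: it mirrors the structure of the queries shown later in this post, and the label values (project, location, cluster, and pod prefix) are illustrative stand-ins for your own environment.

resource.type="k8s_container"
resource.labels.project_id="YOUR_PROJECT"
resource.labels.location="us-east4-a"
resource.labels.cluster_name="shop-cluster"
resource.labels.namespace_name="default"
resource.labels.pod_name:"frontend"
severity>=ERROR

An alerting policy built on such a metric can then notify the team as soon as the error count crosses a threshold.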

In our example, let’s say there are HTTP 500 errors with the following stack trace.

[Image: HTTP 500 error stack trace]

If you have already created the alerting policy in Cloud Monitoring, you will receive notifications like the following one:

[Image: Cloud Monitoring alert notification]

You can view the incident details by clicking the ‘VIEW INCIDENT’ link. Following the Policy link from the alert notification opens the alerting section of the Monitoring UI.

[Image: Error rate in the Monitoring alerting UI]

One of the first places that you can look for information on the errors is the Kubernetes Engine section of the Monitoring console. Using the workload view, you can select your cluster and easily see the resource usage of the pods and containers running in the cluster. In this case, you can see that the pod and container for the recommendationservice have very high CPU utilization. This could mean that the recommendationservice is overloaded and not able to respond to requests from the frontend. Ideally, you also have alerts set up for the CPU and memory utilization of the container, so that this condition is flagged automatically.

[Image: Workloads view in the Monitoring console]
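If you want that kind of CPU alert, the condition can target the built-in GKE container metrics in Cloud Monitoring. The filter below is a hedged sketch of such a condition: kubernetes.io/container/cpu/request_utilization is the system metric for CPU usage relative to the container’s request, while the resource labels reuse the illustrative values from the logging filters in this post and should be adjusted to your environment.

resource.type="k8s_container"
resource.labels.cluster_name="shop-cluster"
resource.labels.namespace_name="default"
metric.type="kubernetes.io/container/cpu/request_utilization"

A policy based on this filter could fire when utilization stays above 1.0, meaning the container is sustained above its requested CPU.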

Opening the link to the server container under the recommendationservice service/pod displays the container details, including CPU and memory metrics, logs, and configuration information. You can also click the MANAGE link to navigate directly to the pod details in the GKE console.

Because Monitoring is integrated into the GKE console, you can view monitoring graphs for the pod. Using the CPU graph, you can see that CPU usage regularly exceeds the requested amount. Notice the purple line crossing the solid blue line in the lower-left graph. You can also easily see that memory and disk space are not highly utilized, eliminating them as likely causes. In this case, the CPU could be the issue.

[Image: Monitoring graphs in the GKE console]

Clicking on the container, you can see the requested CPU, memory, and the deployment details.

[Image: Container resource details]

You can also click on the Revision history link to review the history of the container. You can see that there was a recent deployment.

[Image: Revision history]

It’s worth looking at the logs to see if there is any information about why additional CPU power is suddenly in demand. Since the original error was a 500 error served through the frontend pod, you can navigate to the frontend entry under Workloads. To view the frontend logs, click on the Container logs link. This opens the Cloud Logging UI with a specific pre-constructed filter for the logs of this container.
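The generated filter follows the same structure as the queries shown later in this post, scoped to the frontend container. Roughly, and with the same illustrative label values:

resource.type="k8s_container"
resource.labels.project_id="YOUR_PROJECT"
resource.labels.location="us-east4-a"
resource.labels.cluster_name="shop-cluster"
resource.labels.namespace_name="default"
resource.labels.pod_name:"frontend"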

[Image: Logs histogram in the Logs Viewer]

In the Logs Viewer, you can see the detailed query, a histogram of the logs, and the individual log entries. The histogram feature provides context for how often log entries are observed over the given time window and can be a powerful tool to help identify application issues. In this case, you can see that the error entries started increasing at around 4:50PM.
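To zero in on that window, you can restrict any of these queries by time, either with the time-range selector in the UI or directly in the filter. A minimal sketch, with a made-up timestamp standing in for the start of the spike:

resource.type="k8s_container"
resource.labels.pod_name:"frontend"
severity=ERROR
timestamp>="2020-06-03T16:50:00Z"

Timestamps in Logging filters use RFC 3339 format.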

By expanding the error entries, you can see the log message below.

failed to get product recommendations: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.55.247.125:8080: connect: connection refused"

This matches the original HTTP 500 error served through the frontend pod. Now, take a look at the recommendationservice pod logs by adjusting the logging filter to surface error entries for that service. The filter below restricts the entries to errors from containers in pods whose names start with “recommendations”.

resource.type="k8s_container"
resource.labels.project_id="YOUR_PROJECT"
resource.labels.location="us-east4-a"
resource.labels.cluster_name="shop-cluster"
resource.labels.namespace_name="default"
resource.labels.pod_name:"recommendations"
severity=ERROR

Now, adjust the filter to look at the non-error log entries.

resource.type="k8s_container"
resource.labels.project_id="YOUR_PROJECT"
resource.labels.location="us-east4-a"
resource.labels.cluster_name="shop-cluster"
resource.labels.namespace_name="default"
resource.labels.pod_name:"recommendations"
severity!=ERROR

You can see in the logs histogram that there are log entries being generated from the service, which likely means that the service is still receiving and responding to some requests.

[Image: Logs histogram for the recommendationservice]

Since no errors were generated by the recommendationservice in the logs, this helps confirm the suspicion that the latest code deployment is causing the service to use more CPU than before. With this information, you can take action: you could increase the CPU request in the container YAML, or roll back the recent update to the recommendationservice and contact the developer responsible for the service to review the increase in CPU utilization. The specific action depends on your understanding of the code and recent deployments, as well as on your organization and its policies. Whichever option you take, you can continue monitoring your cluster for adverse events using Cloud Logging and Cloud Monitoring.

Learn more about Cloud Logging, Monitoring and GKE

We built our logging and monitoring capabilities for GKE into Cloud Operations to make it easy for you to monitor, alert on, and analyze your apps. If you haven’t already, get started with Cloud Logging on GKE and join the discussion on our mailing list. As always, we welcome your feedback.

Charles Baer
Product Manager, Google Cloud
Xiang Shen
Solutions Architect
Source: Google Cloud Blog


Related Topics
  • Cloud Logging
  • Cloud Monitoring
  • Google Cloud
  • Google Kubernetes Engine
  • Kubernetes Engine