aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Design
  • Engineering
  • Hybrid Cloud
  • Practices

Patterns For Better Insights And Troubleshooting With Hybrid Cloud Logs

  • aster.cloud
  • January 20, 2022
  • 5 minute read

Hybrid and multi-cloud environments produce a boundless array of logs including application and server logs, logs related to cloud services, APIs, orchestrators, gateways and just about anything else running in the environment. Due to this high volume, logging systems may become slow and unmanageable when you urgently need them to troubleshoot an issue, and even harder to use them to get insights.

Google Cloud’s operations suite plays a vital role in any application modernization framework and is essential to assuring a reliable and secure application, providing monitoring, logging and alerting — the baseline of SRE and Google’s holistic approach to operations.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

As customer engineers, we see many organizations with hybrid and multi-cloud applications who want to integrate logs and metrics from various sources into a single console. Metrics of all critical services can be collected not just for daily operations but also to measure internal and external SLIs, SLOs and SLAs of modern applications.

Improving customer operations

We recently worked with two customers – a large media-processing customer and a large telecom provider — that have applications running on Google Cloud, other clouds and on-premises. Each customer faced issues with their logs:

  • Customer 1, large media-process company: their existing logging environment was too slow to effectively support real-time troubleshooting using system and application logs from across all their environments, and
  • Customer 2, large telecom provider: they did not feel they were getting valuable insights from the many terabytes of network logs they were ingesting per day. These insights did not need to be real-time.

Both customers used self-managed, popular open source products for ingesting, storage and retrieval. But as the volume of their logs grew, the cost of their infrastructure and operational overhead to support logs rose too. They shared other characteristics:

  • Both setups required SSD storage as well as large VMs for ingestion pipelines, storage and retrieval.
  • Both faced risk due to dependencies on a single resource for elements of their log management.
  • And finally, both had data pipeline queues piling up, which for one of them meant they were not getting the logs when they needed them most, when application/infrastructure failures impacted the SLA of their products and services.
Read More  Now Generally Available: BigQuery BI Engine Supports Any BI Tool Or Custom Application

They needed to reduce costs, reduce failures and determine the volume of logs they need to ingest and store to get analytical insights from them. We looked at different patterns and proposed the following two options:

  • Customer 1: Route system and application logs from their hybrid and multi-cloud services to Cloud Logging for real-time troubleshooting at scale, reducing cost and operational burden.
  • Customer 2: Route their network logs directly to BigQuery, so they could manage costs and get better insights from their data.

Let’s take a deeper look at the two patterns:

Pattern 1: Cloud Logging for resource management and troubleshooting

This pattern fits well in the scenarios where the primary objective is to troubleshoot based on real time logs. Here the main focus is on important logs & metrics, while logs that are not needed for real-time troubleshooting are sampled and filtered as needed. Customer 1 had a large volume of logs, so scale and timely ingestion were critical for troubleshooting problems. To help them meet their objective, we proposed the following pattern:

 

In this pattern, the customer uses Cloud Logging to collect the logs from Google Cloud, other clouds and VMs. Google Cloud resources such as Google Kubernetes Engine (GKE), Compute Engine with the Ops Agent, and Cloud Storage automatically send logs to Cloud Logging, while logging agents such as fluentd and stanza brings in application and system logs from other sources such as other clouds and on-prem systems (Additionally, partner tools such as BlueMedora can be used to bring logs from a wide range of other sources including Azure Kubernetes Service).

Logging agents collect and send the logs using Cloud Logging API to the Logs Router, where you can apply filters to capture only the important logs. Our customer’s hybrid deployment was generating 20 TB of logs every day, so we identified logs which were not important for real-time troubleshooting (in their case detailed network logs and debug logs) and applied filters at both the agent and Logs Router level. Further, we advised them that they could export their network logs to BigQuery or other third party tools for future analysis for deriving patterns and insights.

Read More  Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials

This pattern suited the customer’s requirements as their use case was primarily focused on troubleshooting and detailed network logs were only needed for analytics. It had the following advantages

  • Fully managed so they could focus on app development
  • Cost effective with combination of Cloud Logging and BigQuery
  • Provided the basis for future self healing operations with logs-based metrics

Pattern 2: BigQuery for log analytics in hybrid and multi-cloud scenarios

This pattern fits well in Customer 2’s scenario where logs volumes are high and leveraged primarily for analytics. In this customer’s case, they wanted deeper insights into their network logs.

 

This pattern is recommended for customers that need to run models for anomaly detection, pattern recognition, etc., specifically related to their network logs. Network logs tend to be high volume, and since these logs were primarily used for analytics, the customer did not really benefit from capabilities like sorting, filtering, dashboarding, etc. This is where BigQuery was useful, with low costs, built-in AI/ML, and global scale. Google’s fully managed data warehouse and analytical engine performed queries on terabytes of logs in tens of seconds making it ideal for the customer’s advanced analytics needs.

In this pattern, the Log Collector agent uses an output plugin that configures BigQuery as a destination for storing the logs collected from hybrid cloud, other cloud providers and from Google Cloud. Using the plugin, the customer can directly load logs into BigQuery in near-real-time from many servers. BigQuery automatically creates the schema for incoming logs, giving the customer more control over defining the schema and data format for the metrics. Once the logs are in BigQuery, the customer can make use of Looker to perform basic RegEx and build machine learning models on top of it. The customer can also visualize their log data by creating a dashboard that’s updated frequently using Data Studio, Looker or any visualization tool.

Read More  Yugabyte Cloud Offers Database Scale AND Transactional Consistency

This pattern provides the following advantages:

  • A cost-effective solution for high-volume networking logs
  • Built-in analytics engine for log insights
  • Use of the BigQuery API to display data to a dashboard and consume analytical insights

Final Thoughts

Google Cloud’s operations suite is easy to use right out of the box for Google Cloud users, but as we demonstrate for our customers everyday, it supports a variety of other use cases. As shown in the two patterns above, if you did need to troubleshoot with a subset of the logs and need analytical capabilities with another subset of your logs, you can use the Cloud Logging API to send the first part of your logs to Cloud Logging and the other part of your logs directly to BigQuery!

Furthermore, there is work ongoing to simplify many of these tasks. For example, we recently released the preview of Log Analytics, which automatically imports logs from Google Cloud services into BigQuery, and gives you the ability to analyze that data directly from the Cloud Logging interface.

Get started today

If you want to achieve the maximum benefit from your logs without the cost and operational overhead, set up time with your account team today or click here to contact our sales team.

If you have any questions or topics that you want to discuss with the operations community at Google Cloud, please visit our Google Cloud Community site.

 

 

By: Meenaxi Gunjati (Customer Engineer) and Tinu Anand Chandrasekar (Customer Engineer, Google Cloud)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • BigQuery;
  • Cloud Logging
  • Cloud Operations
  • Google Cloud
You May Also Like
View Post
  • Engineering
  • Technology

Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials

  • March 9, 2025
View Post
  • Hybrid Cloud
  • Technology

IBM Completes Acquisition of HashiCorp, Creates Comprehensive, End-to-End Hybrid Cloud Platform

  • February 27, 2025
View Post
  • Computing
  • Engineering

Why a decades old architecture decision is impeding the power of AI computing

  • February 19, 2025
View Post
  • Engineering
  • Software Engineering

This Month in Julia World

  • January 17, 2025
View Post
  • Engineering
  • Software Engineering

Google Summer of Code 2025 is here!

  • January 17, 2025
View Post
  • Data
  • Engineering

Hiding in Plain Site: Attackers Sneaking Malware into Images on Websites

  • January 16, 2025
View Post
  • Computing
  • Design
  • Engineering
  • Technology

Here’s why it’s important to build long-term cryptographic resilience

  • December 24, 2024
Vehicle Manufacturing
View Post
  • Hybrid Cloud
  • Public Cloud

Toyota shifts into overdrive: Developing an AI platform for enhanced manufacturing efficiency

  • December 10, 2024

Stay Connected!
LATEST
  • college-of-cardinals-2025 1
    The Definitive Who’s Who of the 2025 Papal Conclave
    • May 7, 2025
  • conclave-poster-black-smoke 2
    The World Is Revalidating Itself
    • May 6, 2025
  • 3
    Conclave: How A New Pope Is Chosen
    • April 25, 2025
  • Getting things done makes her feel amazing 4
    Nurturing Minds in the Digital Revolution
    • April 25, 2025
  • 5
    AI is automating our jobs – but values need to change if we are to be liberated by it
    • April 17, 2025
  • 6
    Canonical Releases Ubuntu 25.04 Plucky Puffin
    • April 17, 2025
  • 7
    United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services
    • April 15, 2025
  • 8
    Tokyo Electron and IBM Renew Collaboration for Advanced Semiconductor Technology
    • April 2, 2025
  • 9
    IBM Accelerates Momentum in the as a Service Space with Growing Portfolio of Tools Simplifying Infrastructure Management
    • March 27, 2025
  • 10
    Tariffs, Trump, and Other Things That Start With T – They’re Not The Problem, It’s How We Use Them
    • March 25, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    IBM contributes key open-source projects to Linux Foundation to advance AI community participation
    • March 22, 2025
  • 2
    Co-op mode: New partners driving the future of gaming with AI
    • March 22, 2025
  • 3
    Mitsubishi Motors Canada Launches AI-Powered “Intelligent Companion” to Transform the 2025 Outlander Buying Experience
    • March 10, 2025
  • PiPiPi 4
    The Unexpected Pi-Fect Deals This March 14
    • March 13, 2025
  • Nintendo Switch Deals on Amazon 5
    10 Physical Nintendo Switch Game Deals on MAR10 Day!
    • March 9, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.