
Compute And Storage Should Be Decoupled For Log Management At Scale

  • aster.cloud
  • June 7, 2021
  • 5 minute read

Guest post originally published on The New Stack by Tito George, co-founder, logic.ai

Most log management solutions store log data in a database and enable search by storing an index of the data. As the database grows in size, so does the index management cost. On a small scale, this isn’t problematic. But in large-scale deployments, organizations end up using lots of compute, storage and human resources just to manage their indexes, in addition to the data itself. When companies are handling terabytes of data every day, the database-backed log management system becomes untenable.


Another common issue is that most log solutions don’t store just one copy of the data. Many DIY log management implementations use popular databases such as MongoDB, Elasticsearch and Cassandra. Let’s take Elasticsearch as an example. An Elasticsearch cluster runs several replicas of data in the hot store tier to ensure high availability. Even with data compression, the replication required to keep the data available still dramatically increases the total amount of storage necessary. The problem is magnified when you account for the storage needed for indexes.

Clustering also increases the management complexity and requires users to understand how to manage node failures and data recovery. Even with replication, it is impossible to immediately spin up a new instance when an instance goes down. In most cases, there is some downtime when the log analytics system becomes unavailable. While this happens, data continues to come in because logs are generated in real-time. Catching up requires additional provisioning of resources. Because the real-time data never stops, it can be hard to get the log analytics system to catch up. One-click elasticity is critical to managing this at scale.
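The catch-up problem described above can be put into rough numbers. The following is a minimal sketch, not a model of any particular product: the function name `catch_up_minutes` and the rates used are assumptions chosen for illustration. It shows that the backlog built up during an outage only drains at the rate of spare capacity, which is why extra resources have to be provisioned.

```python
def catch_up_minutes(outage_minutes, ingest_rate, processing_rate):
    """Estimate how long a log pipeline needs to clear its backlog.

    outage_minutes  -- how long the analytics system was unavailable
    ingest_rate     -- incoming log volume per minute (any unit, e.g. GB/min)
    processing_rate -- volume the recovered system can process per minute
    """
    # Logs keep arriving during the outage, so the backlog grows linearly.
    backlog = outage_minutes * ingest_rate

    # Only capacity beyond the live ingest rate can be spent on the backlog.
    spare_capacity = processing_rate - ingest_rate
    if spare_capacity <= 0:
        # With no headroom the system never catches up.
        return float("inf")
    return backlog / spare_capacity


# Illustrative numbers only: a 30-minute outage at 10 GB/min of ingest.
print(catch_up_minutes(30, 10, 12))  # 20% headroom
print(catch_up_minutes(30, 10, 20))  # capacity doubled after recovery
```

With only 20% headroom, the hypothetical 30-minute outage takes far longer than the outage itself to recover from; doubling capacity brings recovery time down to the length of the outage.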

The challenges outlined above are classic examples of the hidden “storage operations tax” that any DIY solution has to pay. The larger the scale, the higher the tax! A company ingesting around one terabyte of data per day would need multiple terabytes of storage and a proportional amount of RAM if it wanted to keep 30 days’ worth of log data searchable.
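To make the storage arithmetic concrete, here is a back-of-the-envelope sketch. The replica count, index overhead and compression ratio are assumptions picked for illustration, not measurements from any specific database, and `estimate_storage_tb` is a hypothetical helper.

```python
def estimate_storage_tb(daily_ingest_tb, retention_days, copies,
                        index_overhead, compression_ratio):
    """Rough storage estimate for a replicated, indexed log store.

    copies            -- total copies of the data (primary plus replicas)
    index_overhead    -- index size as a fraction of the stored data
    compression_ratio -- raw size divided by compressed size
    """
    compressed = daily_ingest_tb * retention_days / compression_ratio
    with_index = compressed * (1 + index_overhead)
    return with_index * copies


# Assumed inputs: 1 TB/day, 30 days, 3 copies, 50% index overhead, 2x compression.
total = estimate_storage_tb(1.0, 30, copies=3, index_overhead=0.5,
                            compression_ratio=2.0)
print(total)  # 67.5 under these assumed inputs
```

Under these assumed inputs, 30 TB of raw logs turns into 67.5 TB of provisioned storage, most of it driven by replication and indexes rather than by the data itself.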

The way to solve this problem is by moving away from databases and using a scalable API storage layer. An API storage layer like Amazon Web Services’ S3, which has traditionally been used for cold storage, fits this requirement quite well. It provides high availability and durability, effectively unlimited scale and a very low price per GB, and it effectively takes your storage operations tax to zero. However, to make this work, one has to ensure that applications do not suffer the higher latency that is typical of cold storage.

 

Are You Keeping 30 Days’ Worth of Data?

Enterprises think they are keeping 30 days’ worth of log data in their hot storage, but they aren’t actually doing so. Most queries are periodically run reports, not interactive sessions with a user sitting at the console. This is especially true at scale, when it is not uncommon to ingest hundreds of megabytes or even gigabytes of log data per minute. Interactive workflows in such environments focus on identifying relevant events and data patterns, which are then programmed into the system and converted into real-time notifications to the administrator. This means that most data does not need to be in hot storage at all but can instead be processed in-line during ingest or asynchronously at a later point in time.

There’s another good reason that companies move data into S3-compatible or other cold storage quickly. Reducing data duration in a database separates the data storage from compute and makes it easier for organizations to scale their storage and recover from crashed clusters. It’s dramatically cheaper to store data in cold storage than in a database, and scaling cold storage is easier than scaling a database.

This approach, however, creates a new problem: the data now has to be separated into two tiers, hot and cold. Moving and managing data between the two tiers requires expertise. Decisions about what to tier, how often to move data and when to hydrate the hot tier with data from the cold tier now become business as usual. The “storage operations tax” just went up.
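The bookkeeping that tiering introduces can be sketched in a few lines. This is an illustrative sketch only: `plan_tiering` is a hypothetical helper, and the day-level partitioning and seven-day hot window are assumptions. Real systems also have to move indexes, handle partial failures and schedule the transfers.

```python
from datetime import date, timedelta


def plan_tiering(hot_days, today, hot_retention_days,
                 query_start=None, query_end=None):
    """Decide which daily partitions to demote and which to hydrate.

    hot_days -- set of dates whose data currently sits in the hot tier
    Returns (demote, hydrate): days to move to cold storage, and days that
    must be copied back into the hot tier before the query can run.
    """
    cutoff = today - timedelta(days=hot_retention_days)
    demote = sorted(d for d in hot_days if d < cutoff)

    hydrate = []
    if query_start is not None and query_end is not None:
        day = query_start
        while day <= query_end:
            if day not in hot_days:
                hydrate.append(day)
            day += timedelta(days=1)
    return demote, hydrate


# Assumed state: the last seven days are hot, plus one stale partition.
today = date(2021, 6, 7)
hot = {today - timedelta(days=n) for n in range(7)} | {date(2021, 5, 1)}
demote, hydrate = plan_tiering(hot, today, hot_retention_days=7,
                               query_start=date(2021, 5, 30),
                               query_end=date(2021, 6, 1))
print(demote)   # partitions past the hot retention window
print(hydrate)  # partitions the query needs that are only in cold storage
```

Every query that reaches past the hot window triggers a hydration step like this one, which is the recurring operational cost the paragraph above refers to.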

 

What if I Need Long-Term Data Retention?

In highly regulated environments, short-term retention is usually not an option, as businesses must store data, index it and keep it searchable for several years. The same problems exist, albeit at an even larger scale. The choice is between vast amounts of expensive primary storage and a tiered storage architecture. With such requirements, it is not uncommon to have a tiered implementation with most of the data sitting in the cold tier, yet with significant data still in the hot tier (e.g., 30-day retention). The “storage operations tax” isn’t going anywhere; it just keeps increasing.

 

Eliminating Legacy Storage Architecture and Data Tiering

Companies use a tiered approach to storage because they fear losing the ability to search data in cold storage. If searching is necessary, an arduous request process makes accessing the logs slow and challenging. Running real-time searches on older data is impossible. For some application types, this isn’t a big deal. Still, for revenue-producing, critical path applications, it’s crucial to have quick, real-time access to logs and the ability to get the information out of them at a moment’s notice. Having multiple data tiers, where there is a “hot” store and a “cold” store, creates cost and management overhead, particularly for Day 2 operations. Moving everything to a hot store would be extremely expensive — so what if you could make cold storage your primary store?

 

Making S3 Searchable or ‘Zero Storage Operations Tax’

What if we could make S3-compatible storage just as searchable as a database? The reason companies keep their log data in a database is to enable real-time searches. Still, in practice, most organizations are not keeping nearly as much historical data in databases as their official data retention policies dictate. Suppose any S3-compatible store can be just as searchable as a database. In that case, organizations can dramatically cut down the amount of data stored in databases and the accompanying computing resources needed to manage that data. The most recent data — say, one minute of data — can be stored on the disk, but after a minute, everything moves to S3. There’s no longer the need to run multiple instances of a database for high availability because if the cluster goes down, a new one can be spun up and pointed to the same S3-compatible bucket.
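The flow described above (buffer about a minute of data locally, flush it to an S3-compatible bucket, and let any replacement instance read from the same bucket) can be sketched as follows. This is a simplified illustration, not the author's implementation: `InMemoryObjectStore` is a hypothetical stand-in for a real bucket so the example runs without credentials, and the time-based key layout and brute-force scan in `search` are assumptions. A production system would need a far more efficient way to locate matching objects.

```python
from datetime import datetime, timezone


class InMemoryObjectStore:
    """Hypothetical stand-in for an S3-compatible bucket."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        self.objects[key] = body

    def get(self, key):
        return self.objects[key]

    def list_keys(self, prefix):
        return sorted(k for k in self.objects if k.startswith(prefix))


def object_key(epoch_seconds):
    """Build a time-partitioned key so queries can narrow by prefix."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return t.strftime("logs/%Y/%m/%d/%H/%M%S.log")


class LogIngester:
    """Keeps a short window of logs locally, then flushes it to the store."""

    def __init__(self, store, flush_after_seconds=60):
        self.store = store
        self.flush_after_seconds = flush_after_seconds
        self.buffer = []
        self.window_start = None

    def ingest(self, timestamp, line):
        # Flush the current window once it is older than the threshold.
        if (self.window_start is not None
                and timestamp - self.window_start >= self.flush_after_seconds):
            self.flush()
        if self.window_start is None:
            self.window_start = timestamp
        self.buffer.append(line)

    def flush(self):
        if not self.buffer:
            return
        self.store.put(object_key(self.window_start), "\n".join(self.buffer))
        self.buffer = []
        self.window_start = None


def search(store, prefix, needle):
    """Scan every object under a time prefix for lines containing needle."""
    hits = []
    for key in store.list_keys(prefix):
        hits.extend(l for l in store.get(key).split("\n") if needle in l)
    return hits


bucket = InMemoryObjectStore()
ingester = LogIngester(bucket, flush_after_seconds=60)
ingester.ingest(0, "payment error: timeout")
ingester.ingest(30, "payment ok")
ingester.ingest(61, "login error: bad token")  # first window is flushed here
ingester.flush()

# The searcher holds no state of its own; a replacement instance pointed at
# the same bucket would see exactly the same data.
results = search(bucket, "logs/1970/01/01/", "error")
```

Because the ingester and the searcher share nothing except the bucket, losing either one loses at most the unflushed window, which is the property that removes the need for replicated database instances.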

Moving log data directly to cold storage while ensuring real-time searchability makes it easier to scale, increases the log data’s availability and dramatically decreases costs, both on storage and computing resources. When log data is accessed directly in the cold storage, users don’t have to worry about managing indexes between hot and cold store tiers, rehydrating data, or building complex policies. It also means that companies can follow the data retention plans they have to ensure developers can access logs and use them to debug critical applications.

