
How To Investigate High Tail Latency When Using Cloud Spanner

  • aster.cloud
  • December 29, 2021
  • 4 minute read

When you use Cloud Spanner, you may encounter cases of high tail latency. Some of the causes may be on the Cloud Spanner side, but there could be other reasons as well. In this blog post, we will talk about how to distinguish the causes of high latency and share some tips for improving Cloud Spanner latency.

Check the relationship between the high latency and Cloud Spanner usage

If you can see the high latency in the Cloud Spanner metrics available in Cloud Console or Cloud Monitoring, the cause is at either [3. Cloud Spanner API Front End] or [4. Cloud Spanner Database] in the diagram from the Cloud Spanner end-to-end latency guide. Further investigation at the Cloud Spanner level is needed.


On the other hand, if you can’t confirm the high latency in Cloud Spanner metrics, the high latency likely happened before reaching Cloud Spanner from the client.

If high latency was observed in your client metrics, I recommend checking whether:

  • accessing other services had high latency
  • the client machine had any resource shortage issue
  • the high latency happened to a specific client machine

Some example causes are:

  • sudden CPU utilization spike (this itself is not the cause, but indicates other processes in the machine may have caused the latency)
  • hitting Disk I/O performance limit
  • ephemeral port exhaustion that prevents establishing a TCP connection
  • high latency from DNS queries
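
To check whether the high latency is tied to a specific client machine, you can compare the tail latency per host. The following is a minimal sketch, assuming you already export client-side samples as (host, latency in milliseconds) pairs. The function names and the "2x the median p99" threshold are my own illustrative choices, not something defined by Cloud Spanner.

```python
from collections import defaultdict
from statistics import median, quantiles


def p99_by_host(samples):
    """Return {host: p99 latency in ms} for an iterable of (host, latency_ms)."""
    by_host = defaultdict(list)
    for host, latency_ms in samples:
        by_host[host].append(latency_ms)

    result = {}
    for host, values in by_host.items():
        if len(values) < 2:
            # quantiles() needs at least two data points
            result[host] = float(values[0])
        else:
            # n=100 yields 99 cut points; the last one is the 99th percentile
            result[host] = quantiles(values, n=100, method="inclusive")[98]
    return result


def outlier_hosts(samples, factor=2.0):
    """Hosts whose p99 exceeds `factor` times the median p99 of all hosts.

    The threshold is an arbitrary assumption for illustration.
    """
    p99 = p99_by_host(samples)
    baseline = median(p99.values())
    return sorted(host for host, value in p99.items() if value > factor * baseline)
```

If only one or two hosts show up as outliers, the cause is more likely on those machines (CPU, disk I/O, ports, DNS) than in Cloud Spanner.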

You can also measure the latency at [2. Google Front End] in the Cloud Spanner end-to-end latency guide: the time from when GFE sends a request to when it gets the first response from [4. Cloud Spanner Database] via [3. Cloud Spanner API Front End]. If you observe high latency in this metric, further investigation is needed on the GCP side. You can request this by opening a support ticket if you have a support package. (However, this should be quite rare.)
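
As far as I know, the latency guide exposes the GFE latency through a server-timing response header of the form "gfet4t7; dur=<milliseconds>". The sketch below only parses such a header value. Treat the header format as an assumption to verify against the guide. How you capture the header depends on your client library and is not shown here.

```python
import re

# Assumed header format: "gfet4t7; dur=123" (duration in milliseconds)
_GFE_TIMING = re.compile(r"gfet4t7;\s*dur=(\d+(?:\.\d+)?)")


def gfe_latency_ms(server_timing_header):
    """Extract the GFE latency in ms from a server-timing header value.

    Returns None when the header is missing or has an unexpected shape.
    """
    match = _GFE_TIMING.search(server_timing_header or "")
    return float(match.group(1)) if match else None
```

Comparing this value with the latency measured in the client gives a rough idea of how much time is spent before the request reaches GFE.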


Note that this GFE metric doesn’t include latency for the TCP/SSL handshake. If the client, GFE, and Cloud Spanner metrics don’t reveal the cause, you may need to take a packet capture and check whether the TCP/SSL handshake is slow. (However, this should also be quite rare.)

Investigate high latency in Cloud Spanner usage

If you observe high latency in Cloud Spanner metrics, the most typical cause is a shortage of Spanner nodes. Make sure that your CPU utilization is within the recommended value in Alerts for high CPU utilization. Note that low and medium priority tasks (such as generating statistics packages, compaction, and schema changes) don’t affect higher priority tasks when the CPU utilization is low, but they can affect higher priority tasks when the utilization gets close to 100%.

If your CPU utilization is high, you can narrow down the queries responsible by following Investigating high CPU utilization.

If you observe high latency even though the overall CPU utilization is not high, the cause may be hot spots or lock waits.

For hot spots, you can check the frequently accessed keys with Key Visualizer. In some cases, hot spots may subside due to optimizations in Cloud Spanner. However, depending on the key design or traffic pattern, these optimizations cannot address every case. Schema design best practices will be useful in such cases.

To investigate lock wait times, you can refer to Lock statistics. Note that because detailed information will become unavailable as time passes (see Data retention), it’s more effective to check SPANNER_SYS.LOCK_STATS_TOP_MINUTE or SPANNER_SYS.LOCK_STATS_TOP_10MINUTE as soon as the high latency issue happens.
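
As a rough sketch of checking lock waits right after an incident, the helper below reads the most recent rows of SPANNER_SYS.LOCK_STATS_TOP_MINUTE. It assumes `database` is a Database object from the google-cloud-spanner Python client and uses the column names as I understand them from the Lock statistics documentation. The function name and the default limit are hypothetical.

```python
LOCK_STATS_SQL = """
SELECT
  INTERVAL_END,
  CAST(ROW_RANGE_START_KEY AS STRING) AS ROW_RANGE_START_KEY,
  LOCK_WAIT_SECONDS
FROM SPANNER_SYS.LOCK_STATS_TOP_MINUTE
ORDER BY INTERVAL_END DESC, LOCK_WAIT_SECONDS DESC
LIMIT {limit}
"""


def top_lock_waits(database, limit=10):
    """Return (interval_end, row_range_start_key, lock_wait_seconds) tuples.

    `database` is assumed to behave like google.cloud.spanner's Database:
    snapshot() returns a context manager that offers execute_sql().
    """
    sql = LOCK_STATS_SQL.format(limit=int(limit))
    with database.snapshot() as snapshot:
        # Consume the result stream while the snapshot is still open
        return [tuple(row) for row in snapshot.execute_sql(sql)]
```

Row ranges with large LOCK_WAIT_SECONDS values point at the keys where transactions contend.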


You can also associate tags with your queries, read requests, and transactions. Combining the tagging feature with the statistics tables lets you identify the cause of high latency more effectively.
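
For illustration, here is a minimal sketch of tagging a query with the google-cloud-spanner Python client, which accepts a request tag through the `request_options` argument of `execute_sql`. The helper name and the tag format in the usage note are hypothetical. `database` is assumed to be a client Database object.

```python
def run_tagged_query(database, sql, request_tag):
    """Run a read-only query with a request tag attached.

    The tag should then appear in the REQUEST_TAG column of the query
    statistics tables, so slow queries can be traced back to a code path.
    """
    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            sql,
            request_options={"request_tag": request_tag},
        )
        # Materialize the rows before the snapshot is closed
        return list(rows)
```

A call might look like `run_tagged_query(database, "SELECT 1", "app=checkout,action=list_orders")`, where the tag naming scheme is up to you.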

Tips to avoid high latency

In most cases, you’ll find the cause and a remedy with the aforementioned approaches. Let me introduce some tips to avoid high latency for the use cases where you have difficulty finding the cause from the statistics tables and Key Visualizer.

Use stale reads

Cloud Spanner guarantees strong consistency for read operations by default. However, using a stale read even with a short staleness (e.g. 1 sec) may improve performance dramatically. This can be especially effective when you need to read rows which are also updated frequently and you don’t require strong consistency with the updates.
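
A stale read can be sketched as follows with the google-cloud-spanner Python client, whose `Database.snapshot()` accepts an `exact_staleness` timedelta. The helper name and the one second default are illustrative assumptions. Pick a staleness your application can tolerate.

```python
import datetime


def stale_read(database, sql, staleness_seconds=1.0):
    """Run a read-only query against data that is `staleness_seconds` old.

    `database` is assumed to be a google.cloud.spanner Database object.
    """
    staleness = datetime.timedelta(seconds=staleness_seconds)
    with database.snapshot(exact_staleness=staleness) as snapshot:
        # Materialize the rows before the snapshot is closed
        return list(snapshot.execute_sql(sql))
```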

Incorporate column data into indexes by using STORING clause

When you use FORCE_INDEX in a SELECT query, Cloud Spanner can return results from the index alone, without scanning the base table, if all the columns in the SELECT list are stored in the index itself. You can achieve this by using the STORING clause.

If you see a large time gap between the latency of Scan Index and the latency of its upper Distributed Union/Distributed Cross Apply in the query plan, using the STORING clause would likely provide large performance gains.
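
To make this concrete, the sketch below builds the two statements involved: a CREATE INDEX with a STORING clause, and a query that forces that index. The table, column, and index names in the usage note are hypothetical, and the helpers only build strings. You would still apply the DDL through your usual schema change process.

```python
def storing_index_ddl(index, table, key_columns, stored_columns):
    """Build a CREATE INDEX statement that stores extra columns in the index."""
    keys = ", ".join(key_columns)
    stored = ", ".join(stored_columns)
    return f"CREATE INDEX {index} ON {table}({keys}) STORING ({stored})"


def forced_index_query(index, table, columns, where):
    """Build a SELECT that forces `index` through the FORCE_INDEX table hint.

    If every selected column is an index key or a stored column, the
    query can be answered from the index without reading the base table.
    """
    cols = ", ".join(columns)
    return f"SELECT {cols} FROM {table}@{{FORCE_INDEX={index}}} WHERE {where}"
```

For example, `storing_index_ddl("AlbumsByTitle", "Albums", ["AlbumTitle"], ["MarketingBudget"])` produces `CREATE INDEX AlbumsByTitle ON Albums(AlbumTitle) STORING (MarketingBudget)`, after which a query selecting only AlbumTitle and MarketingBudget can be served from the index.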

 

Use Partitioned DML to delete rows

In some use cases, you may want to delete rows periodically. Creating a row deletion policy with TTL is the convenient approach, but if you want to do it on your own, you can minimize the scope of lock ranges by using Partitioned DML: it is executed in parallel across partitions, which minimizes the effect on other requests. One caveat is that the operation must be idempotent. In other words, you can’t use Partitioned DML if a difference between the result of performing the operation once and the result of performing it multiple times is not acceptable.
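
Here is a minimal sketch of such a periodic delete using `Database.execute_partitioned_dml` from the google-cloud-spanner Python client. The helper name, table name, and column name are hypothetical. The cutoff is computed on the client and embedded as a fixed timestamp literal, so running the statement more than once gives the same end result.

```python
import datetime


def delete_older_than(database, table, timestamp_column, cutoff):
    """Delete rows older than `cutoff` (a timezone-aware datetime).

    Returns whatever execute_partitioned_dml() returns, which for the
    real client is a lower bound on the number of rows modified.
    """
    cutoff_utc = cutoff.astimezone(datetime.timezone.utc)
    literal = cutoff_utc.strftime("%Y-%m-%d %H:%M:%S+00:00")
    dml = (
        f"DELETE FROM {table} "
        f"WHERE {timestamp_column} < TIMESTAMP '{literal}'"
    )
    return database.execute_partitioned_dml(dml)
```

The table and column names are interpolated directly into the statement, so only pass trusted values for them.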


Latency of a few seconds at p99 can happen

There are some situations where you can’t suppress such latency increases. The Spanner Frontend servers ([3. Cloud Spanner API Front End] in the latency guide) are occasionally restarted for maintenance. If your request (session) happens to be on a server which is about to restart, the session takeover to another server takes a few seconds. The maintenance is essential to ensure the service level, so the tail latency caused by this event is unavoidable.

That’s it. I hope this article helps you find the cause of high latency and a remedy you hadn’t come up with.

 

By: Tomoaki Fujii (Technical Solutions Engineer)
Source: Google Cloud Blog

