
Cloud Bigtable Schema Tips: Key Salting

  • aster.cloud
  • November 10, 2022
  • 5 minute read
Cloud Bigtable is a low-latency, high-throughput NoSQL database. With NoSQL, you want to design a schema that can scale and adapt as your business grows. In real-world datasets, some access patterns are outliers: a few rows see significantly more activity than the rest and require a bit more planning. In this article, we are going to learn how to optimize a Bigtable schema to improve performance on highly active rows in an otherwise well-balanced schema.

Row key design refresher

Bigtable performs best when the throughput is evenly distributed across the entire row key space and can spread across all the nodes. Bigtable rows are physically stored in tablets containing groups of contiguous row keys, and each tablet is distributed to the available nodes. If rows on the same tablet are receiving a disproportionately large percentage of requests compared to other tablets in that node, that can impact performance.


Typically, row keys are designed to be optimized for particular queries. For example, to center queries around individual users, you may put a user id at the beginning, like so: user_id-xxx-yyy. When some users are significantly more active than others, as is the case for celebrity accounts, writes and reads on their rows could cause hotspotting by putting too much pressure on specific nodes.

If we can distribute the logical row by physically dividing it amongst multiple tablets, then the rows can get balanced across the available nodes and reduce the hotspot.

Key prefix salting

A well-distributed user id would typically work as a row key prefix, so we can use this as the starting point for our row key design:
user_id-xxx-yyy

One strategy to distribute this unbalanced throughput across all Bigtable nodes is to prepend an additional value to the row key design:
0-user_id-xxx-yyy
1-user_id-xxx-yyy

This example has two physical rows corresponding to one logical row, which divides the throughput in half. It distributes the rows for a particular user id across the rest of the keyspace. Since their prefixes are different, they can live on different tablets and are more likely to be hosted on different nodes. Note that both prefixes can still end up on the same node, and one prefix can be split across multiple nodes; the goal of this setup is to give the load balancing mechanism more options.

Choosing a prefix

It is important to choose a prefix that doesn’t add much complexity to requests. If we used random prefixes, each get request would turn into multiple get requests, one per possible prefix, to ensure the correct row was located. If the prefix can be computed deterministically from the row key, single-row read and write requests need only minimal changes.

If we would like N divisions, we can take modulo N of the hash of the entire existing row key. We will also refer to N as our salt range.

// floorMod keeps the prefix in [0, saltRange) even when hashCode() is negative.
int prefix = Math.floorMod(rowKey.hashCode(), saltRange);
String saltedRowKey = prefix + "-" + rowKey;

A point lookup and write will still work as the physical key can be computed from the logical key. Salting won’t eliminate the hotspots, but it spreads them into N hotspots of strength 1/N. These less severe hotspots can be more easily processed by individual nodes.
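The later snippets in this article call a helper named getSaltedRowKey without showing its body. A minimal sketch of such a helper follows, assuming string row keys and Java's String.hashCode() as the hash function; the class name SaltedKeys and the salt range of 4 are hypothetical choices for illustration, not taken from the original source code.

```java
class SaltedKeys {
  /** Returns "<salt>-<rowKey>", where salt is in [0, saltRange). */
  static String getSaltedRowKey(String rowKey, int saltRange) {
    // String.hashCode() can be negative; floorMod maps it into [0, saltRange).
    int salt = Math.floorMod(rowKey.hashCode(), saltRange);
    return salt + "-" + rowKey;
  }

  public static void main(String[] args) {
    int saltRange = 4; // hypothetical value for N
    String rowKey = "user_id-xxx-yyy";
    String first = getSaltedRowKey(rowKey, saltRange);
    String second = getSaltedRowKey(rowKey, saltRange);
    // The same logical key always maps to the same physical key.
    System.out.println(first.equals(second)); // prints "true"
  }
}
```

String.hashCode() is computed from the characters of the string, so the same key produces the same salt on every JVM. If clients written in other languages also access the table, they would need to reproduce the same hash, so a language-neutral hash function may be a better fit in that case.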

Prefix options

If you have common scans over prefixes that you would like to stay intact, you can also hash just part of the row key rather than the entire row key.

For a row key of the format user_id-site-timestamp, you might want efficient scans over user_id and site combinations. Here, we can leave off the timestamp when creating the hash, so the time-series data for those combinations will always be grouped together.

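// Hash only the part before the last "-" so the timestamp does not affect the prefix.
String rowKeyBase = rowKey.substring(0, rowKey.lastIndexOf("-"));
int prefix = Math.floorMod(rowKeyBase.hashCode(), saltRange);
String saltedRowKey = prefix + "-" + rowKey;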

Keys that share a frequently scanned logical prefix can still be scanned efficiently. However, this strategy is less resistant to hotspots: the same problem that salting is supposed to mitigate can come up again if individual user_id and site combinations get significant access.
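To illustrate why such scans stay efficient, here is a small sketch assuming keys of the form user_id-site-timestamp where the timestamp contains no "-". The class PartialSalt and the method scanPrefix are hypothetical names introduced for this example. Because the salt depends only on the user_id-site part, it can be computed up front, and a scan over one combination needs a single salted prefix.

```java
class PartialSalt {
  /** Salts a full row key using only the part before the last "-". */
  static String getSaltedRowKey(String rowKey, int saltRange) {
    String rowKeyBase = rowKey.substring(0, rowKey.lastIndexOf('-'));
    int salt = Math.floorMod(rowKeyBase.hashCode(), saltRange);
    return salt + "-" + rowKey;
  }

  /** Returns the one salted prefix that covers all rows for a user_id-site pair. */
  static String scanPrefix(String userAndSite, int saltRange) {
    int salt = Math.floorMod(userAndSite.hashCode(), saltRange);
    return salt + "-" + userAndSite + "-";
  }

  public static void main(String[] args) {
    int saltRange = 4; // hypothetical value for N
    String older = getSaltedRowKey("user123-siteA-1668038400", saltRange);
    String newer = getSaltedRowKey("user123-siteA-1668124800", saltRange);
    String prefix = scanPrefix("user123-siteA", saltRange);
    // Both rows fall under the same salted prefix.
    System.out.println(older.startsWith(prefix) && newer.startsWith(prefix)); // prints "true"
  }
}
```

A key without any "-" would make the substring call throw, so real code would need to validate the key format first.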

Implementation

To implement this in your code, you’ll need to change the places where you make requests for Bigtable data. You can view the full source code example on GitHub.

Writing

Using this new technique, if you want to write data, follow these steps:

  1. Take the row key you intend to write to
  2. Compute the prefix using your hash function
  3. Construct the salted row key by concatenating the prefix and row key
  4. Then use the salted row key for writing your data

You will need to integrate this flow everywhere you write data.

In Java, it would look something like this:

String saltedRowKey = getSaltedRowKey(rowKey, SALT_RANGE);
RowMutation rowMutation = RowMutation.create(tableId, saltedRowKey)
                              .setCell(....

Reading

Gets

To read individual rows in a table with salted keys, follow the same initial steps as for writing:

  1. Take the row key you intend to read
  2. Compute the prefix using your hash function
  3. Construct the salted row key by concatenating the prefix and row key
  4. Then use the salted row key for reading your data

Since the physical row key is computed deterministically from the logical row key, only one read needs to be issued for each logical key.

In Java, it would look something like this:

Row row = dataClient.readRow(tableId, getSaltedRowKey(rowKey, SALT_RANGE));
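The write and get flows can be tried end to end without a Bigtable instance. The sketch below is only an illustration: a sorted in-memory map stands in for the table, and the names SaltedRoundTrip, write, and read are hypothetical, not part of the Bigtable client. It shows that a single lookup on the computed physical key finds the row.

```java
import java.util.TreeMap;

class SaltedRoundTrip {
  static final int SALT_RANGE = 4; // hypothetical value for N

  // A sorted map standing in for a table; this is not the Bigtable client.
  static final TreeMap<String, String> table = new TreeMap<>();

  static String getSaltedRowKey(String rowKey, int saltRange) {
    int salt = Math.floorMod(rowKey.hashCode(), saltRange);
    return salt + "-" + rowKey;
  }

  /** Writes under the physical (salted) key computed from the logical key. */
  static void write(String rowKey, String value) {
    table.put(getSaltedRowKey(rowKey, SALT_RANGE), value);
  }

  /** Reads with exactly one lookup, since the salt is deterministic. */
  static String read(String rowKey) {
    return table.get(getSaltedRowKey(rowKey, SALT_RANGE));
  }

  public static void main(String[] args) {
    write("user_id-xxx-yyy", "hello");
    System.out.println(read("user_id-xxx-yyy")); // prints "hello"
  }
}
```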

Scans

You can follow these steps for each scan:

  1. Take the row key prefix you intend to scan
  2. For each salt value from 0 to N-1
    1. Construct the salted prefix by concatenating the salt and the row key prefix
    2. Use the salted prefix for a prefix scan
    3. Issue the scans in parallel
  3. Combine the results of all the scans

Let’s look at an example. Say you wanted to get all the data for a user and one subcategory; you would do a prefix scan on “user_id-xxx-”. If you’re working with salted rows, you need one prefix scan per salt value. If our salt range is 4, then we would do 4 prefix scans:

  • 0-user_id-xxx-
  • 1-user_id-xxx-
  • 2-user_id-xxx-
  • 3-user_id-xxx-

For the best performance you would want to issue each scan in parallel rather than sending all the prefixes into one request. Since the requests are done in parallel, the rows may not be returned in sorted order. If row order is important you will have to do some additional sorting once the results are received.

Because the physical row keys are no longer a contiguous range, these scans may consume more Bigtable CPU, which is an important consideration when choosing a salt range for a scan-heavy workload. Large scans, however, may perform better because more resources can serve the request in parallel.

In Java, it would look something like this:

List<Query> queries = new ArrayList<>();
for (int i = 0; i < SALT_RANGE; i++) {
  queries.add(Query.create(tableId).prefix(i + "-" + prefix));
}

List<ApiFuture<List<Row>>> futures = new ArrayList<>();
for (Query q : queries) {
  futures.add(dataClient.readRowsCallable().all().futureCall(q));
}

List<Row> rows = new ArrayList<>();
for (ApiFuture<List<Row>> future : futures) {
  rows.addAll(future.get());
}

for (Row row : rows) {
  // Access your row data here.
}
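When row order matters, the merged results have to be re-sorted by logical key after the parallel scans return. The following sketch shows one way to do that, under loud assumptions: a sorted in-memory map stands in for the table, CompletableFuture stands in for the client's asynchronous calls, and the names SaltedScan, prefixScan, stripSalt, and scanLogicalKeys are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

class SaltedScan {
  /** Removes the leading "<salt>-" to recover the logical row key. */
  static String stripSalt(String saltedKey) {
    return saltedKey.substring(saltedKey.indexOf('-') + 1);
  }

  /** Collects every key that starts with the prefix from a sorted map. */
  static List<String> prefixScan(TreeMap<String, String> table, String prefix) {
    List<String> keys = new ArrayList<>();
    for (String key : table.tailMap(prefix, true).keySet()) {
      if (!key.startsWith(prefix)) {
        break; // past the end of the contiguous prefix range
      }
      keys.add(key);
    }
    return keys;
  }

  /** Runs one prefix scan per salt value in parallel, then sorts by logical key. */
  static List<String> scanLogicalKeys(
      TreeMap<String, String> table, String prefix, int saltRange) {
    List<CompletableFuture<List<String>>> futures = new ArrayList<>();
    for (int i = 0; i < saltRange; i++) {
      String saltedPrefix = i + "-" + prefix;
      futures.add(CompletableFuture.supplyAsync(() -> prefixScan(table, saltedPrefix)));
    }
    List<String> logicalKeys = new ArrayList<>();
    for (CompletableFuture<List<String>> future : futures) {
      for (String saltedKey : future.join()) {
        logicalKeys.add(stripSalt(saltedKey));
      }
    }
    // Restore the order the rows would have had without salting.
    Collections.sort(logicalKeys);
    return logicalKeys;
  }
}
```

Sorting in memory is fine for modest result sizes. Since each individual scan already returns its rows in key order, a streaming k-way merge of the per-salt results could be an alternative for very large scans.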

Forward-looking migrations

It can be difficult to make a large change to existing datasets, so one way to migrate is to apply the salt only going forward. If you have timestamps at the end of the key, change the code to salt row keys past a certain fixed point in time, and just use an unsalted key for old/existing keys.
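A minimal sketch of this cutoff logic follows, assuming row keys end in "-" followed by an epoch-seconds timestamp. The class name ForwardSalting, the method physicalKey, and both constants are hypothetical values chosen for illustration.

```java
class ForwardSalting {
  static final int SALT_RANGE = 4; // hypothetical value for N
  static final long SALT_CUTOFF = 1_668_038_400L; // hypothetical cutoff, epoch seconds

  /** Returns the key to use in the table for a logical key ending in "-<timestamp>". */
  static String physicalKey(String rowKey) {
    long timestamp = Long.parseLong(rowKey.substring(rowKey.lastIndexOf('-') + 1));
    if (timestamp < SALT_CUTOFF) {
      return rowKey; // rows before the cutoff keep their existing unsalted keys
    }
    int salt = Math.floorMod(rowKey.hashCode(), SALT_RANGE);
    return salt + "-" + rowKey;
  }

  public static void main(String[] args) {
    System.out.println(physicalKey("user123-siteA-1000000000")); // prints "user123-siteA-1000000000"
    System.out.println(physicalKey("user123-siteA-1700000000")); // salted, since it is past the cutoff
  }
}
```

Under this scheme, a scan whose time range spans the cutoff would presumably need the unsalted prefix scan in addition to the salted ones, which is worth accounting for in the read path.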

Next steps

  • Read more about reading and writing data to Bigtable
  • Learn more about Bigtable performance
  • See if alternative solutions like adding a cache layer to Bigtable could help

 

 

By: Greg Colella (Software Engineer, Cloud Bigtable) and Billy Jacobson (Developer Advocate, Cloud Bigtable)
Source: Google Cloud Blog

