aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Data
  • Tech

TiKV + SPDK: Pushing The Limits Of Storage Performance

  • aster.cloud
  • June 11, 2021
  • 5 minute read

Guest post originally published on PingCAP’s blog by Ke’ao Yang, Software Engineer at PingCAP

TiKV + SPDK: Pushing the Limits of Storage Performance

At the heart of modern software is layered abstraction. In abstraction, each layer hides details that are not relevant to other layers and provides a simple, functioning interface.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

However, because abstraction brings the extra overhead of system calls, its simplicity comes at the cost of performance. Thus, for performance-sensitive software such as databases, abstraction might bring unwanted consequences. How can we boost performance for these applications?

In this article, I’ll talk about the pitfalls of abstraction in the modern storage structure and how we tackled the problem at TiDB Hackathon 2020 by introducing the Storage Performance Development Kit (SPDK) in TiKV.

 

The pitfalls of abstraction

In the magic world of abstraction, everything is neatly organized. The application doesn’t make its execution plan; it lets the database decide. The database doesn’t write data to disks; it leaves that to the operating system’s discretion. The OS, of course, never operates directly on memory cells; instead, it asks the disk’s control chip to do it. More often than not, the concrete tasks are handed over to the party with more knowledge so that everyone is happy and content.

When a database reads and writes data, it takes advantage of abstraction in the file system. Via the file system, the database creates, writes, and reads files without directly operating on the hard disks. When the database starts a system call, it takes several steps before the data is finally persisted to the disks:

  1. The OS receives the system call.
  2. The OS operates on the virtual file system (VFS) and the page caching mechanism.
  3. The file system implements specific behaviors.
  4. Data is read from and written into the block device, where the I/O schedulers and device mappers take effect.
  5. The system sends instructions to storage devices via hardware drivers.
  6. The disk controller processes the instructions and operates on the storage medium.
Read More  Join Us For Google Cloud Security Talks: Zero Trust Edition

This process may have four underlying problems:

  • The system calls have high overhead. For complicated system calls like I/O operations, the system switches from user mode to kernel mode, saves the process state, and then restores the process state. Nowadays, as storage devices’ performance greatly improves, that overhead is taking up a larger proportion of resource consumption.
  • The file system implementation has an out-of-control impact on high-level applications. Different file systems use different data structures to manage their files and thus are suitable for different scenarios. XFS, for example, doesn’t do well with lots of small files. The high-level applications might not respond well to the variation of file systems.
  • As a generic OS, Linux prioritizes its universality, which means its page caching algorithm and hardware support are designed to work for all applications and all hardware. That means it doesn’t fully optimize the mechanism for specific software applications (such as databases) and hardware (such as NVMe drives).
  • Repetitive logging is also a concern. Every level of abstraction tries to maintain its stability by logging. NVMe controllers, the file system, and RocksDB all have logs. In the current mechanism, these repetitive logs are essential for the robustness of the system. But if we could redesign the system from a global perspective, we might be able to remove some logging.

The solution to all these problems is to push the application towards the lower level to reduce abstraction. The more chores the application does by itself, the greater performance boost it gets.

 

How we reduce abstraction in TiKV

TiKV is a distributed transactional key-value database. At TiDB Hackathon 2020, we tried to reduce abstraction in TiKV by pushing TiKV to step 5 mentioned above, that is, sending instructions to NVMe disks directly. This way, we eliminated the overhead of several abstractions.

Read More  Expanding Google Cloud’s Confidential Computing Portfolio

 

SPDK and BlobFS

How many levels of abstraction should be reduced is a matter of trade-off. Because of NVMe’s popularity and its new features (4 KB atomic writes, low latency, and high concurrency queues), we believe that redesigning TiKV for NVMe disks is worth the effort.

Luckily, the open-source community thinks so, too. Intel provides SPDK, a set of open-source toolkits and libraries for writing high-performance storage applications, which includes user-mode drivers and packages for NVMe. Via methods like Virtual Function I/O (VFIO), user-mode drivers map the hardware I/O to memory accessible by the user mode so that the application can access the hardware without taking a detour to the OS. VFIO is widely used in virtual machines to allow them to directly access the graphics card or network card.

Beyond that, SPDK implements a file system called BlobFS, which provides function interfaces similar to the POSIX file system. With BlobFS, the application performs I/O operations in three steps:

  1. The application calls functions such as blobfs_create and blobfs_read.
  2. The filesystem operations are mapped to storage device operations.
  3. BlobFS sends the instructions to NVMe disks. (Strictly speaking, the instructions are written into the corresponding memory space in NVMe devices.)

Compared with the Linux I/O process, the SPDK I/O processes are much simpler and more efficient:

  • The function calls have a lower overhead than syscalls.
  • BlobFS uses NVMe’s atomic write feature and thus simplifies the management of file system metadata.
  • Its optimized page caching strategy is suitable for contiguous reads and writes. For RocksDB, the optimization is significant for LSM compaction, but unfortunately it also removes extra caching for point queries.

This solution solves the four problems we mentioned earlier: it removes the syscall overhead, uses data structures and caching algorithms more suitable for databases and NVMe disks, and simplifies file system logging. If we integrate SPDK BlobFS into TiKV, we can expect to see a huge performance increase.

Read More  Support For 100 Large-Scale Clusters

 

Measuring TiKV Improvements

To benchmark our experiment, we use YCSB Workload A (update heavy workload) to test the final I/O. As shown in the figures below, the results are beyond our expectations. The labels on data points refer to the number of client threads. Clearly, under the same latency, SPDK-based TiKV processes higher operations per second (OPS):

YCSB Workload A read performance

YCSB Workload A read performance

YCSB Workload A write performance

YCSB Workload A write performance

Going forward

How far could this Hackathon project go? The answer depends. As NVMe disks become more popular and TiKV keeps striving for higher performance, our project is sure to deliver concrete results.

Moreover, as TiDB Cloud, the fully managed TiDB service, becomes publicly available, TiDB users can start a cluster in a few clicks. They can enjoy the benefits of SPDK without enduring the pain of configuring it by themselves.

In the industry, people have been exploring related topics for years:

  • Some researchers push the software down further to step 6—letting the software complete the tasks of NVMe controllers, such as the Open-Channel SSD technology. There is a paper on the commonality between SST file read-only mode in LSM tree and block read-only mode in SSD (erase before rewrite). This can reduce the performance downgrade caused by unexpected erasing.
  • There are also projects working on how to read and write on Linux block devices. For example, Ceph’s BlueFS has higher performance than direct file system read and write, and shows better hardware compatibility and lower risks than SPDK.

These approaches are promising in their own ways. One day, we may see one or more of them integrated into TiKV, creating a database with ever higher storage performance.

By Ke’ao Yang
Source CNCF


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • CNCF
  • POSIX
  • TiDB Cloud
  • TiKV
You May Also Like
Getting things done makes her feel amazing
View Post
  • Computing
  • Data
  • Featured
  • Learning
  • Tech
  • Technology

Nurturing Minds in the Digital Revolution

  • April 25, 2025
View Post
  • Tech

Deep dive into AI with Google Cloud’s global generative AI roadshow

  • February 18, 2025
View Post
  • Data
  • Engineering

Hiding in Plain Site: Attackers Sneaking Malware into Images on Websites

  • January 16, 2025
Volvo Group: Confidently ahead at CES
View Post
  • Tech

Volvo Group: Confidently ahead at CES

  • January 8, 2025
zedreviews-ces-2025-social-meta
View Post
  • Featured
  • Gears
  • Tech
  • Technology

What Not to Miss at CES 2025

  • January 6, 2025
View Post
  • Tech

IBM and Pasqal Plan to Expand Quantum-Centric Supercomputing Initiative

  • November 21, 2024
Black Friday Gifts
View Post
  • Tech

Black Friday. How to Choose the Best Gifts for Yourself and Others, Plus Our Top Recommendations.

  • November 16, 2024
IBM and Ferrari Premium Partner
View Post
  • Data
  • Engineering

IBM Selected as Official Fan Engagement and Data Analytics Partner for Scuderia Ferrari HP

  • November 7, 2024

Stay Connected!
LATEST
  • college-of-cardinals-2025 1
    The Definitive Who’s Who of the 2025 Papal Conclave
    • May 7, 2025
  • conclave-poster-black-smoke 2
    The World Is Revalidating Itself
    • May 6, 2025
  • 3
    Conclave: How A New Pope Is Chosen
    • April 25, 2025
  • Getting things done makes her feel amazing 4
    Nurturing Minds in the Digital Revolution
    • April 25, 2025
  • 5
    AI is automating our jobs – but values need to change if we are to be liberated by it
    • April 17, 2025
  • 6
    Canonical Releases Ubuntu 25.04 Plucky Puffin
    • April 17, 2025
  • 7
    United States Army Enterprise Cloud Management Agency Expands its Oracle Defense Cloud Services
    • April 15, 2025
  • 8
    Tokyo Electron and IBM Renew Collaboration for Advanced Semiconductor Technology
    • April 2, 2025
  • 9
    IBM Accelerates Momentum in the as a Service Space with Growing Portfolio of Tools Simplifying Infrastructure Management
    • March 27, 2025
  • 10
    Tariffs, Trump, and Other Things That Start With T – They’re Not The Problem, It’s How We Use Them
    • March 25, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    IBM contributes key open-source projects to Linux Foundation to advance AI community participation
    • March 22, 2025
  • 2
    Co-op mode: New partners driving the future of gaming with AI
    • March 22, 2025
  • 3
    Mitsubishi Motors Canada Launches AI-Powered “Intelligent Companion” to Transform the 2025 Outlander Buying Experience
    • March 10, 2025
  • PiPiPi 4
    The Unexpected Pi-Fect Deals This March 14
    • March 13, 2025
  • Nintendo Switch Deals on Amazon 5
    10 Physical Nintendo Switch Game Deals on MAR10 Day!
    • March 9, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.