RocksDB is an embeddable persistent key-value store for fast storage. So fast, in fact, that its logo includes a cheetah, the world’s fastest land animal. Cheetahs can hit speeds of up to 29 meters per second (about 64 miles per hour), especially when they’re on the hunt for a snack. But with too many prey options to chase all day long, even cheetahs reach their limit and tire — just as databases do when they’re overwhelmed by ever-increasing volumes of metadata.
As the volume of objects arriving from a variety of sources continues to challenge storage engines, metadata sprawl is expected to keep accelerating. Those who use the cheetah-fast RocksDB often resort to sharding to handle data-intensive workloads. Sharding splits a single dataset into smaller chunks and distributes those chunks across multiple database instances running on separate machines. This increases the total storage capacity of the system, but it also introduces complexity and operational overhead, so it’s less than ideal.
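To make the sharding idea concrete, here is a minimal sketch of hash-based sharding. Everything in it is illustrative: the shard count, the dict-based stores standing in for database instances, and the helper names are all assumptions, not part of any RocksDB or Speedb API.

```python
import hashlib

# Illustrative sketch: route each key to one of several independent
# stores ("shards") by hashing the key. Plain dicts stand in for the
# separate database instances a real deployment would run.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    # Stable hash so the same key always lands on the same shard.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def put(key: str, value: str) -> None:
    shards[shard_for(key)][key] = value

def get(key: str):
    return shards[shard_for(key)].get(key)

put("user:42", "alice")
put("user:99", "bob")
```

Note that the application (or a routing layer) now has to know about every shard, rebalance data when nodes are added, and handle cross-shard queries — the very overhead the article is pointing at.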
Modern applications that deal with huge volumes of data and metadata demand ever more capacity and performance, but current storage engines hit limits when scaling to large datasets and can’t keep up. Developers are left to compromise on elements critical to high-performing production workloads — capacity, scalability, cost, or performance — and to take on tedious data-management tasks such as sharding.
Enter Speedb.
In early 2020, Speedb’s founders realized that they could re-implement log-structured merge (LSM) trees — the data structure at the core of RocksDB — more efficiently than RocksDB does. By delivering extra performance and scale while using fewer resources, they could let companies handle much more data at much higher speed, without any changes to applications or their underlying data infrastructure.
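For readers unfamiliar with LSM trees, here is a toy sketch of the write path both engines build on: writes land in an in-memory "memtable"; when it fills, it is flushed as an immutable sorted run (an SSTable); reads check the memtable first, then runs from newest to oldest. The class name, flush threshold, and in-memory "runs" are illustrative assumptions — real engines persist runs to disk and compact them in the background.

```python
class TinyLSM:
    """Toy LSM-tree write path: memtable + immutable sorted runs."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}          # mutable in-memory buffer
        self.memtable_limit = memtable_limit
        self.sstables = []          # immutable sorted runs, newest last

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Runs are sorted by key so a real engine can binary-search them
        # and merge them during compaction (not modeled here).
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:     # freshest data wins
            return self.memtable[key]
        for run in reversed(self.sstables):  # then newest run first
            for k, v in run:
                if k == key:
                    return v
        return None
```

Because every write is a cheap append-style update to the memtable, LSM trees favor write-intensive workloads — which is exactly the territory where the founders saw room to out-engineer RocksDB’s implementation.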
And so, Speedb was conceived as a “next-generation data engine”, offering an embedded, drop-in replacement for RocksDB. It’s an enterprise-grade data engine built from the ground up for write-intensive workloads, and tailored for organizations that need to hyperscale their data processing without compromising storage capacity or agility.
Simply put — it makes a huge difference to performance and scalability: it eliminates stalls, cuts latency, and removes the need for sharding.
How much of a difference?
Think of it this way: Research shows that 99.9% of the genetic information in DNA is common to all human beings. The remaining 0.1% is responsible for differences in hair, eye and skin color, height and propensity to certain diseases — which is a huge amount of incredibly important data, when you consider it’s all contained in just 0.1% of the human genome.
Scientists believe that all life evolved from a common ancestor, which means that humans also share DNA sequences with all other living organisms. That includes chimpanzees and bonobos (our “closest” relatives), with whom we share about 98.7% of our genetic sequence; the Abyssinian house cat, with whom we share about 90% of our DNA; and mice, with whom we share about 85%. And even though much of the DNA we share with animals (and even plants!) is “silent” and isn’t involved in coding sequences, the point stands: even the smallest differences in genetic information can be the difference between being a human and being something else entirely.
And that’s the level of revolution that Speedb brings to the Key Value Store (KVS) market.
Although RocksDB and Speedb are data engines that share many traits, some Speedb features set their capabilities worlds apart. For example, rewriting RocksDB features such as clean and dirty memory management, quota management, and column family monitoring is a relatively small change in terms of lines of code, yet it enables Speedb to deliver big gains in performance and resource utilization while avoiding out-of-memory crashes, improving stability.
For those currently working in the field of DBMS, using Speedb to boost performance and scale is a bit like getting a front-row seat to the evolution of data engines (minus the popcorn).