How VictoriaMetrics makes instant snapshots for multi-terabyte time series data

  • High insert performance on high-cardinality data. See this article for details.
  • High select performance when large amounts of data are analyzed. See this article and this spreadsheet for details.
  • High compression rate for typical time series data.
  • Online instant snapshots without degrading database operations.

A few words about ClickHouse

What is MergeTree?

  • Data for each column is stored separately. This reduces overhead during column scans, since no resources are spent on reading and skipping data for other columns. This also improves the per-column compression ratio, since individual columns usually contain similar data.
  • Rows are sorted by a “primary key”, which may span multiple columns. There is no unique constraint on a primary key: multiple rows may have an identical primary key. Sorting allows for quick row lookups and range scans by a primary key or by its prefix. Additionally, it improves the compression ratio, since consecutive sorted rows usually contain similar data.
  • Rows are split into moderately sized blocks. Each block consists of per-column sub-blocks. Each block is processed independently. This means close-to-perfect scalability on multi-CPU systems: just feed all the available CPU cores with independent blocks. Block size may be configured, but it is advisable to use sub-blocks with sizes in the range of 64KB-2MB, so they fit CPU caches. This improves performance, since CPU cache access is much faster than RAM access. Additionally, this reduces overhead when only a few rows must be accessed out of a block with many rows.
  • Blocks are merged into “parts”. These parts are similar to SSTables from a Log Structured Merge (LSM) tree. ClickHouse merges smaller parts into bigger parts in the background. Unlike a canonical LSM tree, MergeTree doesn’t have strict levels with similarly-sized parts. The merge process improves query performance, since fewer parts are inspected by each query. Additionally, the merge process reduces the number of data files, since each part contains a fixed number of files proportional to the number of columns. Merging parts has yet another benefit: a better compression rate, since it brings column data for sorted rows closer together.
  • Parts are grouped into partitions by a “partitioning key”. Initially ClickHouse allowed creating per-month partitions on a Date column. Now arbitrary expressions may be used for building a partitioning key. Distinct values of the partitioning key result in separate partitions. This allows fast and easy per-partition data archiving / removal.
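
The properties listed above can be illustrated with a small in-memory sketch. It is not ClickHouse or VictoriaMetrics code: the column names, the `make_part`, `merge_parts` and `scan_metric` helpers and the dict-of-lists layout are assumptions made for illustration only. It shows per-column storage, rows sorted by a multi-column primary key, a range scan by a primary key prefix, and a merge of two smaller parts into a bigger one that leaves the inputs untouched.

```python
import bisect
import heapq

# Hypothetical schema: the primary key spans two columns.
COLUMNS = ("metric", "timestamp", "value")
PRIMARY_KEY = ("metric", "timestamp")


def make_part(rows):
    """Sort rows by primary key and store every column separately."""
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in PRIMARY_KEY))
    return {col: [r[col] for r in ordered] for col in COLUMNS}


def iter_rows(part):
    """Reassemble row tuples (in COLUMNS order) from the per-column lists."""
    return zip(*(part[col] for col in COLUMNS))


def merge_parts(*parts):
    """Merge already-sorted parts into one bigger sorted part.

    The input parts are only read, never modified.
    """
    key_idx = [COLUMNS.index(c) for c in PRIMARY_KEY]
    merged = heapq.merge(
        *(iter_rows(p) for p in parts),
        key=lambda row: tuple(row[i] for i in key_idx),
    )
    result = {col: [] for col in COLUMNS}
    for row in merged:
        for col, item in zip(COLUMNS, row):
            result[col].append(item)
    return result


def scan_metric(part, metric):
    """Range scan by a primary key prefix using binary search.

    Only the columns needed for the answer are touched.
    """
    lo = bisect.bisect_left(part["metric"], metric)
    hi = bisect.bisect_right(part["metric"], metric)
    return list(zip(part["timestamp"][lo:hi], part["value"][lo:hi]))


part_a = make_part([
    {"metric": "cpu", "timestamp": 20, "value": 0.7},
    {"metric": "cpu", "timestamp": 10, "value": 0.5},
])
part_b = make_part([
    {"metric": "mem", "timestamp": 10, "value": 3.0},
    {"metric": "cpu", "timestamp": 15, "value": 0.6},
])
bigger_part = merge_parts(part_a, part_b)
print(scan_metric(bigger_part, "cpu"))
```

Because the merge only reads its inputs, the smaller parts remain valid until the bigger part is complete, which is the property the snapshot mechanism relies on. Blocks, compression and partitions are left out of the sketch to keep it short.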

Instant snapshots in VictoriaMetrics

  • Newly added parts either appear in the MergeTree or fail to appear. MergeTree never contains partially created parts. The same applies to the merge process: parts are either fully merged into a new part or fail to merge. There are no partially merged parts in MergeTree. Does this mean that parts appear out of the blue? No. Parts are assembled in temporary directories and then atomically moved to MergeTree. The same applies to the merge: old parts are atomically swapped with the new part when it is ready.
  • Part contents in MergeTree never change. Never ever. Parts are immutable. They may only be deleted after being merged into a bigger part.
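
The two properties above can be sketched with plain filesystem calls. This is a minimal illustration, not VictoriaMetrics code: the `tmp`, `parts` and `snapshots` directory names and the `add_part` and `create_snapshot` helpers are hypothetical. The sketch also assumes that a snapshot is built from hard links to part files, which this section does not spell out. It further assumes a POSIX filesystem, where `os.rename` within one filesystem is atomic and `os.link` is supported.

```python
import os
import tempfile


def add_part(root, part_name, files):
    """Assemble a part in a temporary directory, then publish it atomically."""
    tmp_root = os.path.join(root, "tmp")
    parts_root = os.path.join(root, "parts")
    os.makedirs(tmp_root, exist_ok=True)
    os.makedirs(parts_root, exist_ok=True)

    tmp_dir = tempfile.mkdtemp(dir=tmp_root)
    for name, data in files.items():
        with open(os.path.join(tmp_dir, name), "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the data reached the disk

    # Readers listing parts_root see either the whole part or nothing.
    final_dir = os.path.join(parts_root, part_name)
    os.rename(tmp_dir, final_dir)
    return final_dir


def create_snapshot(root, snapshot_name):
    """Hard-link every file of every part into a new snapshot directory.

    No file contents are copied, so the cost depends on the number of
    files rather than on the amount of data.
    """
    parts_root = os.path.join(root, "parts")
    snap_dir = os.path.join(root, "snapshots", snapshot_name)
    os.makedirs(snap_dir)  # fails if the snapshot already exists
    for part_name in sorted(os.listdir(parts_root)):
        src_part = os.path.join(parts_root, part_name)
        dst_part = os.path.join(snap_dir, part_name)
        os.mkdir(dst_part)
        for name in os.listdir(src_part):
            os.link(os.path.join(src_part, name), os.path.join(dst_part, name))
    return snap_dir
```

Since part files are never modified in place, a hard link made this way keeps pointing at valid data even after a background merge deletes the original part directory. A real implementation would also have to prevent parts from being added or removed while the links are being created; the sketch ignores that.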

Conclusion

  • /snapshot/list: lists available snapshots
  • /snapshot/create: creates a new snapshot
  • /snapshot/delete?snapshot=…: deletes the given snapshot
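
A hedged sketch of calling these handlers from Python follows. The `snapshot_url` and `call_snapshot_api` helpers are hypothetical, and the base URL `http://localhost:8428` is an assumption about a locally running single-node instance; adjust it to the actual address. The response body is returned as raw text, since its exact format is not described here.

```python
import urllib.parse
import urllib.request

# Assumption: a single-node instance listening on its default local port.
BASE_URL = "http://localhost:8428"


def snapshot_url(action, snapshot=None, base_url=BASE_URL):
    """Build the URL for one of the three snapshot handlers."""
    if action not in ("list", "create", "delete"):
        raise ValueError(f"unknown snapshot action: {action!r}")
    url = f"{base_url}/snapshot/{action}"
    if action == "delete":
        if not snapshot:
            raise ValueError("delete requires a snapshot name")
        url += "?" + urllib.parse.urlencode({"snapshot": snapshot})
    return url


def call_snapshot_api(action, snapshot=None, base_url=BASE_URL, timeout=10):
    """Send the request and return the raw response body as text."""
    url = snapshot_url(action, snapshot, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `call_snapshot_api("create")` would request a new snapshot, and `call_snapshot_api("delete", snapshot=name)` would remove one, with `name` taken from the output of the list handler.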

Founder and core developer at VictoriaMetrics

Aliaksandr Valialkin
