Timescale recently published Promscale, an open source long-term remote storage for Prometheus built on top of TimescaleDB. According to the announcement, it should be fast and resource-efficient. Let’s compare performance and resource usage on a production workload for Promscale and VictoriaMetrics.

Benchmark setup

The following setup was used for the benchmark:

                                   /--> VictoriaMetrics
2000 x node_exporter <-- vmagent --|
                                   \--> Promscale
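Such a fan-out can be configured via vmagent’s repeatable -remoteWrite.url flag. A minimal sketch follows; the hostnames and the Promscale write endpoint are assumptions for illustration:

```
vmagent -promscrape.config=scrape.yml \
  -remoteWrite.url=http://victoriametrics:8428/api/v1/write \
  -remoteWrite.url=http://promscale:9201/write
```

vmagent replicates all the scraped samples to every configured -remoteWrite.url, so both storages receive identical data.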

Node_exporter v1.0.1 has been installed on a single e2-standard-4 instance in GCP. It exports real-world resource usage metrics such as CPU usage, memory usage, disk IO and network usage. These metrics are typically collected in production workloads.

Recently, single-node VictoriaMetrics gained support for scraping Prometheus targets. This made it possible to run an apples-to-apples benchmark comparing resource usage for Prometheus and VictoriaMetrics while scraping a large number of real node_exporter targets.

The benchmark was run in Google Compute Engine on four machines (instances):

  • An instance with node_exporter v1.0.1 for scraping. It was run on an e2-standard-4 machine with the following config: 4 vCPUs, 16GB RAM, 1TB HDD persistent disk. Initial tests revealed that the node_exporter cannot process more than a few hundred requests per second, while Prometheus and VictoriaMetrics were generating a much higher load on it during tests. So it has…


Prometheus supports relabeling, which allows performing the following tasks:

  • Adding a new label
  • Updating an existing label
  • Rewriting an existing label
  • Updating the metric name
  • Removing unneeded labels
  • Removing unneeded metrics
  • Dropping metrics on a certain condition
  • Modifying label names
  • Constructing a label from multiple existing labels
  • Chaining relabeling rules

Let’s look at how to perform each of these tasks.

Adding a new label

A new label can be added with the following relabeling rule:

- target_label: "foo"
  replacement: "bar"

This relabeling rule adds the {foo="bar"} label to all incoming metrics. For example, metric{job="aa"} will be converted to metric{job="aa",foo="bar"}.

Updating an existing label

An existing label can be updated with the same kind of relabeling rule:
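For example, a sketch that overwrites the value of an existing job label (the new value "bb" is illustrative):

```yaml
# Overwrite the value of the existing "job" label.
- target_label: "job"
  replacement: "bb"
```

With this rule, metric{job="aa"} becomes metric{job="bb"}.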

Recently ScyllaDB published an interesting article, How Scylla scaled to one billion rows per second. They conducted a benchmark (named Billy) for a typical time series workload, which simulates a million temperature sensors reporting every minute for a year’s worth of data. This translates to 1M*60*24*365 = 525.6 billion data points. The benchmark was run on beefy servers from Packet:

The ScyllaDB cluster achieved a scan speed of more than 1 billion data points per second on this setup. Later, ClickHouse showed good results on a slightly modified Billy benchmark.
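As a quick sanity check of the arithmetic above (a million sensors, one sample per minute, for one year):

```go
package main

import "fmt"

func main() {
	sensors := 1_000_000
	minutesPerYear := 60 * 24 * 365 // 525,600 samples per sensor per year
	totalDataPoints := sensors * minutesPerYear
	fmt.Println(totalDataPoints) // 525600000000, i.e. 525.6 billion
}
```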


Many technical terms are used when referring to Prometheus storage, such as local storage and remote storage. New users may be unfamiliar with these terms, which can lead to misunderstandings. Let’s explain the most commonly used terms in this article.

Time Series

A time series is a series of (timestamp, value) pairs sorted by timestamp. The number of pairs per time series can be arbitrary, from one to hundreds of billions. Timestamps have millisecond precision, while values are 64-bit floating point numbers. Each time series has a name. For instance:

  • node_cpu_seconds_total — the total number of CPU seconds…
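The structure described above can be sketched as follows; the type names are illustrative, not from any particular codebase:

```go
package main

import "fmt"

// Sample is a single (timestamp, value) pair.
type Sample struct {
	TimestampMs int64   // millisecond precision
	Value       float64 // 64-bit floating point number
}

// TimeSeries is a named series of samples, kept sorted by timestamp.
type TimeSeries struct {
	Name    string
	Samples []Sample
}

func main() {
	ts := TimeSeries{
		Name: "node_cpu_seconds_total",
		Samples: []Sample{
			{TimestampMs: 1600000000000, Value: 123.5},
			{TimestampMs: 1600000015000, Value: 124.0},
		},
	}
	fmt.Println(ts.Name, len(ts.Samples)) // node_cpu_seconds_total 2
}
```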

It looks like histogram support is great in the Prometheus ecosystem:

But why do Prometheus users keep complaining about issues with histograms? Let’s look at the most annoying ones.

Issue #1: the metric range isn’t covered well by the defined histogram buckets

Suppose you decided to cover response sizes with a Prometheus histogram and defined the following one according to the docs:

h := prometheus.NewHistogram(prometheus.HistogramOpts{
    Name:    "response_size_bytes",
    Help:    "The size of the response",
    Buckets: prometheus.LinearBuckets(100, 100, 3),
})

This histogram has 4 buckets covering the following response size ranges (i.e. le label values):

  • le="100": responses of up to 100 bytes
  • le="200": responses of up to 200 bytes
  • le="300": responses of up to 300 bytes
  • le="+Inf": all responses
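prometheus.LinearBuckets(100, 100, 3) produces the finite bucket bounds 100, 200 and 300; the +Inf bucket is added implicitly. A minimal re-implementation (a sketch, not the actual client_golang code) shows the idea:

```go
package main

import "fmt"

// linearBuckets mimics prometheus.LinearBuckets(start, width, count):
// it returns `count` upper bounds, starting at `start` and spaced by `width`.
func linearBuckets(start, width float64, count int) []float64 {
	bounds := make([]float64, count)
	for i := range bounds {
		bounds[i] = start + float64(i)*width
	}
	return bounds
}

func main() {
	// Finite bounds for the histogram above; the +Inf bucket is implicit.
	fmt.Println(linearBuckets(100, 100, 3)) // [100 200 300]
}
```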

Recently, the Evaluating Performance and Correctness article was published by the Prometheus author. The article points to a few data model discrepancies between VictoriaMetrics and Prometheus. It also contains benchmark results showing a poor compression ratio and poor performance for VictoriaMetrics compared with Prometheus. Unfortunately, the original article doesn’t support comments, so let’s discuss all these issues in the post below.

Bad compression ratio

This code has been used for generating time series for the benchmark. The code generates series of floating-point values with 9 random decimal digits after the point. Such series cannot be compressed well because 9 random…


Suppose you have a time series database containing terabytes of data. How do you manage backups for this data? Do you think it is too big to back up and blindly rely on database replication for data safety? Then you are in trouble.

Why doesn’t replication save you from disaster?

Replication is the process of creating multiple copies of the same data on distinct hardware resources and keeping this data in a consistent state. Replication protects against hardware failures: if a node or disk goes out of service, your data shouldn’t be lost or corrupted, since at least one copy of the data remains. Are we…

Thanos is known as a long-term storage solution for Prometheus, while the cluster version of VictoriaMetrics was open sourced recently. Both solutions provide the following features:

  • Long-term storage with arbitrary retention.
  • Global query view over data collected from multiple Prometheus instances.
  • Horizontal scalability.

Let’s compare different aspects of Thanos and VictoriaMetrics, starting with their architecture and then comparing the insert and select paths by the following properties:

  • Setup and operational complexity
  • Reliability and availability
  • Consistency
  • Performance
  • Scalability

High availability setups and hosting costs are covered at the end of the article.

The architecture

Thanos consists of the following components:

We are happy to announce that VictoriaMetrics has entered the open source world under the Apache 2.0 license!

What is VictoriaMetrics?

VictoriaMetrics is a high-performance, resource-efficient time series database with the following features:

Aliaksandr Valialkin

Founder and core developer at VictoriaMetrics
