When size matters — benchmarking VictoriaMetrics vs Timescale and InfluxDB

Recently Timescale published the Time Series Benchmark Suite (TSBS), a framework for TSDB benchmarking. See TSBS on GitHub.

TSBS can:

  • generate the configured number of production-like timeseries;
  • measure insert performance for the generated timeseries;
  • measure select performance for various production-like queries.

The original TSBS supports the following systems:

  • Timescale
  • InfluxDB
  • MongoDB
  • Cassandra

Adding VictoriaMetrics to TSBS

The root cause was the remote read API. TSBS had been configured to query Prometheus, which, in turn, queried VictoriaMetrics via the remote read API. This didn't scale well, since VictoriaMetrics had to prepare and return huge amounts of data to Prometheus on heavy queries such as double-groupby.

The solution was to build a PromQL engine directly into VictoriaMetrics, so all the heavy lifting on complex queries could be implemented and optimized inside the engine. The end result is the Extended PromQL engine, with full PromQL support plus additional useful features such as WITH expressions.

Benchmark preparation

The remaining competitors are Timescale and InfluxDB.

The following TSBS queries couldn’t be translated to PromQL, so they have been dropped from the benchmark:

  • lastpoint - PromQL cannot return the last point for each time series;
  • groupby-orderby-limit - PromQL doesn't support ORDER BY and LIMIT.

The high-cpu queries have been modified to return the max(usage_user) for each host, since PromQL doesn't support SELECT *. The cpu-max-all queries have been dropped, since they weren't present in benchmark results from Timescale.
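
One possible shape of the modified high-cpu query is sketched below. The metric name cpu_usage_user, the label name hostname and the helper itself are assumptions for illustration; the exact queries used in the benchmark may differ.

```python
from typing import Optional


def high_cpu_query(threshold: float, host: Optional[str] = None) -> str:
    """Return a PromQL expression for per-host max usage_user above a threshold.

    The metric and label names are assumed, not taken from the benchmark.
    """
    selector = "cpu_usage_user"  # assumed metric name
    if host is not None:
        selector += '{hostname="%s"}' % host  # assumed label name
    # max(...) by (hostname) yields one series per host instead of SELECT *.
    return f"max({selector}) by (hostname) > {threshold:g}"


print(high_cpu_query(90))
print(high_cpu_query(90, host="host_0"))
```

The aggregation returns a single value per host at each step, which stands in for the row-level SELECT * that PromQL cannot express.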

The benchmark was run in Google Compute Engine on two n1-standard-8 instances, each with 8 virtual CPUs, 30GB RAM and 200GB HDD: one instance for the client (TSBS) and one for the server. Versions tested: Timescale 0.12.1 and InfluxDB 1.6.4.

Benchmark results

Insert performance:

  • VictoriaMetrics — 1.7M datapoints per second, RAM usage — 0.8GB, data size on HDD — 387MB.
  • InfluxDB — 1.1M datapoints per second, RAM usage — 1.7GB, data size on HDD — 573MB.
  • Timescale — 890K datapoints per second, RAM usage — 0.4GB, data size on HDD — 29GB.

Nothing stands out except that Timescale data occupies a whopping 29GB on HDD. That’s 50x more than InfluxDB and 75x more than VictoriaMetrics. We’ll see later when this size matters.
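
The ratios can be checked with a few lines of arithmetic. The sketch below uses the figures quoted in the results above and assumes 1 GB = 1000 MB; the variable names are made up for this example.

```python
# On-disk sizes from the results above, in MB (1 GB taken as 1000 MB).
size_mb = {"VictoriaMetrics": 387, "InfluxDB": 573, "Timescale": 29_000}

# Ingestion rates from the results above, in datapoints per second.
ingest_rate = {"VictoriaMetrics": 1_700_000, "InfluxDB": 1_100_000, "Timescale": 890_000}

timescale_vs_influx = size_mb["Timescale"] / size_mb["InfluxDB"]
timescale_vs_vm = size_mb["Timescale"] / size_mb["VictoriaMetrics"]
vm_vs_timescale_ingest = ingest_rate["VictoriaMetrics"] / ingest_rate["Timescale"]

print(f"Timescale vs InfluxDB size: {timescale_vs_influx:.1f}x")
print(f"Timescale vs VictoriaMetrics size: {timescale_vs_vm:.1f}x")
print(f"VictoriaMetrics vs Timescale ingestion: {vm_vs_timescale_ingest:.2f}x")
```

The size gap (roughly 50x and 75x) is far larger than the ingestion-rate gap (under 2x), which is why disk footprint is the number to watch here.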

Select performance:

  • VictoriaMetrics beats InfluxDB and Timescale on all queries by a margin of up to 20x. It especially excels at heavy queries, which scan many millions of datapoints across thousands of distinct timeseries.
  • InfluxDB is in second place. It beats Timescale on light queries and loses to Timescale by up to 3.5x on heavy queries.
  • Timescale is in third place. Moreover, it was multiple orders of magnitude slower on all queries when the required data wasn’t in the page cache, while VictoriaMetrics and InfluxDB were only marginally slower in these cases.

See full benchmark results.

Analysis

Timescale reached the read throughput limit only a few times during select queries. The rest of the time it was limited to 150 read operations per second. This points to a sub-optimal data layout for low-IOPS storage such as HDD.
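
To see why an IOPS cap hurts, consider a back-of-the-envelope lower bound: if a query needs a given number of random reads that miss the page cache, it cannot finish faster than reads divided by IOPS. The function below is a sketch of that bound; the read count in the example is hypothetical and was not measured in the benchmark.

```python
def min_seconds_for_random_reads(reads: int, iops: float = 150.0) -> float:
    """Lower bound on query time when every read is a separate disk operation.

    The default of 150 IOPS is the limit observed in the text above.
    """
    if iops <= 0:
        raise ValueError("iops must be positive")
    return reads / iops


# Hypothetical query that touches 15,000 uncached blocks scattered on disk.
print(min_seconds_for_random_reads(15_000))
```

A layout that stores the same data contiguously needs far fewer operations and is bounded by disk throughput instead, which is why on-disk layout matters so much on HDD.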

Are there any workarounds for Timescale? The easiest is to use more expensive storage with high bandwidth and high IOPS, such as a high-end SSD. Post other workarounds in the comments.

Conclusion

TSBS is a great benchmarking tool. It helped minimize CPU and RAM usage for VictoriaMetrics on production workloads. We are planning to run benchmarks and publish results for higher cardinality (millions of unique timeseries) and a higher number of datapoints (trillions). Stay tuned.

In the meantime, read how we created VictoriaMetrics, the best remote storage for Prometheus.

Reddit thread

HackerNews thread

Update: Docker images with single-server VictoriaMetrics are available here. The corresponding statically linked binaries are available here.

Update #2: Read the next article, High-cardinality TSDB benchmarks: VictoriaMetrics vs TimescaleDB vs InfluxDB.

Update #3: Read yet another article — Measuring vertical scalability for time series databases in Google Cloud.

Update #4: VictoriaMetrics is open source now!

Founder and core developer at VictoriaMetrics