Evaluating Performance and Correctness: VictoriaMetrics response
Recently, the Evaluating Performance and Correctness article was published by a Prometheus author. The article points to a few data model discrepancies between VictoriaMetrics and Prometheus. It also contains benchmark results showing a poor compression ratio and poor performance for VictoriaMetrics compared to Prometheus. Unfortunately, the original article doesn't support comments where a response could be left, so let's discuss all these issues in the post below.
Bad compression ratio
This code has been used for generating time series for the benchmark. The code generates series of floating-point values with 9 random decimal digits after the point. Such series cannot be compressed well, because 9 random decimal digits require at least 9*log2(10) ≈ 30 bits, or about 4 bytes, according to information theory. I.e. each sample in such a series would occupy at least 4 bytes on disk, not counting the encoded timestamp, in any system: Prometheus, VictoriaMetrics, Thanos, Cortex, M3DB, InfluxDB or TimescaleDB.
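To double-check the arithmetic, here is a tiny Go snippet that computes the information-theoretic lower bound (nothing VictoriaMetrics-specific, just the math from above):

package main

import (
	"fmt"
	"math"
)

func main() {
	// Each random decimal digit carries log2(10) bits of entropy.
	bits := 9 * math.Log2(10)
	// Prints ~29.9 bits (~3.74 bytes), which rounds up to the ~4 bytes per sample mentioned above.
	fmt.Printf("9 random decimal digits need at least %.1f bits (%.2f bytes) per sample\n", bits, bits/8)
}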
VictoriaMetrics users report 0.4–0.6 bytes per sample for production data according to the following PromQL query for the exported metrics:
sum(vm_data_size_bytes{type=~"storage/.*"}) / sum(vm_rows{type=~"storage/.*"})
So how can the 4 bytes per sample from Brian's benchmark turn into 0.4 bytes per sample on real-world data? The answer is:
- The time series from the benchmark are far from real-world data.
- Real-world measurements usually contain a small number of decimal digits over a limited range. For instance, the usual temperature range is from -460 F to 1000 F with 0.1 F precision, the usual speed range is from 0 m/s to 10K m/s with 1 m/s precision, the usual qps range is from 0 to 1M with 0.1 qps precision, and the usual price range is from $0 to $1M with $0.01 precision.
- The number of decimal digits becomes even smaller after applying delta-coding, i.e. calculating the difference between adjacent samples. The difference is small for Gauges, since they tend to change slowly. The difference is small for Counters, since their rate is usually limited to a relatively small range. The number of decimal digits for Counters can be reduced further by applying double delta-coding to them (see the sketch after this list).
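To illustrate why delta-coding helps, here is a minimal Go sketch (not the actual VictoriaMetrics encoder, just the idea) that applies delta and double delta-coding to a typical counter:

package main

import "fmt"

// deltas returns the differences between adjacent values.
func deltas(values []int64) []int64 {
	result := make([]int64, 0, len(values)-1)
	for i := 1; i < len(values); i++ {
		result = append(result, values[i]-values[i-1])
	}
	return result
}

func main() {
	// A typical counter: large absolute values, but a nearly constant rate.
	counter := []int64{1000000, 1000100, 1000205, 1000310, 1000410}
	d := deltas(counter) // 100, 105, 105, 100
	dd := deltas(d)      // 5, 0, -5
	fmt.Println("deltas:", d)
	fmt.Println("double deltas:", dd)
	// Small numbers like these need far fewer bits than the raw values,
	// which is why delta-coded real-world series compress so well.
}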
VictoriaMetrics takes full advantage of these properties of real-world time series. Try storing real-world series such as node_exporter metrics into VictoriaMetrics and enjoy a much better on-disk compression ratio than Prometheus, Thanos or Cortex could provide. VictoriaMetrics compresses real-world node_exporter data up to 7x better than Prometheus; see this benchmark.
High-entropy data (aka random numbers with a big number of decimal digits) compresses much worse than typical time series data. So it is recommended to reduce the number of decimal digits for measurements stored in a TSDB in order to improve the compression ratio and reduce disk space usage. vmagent provides the -remoteWrite.roundDigits command-line flag, which allows reducing storage requirements for the data written to VictoriaMetrics.
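Conceptually, rounding before storage looks like the following Go sketch; the function name and the example value are made up for illustration, and this is not the actual vmagent implementation of -remoteWrite.roundDigits:

package main

import (
	"fmt"
	"math"
)

// roundToDecimalDigits rounds v to the given number of digits after the point.
func roundToDecimalDigits(v float64, digits int) float64 {
	scale := math.Pow(10, float64(digits))
	return math.Round(v*scale) / scale
}

func main() {
	raw := 0.123456789012345 // a high-entropy tail that hurts compression
	fmt.Println(roundToDecimalDigits(raw, 3)) // prints 0.123, which compresses much better
}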
It is unclear why Brian decided to use random series instead of real-world series for measuring compression ratio in the article.
Precision loss
The original article contains an example where VictoriaMetrics trims the last decimal digits of the go_memstats_gc_cpu_fraction metric: Prometheus returns 0.0003083021835007435, while VictoriaMetrics returns 0.0003083021835007. As you can see, the last 3 digits are missing. This is due to precision loss during the conversion of a floating-point number to a decimal number plus a decimal exponent. All calculations on floating-point numbers result in precision loss in the lowest digits; see this Wikipedia article for details. VictoriaMetrics performs the conversion in order to improve the compression ratio for floating-point values with a small number of significant decimal digits compared to Gorilla compression; see this article for details.
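The idea of the conversion can be sketched in Go; this is a simplified illustration with a made-up helper, not the actual VictoriaMetrics encoder:

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toDecimal converts a positive float into (mantissa, exponent) so that
// v ~= mantissa * 10^exponent, keeping at most `digits` significant decimal digits.
// A small integer mantissa compresses much better than raw float64 bits.
func toDecimal(v float64, digits int) (int64, int) {
	// Scientific notation with the requested number of significant digits,
	// e.g. "3.083021835007e-04" for digits=13.
	s := strconv.FormatFloat(v, 'e', digits-1, 64)
	parts := strings.Split(s, "e")
	mantissa, _ := strconv.ParseInt(strings.Replace(parts[0], ".", "", 1), 10, 64)
	exponent, _ := strconv.Atoi(parts[1])
	return mantissa, exponent - (digits - 1)
}

func main() {
	m, e := toDecimal(0.0003083021835007435, 13)
	// Prints 3083021835007 * 10^-16, i.e. 0.0003083021835007: the tail beyond
	// 13 significant digits is dropped, just like in the example above.
	fmt.Printf("%d * 10^%d\n", m, e)
}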
Should VictoriaMetrics users worry about this? Mostly no, since:
- Precision loss can occur only on values with more than 12 significant decimal digits. Such values are rare in the real world. Even summary counters for nanoseconds shouldn't lose precision. Of course, if you work at NASA, then you may need up to 15 decimal digits :)
- Real-world measurements usually contain a small number of precise leading decimal digits. The rest of the digits are just noise, which has little value because of measurement errors. For instance, the go_memstats_gc_cpu_fraction metric mentioned above contains only 4 or 5 precise digits after the point (0.00308 in the best case); all the other digits are just garbage, which worsens the series' compression ratio.
Did you know that Prometheus also loses precision? Try storing 9.234567890123009 into it. It will be stored as 9.234567890123008. See the verification link. Prometheus, like any solution that works with float64 values, has precision loss issues; see this link.
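You can reproduce this with any float64-based tool. For example, the following Go snippet (independent of Prometheus and VictoriaMetrics) prints the value that actually gets stored for this literal:

package main

import (
	"fmt"
	"strconv"
)

func main() {
	// Parse the decimal literal into a float64, then print the stored value
	// with 17 significant digits.
	v, _ := strconv.ParseFloat("9.234567890123009", 64)
	fmt.Println(strconv.FormatFloat(v, 'g', 17, 64))
	// The printed tail differs from the typed literal, because this literal
	// has no exact float64 representation.
}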
Stale timestamps in /api/v1/query results
The article mentions that VictoriaMetrics adjusts the returned timestamp for /api/v1/query results if it is closer than 1 minute to now. This is by design: it prevents returning incomplete data if certain Prometheus instances lag behind when sending data to VictoriaMetrics via the remote_write API. The offset from now can be configured via the -search.latencyOffset command-line flag:
-search.latencyOffset duration
The time when data points become visible in query results after the collection. Too small value can result in incomplete last points for query results (default 1m0s)
Why doesn't Prometheus have a similar option? Because it controls data scraping, so the data becomes visible for querying almost immediately after the scrape. VictoriaMetrics receives the scraped data from Prometheus instances via the remote_write API, and this data can be delayed for extended periods of time before it becomes visible for querying in VictoriaMetrics. Additionally, a non-zero -search.latencyOffset allows avoiding issues related to query isolation.
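The behavior described above can be sketched in a few lines of Go; this is a simplified illustration of the idea, not the actual VictoriaMetrics query path:

package main

import (
	"fmt"
	"time"
)

// adjustTimestamp moves ts back to now-latencyOffset if it falls inside the offset window,
// so queries don't see a time range where remote_write data may still be arriving.
func adjustTimestamp(ts, now time.Time, latencyOffset time.Duration) time.Time {
	if ts.After(now.Add(-latencyOffset)) {
		return now.Add(-latencyOffset)
	}
	return ts
}

func main() {
	now := time.Now()
	ts := now.Add(-10 * time.Second) // query for "10 seconds ago"
	adjusted := adjustTimestamp(ts, now, time.Minute)
	// The query is evaluated at now-1m instead of now-10s.
	fmt.Println("query evaluated at:", adjusted.Format(time.RFC3339))
}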
VictoriaMetrics treats a vector with a single nameless series as a scalar and vice versa
The original article shows that VictoriaMetrics returns the expected result for the sum(time()) query, while Prometheus returns a cryptic expected type instant vector in aggregation expression, got scalar error. Another example from the article is the PromQL query vector(0)-prometheus_tsdb_blocks_loaded: Prometheus returns an unexpectedly empty result, while VictoriaMetrics returns the expected negative value for all the prometheus_tsdb_blocks_loaded series.
This deviation in behavior between Prometheus and VictoriaMetrics is deliberate: it simplifies PromQL for users who don't know the difference between a scalar, an instant vector and a range vector in Prometheus. Are there people other than Prometheus developers who know the difference? :)
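Here is a rough Go sketch of the idea; it is a simplified illustration with made-up types, not MetricsQL internals:

package main

import "fmt"

// series is a minimal instant-vector element: a label set and a value.
type series struct {
	labels map[string]string
	value  float64
}

// asScalar returns the value if the vector consists of a single nameless series.
func asScalar(v []series) (float64, bool) {
	if len(v) == 1 && v[0].labels["__name__"] == "" {
		return v[0].value, true
	}
	return 0, false
}

// asVector wraps a scalar into a single nameless series.
func asVector(x float64) []series {
	return []series{{labels: map[string]string{}, value: x}}
}

func main() {
	// sum(time()): the scalar result of time() is treated as a one-element
	// vector, so the aggregation works instead of failing with an error.
	v := asVector(1.6e9)
	if x, ok := asScalar(v); ok {
		fmt.Println("scalar value:", x)
	}
}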
When developing the PromQL-compatible engine for VictoriaMetrics, I tried to avoid these rough edges, based on my experience, in order to make a more user-friendly PromQL with expected behavior. For instance, VictoriaMetrics allows writing rate(q) instead of the frequently used rate(q[$__interval]) in Grafana dashboards. The full list of additional features is available on this page.
Note that VictoriaMetrics is fully backwards-compatible with PromQL, i.e. all valid queries from Prometheus should return the same results in VictoriaMetrics. There are a few exceptions, such as more consistent handling of the increase() function: VictoriaMetrics always returns the expected integer value for increase() over a counter without floating-point increases. See also the VictoriaMetrics: PromQL compliance article for more details on intentional discrepancies between PromQL and MetricsQL.
Staleness handling
Brian mentions in the article that VictoriaMetrics drops all incoming NaN values and that this breaks staleness handling. This handling is quite complicated, and I bet nobody except a few Prometheus developers knows all its details and corner cases. I'll try to explain simplified staleness handling in Prometheus as I understand it. Prometheus stops returning data points for a time series as soon as one of the following conditions is met:
- After a special NaN value is found. This value is inserted by Prometheus when the metric disappears from the scrape target or the scrape target is unavailable.
- After 5 minutes of silence from the previous value.
This logic doesn't work on time series with scrape intervals exceeding 5 minutes, since Prometheus mistakenly thinks the time series contains a gap 5 minutes after each scrape.
VictoriaMetrics drops NaNs, but it still detects stale series with much simpler logic and without corner cases: it stops returning data points if they are missing for the last 2 scrape intervals. For instance, if Prometheus scrapes a time series every 10 seconds, then VictoriaMetrics detects that the series is stale after 20 seconds of missing data points.
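The rule can be expressed in a few lines of Go; this is a simplified illustration of the logic described above, not the actual implementation:

package main

import (
	"fmt"
	"time"
)

// isStale reports whether a series should be considered stale:
// no new samples for more than 2 scrape intervals.
func isStale(lastSample, now time.Time, scrapeInterval time.Duration) bool {
	return now.Sub(lastSample) > 2*scrapeInterval
}

func main() {
	now := time.Now()
	last := now.Add(-25 * time.Second)
	// With a 10s scrape interval, 25s of silence exceeds 2 intervals, so the series is stale.
	fmt.Println(isStale(last, now, 10*time.Second)) // true
}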
Unlike Prometheus, staleness handling in VictoriaMetrics correctly handles time series with scrape intervals longer than 5 minutes. Additionally, it works the same way not only for the Prometheus remote_write API, but also for all other supported ingestion methods: the Graphite plaintext protocol, the InfluxDB line protocol, and the OpenTSDB telnet and HTTP protocols. These ingestion methods know nothing about Prometheus staleness detection, so, obviously, they don't work with it.
Update: VictoriaMetrics gained Prometheus-compatible staleness handling in recent releases; see the changelog for details.
Slow time series lookups
The initial version of the article claimed that VictoriaMetrics is more than an order of magnitude slower than Prometheus on time series lookups in this micro-benchmark from the Prometheus source code. This was unexpected. The article didn't contain the source code for the corresponding benchmark in VictoriaMetrics, so I implemented it in the VictoriaMetrics sources. The end result is quite different: VictoriaMetrics outperforms Prometheus in all the benchmarks by 5x-30x while using 3.5x less RAM. Feel free to re-run these benchmarks with the following commands from the root folders of the Prometheus and VictoriaMetrics source code:
- Prometheus:
GOMAXPROCS=1 go test ./tsdb/ -run=111 -bench=BenchmarkHeadPostingForMatchers
- VictoriaMetrics:
GOMAXPROCS=1 go test ./lib/storage/ -run=111 -bench=BenchmarkHeadPostingForMatchers
Brian updated the performance numbers in the original article after I pointed him to the real numbers via Twitter. Now the article claims VictoriaMetrics is 2x-5x slower than Prometheus on the modified end-to-end tests. Unfortunately, Brian hasn't provided the source code for the updated tests yet. The source code is needed in order to reproduce the tests and to determine why VictoriaMetrics is slower than Prometheus in them.
Usually VictoriaMetrics is much faster than competitors such as InfluxDB and TimescaleDB; see this article for details. VictoriaMetrics users report that it is faster on real production data than Prometheus, Cortex and Thanos. They also report that VictoriaMetrics consumes less RAM and disk space compared to Prometheus, Cortex and Thanos. As far as I know, VictoriaMetrics users never go back to Thanos and Cortex. Additionally, they frequently asked us to create a stripped-down Prometheus without local storage, since Prometheus instances usually eat too much RAM in their highly loaded setups. So we created vmagent.
Conclusion
Don't trust articles on the internet, including this one, since they may be biased. Just give VictoriaMetrics a try: it is easy to configure and it can run in parallel with other long-term storage solutions for Prometheus such as Thanos or Cortex. Then decide whether it is better than Thanos or Cortex. I'd recommend reading this article, which compares Thanos to VictoriaMetrics from various points of view: operational complexity, reliability, performance, cost, etc.
Contrary to the original article, this post can be commented on below, so feel free to leave comments and questions.
P.S. Join our Slack channel and keep up to date with all the news regarding VictoriaMetrics.
P.P.S. See also this VictoriaMetrics vs Prometheus benchmark, which is based on real production data.