Improving histogram usability for Prometheus and Grafana

Issue #1: metric range isn’t covered well by the defined histogram buckets

h := prometheus.NewHistogram(prometheus.HistogramOpts{
    Name:    "response_size_bytes",
    Help:    "The size of the response",
    Buckets: prometheus.LinearBuckets(100, 100, 3),
})
  • (0…100] aka le="100"
  • (0…200] aka le="200"
  • (0…300] aka le="300"
  • (0…+Inf] aka le="+Inf"
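Any response larger than 300 bytes falls into the le="+Inf" bucket. When the requested quantile lands in that bucket, histogram_quantile() returns the upper bound of the largest finite bucket, so a query like the following would report at most 300 even if most responses are far bigger:

histogram_quantile(0.95,
    rate(response_size_bytes_bucket[5m])
)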
The fix is to define buckets that cover the expected range of response sizes:

Buckets: [10, 20, 40, 80, 100, 200, 300]
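In Go this corresponds to passing an explicit bucket slice instead of prometheus.LinearBuckets (a minimal sketch reusing the metric from the example above):

h := prometheus.NewHistogram(prometheus.HistogramOpts{
    Name:    "response_size_bytes",
    Help:    "The size of the response",
    // Explicit buckets covering the expected range of response sizes.
    Buckets: []float64{10, 20, 40, 80, 100, 200, 300},
})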

Issue #2: too many buckets and high cardinality

  • High RAM usage, since a TSDB usually keeps meta-information about each time series in RAM. See, for example, Thanos Store Gateway high RAM usage and OOMs.
  • High disk space usage, since each time series requires additional disk space for its data. See, for example, high disk space usage in Thanos Compactor.
  • Slower performance for inserts, since the TSDB must perform more bookkeeping when storing samples for a larger number of active time series.
  • Slower performance for selects, since each query must process more time series before returning the result when histograms contain more buckets.
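To make these costs concrete, consider purely hypothetical numbers: a histogram with 40 buckets, exported with 5 distinct handler label values from 100 application instances, yields 40 × 5 × 100 = 20,000 time series for the _bucket metrics alone, before counting the accompanying _sum and _count series.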

Issue #3: incompatible bucket ranges

  • Response time for public pages — response_time_seconds{zone="public"} with the following buckets: [0.1, 0.2, 0.3, 0.4, 0.5], since publicly-facing pages are expected to be returned in less than 0.5 seconds.
  • Response time for admin pages — response_time_seconds{zone="admin"} with the following buckets: [0.5, 1.0, 1.5, 2.0, 5.0], since admins aren't as sensitive to response times as public page visitors.
histogram_quantile(0.95,
    sum(rate(
        response_time_seconds_bucket{zone=~"public|admin"}[5m]
    )) by (le)
)
This query silently produces a skewed result: sum() by (le) merges bucket counters with different le bounds from the two zones, so histogram_quantile() ends up operating on a histogram that neither zone actually exposed.

The solution — VictoriaMetrics histogram

  • There is no need to think about bucket ranges and the number of buckets per histogram, since buckets are created on demand.
  • There is no need to worry about high cardinality, since only buckets with non-zero values are exposed to Prometheus. Real-world values usually fall within a fairly narrow range, so a small number of histogram buckets covers them.
  • There is no need to re-configure buckets over time, since the bucket definitions are static. This allows performing cross-histogram calculations. For instance, the following query should work as expected in VictoriaMetrics starting from release v1.30.0:
histogram_quantile(0.95,
    sum(rate(
        response_time_seconds_bucket{zone=~"public|admin"}[5m]
    )) by (vmrange)
)

VictoriaMetrics histogram internals

A single VictoriaMetrics histogram covers almost any practically useful range of values:

  • Times from nanoseconds to billions of years.
  • Sizes from 0 bytes to 2⁶⁰ bytes.
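Internally, the histogram covers this range with logarithmic buckets identified by a vmrange label instead of le, and only non-empty buckets are exposed. A scrape might contain lines like the following (illustrative values; exact bucket bounds are defined by the metrics package):

response_size_bytes_bucket{vmrange="8.799e+02...1.000e+03"} 12
response_size_bytes_bucket{vmrange="1.000e+03...1.136e+03"} 45
response_size_bytes_sum 54231
response_size_bytes_count 57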
import (
    "io"

    "github.com/VictoriaMetrics/metrics"
)

// create histogram for response sizes
var responseSize = metrics.NewHistogram("response_size_bytes")

...

func sendResponse(w io.Writer, response []byte) {
    w.Write(response)
    // Register the response size in the histogram.
    responseSize.Update(float64(len(response)))
}
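To make the histogram scrapeable, the metrics package provides metrics.WritePrometheus for serving all registered metrics. A minimal sketch, assuming a standard net/http server:

package main

import (
    "net/http"

    "github.com/VictoriaMetrics/metrics"
)

func main() {
    // Serve all registered metrics, including vmrange histogram buckets,
    // in Prometheus text exposition format.
    http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
        metrics.WritePrometheus(w, true)
    })
    http.ListenAndServe(":8080", nil)
}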

Bonus: using the power of heatmaps in Grafana

  • prometheus_buckets() function for converting VictoriaMetrics histogram buckets to Prometheus-compatible buckets with le labels, from which Grafana heatmaps can be built:
prometheus_buckets(
    sum(rate(
        vm_http_request_duration_seconds_bucket
    )) by (vmrange)
)
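For illustration, prometheus_buckets() turns non-cumulative vmrange buckets into cumulative le buckets whose upper bounds are taken from the vmrange ends. Continuing the made-up sample from the internals section, input like

response_size_bytes_bucket{vmrange="8.799e+02...1.000e+03"} 12
response_size_bytes_bucket{vmrange="1.000e+03...1.136e+03"} 45

would be converted into something like

response_size_bytes_bucket{le="1.000e+03"} 12
response_size_bytes_bucket{le="1.136e+03"} 57
response_size_bytes_bucket{le="+Inf"} 57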
  • buckets_limit() function, which limits the number of buckets per metric. This function may be useful for building heatmaps from a big number of buckets, since a heatmap becomes hard to read when it contains too many of them. Just wrap the result in buckets_limit(N, ...) in order to cap the number of buckets at N. For example, the following query would return up to 10 resulting buckets for a heatmap in Grafana:
buckets_limit(10, sum(rate(vm_http_request_duration_seconds_bucket)) by (vmrange))
  • histogram() aggregate function for building Prometheus-style histogram buckets from a set of time series. For example, the following query builds histogram buckets for process_resident_memory_bytes gauges collected from all the monitored services:
histogram(process_resident_memory_bytes)
[Grafana screenshot: a heatmap built from histogram(process_resident_memory_bytes) next to a regular graph of process_resident_memory_bytes]

Conclusion

MetricsQL provides additional functions for working with histograms:

  • histogram_over_time(m[d]) for calculating a histogram for gauge values m over the given time window d.
  • histogram_share(le, buckets) for calculating service level indicators (aka SLI / SLO). It returns the share (phi) of bucket values that don't exceed the given threshold le. This is the inverse of histogram_quantile(phi, buckets).
  • share_le_over_time(m[d], le) for calculating SLI / SLO / SLA as the share of gauge values m that don't exceed the given threshold le over the given time window d.
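As an illustration of histogram_share(), the following query computes the share of requests served in under 0.5 seconds (an SLI sketch for the response_time_seconds histogram used earlier, assuming it is collected as a VictoriaMetrics histogram):

histogram_share(0.5,
    sum(rate(
        response_time_seconds_bucket[5m]
    )) by (vmrange)
)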

Aliaksandr Valialkin
Founder and core developer at VictoriaMetrics