Filter prometheus results by metric value, not by label value

If you're confused by brian's answer: the result of filtering with a comparison operator is not a boolean but the set of series that pass the filter, each keeping its original value. E.g.

min(flink_rocksdb_actual_delayed_write_rate > 0)

will show the minimum value among the series that are above 0.

In case you actually want a boolean (or rather 0 or 1), use something like

sum(flink_rocksdb_actual_delayed_write_rate > bool 0)

which will give you the number of series whose current value is greater than 0.
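To make the difference concrete, here is a small Python model of the two forms. It is only a sketch of the semantics, not how Prometheus is implemented; the series names and values are invented for illustration.

```python
# Toy model of an instant vector: one current value per series.
# The series names and values are made up for illustration.
series = {
    "task-a": 0.0,
    "task-b": 3.5,
    "task-c": 12.0,
}

def filter_gt(vector, threshold):
    """Model of `vector > threshold`: keep matching series with their values."""
    return {name: v for name, v in vector.items() if v > threshold}

def bool_gt(vector, threshold):
    """Model of `vector > bool threshold`: keep every series, value becomes 0 or 1."""
    return {name: (1.0 if v > threshold else 0.0) for name, v in vector.items()}

filtered = filter_gt(series, 0)
flags = bool_gt(series, 0)

min_above_zero = min(filtered.values())   # models min(metric > 0)
count_above_zero = sum(flags.values())    # models sum(metric > bool 0)

print(filtered)
print(flags)
print(min_above_zero, count_above_zero)
```

The filter drops task-a entirely, while the bool form keeps it with a value of 0, which is why summing the bool form yields a count.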

Counting how often the metric was above a threshold over a time range can be solved with subqueries:

count_over_time((metric > 0)[5m:10s])

The query above would return the number of metric data points greater than 0 over the last 5 minutes.

This query may return inaccurate results, depending on how the second argument in square brackets (the step of the inner query) relates to the real interval between raw samples (the scrape_interval):

If the step exceeds the scrape_interval, then some samples may be skipped during the calculation. In this case the query returns a lower result than expected.
If the step is smaller than the scrape_interval, then some samples may be counted multiple times. In this case the query returns a higher result than expected.

So it is recommended to set the step equal to the scrape_interval in order to get accurate results.
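The effect of the step can be illustrated with a simplified Python simulation. This is a sketch under stated assumptions, not Prometheus code: timestamps are whole seconds, the window is treated as left-open, evaluation points are aligned to multiples of the step, and each evaluation point takes the newest sample within a 5-minute lookback. The function names are hypothetical.

```python
def count_raw(samples, now, window):
    """Exact number of raw samples > 0 in the window (now - window, now]."""
    return sum(1 for ts, v in samples if now - window < ts <= now and v > 0)

def count_subquery(samples, now, window, step, lookback=300):
    """Simplified model of count_over_time((metric > 0)[window:step]).

    samples: list of (timestamp, value) pairs sorted by timestamp.
    """
    count = 0
    # Assumption: evaluation timestamps are multiples of step inside the window.
    t = ((now - window) // step + 1) * step
    while t <= now:
        # An instant lookup returns the newest sample at or before t,
        # as long as it is not older than the lookback period.
        visible = [v for ts, v in samples if t - lookback < ts <= t]
        if visible and visible[-1] > 0:
            count += 1
        t += step
    return count

# One sample every 10 seconds (the scrape interval), all with value 1.
samples = [(ts, 1.0) for ts in range(0, 301, 10)]
now, window = 300, 300

exact = count_raw(samples, now, window)
step_equal = count_subquery(samples, now, window, step=10)
step_larger = count_subquery(samples, now, window, step=30)
step_smaller = count_subquery(samples, now, window, step=5)

print(exact, step_equal, step_larger, step_smaller)
```

In this model only the step that equals the scrape interval reproduces the raw sample count; the larger step undercounts and the smaller step overcounts.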

P.S. The issues mentioned above are solved in VictoriaMetrics, the Prometheus-like monitoring system I work on. It provides the count_gt_over_time() function, which fits this case well. For example, the following MetricsQL query returns the exact number of raw samples with values greater than 0 over the last 5 minutes:

count_gt_over_time(metric[5m], 0)

Filtering is done with the comparison operators, for example x > 0.
