How to silence Prometheus Alertmanager using config files?
Well, I got it to work by configuring a hackish inhibit_rule:
inhibit_rules:
- target_match:
    alertname: 'CPUThrottlingHigh'
  source_match:
    alertname: 'DeadMansSwitch'
  equal: ['prometheus']
The DeadMansSwitch is, by design, an "always firing" alert shipped with prometheus-operator, and the prometheus label is a common label for all alerts, so the CPUThrottlingHigh alert ends up inhibited forever. It stinks, but works.
Pros:
- This can be done via the config file (using the alertmanager.config helm parameter).
- The CPUThrottlingHigh alert is still present on Prometheus for analysis.
- The CPUThrottlingHigh alert only shows up in the Alertmanager UI if the "Inhibited" box is checked.
- No annoying notifications on my receivers.
Cons:
- Any change to DeadMansSwitch or to the prometheus label design will break this (which only means the alerts start firing again).
Update: My Cons became real...
The DeadMansSwitch alertname just changed in stable/prometheus-operator 4.0.0. If you are using this version (or above), the new alertname is Watchdog.
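For 4.0.0 and above, the rule would presumably look like the sketch below, shown nested under the alertmanager.config helm value mentioned in the Pros. Only the source alertname changes. The nesting under alertmanager.config is an assumption about how the chart's values file is laid out, and the rest of the Alertmanager config (route, receivers) is left out here, so check it against your chart version.

```yaml
# values.yaml (sketch; nesting under alertmanager.config is assumed)
alertmanager:
  config:
    inhibit_rules:
    - target_match:
        alertname: 'CPUThrottlingHigh'
      source_match:
        # Renamed from DeadMansSwitch in stable/prometheus-operator 4.0.0
        alertname: 'Watchdog'
      # Inhibit only when source and target carry the same 'prometheus' label value
      equal: ['prometheus']
```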
I doubt there exists a way to silence alerts via configuration (other than routing said alerts to a /dev/null receiver, i.e. one with no email or any other notification mechanism configured, but the alert would still show up in the Alertmanager UI).
You can apparently use the command line tool amtool that comes with alertmanager to add a silence (although I can't see a way to set an expiration time for the silence).
Or you can use the API directly (even though it is not documented and in theory it may change). According to this prometheus-users thread this should work:
curl https://alertmanager/api/v1/silences -d '{
  "matchers": [
    {
      "name": "alername1",
      "value": ".*",
      "isRegex": true
    }
  ],
  "startsAt": "2018-10-25T22:12:33.533330795Z",
  "endsAt": "2018-10-25T23:11:44.603Z",
  "createdBy": "api",
  "comment": "Silence",
  "status": {
    "state": "active"
  }
}'
One option is to route alerts you want silenced to a "null" receiver. In alertmanager.yaml:
route:
  # Other settings...
  group_wait: 0s
  group_interval: 1m
  repeat_interval: 1h
  # Default receiver.
  receiver: "null"
  routes:
  # continue defaults to false, so the first match will end routing.
  - match:
      # This was previously named DeadMansSwitch
      alertname: Watchdog
    receiver: "null"
  - match:
      alertname: CPUThrottlingHigh
    receiver: "null"
  - receiver: "regular_alert_receiver"
receivers:
- name: "null"
- name: regular_alert_receiver
  <snip>
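If your Alertmanager is recent enough to support the matchers syntax (0.22 or newer, as far as I know), the two "null" routes could probably be collapsed into one regex matcher. This is only a sketch under that assumption. It reuses the receiver names from the config above and leaves the regular receiver's notification settings elided:

```yaml
route:
  # Default receiver, as in the config above.
  receiver: "null"
  routes:
  # One regex matcher covers both alerts that should be dropped.
  - matchers:
    - alertname =~ "Watchdog|CPUThrottlingHigh"
    receiver: "null"
  # Everything else goes to the real receiver.
  - receiver: "regular_alert_receiver"
receivers:
- name: "null"
- name: "regular_alert_receiver"
  # notification settings go here
```

Behaviour should be the same as with the two separate match blocks, since the first matching child route still ends routing.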