
Master Your Alerts with Prometheus and Alertmanager

To ensure that applications and services run smoothly, collecting metrics alone is not enough. What really matters is being notified when something unusual happens in those metrics. This is where the Prometheus + Alertmanager duo comes into play. In this article I will focus mainly on the alert rules that can be defined directly in Prometheus; Alertmanager will be treated more as a bonus component that helps tie the concepts together.

At the end of the article, I’ll provide you with a hands-on lab environment you can try yourself. You’ll be able to spin it up with Docker Compose and run your own experiments.

Definitely that is not the Prometheus I am talking about.

What is Alertmanager? What does it do?

Prometheus evaluates metrics based on the rules you define and generates an alert when a threshold is exceeded. If you were watching the Prometheus Alerts page 24/7, you would probably catch those alerts. However, since this is not a realistic scenario, you’ll want to receive these alerts through a notification system. This is exactly where Alertmanager comes into play.

Responsibilities of Alertmanager:

  • Routing incoming alerts to the relevant people or services (Slack, email, PagerDuty, OpsGenie, Webhook, etc.)
  • Grouping similar alerts and sending them as a single notification
  • Suppressing certain alerts under specific conditions (e.g., if a server is completely down, a “low CPU usage” alert no longer makes sense)
  • Providing the ability to mute alerts (silences)

In short, Alertmanager can be thought of as an “alert router.”
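
To make these responsibilities more concrete, here is a minimal alertmanager.yml sketch showing routing, grouping, and a sub-route for critical alerts. The webhook URL for the default receiver mirrors the one used later in this article; the receiver names, timing values, and the oncall-bridge URL are illustrative assumptions, not part of the lab setup:

route:
  receiver: "default-webhook"              # fallback receiver when no sub-route matches
  group_by: ["alertname", "instance"]      # alerts sharing these labels are batched into one notification
  group_wait: 30s                          # wait before sending the first notification for a new group
  group_interval: 5m                       # minimum time between updates for the same group
  repeat_interval: 4h                      # re-notify if an alert keeps firing this long
  routes:
    - matchers:
        - severity="critical"
      receiver: "oncall-webhook"           # critical alerts take a different path

receivers:
  - name: "default-webhook"
    webhook_configs:
      - url: "http://my-webhook:5001/alerts"
  - name: "oncall-webhook"
    webhook_configs:
      - url: "http://oncall-bridge:5002/alerts"   # illustrative second receiver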

Still, not that Prometheus.

How to create alerts with Prometheus rules? (A simple example)

For example, let’s say that for one of your applications, the number of incoming HTTP requests dropping below 100 per minute for 2 minutes in a row is a critical condition, and you want to be notified about it. To do this, you first define a rule in Prometheus:

groups:
- name: webapi-rules
  rules:
  - alert: LowRequestRate
    expr: increase(http_request_count_total{instance=~"webapi.*"}[1m]) < 100
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Low request rate on {{ $labels.instance }}"
      description: "Instance {{ $labels.instance }} received fewer than 100 requests in the last 2 minutes."

In this scenario, an alert is triggered for any instance whose name starts with webapi when its request count over the trailing 1-minute window stays below 100 continuously for 2 minutes (for: 2m).
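
One practical detail: Prometheus only evaluates this rule group if the file is listed under rule_files, and it only forwards alerts if an Alertmanager target is configured. Below is a minimal prometheus.yml sketch; the file name, the alertmanager hostname, and the webapi target addresses are placeholders, not necessarily the exact ones used in the lab:

global:
  scrape_interval: 15s        # how often targets are scraped
  evaluation_interval: 15s    # how often alerting rules are evaluated

rule_files:
  - "webapi-rules.yml"        # the rule group defined above (file name is an assumption)

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]   # assumed container name for Alertmanager

scrape_configs:
  - job_name: "webapi"
    static_configs:
      - targets: ["webapi1:8080", "webapi2:8080"]   # illustrative instance addresses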

More Complex Scenarios

In the real world, simple “check a single threshold” scenarios are often not enough. This is where Prometheus’s powerful query language, PromQL, comes into play.

1. Checking request volume per instance

In a server farm, you may have dozens of instances. To detect a traffic drop on just one of them, you can use:

sum by(instance) (increase(http_request_count_total{instance=~"webapi.*"}[2m])) < 100

Here, sum by(instance) evaluates each server separately.
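
Wrapped into a full rule (added under the same rules: list as the webapi-rules group above), a per-instance version could look like the sketch below; the rule name and severity label are illustrative:

  - alert: LowRequestRatePerInstance
    expr: sum by(instance) (increase(http_request_count_total{instance=~"webapi.*"}[2m])) < 100
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Low request rate on {{ $labels.instance }}"
      description: "{{ $labels.instance }} received fewer than 100 requests over the last 2 minutes."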

2. Comparing windows with offset

Sometimes what matters is not the “absolute value,” but sudden changes. For example:

Trigger an alert if the number of requests in the last 2 minutes is less than 30% of the previous 2 minutes.

sum by(instance) (increase(http_request_count_total{instance=~"webapi.*"}[2m]))
  <
0.3 * sum by(instance) (increase(http_request_count_total{instance=~"webapi.*"}[2m] offset 2m))

I deliberately caused a sharp drop in traffic to trigger the SharpDropInRequests rule.
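
A possible shape for that SharpDropInRequests rule, again as a sketch under the webapi-rules group: the expression is the one shown above, while the for: duration and labels are illustrative assumptions:

  - alert: SharpDropInRequests
    expr: |
      sum by(instance) (increase(http_request_count_total{instance=~"webapi.*"}[2m]))
        <
      0.3 * sum by(instance) (increase(http_request_count_total{instance=~"webapi.*"}[2m] offset 2m))
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Sharp request drop on {{ $labels.instance }}"
      description: "Requests on {{ $labels.instance }} fell below 30% of the previous 2-minute window."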

3. Filtering out down instances

If a server is completely down (i.e., its exporter is not responding), the increase function may not return any value at all. As a result, the series might not match the earlier rules and no alert would be generated. You need to catch this case with a separate rule:

up{instance=~"webapi.*"} == 0
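
Turned into a rule, this could look like the sketch below. The alert name matches the InstanceDown alert that appears in the webhook payload later in this article; the for: duration and severity are illustrative:

  - alert: InstanceDown
    expr: up{instance=~"webapi.*"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} is DOWN"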

And with an Alertmanager inhibition rule, you can suppress the “LowRequestRate” alert while “InstanceDown” is firing. This helps prevent alert noise and, especially in large systems, keeps root-cause analysis from becoming unnecessarily complicated.
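
In Alertmanager, such an inhibition could be declared roughly like this (a sketch assuming the alert names used above; the equal clause restricts suppression to alerts for the same instance):

inhibit_rules:
  - source_matchers:
      - alertname="InstanceDown"
    target_matchers:
      - alertname="LowRequestRate"
    equal: ["instance"]   # only suppress when both alerts share the same instance label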

I took down the webapi2 instance, and this is the result.

Using the Alertmanager Webhook

You’ve written your Prometheus alert rules, and Alertmanager has received those alerts. So how will you get notified? One of the simplest methods is a webhook.

Alertmanager Configuration (Example)

route:
  receiver: "webhook-demo"

receivers:
  - name: "webhook-demo"
    webhook_configs:
      - url: "http://my-webhook:5001/alerts"
        send_resolved: true

Below is a minimal Flask application that logs incoming alerts:

from flask import Flask, request

app = Flask(__name__)

@app.route("/alerts", methods=["POST"])
def alerts():
    # Alertmanager POSTs the alert payload as JSON to this endpoint
    data = request.json
    print("=== Incoming Alert ===")
    print(data)
    return "ok", 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5001)

Example JSON Payload Sent by Alertmanager to the Flask App

{
  "status": "firing",
  "receiver": "webhook-demo",
  "groupLabels": { "instance": "webapi-2" },
  "commonLabels": { "alertname": "InstanceDown", "instance": "webapi-2" },
  "alerts": [
    {
      "status": "firing",
      "labels": { "alertname": "InstanceDown", "instance": "webapi-2" },
      "annotations": { "summary": "Instance webapi-2 is DOWN" },
      "startsAt": "2025-09-19T15:30:45Z"
    }
  ]
}

Alerts on My Webhook App
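
If you want to exercise the Flask endpoint without waiting for a real alert, you can replay a payload like the one above yourself. The short script below is a sketch that assumes the app is reachable at localhost:5001 and that the requests library is installed:

import requests

# A payload mimicking the Alertmanager webhook request shown above
payload = {
    "status": "firing",
    "receiver": "webhook-demo",
    "groupLabels": {"instance": "webapi-2"},
    "commonLabels": {"alertname": "InstanceDown", "instance": "webapi-2"},
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "InstanceDown", "instance": "webapi-2"},
            "annotations": {"summary": "Instance webapi-2 is DOWN"},
            "startsAt": "2025-09-19T15:30:45Z",
        }
    ],
}

# Send it to the Flask app the same way Alertmanager would (HTTP POST with a JSON body)
response = requests.post("http://localhost:5001/alerts", json=payload)
print(response.status_code, response.text)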

Conclusion

Alertmanager is essential in modern systems for reducing alert noise and delivering the right information to the right people at the right time. From simple threshold-based rules to complex scenarios that compare historical windows, you can define all kinds of checks on the Prometheus side and manage them with Alertmanager.

With the right rules and a well-tuned Alertmanager configuration, you can quickly detect critical issues and get rid of unnecessary notifications. 🚀

GitHub Link: https://github.com/idylmz/Prometheus-AlertManager-DemoLab
