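Prometheus and Alertmanager Configuration Tutorial
Set up a Prometheus monitoring system with Docker Compose and configure Alertmanager to manage alerts.
Contents
- Docker Compose configuration
- Prometheus configuration
- Alertmanager configuration
- Alert rule configuration
Docker Compose configuration
First, use Docker Compose to deploy Prometheus, Grafana, Pushgateway, and Alertmanager.
Create a docker-compose.yml file with the following content: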

    version: "3"
    services:
      prometheus:
        image: prom/prometheus
        container_name: prometheus
        user: root
        ports:
          - "9090:9090"
        volumes:
          - ./conf/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
          - ./conf/rules:/etc/prometheus/rules
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus'
          - '--web.console.libraries=/usr/share/prometheus/console_libraries'
          - '--web.console.templates=/usr/share/prometheus/consoles'
        networks:
          - net-prometheus

      grafana:
        image: grafana/grafana
        container_name: grafana
        user: root
        ports:
          - "3000:3000"
        volumes:
          - ./data/prometheus/grafana_data:/var/lib/grafana
        depends_on:
          - prometheus
        networks:
          - net-prometheus

      pushgateway:
        image: prom/pushgateway
        container_name: pushgateway
        user: root
        ports:
          - "9091:9091"
        volumes:
          - ./data/prometheus/pushgateway_data:/var/lib/pushgateway
        networks:
          - net-prometheus

      alertmanager:
        image: prom/alertmanager
        container_name: alertmanager
        user: root
        ports:
          - "9093:9093"
        volumes:
          - ./conf/alertmanager:/etc/alertmanager
          - ./data/prometheus/alertmanager_data:/var/lib/alertmanager
        networks:
          - net-prometheus

    networks:
      net-prometheus:

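This file defines four services, Prometheus, Grafana, Pushgateway, and Alertmanager, each with its own port mappings and volume mounts. The images above carry no tag, so Docker pulls latest; pinning each image to a fixed version is recommended.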
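Prometheus configuration
Next, configure Prometheus. Create prometheus.yml in the directory mounted into the container (./conf/prometheus in the Compose file above):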

    global:
      scrape_interval: 5s
      evaluation_interval: 5s

      external_labels:
        monitor: 'dashboard'

    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - "alertmanager:9093"
          timeout: 30s

    rule_files:
      - /etc/prometheus/rules/*.rules

    scrape_configs:
      - job_name: 'prometheus'
        scrape_interval: 5s
        static_configs:
          - targets: ['prometheus:9090']

      - job_name: "Windows"
        static_configs:
          - targets: ["your_windows_host:9182"]

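This file defines the global settings, the Alertmanager endpoint, the location of the rule files, and the scrape jobs. Replace your_windows_host with the address of your actual Windows host; port 9182 is the default port of windows_exporter, which must be running on that host.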
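Once the stack is running, one way to confirm that both scrape jobs are healthy is to read the target list from Prometheus' HTTP API. The following is a minimal sketch, not part of the original tutorial: the helper names fetch_targets and unhealthy_targets are hypothetical, the base URL assumes you are on the Docker host, and the sample payload is hand-written to resemble the /api/v1/targets response, trimmed to the fields used here.

```python
import json
from urllib.request import urlopen


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the target list from the Prometheus HTTP API (/api/v1/targets).

    The base URL is an assumption: it matches the 9090:9090 port mapping
    when the script runs on the Docker host.
    """
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def unhealthy_targets(payload: dict) -> list[tuple[str, str]]:
    """Return (job, instance) pairs for active targets whose health is not 'up'."""
    result = []
    for target in payload["data"]["activeTargets"]:
        if target["health"] != "up":
            labels = target["labels"]
            result.append((labels["job"], labels["instance"]))
    return result


# Hand-written sample shaped like the API response (illustrative values only).
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "prometheus", "instance": "prometheus:9090"},
                "health": "up",
            },
            {
                "labels": {"job": "Windows", "instance": "your_windows_host:9182"},
                "health": "down",
            },
        ]
    },
}

print(unhealthy_targets(sample))
```

Against a live deployment, unhealthy_targets(fetch_targets()) would return an empty list when every target is being scraped successfully.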
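Alertmanager configuration
Create alertmanager.yml in the directory mounted into the container (./conf/alertmanager in the Compose file above):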

    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.example.com:465'
      smtp_from: '[email protected]'            # sender address (redacted in the source)
      smtp_auth_username: '[email protected]'   # SMTP login (redacted in the source)
      smtp_auth_password: 'your_smtp_password'
      smtp_require_tls: false

    route:
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      receiver: 'email-notifications'

    receivers:
      - name: 'email-notifications'
        email_configs:
          - to: '[email protected]'              # recipient address (redacted in the source)
            send_resolved: true

    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']

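This file sets up the global SMTP configuration, the routing rules, the receiver, and the inhibit rules.
Before using it, replace the placeholder values with your own: the SMTP server, the sender and recipient addresses, and the SMTP username and password.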
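The inhibit rule is the least obvious part of this file, so here is a small sketch of how it decides whether a warning is suppressed. This is a simplified model of the documented behavior, not Alertmanager's code: the function names and the sample alerts are hypothetical, and the model covers only equality matchers. It follows the documented rule that a missing label and an empty label count as the same value.

```python
def matches(labels: dict[str, str], matcher: dict[str, str]) -> bool:
    """True if every matcher label has exactly the given value on the alert."""
    return all(labels.get(name, "") == value for name, value in matcher.items())


def is_inhibited(
    target: dict[str, str],
    firing: list[dict[str, str]],
    source_match: dict[str, str],
    target_match: dict[str, str],
    equal: list[str],
) -> bool:
    """Simplified model of one inhibit rule.

    The target alert is muted when it matches target_match and some firing
    alert matches source_match with identical values for every label in
    `equal` (a missing label is treated as an empty value).
    """
    if not matches(target, target_match):
        return False
    return any(
        matches(source, source_match)
        and all(source.get(name, "") == target.get(name, "") for name in equal)
        for source in firing
    )


# Values taken from the inhibit rule in alertmanager.yml.
rule = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}

# Hypothetical alerts for illustration.
critical = {"alertname": "WindowsServerDown", "severity": "critical", "instance": "win-host:9182"}
same_name_warning = {"alertname": "WindowsServerDown", "severity": "warning", "instance": "win-host:9182"}
memory_warning = {"alertname": "WindowsHighMemoryUsage", "severity": "warning", "instance": "win-host:9182"}

print(is_inhibited(same_name_warning, [critical], **rule))
print(is_inhibited(memory_warning, [critical], **rule))
```

Because alertname is in the equal list, a critical alert only suppresses warnings that share its name. Under this model, a critical alert named WindowsServerDown would not mute a warning named WindowsHighMemoryUsage on the same instance; removing alertname from equal would be one way to get that behavior.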
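Alert rule configuration
Finally, configure the alert rules. Create a windows_alerts.rules file in ./conf/rules, which is mounted at /etc/prometheus/rules: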

    groups:
      - name: windows_alerts
        rules:
          - alert: WindowsHighMemoryUsage
            expr: 100 - (windows_os_physical_memory_free_bytes / windows_cs_physical_memory_bytes) * 100 > 70
            for: 1m
            labels:
              severity: warning
            annotations:
              summary: "High memory usage on Windows host (instance {{ $labels.instance }})"
              description: "Memory usage on Windows host {{ $labels.instance }} is above 70%, current value: {{ $value | printf \"%.2f\" }}%"

          - alert: WindowsHighCPUUsage
            expr: 100 - (avg by(instance) (rate(windows_cpu_time_total{mode="idle"}[5m])) * 100) > 90
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "High CPU usage on Windows host (instance {{ $labels.instance }})"
              description: "CPU usage on Windows host {{ $labels.instance }} has been above 90% for 5 minutes, current value: {{ $value | printf \"%.2f\" }}%"

          - alert: WindowsServerDown
            # Label values are case-sensitive: this must match job_name "Windows" in prometheus.yml.
            expr: up{job="Windows"} == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Windows server down (instance {{ $labels.instance }})"
              description: "Windows server {{ $labels.instance }} has been down for more than 5 minutes"

This file defines three alert rules:
- Memory usage above 70% for 1 minute
- CPU usage above 90% for 5 minutes
- Windows server down for more than 5 minutes
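To make the two usage expressions concrete, the arithmetic can be reproduced outside Prometheus. The sketch below mirrors the PromQL formulas in plain Python; the function names and the sample numbers are hypothetical, and the `for` duration (how long the condition must hold before the alert fires) is not modeled.

```python
def memory_usage_percent(free_bytes: float, total_bytes: float) -> float:
    """Mirror of: 100 - (free_bytes / total_bytes) * 100."""
    return 100 - (free_bytes / total_bytes) * 100


def cpu_usage_percent(idle_rates: list[float]) -> float:
    """Mirror of: 100 - avg(rate(idle seconds)) * 100.

    Each entry is one core's idle rate: the fraction of each second the
    core spent idle over the rate window, between 0.0 and 1.0.
    """
    return 100 - (sum(idle_rates) / len(idle_rates)) * 100


# Hypothetical host with 16 GiB of RAM and 4 GiB free: 75% used.
mem = memory_usage_percent(4 * 2**30, 16 * 2**30)

# Hypothetical 4-core host whose cores were idle 5%, 10%, 5%, and 10% of the time.
cpu = cpu_usage_percent([0.05, 0.10, 0.05, 0.10])

print(f"memory {mem:.2f}% -> condition (> 70): {mem > 70}")
print(f"cpu {cpu:.2f}% -> condition (> 90): {cpu > 90}")
```

Both sample hosts satisfy their alert conditions; the alerts would fire only if the conditions kept holding for the configured 1 minute and 5 minutes respectively.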
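Summary
With the configuration above, you have a monitoring and alerting system built on Prometheus and Alertmanager. It monitors the state of your Windows hosts and sends email notifications when problems occur.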
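To start the whole system, run Docker Compose's up command (typically docker-compose up -d) in the directory that contains docker-compose.yml.
Afterwards, the services are available on the host ports mapped in docker-compose.yml (addresses shown for access from the Docker host itself):
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000
- Pushgateway: http://localhost:9091
- Alertmanager: http://localhost:9093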