Prometheus: how an alert is triggered and the wait times involved

The alerting process is as follows:
1. The Prometheus server scrapes the HTTP endpoint exposed on the target host (call it endpoint A here) and periodically collects monitoring data from the target at the interval defined by "scrape_interval" in the Prometheus configuration.
2. When endpoint A is unavailable, the server waits for a response until "scrape_timeout" elapses and then gives up on that scrape. At that point the target's state changes to "DOWN".
3. Meanwhile, Prometheus evaluates the alert rules periodically at the interval set by "evaluation_interval" (default 1m). When an evaluation finds that endpoint A is DOWN, that is, the expression up == 0 is true, the alert becomes active, enters the "PENDING" state, and its activation time is recorded.
4. At the next rule evaluation, if up == 0 is still true, Prometheus checks whether the alert has been active for longer than the "for" duration in the rule. If not, it waits for the next evaluation; if so, the alert state changes to "FIRING" and Prometheus calls the Alertmanager API to send the alert data.
5. After Alertmanager receives the alert data, it groups the alerts and then waits for the "group_wait" time configured in Alertmanager. Only after this wait does it send the notification.
6. While a group is waiting, new alerts belonging to the same alert group may arrive. If a notification for the group has already been sent successfully, Alertmanager waits "group_interval" before sending a notification again. For example, with email notifications configured, the alerts that belong to one group are sent together in a single summary email.
7. If the alerts in the group have not changed and the notification was sent successfully, Alertmanager waits "repeat_interval" before sending the same notification again. If the earlier notification was not sent successfully, this is treated like the trigger condition in step 6, and the notification is sent again after "group_interval".
Finally, who receives each notification, under what conditions a given receiver is chosen, and how often different alerts are sent are all configured through the "route" rules in Alertmanager.
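
Steps 3 and 4 depend on an alert rule that carries a "for" duration. As a minimal sketch, such a rule file might look like the following. The group name, alert name, label, annotation text, and the 1m duration are illustrative assumptions, not values from the original post.

```yaml
# Hypothetical rule file, e.g. alert_rules.yml (file name is an assumption)
groups:
  - name: availability            # illustrative group name
    rules:
      - alert: InstanceDown       # illustrative alert name
        # True for any target whose most recent scrape failed
        expr: up == 0
        # The alert stays PENDING until the expression has been true for
        # this long, then becomes FIRING (assumed value)
        for: 1m
        labels:
          severity: critical      # illustrative label, usable for routing
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```

With a rule like this, the alert is checked only at evaluation time, so the switch from PENDING to FIRING happens at the first evaluation after the "for" duration has elapsed, not at the exact moment it elapses.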

 

1. Wait time 1: Prometheus server settings

View the configuration file: vim prometheus.yml

global:
  # Interval between scrapes (data collection)
  scrape_interval: 15s
  # Interval between alert rule evaluations
  evaluation_interval: 15s
  # Scrape timeout; left at its default of 10s
  # scrape_timeout: 10s
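
For context, here is a sketch of how these global values could sit in a fuller prometheus.yml, together with the sections that load alert rules and point Prometheus at Alertmanager. The job name, target addresses, rule file name, and the per-job values are assumptions made for illustration.

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Where Prometheus sends firing alerts (address is a placeholder)
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

# Files that contain the alert rules (file name is hypothetical)
rule_files:
  - 'alert_rules.yml'

scrape_configs:
  - job_name: 'node'              # hypothetical job name
    # Per-job settings override the global values for this job only
    scrape_interval: 30s
    # Must not be greater than the job's scrape_interval
    scrape_timeout: 10s
    static_configs:
      - targets: ['localhost:9100']   # placeholder target
```

Setting the intervals per job can be useful when some targets need faster failure detection than others, since the scrape interval is the first delay in the alerting chain.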

 

2. Wait time 2: Alertmanager settings

View the configuration file: vim alertmanager.yml

# route: how alerts are dispatched to receivers
route:
  # group_by: the labels used to group alerts
  group_by: ['alertname']
  # group_wait: how long to wait before sending the first notification for a new group
  group_wait: 10s
  # group_interval: interval between two notifications for the same group
  group_interval: 10s
  # repeat_interval: how long to wait before re-sending a notification. Default 4h
  repeat_interval: 1m
  # receiver: who receives the alert notification
  receiver: 'mail'
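
The route above is only the root of the routing tree. As a sketch of how different receivers and sending frequencies can be set per condition, a child route can override the timing for a subset of alerts. The "oncall" receiver, the severity label, the email addresses, and the interval values are hypothetical. The "matchers" syntax assumes a reasonably recent Alertmanager (older versions use "match" instead), and the global SMTP settings that email receivers need are omitted.

```yaml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h             # assumed value
  receiver: 'mail'                # default receiver for unmatched alerts
  routes:
    # Child route: critical alerts go to a different receiver and are
    # repeated more often than the default
    - matchers:
        - severity="critical"
      receiver: 'oncall'          # hypothetical receiver
      repeat_interval: 10m

receivers:
  - name: 'mail'
    email_configs:
      - to: 'team@example.com'    # placeholder address
  - name: 'oncall'
    email_configs:
      - to: 'oncall@example.com'  # placeholder address
```

Child routes inherit any timing setting they do not override, so only the values that differ from the parent route need to be written.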

 

Origin www.cnblogs.com/xiangsikai/p/11289966.html