ElastAlert Spike Analysis

The ElastAlert spike rule monitors the events indexed in Elasticsearch and alerts on sudden peaks or troughs in their number. A spike rule mainly needs the following settings:

1) Whether to watch for a sudden rise or a sudden fall in the log volume (spike_type):

 

# (Required, spike specific)
# The direction of the spike
# 'up' matches only spikes, 'down' matches only troughs
# 'both' matches both spikes and troughs
spike_type: "down"

 

 

2) When watching for a rise (up), the minimum number of events the current window must contain (threshold_cur).

3) When watching for a rise (up) and the current window reaches threshold_cur, how many times larger than the previous window the current count must be (spike_height) before an alarm is issued.

4) Because the rise or fall is measured as a ratio between two consecutive periods, a time window (timeframe) must be specified over which the events in each period are counted.

 

Similarly, to monitor a sudden drop in the log, set the type to down and specify a minimum event count for the reference window (threshold_ref), the ratio of the drop (spike_height) and the counting period (timeframe).

 

When up, specify threshold_cur and spike_height

When down, specify threshold_ref and spike_height

 

Here threshold_cur applies to the event count of the current window, and threshold_ref to the event count of the reference window. The reference window is the window of the same length immediately before the current one.
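
The roles of the two thresholds and of spike_height can be sketched in a few lines of Python. This is a simplified illustration of the behaviour described above, not ElastAlert's own code: the function name is_spike and its parameters are made up for this example, ref and cur stand for the event counts of the reference and current windows, and treating a ratio exactly equal to spike_height as a match is an assumption.

```python
def is_spike(ref, cur, spike_height, spike_type, threshold_ref=0, threshold_cur=0):
    """Decide whether two window counts form a spike (illustrative sketch only)."""
    # Each threshold is a minimum count for its window; 0 means "not set".
    if cur < threshold_cur or ref < threshold_ref:
        return False
    rose = cur >= ref * spike_height   # current is at least spike_height times the reference
    fell = cur * spike_height <= ref   # current is at most 1/spike_height of the reference
    if spike_type == "up":
        return rose
    if spike_type == "down":
        return fell
    return rose or fell                # any other value is treated as "both"


# A drop from 40 events to 10 with spike_height = 2 and threshold_ref = 20
print(is_spike(ref=40, cur=10, spike_height=2, spike_type="down", threshold_ref=20))  # True
# The same relative drop is ignored when the reference window is below threshold_ref
print(is_spike(ref=15, cur=5, spike_height=2, spike_type="down", threshold_ref=20))   # False
```

Without threshold_ref, a drop from 2 events to 1 would already count as a trough, which is why a minimum reference count is useful for down rules.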

 

As shown in the figure below, with spike_height = 2 and a timeframe of 10 minutes, there are two examples, one for up and one for down.



 

 

In the up example, because we want to monitor peaks, we only need to specify threshold_cur = 20; threshold_ref is not needed. Suppose the events look like this:

 

eg1:

14:00 -- 14:10     10

14:10 -- 14:20     15

The actual cur (15) < threshold_cur (20), so there is no alarm.

 

eg2:

14:00 -- 14:10     10

14:10 -- 14:20     30

The actual cur (30) > threshold_cur (20), so whether to alarm now depends on whether the increase ratio reaches spike_height. The actual ratio is 30/10 = 3, which is greater than spike_height (2), so an alarm is raised.
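
The two examples can be checked with a short calculation. This is a minimal sketch, assuming an up rule with threshold_cur = 20 and spike_height = 2 as in the examples; the helper name check_up is invented for this illustration and is not part of ElastAlert.

```python
THRESHOLD_CUR = 20   # minimum event count in the current window
SPIKE_HEIGHT = 2     # required ratio between current and reference counts


def check_up(ref, cur):
    """Return (alarm, reason) for an 'up' rule; illustrative only."""
    if cur < THRESHOLD_CUR:
        return False, f"cur {cur} < threshold_cur {THRESHOLD_CUR}"
    if cur < ref * SPIKE_HEIGHT:
        return False, f"{cur} < {ref} * {SPIKE_HEIGHT}"
    return True, f"{cur} >= {ref} * {SPIKE_HEIGHT}"


# (name, events in 14:00 -- 14:10, events in 14:10 -- 14:20)
for name, ref, cur in [("eg1", 10, 15), ("eg2", 10, 30)]:
    alarm, reason = check_up(ref, cur)
    print(name, "alarm" if alarm else "no alarm", "(" + reason + ")")
```

Comparing cur against ref * SPIKE_HEIGHT rather than dividing avoids a division by zero when the reference window is empty.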

 

In config.yml, I configured the scan interval as:

run_every:
  minutes: 1

In the rule file, the configured timeframe is 2 minutes:

# (Required, spike specific)
# The size of the window used to determine average event frequency
# We use two sliding windows each of size timeframe
# To measure the 'reference' rate and the current rate
timeframe:
  minutes: 2
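
The comment in the rule file mentions two sliding windows, each of size timeframe. The following sketch shows what those two windows look like for a 2-minute timeframe. It assumes the current window ends at the query time and the reference window sits directly before it; how ElastAlert aligns the windows internally is not covered here, and the function name spike_windows is hypothetical.

```python
from datetime import datetime, timedelta


def spike_windows(now, timeframe):
    """Return (reference, current) windows, each `timeframe` long, ending at `now`."""
    current = (now - timeframe, now)
    reference = (now - 2 * timeframe, now - timeframe)
    return reference, current


timeframe = timedelta(minutes=2)
now = datetime(2017, 8, 12, 23, 12)
reference, current = spike_windows(now, timeframe)
print("reference:", reference[0].time(), "to", reference[1].time())
print("current:  ", current[0].time(), "to", current[1].time())
```

With run_every set to 1 minute, both windows move forward by one minute on each run, so consecutive runs compare overlapping periods.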

 

As the log below shows, a "Queried rule Event spike ..." line appears every minute, each time ElastAlert queries Elasticsearch for events.

A "Ran Event spike from ... to ..." line appears every two minutes. It summarizes how many hits, matches and alerts were produced over one timeframe.

INFO:elastalert:Queried rule Event spike from 2017-08-12 23:10 CST to 2017-08-12 23:11 CST: 0 / 0 hits
INFO:elastalert:Queried rule Event spike from 2017-08-12 23:11 CST to 2017-08-12 23:12 CST: 0 / 0 hits
INFO:elastalert:Ran Event spike from 2017-08-12 23:10 CST to 2017-08-12 23:12 CST: 0 query hits (0 already seen), 0 matches, 0 alerts sent
INFO:elastalert:Sleeping for 59.928103 seconds

 
