(19) Use InfluxDB to build an alarm system

The following content comes from Shang Silicon Valley course material. I wrote this series of articles mainly for my own later reference, so that I don't have to carry a PDF around to look things up, which is too troublesome!

Chapter 19 Using InfluxDB to build an alarm system

19.1 What is monitoring

1. Monitoring essentially means computing over the data periodically. For example, suppose I have a carbon monoxide concentration sensor, and every minute I calculate the average indoor carbon monoxide concentration over that minute. I compare this result with a hard-coded standard value and raise an alarm if it is exceeded. This is the basic logic of monitoring.

2. Therefore, a monitor in InfluxDB is really a scheduled task written as a Flux script. However, in both the HTTP API and the Web UI, InfluxDB manages checks separately from ordinary scheduled tasks.
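The per-minute logic in item 1 can be sketched in a few lines of Python. This only illustrates the idea (average one window, compare with a fixed limit); it is not how InfluxDB implements checks, and the function name, sample readings and the 0.04 limit are assumptions for illustration.

```python
from statistics import mean


def window_exceeds(readings: list[float], limit: float) -> bool:
    """Return True if the average of one window's readings is above the limit.

    `readings` holds the CO concentrations collected during one window
    (for example, one minute); an empty window never triggers an alarm.
    """
    if not readings:
        return False
    return mean(readings) > limit


# One minute of hypothetical sensor readings, checked against a hard-coded limit.
minute = [0.03, 0.05, 0.06]
if window_exceeds(minute, limit=0.04):
    print("ALARM: average CO concentration is above the limit")
```

A real check runs this comparison on a schedule, which is exactly why InfluxDB models it as a scheduled task.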

19.2 Understanding checks, alarm terminals and alarm rules

1. In the left toolbar of the Web UI, click the Alerts button to open the alarm configuration page. The tab bar at the top shows CHECKS (checks), NOTIFICATION ENDPOINTS (alarm terminals) and NOTIFICATION RULES (alarm rules), which correspond to the three components InfluxDB requires for alarming.

Insert image description here
2. The functions of the three components are as follows:

  • CHECKS: A check is really a scheduled task, which we can call a check task. It reads part of the data from the target bucket, performs a threshold check, and finally produces one of four types of status signal: CRIT (critical), WARN (warning), INFO (information) and OK (good).

Insert image description here

  • NOTIFICATION ENDPOINTS (alarm terminals): the component that sends alarm signals to a specified address.
  • NOTIFICATION RULES (alarm rules): they specify, for example, which checks should send a WeChat alarm when something goes wrong and which should send an email notification. A rule acts as the router between checks and alarm terminals.
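To make the division of labor concrete, here is a toy Python model of the three components: a check produces a status, a rule decides which endpoint (if any) receives it, and the endpoint delivers it. All class, field and function names are hypothetical; they mirror the concepts only, not InfluxDB's internal data structures.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Status:
    check_name: str  # which check produced the status
    level: str       # "crit", "warn", "info" or "ok"
    message: str


@dataclass
class Rule:
    check_name: str                      # only statuses from this check match
    levels: set[str]                     # status levels that should be forwarded
    endpoint: Callable[[Status], None]   # where matching statuses are sent


def route(status: Status, rules: list[Rule]) -> int:
    """Send the status to every endpoint whose rule matches; return how many fired."""
    sent = 0
    for rule in rules:
        if status.check_name == rule.check_name and status.level in rule.levels:
            rule.endpoint(status)
            sent += 1
    return sent


# Hypothetical endpoints: in practice these would be WeChat, email, HTTP, ...
outbox: list[str] = []


def email(s: Status) -> None:
    outbox.append(f"email: {s.message}")


def phone(s: Status) -> None:
    outbox.append(f"phone: {s.message}")


rules = [Rule("CO_Alert", {"warn"}, email), Rule("CO_Alert", {"crit"}, phone)]
route(Status("CO_Alert", "crit", "CO is too high"), rules)
print(outbox)  # ['phone: CO is too high']
```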

19.3 Example: Simulating alarms for carbon monoxide concentrations

19.3.1 Requirements

1. Suppose we have a sensor that measures carbon monoxide concentration. Every so often, it writes one data point over the IoT network to the InfluxDB instance deployed on the server, in the following format:

co,code=01 value=0.001 1664851126000

2. We want InfluxDB to implement the following alarm behavior:

  • A CRIT (critical) level notification signal is issued when the CO concentration is greater than 0.04.
  • A WARN (warning) level notification signal is issued when the CO concentration is between 0.01 and 0.04.
  • When the CO concentration is lower than 0.01, an OK level notification signal is issued.

3. Ultimately, when the CO concentration exceeds the standard, we want the responsible staff to receive a phone call so that they can respond to the incident quickly.
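The three level requirements can be captured in one small Python function, which is also handy for working out in advance which level a test value should produce. The requirements do not say which level a value of exactly 0.01 or 0.04 gets, so the sketch assumes both boundaries belong to WARN; the function name is illustrative.

```python
def co_level(value: float) -> str:
    """Map a CO concentration to an alarm level.

    Boundary handling is an assumption: exactly 0.01 and exactly 0.04
    both fall into the WARN band.
    """
    if value > 0.04:
        return "crit"
    if value >= 0.01:
        return "warn"
    return "ok"


for v in (0.0015, 0.025, 0.05):
    print(v, co_level(v))
```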

19.3.2 Auxiliary tools

1. To make it easy to verify what the alarm terminal sends, a very simple HTTP service that only accepts POST requests has been written for this tutorial. The following command starts it, listening on local port 8080:

./simpleHttpPostServer-linux-x64

2. The command blocks the terminal once started. Whenever it receives a POST request, it prints the request body to the terminal. If 0.0.0.0 is not the host you want to bind, or port 8080 is already in use, you can change them with the following two parameters:

  • -h specifies the host to bind
  • -p specifies the port to bind

3. For example:

  ./simpleHttpPostServer-linux-x64 -h localhost -p 8080

4. For details, please refer to the project address: https://github.com/realdengziqi/simpleHttpPostServer
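If the prebuilt binary is not at hand, a stand-in with similar behavior can be written with Python's standard library alone. This is a minimal sketch, not the linked project's code: it answers POST requests with 200, prints each request body, and accepts -h/-p like the original (argparse's automatic help is disabled so that -h can mean host). Other methods get http.server's default 501 response.

```python
import argparse
import http.server


class PrintBodyHandler(http.server.BaseHTTPRequestHandler):
    """Print the body of every POST request and answer 200 OK."""

    received: list[bytes] = []  # bodies are also kept so code can inspect them

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        PrintBodyHandler.received.append(body)
        print(body.decode("utf-8", errors="replace"), flush=True)
        self.send_response(200)
        self.end_headers()


def main(argv=None) -> None:
    # add_help=False frees -h so it can mean "host", as in the original tool.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-h", dest="host", default="0.0.0.0")
    parser.add_argument("-p", dest="port", type=int, default=8080)
    args = parser.parse_args(argv)
    server = http.server.HTTPServer((args.host, args.port), PrintBodyHandler)
    print(f"listening on {args.host}:{args.port}")
    server.serve_forever()  # blocks the terminal, like the original
```

Saved as post_server.py, it can be started with `python3 -c 'import post_server; post_server.main()' -h localhost -p 8080`, or you can add the usual `if __name__ == "__main__": main()` guard and run the file directly.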

19.3.3 Create a new bucket

1. In order to avoid confusing the data of this example with the previous example, here we first create a new bucket named example_alert, as shown in the following figure:

Insert image description here

19.3.4 Preparing data templates

1. In this example we insert data into InfluxDB manually, one point at a time. It is convenient to open a text editor (this tutorial uses VS Code), write a line protocol data template there, and for each insert copy the template, change the value slightly, and then insert it. The data template is as follows:

co,code=01 value=0.001
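Both this template and the sample in 19.3.1 are InfluxDB line protocol: measurement co, tag code=01, field value, and an optional timestamp. Without a timestamp, InfluxDB uses the time the point is received. Note that the 1664851126000 in 19.3.1 has 13 digits, i.e. it is in milliseconds, while InfluxDB assumes nanosecond precision by default, so such a point must be written with ms precision or it will land in January 1970. A minimal Python sketch that builds such lines, assuming names and values that need no escaping (the function name is illustrative):

```python
from typing import Optional


def to_line_protocol(measurement: str, tags: dict[str, str],
                     fields: dict[str, float], ts: Optional[int] = None) -> str:
    """Build one line of InfluxDB line protocol.

    Assumes names and values contain no spaces, commas or equals signs,
    so no escaping is done. Float fields are written as plain decimals.
    """
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v!r}" for k, v in fields.items())
    line = f"{measurement}{tag_part} {field_part}"
    if ts is not None:
        line += f" {ts}"  # the writer decides the precision (ns, ms, ...)
    return line


print(to_line_protocol("co", {"code": "01"}, {"value": 0.001}))
print(to_line_protocol("co", {"code": "01"}, {"value": 0.001}, 1664851126000))
```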

19.3.5 Insert a couple of data points in advance

1. This step ensures that the query builder has options to offer when we create the check later, so do not skip it, or the check cannot be created smoothly. Here, we insert one data point at a time in the Web UI window for uploading line protocol data, as the picture shows:

Insert image description here
2. The data is as follows:

  • First insert:
co,code=01 value=0.0015
  • Second insert:
co,code=01 value=0.0025
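Besides the Web UI upload window, the same lines can be written through InfluxDB 2.x's HTTP write endpoint, POST /api/v2/write, which takes org, bucket and precision as query parameters and an `Authorization: Token ...` header. The sketch below only builds the request with Python's standard library; the host, org name and token are placeholders, and once they are filled in the request can be sent with urllib.request.urlopen(req).

```python
import urllib.parse
import urllib.request


def build_write_request(base_url: str, org: str, bucket: str, token: str,
                        lines: list[str], precision: str = "ns") -> urllib.request.Request:
    """Build (but do not send) a line protocol write request for InfluxDB 2.x."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": precision})
    return urllib.request.Request(
        f"{base_url}/api/v2/write?{query}",
        data="\n".join(lines).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )


# Placeholders: replace the host, org and token with your own values.
req = build_write_request("http://localhost:8086", "my-org", "example_alert",
                          "MY_TOKEN", ["co,code=01 value=0.0015"])
print(req.full_url)
```

For the millisecond timestamps shown in 19.3.1, pass precision="ms".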

19.3.6 Creating a check (CHECK)

1. Click the Alerts button in the toolbar on the left. By default, you will enter the CHECKS page, as shown in the figure:
Insert image description here
2. Hover the mouse over the CREATE button in the upper right corner, and a drop-down menu will pop up, including two buttons:

  • THRESHOLD CHECK: This type of check task is mainly to determine whether the data exceeds a certain threshold limit.
  • Deadman Check: This type of check determines how long it has been since new data was written to a given series. You set a limit, for example emitting a warning once no data for a series has arrived in the database for more than 30 seconds.
    Insert image description here
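The idea behind a deadman check fits in a few lines: remember when each series last reported, and flag any series that has been silent for longer than the allowed gap. This Python sketch uses plain Unix timestamps in seconds and the 30-second example above; the function name and the data layout are illustrative only.

```python
def silent_series(last_seen: dict[str, float], now: float, max_gap: float = 30.0) -> list[str]:
    """Return the series keys whose last point is older than max_gap seconds."""
    return sorted(key for key, ts in last_seen.items() if now - ts > max_gap)


last_seen = {"co,code=01": 1000.0, "co,code=02": 1045.0}
print(silent_series(last_seen, now=1050.0))  # ['co,code=01']
```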

3. Here we select Threshold Check to create a threshold check. A dialog window then pops up. Its layout is very similar to Data Explorer's, though its functionality differs somewhat.

Insert image description here

  • At the top is a Name this Check field; click it to name the check being created.
  • There is a tab in the upper left corner, and DEFINE QUERY is selected by default, which is the page effect shown in the picture above.
  • The bottom of the page holds a query builder. Note that you cannot switch to the script editor here; the query can only be built with the query builder.
  • The list on the far right shows what a threshold check requires. You must select:
    • One field
    • One aggregate function (the aggregate applied after windowing)
    • One or more value ranges

4. Now construct the query, as shown below.

Insert image description here

  • Select example_alert as the bucket.
  • Select co for _measurement.
  • Note that although this measurement currently has only one series, _field = value must still be added as a filter condition; otherwise the One Field requirement in the upper right will not be satisfied.
  • Finally, change the aggregate function from the default mean to max.
  • Click SUBMIT to preview the query results.
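Behind the query builder, the check cuts the data into time windows and reduces each window with the chosen aggregate, here max instead of the default mean. A rough Python equivalent of that windowed max, assuming points arrive as (unix_seconds, value) pairs and windows are aligned to multiples of the window length. One difference to keep in mind: Flux's aggregateWindow stamps each result with the window's stop time by default, while this sketch keys results by the window start.

```python
from collections import defaultdict


def window_max(points: list[tuple[int, float]], every: int) -> dict[int, float]:
    """Group (timestamp, value) points into fixed windows and keep each window's max.

    Windows are keyed by their start time (a multiple of `every` seconds);
    empty windows are simply absent, similar to createEmpty: false.
    """
    windows: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        windows[ts - ts % every].append(value)
    return {start: max(values) for start, values in sorted(windows.items())}


points = [(0, 0.0015), (7, 0.0025), (16, 0.025), (29, 0.0146)]
print(window_max(points, every=15))  # {0: 0.0025, 15: 0.025}
```

Using max rather than mean means a single high reading inside a window is enough to cross the threshold, which suits a safety alarm.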

5. Click the CONFIGURE CHECK button in the upper left corner. This will take us to a new page where we can configure the threshold. The first thing to notice is that only the bottom half of the page has changed.

Insert image description here

  • The leftmost card configures the query and its schedule further. Here we set Schedule Every to 15s so that the check runs every 15 seconds.
  • The STATUS MESSAGE TEMPLATE card in the middle holds the status message template. It supports shell-style interpolation with ${}. The meaning of r here is explained later; keep the default template unchanged for now.
  • The THRESHOLDS card on the far right sets the value ranges. There are four kinds of range, corresponding to the four statuses a check can emit:
    • CRIT (the first four letters of critical) means a critical emergency.
    • WARN (warning) means a warning.
    • INFO (information) means general information or a reminder.
    • OK means everything is in good condition.

6. Now click the CRIT button in the lower right corner; a small settings window pops up, as shown in the figures below.
Insert image description here
Insert image description here
7. This setting means: when the value is greater than the given number, set the check's status to CRIT. Here, is above, to the right of When value, means greater than. It is also a drop-down menu; click it and you will find more options, including is below (less than), is inside range (within a range), and so on.
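The options in this drop-down map naturally onto predicates. The Python sketch below encodes an assumption about their semantics (strict comparisons for is above and is below, an exclusive range for is inside range) rather than a transcription of the code InfluxDB generates; inspect the generated check if the exact boundary behavior matters to you.

```python
from typing import Callable

Predicate = Callable[[float], bool]


def is_above(limit: float) -> Predicate:
    return lambda v: v > limit


def is_below(limit: float) -> Predicate:
    return lambda v: v < limit


def is_inside_range(low: float, high: float) -> Predicate:
    return lambda v: low < v < high


crit = is_above(0.04)
warn = is_inside_range(0.01, 0.04)
print(crit(0.05), warn(0.025), warn(0.05))  # True True False
```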

Insert image description here

8. 0.00125 is automatically filled in by the Web UI based on our current query results. Here, according to our needs, the status is set to CRIT only when the concentration value of co is greater than 0.04. The effect is as follows.

Insert image description here
9. In the same way, set the WARN and OK ranges; INFO is left unset. The result is as follows.
Insert image description here
10. Finally, click the check mark in the upper right corner to save the Check. Now, we are back to the original CHECKS page, and we can see that there is a Check that we just configured in the list below.

Insert image description here

19.3.7 Test Check

1. Now go back to the data upload page and insert a data point to test how the check behaves. The inserted data is as follows:

co,code=01 value=0.025

Insert image description here

2. The carbon monoxide concentration is now 0.025, which is between 0.01 and 0.04, so the CHECK we just created should emit a WARN level signal. Click Alert History in the left toolbar: the status records now contain an entry with level WARN. The MESSAGE on the right reads Check:CO_Alert is : warn, which is produced by our message template.

Insert image description here

19.3.8 Modify message template

1. Currently, the message produced by our template is not informative enough. We would like the alarm to include the current carbon monoxide concentration as well. The official documentation on templates points out that the value of a specific column of the data can be accessed as r.<column name>.

Insert image description here
2. So we modify the message template again. The final template is shown below:

Insert image description here
3. Note r.code and r.value in the template: with them, the message directly includes the device code and the current carbon monoxide concentration from the data.
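Conceptually, the template is filled in by replacing each ${r.<name>} placeholder with that column of the current row r. The small Python imitation below handles only plain ${r.name} references (optionally padded with spaces); real Flux templates can contain arbitrary expressions, which it does not support. The template text and row values are examples, not the exact template from the figure.

```python
import re

# Matches ${r.name} and ${ r.name }, capturing the column name.
PLACEHOLDER = re.compile(r"\$\{\s*r\.(\w+)\s*\}")


def render(template: str, r: dict[str, object]) -> str:
    """Replace each ${r.name} with str(r["name"]); unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda m: str(r[m.group(1)]), template)


row = {"_check_name": "CO_Alert", "_level": "warn", "code": "01", "value": 0.0146}
tpl = "Check: ${r._check_name} is: ${r._level}, device ${r.code}, CO=${r.value}"
print(render(tpl, row))
# Check: CO_Alert is: warn, device 01, CO=0.0146
```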

19.3.9 Verifying the message template

1. Next, we insert a piece of data again.

co,code=01 value=0.0146

2. 0.0146 is between 0.01 and 0.04, so the CHECK we created earlier should emit another WARN level signal. Click Alert History in the left toolbar to view the check's status records. The MESSAGE of the new status record has changed: this time it shows the device code and the carbon monoxide concentration at that moment.
Insert image description here

19.3.10 Create alarm terminal (NOTIFICATION ENDPOINT)

1. Recording statuses alone is not enough. We also need to send information to external systems, for example by emailing developers or calling them by phone. The component responsible for sending messages to external systems is the alarm terminal.

2. First click the Alerts button in the left toolbar. On that page, select the NOTIFICATION ENDPOINTS tab at the top.

Insert image description here
3. Click the CREATE button in the upper right corner, and a dialog window as shown below will pop up.

Insert image description here
4. The Destination drop-down menu in the upper left corner selects the type of alarm terminal. Three types are provided: HTTP, Slack and PagerDuty. Slack and PagerDuty are communication tools commonly used by development teams outside China. Here we choose HTTP.

Insert image description here

5. After selecting HTTP, you can see that the configuration items in the window will change. The so-called HTTP terminal is actually sending a POST request to a target address.
Insert image description here

6. We will not connect to Ruixiang Cloud just yet. Instead, we first want to observe the structure of the data the HTTP terminal sends. Find our helper tool simpleHttpPostServer in the materials package, copy it to the Linux virtual machine, and run the following command.

./simpleHttpPostServer-linux-x64

7. Once started, the program listens on 0.0.0.0:8080 and prints the data of every POST request it receives to the terminal. Now set the HTTP terminal address in InfluxDB to http://127.0.0.1:8080, as shown in the figure below:
Insert image description here
8. Finally, click CREATE NOTIFICATION ENDPOINT in the lower right corner to create a terminal.

19.3.11 Create alarm rules (NOTIFICATION RULES)

1. Alarm rules play the role of routing between alarm information and terminals. Alarm rules can specify which Check and what level of information is sent to which terminal.

2. Attention! Creating an alarm rule requires at least one existing alarm terminal. Otherwise, the create button for alarm rules in the Web UI is grayed out and no rule can be created.

3. First click the Alerts button in the left toolbar, then click the NOTIFICATION RULES tab at the top, and then click the CREATE button.
Insert image description here
4. Now you can see a pop-up window for setting alarm rules. As shown below.

Insert image description here
5. The schedule can be set at the top, just as for a scheduled task. The Conditions area in the middle holds the conditions; the default condition is that when a check's status in InfluxDB is CRIT, http_endpoint sends the alarm information. Note that the Conditions area also has a button called Tag Filter, which filters statuses by tag. First, to see the alarm effect sooner, set the schedule to 15 seconds. The rule can be given any name you like. The result is shown below.

Insert image description here
6. In the Conditions area in the middle, click the Tag Filter button and add the tag filter condition _check_name == CO_Alert, where CO_Alert is the name of the check we created earlier. How checks and notification rules work together is explained later; for now, just set it up like this. The result is shown below.

Insert image description here

7. In the Message area at the bottom of the window, since we currently have only one terminal, http_endpoint, the UI selects it automatically; leave it as is.

Insert image description here
8. Finally, click the CREATE NOTIFICATION RULE button at the bottom to create the rule.

Insert image description here

19.3.12 Test the alarm signal sending effect

1. Now we test the alarm pipeline we have built in InfluxDB by inserting a simulated carbon monoxide concentration greater than 0.04. The inserted data is as follows:

co,code=01 value=0.05

2. As shown in the figure:

Insert image description here
3. Next, click Alert History on the left toolbar to come to the alarm history page and wait for about 15 seconds. Under normal circumstances, a status message with a level of crit should appear in the check status history.

Insert image description here
4. Click the NOTIFICATIONS button above. A notification record should appear. This list contains records of notifications sent out by InfluxDB. You can see that there is a green check mark on the far right of this record, which indicates that our message has been successfully sent through http_endpoint.

Insert image description here
5. Go back to the terminal where simpleHttpPostServer was opened before and take a look at the contents. As shown in the figure, we successfully received a POST request and printed the data in the request body on the console.

Insert image description here

6. After formatting, the JSON data looks as follows. It contains the time of the data, the alarm message, the alarm level, the carbon monoxide concentration at the time of the incident, and more. If you see this JSON in the terminal, InfluxDB's alarm configuration is complete and working.

Insert image description here
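When scripting against this payload, the interesting values can be pulled out with the json module. The body below is an abbreviated, hand-written example rather than a copy of the real request: which keys appear depends on your check and rule, so treat _check_name, _level and value here as assumptions to compare against what your own server prints.

```python
import json

raw = """{"_check_name": "CO_Alert", "_level": "crit",
          "_message": "Check: CO_Alert is: crit", "code": "01", "value": 0.05}"""


def summarize(body: str) -> str:
    """Return a one-line summary of a notification body sent by the HTTP endpoint."""
    r = json.loads(body)
    return f"[{r.get('_level', '?').upper()}] {r.get('_check_name')}: value={r.get('value')}"


print(summarize(raw))  # [CRIT] CO_Alert: value=0.05
```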

19.3.13 How checks and alarm rules work

1. When setting up the alarm rule, we saw that it requires a scheduling interval. That seems a bit strange: why would a rule need to run periodically? Let's start with how checks work.

2. When InfluxDB is installed, it automatically creates a bucket named _monitoring. We can query its contents with Data Explorer. The query results are as follows:

Insert image description here
3. You can see that the query results include the status information generated by the check. For example, there is a column named _check_name whose value is CO_Alert, the check name we set before. In other words, the check that runs on a schedule queries data from example_alert, performs a threshold check on it, and writes the resulting status information to the _monitoring bucket. The result is shown below:

Insert image description here
4. In fact, the notification rule is also a scheduled task. It queries the most recent period of data from _monitoring and filters it according to the conditions you set. Finally, if any data meets the conditions, it sends that data in JSON format through our http_endpoint. The whole process is shown in the figure below.

Insert image description here
5. This is how Check, Notification rule and Notification endpoint work together.
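This also explains why a rule needs a schedule: each run roughly looks at the statuses written to _monitoring since the previous run. The Python sketch below imitates one run over an in-memory list of status rows; the row layout and function name are simplifications for illustration, not the real _monitoring schema.

```python
def rule_run(statuses: list[dict], now: float, every: float,
             check_name: str, level: str) -> list[dict]:
    """Return the statuses one rule run would forward to its endpoint.

    Only rows written in the last `every` seconds are considered, then they
    are filtered by check name and level, like the rule's conditions.
    """
    return [
        s for s in statuses
        if now - every < s["_time"] <= now
        and s["_check_name"] == check_name
        and s["_level"] == level
    ]


statuses = [
    {"_time": 100.0, "_check_name": "CO_Alert", "_level": "warn"},
    {"_time": 112.0, "_check_name": "CO_Alert", "_level": "crit"},
    {"_time": 90.0, "_check_name": "CO_Alert", "_level": "crit"},  # handled by an earlier run
]
print(rule_run(statuses, now=115.0, every=15.0, check_name="CO_Alert", level="crit"))
```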

19.4 Example: Integrating Ruixiang Cloud (a SaaS alarm solution)

19.4.1 What is Ruixiang Cloud

1. Ruixiang Cloud is an alarm platform that offers a variety of alerting methods. You top up your account, choose an alerting method (for example, phone calls), configure it by following the instructions, and then you get an API endpoint. From then on, whenever your system needs to raise an alarm, your code only needs to send an HTTP request to this API, and Ruixiang Cloud will call the phone number you configured, with a voice message reminding the programmer that it is time to work overtime.

19.4.2 Register Ruixiang Cloud

1. Official website address: https://www.aiops.com/
2. Registration process (omitted)

19.4.3 Create your own alarm API

19.4.4 Create alarm API on Ruixiang Cloud

1. First, enter the homepage of Ruixiang Cloud and click the Intelligent Alarm Platform button on the left to enter the work page of the Intelligent Alarm Platform.
Insert image description here
2. As shown in the figure below, click the integration button on the upper tab to enter the integration configuration page

Insert image description here
3. On the left is a list of monitoring tools. Ruixiang Cloud can integrate with many of them, but InfluxDB is not in the list. In that case there is a universal integration option: REST API. The REST API provides an external URL; as long as your monitoring tool can send POST requests to Ruixiang Cloud in the data format the API requires, it can integrate with Ruixiang Cloud.

Insert image description here
4. Choosing REST API opens a configuration page. First set an application name, then click the blue button below to save and obtain the application key.

Insert image description here

5. A line of red text now appears on the page: this is the AppKey. Be careful not to leak it.

Insert image description here
6. At this point, our alarm API has been configured.

19.4.5 Creating a Dispatch Policy

1. External systems can now send alarm information to Ruixiang Cloud through this interface, but how does Ruixiang Cloud pass the alarm on to specific people? Forwarding alarm information to specific people is called dispatch.

2. Return to the home page of the alarm platform, click the Configuration button at the top, and then click the New Dispatch button on the right below.

Insert image description here

3. At this time, a new configuration page will be entered. You can operate according to the instructions in the figure below. Note that if nothing is displayed when setting the assignee here, it means that your account has not yet been bound to an email address. At this time, please bind the email address yourself before proceeding with the subsequent operations. After configuring, click Save.

Insert image description here
4. Now, once our TICK_TEST API receives the alarm information, it will notify the specific person.

19.4.6 Attempting to connect InfluxDB to Ruixiang Cloud

1. Now we can try to connect to Ruixiang Cloud. It seems we only need to make the notification data sent by InfluxDB conform to the Ruixiang Cloud API's requirements. Look back at the documentation of the API we just created in Ruixiang Cloud (Integration -> application list on the right -> find your REST API application -> click Edit -> the documentation is at the bottom of the page), shown in the figure below; it explains what data format we should send.

Insert image description here

19.4.7 Shortcomings of alarm terminals

1. Now go back to the Alerts page of the Web UI and open the edit page of http_endpoint. You will find there is nowhere to change the format of the data it sends: InfluxDB's alarm terminal cannot customize the outgoing data format. So at this point, our efforts seem wasted. But there is another way. In the next section, we will work directly with the underlying mechanism of checks and alarms.

Insert image description here

19.5 Example: Notebooks and the underlying alarm mechanism

1. Earlier we mentioned that alarm tasks can also be created with a Notebook, but we have not used Notebooks since. In this section, we use a Notebook to get at the underlying mechanism of alarms.

19.5.1 Use Notebook to create alarm tasks

1. First, click the Notebooks button on the left to come to the Notebooks configuration page. Then, click the Set an Alert template to create a new notebook.

Insert image description here
2. In the Notebook, the first cell is a query builder. Here we set the bucket to example_alert, _measurement to co and _field to value. The result is shown below:

Insert image description here
3. Click the RUN button at the top to see the result. Two cells below show it: one displays the queried data as is, and the other draws it as a line chart.

Insert image description here
4. Finally, below them is a cell whose name, New Alert, appears in its upper left corner. This cell is used to configure alarms.

Insert image description here
5. At the top are two blocks: one sets the alarm condition, the other sets the scheduling interval. Here we again set the alarm threshold to 0.04, as required. You may notice that only one alarm threshold can be set here; there is no choice among crit, warn, info and ok. We come back to this issue later. The result is shown in the figure below.

Insert image description here
6. Now look at the bottom of this cell, which configures the alarm terminal. Again we choose the HTTP terminal and set the target URL to http://host1:8080.

Insert image description here
7. After the above operations are completed, click the EXPORT ALERT TASK button on the lower right.

Insert image description here

8. Pleasantly, the Notebook generates a long Flux script for us directly, as shown below. It is recommended that you copy the script and paste it into Data Explorer; we will study it next.

Insert image description here

19.5.2 Script Interpretation

1. The script is as follows:

import "strings"
import "regexp"
import "influxdata/influxdb/monitor"
import "influxdata/influxdb/schema"
import "influxdata/influxdb/secrets"
import "experimental"
import "http"
import "json"

option task = {name: "Notebook Task for local_8dc08939-f5df-447e-8e53-41532537902f", every: 10m, offset: 0s}

option v = {timeRangeStart: -24h, timeRangeStop: now()}

check = {
    _check_id: "local_8dc08939-f5df-447e-8e53-41532537902f",
    _check_name: "Notebook Generated Check",
    _type: "custom",
    tags: {}
}

notification = {
    _notification_rule_id: "local_8dc08939-f5df-447e-8e53-41532537902f",
    _notification_rule_name: "Notebook Generated Rule",
    _notification_endpoint_id: "local_8dc08939-f5df-447e-8e53-41532537902f",
    _notification_endpoint_name: "Notebook Generated Endpoint"
}

task_data =
    from(bucket: "example_alert")
        |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
        |> filter(fn: (r) => r["_measurement"] == "co")
        |> filter(fn: (r) => r["_field"] == "value")

// crit predicate: CO concentration above the 0.04 threshold set in the notebook
trigger = (r) => r["value"] > 0.04

messageFn = (r) =>
    "${strings.title(v: r._type)} for ${r._source_measurement} triggered at ${time(v: r._source_timestamp)}!"

// fieldsAsCols pivots the value field into a column so trigger can read r["value"]
task_data
    |> schema["fieldsAsCols"]()
    |> set(key: "_notebook_link", value: "http://host1:8086/orgs/d2377c7832daa87c/notebooks/0a0bc4b03a6ba000")
    |> monitor["check"](data: check, messageFn: messageFn, crit: trigger)
    |> monitor["notify"](
        data: notification,
        endpoint: http["endpoint"](url: "http://host1:8080")(
            mapFn: (r) => {
                body = {r with _version: 1}

                return {headers: {"Content-Type": "application/json"}, data: json["encode"](v: body)}
            },
        ),
    )

Next, we will explain this code to you from front to back.

19.5.2.1 Imports

1. We will skip the import code at the top and won’t explain it further.

19.5.2.2 option task

1. option task holds the settings of the scheduled task: this line sets the task's name, its run interval (every) and its offset. In other words, the alarm script the notebook generates is essentially an ordinary InfluxDB scheduled task.

Insert image description here

19.5.2.3 option v

1. The first line of code, option v, declares a record type variable, which contains two key-value pairs, which actually represent the beginning and end of the query time range. -24h is displayed here, actually because when we operate directly in the notebook, the time range in the upper right corner is set to -24h. We will change it to -15s later.

Insert image description here

19.5.2.4 check and notification variables

1. These two statements each declare a record, which is passed as a parameter to the monitor functions later on. Because the _monitoring bucket requires fields such as _check_id, _check_name and _type, the notebook fills them in for us automatically when it generates the script.

Insert image description here

19.5.2.5 Query data

Insert image description here

1. The code in the picture above completes the query for the example_alert bucket. And the queried table stream is assigned to a variable named task_data.

19.5.2.6 Declaring a threshold function

1. Here, there is a function named trigger. It can be seen that its main logic is a predicate expression. The reason for declaring a function here is that there is a monitor function later that needs to pass in a predicate function. In addition, you can see that the logic of this function is used to determine whether the carbon monoxide concentration exceeds 0.04.

Insert image description here

19.5.2.7 Message Templates

1. This is also a function, but it directly returns a string, and the content inside is actually a message template.
Insert image description here

19.5.2.8 Alarm logic

1. The next large section is the logic of alarming.

Insert image description here

  • First, the schema["fieldsAsCols"] function reshapes the data: it pivots each field into its own column, so the value field becomes a column named value. The result is shown below:
    Insert image description here
  • The set function adds a constant field to the table stream
  • monitor["check"] function plays the role of checking the status
    Insert image description here
2. Note that the check record passed to the data parameter holds values such as _check_id and _check_name. messageFn is the message template. crit corresponds to one of the alarm levels of the CHECK we built earlier, but here it becomes a parameter of the function; the function passed to it is trigger, the predicate function described above.

Insert image description here
3. Also note that although the notebook-generated script passes only the crit parameter, the monitor["check"] function accepts other parameters as well, as shown below:

Insert image description here
4. You can see that there are also info, ok, and warn parameters. So in fact, we can still manually modify the script to fill in these value ranges.
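How the four predicates combine can be sketched in Python. The sketch assumes that a row gets the most severe level whose predicate matches (crit, then warn, then info, then ok) and that, by default, crit, warn and info never match while ok always does; that is a reading of monitor.check, so verify it against the Flux documentation before relying on the order. The thresholds mirror the requirements in 19.3.1.

```python
from typing import Callable

Pred = Callable[[dict], bool]


def check_level(r: dict,
                crit: Pred = lambda r: False,
                warn: Pred = lambda r: False,
                info: Pred = lambda r: False,
                ok: Pred = lambda r: True) -> str:
    """Return the first matching level in severity order, else "unknown"."""
    for level, pred in (("crit", crit), ("warn", warn), ("info", info), ("ok", ok)):
        if pred(r):
            return level
    return "unknown"


levels = dict(
    crit=lambda r: r["value"] > 0.04,
    warn=lambda r: 0.01 <= r["value"] <= 0.04,
    ok=lambda r: r["value"] < 0.01,
)
print([check_level({"value": v}, **levels) for v in (0.05, 0.025, 0.001)])
# ['crit', 'warn', 'ok']
```

In the Flux script, the same idea means adding warn: and ok: arguments next to crit: trigger in the monitor["check"] call.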

5. The monitor["notify"] function is used to send data to the outside. You can see that an http terminal is declared in it. Finally, it is important to note that there is a local variable called body.

Insert image description here
6. This is the request body of the POST request, and r is a row of our table stream. This is the crux of why we could not connect to Ruixiang Cloud: we only need to change body into the format the Ruixiang Cloud API requires.

7. Why not simply use if/else logic to do the greater-than/less-than check and send the request directly, instead of using the two dedicated monitor functions? Mainly because the monitor functions leave traces in Alert History: monitor["check"] and monitor["notify"] write check and notification records to the _monitoring bucket, which is very important.

19.5.3 Modify the script to integrate Ruixiang Cloud

1. Finally, we modify the local variable body so that it meets the format required by Ruixiang Cloud API.

Insert image description here
2. The modified code is as shown in the picture above.
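The change amounts to replacing {r with _version: 1} with a record shaped the way the REST API expects. The exact field names come from the API document shown in 19.4.6, which is not reproduced here, so every key in the Python sketch below (app, eventType, alarmName, alarmContent) is a hypothetical placeholder; the sketch only shows the shape of what mapFn does: one row r in, headers plus a JSON body out.

```python
import json

APP_KEY = "YOUR_APP_KEY"  # hypothetical placeholder for the AppKey from 19.4.4


def map_fn(r: dict) -> dict:
    """Python analogue of mapFn: turn one status row into headers + JSON data.

    All body keys are hypothetical; take the real ones from the API document.
    """
    body = {
        "app": APP_KEY,
        "eventType": "trigger",
        "alarmName": r["_check_name"],
        "alarmContent": r["_message"],
    }
    return {"headers": {"Content-Type": "application/json"},
            "data": json.dumps(body).encode("utf-8")}


out = map_fn({"_check_name": "Notebook Generated Check", "_message": "Custom for co triggered"})
print(json.loads(out["data"])["alarmName"])  # Notebook Generated Check
```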

19.5.4 Create a scheduled task

1. Click the Tasks button on the left, and then click the CREATE TASK button on the upper right.

Insert image description here
2. Paste the script we just modified into the editing area, write the information in the option task line of code into the setting form on the left, and delete the original option task code.

Insert image description here
3. Finally, click the Save button in the upper right corner to create this scheduled task.

19.5.5 Test the alarm effect

1. Now, we upload a piece of data with a value greater than 0.04 to test the docking effect.

Insert image description here
2. After waiting for a while, you can see that we received a phone call, which told us that the carbon monoxide concentration value exceeded 0.04.

Insert image description here

19.6 Example: Improving an Alarm System

19.6.1 Current Alarm Architecture

1. You can think of Ruixiang Cloud as a highly available alarm service, one that is reachable 24 hours a day without fail. Combined with Ruixiang Cloud, InfluxDB runs a scheduled task that checks whether the data is reasonable, pulling the latest data every so often and computing over it. If the data is out of range, an alarm signal is sent to Ruixiang Cloud, which then notifies the responsible technical staff.

Insert image description here

19.6.2 A more trustworthy architecture

1. The architecture in the previous section has a problem. Suppose InfluxDB crashes unexpectedly one night. Once InfluxDB is down, it naturally cannot send any alarm to Ruixiang Cloud, so you sleep soundly through the night. But was it really a quiet night?

2. It would therefore be much better if Ruixiang Cloud knew whether InfluxDB is still alive. Ideally, Ruixiang Cloud would check at regular intervals that InfluxDB is still up and usable. This is called a business availability check. In the picture, the orange arrow represents Ruixiang Cloud checking InfluxDB.

19.6.3 Architecture used in the following example

1. Alarming through Ruixiang Cloud really means buying a software service from Ruixiang Cloud. The software runs on Ruixiang Cloud's servers, not on your own company's servers; this model is called SaaS (software as a service). So if we want Ruixiang Cloud to reach our InfluxDB service in the opposite direction, InfluxDB must be exposed to the public network. There are two options: InfluxDB sits on a public network itself, or it is exposed through intranet penetration (a tunnel that forwards public traffic to a host inside a private network). The demonstration environment here is on an intranet, so we need to set up intranet penetration. The final overall architecture is as follows.

2. With this setup, an alarm is triggered whether the intranet penetration tunnel fails or InfluxDB itself fails.

19.6.4 Set up intranet penetration

1. This tutorial uses the intranet penetration tool from Peanut Shell (Oray). A new Peanut Shell account comes with a free intranet penetration quota and 1 Mbps of free bandwidth.

19.6.4.1 Install the Peanut Shell intranet penetration client

1. Visit the official website download page: https://hsk.oray.com/download

2. Choose the installation package that matches your system. This demonstration uses CentOS, so here I choose CentOS Linux (x86_64).

3. Install the rpm package with the following command:

sudo rpm -ivh ./phddns_5.2.0_amd64.rpm

4. After installation, a service named Phtunnel starts automatically, and you get a command-line tool called phddns for controlling it. All relevant information is shown in the message printed at the end of the installation.

19.6.4.2 Activate SN

1. Normally, phddns starts running automatically after installation. You can check its running status with the phddns status command.

phddns status

2. If ONLINE is displayed, the client is running normally. Note the SN code shown here, which is the device's identification code. The output also shows a remote management address, http://b.oray.com. Open this address in your browser to reach a login page.

3. Switch to SN login, which asks for the device SN code. The installation message also told us that the initial password is admin. Enter the SN code and the password, then click Log In.

4. Here you need to register an Oray (贝锐) account and use it to activate the device.

19.6.4.3 Configuring intranet penetration

1. After activation succeeds and the management page appears, click the intranet penetration button in the left toolbar to open the intranet penetration management panel, then click Add Mapping.

2. Fill in the form in the order shown in the figure. Note that the intranet host is the host where you just installed the intranet penetration client. The trial version is limited to a maximum bandwidth of 1 Mbps, which works out to an upload and download speed of about 128 KB/s. After clicking OK, you return to the intranet penetration management page.

3. If you can see the card shown in the picture below, intranet penetration has been configured successfully.

4. From now on, visiting https://1674b87n99.oicp.vip/ is equivalent to accessing the local address 127.0.0.1:8086.
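The 128 KB/s figure comes from dividing the bandwidth by 8 bits per byte. It also assumes the binary convention of 1 Mbit = 1024 Kbit. With SI units (1000) you get 125 kB/s instead. A quick Python check follows; the helper name is my own.

```python
BITS_PER_BYTE = 8


def mbps_to_kbytes_per_s(mbps: float, kilo: int = 1024) -> float:
    """Convert a link speed in Mbit/s to kilobytes per second.

    kilo=1024 reproduces the 128 KB/s quoted for the 1 Mbps trial plan;
    kilo=1000 gives the SI value.
    """
    return mbps * kilo / BITS_PER_BYTE


print(mbps_to_kbytes_per_s(1))        # 128.0
print(mbps_to_kbytes_per_s(1, 1000))  # 125.0
```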

19.6.5 Configuring business availability detection

19.6.5.1 Create monitoring tasks

1. First, go back to the Ruixiang Cloud homepage and click the business availability monitoring platform marked on the left of the picture.

2. On the monitoring task homepage, click the green button marked in the picture (Create Monitoring).

3. First complete the monitoring settings. Set the monitoring address to the intranet penetration address we just configured, use the path /health, and send a GET request to it. Normally this returns JSON data telling us whether InfluxDB is currently healthy, and a successful request returns status code 200. A GET request to this endpoint does not require a token.

4. Configure the response settings as shown in the figure. If the endpoint responds within 2 seconds, the speed is satisfactory. Between 2 and 5 seconds, it is slow. More than 5 seconds is very slow.

5. Click Result Verification in the upper right corner and set the response code to 200. This means a 200 response is the normal state we expect.

6. Set the monitoring frequency to 15 minutes. The free version can probe at most once every 15 minutes; a higher frequency is available after topping up your account.

7. The carrier and monitoring region settings determine which province and which carrier's network the probe uses to access your endpoint. This matters because an endpoint may be reachable from China Mobile's network but not from China Unicom's. Here we choose only one probe host, as shown below.

8. After completing the steps above, click the Save button in the upper right corner.
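Ruixiang Cloud runs this probe from its own servers. As a rough local illustration, the Python sketch below applies the same rules configured above: a token-free GET to `/health`, 200 as the expected status, and the 2 s / 5 s speed bands. It assumes InfluxDB 2.x's `/health` returns a JSON object whose `"status"` field is `"pass"` when healthy; verify this against your version. The tutorial itself only verifies the status code, so checking the JSON field is an extra, optional condition. The function names and band labels are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple

FAST_S = 2.0  # responses within 2 s count as satisfactory
SLOW_S = 5.0  # 2-5 s counts as slow; anything longer is very slow


def classify_latency(seconds: float) -> str:
    """Map a response time to the three speed bands configured above."""
    if seconds <= FAST_S:
        return "satisfactory"
    if seconds <= SLOW_S:
        return "slow"
    return "very slow"


def is_healthy(status_code: Optional[int], body: bytes) -> bool:
    """Healthy only if the response is HTTP 200 and the JSON reports status "pass"."""
    if status_code != 200:
        return False
    try:
        return json.loads(body).get("status") == "pass"
    except (ValueError, AttributeError):
        # Empty or non-JSON body, or JSON that is not an object
        return False


def probe(base_url: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """GET <base_url>/health (no token) and return (healthy, speed band)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as resp:
            code, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        code, body = err.code, b""
    except OSError:
        # URLError, DNS failures and timeouts all land here: whether the tunnel
        # or InfluxDB is down, the probe simply sees a failed request.
        code, body = None, b""
    return is_healthy(code, body), classify_latency(time.monotonic() - start)


print(is_healthy(200, b'{"name": "influxdb", "status": "pass"}'))  # True
print(is_healthy(503, b""))                                        # False
print(classify_latency(3.2))                                       # slow
```

To try it against your own tunnel, call something like `probe("https://<your-tunnel-domain>")`. The `OSError` branch shows why this design catches both failure modes from 19.6.3. From the outside, a broken tunnel and a crashed InfluxDB both look like a failed request.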

19.6.5.2 Configure alarm rules

1. Back on the monitoring list, you can see that the page now contains one monitoring item.

2. Now click the Alarm button on the left to configure the alarm channel.

3. Here we face three concepts: alarm rules, alarm policies, and alarm behaviors. They map onto InfluxDB's components as follows:
   - Alarm rules correspond to checks. They turn data into three signals: warning, critical, and normal.
   - Alarm behaviors correspond to notification endpoints. You can choose to send an email or make a phone call.
   - Alarm policies connect alarm rules to alarm behaviors, just like notification rules in InfluxDB.

4. Let's configure the alarm rules first. Click the alarm rules button on the left, and then click the + sign.

5. Name the rule and choose API monitoring as the rule type.

6. Click the Alarm Object tab at the top. The API monitoring tasks you have created are listed on the left. Select InfluxDB health status on the left, click the >> button in the middle to add it to the selected list, and then click Next.

7. Next, set the critical condition, which is equivalent to the CRIT threshold in InfluxDB. Here we treat it as a critical error if the availability rate over the past 15 minutes is below 100%. As shown in the picture, click Next in the lower right corner.

8. This step sets the warning condition, equivalent to WARN in InfluxDB. Click the blue button to copy the conditions from the critical condition, and then click Save in the lower right corner.

9. If everything went well, a new entry appears in the alarm rule list, as shown in the figure below.
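The "not 100% over the past 15 minutes" rule works together with the 15-minute probe interval from 19.6.5.1. A 15-minute window therefore usually holds a single probe result. Availability in that window is then either 100% or 0%, so one failed probe is enough to trigger the critical condition. Here is a Python sketch of that arithmetic. It models only the critical rule, and the names are hypothetical.

```python
from typing import List

WINDOW_MIN = 15    # the rule looks at the past 15 minutes
INTERVAL_MIN = 15  # free-plan probe interval from 19.6.5.1


def availability(results: List[bool]) -> float:
    """Percentage of successful probes in one window."""
    if not results:
        raise ValueError("no probe results in this window")
    return 100.0 * sum(results) / len(results)


def severity(results: List[bool]) -> str:
    """Critical rule from this section: anything below 100% is critical."""
    return "critical" if availability(results) < 100.0 else "normal"


print(WINDOW_MIN // INTERVAL_MIN)               # 1 probe per window, typically
print(severity([True]))                         # normal
print(severity([False]))                        # critical
print(availability([True, True, False, True]))  # 75.0
```

With a paid plan and a shorter interval, the same 100% rule becomes stricter rather than looser. Any single failure inside the window still fires it.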

19.6.5.3 Configuring alarm behavior

1. Click the alarm behavior button on the left, and then click the + sign to create a new alarm behavior.

2. In the pop-up window, first select the behavior type; here we choose Webhook. A Webhook requires a URL because it sends an outbound request to that URL. The URL should therefore point to Ruixiang Cloud's alarm platform, which will process the alert.

3. Go back to the integrations page of the intelligent alarm platform, find Ruixiang Cloud among the integration tools, and click Create.

4. Give it a name, then click Save below to obtain the application key.

5. The URL in the configuration instructions is now filled in automatically, ending with a random string that is the application key. Copy this URL.

6. Return to the window for creating the alarm behavior, paste in the URL, and click Test. If the test result shows connect success, the configuration is correct. Click Save in the lower right corner.

19.6.5.4 Modify the dispatch policy

1. We now have two applications in the alarm platform. The dispatch policy we created earlier forwards all alarm notifications received by the REST API application to a specific user. However, the Webhook application we just created is not yet part of this policy. Find that dispatch policy and click the edit button on the right.

2. On the editing page, click the Add button indicated in the picture.

3. Both applications are now listed. Alarm notifications received through either of them will be forwarded to the user we specified. Click Save to finish modifying the dispatch policy.
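The edit is needed because a dispatch policy only forwards alerts from the applications it lists. Alerts arriving through any other application reach nobody. The toy Python model below shows that behavior. It is not Ruixiang Cloud's implementation, and the application and user names are hypothetical.

```python
from typing import Dict, List


def route(app: str, policy: Dict[str, List[str]]) -> List[str]:
    """Return the recipients for an alert arriving through `app`.

    An application missing from the policy maps to no recipients,
    so its alerts reach nobody.
    """
    return list(policy.get(app, []))


# Hypothetical names; before the edit only the REST API app was in the policy.
policy = {"REST API": ["on-call engineer"]}
print(route("Webhook", policy))  # []

policy["Webhook"] = ["on-call engineer"]
print(route("Webhook", policy))  # ['on-call engineer']
```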

19.6.5.5 Create alarm policy

1. Return to the alarm management page of the monitoring platform. Click the alarm policy button on the left, and then click the + sign to create an alarm policy.

2. Configure the trigger policy as shown in the figure below.

3. Click the trigger behavior settings tab at the top, and then click the Add button on the right.

4. A selectable entry appears; this is the alarm behavior we created earlier. Click to select it, and then click the Select button in the lower right corner.

5. Once the alarm behavior has been added, click the Save button in the lower right corner. The alarm policy is now set up.


Origin blog.csdn.net/qq_38263083/article/details/131938475