Consul: Service Health Checks

Service registration: a service process registers its location in the registry. It usually registers its host and port, and sometimes authentication details, protocol, version number, and operating environment.

Service discovery: a client application queries the registry to find where services are located. Its main job is to provide the list of available services.
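To make the two roles concrete, here is a toy in-memory registry in Python. It is not Consul's API; the class and method names (ToyRegistry, register, discover) are hypothetical and only model the register-then-query flow described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Instance:
    """One registered location of a service (hypothetical model)."""
    host: str
    port: int
    tags: list[str] = field(default_factory=list)


class ToyRegistry:
    """In-memory stand-in for a service registry; not Consul's API."""

    def __init__(self) -> None:
        self._services: dict[str, list[Instance]] = {}

    def register(self, name: str, host: str, port: int, tags: list[str] | None = None) -> None:
        # Registration: the service process records where it can be reached.
        self._services.setdefault(name, []).append(Instance(host, port, list(tags or [])))

    def discover(self, name: str, tag: str | None = None) -> list[tuple[str, int]]:
        # Discovery: a client asks for the locations of a named service,
        # optionally filtered by tag.
        return [
            (inst.host, inst.port)
            for inst in self._services.get(name, [])
            if tag is None or tag in inst.tags
        ]
```

For example, after `reg.register("jetty", "192.168.1.200", 8080, tags=["dev"])`, a call to `reg.discover("jetty")` returns `[("192.168.1.200", 8080)]`.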

A service definition looks like the following:

{  
  "service":{  
    "id": "jetty",  
    "name": "jetty",  
    "address": "192.168.1.200",  
    "port": 8080,  
    "tags": ["dev"],  
    "checks": [  
        {  
            "http": "http://192.168.1.200:8080/health",  
            "interval": "5s"  
        }  
    ]  
  }  
}  
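If you generate definitions from code rather than writing them by hand, a small Python helper can produce the same structure. The function name and parameters are hypothetical; the output mirrors the JSON above, which the agent loads from its configuration directory (the -config-dir option) and picks up again after `consul reload`.

```python
import json


def service_definition(service_id: str, name: str, address: str, port: int,
                       tags: list, health_url: str, interval: str = "5s") -> dict:
    """Build a service definition with one HTTP check, shaped like the JSON above."""
    return {
        "service": {
            "id": service_id,
            "name": name,
            "address": address,
            "port": port,
            "tags": list(tags),
            # One HTTP health check; more entries could be appended to this list.
            "checks": [{"http": health_url, "interval": interval}],
        }
    }


definition = service_definition(
    "jetty", "jetty", "192.168.1.200", 8080, ["dev"],
    "http://192.168.1.200:8080/health",
)
print(json.dumps(definition, indent=2))
```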

Here, checks lists the health checks for the service. A service can have several checks or none, and checks come in several types. Checks can be defined in a configuration file or added at runtime through the HTTP interface; checks created through the HTTP interface persist with that node.

There are five types of check: Script, HTTP, TCP, TTL, and Docker.

Each type requires certain fields. A Script check must provide script and interval; an HTTP check, http and interval; a TCP check, tcp and interval; a TTL check, ttl. The Docker check described below needs docker_container_id, script, and interval.

With a script check (and likewise an HTTP or TCP check), Consul actively probes the service's health; with a TTL check, the service reports its own health to Consul.

When checks are embedded in a service definition, they are identified automatically as service:<service-id>, or as service:<service-id>:<num> when the service has several checks.

The configuration for each type is shown below.
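The per-type requirements can be expressed as a small validator. This is a hypothetical Python sketch of the rules listed above, not code from Consul; the order in which the type is detected and the assumption that <num> counts from 1 are assumptions.

```python
# Fields each check type needs, per the rules above (hypothetical table).
REQUIRED_FIELDS = {
    "docker": ("docker_container_id", "script", "interval"),
    "script": ("script", "interval"),
    "http": ("http", "interval"),
    "tcp": ("tcp", "interval"),
    "ttl": ("ttl",),
}


def check_type(check: dict) -> str:
    """Guess the type from the fields present (docker first, since it also has script)."""
    if "docker_container_id" in check:
        return "docker"
    for kind in ("script", "http", "tcp", "ttl"):
        if kind in check:
            return kind
    raise ValueError("check must be of type script, http, tcp, ttl or docker")


def missing_fields(check: dict) -> list:
    """Return the required fields this check definition lacks."""
    return [f for f in REQUIRED_FIELDS[check_type(check)] if f not in check]


def generated_check_ids(service_id: str, count: int) -> list:
    """Automatic identifiers for checks embedded in a service definition.

    Assumption: a single check gets service:<id>; several get
    service:<id>:<num> with <num> counting from 1.
    """
    if count == 1:
        return [f"service:{service_id}"]
    return [f"service:{service_id}:{n}" for n in range(1, count + 1)]
```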

Script check (script + interval)

A script check runs an external application that performs the health check, exits with an appropriate exit code, and may produce some output. The script is invoked at a fixed interval (for example, every 30 seconds), similar to the Nagios plugin system. Output is limited to 4KB; anything larger is truncated. By default a script check times out after 30 seconds; this can be changed with the timeout field.

{  
  "check": {  
    "id": "mem-util",  
    "name": "Memory utilization",  
    "script": "/usr/local/bin/check_mem.py",  
    "interval": "10s",  
    "timeout": "1s"  
  }  
}  
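The agent-side behaviour described above (run the program, map its exit code, truncate the output, enforce a timeout) can be modelled in a few lines of Python. This is an illustrative sketch of those stated rules, not Consul's implementation; run_script_check and state_for_exit_code are hypothetical names.

```python
from __future__ import annotations

import subprocess
import sys

OUTPUT_LIMIT = 4 * 1024  # output beyond roughly 4KB is cut off


def state_for_exit_code(code: int) -> str:
    """Exit-code convention: 0 passing, 1 warning, anything else critical."""
    if code == 0:
        return "passing"
    if code == 1:
        return "warning"
    return "critical"


def run_script_check(argv: list[str], timeout: float = 30.0) -> tuple[str, str]:
    """Run one script check and return (state, truncated output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # A check that exceeds its timeout is treated as failing.
        return "critical", f"timed out after {timeout}s"
    output = (proc.stdout + proc.stderr)[:OUTPUT_LIMIT]
    return state_for_exit_code(proc.returncode), output.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Demo with a trivial "check" that prints and exits 0.
    print(run_script_check([sys.executable, "-c", "print('OK')"]))
```

The "timeout": "1s" in the JSON above corresponds to `run_script_check([...], timeout=1.0)` in this model.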

HTTP check (http + interval)

This check sends an HTTP GET request to the given URL at the configured interval. The service's state follows the HTTP response code: any 2xx code is passing, 429 (Too Many Requests) is a warning, and any other value is a failure.

This type of check should be preferred over a script that calls curl or another external program to test a simple HTTP endpoint. By default, the request timeout equals the check interval, up to a maximum of 10 seconds; a custom value can be set with the timeout field. Output is limited to 4KB; anything larger is truncated.

{  
  "check": {  
    "id": "api",  
    "name": "HTTP API on port 5000",  
    "http": "http://localhost:5000/health",  
    "interval": "10s",  
    "timeout": "1s"  
  }  
}  
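A minimal Python model of the HTTP check rules: the status-code mapping and the default timeout (the interval, capped at 10 seconds). It uses only the standard library's urllib; the function names are hypothetical, and the probe is a sketch rather than the agent's own HTTP client.

```python
from __future__ import annotations

import urllib.error
import urllib.request

MAX_DEFAULT_TIMEOUT = 10.0  # seconds


def state_for_status(code: int) -> str:
    """2xx -> passing, 429 Too Many Requests -> warning, anything else -> critical."""
    if 200 <= code < 300:
        return "passing"
    if code == 429:
        return "warning"
    return "critical"


def effective_timeout(interval: float, timeout: float | None = None) -> float:
    """An explicit timeout wins; otherwise use the interval, capped at 10 seconds."""
    if timeout is not None:
        return timeout
    return min(interval, MAX_DEFAULT_TIMEOUT)


def http_check(url: str, interval: float, timeout: float | None = None) -> str:
    """Perform one GET request and map the outcome to a check state."""
    try:
        with urllib.request.urlopen(url, timeout=effective_timeout(interval, timeout)) as resp:
            return state_for_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error responses; err.code is the status.
        return state_for_status(err.code)
    except OSError:
        # URLError (connection refused, DNS failure) and timeouts are OSError subclasses.
        return "critical"
```

The JSON example above would correspond to `http_check("http://localhost:5000/health", interval=10.0, timeout=1.0)`.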

TCP check (tcp + interval)

At the configured interval, this check opens a TCP connection to the given IP address or hostname and port. The service's state depends on whether the connection succeeds: if it does, the state is "passing"; otherwise it is "critical". If a hostname resolves to both an IPv4 and an IPv6 address, both are tried, and the first successful connection marks the check as passing.

This type of check should be preferred over an external script that uses netcat or a simple socket operation.

By default, the connection timeout equals the check interval, up to a maximum of 10 seconds; it can also be configured with the timeout field.

{  
  "check": {  
    "id": "ssh",  
    "name": "SSH TCP on port 22",  
    "tcp": "localhost:22",  
    "interval": "10s",  
    "timeout": "1s"  
  }  
}  
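The TCP rule (try every resolved address, succeed on the first connection that works) can be sketched with Python's socket module. The function name is hypothetical, and the caller is assumed to pass the timeout (the interval capped at 10 seconds, unless one is configured).

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 10.0) -> str:
    """Return "passing" if any resolved address accepts a connection, else "critical"."""
    try:
        addrinfos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "critical"  # the hostname did not resolve
    for family, socktype, proto, _canonname, sockaddr in addrinfos:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)
            except OSError:
                continue  # try the next address (for example IPv6 after IPv4)
            return "passing"  # first successful connection wins
    return "critical"
```

The SSH example above corresponds to `tcp_check("localhost", 22, timeout=1.0)`.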

TTL check (time to live)

A TTL check keeps its last reported state for the given TTL. The state must be updated periodically through the HTTP interface; if no update arrives within the TTL, the check is marked as failed.

This mechanism, similar in concept to a dead man's switch, relies on the service to report its own health periodically. For example, a healthy app can periodically PUT a status update to the HTTP endpoint; if the app runs into trouble, the TTL expires and the check enters the critical state. The endpoints for updating a check's health are the pass endpoint and the fail endpoint (see the agent HTTP endpoints).

TTL checks also persist their last known state to disk, which lets the agent restore it after a restart. The persisted state remains valid until the TTL, counted from the last update, runs out.

{  
  "check": {  
    "id": "web-app",  
    "name": "Web App Status",  
    "notes": "Web app does a curl internally every 10 seconds",  
    "ttl": "30s"  
  }  
}  
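The TTL semantics can be modelled as a small state holder with an injectable clock, plus a heartbeat call. TTLCheck and report_pass are hypothetical names; the heartbeat targets the agent's PUT /v1/agent/check/pass/<check_id> endpoint, and the agent address you pass in (for example the default http://127.0.0.1:8500) is an assumption about your setup.

```python
from __future__ import annotations

import time
import urllib.parse
import urllib.request
from typing import Callable


class TTLCheck:
    """Keeps the last reported state only for `ttl` seconds (hypothetical model)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self.state = "critical"      # checks start out critical by default
        self.notes = ""
        self.expires_at = clock()    # nothing reported yet

    def update(self, state: str, notes: str = "") -> None:
        """Record a report (like the pass/fail endpoints) and restart the TTL."""
        self.state = state
        self.notes = notes
        self.expires_at = self.clock() + self.ttl

    def current_state(self) -> str:
        """The reported state while the TTL runs, critical once it has expired."""
        if self.clock() >= self.expires_at:
            return "critical"
        return self.state


def report_pass(agent: str, check_id: str, note: str = "") -> None:
    """Heartbeat to a real agent: PUT /v1/agent/check/pass/<check_id>."""
    url = f"{agent}/v1/agent/check/pass/{urllib.parse.quote(check_id)}"
    if note:
        url += "?" + urllib.parse.urlencode({"note": note})
    request = urllib.request.Request(url, method="PUT")
    with urllib.request.urlopen(request, timeout=5):
        pass
```

For the web-app check above, the application would call `report_pass("http://127.0.0.1:8500", "web-app")` more often than every 30 seconds.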

Docker check (docker + interval)

This check runs an external application packaged inside a Docker container. The application is triggered inside the running container through the Docker Exec API.

The user running the Consul agent must be able to access the Docker HTTP API or the UNIX socket. Consul uses $DOCKER_HOST to determine the Docker API endpoint. The application should run, check the health of the service running in the container, and exit with the appropriate exit code. The check is invoked at the specified interval.

The shell used to run the check is configurable with the shell field, which makes it possible to run containers that use different shells on the same host.

Output is limited to 4KB; anything larger is truncated.

{
  "check": {
    "id": "mem-util",
    "name": "Memory utilization",
    "docker_container_id": "f972c95ebf0e",
    "shell": "/bin/bash",
    "script": "/usr/local/bin/check_mem.py",
    "interval": "10s"
  }
}
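Consul itself talks to the Docker Exec API, but the same check can be reproduced by hand with the docker CLI, which is useful when debugging. This sketch swaps in `docker exec` for the API; the function names are hypothetical, and running the script as `<shell> -c <script>` is an assumption about how the shell field is applied. The docker CLI also honours DOCKER_HOST, and unix:///var/run/docker.sock is Docker's default endpoint.

```python
from __future__ import annotations

import os
import subprocess
from collections.abc import Mapping

DEFAULT_DOCKER_HOST = "unix:///var/run/docker.sock"  # Docker's default endpoint


def docker_endpoint(env: Mapping[str, str] | None = None) -> str:
    """Resolve the Docker API endpoint: $DOCKER_HOST if set, else the default socket."""
    env = os.environ if env is None else env
    return env.get("DOCKER_HOST", DEFAULT_DOCKER_HOST)


def docker_exec_argv(container_id: str, shell: str, script: str) -> list[str]:
    """CLI equivalent of running `script` with `shell` inside the container."""
    return ["docker", "exec", container_id, shell, "-c", script]


def run_docker_check(container_id: str, shell: str, script: str, timeout: float = 30.0) -> int:
    """Run the check through the docker CLI and return its exit code (0/1/other)."""
    proc = subprocess.run(docker_exec_argv(container_id, shell, script),
                          capture_output=True, timeout=timeout)
    return proc.returncode
```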

Summary

Each check must include a name; id and notes are optional. If id is not provided, it defaults to the name. Check IDs must be unique within a node, so if two checks would share a name, set their IDs explicitly.

The notes field is mainly for human readability. For a script check, the notes field is filled in from the script's output. Similarly, an external process that updates a TTL check through the HTTP interface can set the notes field.
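The id rules above can be checked before a configuration is deployed. A hypothetical Python helper: it requires name, defaults id to name, and rejects duplicate IDs within one node's checks.

```python
def normalize_checks(checks: list) -> list:
    """Require name, default id to name, and reject duplicate IDs on one node."""
    seen = set()
    normalized = []
    for check in checks:
        if "name" not in check:
            raise ValueError(f"check without a name: {check!r}")
        item = dict(check)
        item.setdefault("id", item["name"])  # id falls back to the name
        if item["id"] in seen:
            raise ValueError(f"duplicate check id {item['id']!r}; set 'id' explicitly")
        seen.add(item["id"])
        normalized.append(item)
    return normalized
```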

Check scripts

Check scripts are free to do anything to determine the status of the check. The only restriction is that the exit code must follow this convention:

  1. Exit code 0 - passing
  2. Exit code 1 - warning
  3. Any other value - critical (failing)

Consul relies only on this convention. Any output of the script is stored in the notes field so that operators can read it.
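A check script only has to turn a measurement into one of these exit codes. The decision logic for a memory check like the one in the examples might look like this in Python; the thresholds (80% and 90%), the function names, and the Linux /proc/meminfo source are assumptions, and given the author's closing note about shell scripts you may need to port or wrap it.

```python
from __future__ import annotations


def evaluate(used_percent: float, warn_at: float = 80.0, crit_at: float = 90.0) -> tuple[int, str]:
    """Map a measurement to (exit code, message) using the 0/1/other convention."""
    if used_percent >= crit_at:
        return 2, f"CRITICAL: memory {used_percent:.1f}% used"
    if used_percent >= warn_at:
        return 1, f"WARNING: memory {used_percent:.1f}% used"
    return 0, f"OK: memory {used_percent:.1f}% used"


def memory_used_percent(meminfo_path: str = "/proc/meminfo") -> float:
    """Linux-specific: derive used memory from MemTotal and MemAvailable (kB)."""
    fields = {}
    with open(meminfo_path, encoding="ascii") as f:
        for line in f:
            key, value = line.split(":", 1)
            fields[key] = int(value.split()[0])
    return 100.0 * (1 - fields["MemAvailable"] / fields["MemTotal"])
```

As a script, the entry point would be `code, message = evaluate(memory_used_percent())`, then `print(message)` (which Consul captures as output) and `sys.exit(code)`.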

Initial health status

By default, when a check is registered with the Consul agent, its health status is immediately set to "critical". This prevents a service from being registered directly in the "passing" state and entering the service pool before it has been confirmed healthy. In some cases you may want to specify a check's initial state instead, which is done with the "status" field, for example:

{
  "check": {
    "id": "mem",
    "script": "/bin/check_mem",
    "interval": "10s",
    "status": "passing"
  }
}


The initial state is set to passing. 
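The effect of the default can be seen in a tiny model: a check that omits status starts as critical, so the instance stays out of results that include only passing instances until its checks report in. The function names are hypothetical, and "all checks passing" is an assumed discovery rule.

```python
def initial_status(check: dict) -> str:
    """Status right after registration: "critical" unless the definition sets "status"."""
    return check.get("status", "critical")


def discoverable_after_registration(checks: list) -> bool:
    """Assumed rule: an instance is returned only when all of its checks are passing."""
    return all(initial_status(c) == "passing" for c in checks)
```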

Service-bound checks

Health checks can optionally be bound to a specific service. The check's status then affects only that service rather than the entire node. A service-bound check is declared by adding a service_id field:

{
  "check": {
    "id": "web-app",
    "name": "Web App Status",
    "service_id": "web-app",
    "ttl": "30s"
  }
}

In the example above, if the web-app health check fails, only the web-app service is affected; the other services on this node are not.
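The scoping can be sketched as an aggregation: a check without service_id counts against every service on the node, while a service-bound check counts only against its own service. The function name and the "status" field holding each check's current state are hypothetical.

```python
def service_health(service_id: str, checks: list) -> str:
    """Worst status among node-level checks and checks bound to this service."""
    relevant = [c for c in checks if c.get("service_id") in (None, service_id)]
    statuses = {c["status"] for c in relevant}
    if "critical" in statuses:
        return "critical"
    if "warning" in statuses:
        return "warning"
    return "passing"
```

With a critical check bound to web-app and a passing node-level check, `service_health("web-app", checks)` is "critical" while `service_health("api", checks)` stays "passing".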

Multiple check definitions

To define several checks at once, use the "checks" field, for example:

{
  "checks": [
    {
      "id": "chk1",
      "name": "mem",
      "script": "/bin/check_mem",
      "interval": "5s"
    },
    {
      "id": "chk2",
      "name": "/health",
      "http": "http://localhost:5000/health",
      "interval": "15s"
    },
    {
      "id": "chk3",
      "name": "cpu",
      "script": "/bin/check_cpu",
      "interval": "10s"
    },
    ...
  ]
}

Note: in the author's testing, Python check scripts did not work; the check had to be a shell script.


Source: www.cnblogs.com/ExMan/p/11884747.html