The big pitfalls of Consul service registration and service discovery

I recently used Consul as the foundation for service registration and service discovery in a project. I ran into several pitfalls while building the cluster, and I record them one by one below.

Multi-node Consul clusters

A node in a Consul cluster is what we call a Consul instance (agent), and a cluster consists of multiple nodes. For the cluster to stay available, more than half of the nodes need to run in server mode: for example, at least 3 of 5 nodes, or 2 of 3 nodes. At this point you probably think nothing can go wrong, but Consul has many pitfalls. Suppose your cluster consists of the following members:

Node          Address              Status  Type    Build  Protocol  DC                    Segment
BJ-MQTEST-01  10.163.145.117:8301  alive   server  1.0.6  2         iget-topology-aliyun  <all>
BJ-MQTEST-02  10.163.147.47:8301   alive   server  1.0.6  2         iget-topology-aliyun  <all>
BJ-TGO-01     10.163.145.110:8301  alive   client  1.0.6  2         iget-topology-aliyun  <default>

A client can then use any of the 3 IPs above to connect to the Consul cluster. Suppose client A registers its service through 10.163.145.117 and, after a restart, registers the same service information through 10.163.145.110. You will then be surprised to see in the UI two identical service IDs under the same service name.

This is the multi-node pitfall of a Consul cluster. Although services are stored in a KV store underneath, the storage key is not determined by the service ID alone, so the same service ID can appear more than once in the cluster.
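To make the duplication concrete, here is a toy model (not Consul's actual storage code). It assumes an instance is keyed by the node it was registered through plus its service ID; the types `instanceKey` and `catalog` and the service name `agent` are made up for this sketch.

```go
package main

import "fmt"

// instanceKey is a hypothetical key for this sketch: an instance is identified
// by the node it was registered through plus its service ID, not by the ID alone.
type instanceKey struct {
	Node      string
	ServiceID string
}

// catalog is a toy stand-in for the cluster-wide service catalog.
// The map value is the service name.
type catalog map[instanceKey]string

// register records a service instance as registered through the given node.
func (c catalog) register(node, name, id string) {
	c[instanceKey{Node: node, ServiceID: id}] = name
}

// countID reports how many instances in the catalog carry the given service ID.
func (c catalog) countID(id string) int {
	n := 0
	for k := range c {
		if k.ServiceID == id {
			n++
		}
	}
	return n
}

func main() {
	c := catalog{}
	// First start: client A registers through BJ-MQTEST-01 (10.163.145.117).
	c.register("BJ-MQTEST-01", "agent", "agent_xxxx_v1")
	// After a restart it registers the same ID through BJ-TGO-01 (10.163.145.110).
	c.register("BJ-TGO-01", "agent", "agent_xxxx_v1")
	fmt.Println(c.countID("agent_xxxx_v1")) // prints 2
}
```

Registering again through the same node overwrites the existing entry, which is why the duplicate only shows up when the client switches nodes.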

Solution one

Only one node in the cluster runs in server mode, and the others run in client mode. The drawback is obvious: if the server node goes down, the cluster is no longer available.

Solution two

Each client always uses the same node address, which ensures that no two identical service IDs exist under the same service name. The drawback is that if the node the client is bound to goes down, the client can no longer use Consul. The code is as follows:

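package registry

import (
	"encoding/binary"
	"math"
	"net"
	"sort"

	log "github.com/golang/glog"
)

// ConsulBind pairs a Consul address with its numeric distance from the local IP.
type ConsulBind struct {
	Addr  string
	IpInt float64 // |address IP - local IP|, both taken as integers
}

// ConsulBindList implements sort.Interface, ordering by ascending IpInt.
type ConsulBindList []ConsulBind

func (s ConsulBindList) Len() int {
	return len(s)
}
func (s ConsulBindList) Swap(i, j int) {
	s[i], s[j] = s[j], s[i]
}
func (s ConsulBindList) Less(i, j int) bool {
	return s[i].IpInt < s[j].IpInt
}

// ToStrings returns the addresses in their current order.
func (s ConsulBindList) ToStrings() []string {
	ret := make([]string, 0, len(s))
	for _, cbl := range s {
		ret = append(ret, cbl.Addr)
	}
	return ret
}

// inetAton converts an IPv4 address to its integer value and returns 0 for
// anything else. It stands in for util.InetAton from the original project,
// whose source is not shown here.
func inetAton(ip net.IP) int64 {
	ip4 := ip.To4()
	if ip4 == nil {
		return 0
	}
	return int64(binary.BigEndian.Uint32(ip4))
}

// BingConsulSort orders consulAddrs ("ip:port") by numeric distance from the
// local IP, so that a given client always tries the same Consul node first.
// GetAgentLocalIP is defined elsewhere in this package and is not shown here.
func BingConsulSort(consulAddrs []string) []string {
	localIpStr, err := GetAgentLocalIP()
	if err != nil {
		return consulAddrs
	}
	localIpInt := int64(0)
	if localIp := net.ParseIP(localIpStr); localIp != nil {
		localIpInt = inetAton(localIp)
	}
	addrslist := make([]ConsulBind, 0, len(consulAddrs))
	for _, addr := range consulAddrs {
		host, _, err := net.SplitHostPort(addr)
		if err != nil {
			log.Warningf("skip consul addr %v: %v", addr, err)
			continue
		}
		ip := net.ParseIP(host)
		if ip == nil {
			log.Warningf("skip consul addr %v: host is not an IP", addr)
			continue
		}
		addrslist = append(addrslist, ConsulBind{
			Addr:  addr,
			IpInt: math.Abs(float64(inetAton(ip) - localIpInt)),
		})
	}
	consulBindList := ConsulBindList(addrslist)
	sort.Sort(consulBindList)
	log.Infof("sort addrs %v", consulBindList)
	return consulBindList.ToStrings()
}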
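As a worked example of the ordering above, the following self-contained sketch sorts the three cluster addresses by distance from a client IP. The client IP 10.163.145.100 is made up, port 8500 is assumed (Consul's default HTTP port; the member list only shows the 8301 gossip port), and `ipToInt` and `sortByDistance` are names invented for this sketch.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
	"sort"
)

// ipToInt converts an IPv4 address to its integer value (0 if not IPv4).
func ipToInt(ip net.IP) int64 {
	ip4 := ip.To4()
	if ip4 == nil {
		return 0
	}
	return int64(binary.BigEndian.Uint32(ip4))
}

// sortByDistance returns the "ip:port" addresses ordered by numeric distance
// from localIP. Addresses that cannot be parsed are dropped.
func sortByDistance(localIP string, addrs []string) []string {
	local := ipToInt(net.ParseIP(localIP))
	type candidate struct {
		addr string
		dist int64
	}
	cands := make([]candidate, 0, len(addrs))
	for _, addr := range addrs {
		host, _, err := net.SplitHostPort(addr)
		if err != nil {
			continue
		}
		ip := net.ParseIP(host)
		if ip == nil {
			continue
		}
		d := ipToInt(ip) - local
		if d < 0 {
			d = -d
		}
		cands = append(cands, candidate{addr: addr, dist: d})
	}
	// A stable sort keeps the input order for addresses at equal distance.
	sort.SliceStable(cands, func(i, j int) bool { return cands[i].dist < cands[j].dist })
	out := make([]string, 0, len(cands))
	for _, c := range cands {
		out = append(out, c.addr)
	}
	return out
}

func main() {
	addrs := []string{"10.163.145.117:8500", "10.163.147.47:8500", "10.163.145.110:8500"}
	fmt.Println(sortByDistance("10.163.145.100", addrs))
	// prints [10.163.145.110:8500 10.163.145.117:8500 10.163.147.47:8500]
}
```

The distances here are 10, 17 and 459, so this client would always try 10.163.145.110 first, which is what keeps its registrations on one node.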

Solution three

The client uses any address in the cluster at random, but before registering it first checks whether the service ID to be registered already exists under the service name; if so, it deletes the existing one and registers again. The drawback is that watches will see more events. This can be refined so that re-registration is skipped when the service ID already exists and is healthy. This is the approach I use.
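The check-then-register flow can be sketched as follows. This is only an illustration under assumptions: `Registry`, `Instance`, `ensureRegistered` and the in-memory `memRegistry` are hypothetical names for this sketch, not Consul APIs, and the service name `agent` is made up.

```go
package main

import "fmt"

// Instance is one registered instance of a service.
type Instance struct {
	Node    string // node whose agent holds the registration
	ID      string // service ID
	Healthy bool
}

// Registry is a hypothetical abstraction over the calls a client needs.
type Registry interface {
	Instances(name string) []Instance
	Deregister(node, id string)
	Register(node, name, id string)
}

// ensureRegistered registers id under name through node. With skipHealthy set,
// it does nothing when a healthy copy of id already exists. Otherwise it
// deletes every existing copy first. It reports whether it registered.
func ensureRegistered(r Registry, node, name, id string, skipHealthy bool) bool {
	var existing []Instance
	for _, in := range r.Instances(name) {
		if in.ID == id {
			existing = append(existing, in)
		}
	}
	if skipHealthy {
		for _, in := range existing {
			if in.Healthy {
				return false
			}
		}
	}
	for _, in := range existing {
		r.Deregister(in.Node, in.ID)
	}
	r.Register(node, name, id)
	return true
}

// memRegistry is an in-memory fake used only to exercise the logic.
type memRegistry struct {
	byName map[string][]Instance
}

func (m *memRegistry) Instances(name string) []Instance { return m.byName[name] }

func (m *memRegistry) Deregister(node, id string) {
	for name, list := range m.byName {
		var kept []Instance
		for _, in := range list {
			if in.Node != node || in.ID != id {
				kept = append(kept, in)
			}
		}
		m.byName[name] = kept
	}
}

func (m *memRegistry) Register(node, name, id string) {
	m.byName[name] = append(m.byName[name], Instance{Node: node, ID: id, Healthy: true})
}

func main() {
	r := &memRegistry{byName: map[string][]Instance{}}
	ensureRegistered(r, "BJ-MQTEST-01", "agent", "agent_xxxx_v1", false)
	// Restart: the client now happens to connect through another node.
	ensureRegistered(r, "BJ-TGO-01", "agent", "agent_xxxx_v1", false)
	fmt.Println(len(r.Instances("agent"))) // prints 1
}
```

Passing `skipHealthy` as true corresponds to the refinement described above: an existing healthy instance blocks the re-registration, which avoids the extra watch events.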

Deleting a service

At first, many people assume that when a service has a problem you simply take it offline. But deleting a service in Consul is not that simple! Please check the official documentation:
Catalog API documentation: Deregister Entity (/catalog/deregister)
Agent service API documentation: Deregister Service (/agent/service/deregister/:service_id)

It looks as if either one would delete the service correctly. But again, it is not that simple; Consul has many pitfalls.

If you choose the /agent/service/deregister/:service_id interface, you will find that you cannot delete services registered on other nodes. For example, the service ID agent_xxxx_v1 is registered on 10.163.145.117, but the client is connected to Consul through 10.163.145.110, so it cannot delete agent_xxxx_v1.
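In other words, the request has to be sent to the agent that holds the registration. A minimal sketch of building that request with the standard library is shown below. It assumes the agent's HTTP API listens on port 8500 (Consul's default; the member list above only shows the 8301 gossip port) and uses the /v1 prefix and PUT method of Consul's HTTP API; `newDeregisterRequest` is a name made up for this sketch.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newDeregisterRequest builds the HTTP request that asks the agent at
// agentAddr ("host:port") to drop one of its own services.
func newDeregisterRequest(agentAddr, serviceID string) (*http.Request, error) {
	u := "http://" + agentAddr + "/v1/agent/service/deregister/" + url.PathEscape(serviceID)
	return http.NewRequest(http.MethodPut, u, nil)
}

func main() {
	// The service lives on 10.163.145.117, so that agent must receive the call,
	// even if the client normally talks to 10.163.145.110.
	req, err := newDeregisterRequest("10.163.145.117:8500", "agent_xxxx_v1")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// prints PUT http://10.163.145.117:8500/v1/agent/service/deregister/agent_xxxx_v1
}
```

Sending it is then a matter of passing the request to an `http.Client`; the sketch stops at building the request so that it runs without a cluster.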

There is still one interface we have not tried, so let's look at /catalog/deregister. After calling it, I checked the UI, and agent_xxxx_v1 was indeed gone. But wait... after 30 seconds, agent_xxxx_v1 appeared again. What happened?

Please see the Consul issue "Unable to deregister a service" (#1188).
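A likely explanation is that the agent holding the registration periodically syncs its local services back into the catalog, so a deletion made only in the catalog gets undone. The toy model below illustrates that idea; it is my reading of the behavior, not Consul's implementation, and the `agent` and `catalog` types are invented for this sketch.

```go
package main

import "fmt"

// agent is a toy model of one Consul agent's local service state.
type agent struct {
	node     string
	services map[string]bool // service IDs registered with this agent
}

// catalog is a toy model of the cluster-wide catalog: node -> service IDs.
type catalog map[string]map[string]bool

// catalogDeregister removes the entry from the catalog only.
func (c catalog) catalogDeregister(node, id string) {
	delete(c[node], id)
}

// sync models the agent's periodic sync: push local services to the catalog.
func (a *agent) sync(c catalog) {
	if c[a.node] == nil {
		c[a.node] = map[string]bool{}
	}
	for id := range a.services {
		c[a.node][id] = true
	}
}

// agentDeregister removes the service from the agent's own state and from
// the catalog, so a later sync has nothing to restore.
func (a *agent) agentDeregister(c catalog, id string) {
	delete(a.services, id)
	delete(c[a.node], id)
}

func main() {
	c := catalog{}
	a := &agent{node: "BJ-MQTEST-01", services: map[string]bool{"agent_xxxx_v1": true}}
	a.sync(c)

	c.catalogDeregister(a.node, "agent_xxxx_v1")
	fmt.Println(c[a.node]["agent_xxxx_v1"]) // false: gone from the catalog
	a.sync(c)
	fmt.Println(c[a.node]["agent_xxxx_v1"]) // true: the agent put it back

	a.agentDeregister(c, "agent_xxxx_v1")
	a.sync(c)
	fmt.Println(c[a.node]["agent_xxxx_v1"]) // false: it stays deleted
}
```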

Solution

Step 1: Query all instances of the service name that the service ID belongs to;
Step 2: Traverse the list, get the address of each node that holds the service ID, and deregister the service ID on that node's agent.

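if len(c.Options.Addrs) > 0 {
	// Map node IP -> "ip:port" for every configured Consul address.
	addrMap := make(map[string]string, len(c.Options.Addrs))
	for _, host := range c.Options.Addrs {
		addr, _, err := net.SplitHostPort(host)
		if err != nil {
			log.Warningf("%v is err=%v", host, err)
			continue
		}
		addrMap[addr] = host
	}
	// Step 1: list every instance of the service, healthy or not.
	rsp, _, err := c.Client.Health().Service(s.Name, "", false, nil)
	if err != nil {
		log.Warningf("Health().Service name=%v is err=%v", s.Name, err)
	}
	// Step 2: deregister the ID on the agent of each node that holds it.
	for _, srsp := range rsp {
		if srsp.Service.ID != serviceId {
			continue
		}
		host, ok := addrMap[srsp.Node.Address]
		if !ok {
			continue
		}
		config := consul.DefaultNonPooledConfig()
		config.Address = host
		// Create a Consul connection to that node's agent.
		client, err := consul.NewClient(config)
		if err != nil {
			log.Warningf("NewClient host=%v is err=%v", host, err)
			continue
		}
		if err := client.Agent().ServiceDeregister(serviceId); err != nil {
			log.Warningf("ServiceDeregister host=%v , serviceId=%v is err=%v", host, serviceId, err)
			continue
		}
		log.Infof("ServiceDeregister host=%v , serviceId=%v", host, serviceId)
	}
} else {
	err = c.Client.Agent().ServiceDeregister(serviceId)
	log.Infof("ServiceDeregister  serviceId=%v", serviceId)
}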
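The selection logic in the snippet above (which agents must receive the deregister call) can be isolated and run without a cluster. The following is a sketch under assumptions: `Entry` and `agentsHolding` are names invented here, port 8500 is assumed, and agent_yyyy_v1 is a made-up second service ID.

```go
package main

import (
	"fmt"
	"net"
)

// Entry is the part of a health-query result this sketch needs.
type Entry struct {
	NodeAddress string // IP of the node whose agent holds the registration
	ServiceID   string
}

// agentsHolding returns the configured "host:port" addresses of the agents on
// which serviceID must be deregistered. Entries on nodes that are not among
// addrs are skipped.
func agentsHolding(entries []Entry, addrs []string, serviceID string) []string {
	byIP := make(map[string]string, len(addrs))
	for _, hostport := range addrs {
		ip, _, err := net.SplitHostPort(hostport)
		if err != nil {
			continue
		}
		byIP[ip] = hostport
	}
	var targets []string
	for _, e := range entries {
		if e.ServiceID != serviceID {
			continue
		}
		if hostport, ok := byIP[e.NodeAddress]; ok {
			targets = append(targets, hostport)
		}
	}
	return targets
}

func main() {
	addrs := []string{"10.163.145.117:8500", "10.163.147.47:8500", "10.163.145.110:8500"}
	entries := []Entry{
		{NodeAddress: "10.163.145.117", ServiceID: "agent_xxxx_v1"},
		{NodeAddress: "10.163.145.110", ServiceID: "agent_xxxx_v1"},
		{NodeAddress: "10.163.147.47", ServiceID: "agent_yyyy_v1"},
	}
	fmt.Println(agentsHolding(entries, addrs, "agent_xxxx_v1"))
	// prints [10.163.145.117:8500 10.163.145.110:8500]
}
```

Keeping this step separate from the network calls makes it easy to check that every duplicate of the service ID is covered before any agent is contacted.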

Consul certainly has other pitfalls, but these two left the deepest impression on me. I wrote them down as a reminder for anyone who is planning to use Consul or who has already run into them.

Origin http://43.154.161.224:23101/article/api/json?id=324387567&siteId=291194637