Server-side high-availability solution

Preface

High availability mainly solves the following problems:

  • No single point of failure: when a service node goes down, callers automatically switch to the remaining service nodes.
  • New service nodes can join the service group without the client noticing.

Different service protocols call for different solutions.

grpc

  • Recommended index: five stars
  • Recommended reason: the mainstream etcd+grpc architecture has mature packages, plenty of production cases, and a large community to ask when something goes wrong.

For the grpc protocol, the mainstream approach is etcd-based service discovery. Its typical scenario is services within a group calling each other over the intranet. When a service node starts, it registers its host+port in etcd under a key prefixed with the service group name, such as /user/node/1, /user/node/2. The registration must hold a lease that is renewed roughly every 5 seconds.

When a service needs to be called, the caller fetches the available nodes from etcd by prefix match and then picks one of them with a load-balancing strategy.

etcd officially provides a high-availability best practice for the grpc protocol; here is the key code:

Caller

import (
	"go.etcd.io/etcd/clientv3"
	etcdnaming "go.etcd.io/etcd/clientv3/naming"

	"google.golang.org/grpc"
)

...

// Resolve "my-service" through etcd and round-robin across the registered
// endpoints, using etcd's clientv3/naming resolver (the API of etcd <= v3.4).
// Error handling is elided in this official excerpt.
cli, cerr := clientv3.NewFromURL("http://localhost:2379")
r := &etcdnaming.GRPCResolver{Client: cli}
b := grpc.RoundRobin(r)
conn, gerr := grpc.Dial("my-service", grpc.WithBalancer(b), grpc.WithBlock(), ...)

Service side: registration and lease renewal

// Usage: register this node's addr (y.y.y.y:port) under the "app_key" prefix
// against the etcd cluster at x.x.x.x:port, with a 5-second lease.
go etcd.Register("x.x.x.x:port", "app_key", "y.y.y.y:port", 5)
package etcd

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"strings"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.uber.org/zap"
)

var cli *clientv3.Client

// Register registers the service under the given name prefix in etcd.
// Multiple etcd addresses should be separated by ';'.
func Register(etcdAddr, name string, addr string, ttl int64) error {
	var err error

	if cli == nil {
		cli, err = clientv3.New(clientv3.Config{
			Endpoints:   strings.Split(etcdAddr, ";"),
			DialTimeout: 15 * time.Second,
			LogConfig: &zap.Config{
				Level:       zap.NewAtomicLevelAt(zap.ErrorLevel),
				Development: false,
				Sampling: &zap.SamplingConfig{
					Initial:    100,
					Thereafter: 100,
				},
				Encoding:      "json",
				EncoderConfig: zap.NewProductionEncoderConfig(),
				// Use "/dev/null" to discard all output
				OutputPaths:      []string{"stderr"},
				ErrorOutputPaths: []string{"stderr"},
			},
		})
		if err != nil {
			return err
		}
	}

	service := Service{
		Addr: addr,
	}
	bts, err := json.Marshal(service)
	if err != nil {
		return err
	}

	serviceValue := string(bts)
	serviceKey := fmt.Sprintf("%s/%s", name, serviceValue)

	ticker := time.NewTicker(time.Second * time.Duration(ttl))

	// Periodically check whether the key still exists; if the lease has
	// expired (e.g. after a network partition), re-register it.
	go func() {
		for {
			getResp, err := cli.Get(context.Background(), serviceKey)
			if err != nil {
				log.Println(err)
			} else if getResp.Count == 0 {
				err = withAlive(serviceKey, serviceValue, ttl)
				if err != nil {
					log.Println(err)
				}
			}

			<-ticker.C
		}
	}()

	return nil
}

type Service struct {
	Addr string `json:"Addr"`
}

func withAlive(serviceKey string, serviceValue string, ttl int64) error {
	// Grant a lease and attach the key to it, so the key disappears
	// automatically if this node stops renewing.
	leaseResp, err := cli.Grant(context.Background(), ttl)
	if err != nil {
		return err
	}

	fmt.Printf("key:%v\n", serviceKey)
	_, err = cli.Put(context.Background(), serviceKey, serviceValue, clientv3.WithLease(leaseResp.ID))
	if err != nil {
		return err
	}

	ch, err := cli.KeepAlive(context.Background(), leaseResp.ID)
	if err != nil {
		log.Println(err)
		return err
	}

	// The keep-alive channel must be drained continuously; otherwise it
	// fills up and keep-alive responses back up.
	go func() {
		for {
			<-ch
		}
	}()

	return nil
}

// UnRegister removes the service from etcd.
func UnRegister(serviceKey string) {
	if cli != nil {
		cli.Delete(context.Background(), serviceKey)
	}
}

http

There are many high-availability solutions for http; roughly the following three are mainstream:

  • An nginx layer in front, with the available service nodes configured through upstream
  • A cloud load balancer (e.g. Tencent Cloud) mapping the domain name to multiple ips, with health checks achieving the same effect as nginx
  • A hand-rolled http round-robin balancer backed by etcd

The first

  • Recommended index: 3 stars
  • Reason: adding or removing nodes has to be done by hand on the server, which is not that convenient. High availability relies on upstream, and lossless migration or restart requires manual intervention, which is relatively clumsy.
  • An nginx cluster sits in front of the services, and each nginx carries a configuration like this example:
upstream srv_name_http {
   server y.y.y.y:8112 weight=7;
   server x.x.x.x:8112 weight=3;
   server z.z.z.z:8112 weight=10;
}
server {
    listen      80;
    server_name your.addr.com;
    error_log /data/log/nginx/your.addr.com.log;

    # proxy timeouts and forwarded request headers
    proxy_read_timeout 3200;
    proxy_send_timeout 3200;
    proxy_set_header   Host             $http_host;
    proxy_set_header   Cookie           $http_cookie;
    proxy_set_header   X-Real-IP        $remote_addr;
    proxy_set_header   X-Forwarded-Proto    $scheme;
    proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;

    location / {
        proxy_pass http://srv_name_http;
    }

    error_page 404 /404.html;
    location = /404.html {
    }
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
    }
}

It is worth noting that nginx maintenance should also be decoupled from the server: nobody should have to log in to the server to edit nginx by hand. Instead, the whole configuration file (usually generated automatically by code, which is less error-prone) lives in the project directory; during deployment it is uploaded to the right location and nginx -t && nginx -s reload is executed. This reduces maintenance costs.
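As a minimal sketch of such config generation, the Go program below renders the upstream block from a node list; the Node type and renderUpstream function are illustrative names, not from the original post:

package main

import (
	"fmt"
	"strings"
)

// Node describes one backend server (hypothetical type).
type Node struct {
	Addr   string
	Weight int
}

// renderUpstream renders an nginx upstream block from a node list, so the
// config can live in the project directory and be uploaded on deploy.
func renderUpstream(name string, nodes []Node) string {
	var b strings.Builder
	fmt.Fprintf(&b, "upstream %s {\n", name)
	for _, n := range nodes {
		fmt.Fprintf(&b, "   server %s weight=%d;\n", n.Addr, n.Weight)
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	// Prints an upstream block matching the example above.
	fmt.Print(renderUpstream("srv_name_http", []Node{
		{Addr: "y.y.y.y:8112", Weight: 7},
		{Addr: "x.x.x.x:8112", Weight: 3},
		{Addr: "z.z.z.z:8112", Weight: 10},
	}))
}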

The second

  • Recommended index: five stars
  • Reason: the more mainstream solution; widely used, supports lossless migration, and nginx upstreams no longer need manual maintenance
  • Cloud vendors generally provide load balancing and health checks for domain names. Since the http domain name points to fixed port 80 on each server, every service node still needs an nginx doing proxy_pass. The differences from the first solution: 1. the nginx is bound to the same server as the service node, so no separate nginx cluster sits in front; 2. the nginx only serves the local service node and needs no upstream, as sketched below.
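A minimal per-node configuration under these assumptions might look like this (the domain and port 8112 are placeholders carried over from the first example):

server {
    listen      80;
    server_name your.addr.com;

    location / {
        # forward straight to the service process on this same machine;
        # no upstream block is needed since there is only one local node
        proxy_pass http://127.0.0.1:8112;
    }
}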

The third

  • Implement http service discovery based on etcd.
  • Recommended index: 4 stars
  • Reason: the implementation looks simple, but writing a good service discovery component requires understanding how etcd works and how lossless migration works. Done well, its effect is equivalent to etcd+grpc. Colleagues who have used such a component say it works well, but the implementation is not planned to be open-sourced, haha.

The general implementation principle is:

  • On startup, pull all values under the prefix and cache them in the service's memory.
  • Watch the prefix for key changes; once a key's lease fails to renew and the key expires, remove its value from memory.
  • The caller picks available urls from the in-memory list instead of hitting etcd directly. (A sketch follows this list.)
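A minimal sketch of these three points, using the same clientv3 package as the Register code above; the Discovery type and its method names are illustrative, not the original (closed-source) component:

package etcd

import (
	"context"
	"sync"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// Discovery caches the nodes registered under a prefix in memory.
type Discovery struct {
	mu    sync.RWMutex
	nodes map[string]string // etcd key -> service value (the node addr)
}

// Watch pulls every value under prefix once, then keeps the cache in sync:
// a PUT adds/updates a node, a DELETE (lease expiry) removes it.
func (d *Discovery) Watch(cli *clientv3.Client, prefix string) error {
	resp, err := cli.Get(context.Background(), prefix, clientv3.WithPrefix())
	if err != nil {
		return err
	}

	d.mu.Lock()
	d.nodes = make(map[string]string, len(resp.Kvs))
	for _, kv := range resp.Kvs {
		d.nodes[string(kv.Key)] = string(kv.Value)
	}
	d.mu.Unlock()

	// Watch from just after the revision of the initial Get, so no change
	// between the Get and the Watch is missed.
	wch := cli.Watch(context.Background(), prefix,
		clientv3.WithPrefix(), clientv3.WithRev(resp.Header.Revision+1))
	go func() {
		for wresp := range wch {
			d.mu.Lock()
			for _, ev := range wresp.Events {
				switch ev.Type {
				case clientv3.EventTypePut:
					d.nodes[string(ev.Kv.Key)] = string(ev.Kv.Value)
				case clientv3.EventTypeDelete:
					delete(d.nodes, string(ev.Kv.Key))
				}
			}
			d.mu.Unlock()
		}
	}()
	return nil
}

// Nodes returns a snapshot of the cached addresses; callers pick from this
// list instead of querying etcd on every request.
func (d *Discovery) Nodes() []string {
	d.mu.RLock()
	defer d.mu.RUnlock()
	out := make([]string, 0, len(d.nodes))
	for _, v := range d.nodes {
		out = append(out, v)
	}
	return out
}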

But to be truly lossless, there are two directions to implement.
First, the component must implement a round-robin retry mechanism: when a request fails, keep trying the next url until one succeeds or a retry threshold is hit. That way the window between a node dying and its heartbeat lease expiring is no longer a concern.
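A minimal sketch of that retry-until-threshold round robin, building on the Discovery sketch above; the cursor variable and method name are illustrative:

package etcd

import (
	"errors"
	"sync/atomic"
)

// rrCursor is a shared round-robin cursor (illustrative).
var rrCursor uint64

// CallWithFailover tries nodes in round-robin order until one call succeeds
// or maxRetry attempts are used up, so a node that died inside the
// heartbeat-lease window is skipped rather than failing the caller.
func (d *Discovery) CallWithFailover(do func(addr string) error, maxRetry int) error {
	nodes := d.Nodes()
	if len(nodes) == 0 {
		return errors.New("no available nodes")
	}

	var lastErr error
	for i := 0; i < maxRetry; i++ {
		addr := nodes[atomic.AddUint64(&rrCursor, 1)%uint64(len(nodes))]
		if lastErr = do(addr); lastErr == nil {
			return nil
		}
	}
	return lastErr
}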

Second, a route can be added to each node: after a node is manually taken offline, wait for the request backlog to drain and then shut the node down. The hard part of this is the route itself; the command boils down to curl localhost:xxxx/offline-from-etcd/, and finding this xxxx port is the difficult bit.

No secret here: the way to find xxxx is to save a (hostname: node) mapping in an environment variable, and then the xxxx in the curl command is substituted from that variable.

Haha, isn't that simple? The hostname can be obtained with os.Hostname().
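A minimal sketch of that offline route; the env-var naming scheme, port, and drain window are illustrative assumptions, and the UnRegister call refers to the package above:

package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

func main() {
	// Deployment is assumed to have saved this node's port in an
	// environment variable keyed by hostname (scheme illustrative), so the
	// offline script can look the port up and then run:
	//   curl localhost:<port>/offline-from-etcd/
	host, _ := os.Hostname()
	port := os.Getenv("SRV_PORT_" + host)
	if port == "" {
		port = "8112" // illustrative fallback
	}

	http.HandleFunc("/offline-from-etcd/", func(w http.ResponseWriter, r *http.Request) {
		// 1. remove this node from etcd so no new traffic is routed here,
		//    e.g. etcd.UnRegister(serviceKey) from the package above
		// 2. wait for the request backlog to drain, then shut down
		time.Sleep(5 * time.Second) // illustrative drain window
		fmt.Fprintln(w, "offline ok")
	})

	http.ListenAndServe(":"+port, nil)
}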

tcp, websocket

With nginx using a client-hash balancing strategy (e.g. ip_hash), websocket can be load-balanced; plain tcp has not been tested here. A sketch of such a configuration follows.
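A minimal sketch, assuming ip_hash as the client-hash strategy and placeholder backend addresses:

upstream srv_ws {
    # ip_hash pins each client to the same backend, which suits
    # websocket's long-lived connections
    ip_hash;
    server x.x.x.x:9000;
    server y.y.y.y:9000;
}

server {
    listen 80;

    location /ws {
        proxy_pass http://srv_ws;
        # headers required for the websocket upgrade handshake
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}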

Add a service node that focuses on the following functions:

  • Getting the fastest tcp ip that the client can reach
  • Getting the ip and port of the requested service module and returning them to the client

The client then connects over tcp directly. The architecture diagram is as follows:

[architecture diagram]

The high availability of tcp only matters at connection establishment; when a tcp service dies, its established connections will certainly drop, so clients are advised to reconnect automatically. In design terms, tcp is therefore not as complicated as http.
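A minimal sketch of client-side auto-reconnect; the address and backoff are placeholders:

package main

import (
	"log"
	"net"
	"time"
)

// connectForever dials addr and redials whenever the connection drops;
// this is the client-side half of tcp high availability described above.
func connectForever(addr string) {
	for {
		conn, err := net.Dial("tcp", addr)
		if err != nil {
			log.Println("dial failed, retrying:", err)
			time.Sleep(2 * time.Second) // illustrative backoff
			continue
		}

		// read until the connection breaks, then loop and redial
		buf := make([]byte, 4096)
		for {
			if _, err := conn.Read(buf); err != nil {
				log.Println("connection lost:", err)
				break
			}
		}
		conn.Close()
	}
}

func main() {
	connectForever("x.x.x.x:9000") // placeholder address
}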

Origin: blog.csdn.net/fwhezfwhez/article/details/110918799