Analysis and realization of the main selection scene of the distributed system;

One: The scene that needs to choose the master

1: The service has multiple machines, and one of them is used to perform the task. The execution of multiple machines at the same time will cause problems. For example, if the records in the database with the status of failure are taken out and executed again, if multiple machines execute at the same time, it will cause a failed task to be executed by multiple machines at the same time.

2: There are multiple machines in the service, choose one of them as the master, the master is responsible for the distribution of tasks, and everyone consumes and processes tasks together. Or take out the failed records in the database and re-execute them. Since one machine may not be able to handle it, multiple machines are required to process it together. At this time, the main machine is responsible for finding the failed records from the database and writing them into the message queue. Other machines consume the tasks in the queue together and process the failed records.

 

Two: choose the master

According to the master selection scenario above, we can actually pick one randomly from multiple machines, which is much simpler than the master selection algorithm of raft. We can even specify a machine in the configuration file, and only this machine performs related functions, and other machines do not. If it is a fixed number of machines, and one machine can fulfill our needs, this is actually fine. If the machine is not fixed and cannot be processed by a single machine, the configuration file method is not suitable.

You can use competition to choose the master, and whoever gets the first is the master.

 

1: Option One

Use redis scheme to realize. If the specified key does not exist, write the machine information to this key. The machine that successfully writes is the master. Set the expiration time to prevent the machine from hanging abnormally. All machines need to regularly grab the redis lock. The SETNX command satisfies our needs. The one who succeeds in writing redis is the master, and the one who fails to write is the slave.

advantage:

  • 1: Simple to implement, a little better than the configuration file, and support machine dynamics

Disadvantages:

  • 1: Need to grab the lock regularly
  • 2: The master may change frequently, and it is necessary to ensure the correctness of the business logic of the master during the switching process
  • 3: Some time slices may not have a master, that is, the master hangs up, and other machines have not reached the time to grab the lock, this time slice does not have a master

 

2: Option Two

Implemented with etcd scheme. Etcd supports transactions that can be written as they do not exist, achieving the same effect as redis SETNX, and through etcd’s leasing mechanism to ensure that all machines are notified when the master hangs, then everyone automatically starts a new round of master election, or The first one caught in that sentence is the Lord.

advantage:

  • Meet our needs without design flaws
  • Only when the master hangs up, will the master be re-elected, and there is no need to worry about the influence of the master on the business logic during the switching process

Disadvantages:

  • It's relatively complicated to implement, so let me try 
  •  

The golang source code is implemented as follows:

  1 package etcdDemo
  2 
  3 import (
  4     "context"
  5     "fmt"
  6     "github.com/coreos/etcd/clientv3"
  7     "github.com/google/uuid"
  8     "time"
  9 )
 10 
 11 type Callback func(isMaster bool)
 12 
 13 type SelectMaster struct {
 14     endPoints []string
 15     key       string
 16     cli       *clientv3.Client
 17     lease     *clientv3.LeaseGrantResponse
 18     chClose   chan int
 19     callback  Callback
 20     token     string
 21     isMaster  bool
 22 }
 23 
 24 func NewSelectMaster(endPoints []string, key string) (*SelectMaster, error) {
 25     sm := &SelectMaster{
 26         endPoints: endPoints,
 27         key:       key,
 28         chClose:   make(chan int, 0),
 29         token:     uuid.New().String(),
 30     }
 31 
 32     cli, err := clientv3.New(clientv3.Config{
 33         Endpoints:   endPoints,
 34         DialTimeout: 3 * time.Second,
 35     })
 36     if err != nil {
 37         return sm, err
 38     }
 39     sm.cli = cli
 40     go sm.ioLoop()
 41     return sm, nil
 42 }
 43 
 44 func (sm *SelectMaster) ioLoop() {
 45     fmt.Println("SelectMaster.ioLoop start")
 46     ticker := time.NewTicker(time.Second * 3)
 47     defer ticker.Stop()
 48     chWatch := sm.cli.Watch(context.TODO(), sm.key)
 49     for {
 50         select {
 51         case <-ticker.C:
 52             if sm.lease == nil {
 53                 leaseResp, err := sm.cli.Grant(context.Background(), 4)
 54                 if err != nil {
 55                     fmt.Println("cli.Grant error=", err.Error())
 56                 } else {
 57                     sm.lease = leaseResp
 58                 }
 59             }
 60             if sm.lease != nil {
 61                 _, err := sm.cli.KeepAliveOnce(context.Background(), sm.lease.ID)
 62                 if err != nil {
 63                     fmt.Println("cli.KeepAliveOnce error=", err.Error())
 64                     break
 65                 }
 66             }
 67         case c := <-chWatch:
 68             for _, e := range c.Events {
 69                 if e == nil || e.Kv == nil {
 70                     continue
 71                 }
 72                 token := string(e.Kv.Value)
 73                 sm.isMaster = sm.token == token
 74                 if sm.callback == nil {
 75                     fmt.Println("SelectMaster.callback is nil")
 76                 } else {
 77                     sm.callback(sm.isMaster)
 78                     fmt.Println("SelectMaster.isLoop token=", token)
 79                     if token == "" { //主挂了,开始竞选
 80                         sm.election()
 81                     }
 82                 }
 83             }
 84         case <-sm.chClose:
 85             goto stop
 86         }
 87     }
 88 stop:
 89     fmt.Println("SelectMaster.ioLoop end")
 90 }
 91 
 92 func (sm *SelectMaster) IsMaster() bool {
 93     return sm.isMaster
 94 }
 95 
 96 func (sm *SelectMaster) Close() {
 97     sm.chClose <- 1
 98 }
 99 
100 func (sm *SelectMaster) Election(callback Callback) (bool, error) {
101     sm.callback = callback
102     return sm.election()
103 }
104 
105 func (sm *SelectMaster) election() (bool, error) {
106     ctx, cancel := context.WithTimeout(context.Background(), time.Second*3)
107     defer cancel()
108     leaseResp, err := sm.cli.Grant(ctx, 10)
109     if err != nil {
110         return false, err
111     }
112     sm.lease = leaseResp
113     txn := clientv3.NewKV(sm.cli).Txn(context.TODO())
114     txn.If(clientv3.Compare(clientv3.CreateRevision(sm.key), "=", 0)).
115         Then(clientv3.OpPut(sm.key, sm.token, clientv3.WithLease(leaseResp.ID))).Else()
116     txnResp, err := txn.Commit()
117     if err != nil {
118         return false, err
119     }
120     return txnResp.Succeeded, nil
121 }
122 
123 func testSelectMaster() *SelectMaster {
124     endPoints := []string{"172.25.20.248:2379"}
125     sm, err := NewSelectMaster(endPoints, "/test/lock")
126     if err != nil {
127         fmt.Println(err.Error())
128         return nil
129     }
130     callback := func(isMaster bool) {
131         fmt.Println(sm.token, "callback=", isMaster)
132     }
133     isSuccess, err := sm.Election(callback)
134     if err != nil {
135         fmt.Println(sm.token, "Election=", err.Error())
136     } else {
137         fmt.Println(sm.token, "Election=", isSuccess)
138     }
139     return sm
140 }
141 
142 func TestSelectMaster() {
143     var master *SelectMaster
144     for i := 0; i < 3; i++ {
145         sm := testSelectMaster()
146         if sm.IsMaster() {
147             master = sm
148         }
149     }
150     if master != nil {
151         master.Close()
152     }
153     time.Sleep(time.Second*10)
154 }

Guess you like

Origin blog.csdn.net/a159357445566/article/details/108550744