Analyzing the nsq message queue (2): source-code analysis of decentralization

In the previous post, Analyzing the nsq message queue (1): introduction and the principle of its decentralized implementation, I introduced the two ways to use nsq: connecting to nsqd directly, and the decentralized approach built on nsqlookupd. I also roughly explained how the latter works; none of it is hard to understand. In this post I walk through the source code nsq uses to implement decentralization and show you the logic step by step.

How nsqd communicates with nsqlookupd

When I started nsqd in the previous post, I used the following command, specifying the --lookupd-tcp-address parameter:

./nsqd -tcp-address ":8000"  -http-address ":8001" --lookupd-tcp-address=127.0.0.1:8200 --lookupd-tcp-address=127.0.0.1:7200 -data-path=./a

--lookupd-tcp-address specifies the TCP listen address of nsqlookupd.

In short, the communication between nsqd and nsqlookupd is what the figure shows.

After nsqd starts, it connects to nsqlookupd. Once the connection succeeds, it first sends the magic identifier nsq.MagicV1. Is there anything magical about this identifier? Of course not. It simply tells both sides which protocol version the client and server use to communicate; different versions can be handled differently, which makes it easier to introduce new message formats later.
nsqlookupd code:

func (p *tcpServer) Handle(clientConn net.Conn) {   
    // ...
    buf := make([]byte, 4)
    _, err := io.ReadFull(clientConn, buf)
    // ...
    protocolMagic := string(buf)
    // ...
    var prot protocol.Protocol
    switch protocolMagic {
    case "  V1":
        prot = &LookupProtocolV1{ctx: p.ctx}
    default:
        // ...
        return
    }
    err = prot.IOLoop(clientConn)
    //...
}
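To see the handshake in isolation, here is a minimal sketch that assumes only the 4-byte magic shown above: a client writes "  V1" over an in-memory net.Pipe, and the server side reads exactly four bytes with io.ReadFull and switches on them. The names magicV1, readMagic and handshake are hypothetical, not nsq's own.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// magicV1 is the 4-byte protocol identifier ("  V1": two spaces, then V1)
// that the Handle code above switches on.
const magicV1 = "  V1"

// readMagic reads exactly 4 bytes and maps them to a protocol version.
func readMagic(conn io.Reader) (string, error) {
	buf := make([]byte, 4)
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	switch string(buf) {
	case magicV1:
		return "V1", nil
	default:
		return "", fmt.Errorf("bad protocol magic %q", buf)
	}
}

// handshake sends magic from the "client" end of an in-memory pipe and
// lets the "server" end identify the protocol.
func handshake(magic string) (string, error) {
	client, server := net.Pipe()
	defer server.Close()
	go func() {
		defer client.Close()
		client.Write([]byte(magic)) // error ignored: the reader reports failures
	}()
	return readMagic(server)
}

func main() {
	v, err := handshake(magicV1)
	fmt.Println(v, err) // V1 <nil>
	if _, err := handshake("  V9"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Because the magic is read before anything else, the server can pick a protocol implementation (here just a version string) before parsing a single command.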

At this point nsqd has established a connection with nsqlookupd, but that only means the connection itself succeeded.
nsqlookupd has not yet added this nsqd to its list of available nsqd nodes.
Once the connection is established, nsqd sends the IDENTIFY command, which carries its basic information.
nsqd code:

        ci := make(map[string]interface{})
        ci["version"] = version.Binary
        ci["tcp_port"] = n.RealTCPAddr().Port
        ci["http_port"] = n.RealHTTPAddr().Port
        ci["hostname"] = hostname
        ci["broadcast_address"] = n.getOpts().BroadcastAddress

        cmd, err := nsq.Identify(ci)
        if err != nil {
            lp.Close()
            return
        }
        resp, err := lp.Command(cmd)

The command contains the TCP and HTTP ports nsqd provides, its hostname, version and so on, and is sent to nsqlookupd. When nsqlookupd receives the IDENTIFY command, it parses this information and adds the nsqd to its list of available nodes.
nsqlookupd code:

func (p *LookupProtocolV1) IDENTIFY(client *ClientV1, reader *bufio.Reader, params []string) ([]byte, error) {
    var err error
    if client.peerInfo != nil {
        return nil, protocol.NewFatalClientErr(err, "E_INVALID", "cannot IDENTIFY again")
    }
    var bodyLen int32
    err = binary.Read(reader, binary.BigEndian, &bodyLen)
    // ...
    body := make([]byte, bodyLen)
    _, err = io.ReadFull(reader, body)
    // ...  
    peerInfo := PeerInfo{id: client.RemoteAddr().String()}
    err = json.Unmarshal(body, &peerInfo)
    // ...
    client.peerInfo = &peerInfo
    // add this nsqd connection to the list of available nodes
    if p.ctx.nsqlookupd.DB.AddProducer(Registration{"client", "", ""}, &Producer{peerInfo: client.peerInfo}) {
        p.ctx.nsqlookupd.logf(LOG_INFO, "DB: client(%s) REGISTER category:%s key:%s subkey:%s", client, "client", "", "")
    }
    // ...
    return response, nil
}

After that, nsqd sends a PING heartbeat command to nsqlookupd every 15 seconds to show that it is still alive. Each time nsqlookupd receives a PING, it records the nsqd's last update time and uses it as a filter condition: if a node has not updated for a long time, nsqlookupd considers it faulty and leaves it out of the list of available nodes.
So far, one nsqd has registered its information in nsqlookupd's list of available nodes. We can start several nsqd and several nsqlookupd instances and point each nsqd at multiple nsqlookupd instances, just as in my previous post:

--lookupd-tcp-address=127.0.0.1:8200 --lookupd-tcp-address=127.0.0.1:7200

nsqd establishes a connection with every nsqlookupd, registers its service information, and keeps sending heartbeats so that the list of available nodes stays up to date.

Handling an nsqlookupd failure

As mentioned above, if an nsqd runs into problems, nsqlookupd drops that nsqd's information from its list of available nodes. But what happens if an nsqlookupd itself goes down?

The current approach is this:
whether it is sending a heartbeat or any other command, nsqd sends the message to every nsqlookupd. When nsqd finds that an nsqlookupd has a problem, it tries to reconnect each time it sends a command:

func (lp *lookupPeer) Command(cmd *nsq.Command) ([]byte, error) {
    initialState := lp.state
    if lp.state != stateConnected {
        err := lp.Connect()
        if err != nil {
            return nil, err
        }
        lp.state = stateConnected
        _, err = lp.Write(nsq.MagicV1)
        if err != nil {
            lp.Close()
            return nil, err
        }
        if initialState == stateDisconnected {
            lp.connectCallback(lp)
        }
        if lp.state != stateConnected {
            return nil, fmt.Errorf("lookupPeer connectCallback() failed")
        }
    }
    // ...
}

If the reconnection succeeds, the connectCallback method is called again, which re-sends IDENTIFY and the other commands.

How the client communicates with nsqlookupd and nsqd

The previous post showed how a client connects to nsqlookupd to communicate:

    adds := []string{"127.0.0.1:7201", "127.0.0.1:8201"}
    config := nsq.NewConfig()
    config.MaxInFlight = 1000
    config.MaxBackoffDuration = 5 * time.Second
    config.DialTimeout = 10 * time.Second

    topicName := "testTopic1"
    c, _ := nsq.NewConsumer(topicName, "ch1", config)
    testHandler := &MyTestHandler{consumer: c}

    c.AddHandler(testHandler)
    if err := c.ConnectToNSQLookupds(adds); err != nil {
        panic(err)
    }

Note that the ports in adds are nsqlookupd's HTTP ports.
Here I reuse the figure from the previous post to walk through this in detail.

Calling c.ConnectToNSQLookupds(adds) makes the consumer request nsqlookupd's HTTP endpoint http://127.0.0.1:7201/lookup?topic=testTopic1 to get information about all producer nodes that provide the topic the consumer subscribes to. The data returned by this URL looks like this:

{
  "channels": [
    "nsq_to_file",
    "ch1"
  ],
  "producers": [
    {
      "remote_address": "127.0.0.1:58606",
      "hostname": "li-peng-mc-macbook.local",
      "broadcast_address": "li-peng-mc-macbook.local",
      "tcp_port": 8000,
      "http_port": 8001,
      "version": "1.1.1-alpha"
    },
    {
      "remote_address": "127.0.0.1:58627",
      "hostname": "li-peng-mc-macbook.local",
      "broadcast_address": "li-peng-mc-macbook.local",
      "tcp_port": 7000,
      "http_port": 7001,
      "version": "1.1.1-alpha"
    }
  ]
}
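To make the mapping from this response to nsqd TCP addresses concrete, the sketch below decodes the JSON shape shown above and joins broadcast_address with tcp_port, the same way queryLookupd does with net.JoinHostPort. The struct and function names are hypothetical, and the sketch assumes the unwrapped response format shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strconv"
)

// producer mirrors one entry of "producers" in the /lookup response.
type producer struct {
	RemoteAddress    string `json:"remote_address"`
	Hostname         string `json:"hostname"`
	BroadcastAddress string `json:"broadcast_address"`
	TCPPort          int    `json:"tcp_port"`
	HTTPPort         int    `json:"http_port"`
	Version          string `json:"version"`
}

// lookupResp mirrors the whole /lookup response body.
type lookupResp struct {
	Channels  []string   `json:"channels"`
	Producers []producer `json:"producers"`
}

// nsqdAddrs decodes a /lookup body and returns broadcast_address:tcp_port per producer.
func nsqdAddrs(body []byte) ([]string, error) {
	var resp lookupResp
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	addrs := make([]string, 0, len(resp.Producers))
	for _, p := range resp.Producers {
		addrs = append(addrs, net.JoinHostPort(p.BroadcastAddress, strconv.Itoa(p.TCPPort)))
	}
	return addrs, nil
}

func main() {
	body := []byte(`{"channels":["ch1"],"producers":[
		{"broadcast_address":"li-peng-mc-macbook.local","tcp_port":8000,"http_port":8001},
		{"broadcast_address":"li-peng-mc-macbook.local","tcp_port":7000,"http_port":7001}]}`)
	addrs, err := nsqdAddrs(body)
	fmt.Println(addrs, err)
	// [li-peng-mc-macbook.local:8000 li-peng-mc-macbook.local:7000] <nil>
}
```

The consumer connects to the TCP port, not remote_address: remote_address is the ephemeral port nsqd used to reach nsqlookupd, while broadcast_address plus tcp_port is where nsqd actually listens for clients.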


The queryLookupd method performs the operations shown in the figure:

  • Fetch the list of nsqd nodes that provide the subscribed topic
  • Connect to each of them

func (r *Consumer) queryLookupd() {
    retries := 0
retry:
    endpoint := r.nextLookupdEndpoint()

    // ...  
    err := apiRequestNegotiateV1("GET", endpoint, nil, &data)
    if err != nil {
        // ...
    }
    var nsqdAddrs []string
    for _, producer := range data.Producers {
        broadcastAddress := producer.BroadcastAddress
        port := producer.TCPPort
        joined := net.JoinHostPort(broadcastAddress, strconv.Itoa(port))
        nsqdAddrs = append(nsqdAddrs, joined)
    }
    // connect to each nsqd
    for _, addr := range nsqdAddrs {
        err = r.ConnectToNSQD(addr)
        if err != nil && err != ErrAlreadyConnected {
            r.log(LogLevelError, "(%s) error connecting to nsqd - %s", addr, err)
            continue
        }
    }
}
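queryLookupd starts by calling r.nextLookupdEndpoint(), whose body is not shown here. A minimal sketch, assuming it rotates through the configured nsqlookupd HTTP addresses in round-robin order and builds the /lookup?topic=... URL seen earlier; endpointRotator is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// endpointRotator is a hypothetical helper that cycles through nsqlookupd
// HTTP addresses and builds a /lookup URL for the topic.
type endpointRotator struct {
	addrs []string
	topic string
	next  int
}

// Next returns the lookup URL for the next nsqlookupd in round-robin order.
func (e *endpointRotator) Next() (string, error) {
	if len(e.addrs) == 0 {
		return "", errors.New("no nsqlookupd addresses configured")
	}
	addr := e.addrs[e.next%len(e.addrs)]
	e.next = (e.next + 1) % len(e.addrs)
	if !strings.Contains(addr, "://") {
		addr = "http://" + addr
	}
	u, err := url.Parse(addr)
	if err != nil {
		return "", err
	}
	if u.Path == "" || u.Path == "/" {
		u.Path = "/lookup"
	}
	q := u.Query()
	q.Set("topic", e.topic)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	r := &endpointRotator{addrs: []string{"127.0.0.1:7201", "127.0.0.1:8201"}, topic: "testTopic1"}
	for i := 0; i < 3; i++ {
		u, _ := r.Next()
		fmt.Println(u) // alternates between the 7201 and 8201 lookup URLs
	}
}
```

Asking one nsqlookupd per poll spreads the load, and because every nsqd registers with every nsqlookupd, any single answer should already list all live producers.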

How the list of available nsqd nodes is refreshed

How are newly added nsqd nodes picked up?
When ConnectToNSQLookupd is called, it starts a goroutine with go r.lookupdLoop(). The lookupdLoop method calls queryLookupd on a timer to update the list of available nsqd nodes:

// poll all known lookup servers every LookupdPollInterval
func (r *Consumer) lookupdLoop() {
    // ...
    var ticker *time.Ticker
    select {
    case <-time.After(jitter):
    case <-r.exitChan:
        goto exit
    }
    // use a ticker with LookupdPollInterval to call queryLookupd periodically
    ticker = time.NewTicker(r.config.LookupdPollInterval)
    for {
        select {
        case <-ticker.C:
            r.queryLookupd()
        case <-r.lookupdRecheckChan:
            r.queryLookupd()
        case <-r.exitChan:
            goto exit
        }
    }

exit:
    // ...
}

Handling an nsqd single point of failure


What happens when an nsqd fails? The current approach is:

  • nsqlookupd removes the failed node from its list of available nodes, so the list the client gets from the lookup endpoint contains only available nodes.
  • The client removes the failed node from its connections and then checks whether it connected through nsqlookupd. If so, it sends r.lookupdRecheckChan <- 1 to trigger queryLookupd and refresh the list of available nodes. If not, it starts a goroutine that retries the connection on a timer; once the node recovers and the connection succeeds, it is added back to the list of available nodes.
Client-side code:
func (r *Consumer) onConnClose(c *Conn) {
    // ...
    // remove this connections RDY count from the consumer's total
    delete(r.connections, c.String())
    left := len(r.connections)
    // ...
    r.mtx.RLock()
    numLookupd := len(r.lookupdHTTPAddrs)
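    reconnect := indexOf(c.String(), r.nsqdTCPAddrs) >= 0
    r.mtx.RUnlock()
    // if nsqlookupd is in use, refresh the list of available nodes
    if numLookupd > 0 {
        // trigger a poll of the lookupd
        select {
        case r.lookupdRecheckChan <- 1:
        default:
        }
    } else if reconnect {
        // no lookupd: retry connecting to this nsqd in the background
        go func(addr string) {
            // ... retry r.ConnectToNSQD(addr) on a timer until it succeeds
        }(c.String())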
    }
}


Source: www.cnblogs.com/li-peng/p/11540949.html