In the previous post, "Analysis of the nsq message queue (1): introduction and the decentralized implementation principle", I introduced two ways of using nsq: connecting directly to nsqd, and going through nsqlookupd to get a decentralized setup. I also outlined the implementation principle, which is not hard to understand. In this post I walk through the nsq source code and the business logic that implement this decentralization.
How nsqd communicates with nsqlookupd
When I started nsqd in the previous post, I used the following command, which specifies the --lookupd-tcp-address parameter:
./nsqd -tcp-address ":8000" -http-address ":8001" --lookupd-tcp-address=127.0.0.1:8200 --lookupd-tcp-address=127.0.0.1:7200 -data-path=./a
--lookupd-tcp-address specifies the TCP listen address of nsqlookupd.
Put simply, the exchange between nsqd and nsqlookupd goes as follows.
After nsqd starts, it connects to nsqlookupd. Once the connection succeeds, it sends a magic identifier, nsq.MagicV1. There is nothing magical about this identifier: it only tells both sides which protocol version the client and the server are using, so that different versions can be handled differently and a newer message format is easier to introduce later.
nsqlookupd code:
func (p *tcpServer) Handle(clientConn net.Conn) {
	// ...
	// read the 4-byte protocol magic sent by the client
	buf := make([]byte, 4)
	_, err := io.ReadFull(clientConn, buf)
	// ...
	protocolMagic := string(buf)
	// ...
	var prot protocol.Protocol
	switch protocolMagic {
	case "  V1": // two spaces followed by "V1", four bytes in total
		prot = &LookupProtocolV1{ctx: p.ctx}
	default:
		// ...
		return
	}
	err = prot.IOLoop(clientConn)
	// ...
}
At this point nsqd has established a connection to nsqlookupd, but that only means the connection succeeded. nsqlookupd has not yet added the connection to its list of available nsqd nodes.
Once the connection is established, nsqd sends the IDENTIFY command, which carries basic information about the nsqd node.
nsqd code:
ci := make(map[string]interface{})
ci["version"] = version.Binary
ci["tcp_port"] = n.RealTCPAddr().Port
ci["http_port"] = n.RealHTTPAddr().Port
ci["hostname"] = hostname
ci["broadcast_address"] = n.getOpts().BroadcastAddress
cmd, err := nsq.Identify(ci)
if err != nil {
	lp.Close()
	return
}
resp, err := lp.Command(cmd)
The command contains the TCP and HTTP ports that nsqd exposes, its hostname, its version, and so on. When nsqlookupd receives the IDENTIFY command, it parses this information and adds the nsqd to its list of available nodes.
nsqlookupd code:
func (p *LookupProtocolV1) IDENTIFY(client *ClientV1, reader *bufio.Reader, params []string) ([]byte, error) {
	var err error
	if client.peerInfo != nil {
		return nil, protocol.NewFatalClientErr(err, "E_INVALID", "cannot IDENTIFY again")
	}
	// the body is prefixed with its length as a 4-byte big-endian integer
	var bodyLen int32
	err = binary.Read(reader, binary.BigEndian, &bodyLen)
	// ...
	body := make([]byte, bodyLen)
	_, err = io.ReadFull(reader, body)
	// ...
	peerInfo := PeerInfo{id: client.RemoteAddr().String()}
	err = json.Unmarshal(body, &peerInfo)
	// ...
	client.peerInfo = &peerInfo
	// add the nsqd connection to the list of available nodes
	if p.ctx.nsqlookupd.DB.AddProducer(Registration{"client", "", ""}, &Producer{peerInfo: client.peerInfo}) {
		p.ctx.nsqlookupd.logf(LOG_INFO, "DB: client(%s) REGISTER category:%s key:%s subkey:%s", client, "client", "", "")
	}
	// ...
	return response, nil
}
After that, nsqd sends a PING heartbeat command to nsqlookupd every 15 seconds to stay marked as alive. Each time nsqlookupd receives a PING, it records the last update time of that nsqd. This timestamp is used as a filter condition: if a node has not been updated for a long time, nsqlookupd considers it faulty and leaves it out of the list of available nodes.
At this point an nsqd has registered itself in nsqlookupd's list of available nodes. We can start more nsqd and more nsqlookupd instances and point each nsqd at several nsqlookupd instances, as I did in the previous post:
--lookupd-tcp-address=127.0.0.1:8200 --lookupd-tcp-address=127.0.0.1:7200
nsqd connects to every nsqlookupd, registers its service information with each of them, and keeps sending heartbeats so that the list of available nodes stays up to date.
Handling an nsqlookupd failure
Above we said that if an nsqd has a problem, nsqlookupd drops that connection's information from its list of available nsqd nodes. But what happens if an nsqlookupd goes down?
The current approach is this: whether it is a heartbeat or any other command, nsqd sends it to every nsqlookupd. When nsqd finds that an nsqlookupd has a problem, it tries to reconnect each time it sends a command:
func (lp *lookupPeer) Command(cmd *nsq.Command) ([]byte, error) {
	initialState := lp.state
	if lp.state != stateConnected {
		err := lp.Connect()
		if err != nil {
			return nil, err
		}
		lp.state = stateConnected
		_, err = lp.Write(nsq.MagicV1)
		if err != nil {
			lp.Close()
			return nil, err
		}
		if initialState == stateDisconnected {
			lp.connectCallback(lp)
		}
		if lp.state != stateConnected {
			return nil, fmt.Errorf("lookupPeer connectCallback() failed")
		}
	}
	// ...
}
If the connection succeeds, the connectCallback method is called again, which sends the IDENTIFY command and so on.
How the client communicates with nsqlookupd and nsqd
The previous post showed how a client connects to nsqlookupd:
adds := []string{"127.0.0.1:7201", "127.0.0.1:8201"}
config := nsq.NewConfig()
config.MaxInFlight = 1000
config.MaxBackoffDuration = 5 * time.Second
config.DialTimeout = 10 * time.Second
topicName := "testTopic1"
c, _ := nsq.NewConsumer(topicName, "ch1", config)
testHandler := &MyTestHandler{consumer: c}
c.AddHandler(testHandler)
if err := c.ConnectToNSQLookupds(adds); err != nil {
	panic(err)
}
Note that the ports in adds are nsqlookupd's HTTP ports.
Here I again use the figure from the previous post to go through the details.
Calling c.ConnectToNSQLookupds(adds) makes the client request nsqlookupd's HTTP endpoint, http://127.0.0.1:7201/lookup?topic=testTopic1, to get the node information of all producers that provide the topic the consumer subscribes to. The endpoint returns the following data:
{
  "channels": [
    "nsq_to_file",
    "ch1"
  ],
  "producers": [
    {
      "remote_address": "127.0.0.1:58606",
      "hostname": "li-peng-mc-macbook.local",
      "broadcast_address": "li-peng-mc-macbook.local",
      "tcp_port": 8000,
      "http_port": 8001,
      "version": "1.1.1-alpha"
    },
    {
      "remote_address": "127.0.0.1:58627",
      "hostname": "li-peng-mc-macbook.local",
      "broadcast_address": "li-peng-mc-macbook.local",
      "tcp_port": 7000,
      "http_port": 7001,
      "version": "1.1.1-alpha"
    }
  ]
}
The queryLookupd method performs the operations shown in the figure:
- Get the list of nsqd nodes that provide the subscribed topic
- Connect to them
func (r *Consumer) queryLookupd() {
	retries := 0
retry:
	endpoint := r.nextLookupdEndpoint()
	// ...
	err := apiRequestNegotiateV1("GET", endpoint, nil, &data)
	if err != nil {
		// ...
	}
	// build a "host:port" TCP address for every producer returned
	var nsqdAddrs []string
	for _, producer := range data.Producers {
		broadcastAddress := producer.BroadcastAddress
		port := producer.TCPPort
		joined := net.JoinHostPort(broadcastAddress, strconv.Itoa(port))
		nsqdAddrs = append(nsqdAddrs, joined)
	}
	// connect to each nsqd
	for _, addr := range nsqdAddrs {
		err = r.ConnectToNSQD(addr)
		if err != nil && err != ErrAlreadyConnected {
			r.log(LogLevelError, "(%s) error connecting to nsqd - %s", addr, err)
			continue
		}
	}
}
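To see the address-building step on its own, the sketch below decodes a response shaped like the one returned by the lookup endpoint and produces the TCP addresses a consumer would connect to. The lookupResp type and the nsqdAddrs helper are names invented for this example; only the JSON field names come from the response shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strconv"
)

// lookupResp models only the fields of the response used here.
type lookupResp struct {
	Channels  []string `json:"channels"`
	Producers []struct {
		BroadcastAddress string `json:"broadcast_address"`
		TCPPort          int    `json:"tcp_port"`
	} `json:"producers"`
}

// nsqdAddrs turns a lookup response body into "host:port" strings.
func nsqdAddrs(body []byte) ([]string, error) {
	var resp lookupResp
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	addrs := make([]string, 0, len(resp.Producers))
	for _, p := range resp.Producers {
		addrs = append(addrs, net.JoinHostPort(p.BroadcastAddress, strconv.Itoa(p.TCPPort)))
	}
	return addrs, nil
}

func main() {
	body := []byte(`{"channels":["ch1"],"producers":[
		{"broadcast_address":"li-peng-mc-macbook.local","tcp_port":8000},
		{"broadcast_address":"li-peng-mc-macbook.local","tcp_port":7000}]}`)
	addrs, err := nsqdAddrs(body)
	fmt.Println(addrs, err)
}
```

Note that the consumer connects to broadcast_address with tcp_port, not to remote_address, so each nsqd must advertise an address that clients can actually reach.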
How the list of available nsqd nodes is refreshed
How is a newly added nsqd handled?
When ConnectToNSQLookupd is called, it starts a goroutine with go r.lookupdLoop(). The lookupdLoop method calls queryLookupd on a timer to update the list of available nsqd nodes.
// poll all known lookup servers every LookupdPollInterval
func (r *Consumer) lookupdLoop() {
	// ...
	var ticker *time.Ticker
	select {
	case <-time.After(jitter):
	case <-r.exitChan:
		goto exit
	}
	// set up the ticker interval to call queryLookupd periodically
	ticker = time.NewTicker(r.config.LookupdPollInterval)
	for {
		select {
		case <-ticker.C:
			r.queryLookupd()
		case <-r.lookupdRecheckChan:
			r.queryLookupd()
		case <-r.exitChan:
			goto exit
		}
	}
exit:
	// ...
}
Handling an nsqd single point of failure
What happens when an nsqd fails? The current approach is:
- nsqlookupd removes the failed node from its available list, so the list a client gets from the lookup endpoint always contains only available nodes.
- The client removes the failed node from its own set of available nodes and then checks whether the connection was made through nsqlookupd. If so, it sends on the recheck channel (case r.lookupdRecheckChan <- 1) to trigger queryLookupd and refresh the available list. If not, it starts a goroutine that retries the connection on a timer; if the node recovers and the connection succeeds, the node is added back to the available list.
Client-side implementation:
func (r *Consumer) onConnClose(c *Conn) {
	// ...
	// remove this connections RDY count from the consumer's total
	delete(r.connections, c.String())
	left := len(r.connections)
	// ...
	r.mtx.RLock()
	numLookupd := len(r.lookupdHTTPAddrs)
	reconnect := indexOf(c.String(), r.nsqdTCPAddrs) >= 0
	r.mtx.RUnlock()
	// if nsqlookupd is in use, refresh the list of available nodes
	if numLookupd > 0 {
		// trigger a poll of the lookupd
		select {
		case r.lookupdRecheckChan <- 1:
		default:
		}
	} else if reconnect {
		// no nsqlookupd configured: retry the direct connection in a goroutine
		go func(addr string) {
			// ...
		}(c.String())
	}
}
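The select with a default branch used above is a non-blocking send: if a refresh request is already pending, the new one is simply dropped. The sketch below isolates that behaviour. signalRecheck is a hypothetical helper, and the channel capacity of 1 is an assumption of this example.

```go
package main

import "fmt"

// signalRecheck requests a refresh without blocking. If a request is
// already pending, the new one is dropped. It reports whether it queued.
func signalRecheck(ch chan int) bool {
	select {
	case ch <- 1:
		return true
	default:
		return false
	}
}

func main() {
	recheck := make(chan int, 1) // capacity 1 is an assumption of this sketch

	fmt.Println(signalRecheck(recheck)) // queued
	fmt.Println(signalRecheck(recheck)) // dropped, one is already pending
	<-recheck                           // the poll loop consumes the request
	fmt.Println(signalRecheck(recheck)) // queued again
}
```

This keeps the connection-close handler from blocking when many connections drop at once, and it collapses a burst of failures into a single lookup query.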