A Pitfall I Hit with Go's fasthttp

A simple system is structured as follows:

Our service A receives external HTTP requests and uses Go's fasthttp to forward them to service B. The flow is very simple. After running in production for a while, we found that service B was no longer receiving any requests at all. Checking the logs of service A, we found them full of `no free connections available to host` errors.

Judging by the error message, the cause was that all connections were already occupied. We entered the container of service A (both service A and service B run in Docker) and ran netstat -anlp, which showed a large number of TCP connections in the ESTABLISHED state. We use long-lived (keep-alive) connections, so this was puzzling:

1. fasthttp is able to reuse connections, so why were there so many TCP connections?
2. Why could none of these connections be used, and what was causing the error above?

Starting from the fasthttp client source: we forward requests with

f.Client.DoTimeout(req, resp, f.ExecTimeout), where f.Client is a fasthttp.HostClient and f.ExecTimeout is set to 5s.
Tracing the code leads to this method in client.go:

func (c *HostClient) doNonNilReqResp(req *Request, resp *Response) (bool, error) {
	if req == nil {
		panic("BUG: req cannot be nil")
	}
	if resp == nil {
		panic("BUG: resp cannot be nil")
	}

	atomic.StoreUint32(&c.lastUseTime, uint32(time.Now().Unix()-startTimeUnix))

	// Free up resources occupied by response before sending the request,
	// so the GC may reclaim these resources (e.g. response body).
	resp.Reset()

	// If we detected a redirect to another schema
	if req.schemaUpdate {
		c.IsTLS = bytes.Equal(req.URI().Scheme(), strHTTPS)
		c.Addr = addMissingPort(string(req.Host()), c.IsTLS)
		c.addrIdx = 0
		c.addrs = nil
		req.schemaUpdate = false
		req.SetConnectionClose()
	}

	cc, err := c.acquireConn()
	if err != nil {
		return false, err
	}
	conn := cc.c

	resp.parseNetConn(conn)

	if c.WriteTimeout > 0 {
		// Set Deadline every time, since golang has fixed the performance issue
		// See https://github.com/golang/go/issues/15133#issuecomment-271571395 for details
		currentTime := time.Now()
		if err = conn.SetWriteDeadline(currentTime.Add(c.WriteTimeout)); err != nil {
			c.closeConn(cc)
			return true, err
		}
	}

	resetConnection := false
	if c.MaxConnDuration > 0 && time.Since(cc.createdTime) > c.MaxConnDuration && !req.ConnectionClose() {
		req.SetConnectionClose()
		resetConnection = true
	}

	userAgentOld := req.Header.UserAgent()
	if len(userAgentOld) == 0 {
		req.Header.userAgent = c.getClientName()
	}
	bw := c.acquireWriter(conn)
	err = req.Write(bw)

	if resetConnection {
		req.Header.ResetConnectionClose()
	}

	if err == nil {
		err = bw.Flush()
	}
	if err != nil {
		c.releaseWriter(bw)
		c.closeConn(cc)
		return true, err
	}
	c.releaseWriter(bw)

	if c.ReadTimeout > 0 {
		// Set Deadline every time, since golang has fixed the performance issue
		// See https://github.com/golang/go/issues/15133#issuecomment-271571395 for details
		currentTime := time.Now()
		if err = conn.SetReadDeadline(currentTime.Add(c.ReadTimeout)); err != nil {
			c.closeConn(cc)
			return true, err
		}
	}

	if !req.Header.IsGet() && req.Header.IsHead() {
		resp.SkipBody = true
	}
	if c.DisableHeaderNamesNormalizing {
		resp.Header.DisableNormalizing()
	}

	br := c.acquireReader(conn)
	if err = resp.ReadLimitBody(br, c.MaxResponseBodySize); err != nil {
		c.releaseReader(br)
		c.closeConn(cc)
		// Don't retry in case of ErrBodyTooLarge since we will just get the same again.
		retry := err != ErrBodyTooLarge
		return retry, err
	}
	c.releaseReader(br)

	if resetConnection || req.ConnectionClose() || resp.ConnectionClose() {
		c.closeConn(cc)
	} else {
		c.releaseConn(cc)
	}

	return false, err
}

Note the call to c.acquireConn(). This method takes a connection from the connection pool and, if no connection is available, creates a new one. It is implemented as follows:

func (c *HostClient) acquireConn() (*clientConn, error) {
	var cc *clientConn
	createConn := false
	startCleaner := false

	var n int
	c.connsLock.Lock()
	n = len(c.conns)
	if n == 0 {
		maxConns := c.MaxConns
		if maxConns <= 0 {
			maxConns = DefaultMaxConnsPerHost
		}
		if c.connsCount < maxConns {
			c.connsCount++
			createConn = true
			if !c.connsCleanerRun {
				startCleaner = true
				c.connsCleanerRun = true
			}
		}
	} else {
		n--
		cc = c.conns[n]
		c.conns[n] = nil
		c.conns = c.conns[:n]
	}
	c.connsLock.Unlock()

	if cc != nil {
		return cc, nil
	}
	if !createConn {
		return nil, ErrNoFreeConns
	}

	if startCleaner {
		go c.connsCleaner()
	}

	conn, err := c.dialHostHard()
	if err != nil {
		c.decConnsCount()
		return nil, err
	}
	cc = acquireClientConn(conn)

	return cc, nil
}

ErrNoFreeConns is errors.New("no free connections available to host"), which is exactly the error that appeared in our service. The immediate cause is obvious: !createConn, that is, no new connection could be created. It could not be created because the number of connections had reached maxConns = DefaultMaxConnsPerHost = 512 (the default value). So the connection limit had been reached, but why were the connections not being returned to the pool and reused? That was still not clear from this code. After carefully checking our business code, we found that many requests from service A to service B ended with a timeout, that is, they hit f.ExecTimeout = 5s.
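To make the failure mode concrete, here is a minimal sketch of a bounded pool that behaves like the acquireConn logic above: reuse an idle connection, otherwise create one if the limit allows, otherwise fail immediately. It uses only the standard library; the names pool, newPool, acquire, and release are hypothetical and are not part of fasthttp.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNoFreeConns plays the role of fasthttp's ErrNoFreeConns in this sketch.
var errNoFreeConns = errors.New("no free connections available to host")

// pool is a hypothetical, simplified stand-in for HostClient's connection pool.
type pool struct {
	mu       sync.Mutex
	idle     []int // IDs of idle connections, reused last-in first-out
	count    int   // connections created so far (idle + in use)
	maxConns int
}

func newPool(maxConns int) *pool { return &pool{maxConns: maxConns} }

// acquire reuses an idle connection if there is one, creates a new one if the
// limit allows, and otherwise fails right away instead of waiting.
func (p *pool) acquire() (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.idle); n > 0 {
		id := p.idle[n-1]
		p.idle = p.idle[:n-1]
		return id, nil
	}
	if p.count < p.maxConns {
		p.count++
		return p.count, nil
	}
	return 0, errNoFreeConns
}

// release puts a connection back so that later calls can reuse it.
func (p *pool) release(id int) {
	p.mu.Lock()
	p.idle = append(p.idle, id)
	p.mu.Unlock()
}

func main() {
	p := newPool(2)
	a, _ := p.acquire()
	_, _ = p.acquire()
	_, err := p.acquire() // both connections are held and not yet released
	fmt.Println(err)

	p.release(a)
	id, err := p.acquire() // after a release, the same connection is reused
	fmt.Println(id, err)
}
```

The point of the sketch is that the pool only recovers when release is called. If the holders of all connections never finish, every later acquire fails, no matter how many callers have already given up.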

Reading the source once more, I finally found the answer.

func clientDoDeadline(req *Request, resp *Response, deadline time.Time, c clientDoer) error {
	timeout := -time.Since(deadline)
	if timeout <= 0 {
		return ErrTimeout
	}

	var ch chan error
	chv := errorChPool.Get()
	if chv == nil {
		chv = make(chan error, 1)
	}
	ch = chv.(chan error)

	// Make req and resp copies, since on timeout they no longer
	// may be accessed.
	reqCopy := AcquireRequest()
	req.copyToSkipBody(reqCopy)
	swapRequestBody(req, reqCopy)
	respCopy := AcquireResponse()
	if resp != nil {
		// Not calling resp.copyToSkipBody(respCopy) here to avoid
		// unexpected messing with headers
		respCopy.SkipBody = resp.SkipBody
	}

	// Note that the request continues execution on ErrTimeout until
	// client-specific ReadTimeout exceeds. This helps limiting load
	// on slow hosts by MaxConns* concurrent requests.
	//
	// Without this 'hack' the load on slow host could exceed MaxConns*
	// concurrent requests, since timed out requests on client side
	// usually continue execution on the host.

	var mu sync.Mutex
	var timedout bool
	// This goroutine handles the connection and sends the request.
	go func() {
		errDo := c.Do(reqCopy, respCopy)
		mu.Lock()
		{
			if !timedout {
				if resp != nil {
					respCopy.copyToSkipBody(resp)
					swapResponseBody(resp, respCopy)
				}
				swapRequestBody(reqCopy, req)
				ch <- errDo
			}
		}
		mu.Unlock()

		ReleaseResponse(respCopy)
		ReleaseRequest(reqCopy)
	}()
	// This part handles the timeout.
	tc := AcquireTimer(timeout)
	var err error
	select {
	case err = <-ch:
	case <-tc.C:
		mu.Lock()
		{
			timedout = true
			err = ErrTimeout
		}
		mu.Unlock()
	}
	ReleaseTimer(tc)

	select {
	case <-ch:
	default:
	}
	errorChPool.Put(chv)

	return err
}

Here we can see how a timed-out request is handled. When a request times out, the main flow returns ErrTimeout right away, but the goroutine is still waiting for the response. In our case service B was failing with an exception and never sent a response, so the connection was never released. That is why a large number of connections stayed occupied until none were available.

Finally, I was still puzzled: why would such a good framework as fasthttp have this problem? If the server throws an exception (and never responds), does the connection stay occupied forever? Then I looked at the code again, and it turns out that:

// DoTimeout performs the given request and waits for response during
// the given timeout duration.
//
// Request must contain at least non-zero RequestURI with full url (including
// scheme and host) or non-zero Host header + RequestURI.
//
// The function doesn't follow redirects. Use Get* for following redirects.
//
// Response is ignored if resp is nil.
//
// ErrTimeout is returned if the response wasn't returned during
// the given timeout.
//
// ErrNoFreeConns is returned if all HostClient.MaxConns connections
// to the host are busy.
//
// It is recommended obtaining req and resp via AcquireRequest
// and AcquireResponse in performance-critical code.
//
// Warning: DoTimeout does not terminate the request itself. The request will
// continue in the background and the response will be discarded.
// If requests take too long and the connection pool gets filled up please
// try setting a ReadTimeout.
func (c *HostClient) DoTimeout(req *Request, resp *Response, timeout time.Duration) error {
	return clientDoTimeout(req, resp, timeout, c)
}

The doc comment on this method already explains it. The last paragraph says, in effect, that after a timeout the request keeps waiting for the response in the background and the response is then discarded; if requests take too long, the connection pool fills up, which is exactly the problem we ran into. The fix is to set the ReadTimeout field. As I understand it, this field means that if no response has arrived within ReadTimeout after the request is sent, the client closes the connection, which frees its slot in the pool.

That is the lesson learned: when using fasthttp, remember to set the ReadTimeout field.
