In-depth understanding of HttpClient connection pool

Original: https://www.jb51.net/article/141015.htm

Table of contents

1. Background

2. Keep-Alive of HTTP/1.0+

3. Persistent connection of HTTP/1.1

4. How does HttpClient generate a persistent connection

4.1 Implementation of HttpClient connection pool

4.2 Future<CPoolEntry>

4.3 HttpClientConnection

5. How does HttpClient reuse persistent connections?

6. How does HttpClient clean up expired connections

7. Summary of this article

8. Important HttpClient connection pool settings


1. Background

HTTP is a stateless protocol: each request is independent of the others. In its original implementation, every HTTP request therefore opened a new TCP socket connection and closed it once the exchange was complete.

HTTP runs on top of TCP, a full-duplex protocol, so establishing a connection takes a three-way handshake and closing it takes a four-way handshake. In this design, every HTTP request pays the extra cost of setting up and tearing down a connection.

HTTP therefore evolved to reuse socket connections through persistent connections.

As can be seen from the figure:

  • On a serial connection, each interaction opens and closes the connection
  • On a persistent connection, the first interaction opens the connection and leaves it open afterwards, so the next interaction skips connection establishment.

There are two implementations of persistent connections: HTTP/1.0+ keep-alive and HTTP/1.1 persistent connections.

2. Keep-Alive of HTTP/1.0+

Starting in 1996, many HTTP/1.0 browsers and servers extended the protocol with what became known as the "keep-alive" extension.

Note that this extension was an "experimental persistent connection" supplement to HTTP/1.0. Keep-alive is now deprecated and is not described in the current HTTP/1.1 specification, but many applications still use it.

A client using HTTP/1.0 adds "Connection: Keep-Alive" to the request headers to ask the server to keep the connection open. If the server is willing to do so, it includes the same header in the response. If the response does not contain the "Connection: Keep-Alive" header, the client assumes that the server does not support keep-alive and that the server will close the connection after sending the response message.

The keep-alive extension gives the client and the server a persistent connection, but some problems remain:

  • Keep-alive is not part of the HTTP/1.0 standard, and the client must send Connection: Keep-Alive to activate a keep-alive connection.
  • Proxy servers may not support keep-alive. Some proxies are "blind relays": they do not understand the Connection header and simply forward it unchanged to the next hop. The client and the server may then both believe the connection is persistent, while the proxy in between does not handle further requests on it.

3. Persistent connection of HTTP/1.1

HTTP/1.1 replaces Keep-Alive with persistent connections.

HTTP/1.1 connections are persistent by default. To close a connection explicitly, add the Connection: Close header to the message. In other words, in HTTP/1.1 every connection is reused unless stated otherwise.

However, as with Keep-Alive, an idle persistent connection can be closed by either the client or the server at any time. Not sending Connection: Close does not mean that the server promises to keep the connection open forever.
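
The protocol-level rule from the last two chapters can be written down in a few lines. The following is only a sketch: the class and method names are our own, and it assumes the Connection header carries a single token.

```java
import java.util.Locale;

/** Sketch of the protocol-level rule only; names are illustrative, not a real API. */
final class PersistenceRule {

    /**
     * @param httpVersion      "HTTP/1.0" or "HTTP/1.1", as seen in the response
     * @param connectionHeader value of the response's Connection header, or null if absent
     * @return true if the connection may stay open after this exchange
     */
    static boolean staysOpen(String httpVersion, String connectionHeader) {
        String token = connectionHeader == null
                ? ""
                : connectionHeader.trim().toLowerCase(Locale.ROOT);
        if (token.equals("close")) {
            return false;                       // an explicit close wins in both versions
        }
        if ("HTTP/1.0".equals(httpVersion)) {
            return token.equals("keep-alive");  // 1.0: opt-in, the server must echo the header
        }
        return true;                            // 1.1: persistent unless told otherwise
    }
}
```

Even when this returns true, either side may still close an idle connection later, so a client has to be prepared for that.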

4. How does HttpClient generate a persistent connection

HttpClient uses a connection pool to manage the connections it holds, so that requests to the same destination can reuse an existing TCP connection. In other words, HttpClient implements persistent connections through connection pooling.

In fact, the "pool" technology is a general design, and its design idea is not complicated:

  • Establish a connection when a connection is used for the first time
  • When the work is done, do not close the connection but return it to the pool
  • The next connection for the same purpose can get an available connection from the pool
  • Periodically clean up expired connections
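
The four steps above can be sketched as a tiny generic pool. This is not HttpClient's implementation: TinyPool, its methods and the fixed time-to-live are all assumptions made for illustration, the sketch is single-threaded, and a real pool would also close the resources it drops.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.Supplier;

/** Illustrative only: a tiny single-threaded pool, not HttpClient's implementation. */
final class TinyPool<T> {

    private static final class Idle<R> {
        final R resource;
        final long expiresAtMillis;

        Idle(R resource, long expiresAtMillis) {
            this.resource = resource;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Deque<Idle<T>> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final long ttlMillis;

    TinyPool(Supplier<T> factory, long ttlMillis) {
        this.factory = factory;
        this.ttlMillis = ttlMillis;
    }

    /** Steps 1 and 3: reuse an idle, unexpired resource if there is one, otherwise create one. */
    T lease(long nowMillis) {
        Idle<T> candidate;
        while ((candidate = idle.pollFirst()) != null) {
            if (candidate.expiresAtMillis > nowMillis) {
                return candidate.resource;
            }
            // expired: drop it and keep looking
        }
        return factory.get();
    }

    /** Step 2: do not close the resource, hand it back for later reuse. */
    void release(T resource, long nowMillis) {
        idle.addFirst(new Idle<>(resource, nowMillis + ttlMillis));
    }

    /** Step 4: remove everything that has expired; meant to be called periodically. */
    int evictExpired(long nowMillis) {
        int removed = 0;
        for (Iterator<Idle<T>> it = idle.iterator(); it.hasNext();) {
            if (it.next().expiresAtMillis <= nowMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int idleCount() {
        return idle.size();
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; a real pool would read the clock itself.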

All connection pools are based on this idea, but when we look at the HttpClient source code, we mainly focus on two points:

  • The specific design scheme of the connection pool, for reference in customizing the connection pool in the future
  • How the pool maps onto the HTTP protocol, that is, how the theory is turned into code

4.1 Implementation of HttpClient connection pool

HttpClient's handling of persistent connections can be concentrated in the following code. The part related to the connection pool is extracted from MainClientExec, and other parts are removed:

public class MainClientExec implements ClientExecChain {

    @Override
    public CloseableHttpResponse execute(
            final HttpRoute route,
            final HttpRequestWrapper request,
            final HttpClientContext context,
            final HttpExecutionAware execAware) throws IOException, HttpException {
        // Ask the HttpClientConnectionManager for a ConnectionRequest
        final ConnectionRequest connRequest = connManager.requestConnection(route, userToken);
        final HttpClientConnection managedConn;
        final int timeout = config.getConnectionRequestTimeout();
        // Obtain a managed HttpClientConnection from the ConnectionRequest
        managedConn = connRequest.get(timeout > 0 ? timeout : 0, TimeUnit.MILLISECONDS);
        // Let a ConnectionHolder hold both the connection manager and the managed connection
        final ConnectionHolder connHolder = new ConnectionHolder(this.log, this.connManager, managedConn);
        try {
            HttpResponse response;
            if (!managedConn.isOpen()) {
                // The managed connection is not open, so it has to be established
                establishRoute(proxyAuthState, managedConn, route, request, context);
            }
            // Send the request over the HttpClientConnection
            response = requestExecutor.execute(request, managedConn, context);
            // Ask the connection reuse strategy whether the connection can be reused
            if (reuseStrategy.keepAlive(response, context)) {
                // Get the period for which the connection stays valid
                final long duration = keepAliveStrategy.getKeepAliveDuration(response, context);
                // Set the validity period
                connHolder.setValidFor(duration, TimeUnit.MILLISECONDS);
                // Mark the connection as reusable
                connHolder.markReusable();
            } else {
                connHolder.markNonReusable();
            }
            final HttpEntity entity = response.getEntity();
            if (entity == null || !entity.isStreaming()) {
                // Release the connection back to the pool for the next caller
                connHolder.releaseConnection();
                return new HttpResponseProxy(response, null);
            } else {
                return new HttpResponseProxy(response, connHolder);
            }
        }
        // (exception handling omitted in this excerpt)
    }
}

Here we see that the connection handling during an HTTP request follows the protocol specification. The rest of this chapter looks at the implementation in detail.

PoolingHttpClientConnectionManager is HttpClient's default connection manager. The first step is to obtain a connection request through requestConnection(). Note that this is a request for a connection, not the connection itself.

public ConnectionRequest requestConnection(
  final HttpRoute route,
  final Object state) {
 final Future<CPoolEntry> future = this.pool.lease(route, state, null);
 return new ConnectionRequest() {
  @Override
  public boolean cancel() {
  return future.cancel(true);
  }
  @Override
  public HttpClientConnection get(
   final long timeout,
   final TimeUnit tunit) throws InterruptedException, ExecutionException, ConnectionPoolTimeoutException {
  final HttpClientConnection conn = leaseConnection(future, timeout, tunit);
  if (conn.isOpen()) {
   final HttpHost host;
   if (route.getProxyHost() != null) {
   host = route.getProxyHost();
   } else {
   host = route.getTargetHost();
   }
   final SocketConfig socketConfig = resolveSocketConfig(host);
   conn.setSocketTimeout(socketConfig.getSoTimeout());
  }
  return conn;
  }
 };
 }

As we can see, the returned ConnectionRequest object actually holds a Future<CPoolEntry>, and CPoolEntry is the actual connection entry managed by the connection pool.

In the code above, we should focus on two calls:

Future<CPoolEntry> future = this.pool.lease(route, state, null)

  How an asynchronous connection handle, Future<CPoolEntry>, is obtained from the connection pool CPool

HttpClientConnection conn = leaseConnection(future, timeout, tunit)

  How a real HttpClientConnection is obtained from the asynchronous Future<CPoolEntry>

4.2 Future<CPoolEntry>

Let's look at how CPool hands out a Future<CPoolEntry>. The core code, in AbstractConnPool, is as follows:

private E getPoolEntryBlocking(
        final T route, final Object state,
        final long timeout, final TimeUnit tunit,
        final Future<E> future) throws IOException, InterruptedException, TimeoutException {
    // Absolute deadline for the wait further down; null means wait indefinitely
    Date deadline = null;
    if (timeout > 0) {
        deadline = new Date(System.currentTimeMillis() + tunit.toMillis(timeout));
    }
    // Lock the whole pool first; the lock is a reentrant ReentrantLock
    this.lock.lock();
    try {
        // Get the pool for this HttpRoute. The overall pool has a size limit and each
        // route has its own pool of connections, so this is a "pool within a pool"
        final RouteSpecificPool<T, C, E> pool = getPool(route);
        E entry;
        for (;;) {
            Asserts.check(!this.isShutDown, "Connection pool shut down");
            // Loop until a usable connection is found or the route pool has nothing free
            for (;;) {
                // Take an entry from the route pool; it may be null or a valid connection
                entry = pool.getFree(state);
                // Nothing free: leave the inner loop
                if (entry == null) {
                    break;
                }
                // An expired or closed connection is released and the loop continues
                if (entry.isExpired(System.currentTimeMillis())) {
                    entry.close();
                }
                if (entry.isClosed()) {
                    this.available.remove(entry);
                    pool.free(entry, false);
                } else {
                    // A valid connection: leave the inner loop
                    break;
                }
            }
            // A valid connection was found: mark it as leased and return it
            if (entry != null) {
                this.available.remove(entry);
                this.leased.add(entry);
                onReuse(entry);
                return entry;
            }

            // No valid connection was found, so one has to be created
            final int maxPerRoute = getMax(route);
            // The per-route maximum is configurable; if it would be exceeded,
            // close some connections in LRU order
            final int excess = Math.max(0, pool.getAllocatedCount() + 1 - maxPerRoute);
            if (excess > 0) {
                for (int i = 0; i < excess; i++) {
                    final E lastUsed = pool.getLastUsed();
                    if (lastUsed == null) {
                        break;
                    }
                    lastUsed.close();
                    this.available.remove(lastUsed);
                    pool.remove(lastUsed);
                }
            }

            // The route pool has not reached its limit
            if (pool.getAllocatedCount() < maxPerRoute) {
                final int totalUsed = this.leased.size();
                final int freeCapacity = Math.max(this.maxTotal - totalUsed, 0);
                // Check whether the overall pool still has capacity
                if (freeCapacity > 0) {
                    final int totalAvailable = this.available.size();
                    // If the idle connections already fill the remaining capacity,
                    // close the least recently used idle connection
                    if (totalAvailable > freeCapacity - 1) {
                        if (!this.available.isEmpty()) {
                            final E lastUsed = this.available.removeLast();
                            lastUsed.close();
                            final RouteSpecificPool<T, C, E> otherpool = getPool(lastUsed.getRoute());
                            otherpool.remove(lastUsed);
                        }
                    }
                    // Create a connection for the route
                    final C conn = this.connFactory.create(route);
                    // Add the connection to the route's "small pool"
                    entry = pool.add(conn);
                    // Register the connection as leased in the "big pool"
                    this.leased.add(entry);
                    return entry;
                }
            }

            // Reaching this point means that the route pool had no valid connection and a new
            // one could not be created because a limit was reached: the existing connections
            // are in use and not available to the current thread
            boolean success = false;
            try {
                if (future.isCancelled()) {
                    throw new InterruptedException("Operation interrupted");
                }
                // Queue the future in the route pool
                pool.queue(future);
                // Queue the future in the overall pool
                this.pending.add(future);
                // success is true if the condition is signalled before the deadline
                if (deadline != null) {
                    success = this.condition.awaitUntil(deadline);
                } else {
                    this.condition.await();
                    success = true;
                }
                if (future.isCancelled()) {
                    throw new InterruptedException("Operation interrupted");
                }
            } finally {
                // Remove the future from the wait queues
                pool.unqueue(future);
                this.pending.remove(future);
            }
            // Not signalled and the deadline has passed: leave the loop
            if (!success && (deadline != null && deadline.getTime() <= System.currentTimeMillis())) {
                break;
            }
        }
        // No signal arrived in time and no connection became available
        throw new TimeoutException("Timeout waiting for connection");
    } finally {
        // Release the lock on the overall pool
        this.lock.unlock();
    }
}

The logic above has several important points:

  • The overall connection pool has a maximum number of connections, and each route has its own smaller pool with its own maximum
  • In both the overall pool and a route pool, when the limit would be exceeded, some connections are closed in LRU order
  • If an available connection is found, it is returned to the caller
  • If none is found, HttpClient checks whether the route's pool has reached its maximum; if not, a new connection is created and added to the pool
  • If the limit has been reached, the request queues up and waits to be signalled on the lock's condition, then tries again; if no connection becomes available before the deadline, a timeout exception is thrown
  • Obtaining a connection from the connection pool is guarded by a ReentrantLock to ensure thread safety

At this point, the program has either obtained an available CPoolEntry instance or terminated by throwing an exception.
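
To make the locking and waiting easier to see in isolation, here is a much reduced sketch of the same idea. It is not the AbstractConnPool code: RouteLimiter is a hypothetical class that only counts leases, uses awaitNanos instead of awaitUntil, returns false where the original throws TimeoutException, and treats a timeout of 0 as "do not wait".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative sketch: counts leases only, it does not manage real connections. */
final class RouteLimiter {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();
    private final Map<String, Integer> leasedPerRoute = new HashMap<>();
    private final int maxTotal;
    private final int maxPerRoute;
    private int leasedTotal;

    RouteLimiter(int maxTotal, int maxPerRoute) {
        this.maxTotal = maxTotal;
        this.maxPerRoute = maxPerRoute;
    }

    /** Waits until both the total and the per-route limit allow a lease; false on timeout. */
    boolean acquire(String route, long timeoutMillis) {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (leasedTotal >= maxTotal
                    || leasedPerRoute.getOrDefault(route, 0) >= maxPerRoute) {
                if (remainingNanos <= 0) {
                    return false; // timed out while waiting for a free slot
                }
                // awaitNanos releases the lock while waiting and re-acquires it before returning
                remainingNanos = released.awaitNanos(remainingNanos);
            }
            leasedPerRoute.merge(route, 1, Integer::sum);
            leasedTotal++;
            return true;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Gives a slot back and wakes up the waiting threads so they can re-check the limits. */
    void release(String route) {
        lock.lock();
        try {
            Integer count = leasedPerRoute.get(route);
            if (count == null || count == 0) {
                return; // nothing is leased for this route
            }
            leasedPerRoute.put(route, count - 1);
            leasedTotal--;
            released.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

The while loop around the wait matters: a woken thread has to check both limits again, because another thread may have taken the freed slot first.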

4.3 HttpClientConnection

protected HttpClientConnection leaseConnection(
        final Future<CPoolEntry> future,
        final long timeout,
        final TimeUnit tunit) throws InterruptedException, ExecutionException, ConnectionPoolTimeoutException {
    final CPoolEntry entry;
    try {
        // Get the CPoolEntry from the asynchronous Future<CPoolEntry>
        entry = future.get(timeout, tunit);
        if (entry == null || future.isCancelled()) {
            throw new InterruptedException();
        }
        Asserts.check(entry.getConnection() != null, "Pool entry with no connection");
        if (this.log.isDebugEnabled()) {
            this.log.debug("Connection leased: " + format(entry) + formatStats(entry.getRoute()));
        }
        // Return a proxy for the CPoolEntry; every operation on the proxy goes to
        // the same underlying HttpClientConnection
        return CPoolProxy.newProxy(entry);
    } catch (final TimeoutException ex) {
        throw new ConnectionPoolTimeoutException("Timeout waiting for connection from pool");
    }
}
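
The pattern in leaseConnection, waiting on a Future with a timeout and translating the generic TimeoutException into a pool-specific one, can be tried out with the JDK alone. The sketch below is only an approximation: LeaseSketch, PoolTimeoutException and outcome are hypothetical names, and a CompletableFuture stands in for the pool's Future<CPoolEntry>.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative sketch of the wait-and-translate pattern; not HttpClient code. */
final class LeaseSketch {

    /** Hypothetical stand-in for ConnectionPoolTimeoutException. */
    static final class PoolTimeoutException extends Exception {
        PoolTimeoutException(String message) {
            super(message);
        }
    }

    /** Waits for the pooled entry and reports a timeout in pool-specific terms. */
    static <T> T lease(Future<T> future, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, PoolTimeoutException {
        try {
            T entry = future.get(timeout, unit);
            if (entry == null || future.isCancelled()) {
                throw new InterruptedException();
            }
            return entry;
        } catch (TimeoutException ex) {
            throw new PoolTimeoutException("Timeout waiting for connection from pool");
        }
    }

    /** Small helper for trying the sketch out: reports what happened as a string. */
    static String outcome(Future<String> future, long timeoutMillis) {
        try {
            return "leased " + lease(future, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (PoolTimeoutException ex) {
            return "pool timeout";
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException ex) {
            return "failed";
        }
    }
}
```

For example, passing CompletableFuture.completedFuture("entry") to outcome takes the success path, while a CompletableFuture that is never completed takes the timeout path.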

5. How does HttpClient reuse persistent connections?

In the previous chapter, we saw that HttpClient obtains connections through a connection pool and takes one from the pool whenever it needs a connection.

Recall the pool design steps from the beginning of Chapter 4:

  • Establish a connection when a connection is used for the first time
  • When the work is done, do not close the connection but return it to the pool
  • The next connection for the same purpose can get an available connection from the pool
  • Periodically clean up expired connections

Chapter 4 showed how HttpClient handles steps 1 and 3. How does it handle step 2?

That is, how does HttpClient decide whether a connection should be closed after use or put back into the pool for reuse? Let's look at the code of MainClientExec again:

  // Send the HTTP request
  response = requestExecutor.execute(request, managedConn, context);
  // Let the reuse strategy decide whether the connection should be reused
  if (reuseStrategy.keepAlive(response, context)) {
    // The connection is to be reused: get its keep-alive duration,
    // taken from the timeout in the response
    final long duration = keepAliveStrategy.getKeepAliveDuration(response, context);
    if (this.log.isDebugEnabled()) {
      final String s;
      // The duration is in milliseconds; if none was set it is -1, meaning no timeout
      if (duration > 0) {
        s = "for " + duration + " " + TimeUnit.MILLISECONDS;
      } else {
        s = "indefinitely";
      }
      this.log.debug("Connection can be kept alive " + s);
    }
    // Set the validity period; when the request ends, the connection manager uses it
    // to decide whether to close the connection or put it back into the pool
    connHolder.setValidFor(duration, TimeUnit.MILLISECONDS);
    // Mark the connection as reusable
    connHolder.markReusable();
  } else {
    // Mark the connection as not reusable
    connHolder.markNonReusable();
  }

As we can see, after a request has been executed on a connection, a connection reuse strategy decides whether the connection should be reused. If so, the connection is handed to HttpClientConnectionManager and put back into the pool when the request ends.
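
The code comments above say that the keep-alive duration comes from the timeout in the response and is -1 when none is set. Assuming that the timeout arrives in a header such as "Keep-Alive: timeout=5, max=100", with the value in seconds, the lookup could be sketched as follows. KeepAliveDuration is a hypothetical helper, not the keepAliveStrategy class used by HttpClient.

```java
import java.util.Locale;

/** Illustrative sketch of reading a keep-alive timeout; not HttpClient's strategy class. */
final class KeepAliveDuration {

    /**
     * @param keepAliveHeader value of the response's Keep-Alive header,
     *                        for example "timeout=5, max=100", or null if absent
     * @return validity in milliseconds, or -1 if the server gave no usable timeout
     */
    static long millis(String keepAliveHeader) {
        if (keepAliveHeader == null) {
            return -1;
        }
        for (String part : keepAliveHeader.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().toLowerCase(Locale.ROOT).equals("timeout")) {
                try {
                    return Long.parseLong(kv[1].trim()) * 1000L; // seconds to milliseconds
                } catch (NumberFormatException ignored) {
                    // malformed value: fall through and treat it as "no timeout"
                }
            }
        }
        return -1;
    }
}
```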

So what is the logic of the connection reuse strategy?

public class DefaultClientConnectionReuseStrategy extends DefaultConnectionReuseStrategy {

    public static final DefaultClientConnectionReuseStrategy INSTANCE = new DefaultClientConnectionReuseStrategy();

    @Override
    public boolean keepAlive(final HttpResponse response, final HttpContext context) {
        // Get the request from the context
        final HttpRequest request = (HttpRequest) context.getAttribute(HttpCoreContext.HTTP_REQUEST);
        if (request != null) {
            // Get the request's Connection headers
            final Header[] connHeaders = request.getHeaders(HttpHeaders.CONNECTION);
            if (connHeaders.length != 0) {
                final TokenIterator ti = new BasicTokenIterator(new BasicHeaderIterator(connHeaders, null));
                while (ti.hasNext()) {
                    final String token = ti.nextToken();
                    // A Connection: Close header means the request does not want the connection
                    // kept open, regardless of what the response says (HTTP/1.1 rule)
                    if (HTTP.CONN_CLOSE.equalsIgnoreCase(token)) {
                        return false;
                    }
                }
            }
        }
        // Otherwise fall back to the parent class's reuse strategy
        return super.keepAlive(response, context);
    }
}

Look at the reuse strategy of the parent class

  if (canResponseHaveBody(request, response)) {
    final Header[] clhs = response.getHeaders(HTTP.CONTENT_LEN);
    // Do not reuse the connection if the response's Content-Length is not set correctly.
    // On a persistent connection no new connection is made between two exchanges, so
    // Content-Length is needed to tell which bytes belong to which response and to find
    // the message boundary. A response without a correct Content-Length therefore
    // makes the connection non-reusable
    if (clhs.length == 1) {
      final Header clh = clhs[0];
      try {
        final int contentLen = Integer.parseInt(clh.getValue());
        if (contentLen < 0) {
          return false;
        }
      } catch (final NumberFormatException ex) {
        return false;
      }
    } else {
      return false;
    }
  }
  if (headerIterator.hasNext()) {
    try {
      final TokenIterator ti = new BasicTokenIterator(headerIterator);
      boolean keepalive = false;
      while (ti.hasNext()) {
        final String token = ti.nextToken();
        // Connection: Close in the response is an explicit request to close: do not reuse
        if (HTTP.CONN_CLOSE.equalsIgnoreCase(token)) {
          return false;
        // Connection: Keep-Alive in the response is an explicit request to persist: reuse
        } else if (HTTP.CONN_KEEP_ALIVE.equalsIgnoreCase(token)) {
          keepalive = true;
        }
      }
      if (keepalive) {
        return true;
      }
    } catch (final ParseException px) {
      return false;
    }
  }
  // No relevant Connection header in the response: reuse for any version above HTTP/1.0
  return !ver.lessEquals(HttpVersion.HTTP_1_0);

To summarize:

  • If the request headers contain Connection: Close, the connection is not reused
  • If the response's Content-Length is missing or invalid, the connection is not reused
  • If the response headers contain Connection: Close, the connection is not reused
  • If the response headers contain Connection: Keep-Alive, the connection is reused
  • If none of the above applies, the connection is reused when the HTTP version is higher than 1.0

As can be seen from the code, its implementation strategy is consistent with the constraints of our protocol layer in Chapters 2 and 3.
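
The order of these checks can be condensed into one small function. The sketch below works on plain header strings rather than HttpClient's types; ReuseDecision and its parameters are our own names, it assumes the response is one that can carry a body, and it leaves out anything the excerpts above do not show (for example, how chunked responses are treated).

```java
import java.util.Locale;

/** Illustrative condensation of the five rules; not the HttpClient strategy classes. */
final class ReuseDecision {

    static boolean shouldReuse(String requestConnection,
                               String responseContentLength,
                               String responseConnection,
                               boolean responseIsHttp10OrOlder) {
        if (hasToken(requestConnection, "close")) {
            return false;                        // rule 1: the request asked to close
        }
        if (!isValidLength(responseContentLength)) {
            return false;                        // rule 2: the body is not properly delimited
        }
        if (hasToken(responseConnection, "close")) {
            return false;                        // rule 3: the response asked to close
        }
        if (hasToken(responseConnection, "keep-alive")) {
            return true;                         // rule 4: the response asked to persist
        }
        return !responseIsHttp10OrOlder;         // rule 5: default depends on the version
    }

    /** True if the comma-separated header value contains the token, ignoring case. */
    private static boolean hasToken(String headerValue, String token) {
        if (headerValue == null) {
            return false;
        }
        for (String candidate : headerValue.split(",")) {
            if (candidate.trim().toLowerCase(Locale.ROOT).equals(token)) {
                return true;
            }
        }
        return false;
    }

    /** True if the value parses as a non-negative integer. */
    private static boolean isValidLength(String value) {
        if (value == null) {
            return false;
        }
        try {
            return Integer.parseInt(value.trim()) >= 0;
        } catch (NumberFormatException ex) {
            return false;
        }
    }
}
```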

6. How does HttpClient clean up expired connections

Before HttpClient 4.4, a connection taken from the pool for reuse was checked for expiry at that point and cleaned up if it had expired.

Later versions work differently. A pooled connection that has been idle for longer than a threshold (2 seconds by default) is re-validated before it is leased, and a separate thread can be started to scan the pool and close connections that have expired or have been idle for longer than a configured time.

public CloseableHttpClient build() {
    // The eviction thread is started only if cleanup of expired or idle connections
    // has been requested; by default it is not started
    if (evictExpiredConnections || evictIdleConnections) {
        // Create the eviction thread for the connection pool
        final IdleConnectionEvictor connectionEvictor = new IdleConnectionEvictor(cm,
                maxIdleTime > 0 ? maxIdleTime : 10, maxIdleTimeUnit != null ? maxIdleTimeUnit : TimeUnit.SECONDS,
                maxIdleTime, maxIdleTimeUnit);
        closeablesCopy.add(new Closeable() {
            @Override
            public void close() throws IOException {
                connectionEvictor.shutdown();
                try {
                    connectionEvictor.awaitTermination(1L, TimeUnit.SECONDS);
                } catch (final InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                }
            }

        });
        // Start the eviction thread
        connectionEvictor.start();
    }
    // (rest of build() omitted in this excerpt)
}

It can be seen that when HttpClientBuilder is building, if the cleaning function is specified, a connection pool cleaning thread will be created and run.

public IdleConnectionEvictor(
        final HttpClientConnectionManager connectionManager,
        final ThreadFactory threadFactory,
        final long sleepTime, final TimeUnit sleepTimeUnit,
        final long maxIdleTime, final TimeUnit maxIdleTimeUnit) {
    this.connectionManager = Args.notNull(connectionManager, "Connection manager");
    this.threadFactory = threadFactory != null ? threadFactory : new DefaultThreadFactory();
    this.sleepTimeMs = sleepTimeUnit != null ? sleepTimeUnit.toMillis(sleepTime) : sleepTime;
    this.maxIdleTimeMs = maxIdleTimeUnit != null ? maxIdleTimeUnit.toMillis(maxIdleTime) : maxIdleTime;
    this.thread = this.threadFactory.newThread(new Runnable() {
        @Override
        public void run() {
            try {
                // Endless loop: the thread keeps running until it is interrupted
                while (!Thread.currentThread().isInterrupted()) {
                    // Sleep before each pass; the default interval is 10 seconds
                    Thread.sleep(sleepTimeMs);
                    // Close expired connections
                    connectionManager.closeExpiredConnections();
                    // If a maximum idle time was specified, close idle connections as well
                    if (maxIdleTimeMs > 0) {
                        connectionManager.closeIdleConnections(maxIdleTimeMs, TimeUnit.MILLISECONDS);
                    }
                }
            } catch (final Exception ex) {
                exception = ex;
            }

        }
    });
}

To summarize:

  • Cleaning up expired and idle connections is enabled only when it is explicitly configured on HttpClientBuilder
  • Once enabled, a thread runs in an endless loop: on each iteration it sleeps for a set interval, then calls the cleanup methods of HttpClientConnectionManager to close expired and idle connections.
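
The sleep-then-sweep loop is easy to reproduce with the JDK alone. The sketch below is not IdleConnectionEvictor: IdleSweeper and its methods are hypothetical, it tracks last-use timestamps by name instead of real connections, and it only covers the idle-time check, not expiry.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch of a sleep-and-sweep evictor; names are hypothetical. */
final class IdleSweeper {
    private final Map<String, Long> lastUsedMillis = new ConcurrentHashMap<>();
    private final long sleepMillis;
    private final long maxIdleMillis;
    private final Thread thread;

    IdleSweeper(long sleepMillis, long maxIdleMillis) {
        this.sleepMillis = sleepMillis;
        this.maxIdleMillis = maxIdleMillis;
        this.thread = new Thread(() -> {
            try {
                // Loop until interrupted: sleep first, then sweep
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(this.sleepMillis);
                    sweep(System.currentTimeMillis());
                }
            } catch (InterruptedException stop) {
                // interrupted by shutdown(): leave the loop
            }
        }, "idle-sweeper");
        this.thread.setDaemon(true);
    }

    /** Records that the named entry was used at the given time. */
    void touch(String id, long nowMillis) {
        lastUsedMillis.put(id, nowMillis);
    }

    /** Removes entries idle for at least maxIdleMillis and returns how many were removed. */
    int sweep(long nowMillis) {
        int removed = 0;
        for (Iterator<Map.Entry<String, Long>> it = lastUsedMillis.entrySet().iterator(); it.hasNext();) {
            if (nowMillis - it.next().getValue() >= maxIdleMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return lastUsedMillis.size();
    }

    void start() {
        thread.start();
    }

    void shutdown() {
        thread.interrupt();
    }
}
```

Keeping sweep as a separate method that takes the current time makes the eviction rule testable without starting the thread.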

7. Summary of this article

  • HTTP uses persistent connections to reduce the cost of opening and closing a connection for every request, a weakness of its early design
  • There are two kinds of persistent connection: HTTP/1.0+ Keep-Alive and HTTP/1.1's default persistent connections
  • HttpClient manages persistent connections through a connection pool. The pool has two levels: the overall connection pool and a connection pool for each route.
  • HttpClient obtains a pooled connection through the asynchronous Future<CPoolEntry>
  • The default connection reuse strategy follows the rules of the HTTP protocol. Based on the response, it first checks for Connection: Close (do not reuse), then for Connection: Keep-Alive (reuse), and finally reuses the connection if the HTTP version is higher than 1.0.
  • Connections in the pool are cleaned up in the background only if the cleanup of expired and idle connections is explicitly enabled in HttpClientBuilder
  • Since HttpClient 4.4, expired and idle connections are cleaned up by a thread running an endless loop that sleeps between passes, which gives the effect of periodic execution

8. Important HttpClient connection pool settings

        Registry<ConnectionSocketFactory> registry = RegistryBuilder.<ConnectionSocketFactory> create().register("http", plainsf).register("https", getSslFactory()).build();
        // Create the pooling connection manager
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager(registry);
        // Maximum total number of connections
        cm.setMaxTotal(maxTotal);
        // Default maximum number of connections per route (that is, to the same host)
        cm.setDefaultMaxPerRoute(maxPerRoute);
        //        HttpHost httpHost = new HttpHost(hostname, port);
        //        // Maximum number of connections for one specific route; overrides defaultMaxPerRoute
        //        cm.setMaxPerRoute(new HttpRoute(httpHost), maxRoute);

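The two most important settings are setMaxTotal, which caps the total number of connections in the pool, and setDefaultMaxPerRoute, which caps the connections to any single route. By default PoolingHttpClientConnectionManager allows at most 20 connections in total and 2 per route, so both usually need to be raised for a service that makes many concurrent requests.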

 

 

 


Origin blog.csdn.net/lan861698789/article/details/112975083