你以为的timeout,不一定是用户的timeout[转]

转自https://zhuanlan.zhihu.com/p/31640388

你以为的timeout,不一定是用户的timeout

Seasoninthesun

Seasoninthesun

小楼一夜听春雨

61 人赞了该文章

引言

最近在协助业务团队解决一些疑难问题,其中有一个就是有些用户反馈在进行某个特定的操作时,偶尔会遇到加载很久的情况,就好像是timeout不起作用一样,但是业务开发的同学明明将网络请求的timeout设置为30s,这是为什么呢?难道是okhttp有bug?还是说用户操作不当?

最终花费了3天时间,终于找到了问题的根因。

先说一下关键字: okio, 超时机制, 弱网,关键参数

1.确认问题

由于产品经理收集到的用户反馈比较模糊,为了准确定位问题存在,就需要拿数据说话,于是查看这个请求的埋点数据,发现确实有几十个用户在这个请求上花费的时间超过30s,有些甚至达到了90s,这样的体验就非常差了。

那会不会是业务的童鞋在初始化OkHttpClient时timeout设置错误了呢,于是查看初始化代码,如下:

OkHttpClient.Builder httpClientBuilder = new OkHttpClient.Builder()
                .readTimeout(30, TimeUnit.SECONDS)
                .connectTimeout(30, TimeUnit.SECONDS)
                .writeTimeout(30, TimeUnit.SECONDS)
                .addInterceptor(new HeaderInterceptor())

显然,三个timeout值都设置成了30s,并没有问题。这样的话只能怀疑是okhttp有bug或者我们对于okhttp的使用不当了。

2.okhttp源码中timeout调用

在创建OkHttpClient时设置的timeout,会在何时使用呢?

readTimeout,connectTimeout和writeTimeout的使用有两个地方,一个是StreamAllocation,一个是在Http2Codec中,由于我们这个请求是http 1.1协议,所以Http2Codec就不用看了。

2.1 参数传递

在StreamAllocation中的newStream()方法中,timeout的使用如下:

public HttpCodec newStream(OkHttpClient client, boolean doExtensiveHealthChecks) {
   int connectTimeout = client.connectTimeoutMillis();
   int readTimeout = client.readTimeoutMillis();
   int writeTimeout = client.writeTimeoutMillis();
   boolean connectionRetryEnabled = client.retryOnConnectionFailure();
   try {
     RealConnection resultConnection = findHealthyConnection(connectTimeout, readTimeout,
         writeTimeout, connectionRetryEnabled, doExtensiveHealthChecks);
     HttpCodec resultCodec;
     if (resultConnection.http2Connection != null) {
       resultCodec = new Http2Codec(client, this, resultConnection.http2Connection);
     } else {
       resultConnection.socket().setSoTimeout(readTimeout);
       resultConnection.source.timeout().timeout(readTimeout, MILLISECONDS);
       resultConnection.sink.timeout().timeout(writeTimeout, MILLISECONDS);
       resultCodec = new Http1Codec(
           client, this, resultConnection.source, resultConnection.sink);
     }
     synchronized (connectionPool) {
       codec = resultCodec;
       return resultCodec;
     }
   } catch (IOException e) {
     throw new RouteException(e);
   }
 }

可以看到这三个timeout都用于与连接有关的参数设置中,首先看findHealthyConnection()方法:

/**
 * Finds a connection and returns it if it is healthy. If it is unhealthy the process is repeated
 * until a healthy connection is found.
 */
private RealConnection findHealthyConnection(int connectTimeout, int readTimeout,
    int writeTimeout, boolean connectionRetryEnabled, boolean doExtensiveHealthChecks)
    throws IOException {
  while (true) {
    RealConnection candidate = findConnection(connectTimeout, readTimeout, writeTimeout,
        connectionRetryEnabled);
    // If this is a brand new connection, we can skip the extensive health checks.
    synchronized (connectionPool) {
      if (candidate.successCount == 0) {
        return candidate;
      }
    }
    // Do a (potentially slow) check to confirm that the pooled connection is still good. If it
    // isn't, take it out of the pool and start again.
    if (!candidate.isHealthy(doExtensiveHealthChecks)) {
      noNewStreams();
      continue;
    }
    return candidate;
  }
}

发现这个方法主要就是会循环调用findConnection()直到找到一个健康的连接,而findConnection()如下:

/**
  * Returns a connection to host a new stream. This prefers the existing connection if it exists,
  * then the pool, finally building a new connection.
  */
 private RealConnection findConnection(int connectTimeout, int readTimeout, int writeTimeout,
     boolean connectionRetryEnabled) throws IOException {
   Route selectedRoute;
   synchronized (connectionPool) {
     if (released) throw new IllegalStateException("released");
     if (codec != null) throw new IllegalStateException("codec != null");
     if (canceled) throw new IOException("Canceled");
     RealConnection allocatedConnection = this.connection;
     if (allocatedConnection != null && !allocatedConnection.noNewStreams) {
       return allocatedConnection;
     }
     // Attempt to get a connection from the pool.
     RealConnection pooledConnection = Internal.instance.get(connectionPool, address, this);
     if (pooledConnection != null) {
       this.connection = pooledConnection;
       return pooledConnection;
     }
     selectedRoute = route;
   }
   if (selectedRoute == null) {
     selectedRoute = routeSelector.next();
     synchronized (connectionPool) {
       route = selectedRoute;
       refusedStreamCount = 0;
     }
   }
   RealConnection newConnection = new RealConnection(selectedRoute);
   synchronized (connectionPool) {
     acquire(newConnection);
     Internal.instance.put(connectionPool, newConnection);
     this.connection = newConnection;
     if (canceled) throw new IOException("Canceled");
   }
   newConnection.connect(connectTimeout, readTimeout, writeTimeout, address.connectionSpecs(),
       connectionRetryEnabled);
   routeDatabase().connected(newConnection.route());
   return newConnection;
 }

可以发现,就是在调用RealConnection的connect()方法时用到了三个timeout,该方法如下:

public void connect(int connectTimeout, int readTimeout, int writeTimeout,
      List<ConnectionSpec> connectionSpecs, boolean connectionRetryEnabled) {
    if (protocol != null) throw new IllegalStateException("already connected");
    RouteException routeException = null;
    ConnectionSpecSelector connectionSpecSelector = new ConnectionSpecSelector(connectionSpecs);
    if (route.address().sslSocketFactory() == null) {
      if (!connectionSpecs.contains(ConnectionSpec.CLEARTEXT)) {
        throw new RouteException(new UnknownServiceException(
            "CLEARTEXT communication not enabled for client"));
      }
      String host = route.address().url().host();
      if (!Platform.get().isCleartextTrafficPermitted(host)) {
        throw new RouteException(new UnknownServiceException(
            "CLEARTEXT communication to " + host + " not permitted by network security policy"));
      }
    }
    while (protocol == null) {
      try {
        if (route.requiresTunnel()) {
          buildTunneledConnection(connectTimeout, readTimeout, writeTimeout,
              connectionSpecSelector);
        } else {
          buildConnection(connectTimeout, readTimeout, writeTimeout, connectionSpecSelector);
        }
      } catch (IOException e) {
        closeQuietly(socket);
        closeQuietly(rawSocket);
        socket = null;
        rawSocket = null;
        source = null;
        sink = null;
        handshake = null;
        protocol = null;
        if (routeException == null) {
          routeException = new RouteException(e);
        } else {
          routeException.addConnectException(e);
        }
        if (!connectionRetryEnabled || !connectionSpecSelector.connectionFailed(e)) {
          throw routeException;
        }
      }
    }
  }

不需要走代理时,调用到buildConnection()方法:

/** Does all the work necessary to build a full HTTP or HTTPS connection on a raw socket. */
  private void buildConnection(int connectTimeout, int readTimeout, int writeTimeout,
      ConnectionSpecSelector connectionSpecSelector) throws IOException {
    connectSocket(connectTimeout, readTimeout);
    establishProtocol(readTimeout, writeTimeout, connectionSpecSelector);
  }

这里就开始分开了,其中connectTimeout和readTimeout用于socket连接,而readTimeout和writeTimeout则是用于与http 2有关的设置。

2.2 connectSocket()分析

先看connectSocket()方法:

private void connectSocket(int connectTimeout, int readTimeout) throws IOException {
    Proxy proxy = route.proxy();
    Address address = route.address();
    rawSocket = proxy.type() == Proxy.Type.DIRECT || proxy.type() == Proxy.Type.HTTP
        ? address.socketFactory().createSocket()
        : new Socket(proxy);
    rawSocket.setSoTimeout(readTimeout);
    try {
      Platform.get().connectSocket(rawSocket, route.socketAddress(), connectTimeout);
    } catch (ConnectException e) {
      ConnectException ce = new ConnectException("Failed to connect to " + route.socketAddress());
      ce.initCause(e);
      throw ce;
    }
    source = Okio.buffer(Okio.source(rawSocket));
    sink = Okio.buffer(Okio.sink(rawSocket));
  }

可以看到:

  • readTimeout最终被用于rawSocket.setSoTimeout(),而setSoTimeout()的作用是在建立连接之后,对于InputStream进行read()操作时的时间限制,所以这里采用readTimeout
  • connectTimeout则会最终根据不同的平台进行设置,在Android系统上最终会调用AndroidPlatform的connectSocket()方法,如下:
@Override public void connectSocket(Socket socket, InetSocketAddress address,
     int connectTimeout) throws IOException {
   try {
     socket.connect(address, connectTimeout);
   } catch (AssertionError e) {
     if (Util.isAndroidGetsocknameError(e)) throw new IOException(e);
     throw e;
   } catch (SecurityException e) {
     // Before android 4.3, socket.connect could throw a SecurityException
     // if opening a socket resulted in an EACCES error.
     IOException ioException = new IOException("Exception in connect");
     ioException.initCause(e);
     throw ioException;
   }
 }

可见这里就是为socket设置连接超时,所以是使用connectTimeout.

2.3 establishProtocol()分析

再回到RealConnection的buildConnection()方法中,在调用完connectSocket()之后,就调用了establishProtocol()方法了:

private void establishProtocol(int readTimeout, int writeTimeout,
     ConnectionSpecSelector connectionSpecSelector) throws IOException {
   if (route.address().sslSocketFactory() != null) {
     connectTls(readTimeout, writeTimeout, connectionSpecSelector);
   } else {
     protocol = Protocol.HTTP_1_1;
     socket = rawSocket;
   }
   if (protocol == Protocol.HTTP_2) {
     socket.setSoTimeout(0); // Framed connection timeouts are set per-stream.
     Http2Connection http2Connection = new Http2Connection.Builder(true)
         .socket(socket, route.address().url().host(), source, sink)
         .listener(this)
         .build();
     http2Connection.start();
     // Only assign the framed connection once the preface has been sent successfully.
     this.allocationLimit = http2Connection.maxConcurrentStreams();
     this.http2Connection = http2Connection;
   } else {
     this.allocationLimit = 1;
   }
 }

可见如果是https连接则会调用connectTls()方法:

private void connectTls(int readTimeout, int writeTimeout,
      ConnectionSpecSelector connectionSpecSelector) throws IOException {
    Address address = route.address();
    SSLSocketFactory sslSocketFactory = address.sslSocketFactory();
    boolean success = false;
    SSLSocket sslSocket = null;
    try {
      // Create the wrapper over the connected socket.
      sslSocket = (SSLSocket) sslSocketFactory.createSocket(
          rawSocket, address.url().host(), address.url().port(), true /* autoClose */);
      // Configure the socket's ciphers, TLS versions, and extensions.
      ConnectionSpec connectionSpec = connectionSpecSelector.configureSecureSocket(sslSocket);
      if (connectionSpec.supportsTlsExtensions()) {
        Platform.get().configureTlsExtensions(
            sslSocket, address.url().host(), address.protocols());
      }
      // Force handshake. This can throw!
      sslSocket.startHandshake();
      Handshake unverifiedHandshake = Handshake.get(sslSocket.getSession());
      // Verify that the socket's certificates are acceptable for the target host.
      if (!address.hostnameVerifier().verify(address.url().host(), sslSocket.getSession())) {
        X509Certificate cert = (X509Certificate) unverifiedHandshake.peerCertificates().get(0);
        throw new SSLPeerUnverifiedException("Hostname " + address.url().host() + " not verified:"
            + "\n    certificate: " + CertificatePinner.pin(cert)
            + "\n    DN: " + cert.getSubjectDN().getName()
            + "\n    subjectAltNames: " + OkHostnameVerifier.allSubjectAltNames(cert));
      }
      // Check that the certificate pinner is satisfied by the certificates presented.
      address.certificatePinner().check(address.url().host(),
          unverifiedHandshake.peerCertificates());
      // Success! Save the handshake and the ALPN protocol.
      String maybeProtocol = connectionSpec.supportsTlsExtensions()
          ? Platform.get().getSelectedProtocol(sslSocket)
          : null;
      socket = sslSocket;
      source = Okio.buffer(Okio.source(socket));
      sink = Okio.buffer(Okio.sink(socket));
      handshake = unverifiedHandshake;
      protocol = maybeProtocol != null
          ? Protocol.get(maybeProtocol)
          : Protocol.HTTP_1_1;
      success = true;
    } catch (AssertionError e) {
      if (Util.isAndroidGetsocknameError(e)) throw new IOException(e);
      throw e;
    } finally {
      if (sslSocket != null) {
        Platform.get().afterHandshake(sslSocket);
      }
      if (!success) {
        closeQuietly(sslSocket);
      }
    }
  }

在这个调用中完成了握手以及证书校验,最后可以看到socket这个成员其实是SSLSocket对象。另外,在这里其实readTimeout和writeTimeout都没有用到,这两个参数其实是没必要传递进来的。

3.socket, source, sink的超时设置

3.1 超时设置主流程梳理

再回到StreamAllocation的newStream()方法中,可以看到在findHealthyConnection()这个调用中,由于我们是http 1.1协议,所以其实我们只用到了readTimeout和connectTimeout,而并没有用到writeTimeout.

之后,就调用如下代码:

resultConnection.socket().setSoTimeout(readTimeout);
resultConnection.source.timeout().timeout(readTimeout, MILLISECONDS);
resultConnection.sink.timeout().timeout(writeTimeout, MILLISECONDS);
resultCodec = new Http1Codec(
    client, this, resultConnection.source, resultConnection.sink);

1)通过刚刚的梳理,我们发现在AndroidPlatform中给rawSocket(java.net.Socket对象)设置过readTimeout和connectTimeout,而这里的resultConnection.socket()返回的并不是rawSocket,而是socket成员,在采用https连接时它跟rawSocket是不一样的,它其实是SSLSocket对象,所以这里setSoTimeout()并不跟之前的setSoTimeout()重复。

2)source是在哪里建立的呢?其实我们刚刚分析过,就是在RealConnection的connectSocket()方法中:

private void connectSocket(int connectTimeout, int readTimeout) throws IOException {
    Proxy proxy = route.proxy();
    Address address = route.address();
    rawSocket = proxy.type() == Proxy.Type.DIRECT || proxy.type() == Proxy.Type.HTTP
        ? address.socketFactory().createSocket()
        : new Socket(proxy);
    rawSocket.setSoTimeout(readTimeout);
    try {
      Platform.get().connectSocket(rawSocket, route.socketAddress(), connectTimeout);
    } catch (ConnectException e) {
      ConnectException ce = new ConnectException("Failed to connect to " + route.socketAddress());
      ce.initCause(e);
      throw ce;
    }
    source = Okio.buffer(Okio.source(rawSocket));
    sink = Okio.buffer(Okio.sink(rawSocket));
  }

可见source其实是先获取到rawSocket的输入流,然后调用Okio.buffer()进行包装,而sink则是先获取rawSocket的输出流,然后调用Okio.buffer()进行包装。先看一下Okio.source()方法:

public static Source source(Socket socket) throws IOException {
       if(socket == null) {
           throw new IllegalArgumentException("socket == null");
       } else {
           AsyncTimeout timeout = timeout(socket);
           Source source = source((InputStream)socket.getInputStream(), (Timeout)timeout);
           return timeout.source(source);
       }
   }

可见这里其实创建了一个AsyncTimeout对象,利用这个对象来实现超时机制,那具体是如何实现的呢?请看下一小节分析。

3.2 AsyncTimeout原理

Okio中的与source()有关的timeout()方法,如下:

private static AsyncTimeout timeout(final Socket socket) {
     return new AsyncTimeout() {
         protected IOException newTimeoutException(IOException cause) {
             InterruptedIOException ioe = new SocketTimeoutException("timeout");
             if(cause != null) {
                 ioe.initCause(cause);
             }
             return ioe;
         }
         protected void timedOut() {
             try {
                 socket.close();
             } catch (Exception var2) {
                 Okio.logger.log(Level.WARNING, "Failed to close timed out socket " + socket, var2);
             } catch (AssertionError var3) {
                 if(!Okio.isAndroidGetsocknameError(var3)) {
                     throw var3;
                 }
                 Okio.logger.log(Level.WARNING, "Failed to close timed out socket " + socket, var3);
             }
         }
     };
 }

可见这里其实就是创建了一个AsyncTimeout对象,这个对象重写了newTimeoutException()和timedout()方法,这两个方法都是定义在AsyncTimeout()中,其中前者用于在超时时抛出指定的异常,如果没有指定则抛出InterruptedIOException,而后者其实是用于在超时发生时的回调,以完成相关的业务操作(在这里就是关闭socket)。

那AsyncTimeout是如何实现超时机制的呢?会不会在这里面有bug呢?

首先找到调用链为Sink.sink()/Source.read()—>AsyncTimeout.enter()—>AsyncTimeout.scheduleTimeout(),这个scheduleTimeout()是很关键的一个方法:

private static synchronized void scheduleTimeout(
      AsyncTimeout node, long timeoutNanos, boolean hasDeadline) {
    // Start the watchdog thread and create the head node when the first timeout is scheduled.
    if (head == null) {
      head = new AsyncTimeout();
      new Watchdog().start();
    }
    long now = System.nanoTime();
    if (timeoutNanos != 0 && hasDeadline) {
      // Compute the earliest event; either timeout or deadline. Because nanoTime can wrap around,
      // Math.min() is undefined for absolute values, but meaningful for relative ones.
      node.timeoutAt = now + Math.min(timeoutNanos, node.deadlineNanoTime() - now);
    } else if (timeoutNanos != 0) {
      node.timeoutAt = now + timeoutNanos;
    } else if (hasDeadline) {
      node.timeoutAt = node.deadlineNanoTime();
    } else {
      throw new AssertionError();
    }
    // Insert the node in sorted order. 在这里进行排序
    long remainingNanos = node.remainingNanos(now);
    for (AsyncTimeout prev = head; true; prev = prev.next) {
      if (prev.next == null || remainingNanos < prev.next.remainingNanos(now)) {
        node.next = prev.next;
        prev.next = node;
        if (prev == head) {
          AsyncTimeout.class.notify(); // Wake up the watchdog when inserting at the front.
        }
        break;
      }
    }
  }

这个方法主要做了如下两件事:

  • 如果是首次创建AsyncTimeout对象时,会启动Watchdog线程
  • 所有的AsyncTimeout对象构成一个链表,这个链表是按剩余时间由短到长排列的
  • 调用notify()以唤醒等待线程

那么这个等待线程是谁呢?其实就是Watchdog,看一下它定义就知道了:

private static final class Watchdog extends Thread {
   public Watchdog() {
     super("Okio Watchdog");
     setDaemon(true);
   }
   public void run() {
     while (true) {
       try {
         AsyncTimeout timedOut = awaitTimeout();
         // Didn't find a node to interrupt. Try again.
         if (timedOut == null) continue;
         // Close the timed out node.
         timedOut.timedOut();
       } catch (InterruptedException ignored) {
       }
     }
   }
 }

而awaitTimeout()方法如下:

private static synchronized AsyncTimeout awaitTimeout() throws InterruptedException {
    // Get the next eligible node.
    AsyncTimeout node = head.next;
    // The queue is empty. Wait for something to be enqueued.
    if (node == null) {
      AsyncTimeout.class.wait();
      return null;
    }
    long waitNanos = node.remainingNanos(System.nanoTime());
    // The head of the queue hasn't timed out yet. Await that.
    if (waitNanos > 0) {
      // Waiting is made complicated by the fact that we work in nanoseconds,
      // but the API wants (millis, nanos) in two arguments.
      long waitMillis = waitNanos / 1000000L;
      waitNanos -= (waitMillis * 1000000L);
      AsyncTimeout.class.wait(waitMillis, (int) waitNanos);  //这里其实是把waitNanos一分为二,比如1000003分为1ms和3ns,其实通过waitNanos/1000000L和waitNanos%1000000L也可以实现,不过采用减法更高效
      return null;
    }
    // The head of the queue has timed out. Remove it.
    head.next = node.next;
    node.next = null;
    return node;
  }

结合上面两个方法可知,Watchdog线程有个死循环,在每次循环中会取出链表的头部节点,然后检查它是否已经超时,如果还没则陷入等待;否则就将头部节点从链表中移除,然后返回头部的下一个节点,此时由于该节点已经超时了,所以可直接调用它的timedOut()方法。

3.3 System.nanoTime()

这里需要注意的一点是System.nanoTime()与System.currentTimeMillis()方法的区别:

  • System.nanoTime()返回的是纳秒,nanoTime可能是任意时间,甚至可能是负数,因为它可能以未来某个时间点为参照。所以nanoTime的用途不是绝对时间,而是衡量一个时间段,比如说一段代码执行所用的时间,获取数据库连接所用的时间,网络访问所用的时间等。另外,nanoTime提供了纳秒级别的精度,但实际上获得的值可能没有精确到纳秒。
  • System.currentTimeMillis()返回的毫秒,这个毫秒其实就是自1970年1月1日0时起的毫秒数,Date()其实就是相当于Date(System.currentTimeMillis());因为Date类还有构造Date(long date),用来计算long秒与1970年1月1日之间的毫秒差

可见,Okio中使用System.nanoTime()来衡量时间段是一个很好的选择,既保证了足够的精度,又能保证不受系统时间的影响,因为如果采用System.currentTimeMillis()的话如果在超时等待的过程中系统时间发生变化,那么这个超时机制就可能会提前或延后,那样显然是不可靠的。

3.4 okhttp超时总结

再回到3.1节开头,它们调用的timeout()方法其实是Timeout类中的方法:

public Timeout timeout(long timeout, TimeUnit unit) {
   if (timeout < 0) throw new IllegalArgumentException("timeout < 0: " + timeout);
   if (unit == null) throw new IllegalArgumentException("unit == null");
   this.timeoutNanos = unit.toNanos(timeout);
   return this;
 }

显然,这里就是将传入的时间转化为纳秒,这个timeoutNanos在scheduleTimeout()会用到。

综合前面3个小节,可以得到如下结论:

  • Source,Sink对象的超时都是通过Timeout的子类AsyncTimeout来实现的
  • 所有的AsyncTimeout对象构成一个链表
  • 每个AsyncTimeout在会按照它的剩余时间来插入到链表中的合适位置
  • 有一个叫Watchdog的daemon线程会维护该链表,如果发现链表头部节点还没超时,则会陷入等待;否则将该节点从表中移除,并且调用它的timedout()方法,在该方法中会完成相应的操作,比如socket.close()操作

目前看来,okhttp以及okio的超时机制的实现是足够可靠和准确的,并没有发现什么bug,既然这样,那只能从其他地方入手了。

4.竟然是默认参数的锅

既然okhttp的超时机制没什么问题,那就从业务直接调用okhttp的代码入手吧,由于是调用Retrofit中Call.enqueue()方法,那就从这个方法入手吧。

看过我博客中Retrofit源码分析的同学,应该知道其实这里的Call其实是OkHttpCall对象,这个类是为了将Retrofit与okhttp进行衔接而创造的,它的enqueue()方法如下:

@Override public void enqueue(final Callback<T> callback) {
    if (callback == null) throw new NullPointerException("callback == null");
    okhttp3.Call call;
    Throwable failure;
    synchronized (this) {
      if (executed) throw new IllegalStateException("Already executed.");
      executed = true;
      call = rawCall;
      failure = creationFailure;
      if (call == null && failure == null) {
        try {
          call = rawCall = createRawCall();
        } catch (Throwable t) {
          failure = creationFailure = t;
        }
      }
    }
    if (failure != null) {
      callback.onFailure(this, failure);
      return;
    }
    if (canceled) {
      call.cancel();
    }
    call.enqueue(new okhttp3.Callback() {
      @Override public void onResponse(okhttp3.Call call, okhttp3.Response rawResponse)
          throws IOException {
        Response<T> response;
        try {
          response = parseResponse(rawResponse);
        } catch (Throwable e) {
          callFailure(e);
          return;
        }
        callSuccess(response);
      }
      @Override public void onFailure(okhttp3.Call call, IOException e) {
        try {
          callback.onFailure(OkHttpCall.this, e);
        } catch (Throwable t) {
          t.printStackTrace();
        }
      }
      private void callFailure(Throwable e) {
        try {
          callback.onFailure(OkHttpCall.this, e);
        } catch (Throwable t) {
          t.printStackTrace();
        }
      }
      private void callSuccess(Response<T> response) {
        try {
          callback.onResponse(OkHttpCall.this, response);
        } catch (Throwable t) {
          t.printStackTrace();
        }
      }
    });
  }

显然,这个方法的主要目的就是调用okhttp3.Call的enqueue()方法并且将okhttp3.Call的回调最终转换为Retrofit中的回调。而这里的call其实是okhttp3.RealCall对象(因为OkHttpCall中的createRawCall()调用serviceMethod.callFactory.newCall(),而callFactory其实就是OkHttpClient对象,OkHttpClient的newCall()方法返回的是RealCall对象),RealCall的enqueue()方法如下:

@Override public void enqueue(Callback responseCallback) {
   synchronized (this) {
     if (executed) throw new IllegalStateException("Already Executed");
     executed = true;
   }
   captureCallStackTrace();
   client.dispatcher().enqueue(new AsyncCall(responseCallback));
 }

显然,这个方法创建了一个AsyncCall对象并且调用dispatcher()这个调度器来处理:

synchronized void enqueue(AsyncCall call) {
    if (runningAsyncCalls.size() < maxRequests && runningCallsForHost(call) < maxRequestsPerHost) {
      runningAsyncCalls.add(call);
      executorService().execute(call);
    } else {
      readyAsyncCalls.add(call);
    }
  }

这个方法非常重要,因为就是在这里潜藏着用户等待时间比timeout更长的危险,注意这里的两个限制条件:

  • 第一个是当前运行的请求数必须小于maxRequests,否则就加入等待队列中。而maxRequests默认值是64
  • 第二个是runningCallsForHost(call)必须小于maxRequestsPerHost,也就是说属于当前请求的host的请求数必须小于maxRequestsPerHost,否则就先加入等待队列中。而maxRequestsPerHost默认值非常小,为5

再看一下调度器中线程池的创建:

public synchronized ExecutorService executorService() {
    if (executorService == null) {
      executorService = new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
          new SynchronousQueue<Runnable>(), Util.threadFactory("OkHttp Dispatcher", false));
    }
    return executorService;
  }

显然,调度用的线程池足够大,一般情况下maxRequests默认为64也足够使用了。

但是! 凡事就怕个但是!

如果是弱网环境,请求密集,并且timeout设置得比较大的情况下呢?

那么,就有可能发生如下情况:

  • 正在运行的请求数在短时间内(极端一点,比如3s内)就超过maxRequests,那么在3s之后的请求都只能先进入等待队列,然后如果网络足够差,每个连接都是等到发生超时异常后被迫关闭,那么就意味着在3s之后的请求至少要等待timeout-3s的时间,这个时间再加上它自身的timeout,那么用户的等待时间就是timeout-3s+timeout,显然这个值远大于timeout了
  • 虽然总的请求数不密集,但是恰好在某个很短的时间段内针对同一个host的请求比较密集(类似地,比如3s内),那么在3s之后针对这个host的请求也要先进入等待队列中,同样地在这之后的请求,用户至少要等待timeout-3s+timeout的时间

再结合业务中的初始化代码发现,并没有对于Dispatcher中的maxRequestsPerHost进行自定义设置,也就意味着同一时间对于每个host的请求数不能大于5,那么考虑到我分析的这个业务请求对应的host下有很多请求,那就很有可能是这个原因导致的,并且业务同学在这个地方其实也犯了一个低级错误,就是在用户点击隐藏加载框时,没有及时取消掉对应的请求,这样其实也造成了请求的浪费。

为了验证这个结论,查看了10多位发生超时远大于timeout的用户日志,发现都是在Ta们的网络环境切换到2G或者是无网,并且在某个时间段内请求密集时就会发生,说明这个结论是可靠的。

4.解决方法及使用okhttp的建议

找到了原因之后,解决办法就很简单了,这其实也是使用okhttp的一点建议:

  • 初始化okhttp时,将Dispatcher中maxRequests和maxRequestsPerHost都设置得比默认值大一些
  • 当用户点击隐藏加载框时,需要把对应的请求也及时取消掉
  • timeout尽量设置得小一些(比如10s),这样可以减小弱网环境下手机的负载,同时对于用户体验也有好处

猜你喜欢

转载自blog.csdn.net/hyb1234hi/article/details/84974980