jsoup timeout does not work as expected

Problem

When requesting data with jsoup (version 1.11.2), I set the timeout to 1 minute, yet the request timed out after only 30 seconds with SocketTimeoutException: Read timed out.

Sample code

Connection.Response res = Jsoup.connect(url).timeout(60000).ignoreContentType(true).execute();

Exception stack trace

java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:734)
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:706)
	at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:299)

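To be sure, I captured the traffic with Wireshark. After the client (192.168.8.12) sent the request (packet 366), the server (172.19.80.110) responded immediately (packet 368, a bare ACK carrying no data). Thirty seconds later the server still had not sent any data, so the client closed the connection and sent a FIN (packet 369).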

Solution

First, I checked the Javadoc:

/**
 * Set the total request timeout duration. If a timeout occurs, an {@link java.net.SocketTimeoutException} will be thrown.
 *
 * The default timeout is 30 seconds (30,000 millis). A timeout of zero is treated as an infinite timeout.
 *
 * Note that this timeout specifies the combined maximum duration of the connection time and the time to read
 * the full response.
 * @param millis number of milliseconds (thousandths of a second) before timing out connects or reads.
 * @return this Connection, for chaining
 * @see #maxBodySize(int)
 */
Connection timeout(int millis);

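According to the Javadoc, the timeout is the combined total of the connect time and the read time, with a default of 30 seconds. That clearly does not match what I observed.
Following the stack trace, I located the source code: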


org.jsoup.helper.HttpConnection

private static HttpURLConnection createConnection(Connection.Request req) throws IOException {
    final HttpURLConnection conn = (HttpURLConnection) (
        req.proxy() == null ?
        req.url().openConnection() :
        req.url().openConnection(req.proxy())
    );

    conn.setRequestMethod(req.method().name());
    conn.setInstanceFollowRedirects(false); // don't rely on native redirection support
    conn.setConnectTimeout(req.timeout());
    conn.setReadTimeout(req.timeout() / 2); // gets reduced after connection is made and status is read

    // unrelated code omitted
}

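Note that conn.setConnectTimeout(req.timeout()) sets the connect timeout to 60 s, while conn.setReadTimeout(req.timeout() / 2) sets the read timeout to 30 s (60 / 2). That matches the 30-second gap between packets 368 and 369 exactly, and it settles the question: jsoup's timeout does not behave quite as the Javadoc describes. More accurately, in this version the connect timeout is the value passed in, and the read timeout starts at half of that value.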
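The halving can be reproduced without jsoup. The following is a minimal sketch, not jsoup's code: it applies the same two setter calls from the excerpt above to a plain HttpURLConnection pointed at a local server socket that never answers. The class name HalvedReadTimeoutDemo and the method millisUntilReadTimeout are made up for this illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

class HalvedReadTimeoutDemo {

    /**
     * Configures a connection the way the createConnection excerpt does and
     * returns the milliseconds that passed before the read timed out,
     * or -1 if a response arrived.
     */
    static long millisUntilReadTimeout(int timeoutMillis) throws IOException {
        // Bound but never accept()ed: the OS completes the TCP handshake from
        // the backlog, so the connect succeeds, but no response is ever sent.
        try (ServerSocket silentServer = new ServerSocket(0)) {
            URI uri = URI.create("http://127.0.0.1:" + silentServer.getLocalPort() + "/");
            HttpURLConnection conn =
                (HttpURLConnection) uri.toURL().openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(timeoutMillis);   // full value, as in the excerpt
            conn.setReadTimeout(timeoutMillis / 2);  // half the value, as in the excerpt

            long start = System.nanoTime();
            try {
                conn.getResponseCode();              // blocks waiting for the status line
                return -1;
            } catch (SocketTimeoutException e) {
                return (System.nanoTime() - start) / 1_000_000;
            } finally {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // With timeout = 1000 ms the read should give up after roughly half of that.
        System.out.println("Read timed out after ~" + millisUntilReadTimeout(1000) + " ms");
    }
}
```

The elapsed time tracks the read timeout (half the configured value) rather than the value itself, which is the same pattern the packet capture shows.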

Summary

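In the end, this problem comes back to fundamentals: the two timeouts of a TCP connection (the same distinction as connectTimeout vs. socketTimeout in HttpClient). One is the connect timeout and the other is the read timeout. They correspond to the following Java APIs: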

java.net.Socket
connect timeout

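connect(SocketAddress endpoint, int timeout) 
          Connects this socket to the server with a specified timeout value.

The connect timeout bounds the time allowed for the TCP three-way handshake.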


read timeout

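setSoTimeout(int timeout) 
          Enables/disables SO_TIMEOUT with the specified timeout, in milliseconds.

The read timeout is the maximum gap allowed between one data packet and the next, that is, how long a single read may block waiting for data. It is not the time allowed for reading the entire response.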
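To illustrate that the read timeout applies to each wait for data rather than to the whole transfer, here is a small sketch using only java.net sockets. The class ReadTimeoutIsPerReadDemo, its Result type, and the chosen delays are hypothetical, picked only for this demonstration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutIsPerReadDemo {

    /** Outcome of one run. */
    static final class Result {
        final int bytesReceived;
        final long elapsedMillis;
        final boolean timedOut;

        Result(int bytesReceived, long elapsedMillis, boolean timedOut) {
            this.bytesReceived = bytesReceived;
            this.elapsedMillis = elapsedMillis;
            this.timedOut = timedOut;
        }
    }

    /**
     * Starts a local server that sends `chunks` single bytes, sleeping
     * `gapMillis` before each one, and reads them with SO_TIMEOUT set to
     * `soTimeoutMillis`.
     */
    static Result readFromSlowServer(int chunks, int gapMillis, int soTimeoutMillis)
            throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread writer = new Thread(() -> {
                try (Socket peer = server.accept();
                     OutputStream out = peer.getOutputStream()) {
                    for (int i = 0; i < chunks; i++) {
                        Thread.sleep(gapMillis);
                        out.write('x');
                        out.flush();
                    }
                } catch (IOException | InterruptedException ignored) {
                    // The client gave up or the demo was interrupted; nothing to do.
                }
            });
            writer.setDaemon(true);
            writer.start();

            long start = System.nanoTime();
            int received = 0;
            boolean timedOut = false;
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setSoTimeout(soTimeoutMillis);
                InputStream in = client.getInputStream();
                while (in.read() != -1) {   // each read() gets a fresh timeout
                    received++;
                }
            } catch (SocketTimeoutException e) {
                timedOut = true;
            }
            long elapsed = (System.nanoTime() - start) / 1_000_000;
            return new Result(received, elapsed, timedOut);
        }
    }

    public static void main(String[] args) throws IOException {
        // Six bytes, 100 ms apart: longer in total than the 400 ms timeout,
        // but no single gap exceeds it.
        Result slow = readFromSlowServer(6, 100, 400);
        System.out.println("slow server: timedOut=" + slow.timedOut
            + ", bytes=" + slow.bytesReceived + ", elapsed=" + slow.elapsedMillis + " ms");

        // One gap of 1500 ms against a 200 ms timeout: the read gives up.
        Result silent = readFromSlowServer(1, 1500, 200);
        System.out.println("silent server: timedOut=" + silent.timedOut);
    }
}
```

In the first run the transfer as a whole outlasts the timeout without failing, because each byte arrives within the allowed gap. Only the second run, where a single wait exceeds the timeout, ends in SocketTimeoutException.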


Understanding these two concepts correctly makes problems like this one much easier to diagnose.
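Applied to the jsoup case, one possible workaround for version 1.11.2 is to pass twice the read timeout you actually want. The helper below is a hypothetical sketch (timeoutArgumentFor is not part of jsoup), and it assumes the halving shown in the createConnection excerpt; other jsoup versions may behave differently. Note that the connect timeout doubles as well.

```java
class JsoupTimeoutWorkaround {

    /**
     * Returns the value to pass to Connection.timeout(int) so that the
     * halved read timeout equals desiredReadTimeoutMillis.
     * Zero stays zero, which jsoup documents as "infinite".
     */
    static int timeoutArgumentFor(int desiredReadTimeoutMillis) {
        if (desiredReadTimeoutMillis < 0) {
            throw new IllegalArgumentException("timeout must not be negative");
        }
        // Compute in long and clamp so that very large values do not overflow int.
        long doubled = 2L * desiredReadTimeoutMillis;
        return (int) Math.min(doubled, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        int argument = timeoutArgumentFor(60_000);
        System.out.println(argument); // prints 120000

        // With jsoup 1.11.2 on the classpath, the call would then look like:
        // Jsoup.connect(url).timeout(argument).ignoreContentType(true).execute();
    }
}
```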

Reposted from blog.csdn.net/wangjun5159/article/details/95312749