Today I noticed that when superword fetches word definitions, the web page for uncommon words opens very slowly, taking more than 10 seconds. On inspection, it turned out that when Jsoup is used to fetch the word definition, the configured timeout of 3 seconds has no effect: a single call to the _getContent method takes more than 10 seconds. The code is as follows:
public static String getContent(String url) {
    String html = _getContent(url);
    int times = 0;
    while(StringUtils.isNotBlank(html) && html.contains("Sorry, requests from your ip are unusually frequent")){
        // use the new IP address
        ProxyIp.toNewIp();
        html = _getContent(url);
        if(++times > 2){
            break;
        }
    }
    return html;
}

private static String _getContent(String url) {
    Connection conn = Jsoup.connect(url)
            .header("Accept", ACCEPT)
            .header("Accept-Encoding", ENCODING)
            .header("Accept-Language", LANGUAGE)
            .header("Connection", CONNECTION)
            .header("Referer", REFERER)
            .header("Host", HOST)
            .header("User-Agent", USER_AGENT)
            .timeout(3000)
            .ignoreContentType(true);
    String html = "";
    try {
        html = conn.post().html();
        html = html.replaceAll("[\n\r]", "");
    }catch (Exception e){
        LOGGER.error("Get URL:" + url + "page error", e);
    }
    return html;
}
So I came up with a workaround. The core idea is that the main thread starts a worker thread to fetch the word definition and then sleeps for the specified timeout. When the timeout has elapsed, the main thread asks the worker thread for the result. If the fetch has still not finished at that point, the main thread returns an empty word definition. The code is as follows:
public static String getContent(String url) {
    long start = System.currentTimeMillis();
    String html = _getContent(url, 1000);
    LOGGER.info("Time to get Pinyin: {}", TimeUtils.getTimeDes(System.currentTimeMillis()-start));
    int times = 0;
    while(StringUtils.isNotBlank(html) && html.contains("Sorry, requests from your ip are unusually frequent")){
        // use the new IP address
        ProxyIp.toNewIp();
        html = _getContent(url);
        if(++times > 2){
            break;
        }
    }
    return html;
}

private static String _getContent(String url, int timeout) {
    Future<String> future = ThreadPool.EXECUTOR_SERVICE.submit(()->_getContent(url));
    try {
        Thread.sleep(timeout);
        return future.get(1, TimeUnit.NANOSECONDS);
    } catch (Throwable e) {
        LOGGER.error("Get web page exception", e);
    }
    return "";
}

private static String _getContent(String url) {
    Connection conn = Jsoup.connect(url)
            .header("Accept", ACCEPT)
            .header("Accept-Encoding", ENCODING)
            .header("Accept-Language", LANGUAGE)
            .header("Connection", CONNECTION)
            .header("Referer", REFERER)
            .header("Host", HOST)
            .header("User-Agent", USER_AGENT)
            .timeout(1000)
            .ignoreContentType(true);
    String html = "";
    try {
        html = conn.post().html();
        html = html.replaceAll("[\n\r]", "");
    }catch (Exception e){
        LOGGER.error("Get URL:" + url + "page error", e);
    }
    return html;
}
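One side effect of sleeping first is that every call waits the full timeout, even when the fetch finishes early. A possible variation, sketched here and not taken from the superword code, is to let `Future.get(timeout, unit)` do the waiting, so the caller returns as soon as the result is ready and gives up only when the deadline passes. The class name `TimedFetch`, the method `fetchWithTimeout`, and the private executor are hypothetical names introduced for illustration. The task is passed in as a plain `Callable<String>` so the sketch runs without Jsoup.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of superword.
class TimedFetch {
    // Daemon threads, so a fetch that is still stuck cannot keep the JVM alive.
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-fetch");
        t.setDaemon(true);
        return t;
    });

    // Runs the task on a worker thread and waits at most timeoutMillis for it.
    // Returns the task's result, or an empty string on timeout or failure.
    static String fetchWithTimeout(Callable<String> task, long timeoutMillis) {
        Future<String> future = EXECUTOR.submit(task);
        try {
            // Returns as soon as the task completes, without a fixed sleep.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Ask the worker to stop. Blocking socket reads may not react to
            // the interrupt, so the worker can keep running for a while.
            future.cancel(true);
            return "";
        } catch (InterruptedException e) {
            // Restore the interrupt flag for the caller and give up.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return "";
        } catch (ExecutionException e) {
            // The task itself threw an exception.
            return "";
        }
    }
}
```

With a helper like this, the two-argument `_getContent(url, timeout)` above could reduce to something like `TimedFetch.fetchWithTimeout(() -> _getContent(url), timeout)`. As in the original workaround, a timed-out request may still occupy a worker thread until the underlying connection gives up.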
The complete code for my change is in this commit:
https://github.com/ysc/superword/commit/e4bc3c4197af95a8d7519856c89d592515a1c18f