HttpClient requests fail with 400 "header too long"

After some of our crawlers had been requesting a proxied website for a few days, every request started failing with a 400 "header too long" error. Searching Google turned up no answer, so I read the HttpClient source code. The root cause is the accumulation of cookies (think of a browser whose cookies have not been cleared for a long time). The troubleshooting process follows, and the solutions are at the end of the article.
HttpClient request call chain:
org.apache.http.impl.client.InternalHttpClient#doExecute
org.apache.http.impl.client.InternalHttpClient#setupContext
if (context.getAttribute(HttpClientContext.COOKIE_STORE) == null) {
    context.setAttribute(HttpClientContext.COOKIE_STORE, this.cookieStore);
}
 
If no COOKIE_STORE attribute has been set explicitly on the context, HttpClient falls back to its own member variable cookieStore. Since we usually keep only one HttpClient instance, that cookieStore is effectively a singleton shared by every request.
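To make the fallback concrete, here is a minimal sketch in plain Java. It is a simplified model, not the real HttpClient classes: MiniClient, resolveStore and sharedStoreSizeAfter are hypothetical names, and a Map and a List stand in for the context and the cookie store. It only mirrors the null check shown above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of the fallback in setupContext; not the real HttpClient classes. */
class SharedStoreFallbackDemo {
    // Attribute key used by this model only.
    static final String COOKIE_STORE = "http.cookie-store";

    /** Hypothetical miniature client that owns one cookie store for its whole lifetime. */
    static class MiniClient {
        final List<String> cookieStore = new ArrayList<>();

        /** Mirrors the null check: use the client's store only when the context has none. */
        @SuppressWarnings("unchecked")
        List<String> resolveStore(Map<String, Object> context) {
            if (context.get(COOKIE_STORE) == null) {
                context.put(COOKIE_STORE, this.cookieStore);
            }
            return (List<String>) context.get(COOKIE_STORE);
        }
    }

    /** Sends `requests` requests, each with a fresh context, and returns the size of the client's own store. */
    static int sharedStoreSizeAfter(int requests, boolean contextBringsOwnStore) {
        MiniClient client = new MiniClient();
        for (int i = 0; i < requests; i++) {
            Map<String, Object> context = new HashMap<>();
            if (contextBringsOwnStore) {
                context.put(COOKIE_STORE, new ArrayList<String>());
            }
            // Pretend every response sets one new cookie.
            client.resolveStore(context).add("cookie-" + i);
        }
        return client.cookieStore.size();
    }

    public static void main(String[] args) {
        System.out.println(sharedStoreSizeAfter(3, false)); // prints 3
        System.out.println(sharedStoreSizeAfter(3, true));  // prints 0
    }
}
```

In the model, a context without its own store makes every request write into the one store owned by the client, which is why a long-lived client can keep collecting cookies.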
org.apache.http.impl.execchain.RedirectExec#execute:108
org.apache.http.impl.execchain.RetryExec#execute:86
org.apache.http.impl.execchain.ProtocolExec#execute
org.apache.http.impl.execchain.MainClientExec#execute
org.apache.http.protocol.HttpRequestExecutor#execute
org.apache.http.protocol.HttpRequestExecutor#doSendRequest
conn.sendRequestHeader(request);
org.apache.http.impl.conn.CPoolProxy#sendRequestHeader
org.apache.http.impl.io.AbstractMessageWriter#write
Sending of cookies:
for (final HeaderIterator it = message.headerIterator(); it.hasNext(); ) {
    final Header header = it.nextHeader();
    this.sessionBuffer.writeLine(
            lineFormatter.formatHeader(this.lineBuf, header));
}
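Since every header is written to the wire as a line, a Cookie header that carries all stored cookies grows with each stale cookie. The following rough sketch estimates how many cookies it takes to pass a size limit. The 8192-byte limit, the cookie names and the 32-character values are assumptions for illustration; the real limit depends on the server or proxy, and the header is assumed to use the usual "name=value; name=value" form on a single line.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Rough estimate of Cookie header growth; all sizes here are illustrative assumptions. */
class CookieHeaderSizeDemo {
    /** Builds a "Cookie: n1=v1; n2=v2" line from name=value pairs. */
    static String cookieHeader(List<String> pairs) {
        return "Cookie: " + String.join("; ", pairs);
    }

    /** Size in bytes of the Cookie header line for `count` stale session cookies. */
    static int headerBytes(int count) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Hypothetical cookie: a name that changes every time, with a 32-character value.
            pairs.add("SESSION_" + i + "=0123456789abcdef0123456789abcdef");
        }
        return cookieHeader(pairs).getBytes(StandardCharsets.US_ASCII).length;
    }

    /** Smallest cookie count whose header line is larger than the given limit. */
    static int cookiesUntilLimit(int limitBytes) {
        int count = 0;
        while (headerBytes(count) <= limitBytes) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int limit = 8192; // assumed limit, not taken from the failing server
        System.out.println("cookies needed to exceed " + limit + " bytes: " + cookiesUntilLimit(limit));
    }
}
```

With cookies of this size, a couple of hundred leftovers are enough to pass such a limit, which fits an error that only shows up after days of crawling.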
 
Storage of cookies:
org.apache.http.impl.execchain.ProtocolExec#execute:200
org.apache.http.HttpResponseInterceptor#process
org.apache.http.client.protocol.ResponseProcessCookies#process
org.apache.http.client.protocol.ResponseProcessCookies#processCookies:114
cookieStore.addCookie(cookie);
 
/**
 * Adds an {@link Cookie HTTP cookie}, replacing any existing equivalent cookies.
 * If the given cookie has already expired it will not be added, but existing
 * values will still be removed.
 *
 * @param cookie the {@link Cookie cookie} to be added
 *
 * @see #addCookies(Cookie[])
 *
 */
public synchronized void addCookie(final Cookie cookie) {
    if (cookie != null) {
        // first remove any old cookie that is equivalent
        cookies.remove(cookie);
        if (!cookie.isExpired(new Date())) {
            cookies.add(cookie);
        }
    }
}
 
Note that addCookie first removes any equivalent cookie and only then checks whether the new cookie has expired; a cookie that has not expired is added. At first glance nothing can leak here, so where is the problem? Debugging showed that the name of the session ID cookie set by our third-party website actually changes. A cookie with a new name is not equivalent to the old one, so the old cookies are never removed and more and more of them accumulate.
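The effect can be reproduced with a small model of the store in plain Java. This is a sketch, not the HttpClient implementation: SimpleCookie and SimpleCookieStore are hypothetical classes, the expiry check is left out, and cookie identity is assumed to be name, domain and path, which is my understanding of how HttpClient 4.x decides that two cookies are equivalent.

```java
import java.util.TreeSet;

/** Model of remove-then-add with identity-based equivalence; not the real BasicCookieStore. */
class CookieAccumulationDemo {
    /** Minimal cookie: identity is (name, domain, path); the value is not part of it. */
    static class SimpleCookie {
        final String name;
        final String domain;
        final String path;
        final String value;

        SimpleCookie(String name, String domain, String path, String value) {
            this.name = name;
            this.domain = domain;
            this.path = path;
            this.value = value;
        }
    }

    /** Assumed equivalence rule: same name, same domain (ignoring case), same path. */
    static int compareIdentity(SimpleCookie a, SimpleCookie b) {
        int result = a.name.compareTo(b.name);
        if (result == 0) {
            result = a.domain.compareToIgnoreCase(b.domain);
        }
        if (result == 0) {
            result = a.path.compareTo(b.path);
        }
        return result;
    }

    static class SimpleCookieStore {
        private final TreeSet<SimpleCookie> cookies =
                new TreeSet<SimpleCookie>(CookieAccumulationDemo::compareIdentity);

        void addCookie(SimpleCookie cookie) {
            cookies.remove(cookie); // only removes a cookie with the same identity
            cookies.add(cookie);
        }

        int size() {
            return cookies.size();
        }
    }

    /** Store size after `responses` responses that each set one session cookie. */
    static int storeSizeAfter(int responses, boolean nameChanges) {
        SimpleCookieStore store = new SimpleCookieStore();
        for (int i = 0; i < responses; i++) {
            String name = nameChanges ? "SESSION_" + i : "SESSION";
            store.addCookie(new SimpleCookie(name, "example.com", "/", "value-" + i));
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println(storeSizeAfter(5, false)); // prints 1
        System.out.println(storeSizeAfter(5, true));  // prints 5
    }
}
```

With a stable name the store always holds one cookie, because each new value replaces the old one. With a changing name nothing is ever replaced, so the store grows by one cookie per response.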
Solution ①: Disable cookies
CloseableHttpClient httpClient = HttpClientBuilder.create().setConnectionManager(connManager)
                              .setRetryHandler(retryHandler).setDefaultRequestConfig(config).disableCookieManagement().build();
The disableCookieManagement() method stops HttpClient from sending and storing cookies.
/**
 * Disables state (cookie) management.
 * <p/>
 * Please note this value can be overridden by the {@link #setHttpProcessor(
 * org.apache.http.protocol.HttpProcessor)} method.
 */
public final HttpClientBuilder disableCookieManagement() {
    this.cookieManagementDisabled = true;
    return this;
}
 
Once cookie management is disabled, the response interceptors held in org.apache.http.protocol.ImmutableHttpProcessor#responseInterceptors no longer include ResponseProcessCookies, so cookies are no longer stored. Observing subsequent requests confirms that the request header no longer contains a Cookie field.
Interceptor registration code: org.apache.http.impl.client.HttpClientBuilder#build:839
if (!cookieManagementDisabled) {
    b.add(new RequestAddCookies());
}
if (!cookieManagementDisabled) {
    b.add(new ResponseProcessCookies());
}
 
Solution ②: Set a separate context
HttpClientContext context = HttpClientContext.create();
context.setCookieStore(new BasicCookieStore());
CloseableHttpResponse response = httpClient.execute(httpGet, context);
 
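With this in place, the cookie store of the context is not null, so HttpClient no longer falls back to its own member variable cookieStore. Create a new context for each request (or each crawl session); a context that is reused indefinitely would accumulate cookies in the same way.
Choose whichever of the two solutions fits your situation.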
