Setting a proxy and HTTP headers for HttpClient

From: 深沉的船

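On our campus network you need a proxy to get online. So what do you do when a program has to fetch resources from the web? You can configure a proxy for the program as well. I am using HttpClient 3.1. Some sites only respond to requests that look like they come from a browser, but a program can imitate one, mainly by setting HTTP headers.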

The proxy is set with HostConfiguration.setProxy, as in the code below:

    package com.skydong;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
    import org.apache.commons.httpclient.Header;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpException;
    import org.apache.commons.httpclient.HttpStatus;
    import org.apache.commons.httpclient.UsernamePasswordCredentials;
    import org.apache.commons.httpclient.auth.AuthScope;
    import org.apache.commons.httpclient.methods.GetMethod;
    import org.apache.commons.httpclient.params.HttpMethodParams;

    public class HttpClientUse {

        public static void main(String[] args) throws HttpException, IOException {
            HttpClient httpClient = new HttpClient();

            // Route all requests through the proxy at localhost:808
            httpClient.getHostConfiguration().setProxy("localhost", 808);

            /* // Uncomment if the proxy requires authentication
            UsernamePasswordCredentials creds = new UsernamePasswordCredentials("chenlb", "123456");

            httpClient.getState().setProxyCredentials(AuthScope.ANY, creds);
            */

            // Set the default HTTP headers sent with every request
            // ("http.default-headers" is the key defined by HostParams.DEFAULT_HEADERS)
            List<Header> headers = new ArrayList<Header>();
            headers.add(new Header("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"));
            httpClient.getHostConfiguration().getParams().setParameter("http.default-headers", headers);

            GetMethod method = new GetMethod("http://www.baidu.com");
            // Retry up to 3 times, but never resend a request that was already fully sent
            method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                    new DefaultHttpMethodRetryHandler(3, false));
            try {
                int statusCode = httpClient.executeMethod(method);

                if (statusCode != HttpStatus.SC_OK) {
                    System.out.println("Method failed code=" + statusCode + ": " + method.getStatusLine());
                } else {
                    // Decode the body as gb2312; adjust if the target page uses another charset
                    System.out.println(new String(method.getResponseBody(), "gb2312"));
                }
            } finally {
                // Always return the connection, whether or not the request succeeded
                method.releaseConnection();
            }
        }
    }

If the proxy requires a username and password, remove the /* */ comment markers to enable authentication.


The key lines for authentication are:

    UsernamePasswordCredentials creds = new UsernamePasswordCredentials("chenlb", "123456");
    httpClient.getState().setProxyCredentials(AuthScope.ANY, creds);
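
To make it clearer what these credentials turn into on the wire, here is a minimal sketch that uses only the JDK, not HttpClient. It assumes the proxy asks for the Basic scheme; a proxy that uses Digest or NTLM exchanges something different. The class name ProxyBasicAuthSketch and the method basicProxyAuthValue are made up for this illustration. Under Basic, the client answers the proxy's challenge with a Proxy-Authorization header whose value is the word Basic followed by the Base64 encoding of username:password.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ProxyBasicAuthSketch {

    // Builds the value of a Proxy-Authorization header for the Basic scheme:
    // "Basic " + base64(username + ":" + password).
    // Hypothetical helper for illustration; HttpClient does this internally.
    static String basicProxyAuthValue(String username, String password) {
        String pair = username + ":" + password;
        // The example credentials are plain ASCII, so the charset choice
        // does not change the result here.
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.ISO_8859_1));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Same placeholder credentials as in the article.
        System.out.println("Proxy-Authorization: " + basicProxyAuthValue("chenlb", "123456"));
    }
}
```

Base64 is an encoding, not encryption, so anyone who can see the traffic between the program and the proxy can read the password.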

Setting the HTTP request headers:

    List<Header> headers = new ArrayList<Header>();
    headers.add(new Header("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"));
    httpClient.getHostConfiguration().getParams().setParameter("http.default-headers", headers);
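
For comparison, the same two ideas (a proxy plus a browser-like User-Agent) can be sketched with only the JDK's java.net classes. This is not the HttpClient 3.1 API, just a rough equivalent for readers without that library on the classpath. The class name JdkProxySketch and its helper methods are made up for this illustration; the proxy address and header value are the ones used above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

public class JdkProxySketch {

    static final String USER_AGENT = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";

    // Describes an HTTP proxy; createUnresolved skips the DNS lookup
    // until a connection is actually made.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    // Prepares (but does not open) a connection that will go through the
    // given proxy and send a browser-like User-Agent header.
    static HttpURLConnection prepare(String url, Proxy proxy) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
            conn.setRequestProperty("User-Agent", USER_AGENT);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://www.baidu.com", httpProxy("localhost", 808));
        // No network traffic has happened yet; this only shows the header that is queued.
        System.out.println("User-Agent to be sent: " + conn.getRequestProperty("User-Agent"));
    }
}
```

Unlike the "http.default-headers" parameter, which applies to every request made through the configured HttpClient, the header here is set on one connection at a time.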
 

In fact, this proxy approach can also be used to crawl sites that have anti-crawling measures.


Reposted from liuxinglanyue.iteye.com/blog/833720