Tutorial: Common Proxy IP Settings for a Java Crawler

Does a crawler have to use a proxy IP? Many users believe it does, and that a crawler without one cannot get anywhere; others say a proxy IP is not essential. So what are the reasons behind each view?
Some users write their own crawlers for company tasks that involve crawling hundreds of thousands of pages a day, and on busy days up to millions. Partway through the crawl their IP gets blocked, and without a proxy IP the job simply cannot be done. For them, a crawler without a proxy IP cannot get anywhere.
Both sides have a point, and each has experience to back it up. In essence, a crawler is just another user visiting web pages, only a badly behaved one. Servers generally do not welcome this kind of user and use various means to detect and block it. The most common check is access frequency: ordinary people do not load pages very quickly, so when an IP is found to be accessing the site too often, it gets banned.
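Because the ban is triggered by access frequency, a crawler running on a single IP has to space out its requests. The following is a minimal sketch of such a throttle, not part of the original code; the class name `RequestThrottle` and its methods are hypothetical, and the one-second interval used in the usage note is only an illustrative value.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: enforces a minimum interval between requests sent from one IP. */
class RequestThrottle {
    private final long minIntervalMillis;
    private long nextAllowedAt = 0L; // earliest time (ms) at which the next request may start

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Reserves the next request slot and returns how many ms the caller must wait first. */
    synchronized long reserve(long nowMillis) {
        long wait = Math.max(0L, nextAllowedAt - nowMillis);
        nextAllowedAt = nowMillis + wait + minIntervalMillis;
        return wait;
    }

    /** Blocks the calling thread until its reserved slot arrives. */
    void acquire() throws InterruptedException {
        long wait = reserve(System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }

    /** Days needed to fetch the given number of pages at this interval from a single IP. */
    double daysFor(long pages) {
        return pages * (double) minIntervalMillis / TimeUnit.DAYS.toMillis(1);
    }
}
```

Calling `acquire()` before each request keeps one IP under the chosen rate. The cost is throughput: with a one-second interval, `new RequestThrottle(1000).daysFor(1_000_000)` comes to roughly 11.6 days for a million pages.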
When the task is larger, say millions of records a day, crawling slowly will not finish the job in time, while crawling faster puts too much load on the target server, which then blocks the IP, so the task still cannot be completed. The only way out is to use proxy IPs. Here I use a good-quality proxy that we have relied on for a long time, the 16yun (亿牛云) proxy. I use their crawler proxy (dynamic forwarding), which works differently from the usual API mode. It is simpler and more convenient, and is definitely the best choice for the lazy.
The specific code is as follows:
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;
import java.net.URL;
import java.util.Random;

    class ProxyAuthenticator extends Authenticator {
        private String user, password;

        public ProxyAuthenticator(String user, String password) {
            this.user     = user;
            this.password = password;
        }

        protected PasswordAuthentication getPasswordAuthentication() {
            return new PasswordAuthentication(user, password.toCharArray());
        }
    }

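    /**
     * Note: the code below only implements a single HTTP request. Each request is stateless;
     * the exit IP is switched for this request, and the next request will go out through a different IP.
     * For multi-threaded access, just embed the code below in your own business logic and every
     * request will use a new IP. If you are worried about duplicate IPs, you can track IP usage
     * yourself and validate it.
     */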
    public class Demo {
        public static void main(String[] args) throws Exception {
            // Target page to access
            String targetUrl = "http://httpbin.org/ip";

            // Proxy server
            String proxyServer = "t.16yun.cn";
            int proxyPort      = 31111;

            // Proxy tunnel credentials
            String proxyUser  = "username";
            String proxyPass  = "password";

            try {
                URL url = new URL(targetUrl);

                Authenticator.setDefault(new ProxyAuthenticator(proxyUser, proxyPass));

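                // Create the proxy server address object
                InetSocketAddress addr = new InetSocketAddress(proxyServer, proxyPort);
                // Create an HTTP-type proxy object
                Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);

                // Access the target page through the proxy
                HttpURLConnection connection = (HttpURLConnection) url.openConnection(proxy);
                // Set Proxy-Tunnel (optional; uncomment to send a random tunnel value)
                // Random random = new Random();
                // int tunnel = random.nextInt(10000);
                // connection.setRequestProperty("Proxy-Tunnel", String.valueOf(tunnel));

                // Read the response data
                byte[] response = readStream(connection.getInputStream());

                // Decode explicitly as UTF-8 instead of relying on the platform default charset
                System.out.println(new String(response, java.nio.charset.StandardCharsets.UTF_8));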
            } catch (Exception e) {
                System.out.println(e.getLocalizedMessage());
            }
        }

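        /**
         * Reads the input stream fully into a byte array and closes it.
         *
         * @param inStream the stream to read
         * @return all bytes read from the stream
         * @throws Exception if reading or closing the stream fails
         */
        public static byte[] readStream(InputStream inStream) throws Exception {
            ByteArrayOutputStream outStream = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int len = -1;

            // Copy the stream in 1 KB chunks until end of stream (-1)
            while ((len = inStream.read(buffer)) != -1) {
                outStream.write(buffer, 0, len);
            }
            outStream.close();
            inStream.close();

            return outStream.toByteArray();
        }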
    }
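
The note on the Demo class suggests tracking IP usage yourself if you are worried about duplicates. One way this could look is sketched below. It is an illustration only: the class `ExitIpTracker` and its methods `parseOrigin` and `markUsed` are hypothetical names, and the sketch assumes the response body from http://httpbin.org/ip is JSON containing an "origin" field that holds the exit IP.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: remembers which exit IPs have already been used. */
class ExitIpTracker {
    // Assumption: the body looks like {"origin": "203.0.113.7"}
    private static final Pattern ORIGIN =
            Pattern.compile("\"origin\"\\s*:\\s*\"([^\"]+)\"");

    // Thread-safe set, so several crawler threads can share one tracker
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Extracts the "origin" value from the response body, or null if it is absent. */
    static String parseOrigin(String body) {
        Matcher m = ORIGIN.matcher(body);
        return m.find() ? m.group(1).trim() : null;
    }

    /** Returns true if this IP is new, false if it has been seen before. */
    boolean markUsed(String ip) {
        return seen.add(ip);
    }
}
```

In the Demo above, the decoded response string could be passed to `ExitIpTracker.parseOrigin`, and the result to `markUsed` on a tracker shared across threads; a `false` return means the proxy handed out an IP that was already used, and the caller can decide whether to retry.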


Origin blog.51cto.com/14400115/2416766