Using Jsoup to Inflate CSDN Blog View Counts

Copyright notice: if you repost this article, please credit the source. Thank you! https://blog.csdn.net/lyc_liyanchao/article/details/82724785

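My company has recently been working on news scraping with Jsoup. Since I have only just started writing a CSDN blog, I wanted to test whether opening CSDN links with Jsoup would increase a post's view count. The answer: it does!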

When scraping web pages without an IP proxy, your own address may get banned, so we need a pool of proxy IPs and send each request through one of them.

Without further ado, here is the code.

  • Scraping utility class
package com.lyc.cn.ipProxy;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.springframework.util.StringUtils;

import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class JsoupUtils {

    private static Log logger = LogFactory.getLog(JsoupUtils.class);

    private static List<String> list = new ArrayList<String>();

    /**
     * Picks a random proxy from the pool.
     * @return a two-element array: {host, port}
     */
    public static String[] getRandomIp() {
        if (list.size() == 0) {
            list.add("212.77.130.65:30493");
            list.add("168.228.166.116:47059");
            list.add("54.38.202.253:54321");
            list.add("47.105.137.4:80");
            list.add("47.105.84.52:80");
            list.add("47.105.129.220:80");
            list.add("47.105.137.51:80");
            list.add("47.105.84.67:80");
            list.add("103.11.99.66:80");
            list.add("50.226.134.50:80");
            list.add("39.135.11.97:8080");
            list.add("111.7.130.101:8080");
            list.add("122.117.165.51:8080");
            list.add("47.105.131.35:80");
            list.add("47.105.137.135:80");
            list.add("47.105.115.176:80");
            list.add("157.65.28.91:3128");
            list.add("153.149.169.215:3128");
            list.add("140.143.105.229:80");
            list.add("117.127.0.201:8080");
        }

        Random random = new Random();
        int n = random.nextInt(list.size());
        return list.get(n).split(":");
    }

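    /**
     * Opens the given URL with Jsoup through a random proxy and returns the Document.
     * @param url the page to fetch
     * @return the parsed Document, or null if the fetch failed
     */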
    public static Document getDocument(String url) {
        try {
            Connection conn = Jsoup.connect(url).ignoreContentType(true).ignoreHttpErrors(true).userAgent("Mozilla");
            // Route the request through a randomly chosen proxy
            String[] ip = JsoupUtils.getRandomIp();
            Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(ip[0].trim(), Integer.parseInt(ip[1].trim())));
            conn.proxy(proxy);
            // Set the timeout (8 seconds) and fetch the Document
            Document document = conn.timeout(8000).get();
            // A non-empty document means the request succeeded; otherwise the proxy IP was probably blocked or failed
            if (null != document && !StringUtils.isEmpty(document.toString())) {
                System.out.println(proxy.toString());
                return document;
            }
        } catch (Exception e) {
            // Log the URL and the cause instead of swallowing the exception
            logger.error("Fetch failed: " + url, e);
        }
        return null;
    }

}
  • Test class
package com.lyc.cn.ipProxy;

import org.jsoup.nodes.Document;
import org.junit.Test;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;


public class MyTest {

    List<String> list = new ArrayList<>();

    @Test
    public void testDetail1() throws InterruptedException {
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82383245");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82383422");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82383797");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82384128");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82384247");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82384376");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82384794");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82384899");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82388043");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82388479");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82391647");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82424122");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82428726");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82432993");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82464980");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82493058");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82585752");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82591822");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82630434");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82633936");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82691306");
        list.add("https://blog.csdn.net/lyc_liyanchao/article/details/82696236");

        // Visit a randomly chosen post on each iteration
        Random random = new Random();
        for (int i = 0; i <= 1000; i++) {
            int n = random.nextInt(list.size());
            String url = list.get(n);
            Document doc = JsoupUtils.getDocument(url);
            if (null != doc) {
                System.out.println("Fetch #" + i + ", url: " + url);
            }
        }
    }

}

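First, we use a List to simulate a pool of proxy IPs and pick a random entry as the proxy for each request. Next, we store the blog URLs we want to visit in a second List and again pick one at random each time. By adjusting the bounds of the for loop, we control how many visits are made, which is how the view count gets inflated.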
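The "pick a random proxy from a list" step can also be isolated into a small class of its own. The following is only a sketch using the JDK alone: the `ProxyPool` class and its `parse` and `next` methods are hypothetical names that do not appear in the code above, and the addresses are simply two entries copied from the list in `JsoupUtils`. It validates each "host:port" entry instead of relying on `split(":")`, and it reuses a single `Random` rather than creating one per call.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of the original post): a fixed pool of "host:port" proxies. */
final class ProxyPool {

    private final List<String> entries;
    private final Random random;

    ProxyPool(List<String> entries, Random random) {
        if (entries.isEmpty()) {
            throw new IllegalArgumentException("proxy pool must not be empty");
        }
        this.entries = List.copyOf(entries); // defensive, immutable copy
        this.random = random;
    }

    /** Parses a "host:port" entry, tolerating surrounding whitespace. */
    static InetSocketAddress parse(String entry) {
        String trimmed = entry.trim();
        int colon = trimmed.lastIndexOf(':');
        if (colon <= 0 || colon == trimmed.length() - 1) {
            throw new IllegalArgumentException("expected host:port but got: " + entry);
        }
        String host = trimmed.substring(0, colon);
        // parseInt throws NumberFormatException (an IllegalArgumentException) on a bad port
        int port = Integer.parseInt(trimmed.substring(colon + 1));
        return new InetSocketAddress(host, port);
    }

    /** Returns an HTTP proxy built from a randomly chosen pool entry. */
    Proxy next() {
        String entry = entries.get(random.nextInt(entries.size()));
        return new Proxy(Proxy.Type.HTTP, parse(entry));
    }

    public static void main(String[] args) {
        ProxyPool pool = new ProxyPool(
                List.of("47.105.137.4:80", "39.135.11.97:8080"), new Random());
        InetSocketAddress address = (InetSocketAddress) pool.next().address();
        System.out.println(address.getHostString() + " on port " + address.getPort());
    }
}
```

With a class like this, `getDocument` could pass the result of `next()` to `conn.proxy(...)` directly. Taking the `Random` as a constructor argument is a design choice that makes the selection reproducible in tests by supplying a seeded instance.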

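This post is purely an experiment and is not meant to encourage anyone to inflate their own view counts; good articles will find readers anyway. Also, please do not use this to inflate my view counts. I do not know whether it could get my account banned...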

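You can also use Jsoup in your own projects to scrape data from other websites. It is simple to use: all you need is a basic grasp of CSS and of how CSS selectors work.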

Reposted from blog.csdn.net/lyc_liyanchao/article/details/82724785