CSDN crawler: automatic commenting

Foreword: I noticed that my CSDN blog was receiving automatic comments from bots. Many blogs comment on other people's articles this way, since the other authors may visit back, like, or follow, and the accounts involved generally end up with high total points. To bury those machine comments, this article implements a Java crawler that automatically comments on every article of one's own blog.

First, preparation and analysis

Tool: webmagic

Material: randomly generated comment phrases, loaded from a file


1. Create a file of randomly generated comment phrases, and a class CommentLoad that loads those phrases automatically.

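import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicBoolean;

// Assumed to be the Apache Commons IO class; the original post does not show its imports
import org.apache.commons.io.IOUtils;

/**
 * Loads comment phrases from a file.
 */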
public class CommentLoad {

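	private AtomicBoolean inited = new AtomicBoolean(false);

	// Loaded comment phrases (the field is named urls, but it holds comments)
	private volatile List<String> urls = new ArrayList<>();

	// Default refresh interval: 20 seconds
	private static final long DEFAULT_REFRESH_TIME = 20000L;
	private static final String DEFAULT_PATH = "comment.txt";

	// Time of the last load
	private long beforeTime;

	// Time at which the cached list expires
	private volatile long endTime;

	// Refresh interval in milliseconds
	private long refreshTime = DEFAULT_REFRESH_TIME;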

	public CommentLoad() {
	}

	public CommentLoad(long refreshTime) {
		super();
		this.refreshTime = refreshTime;
	}

	public static void main(String[] args) throws InterruptedException, IOException {

		// Function 1): load the list of comment phrases from the file
		String path = DEFAULT_PATH;
		CommentLoad commentLoad = new CommentLoad();
		int i = 0;
		while (true) {
			Thread.sleep(1000L);
			List<String> list = commentLoad.loadComments(path);
			System.out.println("Tick: " + ++i);
			System.out.println(list.size());
			System.out.println(list);
		}

		// Function 2): generate many comment phrases and write them to the file
		// path =
		// CommentLoad.class.getClassLoader().getResource(path).getPath();
		//
		// System.out.println(path);
		//
		// // Write the comments to the comment file
		// PrintWriter printWriter = new PrintWriter(new FileWriter(path,
		// false));
		// String[] str = new String[] { "文章", "很好", "思路清晰,", "大佬", "66", "加油",
		// "学习了", "你真棒!" };
		// for (int i = 0; i < 50; i++) {
		// // System.out.println(flushArrToString(str));
		// printWriter.println(flushArrToString(str));
		// printWriter.flush();
		// }
		// printWriter.close();
	}

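	/**
	 * Shuffles the array in place (Fisher-Yates) and concatenates its elements.
	 */
	public static <T> String flushArrToString(T[] arr) {
		// Walk down from the last index, swapping each slot with a random slot
		// at or below it. The bound is index + 1 so that an element may also
		// stay in place; with a bound of index, some orderings can never occur.
		for (int index = arr.length - 1; index > 0; index--) {
			int num = createRandom(index + 1);
			T temp = arr[num];
			arr[num] = arr[index];
			arr[index] = temp;
		}
		StringBuilder builder = new StringBuilder();
		for (T t : arr) {
			builder.append(t.toString());
		}
		return builder.toString();
	}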

	public static int createRandom(int end) {
		return (new Random().nextInt(end));
	}

	/**
	 * Returns the cached comment list, reloading it first if it has expired.
	 */
	public List<String> loadComments(String path) {
		path = path == null ? DEFAULT_PATH : path;
		if (!inited.get() || System.currentTimeMillis() > this.endTime) {
			readComments(path);
		}
		return urls;
	}

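	/**
	 * Reloads the comment file if it was never loaded or the cache has expired.
	 */
	private synchronized void readComments(String path) {
		if (!inited.get() || System.currentTimeMillis() > this.endTime) {
			try {
				// Use the requested path rather than a hard-coded file name
				urls = doReadComments(path);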
			} catch (IOException e) {
				e.printStackTrace();
			}
			this.beforeTime = System.currentTimeMillis();
			this.endTime = beforeTime + this.refreshTime;
			inited.set(true);
		}
	}

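	/**
	 * Reads the comment file from the classpath, one comment per line.
	 */
	private List<String> doReadComments(String path) throws FileNotFoundException, IOException {
		java.net.URL url = CommentLoad.class.getClassLoader().getResource(path);
		if (url == null) {
			// getResource returns null when the file is not on the classpath
			throw new FileNotFoundException("Comment file not found on classpath: " + path);
		}
		String res = url.getPath();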
		List<String> comments = new ArrayList<>();
		BufferedReader reader = null;
		try {
			reader = new BufferedReader(new FileReader(res));
			String line;
			while ((line = reader.readLine()) != null) {
				comments.add(line.trim());
			}
		} finally {
			if (reader != null) {
				IOUtils.closeQuietly(reader);
			}
		}
		return comments;
	}

}

The main job of this class is to load the comments from the given file path into a list, and to reload that list once the refresh interval has passed.
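
The reload-on-expiry idea can be looked at on its own. Below is a minimal sketch, not the class above: RefreshingList, its loader and its clock are hypothetical names introduced here, and the clock is passed in only so that the expiry can be exercised without waiting.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Caches a loaded list and reloads it once the refresh interval has passed. */
final class RefreshingList {
    private final Supplier<List<String>> loader; // hypothetical: stands in for reading comment.txt
    private final LongSupplier clock;            // hypothetical: injectable source of "now" in ms
    private final long refreshMillis;

    private List<String> cached;
    private long expiresAt;
    private boolean loaded;

    RefreshingList(Supplier<List<String>> loader, LongSupplier clock, long refreshMillis) {
        this.loader = loader;
        this.clock = clock;
        this.refreshMillis = refreshMillis;
    }

    /** Returns the cached list, reloading first if it was never loaded or has expired. */
    synchronized List<String> get() {
        long now = clock.getAsLong();
        if (!loaded || now > expiresAt) {
            cached = loader.get();
            expiresAt = now + refreshMillis;
            loaded = true;
        }
        return cached;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        int[] loads = {0};
        RefreshingList list = new RefreshingList(
                () -> { loads[0]++; return Arrays.asList("a", "b"); },
                () -> now[0],
                20_000L);
        list.get();       // first call loads
        list.get();       // still fresh, served from the cache
        now[0] = 20_001L; // move the fake clock past the expiry
        list.get();       // reloads
        System.out.println("loads = " + loads[0]);
    }
}
```

Keeping the check and the reload inside one synchronized method avoids the double check that CommentLoad performs, at the cost of taking the lock on every read.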

 

2. With the comment phrases ready, first test posting a single comment

Testing shows that posting a comment requires the article id and a logged-in session.

	String content = "这个文章非常好啊";  // Comment text
	String articleId = "109261723"; // Id of the article to comment on
	Request request = new Request("https://blog.csdn.net/phoenix/web/v1/comment/submit");

	request.setMethod(HttpConstant.Method.POST);
	Map<String, Object> params = new HashMap<>();
	params.put("commentId", "");
	params.put("content", content);
	params.put("articleId", articleId);
	HttpRequestBody form = HttpRequestBody.form(params , "utf-8");
	request.setRequestBody(form);
	Spider.create(new ComentTest()).addRequest(request).thread(1).run(); // The login cookie must be set
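
For reference, a form body of this kind is an application/x-www-form-urlencoded string: each key and value is percent-encoded and the pairs are joined with &. The sketch below shows that encoding with the standard library only. FormBody and encode are hypothetical names used for illustration; webmagic builds the real body through its HTTP client, and details such as pair order may differ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative form encoder; not part of webmagic. */
final class FormBody {
    /** Percent-encodes each key and value as UTF-8 and joins the pairs with '&'. */
    static String encode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("commentId", "");
        params.put("content", "这个文章非常好啊");
        params.put("articleId", "109261723");
        System.out.println(encode(params));
    }
}
```

Non-ASCII text such as the Chinese comment is sent as percent-encoded UTF-8 bytes, which is why the encoding is named explicitly when the form is built.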

 

3. With the single-comment test working, commenting on many articles is mainly a question of where to collect the articles to comment on.

For example, they could be taken from the list of recently published blogs. This batch run instead comments on every article in a single blogger's article list. That list is paged, starting from https://blog.csdn.net/username/article/list/ followed by the page number.

	/**
	 * Automatic commenting for a single blogger
	 */
	public static void main(String[] args) {

		String user = "shuixiou1"; // CSDN user name
		int page = 3; // Number of pages in this user's article list

		String[] alls = createInitUrls(user, page);
		
		Spider.create(new CsdnConmentSpider()).addUrl(alls).thread(1).run();
	}

	/**
	 * Builds the initial set of list-page URLs
	 */
	private static String[] createInitUrls(String user, int page) {
		List<String> urls = new ArrayList<>();
		for (int i = 1; i <= page; i++) {
			urls.add(String.format(listUrl, user) + i);
		}
		String[] result = urls.toArray(new String[urls.size()]);
		return result;
	}
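
Each article link collected from a list page has the form https://blog.csdn.net/username/article/details/id, and the trailing id is the article id that the comment request needs. The following is a sketch of extracting it with a regular expression, which also drops any query string or fragment on the link. ArticleIds and extract are hypothetical names, and the assumption that ids are purely numeric rests only on the example ids in this post.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper for pulling the article id out of a detail URL. */
final class ArticleIds {
    // Group 1: user name, group 2: numeric article id; a trailing ?query or #fragment is ignored
    private static final Pattern DETAIL_URL =
            Pattern.compile("https://blog\\.csdn\\.net/([^/]+)/article/details/(\\d+)(?:[?#].*)?");

    /** Returns the article id, or an empty Optional if the URL is not a detail page. */
    static Optional<String> extract(String url) {
        Matcher m = DETAIL_URL.matcher(url);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("https://blog.csdn.net/shuixiou1/article/details/114371765?spm=1001"));
        System.out.println(extract("https://blog.csdn.net/shuixiou1/article/list/1"));
    }
}
```

Compared with taking everything after the last slash, this rejects URLs that are not detail pages instead of returning whatever text follows the slash.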

Second, the complete code

1. Code

package com.pc.demos.csdn;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

import org.jsoup.Jsoup;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.pc.util.CookieUtil;

import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Request;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.Spider;
import us.codecraft.webmagic.model.HttpRequestBody;
import us.codecraft.webmagic.processor.PageProcessor;
import us.codecraft.webmagic.utils.HttpConstant;

/**
 * Automatically comments on all articles of a single CSDN blogger
 */
public class CsdnConmentSpider implements PageProcessor {
	
	Logger logger = LoggerFactory.getLogger(getClass());

	// List page URL template
	private static final String listUrl = "https://blog.csdn.net/%s/article/list/";

	// Pattern for list page URLs
	private static final String listUrlRegex = "https://blog\\.csdn\\.net/(.+)/article/list/(.*)";

	// Pattern for article detail URLs
	private static final String detailUrlRegex = "https://blog\\.csdn\\.net/(.+)/article/details/(.*)";

	// Loader for the comment phrases
	private CommentLoad commentLoad = new CommentLoad();

	@Override
	public void process(Page page) {
		// List page request
		if(page.getRequest().getUrl().matches(listUrlRegex)) {
			List<String> list = page.getHtml().xpath("//div[@class='article-item-box csdn-tracking-statistics']/h4/a").all();
			for (String string : list) {
				String link = Jsoup.parse(string).select("a").attr("href");
				page.addTargetRequest(link);
			}
		// Detail page request
		} else if(page.getRequest().getUrl().matches(detailUrlRegex)){
			
			System.out.println("Detail page loaded: " + page.getRequest().getUrl());
			
			// Article id: the last path segment of the URL
			String articleId = page.getRequest().getUrl().substring(page.getRequest().getUrl().lastIndexOf("/") + 1,
					page.getRequest().getUrl().length());
			
			Request request = new Request("https://blog.csdn.net/phoenix/web/v1/comment/submit");
			
			request.setMethod(HttpConstant.Method.POST);
			Map<String, Object> params = new HashMap<>();
			List<String> comments = commentLoad.loadComments(null); 
			
			params.put("commentId", "");
			params.put("content", comments.get(new Random().nextInt(comments.size())));
			params.put("articleId", articleId);
			HttpRequestBody form = HttpRequestBody.form(params , "utf-8");
			request.setRequestBody(form);
			Map<String, Object> extras = new HashMap<>();
			extras.put("articleId", articleId);
			request.setExtras(extras);
			page.addTargetRequest(request);
		// Comment request
		} else {
			String res = page.getJson().jsonPath("$..data").toString();
			System.out.println("Comment posted, returned id: " + res);
		}
	}

	@Override
	public Site getSite() {
		Site site = Site.me().setCycleRetryTimes(3).setSleepTime(2000);
		// Note: ":authority", ":method", ":path" and ":scheme" are HTTP/2
		// pseudo-headers as displayed by browser dev tools. The HTTP client
		// derives them from the request itself, so they are not set by hand here.
		site.addHeader("accept", "application/json, text/javascript, */*; q=0.01");

		site.addHeader("accept-encoding", "gzip, deflate, br");
		site.addHeader("accept-language", "zh-CN,zh;q=0.9");
		site.addHeader("origin", "https://blog.csdn.net");
		site.addHeader("referer", "https://blog.csdn.net");
		
		// Set the cookie string copied from a logged-in session
		
		String cookieSpec = "################";
		
		CookieUtil.setSiteCookies(site, cookieSpec );
		
		return site;
	}

	/**
	 * Automatic commenting for a single blogger
	 */
	public static void main(String[] args) {

		String user = "shuixiou1"; // CSDN user name
		int page = 3; // Number of pages in this user's article list

		String[] alls = createInitUrls(user, page);
		
		Spider.create(new CsdnConmentSpider()).addUrl(alls).thread(1).run();
	}

	/**
	 * Builds the initial set of list-page URLs
	 */
	private static String[] createInitUrls(String user, int page) {
		List<String> urls = new ArrayList<>();
		for (int i = 1; i <= page; i++) {
			urls.add(String.format(listUrl, user) + i);
		}
		String[] result = urls.toArray(new String[urls.size()]);
		return result;
	}
}

2. Instructions for use

In one round of testing, the comment requests were not rate-limited.

1) The login cookie string must be set (in the code it has been replaced with ################).

2) Remember to change the CSDN blogger name to your own!

 

 

 

 

 


Origin blog.csdn.net/shuixiou1/article/details/114371765