Cracking Slider CAPTCHAs with a Java Crawler

Technology used: Java + Selenium

Preamble:

        Where there are crawlers, there are naturally anti-crawlers. Like viruses and antivirus software, attack and defense push each other forward. The most popular anti-crawling technique today is the verification code: to stop crawlers from automatically registering spam accounts in bulk, almost every website's registration page uses one. Its English name, CAPTCHA, stands for Completely Automated Public Turing test to tell Computers and Humans Apart. In other words, it is a fully automated public Turing test that tells computers and humans apart, and anyone who passes it is treated as human. It follows that the key to cracking a slider CAPTCHA is making the computer imitate human behavior more convincingly.


Cracking the Notchless Slider

The notchless slider is shown below:

 

Slider page code:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
    <meta http-equiv="Pragma" content="no-cache">
    <meta http-equiv="Expires" content="0">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,minimum-scale=1,user-scalable=no">
    <meta content="yes" name="apple-mobile-web-app-capable">
    <meta content="black" name="apple-mobile-web-app-status-bar-style">
    <meta content="telephone=no" name="format-detection">
    <meta content="email=no" name="format-detection">
    <title>Drag the slider to verify</title>
    <meta name="description" content="">
    <meta name="keywords" content="">
    <link rel="stylesheet" type="text/css" href="">
    <style>
        * {
            margin: 0;
            padding: 0;
        }

        body {
            font: 12px/1.125 Microsoft YaHei;
            background: #fff;
        }

        ul, li {
            list-style: none;
        }

        a {
            text-decoration: none;
        }

        .ani {
            transition: all .3s;
        }

        .wrap {
            width: 300px;
            height: 350px;
            text-align: center;
            margin: 150px auto;
        }

        .inner {
            padding: 15px;
        }

        .clearfix {
            overflow: hidden;
            _zoom: 1;
        }

        .none {
            display: none;
        }

        #slider {
            position: relative;
            background-color: #e8e8e8;
            width: 300px;
            height: 34px;
            line-height: 34px;
            text-align: center;
        }

        #slider .handler {
            position: absolute;
            top: 0px;
            left: 0px;
            width: 40px;
            height: 32px;
            border: 1px solid #ccc;
            cursor: move;
        }

        .handler_bg {
            background: #fff url("") no-repeat center;
        }

        .handler_ok_bg {
            background: #fff url("") no-repeat center;
        }

        #slider .drag_bg {
            background-color: #7ac23c;
            height: 34px;
            width: 0px;
        }

        #slider .drag_text {
            position: absolute;
            top: 0px;
            width: 300px;
            -moz-user-select: none;
            -webkit-user-select: none;
            user-select: none;
            -o-user-select: none;
            -ms-user-select: none;
        }

        .unselect {
            -moz-user-select: none;
            -webkit-user-select: none;
            -ms-user-select: none;
        }

        .slide_ok {
            color: #fff;
        }
    </style>
</head>
<body>
<div class="wrap">
    <div id="slider">
        <div class="drag_bg"></div>
        <div class="drag_text" onselectstart="return false;" unselectable="on">Drag the slider to verify</div>
        <div class="handler handler_bg"></div>
    </div>
</div>

<script>
    (function (window, document, undefined) {
        var dog = { // declare a namespace object
            $: function (id) {
                return document.querySelector(id);
            },
            on: function (el, type, handler) {
                el.addEventListener(type, handler, false);
            },
            off: function (el, type, handler) {
                el.removeEventListener(type, handler, false);
            }
        };

        // Slider class
        function Slider() {
            var args = arguments[0];
            for (var i in args) {
                this[i] = args[i]; // shortcut: copy each config option onto the instance
            }
            // initialize immediately, so creating an instance runs init()
            this.init();
        }

        Slider.prototype = {
            constructor: Slider,
            init: function () {
                this.getDom();
                this.dragBar(this.handler);
            },
            getDom: function () {
                this.slider = dog.$('#' + this.id);
                this.handler = dog.$('.handler');
                this.bg = dog.$('.drag_bg');
            },
            dragBar: function (handler) {
                var that = this,
                    startX = 0,
                    lastX = 0,
                    doc = document,
                    width = this.slider.offsetWidth,
                    max = width - handler.offsetWidth,
                    drag = {
                        down: function (e) {
                            var e = e || window.event;
                            that.slider.classList.add('unselect');
                            startX = e.clientX - handler.offsetLeft;
                            console.log('startX: ' + startX + ' px');
                            dog.on(doc, 'mousemove', drag.move);
                            dog.on(doc, 'mouseup', drag.up);
                            return false;
                        },
                        move: function (e) {
                            var e = e || window.event;
                            lastX = e.clientX - startX;
                            lastX = Math.max(0, Math.min(max, lastX)); // clamp the distance to the range [0, max]
                            console.log('lastX: ' + lastX + ' px');
                            if (lastX >= max) {
                                handler.classList.add('handler_ok_bg');
                                that.slider.classList.add('slide_ok');
                                dog.off(handler, 'mousedown', drag.down);
                                drag.up();
                            }
                            that.bg.style.width = lastX + 'px';
                            handler.style.left = lastX + 'px';
                        },
                        up: function (e) {
                            var e = e || window.event;
                            that.slider.classList.remove('unselect');
                            if (lastX < max) { // not dragged all the way: animate back to the start
                                that.bg.classList.add('ani');
                                handler.classList.add('ani');
                                that.bg.style.width = 0;
                                handler.style.left = 0;
                                setTimeout(function () {
                                    that.bg.classList.remove('ani');
                                    handler.classList.remove('ani');
                                }, 300);
                            }
                            dog.off(doc, 'mousemove', drag.move);
                            dog.off(doc, 'mouseup', drag.up);
                        }
                    };
                dog.on(handler, 'mousedown', drag.down);
            }
        };
        window.S = window.Slider = Slider;
    })(window, document);
    var defaults = {
        id: 'slider'
    };
    new S(defaults);
</script>
</body>
</html>

Analysis

1. Check the size of the slider button

2. Check the size of the slider

From the two figures above, the drag distance is (300 - 40)px = 260px.
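The page script computes its limit as `max = width - handler.offsetWidth`. `offsetWidth` includes the handle's 1px border on each side (the CSS sets no `box-sizing`), so the exact limit is 300 - 42 = 258px. Dragging 260px still succeeds because the script clamps the position to `max` before its `lastX >= max` check. The arithmetic is sketched below. The class and method names are illustrative, and the sketch assumes default content-box sizing:

```java
// Mirrors the page's dragBar() arithmetic; assumes content-box sizing,
// so offsetWidth = CSS width + left border + right border.
public class DragDistance {
    static int offsetWidth(int cssWidth, int border) {
        return cssWidth + 2 * border;
    }

    // Largest left offset the handle can reach (the "max" variable in dragBar)
    static int maxDrag(int sliderWidth, int handleWidth, int handleBorder) {
        return sliderWidth - offsetWidth(handleWidth, handleBorder);
    }

    // Same clamp as Math.max(0, Math.min(max, lastX)) in the page script
    static int clamp(int lastX, int max) {
        return Math.max(0, Math.min(max, lastX));
    }

    public static void main(String[] args) {
        int max = maxDrag(300, 40, 1);
        System.out.println("max = " + max);                     // prints "max = 258"
        System.out.println("clamped 260 = " + clamp(260, max)); // prints "clamped 260 = 258"
    }
}
```

Because 258 >= `max`, the 260px drag in the crawler code reaches the success branch.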

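Crawler code

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;

public class NoNotchSlider {
    public static void main(String[] args) throws InterruptedException {
        // Path to the local ChromeDriver binary
        System.setProperty("webdriver.chrome.driver", "D:\\demo\\selenumDemo\\src\\main\\resources\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("file:///C:/Users/Administrator/Desktop/index.html");
            WebElement slider = driver.findElement(By.cssSelector(".handler.handler_bg")); // the slider button
            Thread.sleep(2000L);
            // Mouse operations go through an Actions instance
            Actions action = new Actions(driver);
            action.dragAndDropBy(slider, 260, 0).perform(); // drag (300 - 40)px to the right
            Thread.sleep(5000L);
        } finally {
            // driver.close(); // would close only the current window
            driver.quit(); // close all windows and release resources
        }
    }
}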
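The demo page signals success by adding the `slide_ok` class to `#slider` (see `dragBar` above). The crawler can therefore confirm the drag worked before quitting, by reading `driver.findElement(By.id("slider")).getAttribute("class")`. The check itself is sketched below without Selenium so it runs on its own. `SliderStatus` and `hasClass` are hypothetical names:

```java
import java.util.Arrays;

// Hypothetical helper: does a space-separated class attribute contain a class name?
public class SliderStatus {
    static boolean hasClass(String classAttr, String name) {
        if (classAttr == null) {
            return false; // no class attribute at all
        }
        return Arrays.asList(classAttr.trim().split("\\s+")).contains(name);
    }

    public static void main(String[] args) {
        System.out.println(hasClass("", "slide_ok"));                  // false: not solved yet
        System.out.println(hasClass("unselect slide_ok", "slide_ok")); // true: solved
    }
}
```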

Note: on some websites this drag passes verification, and on others it fails. If it fails, don't panic. The website has detected that the browser is being driven by a crawler, and there is a workaround. Read on!

Let's break it down step by step. 1. Use the driver to open the browser:

private static ChromeDriver chromeDriver; // driver shared by the methods in this class

public static void openChrome(){
   System.setProperty("webdriver.chrome.driver","D:\\demo\\selenumDemo\\src\\main\\resources\\chromedriver.exe");
   // 1. Open Chrome
   chromeDriver = new ChromeDriver();
   chromeDriver.get("url...");
}

2. Then press F12 to open the DevTools Console and enter: window.navigator.webdriver

The value is true, whereas in a browser opened normally by hand it is false or undefined, as shown in the figure below:

 

We can conclude that the website reads this property with JavaScript. undefined or false means an ordinary browser, while true means the browser is being driven by Selenium. The fix therefore has to be applied to the driven browser itself, by hiding the flag before ChromeDriver launches it.
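The same property can be read from Java with `((JavascriptExecutor) driver).executeScript("return window.navigator.webdriver;")`. Selenium returns a `Boolean` for `true`/`false` and `null` for `undefined`. Assuming that mapping, the rule above fits in a one-line check. This is a sketch only: `WebdriverFlag` and `looksAutomated` are illustrative names, and real sites may also check other signals.

```java
// Hypothetical helper applying the rule above: only an explicit true marks automation.
public class WebdriverFlag {
    static boolean looksAutomated(Object flag) {
        // executeScript gives Boolean.TRUE, Boolean.FALSE, or null (for undefined)
        return Boolean.TRUE.equals(flag);
    }

    public static void main(String[] args) {
        System.out.println(looksAutomated(Boolean.TRUE));  // true
        System.out.println(looksAutomated(Boolean.FALSE)); // false
        System.out.println(looksAutomated(null));          // false
    }
}
```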

public static void openChrome(){
   // Hide window.navigator.webdriver (uses the same static chromeDriver field)
   ChromeOptions option = new ChromeOptions();
   option.setExperimentalOption("useAutomationExtension", false);
   option.setExperimentalOption("excludeSwitches", Collections.singletonList("enable-automation")); // java.util.Collections
   option.addArguments("--disable-blink-features=AutomationControlled"); // this line is the key one

   System.setProperty("webdriver.chrome.driver","D:\\demo\\selenumDemo\\src\\main\\resources\\chromedriver.exe");
   // 1. Open Chrome with these options
   chromeDriver = new ChromeDriver(option);
   chromeDriver.get("URL...");
}

Start the browser again and check window.navigator.webdriver: it is now false.


Cracking the Notched Slider

The notched slider looks like this:

Analysis

I analyzed the slider source code of a certain website. As the figure below shows, the notched slider image is drawn on a canvas.

1. We need the X coordinate of the gap. To compute it, we compare the complete image with the gapped image. Normally only the gapped image is visible, with the puzzle piece on top of it. To hide the piece, we only need to add style="display: none" to the puzzle-piece canvas.

Look again: the gapped image is now shown without the puzzle piece.

 

2. Next, change the style of the full-image canvas below it to style="display:block", as shown below.

Look again and the complete image is shown.

 

3. Finally, use Selenium's screenshot method to save the complete image and the gapped image. Then compare their pixels to find the X coordinate of the gap, which tells us how far to drag the slider button.
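The crawler code below calls a `getMoveDistance()` method that the original post does not show. One common way to do the comparison is to scan the two crops (test.png and test1.png) column by column and return the first x where a pixel differs by more than a threshold. The sketch below illustrates that approach. It is hypothetical, not the author's code, and it assumes both crops are the same size, aligned, and taken with the puzzle piece hidden. The threshold of 60 is an arbitrary starting value to tune, not a measured one.

```java
import java.awt.image.BufferedImage;

// Hypothetical stand-in for the missing getMoveDistance() comparison step.
public class GapFinder {
    // First x (scanning left to right) where the two images differ, or -1 if none
    static int findGapX(BufferedImage full, BufferedImage gapped, int threshold) {
        int w = Math.min(full.getWidth(), gapped.getWidth());
        int h = Math.min(full.getHeight(), gapped.getHeight());
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                if (differs(full.getRGB(x, y), gapped.getRGB(x, y), threshold)) {
                    return x;
                }
            }
        }
        return -1;
    }

    // True if the red, green or blue channel differs by more than threshold
    static boolean differs(int argbA, int argbB, int threshold) {
        for (int shift = 0; shift <= 16; shift += 8) { // blue (0), green (8), red (16)
            int a = (argbA >> shift) & 0xFF;
            int b = (argbB >> shift) & 0xFF;
            if (Math.abs(a - b) > threshold) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two synthetic 100x40 images with the same light-gray background;
        // the second has a dark 10px-wide "gap" starting at x = 62
        BufferedImage full = new BufferedImage(100, 40, BufferedImage.TYPE_INT_RGB);
        BufferedImage gapped = new BufferedImage(100, 40, BufferedImage.TYPE_INT_RGB);
        for (int x = 0; x < 100; x++) {
            for (int y = 0; y < 40; y++) {
                full.setRGB(x, y, 0xC8C8C8);
                boolean inGap = x >= 62 && x < 72 && y >= 10 && y < 30;
                gapped.setRGB(x, y, inGap ? 0x505050 : 0xC8C8C8);
            }
        }
        System.out.println(findGapX(full, gapped, 60)); // prints 62
    }
}
```

In the crawler, you would load the crops with `ImageIO.read(new File("D:\\test1.png"))` and `ImageIO.read(new File("D:\\test.png"))`, then apply the 5px adjustment the code below makes. At a display scale other than 100%, screenshot pixels and element coordinates no longer match, so the result would need scaling.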


Crawler code

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.Random;
import javax.imageio.ImageIO;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.interactions.Actions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

// Uses the Selenium 3.x API (findElementByXPath was removed in Selenium 4)
public class ElementLocate {
   private static ChromeDriver chromeDriver;

   public static void main(String[] args) throws InterruptedException, IOException {
      openChrome(); // configure and open the browser
      try {
         chromeDriver.manage().window().maximize(); // maximize the window
         // Wait up to 5 seconds for the verify button (aria-label "click the button to verify")
         new WebDriverWait(chromeDriver, 5)
               .until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//div[@aria-label='点击按钮进行验证']")));
         // Click it to open the slider dialog
         chromeDriver.findElementByXPath("//div[@aria-label='点击按钮进行验证']").click();
         operateSlider(); // drag the slider
      } finally {
         chromeDriver.quit(); // always quit after the test, or stray browser processes pile up
      }
   }

   private static void openChrome() {
      // Configure the browser
      ChromeOptions option = new ChromeOptions();
      option.setExperimentalOption("useAutomationExtension", false);
      option.setExperimentalOption("excludeSwitches", Collections.singletonList("enable-automation"));
      option.addArguments("--disable-blink-features=AutomationControlled"); // the key line: stops the site's JS from detecting the crawler
      // Path to the ChromeDriver binary
      System.setProperty("webdriver.chrome.driver", "D:\\demo\\selenumDemo\\src\\main\\resources\\chromedriver.exe");
      // Open Chrome
      chromeDriver = new ChromeDriver(option);
      // Open the login page
      chromeDriver.get("https://account.zbj.com/login?lgtype=1&waytype=603&fromurl=https%3A%2F%2Fxiamen.zbj.com%2F");
   }

   // Set an element attribute via JavaScript
   private static void setAttribute(WebDriver driver, WebElement element, String attributeName, String value) {
      JavascriptExecutor js = (JavascriptExecutor) driver;
      js.executeScript("arguments[0].setAttribute(arguments[1], arguments[2]);", element, attributeName, value);
   }

   // Remove an element attribute via JavaScript (not used below)
   private static void removeAttribute(WebDriver driver, WebElement element, String attributeName) {
      JavascriptExecutor js = (JavascriptExecutor) driver;
      js.executeScript("arguments[0].removeAttribute(arguments[1]);", element, attributeName);
   }

   // Crop a full-window screenshot down to one element; overwrites the screenshot file with the crop
   private static File captureElement(File screenshot, WebElement element) {
      try {
         BufferedImage img = ImageIO.read(screenshot);
         int width = element.getSize().getWidth();
         int height = element.getSize().getHeight();
         // Element's top-left corner (assumes an unscrolled page and 100% display scaling)
         Point point = element.getLocation();
         // Crop from the top-left corner using the element's width and height
         BufferedImage dest = img.getSubimage(point.getX(), point.getY(), width, height);
         ImageIO.write(dest, "png", screenshot);
      } catch (IOException e) {
         e.printStackTrace();
      }
      return screenshot;
   }

   // Drag the slider
   private static void operateSlider() throws InterruptedException, IOException {
      Thread.sleep(1000); // pause before looking up the elements again, or the lookup fails

      // Hide the puzzle-piece canvas to reveal the gapped image; the images must have
      // finished loading first, so this fails on a slow network
      WebElement que1 = chromeDriver.findElementByXPath("//div[@class='geetest_slicebg geetest_absolute']/canvas[@class='geetest_canvas_slice geetest_absolute']");
      setAttribute(chromeDriver, que1, "style", "display:none");
      // Screenshot the gapped image
      WebElement quekou = chromeDriver.findElementByXPath("//canvas[@class='geetest_canvas_bg geetest_absolute']");
      File src = chromeDriver.getScreenshotAs(OutputType.FILE);
      FileUtils.copyFile(src, new File("D:\\result.png"));
      FileUtils.copyFile(captureElement(src, quekou), new File("D:\\test.png"));

      // Show the full-image canvas to reveal the complete image
      WebElement que2 = chromeDriver.findElementByXPath("//canvas[@class='geetest_canvas_fullbg geetest_fade geetest_absolute']");
      setAttribute(chromeDriver, que2, "style", "display:block");
      // Screenshot the complete image
      WebElement wanzheng = chromeDriver.findElementByXPath("//canvas[@class='geetest_canvas_bg geetest_absolute']");
      File src2 = chromeDriver.getScreenshotAs(OutputType.FILE);
      FileUtils.copyFile(src2, new File("D:\\result1.png"));
      FileUtils.copyFile(captureElement(src2, wanzheng), new File("D:\\test1.png"));

      // Restore the slider's original state
      WebElement huanyuan1 = chromeDriver.findElementByXPath("//canvas[@class='geetest_canvas_fullbg geetest_fade geetest_absolute']");
      setAttribute(chromeDriver, huanyuan1, "style", "display:none");
      WebElement huanyuan2 = chromeDriver.findElementByXPath("//canvas[@class='geetest_canvas_slice geetest_absolute']");
      setAttribute(chromeDriver, huanyuan2, "style", "display:block");

      // Gap position from comparing the gapped and complete images, minus the 5px
      // offset between the slider button and the image's left edge.
      // getMoveDistance() is not included in the original post.
      int moveDistance = getMoveDistance() - 5;
      // The slider button
      WebElement btn = chromeDriver.findElementByXPath("//div[@class='geetest_slider_button']");
      // Mouse operations go through an Actions instance
      Actions actions = new Actions(chromeDriver);

      // Split the button-to-gap distance into several random steps
      int[] nums = split(moveDistance);

      // Move the button one step at a time, pausing 350-359 ms after each step
      Random random = new Random();
      for (int num : nums) {
         actions.clickAndHold(btn).moveByOffset(num, 0)
               .build().perform();
         Thread.sleep(350 + random.nextInt(10));
      }
      // Imitate a human: nudge back 1px, then release
      actions.clickAndHold(btn).moveByOffset(-1, 0).release()
            .build().perform();

      Thread.sleep(3000); // wait 3 seconds, then check whether verification passed

      // Did the slide succeed?
      String attribute = chromeDriver.findElementByXPath("//div[@class='geetest_radar_tip']").getAttribute("aria-label");
      System.out.println("attribute = " + attribute);
      if ("网络不给力".equals(attribute)) { // "网络不给力" ("poor network") appears when the attempt fails
         chromeDriver.findElementByXPath("//div[@class='geetest_radar_tip']").click();
         // Slide again
         operateSlider();
      }
   }

   // Split num (>= 1) into 5 random non-negative parts that sum to num
   private static int[] split(int num) {
      int[] nums = new int[5];
      Random rand = new Random();
      for (int i = 0; i < nums.length - 1; i++) {
         nums[i] = rand.nextInt(num); // 0 .. num-1, so at least 1 always remains
         num -= nums[i];
      }
      nums[nums.length - 1] = num;
      return nums;
   }

}

Note: after the slider button is dragged to the target area, the puzzle piece may get "eaten" (the attempt is rejected). This happens because the drag is judged to be a machine operation. The code therefore imitates human speed: it slides part of the way, pauses for n milliseconds, and repeats. In my repeated debugging this reduced the chance of being misjudged, and the success rate is about 80%.
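`operateSlider()` retries by calling itself after every failure, with no upper limit. A bounded loop avoids runaway recursion. As a rough illustration, suppose each attempt independently succeeded 80% of the time. That figure comes from the note above, and independence is an assumption. Three attempts would then succeed with probability 1 - 0.2³ = 0.992. In the sketch below, `BoundedRetry` and its methods are hypothetical names. In the crawler, one attempt would be one slide plus the success check.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: retry an attempt a bounded number of times.
public class BoundedRetry {
    // Returns the number of attempts used, or -1 if every attempt failed
    static int attemptsUntilSuccess(int maxAttempts, BooleanSupplier attempt) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
        }
        return -1;
    }

    // Chance that at least one of n independent attempts succeeds
    static double overallSuccess(double p, int n) {
        return 1 - Math.pow(1 - p, n);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // An attempt that fails twice, then succeeds
        System.out.println(attemptsUntilSuccess(3, () -> ++calls[0] >= 3)); // 3
        System.out.printf("%.3f%n", overallSuccess(0.8, 3));               // 0.992
    }
}
```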

This is a small demo I built and summarized while developing and learning. It may have shortcomings, and I welcome your understanding and suggestions. If anything here infringes on your rights, please contact me!


Source: blog.csdn.net/weixin_46522803/article/details/127900539