Selenium Automation Tutorial: Crawling Data with Java

1. Introduction

Selenium is a tool set for automating web applications, mainly for testing. It drives a real browser the way a user would: clicking, typing, selecting drop-down values, ticking check boxes, moving the mouse, executing arbitrary JavaScript, and so on.

Selenium has three products:

  • Selenium WebDriver: A browser automation library for regression and functional tests. You write code in one of the supported languages (Java, Python, JavaScript, Ruby, or C#), and Selenium WebDriver opens the browser, operates the web page, and runs the checks as the code instructs.
  • Selenium IDE: A browser plug-in developed by the Selenium project. It is GUI-based and requires no code. If you use Google Chrome, you can search for Selenium IDE in the Chrome Web Store and install it. It works like a recorder: it records your clicks, input, navigation, and other actions on the page, and replays them when you click run. For usage, see: Selenium IDE Tutorial
  • Selenium Grid: Runs tests distributed across multiple machines and manages multiple environments from a central point, making it easy to run tests against a large number of browser/OS combinations.

This article covers using Selenium WebDriver from Java code to operate web pages automatically. Chrome is the recommended browser.

2. Download the browser driver

1. Obtain the driver version number to be downloaded

Check your current version in Chrome (mine is 114.0.5735.134). Drop the last segment to get 114.0.5735, then append it to https://chromedriver.storage.googleapis.com/LATEST_RELEASE_ to form a link. The link I got is:

https://chromedriver.storage.googleapis.com/LATEST_RELEASE_114.0.5735

Opening this link in a browser returns a version number.

I got 114.0.5735.90, which means I should download driver version 114.0.5735.90.

(For version selection, please refer to: Version Selection)
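The URL construction described above can be sketched in plain Java. This is only an illustration: ChromeDriverVersionUrl and latestReleaseUrl are hypothetical names, not part of Selenium, and the sketch only builds the lookup URL without downloading anything.

```java
// Hypothetical helper (not part of Selenium): builds the LATEST_RELEASE lookup URL
// from a full Chrome version string by dropping the last dot-separated segment.
final class ChromeDriverVersionUrl {

    private static final String BASE =
            "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_";

    static String latestReleaseUrl(String chromeVersion) {
        int lastDot = chromeVersion.lastIndexOf('.');
        if (lastDot < 0) {
            throw new IllegalArgumentException("Expected a dotted version, got: " + chromeVersion);
        }
        // Keep everything before the last dot, e.g. 114.0.5735.134 becomes 114.0.5735
        return BASE + chromeVersion.substring(0, lastDot);
    }

    public static void main(String[] args) {
        System.out.println(latestReleaseUrl("114.0.5735.134"));
    }
}
```

Run with the version from this article, it prints the same link shown above.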

2. Download the driver

According to the version number obtained above, go to the ChromeDriver download page and select the driver corresponding to the version number of Google Chrome.

Then download the driver for your operating system and unzip it (I downloaded the Windows version here). After unzipping you get chromedriver.exe.

3. Maven dependency

        <dependency>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-java</artifactId>
            <version>4.10.0</version>
        </dependency>

4. Basic usage

The following example opens a new browser window, opens Baidu, searches for "csdn 西凉的悲伤" (csdn Xiliang's sadness), clicks the first search result, which is my blog homepage, and then scrapes the article titles and links from the blog homepage.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.WebDriverWait;
import java.time.Duration;
import java.util.List;

import static org.openqa.selenium.support.ui.ExpectedConditions.numberOfWindowsToBe;
import static org.openqa.selenium.support.ui.ExpectedConditions.titleIs;

public class MainServer {

    public static void main(String[] args) {
        // Tell Selenium where the chromedriver executable is
        System.setProperty("webdriver.chrome.driver", "D:\\Program\\chromedriver\\chromedriver.exe");
        // Open a browser window
        WebDriver driver = new ChromeDriver();
        // Open Baidu
        driver.navigate().to("http://www.baidu.com/");
        // Type "csdn 西凉的悲伤" into the search box
        driver.findElement(By.id("kw")).sendKeys("csdn 西凉的悲伤");
        // Click the search button
        driver.findElement(By.id("su")).click();

        // Store the handle of the original window or tab
        String originalWindow = driver.getWindowHandle();
        // Number of windows or tabs currently open
        int windowCount = driver.getWindowHandles().size();

        // Implicit wait: element lookups retry for up to 5 seconds while the results page loads
        driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(5));
        // Click the first search result; it opens in a new tab
        driver.findElement(By.xpath("//*[@id='content_left']/div[@id='1']/div[@class='c-container']/div/h3/a")).click();

        // Explicit wait with a 10-second timeout
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        // Wait for the new window or tab to open
        wait.until(numberOfWindowsToBe(windowCount + 1));
        // Loop until we find the handle of the new window or tab
        for (String windowHandle : driver.getWindowHandles()) {
            if (!originalWindow.contentEquals(windowHandle)) {
                // Switch the driver to the new window or tab
                driver.switchTo().window(windowHandle);
                break;
            }
        }
        // Wait for the content of the new window or tab to load
        wait.until(titleIs("西凉的悲伤的博客_CSDN博客-java,工具,其他领域博主"));

        // Read the current page title
        System.out.println("Current page title: " + driver.getTitle());
        // Read the current URL from the address bar
        System.out.println("Current page URL: " + driver.getCurrentUrl());
        System.out.println();

        List<WebElement> articleTitles = driver.findElements(By.xpath("//*[@class='blog-list-box-top']/h4"));
        List<WebElement> articleUrls = driver.findElements(By.xpath("//*[@class='blog-list-box']/a"));
        for (int i = 0; i < articleTitles.size(); i++) {
            String articleTitle = articleTitles.get(i).getText();
            String articleUrl = articleUrls.get(i).getAttribute("href");
            System.out.println("Article title: " + articleTitle + " Link: " + articleUrl);
        }
    }
}

1. The code above uses System.setProperty to tell Selenium where the driver is. You can also add the driver to an environment variable so that the code does not need to set it. See this article on configuring the driver environment variable: selenium configuration using chromedriver.

2. The code above uses the implicitlyWait method to set an implicit wait, so the lookup of the first search result keeps retrying for up to 5 seconds while the page loads. Without a wait, the lookup can run before the page has loaded, find nothing, and throw an error. Besides implicit waits, there are explicit waits (the WebDriverWait used above) and fluent waits. See the description on the official website: selenium Waits

3. If you do not want a browser window to appear and only want the program to run in the background and output the result, replace the line WebDriver driver = new ChromeDriver(); in the code above with the following:

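        // Requires: import org.openqa.selenium.chrome.ChromeOptions;
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless"); // run without a browser window
        options.addArguments("--disable-gpu"); // the Google documentation mentions adding this flag to work around a bug
        WebDriver driver = new ChromeDriver(options);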
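To make the waiting idea from note 2 concrete, here is a minimal sketch in plain Java of what a wait does conceptually: poll a condition until it yields a result or a timeout expires. This is not Selenium's implementation. PollingWaitSketch and waitUntil are hypothetical names used only for illustration, and in real code you would use WebDriverWait or FluentWait instead.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Conceptual sketch only: a generic polling loop, not Selenium's wait implementation.
final class PollingWaitSketch {

    /** Polls the condition until it returns a non-null value or the timeout elapses. */
    static <T> T waitUntil(Supplier<T> condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T result = condition.get();
            if (result != null) {
                return result;
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("Condition not met within " + timeout);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "the element has appeared": becomes non-null after about 300 ms.
        String found = waitUntil(
                () -> System.currentTimeMillis() - start >= 300 ? "element" : null,
                Duration.ofSeconds(2),
                Duration.ofMillis(50));
        System.out.println("Found: " + found);
    }
}
```

The timeout matters as much as the condition: a timeout shorter than the polling interval leaves room for only one or two checks, so it should be sized to how long the page realistically takes to load.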

5. Locators

There are many elements, buttons and texts on the webpage. For example, if you want to click the login button automatically, you need to find the login button first; if you want to click a link, you need to find the link before clicking it.

1. Locator types

The thing that helps us find elements on a web page is called a locator. The example above uses an xpath locator, but Selenium also provides other locators, which find elements by class name, by id, by the text displayed on the page, by the hierarchy of html tags, and so on.

Locator            Description
class name         Locates elements whose class attribute contains the given value
css selector       Locates elements that match a CSS selector
id                 Locates elements whose id attribute matches the given value
name               Locates elements whose name attribute matches the given value
link text          Locates links whose displayed text matches the given value exactly
partial link text  Locates links whose displayed text contains the given value
tag name           Locates elements whose html tag name matches the given value
xpath              Locates elements that match an XPath expression

2. Description

Take the following html as an example to illustrate the above locator.

<html>
<body>
<style>
.information {
  background-color: white;
  color: black;
  padding: 10px;
}
</style>
<h2>Contact Selenium</h2>

<form action="/action_page.php">
  <input type="radio" name="gender" value="m" />Male &nbsp;
  <input type="radio" name="gender" value="f" />Female <br>
  <br>
  <label for="fname">First name:</label><br>
  <input class="information" type="text" id="fname" name="fname" value="Jane"><br><br>
  <label for="lname">Last name:</label><br>
  <input class="information" type="text" id="lname" name="lname" value="Doe"><br><br>
  <label for="newsletter">Newsletter:</label>
  <input type="checkbox" name="newsletter" value="1" /><br><br>
  <input type="submit" value="Submit">
</form> 

<p>To know more about Selenium, visit the official page 
<a href ="www.selenium.dev">Selenium Official Page</a> 
</p>

</body>
</html>

(1) class name locator

Elements on an HTML page can have a class attribute, and we can identify such elements with the class name locator available in Selenium.

    WebDriver driver = new ChromeDriver();
    driver.findElement(By.className("information"));

(2) css selector locator

CSS is a language used to style HTML pages. We can use the css selector locator strategy to identify elements on the page. If the element has an id, we write the locator as #id. Otherwise the format is [attribute=value]. Let's create a locator for the first name textbox using CSS.

    WebDriver driver = new ChromeDriver();
    driver.findElement(By.cssSelector("#fname"));

(3) id locator

We can locate an element by its id attribute. In general, an id should be unique within a web page.

    WebDriver driver = new ChromeDriver();
    driver.findElement(By.id("lname"));

(4) name locator

We can locate an element by its name attribute. Unlike an id, a name is not guaranteed to be unique (the two gender radio buttons in the sample share one name); when several elements match, findElement returns the first one.

    WebDriver driver = new ChromeDriver();
    driver.findElement(By.name("newsletter"));

(5) link text locator

If the element we want to locate is a link, we can use a link text locator to identify it on a web page. Link text is the text displayed by the link.

    WebDriver driver = new ChromeDriver();
    driver.findElement(By.linkText("Selenium Official Page"));

(6) partial link text locator

If the element we want to locate is a link, we can use the partial link text locator to identify it on the web page. Link text is the text displayed by the link. We can pass partial text as value.

    WebDriver driver = new ChromeDriver();
    driver.findElement(By.partialLinkText("Official Page"));

(7) tag locator

We can use the HTML TAG itself as a locator to identify web elements on a page. Use the tag locator to locate the "a" tag.

    WebDriver driver = new ChromeDriver();
    driver.findElement(By.tagName("a"));

(8) xpath locator

An HTML document can be treated as an XML document, so we can use XPath to walk the path to the element we are interested in. An XPath can be absolute, which means it starts from the root of the document. Example: /html/body/form/input[1]. This returns the male radio button. Or the XPath can be relative. Example: //input[@name='fname']. This returns the first name textbox. Let's create a locator for the female radio button using xpath.

    WebDriver driver = new ChromeDriver();
    driver.findElement(By.xpath("//input[@value='f']"));
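To try XPath expressions like these without starting a browser, the sketch below evaluates them with the XPath engine that ships with the JDK (javax.xml.xpath) against a trimmed copy of the sample form. It is only an approximation of what Selenium does: the browser evaluates XPath against the live HTML DOM, whereas this sketch needs well-formed XML, so every tag in PAGE is closed. XPathSketch, PAGE, and first are hypothetical names. Note the body step in the absolute path, which is needed because the form is nested inside the body element.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: evaluates XPath with the JDK engine, not with Selenium or a browser.
final class XPathSketch {

    // Trimmed copy of the sample form with every tag closed, so it parses as XML.
    static final String PAGE =
            "<html><body><form action='/action_page.php'>"
            + "<input type='radio' name='gender' value='m'/>"
            + "<input type='radio' name='gender' value='f'/>"
            + "<input class='information' type='text' id='fname' name='fname' value='Jane'/>"
            + "<input class='information' type='text' id='lname' name='lname' value='Doe'/>"
            + "</form></body></html>";

    /** Returns the first element matching the expression, or null if nothing matches. */
    static Element first(String expression) {
        try {
            return (Element) XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(PAGE)), XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/html/body/form/input[1]", // absolute path: first input inside the form
            "//input[@name='fname']",   // relative path: the first name textbox
            "//input[@value='f']"       // relative path: the female radio button
        };
        for (String expression : expressions) {
            Element e = first(expression);
            System.out.println(expression + " -> name=" + e.getAttribute("name")
                    + ", value=" + e.getAttribute("value"));
        }
    }
}
```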

You can refer to the article about xpath locator:
XPath in Selenium
selenium locates elements
XPath in Selenium: How to Find & Write
How to use XPath in Selenium

(9) Locating elements with the help of the Selenium IDE plug-in

If you are not familiar with html, or writing a locator by hand is inconvenient, you can first perform the operations with the Selenium IDE plug-in mentioned in the introduction and then save the project. All operations are saved into a file with the .side suffix, which describes each step. The targets entry of each step lists several generated locators, so you can copy one of them straight into your code instead of analyzing the web page and writing the locator yourself.
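As a rough illustration of reusing a recorded locator, the sketch below turns a target string into the matching By call as Java source text. It assumes targets have the form strategy=value (for example id=kw or css=#fname) and that the prefixes map to By methods as shown; both the format and the mapping are assumptions here, so check them against your own .side file. SideTargetSketch and toByExpression are hypothetical names.

```java
// Hypothetical helper: converts an assumed "strategy=value" target into By source text.
final class SideTargetSketch {

    /** Converts a target such as "css=#fname" into the text By.cssSelector("#fname"). */
    static String toByExpression(String target) {
        // Split on the first '=' only, because XPath values contain '=' themselves.
        int eq = target.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("Expected strategy=value, got: " + target);
        }
        String strategy = target.substring(0, eq);
        String value = target.substring(eq + 1);
        String factory;
        switch (strategy) {
            case "id":
            case "name":
            case "xpath":
            case "linkText":
            case "partialLinkText":
                factory = strategy; // same name as the By factory method
                break;
            case "css":
                factory = "cssSelector";
                break;
            default:
                throw new IllegalArgumentException("Unsupported strategy: " + strategy);
        }
        // Escape backslashes and double quotes so the result is a valid Java string literal.
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "By." + factory + "(\"" + escaped + "\")";
    }

    public static void main(String[] args) {
        System.out.println(toByExpression("id=kw"));
        System.out.println(toByExpression("css=#fname"));
        System.out.println(toByExpression("xpath=//input[@value='f']"));
    }
}
```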

6. Common operations

1. Open the URL link

// The convenient way
driver.get("http://www.baidu.com");
// Or the longer way
driver.navigate().to("http://www.baidu.com");

2. Get the title and link of the current web page

// Read the current page title
driver.getTitle();
// Read the current URL from the address bar
driver.getCurrentUrl();

3. Browser forward, back, refresh, close

// Go back
driver.navigate().back();
// Go forward
driver.navigate().forward();
// Refresh the page
driver.navigate().refresh();
// Close the browser
driver.quit();

4. Alert, confirm, and prompt pop-ups

(1) Get the text of an alert pop-up and click OK

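// Requires: import org.openqa.selenium.Alert; and import org.openqa.selenium.support.ui.ExpectedConditions;
// wait is a WebDriverWait, e.g. new WebDriverWait(driver, Duration.ofSeconds(10))
// Find the link with the link text locator and click it to trigger the pop-up
driver.findElement(By.linkText("See an example alert")).click();
// Wait for the pop-up to appear and get the alert object
Alert alert = wait.until(ExpectedConditions.alertIsPresent());
// Get the text of the pop-up
String text = alert.getText();
// Click the OK button
alert.accept();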

(2) A confirm pop-up is similar to an alert pop-up, except that the user can also choose to cancel.
This example also shows another way to get the pop-up object:

// Find the link with the link text locator and click it to trigger the pop-up
driver.findElement(By.linkText("See a sample confirm")).click();
// Wait for the pop-up to appear
wait.until(ExpectedConditions.alertIsPresent());
// Get the alert object
Alert alert = driver.switchTo().alert();
// Get the text of the pop-up
String text = alert.getText();
// Click the Cancel button
alert.dismiss();

(3) Prompt pop-up
A prompt is similar to a confirm pop-up, except that it also accepts text input, much like a form element.

// Find the link with the link text locator and click it to trigger the pop-up
driver.findElement(By.linkText("See a sample prompt")).click();
// Wait for the pop-up to appear and get the alert object
Alert alert = wait.until(ExpectedConditions.alertIsPresent());
// Type "你好啊" into the input box of the pop-up
alert.sendKeys("你好啊");
// Click the OK button
alert.accept();

7. Using cookies

1. Add cookies

// Requires: import org.openqa.selenium.Cookie;
public static void main(String[] args) {
    WebDriver driver = new ChromeDriver();
    try {
        // Open the URL
        driver.get("http://www.example.com");
        // Add a cookie to the current browsing context
        driver.manage().addCookie(new Cookie("key", "value"));
    } finally {
        // Close the browser
        driver.quit();
    }
}

2. Obtain and delete cookies

(1) Get the specified cookie

public static void main(String[] args) {
    WebDriver driver = new ChromeDriver();
    try {
        driver.get("http://www.example.com");
        // Set a cookie
        driver.manage().addCookie(new Cookie("login", "fgflkshf&"));
        // Get the cookie whose name is 'login'
        Cookie cookie1 = driver.manage().getCookieNamed("login");
        System.out.println(cookie1);
    } finally {
        driver.quit();
    }
}

(2) Get all cookies

    Set<Cookie> cookies = driver.manage().getCookies();

(3) Delete the specified cookie

    driver.manage().deleteCookieNamed("login");

(4) Delete all cookies

    driver.manage().deleteAllCookies();

8. Keyboard and mouse operations

For keyboard and mouse operations, please refer to the official website instructions:

1. Keyboard Operation Instructions

2. Mouse operation instructions

3. Scroll wheel operation instructions


Origin blog.csdn.net/qq_33697094/article/details/131292916