Using jsoup in Java to scrape images from a web page

1. Overview of jsoup

1.1. Introduction

    jsoup is a Java HTML parser that can parse HTML directly from a URL or from a string of HTML text. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods.

1.2. Main features of jsoup

    1) Parse HTML from a URL, file or string

    2) Use DOM or CSS selectors to find and retrieve data

    3) Manipulate HTML elements, attributes, and text

    Note: jsoup is released under the MIT license, so it can be used in commercial projects with confidence.
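The three capabilities listed above can be sketched in a few lines. This is a minimal illustration, not code from the original post: the class name JsoupBasics, the sample HTML, and the example.com URLs are all made up, and the HTML is parsed from a string so that the sketch runs without network access. It assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical demo class; the HTML and example.com URLs are made up for illustration.
class JsoupBasics {

    static final String HTML =
        "<html><head><title>Demo</title></head><body>"
        + "<div id=\"content\"><p class=\"intro\">Hello</p>"
        + "<img src=\"https://example.com/a.png\"><img src=\"/b.png\"></div>"
        + "</body></html>";

    // 1) Parse HTML from a string. The base URI lets jsoup resolve relative links.
    static Document parse() {
        return Jsoup.parse(HTML, "https://example.com/");
    }

    public static void main(String[] args) {
        Document doc = parse();

        // 2) Find data with a CSS selector and with a DOM-style lookup.
        Elements images = doc.select("img[src]");
        Element content = doc.getElementById("content");
        System.out.println(doc.title() + ": " + images.size() + " images in #" + content.id());

        // "abs:src" asks jsoup for the attribute value resolved against the base URI.
        for (Element img : images) {
            System.out.println(img.attr("abs:src"));
        }

        // 3) Manipulate elements: change text and set an attribute on every match.
        Element intro = doc.select("p.intro").first();
        intro.text("Hello, jsoup");
        images.attr("alt", "demo image");
        System.out.println(content.html());
    }
}
```

Parsing from a URL works the same way once the Document has been fetched; only the first step changes.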

Let's use jsoup to scrape data and images from a Baidu search results page. The code follows.

Maven dependency:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.7.3</version>
</dependency>
Crawler code:

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class BaiduImageCrawler {

    private static final String url = "https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=互联网";

    public static void main(String[] args) throws Exception {
        // Connect to the target address
        Connection connect = Jsoup.connect(url);
        // Set the user agent and the timeout, then send a GET request to the server
        Document document = connect.userAgent("Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)").timeout(6000).ignoreContentType(true).get();
        Thread.sleep(1000);
        // Get the element with the given id
        Element elementById = document.getElementById("content_left");
        // Print its text
//        System.out.println(elementById.text());
        // Print its HTML
//        System.out.println(elementById.html());
        // Collect all image links
        Elements imgtag = document.getElementsByTag("img");
        for (Element img : imgtag) {
            String src = img.attr("src");
            // attr() returns "" when the attribute is missing, so startsWith alone is a sufficient check
            if (src.startsWith("http")) {
                System.out.println(src);
            }
        }
    }
}

The crawl succeeds: the program prints the URL of each image found on the page.
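The crawler above only prints image URLs. To actually save the pictures, each URL still has to be downloaded. The following is one possible sketch using only the Java standard library; the ImageDownloader class and its method names are hypothetical and are not part of jsoup or of the original post. It assumes each URL is already a valid, percent-encoded absolute URI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of jsoup): saves one image URL into a local directory.
class ImageDownloader {

    // Derives a file name from the last path segment of the URL, ignoring any query string.
    // Falls back to "image" when the path has no usable last segment.
    static String fileNameFor(String imageUrl) {
        String path = URI.create(imageUrl).getPath();
        String name = (path == null) ? "" : path.substring(path.lastIndexOf('/') + 1);
        return name.isEmpty() ? "image" : name;
    }

    // Streams the resource at imageUrl into targetDir and returns the saved file's path.
    // URI.create throws IllegalArgumentException for malformed input (for example, raw spaces).
    static Path download(String imageUrl, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileNameFor(imageUrl));
        try (InputStream in = URI.create(imageUrl).toURL().openStream()) {
            // An existing file with the same name is overwritten.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

Inside the image loop, a call such as ImageDownloader.download(src, Paths.get("images")) would then store each picture. Two images that share a file name overwrite each other in this sketch, so a real crawler would probably want to add a counter or a hash to the name.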

This covers the basic features of jsoup. Because its API is designed to be extensible, you can build powerful HTML parsing on top of it by writing your own selectors. The jsoup project is also actively developed, so if you work in Java and need to process HTML, it is worth trying.
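As a small illustration of what selectors can do, the tag lookup and the manual "starts with http" check from the crawler can be folded into a single CSS query. This is a sketch under assumptions: the SelectorSketch class is hypothetical, and the id content_left is simply the one the crawler code looks up. Note that jsoup compares attribute values case-insensitively in selectors, so the query is slightly more permissive than String.startsWith.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class showing how a selector can replace manual filtering.
class SelectorSketch {

    // Returns the src of every <img> inside #content_left whose src starts with "http".
    static List<String> imageUrls(String html) {
        Document doc = Jsoup.parse(html);
        List<String> urls = new ArrayList<>();
        // "#content_left img" limits the search to descendants of that element;
        // "[src^=http]" keeps only images whose src attribute begins with "http".
        for (Element img : doc.select("#content_left img[src^=http]")) {
            urls.add(img.attr("src"));
        }
        return urls;
    }
}
```

Other attribute selectors follow the same pattern, for example img[src$=.png] for a suffix match.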

Origin blog.csdn.net/qq_30667039/article/details/114288112