[Android: From Getting Started to Hands-On Projects -- 10.1] A Detailed jsoup Tutorial

Table of contents

I. Introduction to jsoup

II. Using jsoup

1. Import dependencies

2. Establish a connection

3. Get data

How to get each node

1) Get text

2) Get pictures


I. Introduction to jsoup

jsoup is a Java library for parsing HTML. You can use it to fetch a web page and pull out data such as text and the addresses of pictures, video, and music. It fills the same role in Java that crawler libraries fill in Python.

Once jsoup has extracted the data from the page, you can display it in your app.

II. Using jsoup

First declare the network permission in AndroidManifest.xml. Don't forget this step: the app can only access web pages once it has been granted network access.

    <uses-permission android:name="android.permission.INTERNET" />

1. Import dependencies

Add the dependency to the module's build.gradle file:

    implementation 'org.jsoup:jsoup:1.12.1'

2. Establish a connection

MainActivity code:

First, connect to the page you want to crawl; this example fetches the HTML of Baidu's home page. Note that network requests must not run on the main thread (Android throws a NetworkOnMainThreadException if they do), so the request is made in a worker thread.

       button.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View view) {
                new Thread(new Runnable() {
                    @Override
                    public void run() {
                        Document document = null;
                        String url = "https://www.baidu.com";
                        try {
                            document = Jsoup.connect(url).get();
                            Log.d("TTTT", "jsoup:" + document);
                        } catch (IOException exception) {
                            exception.printStackTrace();
                        }
                    }
                }).start();
            }
        });

The HTML printed to Logcat is consistent with Baidu's page source as viewed in a browser.

3. Get data

How to get each node

The following code gets the elements whose class name is "have-img":

input = document.getElementsByClass("have-img");

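The following code reads the value of the attribute named "data-note-id" (when called on several elements, attr() returns the value from the first matched element that has the attribute):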

input.attr("data-note-id");

The following code gets the value of the src attribute of the img tag inside the a tag:

src = input.select("a").select("img").attr("src");

The following code gets the text of the second a tag (index 1, because indexes start at 0):

content.select("a").get(1).text();

1) Get text

The connection made in the previous step gives us the whole HTML document of the page. Next, we parse that HTML to pull out the data we want.

In this example, the text we want is stored in one of the page's meta tags.

After selecting the meta tags, we need the one at index 4 (indexes start at 0). The get() method returns the element at the given index, and the attr() method returns the value of the attribute with the given name.

    public void run() {
        Document document = null;
        String url = "https://www.baidu.com";
        Elements elements = null;
        try {
            document = Jsoup.connect(url).get();
            // Select every <meta> tag in the page
            elements = document.select("meta");
            // Take the tag at index 4 and read its content attribute
            String string = elements.get(4).attr("content");
            Log.d("TTTT", string);
        } catch (IOException exception) {
            exception.printStackTrace();
        }
    }

The value of that content attribute is printed to Logcat.
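A hard-coded index such as get(4) throws an IndexOutOfBoundsException when the page has fewer tags than expected, which can happen whenever the site changes its markup. Below is a minimal sketch of a bounds-checked lookup in plain Java. The names SafeIndex and elementAt are hypothetical and not part of jsoup; the helper accepts any java.util.List, and since jsoup's Elements is a list of Element it should be usable there as well.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SafeIndex {
    /** Returns the item at index, or Optional.empty() when the index is out of range. */
    public static <T> Optional<T> elementAt(List<T> items, int index) {
        if (index < 0 || index >= items.size()) {
            return Optional.empty();
        }
        return Optional.ofNullable(items.get(index));
    }

    public static void main(String[] args) {
        // Stand-in data; in the tutorial this would be the selected meta tags
        List<String> contents = Arrays.asList("a", "b", "c");
        System.out.println(elementAt(contents, 1).orElse("(missing)")); // prints b
        System.out.println(elementAt(contents, 4).orElse("(missing)")); // prints (missing)
    }
}
```

With a helper like this, the caller decides what to do when the tag is absent instead of crashing the worker thread.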

2) Get pictures

The following example gets the cover picture of one of the posts on my blog.

The following code obtains the src of the image.

    public void run() {
        Document document = null;
        String url = "https://blog.csdn.net/Tir_zhang?type=blog";
        Elements elements = null;
        try {
            document = Jsoup.connect(url).get();
            // Each cover picture sits inside an element with class "blog-img-box"
            elements = document.getElementsByClass("blog-img-box");
            // Take the box at index 5 and read the src of the img inside it
            String string = elements.get(5).select("img").attr("src");
            Log.d("TTTT", string);
        } catch (IOException exception) {
            exception.printStackTrace();
        }
    }

The src address of the image is printed to Logcat.
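A src attribute is not guaranteed to hold an absolute URL: pages often use relative paths or protocol-relative addresses that start with "//". The sketch below shows one way to resolve such a value against the page URL with java.net.URI. The names ImageUrls and toAbsolute, the example paths, and the img.example.com host are assumptions for illustration only. jsoup elements also offer absUrl("src") for the same purpose, which may be the simpler choice when the document was fetched with Jsoup.connect.

```java
import java.net.URI;
import java.util.Optional;

public class ImageUrls {
    /** Resolves a possibly relative src value against the URL of the page it came from. */
    public static Optional<String> toAbsolute(String pageUrl, String src) {
        if (src == null || src.trim().isEmpty()) {
            return Optional.empty(); // no img was found, or the attribute is empty
        }
        try {
            return Optional.of(URI.create(pageUrl).resolve(src.trim()).toString());
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // src is not a syntactically valid URI
        }
    }

    public static void main(String[] args) {
        String page = "https://blog.csdn.net/Tir_zhang?type=blog";
        // Hypothetical src values
        System.out.println(toAbsolute(page, "/img/cover.png").orElse("(none)"));
        System.out.println(toAbsolute(page, "//img.example.com/cover.png").orElse("(none)"));
        System.out.println(toAbsolute(page, "").orElse("(none)"));
    }
}
```

Returning an Optional makes the "no usable address" case explicit, so the image loader is never handed an empty string.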

Once you have the image's address, you still need to load the image itself.

Use the Glide framework to load images from the network. For details on Glide, see the CSDN blog post "The use of Glide in Android" on 5239ZM's blog.

The following line loads the image with Glide:

                Glide.with(MainActivity.this).load(string).into(imageView);

Final code:

        button.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View view) {
                new Thread(new Runnable() {
                    @Override
                    public void run() {
                        String url = "https://blog.csdn.net/Tir_zhang?type=blog";
                        try {
                            Document document = Jsoup.connect(url).get();
                            Elements elements = document.getElementsByClass("blog-img-box");
                            final String string = elements.get(5).select("img").attr("src");
                            Log.d("TTTT", string);
                            // The address only exists once the request has finished, and views
                            // may only be touched from the main thread, so load the image here.
                            runOnUiThread(new Runnable() {
                                @Override
                                public void run() {
                                    Glide.with(MainActivity.this).load(string).into(imageView);
                                }
                            });
                        } catch (IOException exception) {
                            exception.printStackTrace();
                        }
                    }
                }).start();
            }
        });

After the button is clicked, the cover picture is displayed in the ImageView.
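The order of operations matters here: the image address only exists after the worker thread has finished its request, so the image must be loaded from a callback that runs afterwards, not from the line that follows start(). A plain-Java sketch of that "fetch, then show" ordering is given below. FetchThenShow and fetchThenShow are hypothetical names, the supplier stands in for the jsoup lookup, and the consumer stands in for the Glide call. On Android, CompletableFuture is only available on newer API levels, and the callback would still have to be posted to the main thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class FetchThenShow {
    /** Runs fetch off the calling thread, then passes its result to show. */
    public static CompletableFuture<Void> fetchThenShow(Supplier<String> fetch,
                                                        Consumer<String> show) {
        return CompletableFuture.supplyAsync(fetch).thenAccept(show);
    }

    public static void main(String[] args) {
        AtomicReference<String> shown = new AtomicReference<>();
        CompletableFuture<Void> done = fetchThenShow(
                () -> "https://example.com/cover.png", // stand-in for the jsoup lookup
                shown::set);                           // stand-in for the Glide call
        // Reading "shown" before the future completes could still yield null;
        // join() waits until the callback has run.
        done.join();
        System.out.println(shown.get()); // prints https://example.com/cover.png
    }
}
```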

 

This covers the basic usage of jsoup and how to extract text and pictures. The next article will introduce how to extract video and music.


Origin blog.csdn.net/Tir_zhang/article/details/130567219