Retrieving data from nested hrefs with jsoup

Emil :

I would like to retrieve data from nested hrefs with jsoup. I mean, I have this URL: https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999

and I want to get the data for each of the 10 fighters listed there, e.g.:

1. STIPE MIOCIC AGE: 37 or ASSOCIATION: STRONG STYLE FIGHT TEAM

2. DANIEL CORMIER AGE: 40 or ASSOCIATION: AMERICAN KICKBOXING ACADEMY

etc..

How to do this?

    String url = "https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999";
    Document document = Jsoup.connect(url).get();

    Elements allH1 = document.select("h2");
    for (Element href : allH1) {

        Elements allAge = document.select("div.birth_info");
        for (Element  age : allAge) {
            System.out.println(href.select("a[href]").text());
            // System.out.println(age.select(...)); // something there?
        }
    }

TDG :

The data you are looking for is on separate pages: each fighter has his own page, so you must crawl the pages one by one to get the data.
First, get the link to each fighter's page with the selector h2 > a[href]:

String url = "https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999";
Document document = Jsoup.connect(url).get();
Elements fighters = document.select("h2 > a[href]");
for (Element fighter : fighters) {
    System.out.println(fighter.text() + " " + fighter.attr("href"));
}

After that, you can load each page and extract the data:

// Inside the loop above: build the fighter page URL from the site-relative href.
String fighterUrl = "https://www.sherdog.com" + fighter.attr("href");
Document doc = Jsoup.connect(fighterUrl).get();
Element fighterData = doc.select("div.data").first();
System.out.println(fighterData.text());
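
The string concatenation above works as long as every href is site-relative. As a hedged alternative, the link could be resolved against the page URL with the standard java.net.URI class, which also copes with hrefs that are already absolute. This is only a sketch: the class name FighterUrls and the path "/fighter/Example-Fighter-1" are made up for illustration and are not taken from the site.

```java
import java.net.URI;

class FighterUrls {
    // Resolves an href (relative or absolute) against the URL of the page it came from.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999";
        // Hypothetical href value, standing in for fighter.attr("href").
        String href = "/fighter/Example-Fighter-1";
        System.out.println(resolve(pageUrl, href));
    }
}
```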

Combined together, you get:

String url = "https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999";
Document document = Jsoup.connect(url).get();
// Each ranked fighter's name is a link directly inside an h2 heading.
Elements fighters = document.select("h2 > a[href]");
for (Element fighter : fighters) {
    System.out.println(fighter.text());
    // Build the fighter page URL from the link's href.
    String fighterUrl = "https://www.sherdog.com" + fighter.attr("href");
    Document doc = Jsoup.connect(fighterUrl).get();
    Element fighterData = doc.select("div.data").first();
    // first() returns null when nothing matches, so guard before calling text().
    if (fighterData != null) {
        System.out.println(fighterData.text());
    }
    System.out.println("---------------");
}

And the (partial) output is:

Stipe Miocic Born: 1982-08-19 AGE: 37 Independence, Ohio United States Height 6'4" 193.04 cm Weight 245 lbs 111.13 kg Association: Strong Style Fight Team Class: Heavyweight Wins 19 15 KO/TKO (79%) 0 SUBMISSIONS (0%) 4 DECISIONS (21%) Losses 3 2 KO/TKO (67%) 0 SUBMISSIONS (0%) 1 DECISIONS (33%)

Daniel Cormier Born: 1979-03-20 AGE: 40 San Jose, California United States Height 5'11" 180.34 cm Weight 251 lbs 113.85 kg Association: American Kickboxing Academy Class: Heavyweight Wins 22 10 KO/TKO (45%) 5 SUBMISSIONS (23%) 7 DECISIONS (32%) Losses 2 1 KO/TKO (50%) 0 SUBMISSIONS (0%) 1 DECISIONS (50%) N/C 1

If you want the age, association, and so on as separate fields, you'll have to extract them with a regex.
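
A minimal sketch of that regex step, assuming the flattened text always looks like the sample output above (labels "Born:", "AGE:", "Association:" followed by "Class:"). The class name FighterFields and the patterns are illustrative assumptions based on that output, not on the site's markup, so they may need adjusting if the page layout differs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FighterFields {
    // Patterns are assumptions derived from the sample output, not from the site's HTML.
    private static final Pattern BORN = Pattern.compile("Born:\\s*(\\d{4}-\\d{2}-\\d{2})");
    private static final Pattern AGE = Pattern.compile("AGE:\\s*(\\d+)");
    // Lazily capture everything between "Association:" and the following "Class:" label.
    private static final Pattern ASSOCIATION = Pattern.compile("Association:\\s*(.+?)\\s+Class:");

    // Returns the first capture group of the first match, or null if nothing matches.
    static String firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Pulls the individual fields out of the text returned by fighterData.text().
    static Map<String, String> extract(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("born", firstGroup(BORN, text));
        fields.put("age", firstGroup(AGE, text));
        fields.put("association", firstGroup(ASSOCIATION, text));
        return fields;
    }

    public static void main(String[] args) {
        String sample = "Stipe Miocic Born: 1982-08-19 AGE: 37 Independence, Ohio United States "
                + "Height 6'4\" 193.04 cm Weight 245 lbs 111.13 kg "
                + "Association: Strong Style Fight Team Class: Heavyweight Wins 19";
        System.out.println(extract(sample));
    }
}
```

In the crawling loop, extract(fighterData.text()) would take the place of printing the raw text. A field that cannot be found comes back as null rather than throwing, which seems safer when some fighter pages omit an association.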
