Akhila Rajeev :
I have the following URL :
https://en.wikipedia.org/w/api.php?action=parse&section=0&prop=text&format=json&page=The%20Matrix
which returns a JSON response with HTML code embedded within a JSON object (See the link).
How do I retrieve details such as the actors and the director from that HTML part using Java?
How do I convert that HTML part to JSON using Java, if that is possible?
Or is there a way to change the URL itself so that it returns the movie data in a readable JSON format?
Dhrubajyoti Gogoi :
Here is a solution using jsoup for parsing HTML and jackson for parsing JSON:
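import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WikipediaInfobox {
    public static void main(String[] args) throws IOException {
        // Fetch the raw JSON response; ignoreContentType lets jsoup accept a non-HTML body
        String body = Jsoup.connect("https://en.wikipedia.org/w/api.php?action=parse&section=0&prop=text&format=json&page=The%20Matrix")
                .ignoreContentType(true).execute().body();
        // Extract the HTML string stored under parse -> text -> "*" in the JSON
        JsonFactory factory = new JsonFactory();
        ObjectMapper mapper = new ObjectMapper(factory);
        JsonNode targetNode = mapper.readTree(body).get("parse").get("text").get("*");
        // Parse the embedded HTML once and reuse it for every lookup
        Document html = Jsoup.parse(targetNode.asText());
        // Generic but fragile function to extract specific details:
        // finds the infobox row whose header contains the label and collects the link titles
        Function<String, String> retrieveDetailsOf = detailsOf ->
                html.select(".infobox tr th:contains(" + detailsOf + ") ~ td a[title]")
                        .stream().map(e -> e.attr("title")).collect(Collectors.toList()).toString();
        System.out.println(retrieveDetailsOf.apply("Directed by"));
        System.out.println(retrieveDetailsOf.apply("Produced by"));
        System.out.println(retrieveDetailsOf.apply("Music by"));
        System.out.println(retrieveDetailsOf.apply("Starring"));
    }
}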
Output:
[The Wachowskis]
[Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, Joe Pantoliano]
Dependencies (in addition to jsoup itself):
implementation("com.fasterxml.jackson.core:jackson-core:2.10.2")
implementation("com.fasterxml.jackson.core:jackson-databind:2.10.2")
Keep in mind that this approach depends on the HTML structure of the page, so any change to the infobox markup will likely break it. Prefer an official movie-details API if one is available.
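To reuse the same request for other pages, the URL can be built from the page title instead of being hard-coded. The sketch below is only an illustration: the class name WikiParseUrl and the method forPage are hypothetical helpers, not part of jsoup, Jackson, or the MediaWiki API. It assumes the query parameters from the question stay the same and only the page value changes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class WikiParseUrl {
    // Query string taken from the URL in the question; only the page title varies.
    private static final String BASE =
            "https://en.wikipedia.org/w/api.php?action=parse&section=0&prop=text&format=json&page=";

    // Hypothetical helper: builds the request URL for a given page title.
    static String forPage(String pageTitle) {
        try {
            // URLEncoder applies form encoding, which turns a space into "+".
            // A literal "+" in the title becomes "%2B", so replacing the remaining
            // "+" characters with "%20" only affects encoded spaces.
            String encoded = URLEncoder.encode(pageTitle, "UTF-8").replace("+", "%20");
            return BASE + encoded;
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Prints the same URL as in the question
        System.out.println(forPage("The Matrix"));
    }
}
```

The returned string can be passed to Jsoup.connect(...) in place of the hard-coded URL. Whether the infobox selector still matches on another page depends on that page's markup.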