Processing URLs

Problem

You have an HTML document that contains relative URLs, and you need to resolve them to absolute URLs.

Method

  1. Make sure you specify a base URI when parsing the document, and then
  2. Use the abs: attribute prefix to resolve the attribute value against that base URI and get an absolute URL. The code looks like this:
// Fetching over HTTP sets the document's base URI to the page URL.
Document doc = Jsoup.connect("http://www.open-open.com").get();

Element link = doc.select("a").first();
String relHref = link.attr("href"); // == "/"
String absHref = link.attr("abs:href"); // == "http://www.open-open.com/"

Explanation

In HTML elements, URLs are often written relative to the document's location: <a href="/download">...</a>. When you use the Node.attr(String key) method to get an href attribute, it returns the value exactly as it is written in the HTML source.

If you need an absolute URL, add the abs: prefix to the attribute name. The attribute value is then resolved against the document's base URI, and the full URL is returned: attr("abs:href")
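To make "resolved against the base URI" concrete, here is a minimal sketch that uses the JDK's java.net.URI as a stand-in. It is not jsoup's own code, and the class name ResolveSketch and the example.com URLs are made up for illustration.

```java
import java.net.URI;

public class ResolveSketch {
    // Resolve a possibly relative href against a base URI with the JDK's
    // resolver (illustration only, not how jsoup is implemented).
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html"; // hypothetical base URI
        // A root-relative path replaces the whole path of the base.
        System.out.println(resolve(base, "/download"));
        // A plain relative path is resolved against the base's directory.
        System.out.println(resolve(base, "guide.html"));
        // An absolute URL is returned unchanged.
        System.out.println(resolve(base, "http://other.example/x"));
    }
}
```

The three calls cover the common cases: a root-relative path, a path relative to the current directory, and a URL that is already absolute.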

For this reason, it is important to specify a base URI when parsing an HTML document.
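When the HTML comes from a string rather than from a fetched page, the base URI has to be supplied by hand. The sketch below assumes jsoup is on the classpath and uses Jsoup.parse(String html, String baseUri). The class name BaseUriSketch, the HTML snippet, and the example.com base are invented for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BaseUriSketch {
    // Hypothetical snippet containing one root-relative link.
    static final String HTML = "<a href=\"/download\">Download</a>";

    // Parse the snippet with the given base URI and return the resolved href.
    static String absHref(String baseUri) {
        Document doc = Jsoup.parse(HTML, baseUri);
        Element link = doc.select("a").first();
        return link.attr("abs:href");
    }

    public static void main(String[] args) {
        // With a base URI, the relative href can be resolved.
        System.out.println(absHref("http://example.com/docs/"));
        // With an empty base URI there is nothing to resolve against,
        // so jsoup returns an empty string.
        System.out.println(absHref("").isEmpty());
    }
}
```

An empty result from abs:href is usually a sign that the document was parsed without a base URI.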

If you do not want to use the abs: prefix, the Node.absUrl(String key) method does the same thing and takes the plain attribute name.
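A small sketch comparing the two forms side by side, again assuming jsoup is on the classpath. The class name AbsUrlSketch, the HTML snippet, and the base URI are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsUrlSketch {
    // Parse a one-link snippet against a hypothetical base URI.
    static Element firstLink() {
        Document doc = Jsoup.parse(
                "<a href=\"guide.html\">Guide</a>",
                "http://example.com/docs/");
        return doc.select("a").first();
    }

    public static void main(String[] args) {
        Element link = firstLink();
        // The raw value, exactly as written in the source.
        System.out.println(link.attr("href"));
        // Two equivalent ways to get the absolute URL.
        System.out.println(link.attr("abs:href"));
        System.out.println(link.absUrl("href"));
    }
}
```

Which form to use is mostly a matter of taste: absUrl("href") keeps the attribute name unprefixed, while attr("abs:href") works anywhere an attribute key is accepted.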

Origin www.cnblogs.com/deityjian/p/12541635.html