Jsoup: abs:href returns an empty absolute address

Summary

  • When reading the href of an a tag, the relative address comes back fine, but abs:href returns an empty string.
  • The cause is that jsoup has no base URI to resolve the relative link against. According to the documentation, Jsoup.parse accepts a base URI as its second argument; once it is supplied, abs:href (or absUrl("href")) returns the absolute address.
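A minimal sketch of the difference, assuming jsoup is on the classpath. The HTML snippet and the /p/abc123 path are made up for illustration; the only change between the two elements is the second argument to Jsoup.parse.

```scala
import org.jsoup.Jsoup

object AbsHrefDemo {
  // Hypothetical snippet with a relative link
  val html = """<a class="title" target="_blank" href="/p/abc123">post</a>"""

  // Parsed without a base URI: nothing to resolve "/p/abc123" against
  val noBase = Jsoup.parse(html).selectFirst("a")
  // Parsed with a base URI passed as the second argument of Jsoup.parse
  val withBase = Jsoup.parse(html, "https://www.jianshu.com/").selectFirst("a")

  def main(args: Array[String]): Unit = {
    println(noBase.attr("href"))       // the relative value is always available
    println(noBase.attr("abs:href"))   // empty string
    println(withBase.attr("abs:href")) // https://www.jianshu.com/p/abc123
  }
}
```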

Demo

  • Scala code
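import org.jsoup.Jsoup
import scala.jdk.CollectionConverters._

// html holds the page source (assumed to be in scope); the second argument is the base URI
val doc = Jsoup.parse(html, "https://www.jianshu.com/")
println(doc.location())   // the base URI the document was parsed with
val urlList = doc.getElementsByAttributeValue("target", "_blank").select(".title")
println(urlList)
urlList.asScala.foreach { x =>
  println(x.absUrl("href"))    // absUrl takes the plain attribute name
  println(x.attr("abs:href"))  // same result via the "abs:" prefix on attr
}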
  • Scala code
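import java.net.URL
import org.jsoup.Jsoup
import scala.jdk.CollectionConverters._

// html and logger are assumed to be members of the enclosing class.
// baseUri is the page's own URL; without it abs:href is empty and new URL("") throws.
def getContentUrls(baseUri: String): List[URL] = {
  val urlList = Jsoup.parse(html, baseUri).select("""a[href~=\.html]""")
  urlList.asScala.toList.map { x =>
    val url = new URL(x.attr("abs:href"))   // absolute URL of the <a> element
    logger.info("Absolute URL: " + url)
    url
  }
}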
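Parsing with Jsoup.parse(html) alone, with no base URI, leaves every relative link with an empty abs:href, and new URL("") then throws MalformedURLException. If the base URI is only known after parsing, Node.setBaseUri can attach it to an existing document. A sketch, assuming jsoup and Scala 2.13 (for scala.jdk.CollectionConverters); the HTML and the example.com link are hypothetical.

```scala
import java.net.URL
import org.jsoup.Jsoup
import scala.jdk.CollectionConverters._

object LateBaseUriDemo {
  // Hypothetical HTML: one relative link and one already-absolute link
  val html = """<a href="/p/1.html">one</a><a href="https://example.com/2.html">two</a>"""

  // Parsed without a base URI: the relative link cannot be resolved
  val doc = Jsoup.parse(html)
  val before: List[String] = doc.select("a").asScala.toList.map(_.attr("abs:href"))

  // Attach the base URI afterwards; descendants inherit it
  doc.setBaseUri("https://www.jianshu.com/")
  val after: List[String] = doc.select("a").asScala.toList.map(_.attr("abs:href"))

  // Drop anything that still failed to resolve, so new URL("") cannot throw
  val urls: List[URL] = after.filter(_.nonEmpty).map(new URL(_))

  def main(args: Array[String]): Unit = {
    println(before) // List(, https://example.com/2.html)
    println(after)  // List(https://www.jianshu.com/p/1.html, https://example.com/2.html)
  }
}
```

Documents fetched with Jsoup.connect(url).get() already carry the request URL as their base URI, so this step only matters for HTML obtained some other way.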


Source: www.cnblogs.com/duchaoqun/p/12755008.html