Parse a document from a String

Problem
You have HTML in a Java String, and you want to parse that HTML to get at its contents, to make sure it's well formed, or to modify it. The String may have come from user input, a file, or the web.

Solution
Use the static Jsoup.parse(String html) method, or Jsoup.parse(String html, String baseUri) if the page came from the web and you want to get at absolute URLs (see [working-with-urls]).

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<html><head><title>First parse</title></head>"
  + "<body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);
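The two-argument form matters once the HTML contains relative links. The following is a minimal sketch, assuming jsoup is on the classpath. The class name ParseWithBaseUri, the helper firstAbsoluteLink, and the example.com URLs are hypothetical, chosen only for illustration. It parses with a base URI and asks the link for its absolute URL with absUrl("href"). That method returns an empty string when no absolute URL can be built, for example when no base URI was given.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParseWithBaseUri {

    // Hypothetical helper: parses html with the given base URI and returns the
    // first link's href resolved to an absolute URL, or "" if there is no link
    // or the href cannot be resolved.
    static String firstAbsoluteLink(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("a[href]").first();
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        // Hypothetical page content and fetch location, for illustration only.
        String html = "<p>See the <a href='intro.html'>introduction</a>.</p>";
        String baseUri = "https://example.com/docs/";

        // Resolved against the base URI: https://example.com/docs/intro.html
        System.out.println(firstAbsoluteLink(html, baseUri));

        // With an empty base URI the relative href cannot be resolved, so absUrl returns ""
        System.out.println("[" + firstAbsoluteLink(html, "") + "]");
    }
}
```

The raw attribute value (link.attr("href")) stays "intro.html" in both cases. Only absUrl uses the base URI.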

Description
The parse(String html, String baseUri) method parses the input HTML into a new Document. The base URI argument is used to resolve relative URLs into absolute URLs, and should be set to the URL the document was fetched from. If that's not applicable, or if you know the HTML has a base element, you can use the parse(String html) method.
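The base-element case can be sketched like this, again assuming jsoup is on the classpath. The class name, the helper resolveViaBaseElement, and the example.com URL are hypothetical. The document is parsed with the single-argument parse(String html). The parser picks up the href of the `<base>` element in the head and uses it as the document's base URI, so relative links still resolve.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseWithBaseElement {

    // Hypothetical helper: parses html without an explicit base URI and returns
    // the first link resolved to an absolute URL. The <base href> in the head
    // supplies the base. Assumes the html contains at least one <a href>.
    static String resolveViaBaseElement(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("a[href]").first().absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<html><head><base href='https://example.com/site/'></head>"
                + "<body><a href='page.html'>Page</a></body></html>";

        // Resolved via the <base> element: https://example.com/site/page.html
        System.out.println(resolveViaBaseElement(html));
    }
}
```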

As long as you pass in a non-null string, you're guaranteed a successful, sensible parse, with a Document containing (at least) a head and a body element. (BETA: if you do get an exception raised, or a bad parse tree, please file a bug.)
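To illustrate that guarantee, the sketch below parses an incomplete fragment with no html, head, or body tags and an unclosed `<p>` and `<b>`. It assumes jsoup is on the classpath, and the class name ParseFragment is hypothetical. The resulting Document still has a head and a body, and the paragraph ends up as the only child of the body.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseFragment {

    // Parses an incomplete fragment. The parser supplies the missing html,
    // head and body elements and closes the unclosed <p> and <b> tags.
    static Document parseFragment() {
        return Jsoup.parse("<p>Lorem <b>ipsum");
    }

    public static void main(String[] args) {
        Document doc = parseFragment();
        System.out.println(doc.head() != null);              // true
        System.out.println(doc.body().children().size());    // 1 (the <p>)
        System.out.println(doc.select("p").first().text());  // Lorem ipsum

        // Serializing shows the normalized, well-formed structure.
        System.out.println(doc.outerHtml());
    }
}
```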

Once you have a Document, you can get at the data using the appropriate methods in Document and its superclasses Element and Node.
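As a rough illustration, the sketch below reuses the HTML from the Solution snippet and touches one method from each level of that hierarchy. It calls title() on Document, select() and text() from Element, and parentNode() and nodeName() from Node. It then modifies the paragraph in place. It assumes jsoup is on the classpath. The class name ReadAndModify and its two helper methods are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ReadAndModify {

    // Hypothetical helper: parses the same HTML as the Solution snippet.
    static Document parsed() {
        String html = "<html><head><title>First parse</title></head>"
                + "<body><p>Parsed HTML into a doc.</p></body></html>";
        return Jsoup.parse(html);
    }

    // Hypothetical helper using an Element method (inherited by Document):
    // CSS-style selection of the first <p>.
    static Element firstParagraph(Document doc) {
        return doc.select("p").first();
    }

    public static void main(String[] args) {
        Document doc = parsed();

        // Document method: the text of the <title> element.
        System.out.println(doc.title());                 // First parse

        Element p = firstParagraph(doc);
        System.out.println(p.text());                    // Parsed HTML into a doc.

        // Node methods: the parent node and its name.
        System.out.println(p.parentNode().nodeName());   // body

        // Modify the tree in place.
        p.text("Modified text.");
        p.addClass("intro");
        System.out.println(p.text());                    // Modified text.
        System.out.println(p.className());               // intro
    }
}
```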


Reposted from liuzejian4.iteye.com/blog/1630873