Load a Document from a File

Problem
You have a file on disk that contains HTML. You'd like to load and parse it, and then perhaps manipulate it or extract data from it.

Solution
Use the static Jsoup.parse(File in, String charsetName, String baseUri) method:

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

Description
The parse(File in, String charsetName, String baseUri) method loads and parses an HTML file. If an error occurs while loading the file, it throws an IOException, which you should handle appropriately.
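One way to handle the IOException is sketched below. The class name LoadFromFile and the helper titleOf are made up for illustration, and returning null on failure is just one possible policy, not something the jsoup documentation prescribes.

```java
import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoadFromFile {
    // Hypothetical helper: returns the document's title,
    // or null if the file could not be read.
    static String titleOf(File input) {
        try {
            Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
            return doc.title();
        } catch (IOException e) {
            // A missing or unreadable file ends up here.
            System.err.println("Could not load " + input + ": " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(titleOf(new File("/tmp/input.html")));
    }
}
```

In a real application you would more likely rethrow or log the exception than return null; the point is only that the call must sit inside a try block or a method that declares IOException.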

The baseUri parameter is used by the parser to resolve relative URLs in the document before a <base href> element is found. If that's not a concern for you, you can pass an empty string instead.
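To see what the baseUri does in practice, the sketch below writes a one-link page to a temporary file, parses it, and asks for the absolute form of the link. The class name BaseUriDemo, the helper resolveFirstLink, and the sample markup are assumptions for illustration only.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BaseUriDemo {
    // Hypothetical helper: parses a tiny temporary HTML file with the given
    // baseUri and returns the absolute URL of its first link.
    static String resolveFirstLink(String baseUri) {
        try {
            File input = File.createTempFile("jsoup-demo", ".html");
            input.deleteOnExit();
            Files.write(input.toPath(),
                    "<a href=\"/about\">About</a>".getBytes(StandardCharsets.UTF_8));
            Document doc = Jsoup.parse(input, "UTF-8", baseUri);
            Element link = doc.select("a").first();
            // absUrl resolves the relative href against the document's base URI.
            return link.absUrl("href");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints http://example.com/about
        System.out.println(resolveFirstLink("http://example.com/"));
    }
}
```

If you pass an empty string as the baseUri, jsoup has nothing to resolve the relative href against, so only links that are already absolute in the file can be returned as absolute URLs.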

There is a sister method, parse(File in, String charsetName), which uses the file's location as the baseUri. This is useful if you are working on a filesystem-local site and the relative links it points to are also on the filesystem.
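A minimal sketch of the two-argument form follows. The class name LocalSiteDemo and both helpers are hypothetical, and the example simply prints whatever base URI jsoup records for the file rather than assuming its exact format.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LocalSiteDemo {
    // Hypothetical helper: creates a small temporary HTML file to parse.
    static File sampleFile() {
        try {
            File f = File.createTempFile("local-site", ".html");
            f.deleteOnExit();
            Files.write(f.toPath(),
                    "<a href=\"other.html\">Other page</a>".getBytes(StandardCharsets.UTF_8));
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses without an explicit baseUri and reports the one jsoup derived.
    static String baseUriOf(File input) {
        try {
            Document doc = Jsoup.parse(input, "UTF-8");
            return doc.baseUri();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(baseUriOf(sampleFile()));
    }
}
```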

Reposted from liuzejian4.iteye.com/blog/1634689