Load a Document from a File

Enterprise Development 2018-05-14 13:38:20

Problem
You have a file on disk that contains HTML. You'd like to load and parse it, and then maybe manipulate it or extract data from it.

Solution
Use the static Jsoup.parse(File in, String charsetName, String baseUri) method:

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

Description
The parse(File in, String charsetName, String baseUri) method loads and parses an HTML file. If an error occurs whilst loading the file (for example, the file does not exist or cannot be read), it will throw an IOException, which you should handle appropriately.
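One way to handle the IOException is sketched below. It assumes jsoup is on the classpath; the class name LoadFromFileExample, the helper titleOf, and the choice to return null on failure are illustrative and not part of jsoup.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoadFromFileExample {

    // Hypothetical helper: returns the document title,
    // or null if the file could not be loaded.
    static String titleOf(File input) {
        try {
            Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
            return doc.title();
        } catch (IOException e) {
            // Reached when the file is missing or unreadable.
            System.err.println("Could not load " + input + ": " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small HTML file so the example does not depend on /tmp/input.html.
        Path tmp = Files.createTempFile("input", ".html");
        Files.write(tmp, "<html><head><title>Hello</title></head><body></body></html>"
                .getBytes(StandardCharsets.UTF_8));

        System.out.println(titleOf(tmp.toFile()));                    // existing file
        System.out.println(titleOf(new File("/no/such/file.html"))); // missing file

        Files.delete(tmp);
    }
}
```

Returning null keeps the sketch short; in real code you might rethrow, log, or fall back to a default document, depending on what a missing file means for your application.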

The baseUri parameter is used by the parser to resolve relative URLs in the document before a <base href> element is found. If that's not a concern for you, you can pass an empty string instead.
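The effect of baseUri can be seen by asking for the absolute form of a relative link. The following is a minimal sketch, assuming jsoup is on the classpath; BaseUriExample and firstLinkAbsUrl are hypothetical names, and the sample HTML is made up for the demonstration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BaseUriExample {

    // Hypothetical helper: absolute URL of the first <a href> in the file,
    // resolved against the given base URI ("" if there is no link).
    static String firstLinkAbsUrl(File input, String baseUri) throws IOException {
        Document doc = Jsoup.parse(input, "UTF-8", baseUri);
        Element link = doc.select("a").first();
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("links", ".html");
        Files.write(tmp, "<a href=\"/about.html\">About</a>"
                .getBytes(StandardCharsets.UTF_8));

        // With a base URI, the relative href resolves to an absolute URL.
        System.out.println(firstLinkAbsUrl(tmp.toFile(), "http://example.com/"));

        // With an empty base URI there is nothing to resolve against,
        // so absUrl returns an empty string for a relative href.
        System.out.println(firstLinkAbsUrl(tmp.toFile(), ""));

        Files.delete(tmp);
    }
}
```

The raw attribute value is still available through link.attr("href") in either case; only the absolute form depends on the base URI.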

There is a sister method, parse(File in, String charsetName), which uses the file's location as the baseUri. This is useful if you are working on a filesystem-local site and the relative links it points to are also on the filesystem.
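To check what base URI the two-argument method actually assigns, you can print the document's location after parsing. This is a small sketch assuming jsoup is on the classpath; SisterMethodExample and locationOf are hypothetical names, and the exact form of the reported location may vary between jsoup versions, so the code prints it rather than assuming a format.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SisterMethodExample {

    // Hypothetical helper: parses without an explicit baseUri and
    // reports the location jsoup derived from the file itself.
    static String locationOf(File input) throws IOException {
        Document doc = Jsoup.parse(input, "UTF-8");
        return doc.location();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("local", ".html");
        Files.write(tmp, "<p>Local page</p>".getBytes(StandardCharsets.UTF_8));

        // Prints the file-derived location used as the base URI.
        System.out.println(locationOf(tmp.toFile()));

        Files.delete(tmp);
    }
}
```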

Reposted from liuzejian4.iteye.com/blog/1634689
