Parse a document from a String

Enterprise Development 2018-05-14 13:58:30

Problem
You have HTML in a Java String, and you want to parse that HTML to get at its contents, to make sure it is well formed, or to modify it. The String may have come from user input, a file, or the web.

Solution
Use the static Jsoup.parse(String html) method, or Jsoup.parse(String html, String baseUri) if the page came from the web and you want to get at absolute URLs (see [working-with-urls]).

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<html><head><title>First parse</title></head>"
  + "<body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);

Description
The parse(String html, String baseUri) method parses the input HTML into a new Document. The base URI argument is used to resolve relative URLs into absolute URLs, and should be set to the URL the document was fetched from. If that does not apply, or if you know the HTML contains a base element, you can use the parse(String html) method instead.
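To make the role of the base URI concrete, here is a minimal sketch. The class name BaseUriExample, the helper firstAbsoluteLink, and the http://example.com/ address are placeholders chosen for illustration, not part of the original recipe; the sketch also assumes the HTML contains at least one link.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class BaseUriExample {

    // Hypothetical helper: parses the HTML against baseUri and returns the
    // absolute form of the first link's href attribute.
    static String firstAbsoluteLink(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("a").first(); // assumes at least one <a> exists
        return link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/docs/intro.html\">Intro</a></p>";

        // With a base URI (placeholder address), the relative href is resolved.
        System.out.println(firstAbsoluteLink(html, "http://example.com/"));

        // Without a base URI there is nothing to resolve against,
        // so absUrl returns an empty string.
        Document noBase = Jsoup.parse(html);
        System.out.println(noBase.select("a").first().absUrl("href").isEmpty());
    }
}
```

The raw attribute value stays available through attr("href"); absUrl only changes what is returned, not what is stored in the document.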

As long as you pass in a non-null string, you're guaranteed a successful, sensible parse, with a Document containing (at least) a head and a body element. (BETA: if you do get an exception raised, or a bad parse tree, please file a bug.)
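The guarantee above can be checked with a fragment that has no html, head, or body tags at all. This is only a sketch; the class name LenientParseExample and the helper hasHeadAndBody are made up for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

final class LenientParseExample {

    // Hypothetical helper: reports whether the parsed Document ended up with
    // both a head and a body element, whatever the input looked like.
    static boolean hasHeadAndBody(String html) {
        Document doc = Jsoup.parse(html);
        return doc.head() != null && doc.body() != null;
    }

    public static void main(String[] args) {
        // An unclosed paragraph with no surrounding document structure.
        String fragment = "<p>Unclosed paragraph";
        Document doc = Jsoup.parse(fragment);

        System.out.println(hasHeadAndBody(fragment));       // structure was filled in
        System.out.println(doc.body().child(0).tagName());  // the paragraph sits inside body
        System.out.println(doc.body().text());              // its text is preserved

        // Even an empty (but non-null) string yields a usable Document.
        System.out.println(hasHeadAndBody(""));
    }
}
```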

Once you have a Document, you can get at the data using the appropriate methods in Document and its superclasses Element and Node.
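As a rough illustration of reading data back out, the sketch below reuses the HTML from the Solution section and calls one method from each level of the hierarchy. The class name DocumentAccessExample and the helper summarize are hypothetical names introduced for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class DocumentAccessExample {

    // Hypothetical helper: returns "<title> | <text of first paragraph>".
    static String summarize(String html) {
        Document doc = Jsoup.parse(html);
        String title = doc.title();                  // defined on Document
        Element firstP = doc.select("p").first();    // select() is inherited from Element
        String text = firstP == null ? "" : firstP.text();
        return title + " | " + text;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>First parse</title></head>"
            + "<body><p>Parsed HTML into a doc.</p></body></html>";

        System.out.println(summarize(html));

        // childNodeSize() comes from Node, the root of the hierarchy.
        Document doc = Jsoup.parse(html);
        System.out.println(doc.body().childNodeSize());
    }
}
```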

Reposted from liuzejian4.iteye.com/blog/1630873
