I am matching a specific string in an element's text and want to wrap the matching text in a span, so that I can select it and apply modifications later on, but the HTML is being escaped. Is there a way to wrap the string with HTML tags without it being escaped?
I tried using the unescapeEntities() method, but it doesn't work in this case. wrap() didn't work either. For reference on those methods, see https://jsoup.org/apidocs/org/jsoup/parser/Parser.html
Current code:
for (Element div : doc.select("div")) {
    for (String input : listOfStrings) {
        if (div.ownText().contains(input)) {
            div.text(div.ownText().replaceFirst(input, "<span class=\"select-me\">" + input + "</span>"));
        }
    }
}
Desired output:
<div>some text <span class="select-me">matched string</span></div>
Actual output:
<div>some text &lt;span class="select-me"&gt;matched string&lt;/span&gt;</div>
Based on your question and comments, it looks like you only want to modify the text nodes that are direct children of the selected element, without modifying the text nodes of any inner elements. So in the case of
<div>a b <span>b c</span></div>
if we want to modify b, we only modify the one placed directly in <div>, but not the one in <span>.
<div>a b <span>b c</span></div>
       ^       ^----don't modify because it is in <span>, not *directly* in <div>
       |
       modify
Text is not an element node like <div> or <span>; in the DOM it is represented as a TextNode. So if we have a structure like <div> a <span>b</span> c </div>, its DOM representation would be
Element: <div>
├ Text: " a "
├ Element: <span>
│ └ Text: "b"
└ Text: " c "
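To see this structure for yourself, a minimal sketch along these lines can list the direct children of a parsed element. The class name ChildNodesDemo and the helper describeDivChildren are made up for this illustration; only the jsoup calls (Jsoup.parse, select, childNodes, getWholeText, tagName) come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class ChildNodesDemo {
    // Hypothetical helper: parses the HTML, takes the first <div>, and
    // describes each of its direct children on a separate line.
    static String describeDivChildren(String html) {
        Element div = Jsoup.parse(html).select("div").first();
        StringBuilder sb = new StringBuilder();
        for (Node child : div.childNodes()) {
            if (child instanceof TextNode) {
                // getWholeText() returns the raw text, including surrounding spaces
                sb.append("Text: \"").append(((TextNode) child).getWholeText()).append("\"\n");
            } else if (child instanceof Element) {
                sb.append("Element: <").append(((Element) child).tagName()).append(">\n");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describeDivChildren("<div> a <span>b</span> c </div>"));
    }
}
```

Only the direct children are listed here; the "b" inside <span> belongs to the span's own TextNode and would show up only if we descended into that element.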
If we want to wrap a portion of some text in <span> (or any other tag), we are effectively splitting a single TextNode
├ Text: "foo bar baz"
into series of:
├ Text: "foo "
├ Element: <span>
│ └ Text: "bar"
└ Text: " baz"
The TextNode API gives us a very limited set of tools for building a solution on that idea, but among the available methods we can use:
- splitText(index), which modifies the original TextNode, leaving the "left" side of the split in it, and returns a new TextNode holding the remaining (right) side. For example, if TextNode node1 holds "foo bar", then after TextNode node2 = node1.splitText(3); node1 will hold "foo" while node2 will hold " bar" and will be placed as the immediate sibling after node1.
- wrap(htmlElement) (inherited from the Node superclass), which wraps the TextNode in an element representing htmlElement. For instance, node.wrap("<span class='myClass'>") will result in <span class="myClass">text from node</span>.
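The two methods can be tried in isolation with a small sketch like the one below. The class name SplitAndWrapDemo and the helper wrapRange are hypothetical, and pretty-printing is switched off only so that the returned HTML is not reformatted by jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class SplitAndWrapDemo {
    // Hypothetical helper: wraps `length` characters starting at `start`
    // of the first text node of the first <div>, then returns the div's inner HTML.
    // Assumes start > 0 and that some text remains after the wrapped range.
    static String wrapRange(String html, int start, int length) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(false); // keep the output exactly as built
        Element div = doc.select("div").first();
        TextNode whole = div.textNodes().get(0);
        // `whole` keeps the text before `start`; `middle` gets the rest
        TextNode middle = whole.splitText(start);
        // `middle` keeps only `length` characters; the tail becomes its next sibling
        middle.splitText(length);
        middle.wrap("<span class='x'></span>");
        return div.html();
    }

    public static void main(String[] args) {
        System.out.println(wrapRange("<div>foo bar baz</div>", 4, 3));
    }
}
```

After the two splits the single TextNode "foo bar baz" has become three siblings, and only the middle one is moved inside the new <span>.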
With the above "tools" we can create a method like
static void wrapTextWithElement(TextNode textNode, String strToWrap, String wrapperHTML) {
    // getWholeText() is used instead of text() because text() normalises whitespace,
    // which could shift the offsets passed to splitText()
    while (textNode.getWholeText().contains(strToWrap)) {
        // separates the part before strToWrap
        // and returns a node starting with the text we want
        TextNode rightNodeFromSplit = textNode.splitText(textNode.getWholeText().indexOf(strToWrap));
        // if there is more text after the searched string we need to
        // separate it and handle it in the next iteration
        if (rightNodeFromSplit.getWholeText().length() > strToWrap.length()) {
            textNode = rightNodeFromSplit.splitText(strToWrap.length());
            // after separating the remaining part, rightNodeFromSplit holds
            // only the part we were looking for, so let's wrap it
            rightNodeFromSplit.wrap(wrapperHTML);
        } else { // here we know that the node holds only the text to wrap
            rightNodeFromSplit.wrap(wrapperHTML);
            return; // textNode didn't change, but we have already handled everything
        }
    }
}
which we can use like:
Document doc = Jsoup.parse("<div>b a b <span>b c</span> d b</div> ");
System.out.println("BEFORE CHANGES:");
System.out.println(doc);

Element id1 = doc.select("div").first();
for (TextNode textNode : id1.textNodes()) {
    wrapTextWithElement(textNode, "b", "<span class='x'>");
}

System.out.println();
System.out.println("AFTER CHANGES");
System.out.println(doc);
Result:
BEFORE CHANGES:
<html>
<head></head>
<body>
<div>
b a b
<span>b c</span> d b
</div>
</body>
</html>
AFTER CHANGES
<html>
<head></head>
<body>
<div>
<span class="x">b</span> a
<span class="x">b</span>
<span>b c</span> d
<span class="x">b</span>
</div>
</body>
</html>
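Applied to the loop from the question (every div, every string in a list), the same idea could look roughly like the sketch below. The class name WrapInAllDivs and the helper wrapAll are invented for this example, the method is a copy of the splitting method above that reads the raw text through getWholeText(), and the empty-string guard is an added assumption to avoid an endless loop.

```java
import java.util.Arrays;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class WrapInAllDivs {
    // Copy of the splitting approach from the answer, using the raw text
    // so that the offsets agree with what splitText() operates on.
    static void wrapTextWithElement(TextNode textNode, String strToWrap, String wrapperHTML) {
        if (strToWrap.isEmpty()) return; // assumption: nothing to wrap for an empty string
        while (textNode.getWholeText().contains(strToWrap)) {
            TextNode right = textNode.splitText(textNode.getWholeText().indexOf(strToWrap));
            if (right.getWholeText().length() > strToWrap.length()) {
                textNode = right.splitText(strToWrap.length()); // continue with the tail
                right.wrap(wrapperHTML);
            } else {
                right.wrap(wrapperHTML);
                return;
            }
        }
    }

    // Hypothetical helper: wraps every input string found directly inside any <div>.
    static String wrapAll(String html, List<String> inputs) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(false);
        for (Element div : doc.select("div")) {
            for (String input : inputs) {
                // textNodes() is re-read for each input because earlier
                // wraps have split the div's text into new nodes
                for (TextNode textNode : div.textNodes()) {
                    wrapTextWithElement(textNode, input, "<span class='select-me'></span>");
                }
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(wrapAll("<div>some text matched string</div>",
                Arrays.asList("matched string")));
    }
}
```

Text that has already been wrapped sits inside a <span>, so it is no longer a direct text node of the div and is not wrapped again by a later string in the list.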