How to wrap part of text with a <span> or any other HTML tag without new HTML structure being escaped?

aisha :

I am matching a specific string in an element's text and want to wrap the matching text in a span, so that I can select it and apply modifications later on, but the HTML I insert is being escaped. Is there a way to wrap the string in HTML tags without it being escaped?

I tried the unescapeEntities() method, but it doesn't work in this case. wrap() didn't work either. For reference on those methods, see https://jsoup.org/apidocs/org/jsoup/parser/Parser.html

Current code :

for (Element div : doc.select("div")) {
    for (String input : listOfStrings) {
        if (div.ownText().contains(input)) {
            div.text(div.ownText().replaceFirst(input, "<span class=\"select-me\">" + input + "</span>"));
        }
    }
}

Desired output

<div>some text <span class="select-me">matched string</span></div>

actual output

<div>some text &lt;span class=&quot;select-me&quot;&gt;matched string&lt;/span&gt;</div>

Pshemo :

Based on your question and comments, it looks like you only want to modify the text nodes that are direct children of the selected element, leaving the text of any nested elements untouched. So in the case of

<div>a b <span>b c</span></div> 

if we want to modify b, we modify only the one placed directly in <div>, not the one in <span>.

<div>a b <span>b c</span></div> 
       ^       ^----don't modify because it is in <span>, not *directly* in <div>
       |
     modify

Text is not an element like <div> or <span>; in the DOM it is represented as a TextNode. So a structure like <div> a <span>b</span> c </div> has the following DOM representation:

Element: <div>
├ Text: " a "
├ Element: <span>
│ └ Text: "b"
└ Text: " c "

If we want to wrap a portion of some text in a <span> (or any other tag), we are effectively splitting a single TextNode

├ Text: "foo bar baz"

into series of:

├ Text: "foo "
├ Element: <span>
│ └ Text: "bar"
└ Text: " baz"
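
The same three-way cut can be sketched on a plain String, without jsoup, just to make the index arithmetic visible. This is only an illustration: the class TextSplitSketch and the helper splitAround are hypothetical names, not part of any library.

```java
import java.util.List;

class TextSplitSketch {
    // Hypothetical helper: cuts text around the first occurrence of match.
    // Returns [before, match, after]; returns [text] if there is nothing to cut.
    static List<String> splitAround(String text, String match) {
        int start = text.indexOf(match);
        if (match.isEmpty() || start < 0) {
            return List.of(text);
        }
        int end = start + match.length();
        // before = everything left of the match, after = everything right of it
        return List.of(text.substring(0, start), match, text.substring(end));
    }

    public static void main(String[] args) {
        // Pieces correspond to Text "foo ", the wrapped "bar", and Text " baz"
        System.out.println(splitAround("foo bar baz", "bar"));
    }
}
```

The middle piece is the one that ends up inside the new element; the outer two stay as ordinary text.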

The TextNode API gives us a very limited set of tools for implementing that idea, but among the available methods we can use

  • splitText(index), which modifies the original TextNode, leaving the "left" side of the split in it, and returns a new TextNode holding the remaining (right) side. For example, if TextNode node1 holds "foo bar", then after TextNode node2 = node1.splitText(3); node1 will hold "foo" while node2 will hold " bar" and will be placed as the immediate sibling after node1.
  • wrap(html) (inherited from the Node superclass), which wraps the TextNode in the element described by the given HTML. For instance, node.wrap("<span class='myClass'>") will result in <span class='myClass'>text from node</span>.

With the above "tools" we can create a method like

static void wrapTextWithElement(TextNode textNode, String strToWrap, String wrapperHTML) {

    while (textNode.text().contains(strToWrap)) {
        // separates part before strToWrap
        // and returns node starting with text we want
        TextNode rightNodeFromSplit = textNode.splitText(textNode.text().indexOf(strToWrap));

        // if there is more text after searched string we need to
        // separate it and handle in next iteration
        if (rightNodeFromSplit.text().length() > strToWrap.length()) {
            textNode = rightNodeFromSplit.splitText(strToWrap.length());
            // after separating the remaining part, rightNodeFromSplit holds
            // only the part we were looking for, so let's wrap it
            rightNodeFromSplit.wrap(wrapperHTML);
        } else { // here we know that node is holding only text to wrap
            rightNodeFromSplit.wrap(wrapperHTML);
            return;// since textNode didn't change but we already handled everything
        }
    }
}
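
To see which pieces the loop above produces, here is a rough plain-String model of it. It does not touch jsoup; the class WrapLoopSketch and the method segments are hypothetical names, and "[...]" simply stands in for a piece that would be wrapped. It also guards against an empty search string, which is an added assumption rather than something the method above does.

```java
import java.util.ArrayList;
import java.util.List;

class WrapLoopSketch {
    // Hypothetical model: lists the pieces the text is cut into, in order.
    // Every occurrence of match is shown as "[match]" (the piece that gets wrapped).
    static List<String> segments(String text, String match) {
        List<String> out = new ArrayList<>();
        if (match.isEmpty()) {          // assumption: nothing to wrap for an empty search string
            out.add(text);
            return out;
        }
        int from = 0;
        int at;
        while ((at = text.indexOf(match, from)) >= 0) {
            out.add(text.substring(from, at)); // left part of the split (may be empty)
            out.add("[" + match + "]");        // the part that would be wrapped
            from = at + match.length();
        }
        if (from < text.length()) {
            out.add(text.substring(from));     // remaining text with no further match
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(segments("b a b ", "b"));
    }
}
```

For the first text node of the example below ("b a b "), this yields an empty piece, a wrapped b, " a ", a second wrapped b, and a trailing " ".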

which we can use like:

Document doc = Jsoup.parse("<div>b a b <span>b c</span> d b</div> ");
System.out.println("BEFORE CHANGES:");
System.out.println(doc);

Element id1 = doc.select("div").first();
for (TextNode textNode : id1.textNodes()) {
    wrapTextWithElement(textNode, "b", "<span class='x'>");
}

System.out.println();
System.out.println("AFTER CHANGES");
System.out.println(doc);

Result:

BEFORE CHANGES:
<html>
 <head></head>
 <body>
  <div>
   b a b 
   <span>b c</span> d b
  </div> 
 </body>
</html>

AFTER CHANGES
<html>
 <head></head>
 <body>
  <div>
   <span class="x">b</span> a 
   <span class="x">b</span> 
   <span>b c</span> d 
   <span class="x">b</span>
  </div> 
 </body>
</html>
