RSS feed replace encoded characters

Swordfish :

I am processing news articles from some RSS feeds and want to display the headlines in my Java-based web application.

Some of the feeds have encoded characters in the title, e.g.

Arsenal&#8217;s trip to Vitoria a &#8216;more difficult&#8217; test than reverse Europea League tie, warns hosts&#8217; coach

There may be other encoded characters. Using Java, and without having to define which characters to search for and replace, how can I decode all encoded characters so that the title displays correctly on the website? E.g.

Arsenal’s trip to Vitoria a ‘more difficult’ test than reverse Europa League tie, warns hosts’ coach

Benoit :

Apache Commons Lang provides support for this (org.apache.commons:commons-lang3:3.9):

Running:

import org.apache.commons.lang3.StringEscapeUtils;

public class Escape {

    public static void main(String[] args) {
        // The entities are written out here as they would appear in the raw feed title
        System.out.println(StringEscapeUtils.unescapeXml("Arsenal&#8217;s trip to Vitoria a &#8216;more difficult&#8217; test than reverse Europea League tie, warns hosts&#8217; coach"));
    }
}

gives as expected:

Arsenal’s trip to Vitoria a ‘more difficult’ test than reverse Europea League tie, warns hosts’ coach
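
If adding a dependency is not an option, roughly the same decoding can be sketched with the JDK alone. The class below is a hypothetical helper (`EntityDecoder` and `decode` are names made up for this example, not part of Commons Lang or the JDK). It assumes Java 9 or later and that the titles only contain numeric character references (`&#8217;`, `&#x2019;`) and the five predefined XML entities. HTML-only names such as `&rsquo;` are left untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a library class): decodes XML character references. */
final class EntityDecoder {

    // Matches a decimal reference (&#8217;), a hex reference (&#x2019;),
    // or a named reference (&amp;).
    private static final Pattern REFERENCE =
            Pattern.compile("&(?:#(\\d+)|#[xX]([0-9a-fA-F]+)|(\\w+));");

    // Only the five entities predefined by XML are known here.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    static String decode(String text) {
        Matcher m = REFERENCE.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                replacement = fromCodePoint(m.group(1), 10, m.group());
            } else if (m.group(2) != null) {
                replacement = fromCodePoint(m.group(2), 16, m.group());
            } else {
                // Unknown names are kept exactly as they appeared in the input.
                replacement = NAMED.getOrDefault(m.group(3), m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts the digits to a character, or returns the original text
    // if the number is not a valid Unicode code point.
    private static String fromCodePoint(String digits, int radix, String original) {
        try {
            int codePoint = Integer.parseInt(digits, radix);
            if (Character.isValidCodePoint(codePoint)) {
                return new String(Character.toChars(codePoint));
            }
        } catch (NumberFormatException e) {
            // Number too large for an int, so it cannot be a code point.
        }
        return original;
    }

    public static void main(String[] args) {
        System.out.println(decode(
                "Arsenal&#8217;s trip to Vitoria a &#8216;more difficult&#8217; test"));
    }
}
```

The input is scanned once, so a title containing `&amp;#8217;` comes out as the literal text `&#8217;` rather than being decoded twice. For feeds that also use HTML entity names, a library method that knows the full HTML entity table is probably the safer choice.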
