J. Snipe :
I'm having a hard time escaping XML so that it can be processed in Java. I'm using JTidy to escape unwanted characters, but I'm struggling to remove "<" and ">" from values such as <tag> capacity < 1000 </tag>
I'm using the code below to escape the input:
public String CleanXML(String input){
    Tidy tidy = new Tidy();
    tidy.setInputEncoding("UTF-16");
    tidy.setOutputEncoding("UTF-16");
    tidy.setWraplen(Integer.MAX_VALUE);
    tidy.setXmlOut(true);
    tidy.setSmartIndent(true);
    tidy.setXmlTags(true);
    tidy.setMakeClean(true);
    tidy.setForceOutput(true);
    tidy.setQuiet(true);
    tidy.setShowWarnings(false);
    StringReader in = new StringReader(input);
    StringWriter out = new StringWriter();
    tidy.parse(in, out);
    return out.toString();
}
Nilanka Manoj :
Use the following function:
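import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final Pattern TAG_REGEX = Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);
public String CleanXML(String input){
    final Matcher matcher = TAG_REGEX.matcher(input);
    while (matcher.find()) {
        String value = matcher.group(1);
        // Keep only letters, digits and whitespace in the tag value.
        String valueReplace = value.replaceAll("[^a-zA-Z0-9\\s]", "");
        // Strings are immutable, so the result of replace() must be assigned back.
        input = input.replace(value, valueReplace);
    }
    return input;
}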
It uses a regular expression search to get the values between the tags, then removes all non-alphanumeric characters from them. The regular expression and the basic idea were taken from "Java regex to extract text between tags".
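If the characters should be kept rather than dropped, the same regular expression can be used to escape them instead. The following is only a minimal sketch under a few assumptions: the class name XmlValueEscaper and the methods escapeValue and escapeTagValues are made up for illustration, the tag is literally named <tag> as in the question, values never contain nested elements, and the input is not already escaped (an existing &amp; would be escaped a second time).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JTidy or the original answer.
class XmlValueEscaper {
    // Same pattern as in the answer: lazily captures the text between <tag> and </tag>.
    private static final Pattern TAG_REGEX =
            Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);

    // Replaces the XML special characters in a text value with entities.
    // '&' is handled first so the entities produced afterwards are not escaped again.
    static String escapeValue(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Rebuilds the input, escaping only the text found inside <tag>...</tag>.
    static String escapeTagValues(String input) {
        Matcher matcher = TAG_REGEX.matcher(input);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String replacement = "<tag>" + escapeValue(matcher.group(1)) + "</tag>";
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        // Copy whatever follows the last match unchanged.
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        // prints: <tag> capacity &lt; 1000 </tag>
        System.out.println(escapeTagValues("<tag> capacity < 1000 </tag>"));
    }
}
```

Rebuilding the string with appendReplacement rewrites each match at its own position, so a value that also happens to appear elsewhere in the document is left alone there.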