On front-end pages such as HTML or JSP, pay close attention to where HTML tags appear.
Once an incomplete HTML tag ends up on the page, the layout breaks and you may have no idea why.
I once ran into exactly this: the page had always worked, and the data itself looked fine.
But after new data was imported, the HTML page no longer displayed properly and the layout was badly broken.
Then I realized that some modules on the JSP page take a piece of text that may contain HTML tags and display an excerpt of it.
And the excerpt is made by cutting the text at a fixed number of characters, wherever that happens to fall.
So the question is: what happens if the cut leaves an incomplete HTML tag behind?
The answer is simple: your page will look terrible, and you will feel helpless in that moment, because it was fine before! O(∩_∩)O haha~
Here are some ways to remove HTML tags:
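To make the failure concrete, here is a minimal sketch of a fixed-length cut that lands in the middle of a tag. The class name `TruncationDemo`, the helper `endsInsideTag`, and the sample markup are all made up for illustration; the check is a rough heuristic that assumes any `<` in the text starts a tag.

```java
class TruncationDemo {
    // Heuristic: the text ends inside a tag when its last '<' comes after its last '>'.
    // Assumption: every '<' starts a tag (a literal "a < b" would be misreported).
    static boolean endsInsideTag(String text) {
        return text.lastIndexOf('<') > text.lastIndexOf('>');
    }

    public static void main(String[] args) {
        String html = "<p>Hello <a href=\"/news/1\">world</a></p>";
        // Cut at a fixed length, as the page module described above does.
        String cut = html.substring(0, Math.min(20, html.length()));
        System.out.println(cut);                // <p>Hello <a href="/n
        System.out.println(endsInsideTag(cut)); // true
    }
}
```

The browser then treats everything after the dangling `<a href="/n` as part of that tag, which is why the rest of the page falls apart.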
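On the Java (JSP) side, extract only the Chinese content:
// Requires java.util.regex.Matcher and java.util.regex.Pattern
String regex = "([\u4e00-\u9fa5]+)";
String aimStr = ""; // the text to extract from
Matcher matcher = Pattern.compile(regex).matcher(aimStr);
if (matcher.find()) {
    aimStr = matcher.group(0); // keeps only the first run of Chinese characters
}
System.out.println(aimStr);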
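The snippet above stops at the first match, so text such as Chinese words separated by tags or punctuation would lose everything after the first run. A possible variant, sketched here under the assumption that you want every run concatenated, loops over all matches. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChineseExtractor {
    // Same character range as above, written as a regex-level escape.
    private static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]+");

    // Concatenates every run of Chinese characters found in the text.
    static String extractChinese(String text) {
        StringBuilder sb = new StringBuilder();
        Matcher m = CHINESE.matcher(text);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\u4f60\u597d" is "你好" and "\u4e16\u754c" is "世界"; escapes avoid source-encoding issues.
        String html = "<p>\u4f60\u597d, <b>\u4e16\u754c</b>!</p>";
        System.out.println(extractChinese(html));
    }
}
```

This behaves like the JavaScript one-liner further down, which deletes every non-Chinese character.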
On the Java (JSP) side, remove the HTML tags:
// Remember to import java.util.regex.Matcher and java.util.regex.Pattern
public String removeHtmlTag(String htmlStr) { // htmlStr: text content with HTML tags
    String regEx_script = "<script[^>]*?>[\\s\\S]*?<\\/script>"; // remove script blocks
    String regEx_style = "<style[^>]*?>[\\s\\S]*?<\\/style>"; // remove style blocks
    String regEx_html = "<[^>]+>"; // remove HTML tags
    String regEx_space = "\\s+"; // collapse whitespace (already covers \t, \r and \n)
    Pattern p_script = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
    Matcher m_script = p_script.matcher(htmlStr);
    htmlStr = m_script.replaceAll("");
    Pattern p_style = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
    Matcher m_style = p_style.matcher(htmlStr);
    htmlStr = m_style.replaceAll("");
    Pattern p_html = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
    Matcher m_html = p_html.matcher(htmlStr);
    htmlStr = m_html.replaceAll("");
    Pattern p_space = Pattern.compile(regEx_space, Pattern.CASE_INSENSITIVE);
    Matcher m_space = p_space.matcher(htmlStr);
    htmlStr = m_space.replaceAll(" ");
    return htmlStr;
}
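One gap worth noting: the pattern `<[^>]+>` only matches tags that have a closing `>`, so a tag that was already cut in half stays in the text. The sketch below is one way to cover that case, assuming the goal is a plain-text excerpt: strip tags first, drop any unclosed tag left at the end, and only then cut to length. The class `SafeExcerpt`, the method `excerpt`, and the `DANGLING` pattern are illustrative additions, not part of the original method.

```java
import java.util.regex.Pattern;

class SafeExcerpt {
    private static final Pattern SCRIPT =
            Pattern.compile("<script[^>]*?>[\\s\\S]*?</script>", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE =
            Pattern.compile("<style[^>]*?>[\\s\\S]*?</style>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Assumption: a '<' with no '>' after it, up to the end of the text, is a cut-off tag.
    private static final Pattern DANGLING = Pattern.compile("<[^>]*$");
    private static final Pattern SPACE = Pattern.compile("\\s+");

    // Returns at most maxLen characters of plain text taken from the given HTML.
    static String excerpt(String html, int maxLen) {
        String text = SCRIPT.matcher(html).replaceAll("");
        text = STYLE.matcher(text).replaceAll("");
        text = TAG.matcher(text).replaceAll("");
        text = DANGLING.matcher(text).replaceAll("");
        text = SPACE.matcher(text).replaceAll(" ").trim();
        return text.length() <= maxLen ? text : text.substring(0, maxLen);
    }

    public static void main(String[] args) {
        System.out.println(excerpt("<p>Hello <a href=\"/news/1\">world</a></p>", 8)); // Hello wo
        System.out.println(excerpt("<p>Hello <a href=\"/n", 50));                     // Hello
    }
}
```

Because the cut happens after the tags are gone, the excerpt can no longer end inside a tag. HTML entities such as `&nbsp;` are left as they are in this sketch.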
On the JS side, extract only the Chinese content:
aimStr = aimStr.replace(/[^\u4e00-\u9fa5]/g, ""); // replace() returns a new string, so assign the result
On the JS side, remove the HTML tags:
aimStr = aimStr.replace(/<[^>]+>/g, ""); // replace() returns a new string, so assign the result