Regular expression matching punctuation

Original link: https://blog.csdn.net/q77533005/article/details/83642725

Excerpt:
str = str.replaceAll("[\\pP‘’“”]", "");
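
Below is a minimal runnable sketch of that one-liner. It assumes the quote characters in the excerpt were the curly quotes ‘’“”, which were garbled in the copy. The class and method names (StripPunctuation, stripPunctuation) are hypothetical. The quotes are written as Java unicode escapes so the file compiles regardless of the source encoding.

```java
public class StripPunctuation {
    // Same pattern as the excerpt, with the four curly quotes written as escapes.
    // Those quotes are themselves in category P (Pi/Pf), so listing them
    // explicitly is redundant but harmless.
    private static final String PATTERN = "[\\pP\u2018\u2019\u201C\u201D]";

    // Remove every punctuation character from the input string.
    static String stripPunctuation(String str) {
        return str.replaceAll(PATTERN, "");
    }

    public static void main(String[] args) {
        String input = "Hello, \u201Cworld\u201D! Price: $5 + 3.";
        // Prints: Hello world Price $5 + 3
        System.out.println(stripPunctuation(input));
    }
}
```

Note that $ and + survive. They belong to the symbol category S, not to P.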

Unicode does not merely assign a code point to each character; it also classifies the characters.

In \pP, the lowercase p stands for "property": \p is the prefix that Unicode regular expressions use to refer to a Unicode property.

The uppercase P names one of the seven general categories of the Unicode character set: punctuation.
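
To make the syntax concrete, here is a small sketch with java.util.regex. A one-letter property name may be written with or without braces, so \pP and \p{P} behave the same, while the uppercase \P{P} is the negation (any character that is not punctuation). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class PropertySyntax {
    // Lowercase \p introduces a property; P is the category name.
    // A one-letter name may omit the braces, so these two are equivalent.
    private static final Pattern SHORT_FORM = Pattern.compile("\\pP");
    private static final Pattern BRACED_FORM = Pattern.compile("\\p{P}");
    // Uppercase \P negates the property: matches anything NOT in category P.
    private static final Pattern NEGATED = Pattern.compile("\\P{P}");

    static boolean isPunctuation(String ch) {
        return SHORT_FORM.matcher(ch).matches();
    }

    static boolean isPunctuationBraced(String ch) {
        return BRACED_FORM.matcher(ch).matches();
    }

    static boolean isNotPunctuation(String ch) {
        return NEGATED.matcher(ch).matches();
    }

    public static void main(String[] args) {
        // Comma, ideographic full stop, a letter, and a currency symbol.
        for (String ch : new String[] {",", "\u3002", "a", "$"}) {
            System.out.println(ch + " -> " + isPunctuation(ch) + " / "
                    + isPunctuationBraced(ch) + " / " + isNotPunctuation(ch));
        }
    }
}
```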

The other six are:

L: letters;
M: marks (combining marks, which generally do not appear on their own);
Z: separators (such as spaces and the line and paragraph separators);
S: symbols (such as mathematical symbols, currency signs, etc.);
N: numbers (such as Arabic numerals, Roman numerals, etc.);
C: other characters (such as control characters).
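
A sketch that finds which of the seven categories a code point falls into by trying \p{X} for each one-letter name X. The class and method names are hypothetical, and looping over patterns is chosen for clarity rather than speed.

```java
import java.util.regex.Pattern;

public class MajorCategory {
    // The seven one-letter general categories.
    private static final String[] CATEGORIES = {"L", "M", "Z", "S", "N", "C", "P"};

    // Return the one-letter category of a code point by trying \p{X} for each X.
    // Every code point belongs to exactly one general category, so at most one matches.
    static String majorCategory(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        for (String cat : CATEGORIES) {
            if (Pattern.matches("\\p{" + cat + "}", s)) {
                return cat;
            }
        }
        return "?"; // fallback if no category pattern matched
    }

    public static void main(String[] args) {
        // Letter, combining acute accent, space, dollar sign, digit,
        // Roman numeral eight, line feed, exclamation mark.
        int[] samples = {'A', 0x0301, ' ', '$', '7', 0x2167, '\n', '!'};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %s%n", cp, majorCategory(cp));
        }
    }
}
```

Note that the ordinary line feed lands in C (it is a control character), while Z covers space characters and the dedicated line and paragraph separators.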

Beyond these seven categories, each one is further subdivided into several subcategories.
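
For punctuation, the subcategories are Pc (connector), Pd (dash), Ps (open), Pe (close), Pi (initial quote), Pf (final quote) and Po (other). The following sketch, again with hypothetical names, reports the subcategory of a character and shows that a single subcategory can be used on its own in replaceAll.

```java
import java.util.regex.Pattern;

public class PunctuationSubcategory {
    // The seven subcategories of P, as two-letter property names.
    private static final String[] SUBCATEGORIES = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"};

    // Return the punctuation subcategory of ch, or null if ch is not punctuation.
    static String subcategory(char ch) {
        String s = String.valueOf(ch);
        for (String sub : SUBCATEGORIES) {
            if (Pattern.matches("\\p{" + sub + "}", s)) {
                return sub;
            }
        }
        return null;
    }

    // Replace only dash punctuation (Pd) with spaces; other punctuation is kept.
    static String dashesToSpaces(String str) {
        return str.replaceAll("\\p{Pd}", " ");
    }

    public static void main(String[] args) {
        for (char ch : "_-()\u201C\u201D!a".toCharArray()) {
            System.out.println(ch + " -> " + subcategory(ch));
        }
        // Prints: well known_name (draft)
        System.out.println(dashesToSpaces("well-known_name (draft)"));
    }
}
```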

The Unicode data behind Java regular expressions is provided by the Unicode Consortium.

Unicode Technical Standard #18, Unicode Regular Expressions (where all the subcategories can be found):
http://www.unicode.org/reports/tr18/

For the property definitions of every Unicode character, that is, to look up which properties a given character has:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

This text file has one character per line, with fields separated by semicolons: the first column is the Unicode code point, the second is the character name, the third is the general category, and the remaining columns hold other character information.
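
A sketch of reading that format, assuming a line has already been read from the file (downloading and looping over the file are left out). It splits on semicolons, keeps the first three fields, and checks that Java's own \p{...} class for that category matches the character. The two sample lines follow the file's format; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class UnicodeDataLine {
    // The first three fields of a UnicodeData.txt line.
    static final class Entry {
        final int codePoint;   // field 0: code point in hex
        final String name;     // field 1: character name
        final String category; // field 2: general category, e.g. "Po"

        Entry(int codePoint, String name, String category) {
            this.codePoint = codePoint;
            this.name = name;
            this.category = category;
        }
    }

    // Fields are separated by ';'. A limit of -1 keeps trailing empty fields,
    // so the field count is preserved even though only three are used here.
    static Entry parse(String line) {
        String[] fields = line.split(";", -1);
        return new Entry(Integer.parseInt(fields[0], 16), fields[1], fields[2]);
    }

    // True if Java's regex engine places the character in the category the file lists.
    static boolean javaAgrees(Entry e) {
        String ch = new String(Character.toChars(e.codePoint));
        return Pattern.matches("\\p{" + e.category + "}", ch);
    }

    public static void main(String[] args) {
        String[] lines = {
            "0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;",
            "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
        };
        for (String line : lines) {
            Entry e = parse(line);
            System.out.printf("U+%04X %s %s agrees=%b%n",
                    e.codePoint, e.name, e.category, javaAgrees(e));
        }
    }
}
```

Two caveats: large ranges such as the CJK ideographs appear in the file as a pair of "First"/"Last" lines, which this sketch does not expand, and each Java release bundles a specific Unicode version, so characters added in a newer file may not agree.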

Origin: blog.csdn.net/zhengdong12345/article/details/100777961