A Deep Dive into the Java Source: toChars() and isUpperCase()
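The Character class wraps a value of the primitive type char in an object. An object of type Character contains a single field whose type is char.
In addition, this class provides several methods for determining a character's category (lowercase letter, digit, and so on) and for converting characters from uppercase to lowercase and vice versa.
Character information is based on the Unicode Standard, version 6.2.0.
The methods and data of class Character are defined by the information in the UnicodeData file, which is part of the Unicode Character Database maintained by the Unicode Consortium. This file specifies various properties, including name and general category, for every defined Unicode code point or character range.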
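The char data type (and therefore the value that a Character object encapsulates) is based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar values. (Refer to the definition of the U+n notation in the Unicode Standard.)
The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, a supplementary character is represented as a pair of char values, the first from the high-surrogates range (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).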
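The surrogate-pair encoding described above can be worked through by hand and compared with the library. The following is a minimal sketch; the class name SurrogateMath and the helpers encode and decode are illustrative names chosen here, not part of the JDK.

```java
class SurrogateMath {
    // Splits a supplementary code point (U+10000..U+10FFFF) into two UTF-16 code units.
    static char[] encode(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20 significant bits remain
        char high = (char) (0xD800 + (offset >>> 10));  // upper 10 bits go into the high surrogate
        char low = (char) (0xDC00 + (offset & 0x3FF));  // lower 10 bits go into the low surrogate
        return new char[] { high, low };
    }

    // Recombines a high/low surrogate pair into the code point it encodes.
    static int decode(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600;
        char[] manual = encode(codePoint);
        char[] library = Character.toChars(codePoint);
        System.out.printf("manual:  %04X %04X%n", (int) manual[0], (int) manual[1]);   // D83D DE00
        System.out.printf("library: %04X %04X%n", (int) library[0], (int) library[1]); // D83D DE00
        System.out.printf("decoded: U+%X%n", decode(manual[0], manual[1]));            // U+1F600
    }
}
```

Both routes should yield the same pair, since Character.toChars performs the same split with shifted constants.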
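A char value, therefore, represents Basic Multilingual Plane (BMP) code points, including the surrogate code points, or code units of the UTF-16 encoding. An int value represents all Unicode code points, including supplementary code points. The lower (least significant) 21 bits of the int are used to represent Unicode code points, and the upper (most significant) 11 bits must be zero. Unless otherwise specified, the behavior with respect to supplementary characters and surrogate char values is as follows:
Methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter('\uD840') returns false, even though this specific value, if followed by any low-surrogate value in a string, would represent a letter.
Methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).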
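The difference between the char and int overloads can be checked directly. This is a small sketch using the two values quoted above; the class name CharVersusInt and the helper pairIsLetter are illustrative names, not JDK API.

```java
class CharVersusInt {
    // Treats a surrogate pair as one code point rather than as two separate chars.
    static boolean pairIsLetter(char high, char low) {
        return Character.isLetter(Character.toCodePoint(high, low));
    }

    public static void main(String[] args) {
        char loneHigh = '\uD840';
        // char overload: a lone surrogate is not a letter
        System.out.println(Character.isLetter(loneHigh));      // false
        // int overload: U+2F81A is a CJK ideograph
        System.out.println(Character.isLetter(0x2F81A));       // true
        // the pair D840 DC00 encodes U+20000, also a CJK ideograph
        System.out.println(pairIsLetter('\uD840', '\uDC00'));  // true
    }
}
```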
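In the Java SE API documentation, Unicode code point is used for character values in the range between U+0000 and U+10FFFF, and Unicode code unit is used for 16-bit char values that are code units of the UTF-16 encoding. For more information on Unicode terminology, refer to the Unicode Glossary.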
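To make the code unit versus code point distinction concrete, the sketch below walks a string one code point at a time using Character.codePointAt and Character.charCount. The class name CodePointWalk and the method codePoints are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointWalk {
    // Collects the code points of text, advancing by 1 or 2 code units per step.
    static List<Integer> codePoints(CharSequence text) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            result.add(cp);
            i += Character.charCount(cp); // 2 for supplementary code points, else 1
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "A\uD83D\uDE00B"; // 'A', U+1F600, 'B'
        System.out.println(s.length());                                // 4 code units
        System.out.println(Character.codePointCount(s, 0, s.length())); // 3 code points
        for (int cp : codePoints(s)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

Indexing by char position (charAt) would instead visit the two surrogate halves of U+1F600 separately.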
Methods
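Modifier and Type | Method and Description |
---|---|
static int | charCount(int codePoint): Determines the number of char values needed to represent the specified character (Unicode code point). |
char | charValue(): Returns the value of this Character object. |
static int | codePointAt(char[] a, int index): Returns the code point at the given index of the char array. |
static int | codePointAt(char[] a, int index, int limit): Returns the code point at the given index of the char array, where only array elements with index less than limit can be used. |
static int | codePointAt(CharSequence seq, int index): Returns the code point at the given index of the CharSequence. |
static int | codePointBefore(char[] a, int index): Returns the code point preceding the given index of the char array. |
static int | codePointBefore(char[] a, int index, int start): Returns the code point preceding the given index of the char array, where only array elements with index greater than or equal to start can be used. |
static int | codePointBefore(CharSequence seq, int index): Returns the code point preceding the given index of the CharSequence. |
static int | codePointCount(char[] a, int offset, int count): Returns the number of Unicode code points in a subarray of the char array argument. |
static int | codePointCount(CharSequence seq, int beginIndex, int endIndex): Returns the number of Unicode code points in the text range of the specified char sequence. |
static int | compare(char x, char y): Compares two char values numerically. |
int | compareTo(Character anotherCharacter): Compares two Character objects numerically. |
static int | digit(char ch, int radix): Returns the numeric value of the character ch in the specified radix. |
static int | digit(int codePoint, int radix): Returns the numeric value of the specified character (Unicode code point) in the specified radix. |
boolean | equals(Object obj): Compares this object against the specified object. |
static char | forDigit(int digit, int radix): Determines the character representation for a specific digit in the specified radix. |
static byte | getDirectionality(char ch): Returns the Unicode directionality property for the given character. |
static byte | getDirectionality(int codePoint): Returns the Unicode directionality property for the given character (Unicode code point). |
static String | getName(int codePoint): Returns the Unicode name of the specified character codePoint, or null if the code point is unassigned. |
static int | getNumericValue(char ch): Returns the int value that the specified Unicode character represents. |
static int | getNumericValue(int codePoint): Returns the int value that the specified character (Unicode code point) represents. |
static int | getType(char ch): Returns a value indicating a character's general category. |
static int | getType(int codePoint): Returns a value indicating a character's general category. |
int | hashCode(): Returns a hash code for this Character; equal to the result of invoking charValue(). |
static int | hashCode(char value): Returns a hash code for a char value; compatible with Character.hashCode(). |
static char | highSurrogate(int codePoint): Returns the leading surrogate (a high surrogate code unit) of the surrogate pair representing the specified supplementary character (Unicode code point) in the UTF-16 encoding. |
static boolean | isAlphabetic(int codePoint): Determines if the specified character (Unicode code point) is alphabetic. |
static boolean | isBmpCodePoint(int codePoint): Determines whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP). |
static boolean | isDefined(char ch): Determines if a character is defined in Unicode. |
static boolean | isDefined(int codePoint): Determines if a character (Unicode code point) is defined in Unicode. |
static boolean | isDigit(char ch): Determines if the specified character is a digit. |
static boolean | isDigit(int codePoint): Determines if the specified character (Unicode code point) is a digit. |
static boolean | isHighSurrogate(char ch): Determines if the given char value is a Unicode high-surrogate code unit (also known as a leading-surrogate code unit). |
static boolean | isIdentifierIgnorable(char ch): Determines if the specified character should be regarded as an ignorable character in a Java identifier or a Unicode identifier. |
static boolean | isIdentifierIgnorable(int codePoint): Determines if the specified character (Unicode code point) should be regarded as an ignorable character in a Java identifier or a Unicode identifier. |
static boolean | isIdeographic(int codePoint): Determines if the specified character (Unicode code point) is a CJKV (Chinese, Japanese, Korean and Vietnamese) ideograph, as defined by the Unicode Standard. |
static boolean | isISOControl(char ch): Determines if the specified character is an ISO control character. |
static boolean | isISOControl(int codePoint): Determines if the referenced character (Unicode code point) is an ISO control character. |
static boolean | isJavaIdentifierPart(char ch): Determines if the specified character may be part of a Java identifier as other than the first character. |
static boolean | isJavaIdentifierPart(int codePoint): Determines if the character (Unicode code point) may be part of a Java identifier as other than the first character. |
static boolean | isJavaIdentifierStart(char ch): Determines if the specified character is permissible as the first character in a Java identifier. |
static boolean | isJavaIdentifierStart(int codePoint): Determines if the character (Unicode code point) is permissible as the first character in a Java identifier. |
static boolean | isJavaLetter(char ch): Deprecated. Replaced by isJavaIdentifierStart(char). |
static boolean | isJavaLetterOrDigit(char ch): Deprecated. Replaced by isJavaIdentifierPart(char). |
static boolean | isLetter(char ch): Determines if the specified character is a letter. |
static boolean | isLetter(int codePoint): Determines if the specified character (Unicode code point) is a letter. |
static boolean | isLetterOrDigit(char ch): Determines if the specified character is a letter or digit. |
static boolean | isLetterOrDigit(int codePoint): Determines if the specified character (Unicode code point) is a letter or digit. |
static boolean | isLowerCase(char ch): Determines if the specified character is a lowercase character. |
static boolean | isLowerCase(int codePoint): Determines if the specified character (Unicode code point) is a lowercase character. |
static boolean | isLowSurrogate(char ch): Determines if the given char value is a Unicode low-surrogate code unit (also known as a trailing-surrogate code unit). |
static boolean | isMirrored(char ch): Determines whether the character is mirrored according to the Unicode specification. |
static boolean | isMirrored(int codePoint): Determines whether the specified character (Unicode code point) is mirrored according to the Unicode specification. |
static boolean | isSpace(char ch): Deprecated. Replaced by isWhitespace(char). |
static boolean | isSpaceChar(char ch): Determines if the specified character is a Unicode space character. |
static boolean | isSpaceChar(int codePoint): Determines if the specified character (Unicode code point) is a Unicode space character. |
static boolean | isSupplementaryCodePoint(int codePoint): Determines whether the specified character (Unicode code point) is in the supplementary character range. |
static boolean | isSurrogate(char ch): Determines if the given char value is a Unicode surrogate code unit. |
static boolean | isSurrogatePair(char high, char low): Determines whether the specified pair of char values is a valid Unicode surrogate pair. |
static boolean | isTitleCase(char ch): Determines if the specified character is a titlecase character. |
static boolean | isTitleCase(int codePoint): Determines if the specified character (Unicode code point) is a titlecase character. |
static boolean | isUnicodeIdentifierPart(char ch): Determines if the specified character may be part of a Unicode identifier as other than the first character. |
static boolean | isUnicodeIdentifierPart(int codePoint): Determines if the specified character (Unicode code point) may be part of a Unicode identifier as other than the first character. |
static boolean | isUnicodeIdentifierStart(char ch): Determines if the specified character is permissible as the first character in a Unicode identifier. |
static boolean | isUnicodeIdentifierStart(int codePoint): Determines if the specified character (Unicode code point) is permissible as the first character in a Unicode identifier. |
static boolean | isUpperCase(char ch): Determines if the specified character is an uppercase character. |
static boolean | isUpperCase(int codePoint): Determines if the specified character (Unicode code point) is an uppercase character. |
static boolean | isValidCodePoint(int codePoint): Determines whether the specified code point is a valid Unicode code point value. |
static boolean | isWhitespace(char ch): Determines if the specified character is white space according to Java. |
static boolean | isWhitespace(int codePoint): Determines if the specified character (Unicode code point) is white space according to Java. |
static char | lowSurrogate(int codePoint): Returns the trailing surrogate (a low surrogate code unit) of the surrogate pair representing the specified supplementary character (Unicode code point) in the UTF-16 encoding. |
static int | offsetByCodePoints(char[] a, int start, int count, int index, int codePointOffset): Returns the index within the given char subarray that is offset from the given index by codePointOffset code points. |
static int | offsetByCodePoints(CharSequence seq, int index, int codePointOffset): Returns the index within the given char sequence that is offset from the given index by codePointOffset code points. |
static char | reverseBytes(char ch): Returns the value obtained by reversing the order of the bytes in the specified char value. |
static char[] | toChars(int codePoint): Converts the specified character (Unicode code point) to its UTF-16 representation stored in a char array. |
static int | toChars(int codePoint, char[] dst, int dstIndex): Converts the specified character (Unicode code point) to its UTF-16 representation. |
static int | toCodePoint(char high, char low): Converts the specified surrogate pair to its supplementary code point value. |
static char | toLowerCase(char ch): Converts the character argument to lowercase using case mapping information from the UnicodeData file. |
static int | toLowerCase(int codePoint): Converts the character (Unicode code point) argument to lowercase using case mapping information from the UnicodeData file. |
String | toString(): Returns a String object representing this Character's value. |
static String | toString(char c): Returns a String object representing the specified char. |
static char | toTitleCase(char ch): Converts the character argument to titlecase using case mapping information from the UnicodeData file. |
static int | toTitleCase(int codePoint): Converts the character (Unicode code point) argument to titlecase using case mapping information from the UnicodeData file. |
static char | toUpperCase(char ch): Converts the character argument to uppercase using case mapping information from the UnicodeData file. |
static int | toUpperCase(int codePoint): Converts the character (Unicode code point) argument to uppercase using case mapping information from the UnicodeData file. |
static Character | valueOf(char c): Returns a Character instance representing the specified char value. |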
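As a usage illustration for the two methods in the title, the sketch below builds a string with the buffer-filling toChars overload and then counts uppercase code points with isUpperCase(int). The class name ToCharsDemo and both helper methods are hypothetical names introduced for this example.

```java
class ToCharsDemo {
    // Encodes code points into a UTF-16 string using toChars(int, char[], int).
    static String fromCodePoints(int... codePoints) {
        char[] buffer = new char[codePoints.length * 2]; // worst case: 2 chars per code point
        int length = 0;
        for (int cp : codePoints) {
            // toChars returns how many chars it wrote (1 or 2);
            // it throws IllegalArgumentException for an invalid code point
            length += Character.toChars(cp, buffer, length);
        }
        return new String(buffer, 0, length);
    }

    // Counts uppercase characters, stepping by code point so surrogate pairs stay together.
    static int countUpperCase(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isUpperCase(cp)) {
                count++;
            }
            i += Character.charCount(cp);
        }
        return count;
    }

    public static void main(String[] args) {
        // 'H', 'i', and U+10400 DESERET CAPITAL LETTER LONG I (a supplementary uppercase letter)
        String s = fromCodePoints(0x48, 0x69, 0x10400);
        System.out.println(s.length());          // 4 code units
        System.out.println(countUpperCase(s));   // 2
    }
}
```

A loop over charAt with isUpperCase(char) would miss U+10400, because each surrogate half on its own is not an uppercase letter.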
Java source
import java.util.Arrays;
import java.util.Map;
import java.util.HashMap;
import java.util.Locale;
public final
class Character implements java.io.Serializable, Comparable<Character> {
public static final int MIN_RADIX = 2;
public static final int MAX_RADIX = 36;
public static final char MIN_VALUE = '\u0000';
public static final char MAX_VALUE = '\uFFFF';
@SuppressWarnings("unchecked")
public static final Class<Character> TYPE = (Class<Character>) Class.getPrimitiveClass("char");
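// General category constants used by the methods below
public static final byte UNASSIGNED = 0;
public static final byte UPPERCASE_LETTER = 1;
public static final byte LOWERCASE_LETTER = 2;
public static final byte TITLECASE_LETTER = 3;
public static final byte MODIFIER_LETTER = 4;
public static final byte OTHER_LETTER = 5;
public static final byte DECIMAL_DIGIT_NUMBER = 9;
public static final byte LETTER_NUMBER = 10;
static final int ERROR = 0xFFFFFFFF;
public static final byte DIRECTIONALITY_UNDEFINED = -1;
// Surrogate and code point ranges used by the methods below
public static final char MIN_HIGH_SURROGATE = '\uD800';
public static final char MAX_HIGH_SURROGATE = '\uDBFF';
public static final char MIN_LOW_SURROGATE = '\uDC00';
public static final char MAX_LOW_SURROGATE = '\uDFFF';
public static final char MIN_SURROGATE = MIN_HIGH_SURROGATE;
public static final char MAX_SURROGATE = MAX_LOW_SURROGATE;
public static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;
public static final int MIN_CODE_POINT = 0x000000;
public static final int MAX_CODE_POINT = 0X10FFFF;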
private final char value;
private static final long serialVersionUID = 3786198910865385080L;
public Character(char value) {
this.value = value;
}
private static class CharacterCache {
private CharacterCache(){}
static final Character cache[] = new Character[127 + 1];
static {
for (int i = 0; i < cache.length; i++)
cache[i] = new Character((char)i);
}
}
public static Character valueOf(char c) {
if (c <= 127) { // must cache
return CharacterCache.cache[(int)c];
}
return new Character(c);
}
public char charValue() {
return value;
}
public int hashCode() {
return Character.hashCode(value);
}
public static int hashCode(char value) {
return (int)value;
}
public boolean equals(Object obj) {
if (obj instanceof Character) {
return value == ((Character)obj).charValue();
}
return false;
}
public String toString() {
char buf[] = {value};
return String.valueOf(buf);
}
public static String toString(char c) {
return String.valueOf(c);
}
public static boolean isValidCodePoint(int codePoint) {
// Optimized form of:
// codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT
int plane = codePoint >>> 16;
return plane < ((MAX_CODE_POINT + 1) >>> 16);
}
public static boolean isBmpCodePoint(int codePoint) {
return codePoint >>> 16 == 0;
// Optimized form of:
// codePoint >= MIN_VALUE && codePoint <= MAX_VALUE
// We consistently use logical shift (>>>) to facilitate
// additional runtime optimizations.
}
public static boolean isSupplementaryCodePoint(int codePoint) {
return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
&& codePoint < MAX_CODE_POINT + 1;
}
public static boolean isHighSurrogate(char ch) {
// Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE
return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);
}
public static boolean isLowSurrogate(char ch) {
return ch >= MIN_LOW_SURROGATE && ch < (MAX_LOW_SURROGATE + 1);
}
public static boolean isSurrogate(char ch) {
return ch >= MIN_SURROGATE && ch < (MAX_SURROGATE + 1);
}
public static boolean isSurrogatePair(char high, char low) {
return isHighSurrogate(high) && isLowSurrogate(low);
}
public static int charCount(int codePoint) {
return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT ? 2 : 1;
}
public static int toCodePoint(char high, char low) {
// Optimized form of:
// return ((high - MIN_HIGH_SURROGATE) << 10)
// + (low - MIN_LOW_SURROGATE)
// + MIN_SUPPLEMENTARY_CODE_POINT;
return ((high << 10) + low) + (MIN_SUPPLEMENTARY_CODE_POINT
- (MIN_HIGH_SURROGATE << 10)
- MIN_LOW_SURROGATE);
}
public static int codePointAt(CharSequence seq, int index) {
char c1 = seq.charAt(index);
if (isHighSurrogate(c1) && ++index < seq.length()) {
char c2 = seq.charAt(index);
if (isLowSurrogate(c2)) {
return toCodePoint(c1, c2);
}
}
return c1;
}
public static int codePointAt(char[] a, int index) {
return codePointAtImpl(a, index, a.length);
}
public static int codePointAt(char[] a, int index, int limit) {
if (index >= limit || limit < 0 || limit > a.length) {
throw new IndexOutOfBoundsException();
}
return codePointAtImpl(a, index, limit);
}
// throws ArrayIndexOutOfBoundsException if index out of bounds
static int codePointAtImpl(char[] a, int index, int limit) {
char c1 = a[index];
if (isHighSurrogate(c1) && ++index < limit) {
char c2 = a[index];
if (isLowSurrogate(c2)) {
return toCodePoint(c1, c2);
}
}
return c1;
}
public static int codePointBefore(CharSequence seq, int index) {
char c2 = seq.charAt(--index);
if (isLowSurrogate(c2) && index > 0) {
char c1 = seq.charAt(--index);
if (isHighSurrogate(c1)) {
return toCodePoint(c1, c2);
}
}
return c2;
}
public static int codePointBefore(char[] a, int index) {
return codePointBeforeImpl(a, index, 0);
}
public static int codePointBefore(char[] a, int index, int start) {
if (index <= start || start < 0 || start >= a.length) {
throw new IndexOutOfBoundsException();
}
return codePointBeforeImpl(a, index, start);
}
// throws ArrayIndexOutOfBoundsException if index-1 out of bounds
static int codePointBeforeImpl(char[] a, int index, int start) {
char c2 = a[--index];
if (isLowSurrogate(c2) && index > start) {
char c1 = a[--index];
if (isHighSurrogate(c1)) {
return toCodePoint(c1, c2);
}
}
return c2;
}
public static char highSurrogate(int codePoint) {
return (char) ((codePoint >>> 10)
+ (MIN_HIGH_SURROGATE - (MIN_SUPPLEMENTARY_CODE_POINT >>> 10)));
}
public static char lowSurrogate(int codePoint) {
return (char) ((codePoint & 0x3ff) + MIN_LOW_SURROGATE);
}
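// Writes the UTF-16 form of codePoint into dst starting at dstIndex and
// returns the number of chars written: 1 for a BMP code point, 2 otherwise.
public static int toChars(int codePoint, char[] dst, int dstIndex) {
if (isBmpCodePoint(codePoint)) {
// BMP code point: a single char holds the value
dst[dstIndex] = (char) codePoint;
return 1;
} else if (isValidCodePoint(codePoint)) {
// Supplementary code point: stored as a high/low surrogate pair
toSurrogates(codePoint, dst, dstIndex);
return 2;
} else {
// Negative or greater than MAX_CODE_POINT
throw new IllegalArgumentException();
}
}
// Returns the UTF-16 form of codePoint in a newly allocated array of length 1 or 2.
public static char[] toChars(int codePoint) {
if (isBmpCodePoint(codePoint)) {
return new char[] { (char) codePoint };
} else if (isValidCodePoint(codePoint)) {
char[] result = new char[2];
toSurrogates(codePoint, result, 0);
return result;
} else {
throw new IllegalArgumentException();
}
}
static void toSurrogates(int codePoint, char[] dst, int index) {
// We write elements "backwards" to guarantee all-or-nothing:
// if index+1 is out of bounds, the exception is thrown before dst[index] is touched
dst[index+1] = lowSurrogate(codePoint);
dst[index] = highSurrogate(codePoint);
}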
public static int codePointCount(CharSequence seq, int beginIndex, int endIndex) {
int length = seq.length();
if (beginIndex < 0 || endIndex > length || beginIndex > endIndex) {
throw new IndexOutOfBoundsException();
}
int n = endIndex - beginIndex;
for (int i = beginIndex; i < endIndex; ) {
if (isHighSurrogate(seq.charAt(i++)) && i < endIndex &&
isLowSurrogate(seq.charAt(i))) {
n--;
i++;
}
}
return n;
}
public static int codePointCount(char[] a, int offset, int count) {
if (count > a.length - offset || offset < 0 || count < 0) {
throw new IndexOutOfBoundsException();
}
return codePointCountImpl(a, offset, count);
}
static int codePointCountImpl(char[] a, int offset, int count) {
int endIndex = offset + count;
int n = count;
for (int i = offset; i < endIndex; ) {
if (isHighSurrogate(a[i++]) && i < endIndex &&
isLowSurrogate(a[i])) {
n--;
i++;
}
}
return n;
}
public static int offsetByCodePoints(CharSequence seq, int index,
int codePointOffset) {
int length = seq.length();
if (index < 0 || index > length) {
throw new IndexOutOfBoundsException();
}
int x = index;
if (codePointOffset >= 0) {
int i;
for (i = 0; x < length && i < codePointOffset; i++) {
if (isHighSurrogate(seq.charAt(x++)) && x < length &&
isLowSurrogate(seq.charAt(x))) {
x++;
}
}
if (i < codePointOffset) {
throw new IndexOutOfBoundsException();
}
} else {
int i;
for (i = codePointOffset; x > 0 && i < 0; i++) {
if (isLowSurrogate(seq.charAt(--x)) && x > 0 &&
isHighSurrogate(seq.charAt(x-1))) {
x--;
}
}
if (i < 0) {
throw new IndexOutOfBoundsException();
}
}
return x;
}
public static int offsetByCodePoints(char[] a, int start, int count,
int index, int codePointOffset) {
if (count > a.length-start || start < 0 || count < 0
|| index < start || index > start+count) {
throw new IndexOutOfBoundsException();
}
return offsetByCodePointsImpl(a, start, count, index, codePointOffset);
}
static int offsetByCodePointsImpl(char[]a, int start, int count,
int index, int codePointOffset) {
int x = index;
if (codePointOffset >= 0) {
int limit = start + count;
int i;
for (i = 0; x < limit && i < codePointOffset; i++) {
if (isHighSurrogate(a[x++]) && x < limit &&
isLowSurrogate(a[x])) {
x++;
}
}
if (i < codePointOffset) {
throw new IndexOutOfBoundsException();
}
} else {
int i;
for (i = codePointOffset; x > start && i < 0; i++) {
if (isLowSurrogate(a[--x]) && x > start &&
isHighSurrogate(a[x-1])) {
x--;
}
}
if (i < 0) {
throw new IndexOutOfBoundsException();
}
}
return x;
}
public static boolean isLowerCase(char ch) {
return isLowerCase((int)ch);
}
public static boolean isLowerCase(int codePoint) {
return getType(codePoint) == Character.LOWERCASE_LETTER ||
CharacterData.of(codePoint).isOtherLowercase(codePoint);
}
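public static boolean isUpperCase(char ch) {
// Delegates to the code point overload
return isUpperCase((int)ch);
}
public static boolean isUpperCase(int codePoint) {
// Uppercase if the general category is UPPERCASE_LETTER, or if the
// character has the Unicode contributory property Other_Uppercase
return getType(codePoint) == Character.UPPERCASE_LETTER ||
CharacterData.of(codePoint).isOtherUppercase(codePoint);
}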
public static boolean isTitleCase(char ch) {
return isTitleCase((int)ch);
}
public static boolean isTitleCase(int codePoint) {
return getType(codePoint) == Character.TITLECASE_LETTER;
}
public static boolean isDigit(char ch) {
return isDigit((int)ch);
}
public static boolean isDigit(int codePoint) {
return getType(codePoint) == Character.DECIMAL_DIGIT_NUMBER;
}
public static boolean isDefined(char ch) {
return isDefined((int)ch);
}
public static boolean isDefined(int codePoint) {
return getType(codePoint) != Character.UNASSIGNED;
}
public static boolean isLetter(char ch) {
return isLetter((int)ch);
}
public static boolean isLetter(int codePoint) {
return ((((1 << Character.UPPERCASE_LETTER) |
(1 << Character.LOWERCASE_LETTER) |
(1 << Character.TITLECASE_LETTER) |
(1 << Character.MODIFIER_LETTER) |
(1 << Character.OTHER_LETTER)) >> getType(codePoint)) & 1)
!= 0;
}
public static boolean isLetterOrDigit(char ch) {
return isLetterOrDigit((int)ch);
}
public static boolean isLetterOrDigit(int codePoint) {
return ((((1 << Character.UPPERCASE_LETTER) |
(1 << Character.LOWERCASE_LETTER) |
(1 << Character.TITLECASE_LETTER) |
(1 << Character.MODIFIER_LETTER) |
(1 << Character.OTHER_LETTER) |
(1 << Character.DECIMAL_DIGIT_NUMBER)) >> getType(codePoint)) & 1)
!= 0;
}
@Deprecated
public static boolean isJavaLetter(char ch) {
return isJavaIdentifierStart(ch);
}
@Deprecated
public static boolean isJavaLetterOrDigit(char ch) {
return isJavaIdentifierPart(ch);
}
public static boolean isAlphabetic(int codePoint) {
return (((((1 << Character.UPPERCASE_LETTER) |
(1 << Character.LOWERCASE_LETTER) |
(1 << Character.TITLECASE_LETTER) |
(1 << Character.MODIFIER_LETTER) |
(1 << Character.OTHER_LETTER) |
(1 << Character.LETTER_NUMBER)) >> getType(codePoint)) & 1) != 0) ||
CharacterData.of(codePoint).isOtherAlphabetic(codePoint);
}
public static boolean isIdeographic(int codePoint) {
return CharacterData.of(codePoint).isIdeographic(codePoint);
}
public static boolean isJavaIdentifierStart(char ch) {
return isJavaIdentifierStart((int)ch);
}
public static boolean isJavaIdentifierStart(int codePoint) {
return CharacterData.of(codePoint).isJavaIdentifierStart(codePoint);
}
}
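Two details of the Character listing above are easy to miss: isUpperCase also accepts characters that carry Other_Uppercase even when their general category is not UPPERCASE_LETTER, and isLetter tests category membership with a one-bit-per-category mask. The sketch below illustrates both through the public API; CaseAndCategoryDemo, LETTER_MASK, isLetterByMask and upperByCategoryOnly are names invented for this example, and the U+2160 result assumes a runtime that includes the Other_Uppercase check (as the listing above does).

```java
class CaseAndCategoryDemo {
    // One bit per general category, mirroring the expression inside isLetter(int).
    private static final int LETTER_MASK =
            (1 << Character.UPPERCASE_LETTER)
          | (1 << Character.LOWERCASE_LETTER)
          | (1 << Character.TITLECASE_LETTER)
          | (1 << Character.MODIFIER_LETTER)
          | (1 << Character.OTHER_LETTER);

    // Shifts the mask right by the category number and tests the lowest bit.
    static boolean isLetterByMask(int codePoint) {
        return ((LETTER_MASK >> Character.getType(codePoint)) & 1) != 0;
    }

    // Only the first half of the isUpperCase test: general category alone.
    static boolean upperByCategoryOnly(int codePoint) {
        return Character.getType(codePoint) == Character.UPPERCASE_LETTER;
    }

    public static void main(String[] args) {
        int romanOne = 0x2160; // ROMAN NUMERAL ONE, general category LETTER_NUMBER
        System.out.println(upperByCategoryOnly(romanOne));    // category test alone
        System.out.println(Character.isUpperCase(romanOne));  // category or Other_Uppercase
        System.out.println(isLetterByMask('A'));              // letter
        System.out.println(isLetterByMask('7'));              // digit, not a letter
    }
}
```

The mask form replaces five equality comparisons with one shift and one AND, which is presumably why the JDK source is written that way.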
abstract class CharacterData {
abstract int getProperties(int ch);
abstract int getType(int ch);
abstract boolean isWhitespace(int ch);
abstract boolean isMirrored(int ch);
abstract boolean isJavaIdentifierStart(int ch);
abstract boolean isJavaIdentifierPart(int ch);
abstract boolean isUnicodeIdentifierStart(int ch);
abstract boolean isUnicodeIdentifierPart(int ch);
abstract boolean isIdentifierIgnorable(int ch);
abstract int toLowerCase(int ch);
abstract int toUpperCase(int ch);
abstract int toTitleCase(int ch);
abstract int digit(int ch, int radix);
abstract int getNumericValue(int ch);
abstract byte getDirectionality(int ch);
//need to implement for JSR204
int toUpperCaseEx(int ch) {
return toUpperCase(ch);
}
char[] toUpperCaseCharArray(int ch) {
return null;
}
boolean isOtherLowercase(int ch) {
return false;
}
boolean isOtherUppercase(int ch) {
return false;
}
boolean isOtherAlphabetic(int ch) {
return false;
}
boolean isIdeographic(int ch) {
return false;
}
// Character <= 0xff (basic latin) is handled by internal fast-path
// to avoid initializing large tables.
// Note: performance of this "fast-path" code may be sub-optimal
// in negative cases for some accessors due to complicated ranges.
// Should revisit after optimization of table initialization.
static final CharacterData of(int ch) {
if (ch >>> 8 == 0) { // fast-path
return CharacterDataLatin1.instance;
} else {
switch(ch >>> 16) { //plane 00-16
case(0):
return CharacterData00.instance;
case(1):
return CharacterData01.instance;
case(2):
return CharacterData02.instance;
case(14):
return CharacterData0E.instance;
case(15): // Private Use
case(16): // Private Use
return CharacterDataPrivateUse.instance;
default:
return CharacterDataUndefined.instance;
}
}
}
}
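The dispatch in CharacterData.of can be traced without access to the package-private classes. The following sketch mirrors the branches of the listing above but returns the class name as a string; PlaneDispatchSketch and tableFor are hypothetical names, and the mapping reflects only this listing, not necessarily other JDK versions.

```java
class PlaneDispatchSketch {
    // Returns the name of the CharacterData subclass the listing would select.
    static String tableFor(int codePoint) {
        if (codePoint >>> 8 == 0) {
            return "CharacterDataLatin1";        // fast path for U+0000..U+00FF
        }
        switch (codePoint >>> 16) {              // plane number
            case 0:
                return "CharacterData00";
            case 1:
                return "CharacterData01";
            case 2:
                return "CharacterData02";
            case 14:
                return "CharacterData0E";
            case 15:
            case 16:
                return "CharacterDataPrivateUse";
            default:
                return "CharacterDataUndefined"; // unassigned planes and invalid values
        }
    }

    public static void main(String[] args) {
        int[] samples = { 'A', 0x4E2D, 0x1F600, 0x2F81A, 0xE0001, 0x10FFFF, 0x30000, -1 };
        for (int cp : samples) {
            System.out.printf("%08X -> %s%n", cp, tableFor(cp));
        }
    }
}
```

Because both tests use the unsigned shift >>>, a negative argument ends up in the default branch instead of indexing a table.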
class CharacterDataLatin1 extends CharacterData {
int getProperties(int ch) {
char offset = (char)ch;
int props = A[offset];
return props;
}
int getPropertiesEx(int ch) {
char offset = (char)ch;
int props = B[offset];
return props;
}
boolean isOtherLowercase(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0001) != 0;
}
boolean isOtherUppercase(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0002) != 0;
}
boolean isOtherAlphabetic(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0004) != 0;
}
boolean isIdeographic(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0010) != 0;
}
int getType(int ch) {
int props = getProperties(ch);
return (props & 0x1F);
}
boolean isJavaIdentifierStart(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) >= 0x00005000);
}
boolean isJavaIdentifierPart(int ch) {
int props = getProperties(ch);
return ((props & 0x00003000) != 0);
}
boolean isUnicodeIdentifierStart(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) == 0x00007000);
}
boolean isUnicodeIdentifierPart(int ch) {
int props = getProperties(ch);
return ((props & 0x00001000) != 0);
}
boolean isIdentifierIgnorable(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) == 0x00001000);
}
int toLowerCase(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if (((val & 0x00020000) != 0) &&
((val & 0x07FC0000) != 0x07FC0000)) {
int offset = val << 5 >> (5+18);
mapChar = ch + offset;
}
return mapChar;
}
int toUpperCase(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if ((val & 0x00010000) != 0) {
if ((val & 0x07FC0000) != 0x07FC0000) {
int offset = val << 5 >> (5+18);
mapChar = ch - offset;
} else if (ch == 0x00B5) {
mapChar = 0x039C;
}
}
return mapChar;
}
int toTitleCase(int ch) {
return toUpperCase(ch);
}
int digit(int ch, int radix) {
int value = -1;
if (radix >= Character.MIN_RADIX && radix <= Character.MAX_RADIX) {
int val = getProperties(ch);
int kind = val & 0x1F;
if (kind == Character.DECIMAL_DIGIT_NUMBER) {
value = ch + ((val & 0x3E0) >> 5) & 0x1F;
}
else if ((val & 0xC00) == 0x00000C00) {
// Java supradecimal digit
value = (ch + ((val & 0x3E0) >> 5) & 0x1F) + 10;
}
}
return (value < radix) ? value : -1;
}
int getNumericValue(int ch) {
int val = getProperties(ch);
int retval = -1;
switch (val & 0xC00) {
default: // cannot occur
case (0x00000000): // not numeric
retval = -1;
break;
case (0x00000400): // simple numeric
retval = ch + ((val & 0x3E0) >> 5) & 0x1F;
break;
case (0x00000800) : // "strange" numeric
retval = -2;
break;
case (0x00000C00): // Java supradecimal
retval = (ch + ((val & 0x3E0) >> 5) & 0x1F) + 10;
break;
}
return retval;
}
boolean isWhitespace(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) == 0x00004000);
}
byte getDirectionality(int ch) {
int val = getProperties(ch);
byte directionality = (byte)((val & 0x78000000) >> 27);
if (directionality == 0xF ) {
directionality = -1;
}
return directionality;
}
boolean isMirrored(int ch) {
int props = getProperties(ch);
return ((props & 0x80000000) != 0);
}
int toUpperCaseEx(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if ((val & 0x00010000) != 0) {
if ((val & 0x07FC0000) != 0x07FC0000) {
int offset = val << 5 >> (5+18);
mapChar = ch - offset;
}
else {
switch(ch) {
// map overflow characters
case 0x00B5 : mapChar = 0x039C; break;
default : mapChar = Character.ERROR; break;
}
}
}
return mapChar;
}
static char[] sharpsMap = new char[] {'S', 'S'};
char[] toUpperCaseCharArray(int ch) {
char[] upperMap = {(char)ch};
if (ch == 0x00DF) {
upperMap = sharpsMap;
}
return upperMap;
}
static final CharacterDataLatin1 instance = new CharacterDataLatin1();
private CharacterDataLatin1() {};
static final int A[] = new int[256];
static final String A_DATA =
"\u4800\u100F\u4800\u100F\u4800\u100F\u4800\u100F\u4800\u100F\u4800\u100F\u4800"+
"\u100F\u4800\u100F\u4800\u100F\u5800\u400F\u5000\u400F\u5800\u400F\u6000\u400F"+
"\201\u7002\201\u7002\201\u7002\201\u7002\201\u7002\201\u7002\201\u7002\u6800"+
"\031\201\u7002\201\u7002\201\u7002\201\u7002\201\u7002\201\u7002\201\u7002"+
"\u061D\u7002";
// The B table has 256 entries for a total of 512 bytes.
static final char B[] = (
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"+
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"+
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"+
"\000\000\000\000\000\000\000\000\000").toCharArray();
// In all, the character property tables require 1024 bytes.
static {
{ // THIS CODE WAS AUTOMATICALLY CREATED BY GenerateCharacter:
char[] data = A_DATA.toCharArray();
assert (data.length == (256 * 2));
int i = 0, j = 0;
while (i < (256 * 2)) {
int entry = data[i++] << 16;
A[j++] = entry | data[i++];
}
}
}
}
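The expression val << 5 >> (5+18) in toLowerCase and toUpperCase extracts a signed 9-bit case offset from bits 18..26 of the property word (the same bits covered by the mask 0x07FC0000). The sketch below demonstrates the trick on made-up property words; CaseOffsetSketch and packOffset are hypothetical and do not reflect how GenerateCharacter actually builds the tables.

```java
class CaseOffsetSketch {
    static final int OFFSET_MASK = 0x07FC0000; // bits 18..26, as tested in the listing

    // Hypothetical helper: stores a signed offset in the range -256..255 in bits 18..26.
    static int packOffset(int offset) {
        return (offset << 18) & OFFSET_MASK;
    }

    // Same expression as the listing: the left shift discards the 5 bits above the field,
    // the arithmetic right shift discards the 18 bits below it and sign-extends.
    static int unpackOffset(int props) {
        return props << 5 >> (5 + 18);
    }

    public static void main(String[] args) {
        // A made-up property word: offset 32 plus a general category in the low 5 bits.
        int props = packOffset(32) | Character.LOWERCASE_LETTER;
        System.out.println(props & 0x1F);                       // 2, the category bits
        System.out.println(unpackOffset(props));                // 32
        System.out.println((char) ('a' - unpackOffset(props))); // A
        System.out.println(unpackOffset(packOffset(-32)));      // -32
    }
}
```

In the listing, a field with all nine bits set is treated as an overflow marker and handled by explicit special cases rather than decoded as an offset.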
package java.lang;
/**
* The CharacterData00 class encapsulates the large tables once found in
* java.lang.Character
*/
class CharacterData00 extends CharacterData {
int getProperties(int ch) {
char offset = (char)ch;
int props = A[Y[X[offset>>5]|((offset>>1)&0xF)]|(offset&0x1)];
return props;
}
int getPropertiesEx(int ch) {
char offset = (char)ch;
int props = B[Y[X[offset>>5]|((offset>>1)&0xF)]|(offset&0x1)];
return props;
}
int getType(int ch) {
int props = getProperties(ch);
return (props & 0x1F);
}
boolean isOtherLowercase(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0001) != 0;
}
boolean isOtherUppercase(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0002) != 0;
}
boolean isOtherAlphabetic(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0004) != 0;
}
boolean isIdeographic(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0010) != 0;
}
boolean isJavaIdentifierStart(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) >= 0x00005000);
}
boolean isJavaIdentifierPart(int ch) {
int props = getProperties(ch);
return ((props & 0x00003000) != 0);
}
boolean isUnicodeIdentifierStart(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) == 0x00007000);
}
boolean isUnicodeIdentifierPart(int ch) {
int props = getProperties(ch);
return ((props & 0x00001000) != 0);
}
boolean isIdentifierIgnorable(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) == 0x00001000);
}
int toLowerCase(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if ((val & 0x00020000) != 0) {
if ((val & 0x07FC0000) == 0x07FC0000) {
switch(ch) {
// map the offset overflow chars
case 0x212B : mapChar = 0x00E5; break;
// map the titlecase chars with both a 1:M uppercase map
// and a lowercase map
case 0x1F88 : mapChar = 0x1F80; break;
case 0x1F89 : mapChar = 0x1F81; break;
case 0xA77D : mapChar = 0x1D79; break;
case 0xA78D : mapChar = 0x0265; break;
case 0xA7AA : mapChar = 0x0266; break;
// default mapChar is already set, so no
// need to redo it here.
// default : mapChar = ch;
}
}
else {
int offset = val << 5 >> (5+18);
mapChar = ch + offset;
}
}
return mapChar;
}
int toUpperCase(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if ((val & 0x00010000) != 0) {
if ((val & 0x07FC0000) == 0x07FC0000) {
switch(ch) {
// map chars with overflow offsets
case 0x00B5 : mapChar = 0x039C; break;
case 0x017F : mapChar = 0x0053; break;
case 0x2D2D : mapChar = 0x10CD; break;
// ch must have a 1:M case mapping, but we
// can't handle it here. Return ch.
// since mapChar is already set, no need
// to redo it here.
//default : mapChar = ch;
}
}
else {
int offset = val << 5 >> (5+18);
mapChar = ch - offset;
}
}
return mapChar;
}
int toTitleCase(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if ((val & 0x00008000) != 0) {
// There is a titlecase equivalent. Perform further checks:
if ((val & 0x00010000) == 0) {
// The character does not have an uppercase equivalent, so it must
// already be uppercase; so add 1 to get the titlecase form.
mapChar = ch + 1;
}
else if ((val & 0x00020000) == 0) {
// The character does not have a lowercase equivalent, so it must
// already be lowercase; so subtract 1 to get the titlecase form.
mapChar = ch - 1;
}
// else {
// The character has both an uppercase equivalent and a lowercase
// equivalent, so it must itself be a titlecase form; return it.
// return ch;
//}
}
else if ((val & 0x00010000) != 0) {
// This character has no titlecase equivalent but it does have an
// uppercase equivalent, so use that (subtract the signed case offset).
mapChar = toUpperCase(ch);
}
return mapChar;
}
int digit(int ch, int radix) {
int value = -1;
if (radix >= Character.MIN_RADIX && radix <= Character.MAX_RADIX) {
int val = getProperties(ch);
int kind = val & 0x1F;
if (kind == Character.DECIMAL_DIGIT_NUMBER) {
value = ch + ((val & 0x3E0) >> 5) & 0x1F;
}
else if ((val & 0xC00) == 0x00000C00) {
// Java supradecimal digit
value = (ch + ((val & 0x3E0) >> 5) & 0x1F) + 10;
}
}
return (value < radix) ? value : -1;
}
int getNumericValue(int ch) {
int val = getProperties(ch);
int retval = -1;
switch (val & 0xC00) {
default: // cannot occur
case (0x00000000): // not numeric
retval = -1;
break;
case (0x00000400): // simple numeric
retval = ch + ((val & 0x3E0) >> 5) & 0x1F;
break;
case (0x00000800) : // "strange" numeric
switch (ch) {
case 0x0BF1: retval = 100; break; // TAMIL NUMBER ONE HUNDRED
case 0x0BF2: retval = 1000; break; // TAMIL NUMBER ONE THOUSAND
case 0x1375: retval = 40; break; // ETHIOPIC NUMBER FORTY
case 0x0D71: retval = 100; break; // MALAYALAM NUMBER ONE HUNDRED
case 0x0D72: retval = 1000; break; // MALAYALAM NUMBER ONE THOUSAND
case 0x2186: retval = 50; break; // ROMAN NUMERAL FIFTY EARLY FORM
case 0x2187: retval = 50000; break; // ROMAN NUMERAL FIFTY THOUSAND
case 0x2188: retval = 100000; break; // ROMAN NUMERAL ONE HUNDRED THOUSAND
default: retval = -2; break;
}
break;
case (0x00000C00): // Java supradecimal
retval = (ch + ((val & 0x3E0) >> 5) & 0x1F) + 10;
break;
}
return retval;
}
boolean isWhitespace(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) == 0x00004000);
}
byte getDirectionality(int ch) {
int val = getProperties(ch);
byte directionality = (byte)((val & 0x78000000) >> 27);
if (directionality == 0xF ) {
switch(ch) {
case 0x202A :
// This is the only char with LRE
directionality = Character.DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING;
break;
case 0x202B :
// This is the only char with RLE
directionality = Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING;
break;
case 0x202C :
// This is the only char with PDF
directionality = Character.DIRECTIONALITY_POP_DIRECTIONAL_FORMAT;
break;
case 0x202D :
// This is the only char with LRO
directionality = Character.DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE;
break;
case 0x202E :
// This is the only char with RLO
directionality = Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE;
break;
default :
directionality = Character.DIRECTIONALITY_UNDEFINED;
break;
}
}
return directionality;
}
boolean isMirrored(int ch) {
int props = getProperties(ch);
return ((props & 0x80000000) != 0);
}
int toUpperCaseEx(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if ((val & 0x00010000) != 0) {
if ((val & 0x07FC0000) != 0x07FC0000) {
int offset = val << 5 >> (5+18);
mapChar = ch - offset;
}
else {
switch(ch) {
// map overflow characters
case 0x00B5 : mapChar = 0x039C; break;
case 0x017F : mapChar = 0x0053; break;
case 0x2D27 : mapChar = 0x10C7; break;
case 0x2D2D : mapChar = 0x10CD; break;
default : mapChar = Character.ERROR; break;
}
}
}
return mapChar;
}
char[] toUpperCaseCharArray(int ch) {
char[] upperMap = {(char)ch};
int location = findInCharMap(ch);
if (location != -1) {
upperMap = charMap[location][1];
}
return upperMap;
}
    /**
     * Finds the character in the uppercase mapping table.
     *
     * @param ch the <code>char</code> to search for
     * @return the index of ch in the table, or -1 if not found
     * @since 1.4
     */
    int findInCharMap(int ch) {
        if (charMap == null || charMap.length == 0) {
            return -1;
        }
        int top, bottom, current;
        bottom = 0;
        top = charMap.length;
        current = top/2;
        // invariant: top > current >= bottom && ch >= CharacterData.charMap[bottom][0]
        while (top - bottom > 1) {
            if (ch >= charMap[current][0][0]) {
                bottom = current;
            } else {
                top = current;
            }
            current = (top + bottom) / 2;
        }
        if (ch == charMap[current][0][0]) return current;
        else return -1;
    }
static final CharacterData00 instance = new CharacterData00();
private CharacterData00() {};
static final char X[] = (
"\000\020\040\060\100\120\140\160\200\220\240\260\300\320\340\360\200\u0100"+
"\u0110\u0120\u0130\u0140\u0150\u0160\u0170\u0170\u0180\u0190\u01A0\u01B0\u01C0"+
"\u02B0\u02B0\u15A0\u15B0\040\u15C0\u15D0\u15E0\u15F0\u1600\u1610").toCharArray();
// The Y table has 5664 entries for a total of 11328 bytes.
static final char Y[] = (
"\000\000\000\000\002\004\006\000\000\000\000\000\000\000\010\004\012\014\016"+
"\020\022\024\026\030\032\032\032\032\032\034\036\040\042\044\044\044\044\044"+
"\224\224\224\362\224\224\224\362\224\u01BC\362\072\u039C\u01D4\u02AE\u02C4"+
"\u0162\u02D8\u01D6\362\362\362\362\u039E\u03A0\u016A\362").toCharArray();
// The A table has 930 entries for a total of 3720 bytes.
static final int A[] = new int[930];
static final String A_DATA =
"\u4800\u100F\u4800\u100F\u4800\u100F\u5800\u400F\u5000\u400F\u5800\u400F\u6000"+
"\u400F\u5000\u400F\u5000\u400F\u5000\u400F\u6000\u400C\u6800\030\u6800\030"+
"\u6800\030\u2800\u601A\u7800\000\u4800\u1010\u6800\031\u6800\033\u7800\000"+
"\u6800\u1010\u6800\u1010\u6800\u1010";
// The B table has 930 entries for a total of 1860 bytes.
static final char B[] = (
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"+
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"+
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000").toCharArray();
// In all, the character property tables require 19144 bytes.
    static {
        // Special uppercase mappings: { {source char}, {replacement chars} }, sorted by source char.
        charMap = new char[][][] {
            { {'\u00DF'}, {'\u0053', '\u0053', } },
            { {'\u0130'}, {'\u0130', } },
            { {'\u0149'}, {'\u02BC', '\u004E', } },
            { {'\uFB17'}, {'\u0544', '\u053D', } },
        };
        { // THIS CODE WAS AUTOMATICALLY CREATED BY GenerateCharacter:
            // Each int in A is packed from two consecutive chars of A_DATA: high 16 bits, then low 16 bits.
            char[] data = A_DATA.toCharArray();
            assert (data.length == (930 * 2));
            int i = 0, j = 0;
            while (i < (930 * 2)) {
                int entry = data[i++] << 16;
                A[j++] = entry | data[i++];
            }
        }
    }
}
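Two techniques in the listing above are easier to follow in isolation: the sign-extending shift in `toUpperCaseEx` and the binary search in `findInCharMap`. The following is a minimal sketch, not the JDK implementation. The class name `CaseMappingSketch` and the helpers `encodeOffset`, `decodeOffset`, `applyOffset`, and `find` are hypothetical names introduced here. The property words are built by hand for illustration and are not taken from the real `A` table. Only the four-entry `charMap` is copied from the listing.

```java
class CaseMappingSketch {
    // Hypothetical helper: place a signed offset into bits 18..26 of a property word.
    static int encodeOffset(int offset) {
        return (offset << 18) & 0x07FC0000;
    }

    // Same expression as in toUpperCaseEx: the left shift by 5 moves bit 26 into the
    // sign position, and the arithmetic right shift by 23 sign-extends the 9-bit field.
    static int decodeOffset(int props) {
        return props << 5 >> (5 + 18);
    }

    // Mirrors "mapChar = ch - offset" from the listing.
    static int applyOffset(int ch, int props) {
        return ch - decodeOffset(props);
    }

    // The special uppercase table from the listing, sorted by source character.
    static final char[][][] CHAR_MAP = {
        { {'\u00DF'}, {'\u0053', '\u0053'} },
        { {'\u0130'}, {'\u0130'} },
        { {'\u0149'}, {'\u02BC', '\u004E'} },
        { {'\uFB17'}, {'\u0544', '\u053D'} },
    };

    // Binary search with the same loop shape as findInCharMap:
    // narrow [bottom, top) until one candidate remains, then test it for equality.
    static int find(int ch) {
        int bottom = 0;
        int top = CHAR_MAP.length;
        int current = top / 2;
        while (top - bottom > 1) {
            if (ch >= CHAR_MAP[current][0][0]) {
                bottom = current;
            } else {
                top = current;
            }
            current = (top + bottom) / 2;
        }
        return ch == CHAR_MAP[current][0][0] ? current : -1;
    }

    // Mirrors toUpperCaseCharArray: fall back to the character itself when not in the table.
    static char[] toUpperCaseCharArray(int ch) {
        int location = find(ch);
        return location == -1 ? new char[] {(char) ch} : CHAR_MAP[location][1];
    }

    public static void main(String[] args) {
        int props = encodeOffset(32) | 0x00010000; // hand-made word: offset 32 plus the bit-16 flag
        System.out.println((char) applyOffset('a', props));
        System.out.println(new String(toUpperCaseCharArray(0x00DF)));
    }
}
```

The loop keeps `bottom` on the last entry whose key is less than or equal to `ch`, so a single equality test at the end is enough to decide between a hit and `-1`.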
class CharacterDataPrivateUse extends CharacterData {
    int getProperties(int ch) {
        return 0;
    }
    // The last two code points of a plane (U+xFFFE and U+xFFFF) are noncharacters,
    // so they are reported as unassigned rather than private use.
    int getType(int ch) {
        return (ch & 0xFFFE) == 0xFFFE
            ? Character.UNASSIGNED
            : Character.PRIVATE_USE;
    }
    boolean isJavaIdentifierStart(int ch) {
        return false;
    }
    boolean isJavaIdentifierPart(int ch) {
        return false;
    }
    boolean isUnicodeIdentifierStart(int ch) {
        return false;
    }
    boolean isUnicodeIdentifierPart(int ch) {
        return false;
    }
    boolean isIdentifierIgnorable(int ch) {
        return false;
    }
    int toLowerCase(int ch) {
        return ch;
    }
    int toUpperCase(int ch) {
        return ch;
    }
    int toTitleCase(int ch) {
        return ch;
    }
    int digit(int ch, int radix) {
        return -1;
    }
    int getNumericValue(int ch) {
        return -1;
    }
    boolean isWhitespace(int ch) {
        return false;
    }
    byte getDirectionality(int ch) {
        return (ch & 0xFFFE) == 0xFFFE
            ? Character.DIRECTIONALITY_UNDEFINED
            : Character.DIRECTIONALITY_LEFT_TO_RIGHT;
    }
    boolean isMirrored(int ch) {
        return false;
    }
    static final CharacterData instance = new CharacterDataPrivateUse();
    private CharacterDataPrivateUse() {};
}
class CharacterDataUndefined extends CharacterData {
    int getProperties(int ch) {
        return 0;
    }
    int getType(int ch) {
        return Character.UNASSIGNED;
    }
    boolean isJavaIdentifierStart(int ch) {
        return false;
    }
    boolean isJavaIdentifierPart(int ch) {
        return false;
    }
    boolean isUnicodeIdentifierStart(int ch) {
        return false;
    }
    boolean isUnicodeIdentifierPart(int ch) {
        return false;
    }
    boolean isIdentifierIgnorable(int ch) {
        return false;
    }
    int toLowerCase(int ch) {
        return ch;
    }
    int toUpperCase(int ch) {
        return ch;
    }
    int toTitleCase(int ch) {
        return ch;
    }
    int digit(int ch, int radix) {
        return -1;
    }
    int getNumericValue(int ch) {
        return -1;
    }
    boolean isWhitespace(int ch) {
        return false;
    }
    byte getDirectionality(int ch) {
        return Character.DIRECTIONALITY_UNDEFINED;
    }
    boolean isMirrored(int ch) {
        return false;
    }
    static final CharacterData instance = new CharacterDataUndefined();
    private CharacterDataUndefined() {};
}
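The effect of the two fallback classes above can be observed through the public `Character` API. The sketch below assumes that `Character` routes the supplementary private-use planes (15 and 16) to `CharacterDataPrivateUse` and an unassigned plane such as plane 4 to `CharacterDataUndefined`; that dispatch is not part of this listing. The class name `FallbackPlanesDemo` and its helper methods are hypothetical names introduced for illustration.

```java
class FallbackPlanesDemo {
    // Translate the numeric general category into a readable label for the two cases of interest.
    static String typeName(int codePoint) {
        int type = Character.getType(codePoint);
        if (type == Character.PRIVATE_USE) {
            return "PRIVATE_USE";
        }
        if (type == Character.UNASSIGNED) {
            return "UNASSIGNED";
        }
        return "OTHER";
    }

    // True when none of the three case mappings changes the code point.
    static boolean caseMappingsAreIdentity(int codePoint) {
        return Character.toLowerCase(codePoint) == codePoint
            && Character.toUpperCase(codePoint) == codePoint
            && Character.toTitleCase(codePoint) == codePoint;
    }

    // True when the code point carries no numeric meaning at all.
    static boolean hasNoNumericValue(int codePoint) {
        return Character.digit(codePoint, 16) == -1
            && Character.getNumericValue(codePoint) == -1;
    }

    public static void main(String[] args) {
        int privateUse = 0xF0000;    // first code point of plane 15
        int nonCharacter = 0xFFFFE;  // second-to-last code point of plane 15
        int unassigned = 0x40000;    // plane 4, with no assigned characters
        System.out.println(typeName(privateUse));
        System.out.println(typeName(nonCharacter));
        System.out.println(typeName(unassigned));
        System.out.println(caseMappingsAreIdentity(privateUse));
        System.out.println(hasNoNumericValue(unassigned));
    }
}
```

Both classes ignore the property tables entirely, which is why every query in these ranges returns a constant or the input itself.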