Converting Between Strings and Numbers in Java


Strings

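Java strings come in two kinds: immutable and mutable. The distinction is about the object itself. When the content changes, is a new object created, or is the same object modified in place?
Immutable strings: the String class
Mutable strings: the StringBuilder and StringBuffer classes
Up to JDK 8, a String stores its characters in a char array. Since JDK 9 (compact strings), it stores them in a byte array plus an encoding flag (LATIN1 or UTF16). The JDK 11 source quoted below works with this byte array representation.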
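A quick sketch of the difference (the class name `MutabilityDemo` is made up for illustration). `String.concat` with a non-empty argument returns a new object and leaves the original untouched, while `StringBuilder.append` modifies the builder in place and returns that same builder.

```java
class MutabilityDemo {
    // True if concat produced a different object and left the original unchanged.
    static boolean stringConcatCreatesNewObject() {
        String s = "12";
        String t = s.concat("34");            // String is immutable: a new object is returned
        return s != t && s.equals("12") && t.equals("1234");
    }

    // True if append returned the very same builder after mutating it.
    static boolean builderAppendIsInPlace() {
        StringBuilder sb = new StringBuilder("12");
        StringBuilder same = sb.append("34"); // mutates sb and returns it
        return sb == same && sb.toString().equals("1234");
    }

    public static void main(String[] args) {
        System.out.println(stringConcatCreatesNewObject()); // true
        System.out.println(builderAppendIsInPlace());       // true
    }
}
```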

Numbers

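In Java, integers are usually represented by the primitive type int, a 32-bit value stored in two's-complement binary form, with a range of -2^31 to 2^31-1.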
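The asymmetry of this range matters later in parseInt, so here is a small illustration (the class name `IntRangeDemo` is just for this example). There is one more negative value than there are positive values, and negating Integer.MIN_VALUE silently overflows back to itself.

```java
class IntRangeDemo {
    public static void main(String[] args) {
        // int is 32-bit two's complement.
        System.out.println(Integer.MAX_VALUE);          // 2147483647
        System.out.println(Integer.MIN_VALUE);          // -2147483648
        System.out.println(Integer.toBinaryString(-1)); // 32 ones
        // Negating MIN_VALUE overflows back to MIN_VALUE.
        System.out.println(-Integer.MIN_VALUE == Integer.MIN_VALUE); // true
    }
}
```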

1. String to number

To convert a String to an int, the usual approach is:

String str="1234";
int s=Integer.parseInt(str);

(To convert a StringBuilder or StringBuffer to a number, first call its toString method to get a String, then parse that String the same way.)

StringBuffer st=new StringBuffer("1234");
int a=Integer.parseInt(st.toString());
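In real code, invalid input is the main thing to plan for: parseInt throws NumberFormatException for non-digit characters and for values outside the int range. The sketch below wraps that in a helper; `ParseDemo` and `parseOrDefault` are hypothetical names, not JDK API. Since it accepts a CharSequence, it works for String, StringBuilder and StringBuffer alike.

```java
class ParseDemo {
    // Hypothetical helper: returns fallback instead of throwing on bad input.
    static int parseOrDefault(CharSequence text, int fallback) {
        if (text == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(text.toString()); // String, StringBuilder, StringBuffer
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.parseInt("ff", 16));                  // 255 (radix 16)
        System.out.println(parseOrDefault(new StringBuilder("42"), -1)); // 42
        System.out.println(parseOrDefault("12a", -1));                   // -1 (bad character)
        System.out.println(parseOrDefault("2147483648", -1));            // -1 (overflow)
    }
}
```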

Below is the source of Integer.parseInt (all source code in this article is from JDK 11):

public static int parseInt(String s) throws NumberFormatException {
    return parseInt(s, 10); // delegate to the overload with radix 10
}
/**...
 * @param      s   the {@code String} containing the integer
 *                  representation to be parsed
 * @param      radix   the radix to be used while parsing {@code s}.
 * @return     the integer represented by the string argument in the
 *             specified radix.
 * @exception  NumberFormatException if the {@code String}
 *             does not contain a parsable {@code int}.
 */
public static int parseInt(String s, int radix)
            throws NumberFormatException
{
    /*
     * WARNING: This method may be invoked early during VM initialization
     * before IntegerCache is initialized. Care must be taken to not use
     * the valueOf method.
     */

    if (s == null) {
        // the string is null
        throw new NumberFormatException("null");
    }

    if (radix < Character.MIN_RADIX) {
        // radix below the minimum (2, binary)
        throw new NumberFormatException("radix " + radix +
                                        " less than Character.MIN_RADIX");
    }

    if (radix > Character.MAX_RADIX) {
        // radix above the maximum (36)
        throw new NumberFormatException("radix " + radix +
                                        " greater than Character.MAX_RADIX");
    }

    boolean negative = false; // true if the string starts with '-'
    int i = 0, len = s.length();
    int limit = -Integer.MAX_VALUE; // -2147483647: lowest allowed accumulated value for a positive result

    if (len > 0) {
        char firstChar = s.charAt(0);
        if (firstChar < '0') { // Possible leading "+" or "-"
            if (firstChar == '-') {
                // negative number: the result may go down to Integer.MIN_VALUE
                negative = true;
                limit = Integer.MIN_VALUE;
            } else if (firstChar != '+') {
                throw NumberFormatException.forInputString(s);
            }

            if (len == 1) { // Cannot have lone "+" or "-"
                throw NumberFormatException.forInputString(s);
            }
            i++;
        }
        int multmin = limit / radix; // if result < multmin, result * radix would overflow
        int result = 0; // accumulated result, kept negative

        /* Example for the string "123":
           1st iteration: result = 0*10 - 1 = -1
           2nd iteration: result = -1*10 - 2 = -12
           3rd iteration: result = -12*10 - 3 = -123
        */
        while (i < len) {
            // Accumulating negatively avoids surprises near MAX_VALUE
            int digit = Character.digit(s.charAt(i++), radix); // value of the character in this radix, or -1
            if (digit < 0 || result < multmin) {
                // invalid character, or result * radix would overflow
                // (e.g. radix 10, positive input: multmin = -214748364)
                throw NumberFormatException.forInputString(s);
            }
            result *= radix; // shift left by one digit
            if (result < limit + digit) {
                // result - digit would go below limit
                throw NumberFormatException.forInputString(s);
            }
            result -= digit;
        }
        return negative ? result : -result;
    } else {
        throw NumberFormatException.forInputString(s);
    }
}

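In short, parseInt first checks whether the first character is "+" or "-" to decide the sign. It then walks through the remaining characters, converts each one to its digit value, and accumulates the result digit by digit: multiply the running result by the radix, then subtract the digit. The result is accumulated as a negative number because the int range is -2^31 to 2^31-1: the magnitude of Integer.MIN_VALUE is one larger than Integer.MAX_VALUE, so only the negative side can hold every possible result, and handling both signs the same way keeps the algorithm uniform. Finally, for a positive number the accumulated value is negated.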
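To see the negative-accumulation idea in isolation, here is a stripped-down sketch that follows the structure of the JDK code above, restricted to radix 10 and ASCII digits only (the real method goes through Character.digit and accepts any radix from 2 to 36). `SimpleParseInt`, `parseDecimal` and `fail` are hypothetical names for this illustration.

```java
class SimpleParseInt {
    // Hypothetical helper: builds a message in the same style parseInt uses.
    private static NumberFormatException fail(String s) {
        return new NumberFormatException("For input string: \"" + s + "\"");
    }

    // Sketch of parseInt's algorithm, restricted to radix 10 and '0'..'9'.
    static int parseDecimal(String s) {
        if (s == null || s.isEmpty()) {
            throw fail(s);
        }
        int i = 0;
        int len = s.length();
        boolean negative = false;
        int limit = -Integer.MAX_VALUE;        // lowest allowed value for a positive result
        char first = s.charAt(0);
        if (first == '-' || first == '+') {
            if (len == 1) {
                throw fail(s);                 // lone sign
            }
            if (first == '-') {
                negative = true;
                limit = Integer.MIN_VALUE;     // a negative result may reach MIN_VALUE
            }
            i++;
        }
        int multmin = limit / 10;              // below this, result * 10 would pass limit
        int result = 0;
        while (i < len) {
            char c = s.charAt(i++);
            int digit = (c >= '0' && c <= '9') ? c - '0' : -1; // ASCII only, unlike Character.digit
            if (digit < 0 || result < multmin) {
                throw fail(s);
            }
            result *= 10;
            if (result < limit + digit) {      // result - digit would pass limit
                throw fail(s);
            }
            result -= digit;                   // accumulate on the negative side
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        System.out.println(parseDecimal("-2147483648")); // -2147483648
        System.out.println(parseDecimal("+123"));        // 123
    }
}
```

Note that "-2147483648" parses without ever computing +2147483648, which is exactly what accumulating on the negative side buys.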

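Next, let's see how Character.digit obtains the integer value of a character.
parseInt calls Character.digit(char, int), which widens the char to an int code point and calls the overload below. As you can see, it simply returns CharacterData.of(codePoint).digit(codePoint, radix):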

public static int digit(int codePoint, int radix) {
    return CharacterData.of(codePoint).digit(codePoint, radix);
}

First, the CharacterData.of method:

// Character <= 0xff (basic latin) is handled by internal fast-path
// to avoid initializing large tables.
// Note: performance of this "fast-path" code may be sub-optimal
// in negative cases for some accessors due to complicated ranges.
// Should revisit after optimization of table initialization.
static final CharacterData of(int ch) {
    if (ch >>> 8 == 0) {     // fast-path
        return CharacterDataLatin1.instance;
    } else {
        switch(ch >>> 16) {  //plane 00-16
        case(0):
            return CharacterData00.instance;
        case(1):
            return CharacterData01.instance;
        case(2):
            return CharacterData02.instance;
        case(14):
            return CharacterData0E.instance;
        case(15):   // Private Use
        case(16):   // Private Use
            return CharacterDataPrivateUse.instance;
        default:
            return CharacterDataUndefined.instance;
        }
    }
}

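If the character's value is in the range 0 to 255 (ch >>> 8 == 0), the fast path returns CharacterDataLatin1. Otherwise the Unicode plane number (ch >>> 16) selects the table: plane 0 (the rest of the Basic Multilingual Plane, up to U+FFFF) uses CharacterData00, plane 1 uses CharacterData01, plane 2 uses CharacterData02, plane 14 uses CharacterData0E, planes 15 and 16 (private use) use CharacterDataPrivateUse, and any other plane uses CharacterDataUndefined.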
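The dispatch can be mirrored in a few lines to see where typical characters land. `tableFor` is a hypothetical helper that only returns the class names as strings (it reflects the JDK 11 switch shown above; newer JDKs may add more planes). The last two lines use the real Character.digit and Integer.parseInt to show that digits outside Latin-1, such as fullwidth digits, are still accepted through the plane-0 table.

```java
class PlaneDispatchDemo {
    // Hypothetical helper mirroring the branches of CharacterData.of in JDK 11.
    static String tableFor(int codePoint) {
        if (codePoint >>> 8 == 0) {
            return "CharacterDataLatin1";      // U+0000..U+00FF
        }
        switch (codePoint >>> 16) {            // Unicode plane number
            case 0:  return "CharacterData00"; // rest of the BMP, U+0100..U+FFFF
            case 1:  return "CharacterData01";
            case 2:  return "CharacterData02";
            case 14: return "CharacterData0E";
            case 15:
            case 16: return "CharacterDataPrivateUse";
            default: return "CharacterDataUndefined";
        }
    }

    public static void main(String[] args) {
        System.out.println(tableFor('7'));      // CharacterDataLatin1
        System.out.println(tableFor('\uFF17')); // CharacterData00 (fullwidth digit seven)
        System.out.println(tableFor(0x1F600));  // CharacterData01
        // Non-Latin-1 decimal digits still work with Character.digit and parseInt.
        System.out.println(Character.digit('\uFF17', 10));          // 7
        System.out.println(Integer.parseInt("\uFF11\uFF12\uFF13")); // 123
    }
}
```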
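So how can we tell which branch a given character takes? One way to get a feel for it is to look at the character's byte length:
String.getBytes("UTF-8").length returns the number of bytes the string occupies in UTF-8; called with no argument, getBytes uses the platform's default charset.
Test: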

    System.out.println("0".getBytes().length);//1
    System.out.println("1".getBytes().length);//1
    System.out.println("2".getBytes().length);//1
    System.out.println("01".getBytes().length);//2
    System.out.println("012".getBytes().length);//3
    System.out.println("我".getBytes().length);//3
    System.out.println("我是".getBytes().length);//6

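The output shows that digit characters such as '0' and '1' take one byte each. (These numbers assume the default charset is UTF-8; with another default charset, the results for "我" may differ.)
Strictly speaking, though, CharacterData.of does not look at any encoding: it branches on the character's code point. The digits '0' to '9' are U+0030 to U+0039, so ch >>> 8 == 0 and CharacterData.of returns CharacterDataLatin1.instance, the CharacterDataLatin1 singleton, regardless of the default charset. A one-byte UTF-8 encoding implies the fast path, but not the other way round: 'é' (U+00E9) takes two bytes in UTF-8 yet is still handled by CharacterDataLatin1.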
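The difference between "byte length" and "code point below 256" is easy to check directly. In this sketch `usesLatin1FastPath` and `utf8Length` are hypothetical helpers; the first simply repeats the `ch >>> 8 == 0` test from CharacterData.of, and the second passes StandardCharsets.UTF_8 explicitly so the result does not depend on the platform default.

```java
import java.nio.charset.StandardCharsets;

class ByteLengthVsCodePoint {
    // True if CharacterData.of would take the Latin-1 fast path for this code point.
    static boolean usesLatin1FastPath(int codePoint) {
        return codePoint >>> 8 == 0;
    }

    // UTF-8 byte length, independent of the platform default charset.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("0") + " " + usesLatin1FastPath('0'));           // 1 true
        System.out.println(utf8Length("\u00E9") + " " + usesLatin1FastPath('\u00E9')); // 2 true
        System.out.println(utf8Length("\u6211") + " " + usesLatin1FastPath('\u6211')); // 3 false
    }
}
```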
Next, CharacterDataLatin1.digit(codePoint, radix) converts the character.
Here is the source:

    // Digit values for codePoints in the 0-255 range. Contents generated using:
    // for (char i = 0; i < 256; i++) {
    //     int v = -1;
    //     if (i >= '0' && i <= '9') { v = i - '0'; } 
    //     else if (i >= 'A' && i <= 'Z') { v = i - 'A' + 10; }
    //     else if (i >= 'a' && i <= 'z') { v = i - 'a' + 10; }
    //     if (i % 20 == 0) System.out.println();
    //     System.out.printf("%2d, ", v);
    // }
    //
    // Analysis has shown that generating the whole array allows the JIT to generate
    // better code compared to a slimmed down array, such as one cutting off after 'z'
    private static final byte[] DIGITS = new byte[] {
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, -1, -1,
        -1, -1, -1, -1, -1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
        25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, -1, -1, -1, -1, -1, -1, 10, 11, 12,
        13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
        33, 34, 35, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 };

    int digit(int ch, int radix) {
        int value = DIGITS[ch];
        return (value >= 0 && value < radix && radix >= Character.MIN_RADIX
                && radix <= Character.MAX_RADIX) ? value : -1;
    }

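The idea is simple: DIGITS is a precomputed byte array indexed by the character's code (its ASCII/Latin-1 value). Each entry holds the digit value of that character, or -1 if the character is neither a digit nor a Latin letter; digit then returns the value only if it is smaller than the radix and the radix itself is valid.
Integer.parseInt passes the characters in one at a time, so for decimal input they are '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', whose ASCII codes are 48 to 57, i.e. the entries DIGITS[48] through DIGITS[57]. For larger radixes, 'A' to 'Z' and 'a' to 'z' map to 10 to 35.
That is the whole process of converting a string to a number. (Characters outside the 0 to 255 range go through the other CharacterData classes, so their digit values are looked up differently.)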
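As a sanity check, the table can be rebuilt from the generator rule quoted in the source comment and compared against the JDK's own Character.digit for every Latin-1 character and every radix. `DigitTable` is a hypothetical class for this sketch; its `digit` method copies the range check of CharacterDataLatin1.digit.

```java
class DigitTable {
    // Rebuilds the 256-entry table using the rule from the generator comment.
    static final byte[] DIGITS = new byte[256];
    static {
        for (int i = 0; i < 256; i++) {
            int v = -1;
            if (i >= '0' && i <= '9') v = i - '0';
            else if (i >= 'A' && i <= 'Z') v = i - 'A' + 10;
            else if (i >= 'a' && i <= 'z') v = i - 'a' + 10;
            DIGITS[i] = (byte) v;
        }
    }

    // Same check as CharacterDataLatin1.digit: accept only values below the radix.
    static int digit(int ch, int radix) {
        int value = DIGITS[ch];
        return (value >= 0 && value < radix && radix >= Character.MIN_RADIX
                && radix <= Character.MAX_RADIX) ? value : -1;
    }

    public static void main(String[] args) {
        System.out.println(digit('7', 10)); // 7
        System.out.println(digit('f', 16)); // 15
        System.out.println(digit('f', 10)); // -1, since 15 is not a base-10 digit
        // Cross-check against the JDK for every Latin-1 character and radix.
        for (int ch = 0; ch < 256; ch++) {
            for (int radix = Character.MIN_RADIX; radix <= Character.MAX_RADIX; radix++) {
                if (digit(ch, radix) != Character.digit(ch, radix)) {
                    throw new AssertionError("mismatch at " + ch + ", radix " + radix);
                }
            }
        }
        System.out.println("table matches Character.digit for 0..255");
    }
}
```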

2. Number to string

There are three common ways to convert a number to a string:

int a=21;
String b=String.valueOf(a);
String c=Integer.toString(a);
String d=""+a;

The first one, String.valueOf, simply calls Integer.toString to do the conversion:

public static String valueOf(int i) {
    return Integer.toString(i);
}
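The three forms produce the same text. The sketch below (class and method names are made up) checks this, and also shows the radix overloads that the decimal-only forms do not cover. The third form is ordinary string concatenation, which javac compiles differently (since JDK 9 via invokedynamic), but the resulting text is the same.

```java
class IntToStringDemo {
    // The three conversions from the article, side by side.
    static String[] threeWays(int a) {
        return new String[] {
            String.valueOf(a),   // delegates to Integer.toString(a)
            Integer.toString(a),
            "" + a               // string concatenation
        };
    }

    public static void main(String[] args) {
        for (String s : threeWays(21)) {
            System.out.println(s);                     // 21 (three times)
        }
        System.out.println(Integer.toString(255, 16)); // ff
        System.out.println(Integer.toBinaryString(5)); // 101
    }
}
```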

Here is the source of Integer.toString:

@HotSpotIntrinsicCandidate
public static String toString(int i) {
    int size = stringSize(i); // number of characters needed for i, including a '-' sign
    if (COMPACT_STRINGS) {
        byte[] buf = new byte[size];
        getChars(i, size, buf); // write the characters of i into buf, one byte each
        return new String(buf, LATIN1); // build the String directly from the Latin-1 byte array
    } else {
        byte[] buf = new byte[size * 2];
        StringUTF16.getChars(i, size, buf);
        return new String(buf, UTF16);
    }
}

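COMPACT_STRINGS is a boolean flag that tells whether compact strings are enabled, i.e. whether a String whose characters all fit in Latin-1 is stored with one byte per character (otherwise two bytes per character, UTF-16). It has nothing to do with UTF-8. The static initializer sets it to true, but the actual value is injected by the JVM, which sets it to false when started with -XX:-CompactStrings.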

static final boolean COMPACT_STRINGS;
static {
    COMPACT_STRINGS = true;
}

Now let's look in detail at stringSize, which computes the length:

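/* Examples:
  For x = 123:
  x >= 0, so d = 0 and x is negated: x = -123
  1st iteration: p = -10,   x = -123 < -10
  2nd iteration: p = -100,  x = -123 < -100
  3rd iteration: p = -1000, x = -123 > -1000, return i + d = 3 + 0 = 3

  For x = -123:
  d stays 1 and the loop starts directly
  1st iteration: p = -10,   x = -123 < -10
  2nd iteration: p = -100,  x = -123 < -100
  3rd iteration: p = -1000, x = -123 > -1000, return i + d = 3 + 1 = 4
*/
static int stringSize(int x) {
    int d = 1; // 1 by default: a negative number needs an extra '-'
    if (x >= 0) {
        // non-negative: no sign; negate x so both cases are handled on the negative side
        d = 0;
        x = -x;
    }
    int p = -10; // threshold -10^i to compare against
    for (int i = 1; i < 10; i++) {
        // i runs from 1 to 9
        if (x > p) // x > -10^i (and it failed the smaller thresholds): x has i digits
            return i + d;
        p = 10 * p;
    }
    return 10 + d; // 10 digits plus d; an int has at most 10 digits
}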
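To convince yourself that the threshold comparison counts correctly, the sketch below computes the same length a second, naive way and compares it with Integer.toString(x).length(). `StringSizeCheck` and `naiveSize` are hypothetical names; the sketch widens to long so the magnitude of Integer.MIN_VALUE does not overflow, which is exactly the problem stringSize avoids by staying on the negative side.

```java
class StringSizeCheck {
    // Naive alternative to stringSize: widen to long so |Integer.MIN_VALUE| fits,
    // then count decimal digits by repeated division.
    static int naiveSize(int x) {
        long v = x;
        int size = v < 0 ? 1 : 0;   // one extra character for '-'
        v = Math.abs(v);
        do {
            size++;
            v /= 10;
        } while (v != 0);
        return size;
    }

    public static void main(String[] args) {
        int[] samples = {0, 9, 10, -10, 123, -123, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // Both numbers should be equal for every sample.
            System.out.println(x + ": " + naiveSize(x) + " " + Integer.toString(x).length());
        }
    }
}
```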

Then comes getChars, which writes the characters of the integer into the byte array:

/**
 * Places characters representing the integer i into the
 * character array buf. The characters are placed into
 * the buffer backwards starting with the least significant
 * digit at the specified index (exclusive), and working
 * backwards from there.
 *
 * @implNote This method converts positive inputs into negative
 * values, to cover the Integer.MIN_VALUE case. Converting otherwise
 * (negative to positive) will expose -Integer.MIN_VALUE that overflows
 * integer.
 *
 * @param i     value to convert
 * @param index next index, after the least significant digit
 * @param buf   target buffer, Latin1-encoded
 * @return index of the most significant digit or minus sign, if present
 */
static int getChars(int i, int index, byte[] buf) {
    int q, r;
    int charPos = index;

    boolean negative = i < 0;
    if (!negative) {
        i = -i;
    }

    // Generate two digits per iteration
    /* Example: i = 123456 (negated to -123456 above)
      1st iteration: i = -123456, q = i/100 = -1234, r = -123400 + 123456 = 56
      2nd iteration: i = -1234,   q = i/100 = -12,   r = -1200 + 1234 = 34
      Now i = q = -12 > -100, so the loop ends.
    */
    while (i <= -100) {
        q = i / 100;       // quotient of i / 100 (drops the last two digits)
        r = (q * 100) - i; // the last two digits, as a positive number 0..99
        i = q;             // value for the next iteration
        buf[--charPos] = DigitOnes[r]; // ones digit of r
        buf[--charPos] = DigitTens[r]; // tens digit of r
    }

    // We know there are at most two digits left at this point.
    q = i / 10;
    r = (q * 10) - i;
    buf[--charPos] = (byte)('0' + r);

    // Whatever left is the remaining digit.
    if (q < 0) {
        buf[--charPos] = (byte)('0' - q);
    }

    if (negative) {
        // a negative number gets a leading '-'
        buf[--charPos] = (byte)'-';
    }
    return charPos;
}
static final byte[] DigitTens = {
        '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',
        '1', '1', '1', '1', '1', '1', '1', '1', '1', '1',
        '2', '2', '2', '2', '2', '2', '2', '2', '2', '2',
        '3', '3', '3', '3', '3', '3', '3', '3', '3', '3',
        '4', '4', '4', '4', '4', '4', '4', '4', '4', '4',
        '5', '5', '5', '5', '5', '5', '5', '5', '5', '5',
        '6', '6', '6', '6', '6', '6', '6', '6', '6', '6',
        '7', '7', '7', '7', '7', '7', '7', '7', '7', '7',
        '8', '8', '8', '8', '8', '8', '8', '8', '8', '8',
        '9', '9', '9', '9', '9', '9', '9', '9', '9', '9',
} ;

static final byte[] DigitOnes = {
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
} ;
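DigitTens[r] and DigitOnes[r] are just precomputed '0' + r / 10 and '0' + r % 10. The sketch below replays getChars with that arithmetic in place of the two lookup tables and checks the result against Integer.toString, including Integer.MIN_VALUE. `GetCharsSketch` and `toDecimal` are hypothetical names, and the fixed 11-byte buffer stands in for the stringSize call.

```java
import java.nio.charset.StandardCharsets;

class GetCharsSketch {
    // Sketch of Integer.getChars: work on a negative value, emit two digits per step.
    static String toDecimal(int i) {
        byte[] buf = new byte[11];          // enough for "-2147483648"
        int charPos = buf.length;
        boolean negative = i < 0;
        if (!negative) {
            i = -i;                         // the negative side can hold MIN_VALUE
        }
        while (i <= -100) {
            int q = i / 100;
            int r = (q * 100) - i;          // last two digits, 0..99
            i = q;
            buf[--charPos] = (byte) ('0' + r % 10); // what DigitOnes[r] holds
            buf[--charPos] = (byte) ('0' + r / 10); // what DigitTens[r] holds
        }
        int q = i / 10;                     // at most two digits left
        int r = (q * 10) - i;
        buf[--charPos] = (byte) ('0' + r);
        if (q < 0) {
            buf[--charPos] = (byte) ('0' - q);
        }
        if (negative) {
            buf[--charPos] = (byte) '-';
        }
        return new String(buf, charPos, buf.length - charPos, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(toDecimal(123456));            // 123456
        System.out.println(toDecimal(Integer.MIN_VALUE)); // -2147483648
    }
}
```

The JDK keeps the tables because a lookup is cheaper than the extra division and modulo per pair of digits; the output is identical either way.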

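That is the whole process of converting a number to a string with Integer.toString.
In both directions, the conversion of ASCII digits relies on the layout of the ASCII table: the characters '0' to '9' have the consecutive codes 48 to 57, so a character maps to its digit value by a table lookup (DIGITS) or by subtracting '0', and a digit maps back to its character by a table lookup (DigitOnes/DigitTens) or by adding '0'.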