Chapter 2: Source code analysis of the Java wrapper class Integer

Preface

The first chapter analyzed Java type conversion in detail: explicit casts and implicit conversions between the different data types.
Originally, the second chapter was meant to cover the basics of object orientation, but I found that this kind of theory is dry to analyze and hard to absorb. So from the second chapter on, I will work my own knowledge and understanding of object orientation into the text as we go. If anything is wrong, corrections are welcome.
This chapter is based on JDK 1.8 and analyzes the Java wrapper class Integer. The class is used constantly and sits at the very foundation of the language, so a thorough analysis is well worth it.

Object-oriented

Java is a world, a virtual one. The world we live in is the real one. Whatever kind of world it is, its composition has to follow certain laws. As I understand it, Java is a world of code that closely imitates the real world. The ancients said: "The Tao that can be told is not the eternal Tao." This Tao is the law that everything in the real world follows, the abstraction of everything.
In Tai Chi thought, the Tao gives rise to the two forms, the two forms give rise to the four images, and the four images give rise to the eight trigrams. The Tao is the beginning of everything. Binary is the beginning of the computer. Object is the beginning of Java.
On this basis, all kinds of species and substances came into being. In the real world there are all kinds of people, animals, plants, and buildings. Imitated in Java, these become different kinds of classes, such as Integer, String, File, and Thread. Classes are the beginning of object orientation.
Once the world had these people and things, they began to coordinate and cooperate, creating factories (Factory), generators, mining machines (Mybatis), cities (Spring), and so on. The world slowly started to turn. A website that users can visit is just such a world, one that you create.
A class is a mold for objects. Using a human being as the analogy, it has attributes, an external interface, and methods that support that interface.
For example:
attributes: cells, blood, protein, etc.;
external interface: hands, eyes, nose, mouth, etc.;
support for the interface: bones, heart, lungs, kidneys, etc.
Different organs support different interfaces, or jointly support one external interface. The relationship is many-to-many.
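
As a rough illustration of the analogy above, here is a minimal sketch of a class with attributes, an external interface, and a private method that supports the interface. The Person class and all of its members are hypothetical and exist only for this example.

```java
// Hypothetical class, used only to illustrate the analogy.
class Person {
    // Attributes: internal state, hidden from callers.
    private final String name;
    private int energy;

    Person(String name, int energy) {
        this.name = name;
        this.energy = energy;
    }

    // External interface: what other objects are allowed to call.
    public String greet() {
        consumeEnergy(1);
        return "Hello, I am " + name;
    }

    public int getEnergy() {
        return energy;
    }

    // Supporting method: private, it only backs the public interface.
    private void consumeEnergy(int amount) {
        energy = Math.max(0, energy - amount);
    }
}
```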

Integer

Integer, the class analyzed today, is examined along these same three parts, working from the outer layer inward. I drew a rough diagram that divides the Integer class into several groups.
[Figure: structure diagram of the Integer class]
On the left side of the figure are the external APIs and some internal methods, thirty or forty in all. On the right are its attributes.
We will analyze the commonly used APIs. The methods marked yellow at the bottom are rarely used and more complicated; we will study them later, when we know more. The gray ones are very simple and basically never used.
So this chapter focuses on the top four groups of methods. The corresponding APIs are:

The first group: constructors, parseInt()
The second group: valueOf(), the IntegerCache inner class
The third group: toString(), getChars(), stringSize()
The fourth group: toHexString(), toOctalString(), toBinaryString(), toUnsignedString0(), formatUnsignedInt()

The first group: Constructor, parseInt()

There are two constructors for Integer, as follows:

// 1. Takes an int argument
public Integer(int value) {
    this.value = value;
}

// 2. Takes a String argument
public Integer(String s) throws NumberFormatException {
    this.value = parseInt(s, 10);
}

The constructor that takes an int is very simple: it assigns the argument to the value field held by the Integer object.
The constructor that takes a String delegates to parseInt(), so let's analyze parseInt() first.

(1) parseInt() method

The parseInt() method converts a string into an int.
It has two overloads:
parseInt(String s) and parseInt(String s, int radix), where s is the string to convert and radix is the base in which it is interpreted; radix = 10, for example, means decimal. parseInt(String s) parses as decimal by default. We focus on parseInt(String s, int radix).
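
To make the two overloads concrete, here is a small sketch that calls both and shows what happens on invalid input. The class ParseIntDemo and the helper tryParse are hypothetical names introduced for this example; the input values are taken from the Javadoc examples quoted below.

```java
class ParseIntDemo {
    // Returns the parsed value, or null if s is not a valid int in that radix.
    static Integer tryParse(String s, int radix) {
        try {
            return Integer.parseInt(s, radix);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.parseInt("473"));        // 473 (decimal by default)
        System.out.println(Integer.parseInt("-FF", 16));    // -255
        System.out.println(Integer.parseInt("1100110", 2)); // 102
        System.out.println(tryParse("99", 8));              // null: 9 is not an octal digit
        System.out.println(tryParse("2147483648", 10));     // null: does not fit in an int
    }
}
```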

Annotated Javadoc of parseInt(String s, int radix)

/** 
 * Parses the string argument as a signed integer in the radix
 * specified by the second argument. 
 * 
 * Note: the second argument selects the radix in which the string is parsed.
 * 
 * The characters in the string
 * must all be digits of the specified radix
 *  (as determined by  whether {@link java.lang.Character#digit(char, int)}  returns a nonnegative value),
 * 
 * Note: every character must be a valid digit in that radix, that is,
 *       Character.digit(char, int) must return a nonnegative value for it.
 * 
 *   except that the first character may be an
 * ASCII minus sign {@code '-'} ({@code '\u005Cu002D'}) to
 * indicate a negative value or an ASCII plus sign {@code '+'}
 * ({@code '\u005Cu002B'}) to indicate a positive value. The
 * resulting integer value is returned.
 * 
 * Note: the first character may be an ASCII minus sign (-) or plus sign (+); the parsed int is returned.
 * 
 * <p>An exception of type {@code NumberFormatException} is
 * thrown if any of the following situations occurs:
 * 
 * Note: a NumberFormatException is thrown in any of the following cases.
 * 
 * <ul>
 * <li>The first argument is {@code null} or is a string of
 * length zero.
 * 
 * Note: the first argument is null or an empty string.
 * 
 * <li>The radix is either smaller than
 * {@link java.lang.Character#MIN_RADIX} or
 * larger than {@link java.lang.Character#MAX_RADIX}.
 * 
 * Note: radix lies outside the range Character.MIN_RADIX to Character.MAX_RADIX.
 * 
 * <li>Any character of the string is not a digit of the specified
 * radix, except that the first character may be a minus sign
 * {@code '-'} ({@code '\u005Cu002D'}) or plus sign
 * {@code '+'} ({@code '\u005Cu002B'}) provided that the
 * string is longer than length 1.
 * 
 * Note: some character is not a digit in the given radix. A leading plus or minus sign is allowed, but a lone sign is not.
 * 
 * <li>The value represented by the string is not a value of type
 * {@code int}.
 * </ul>
 *  
 * Note: the value does not fit in an int, for example "2147483648" in radix 10.
 *
 * <p>Examples:
 * <blockquote><pre>
 * parseInt("0", 10) returns 0
 * parseInt("473", 10) returns 473
 * parseInt("+42", 10) returns 42
 * parseInt("-0", 10) returns 0
 * parseInt("-FF", 16) returns -255
 * parseInt("1100110", 2) returns 102
 * parseInt("2147483647", 10) returns 2147483647
 * parseInt("-2147483648", 10) returns -2147483648
 * parseInt("2147483648", 10) throws a NumberFormatException
 * parseInt("99", 8) throws a NumberFormatException
 * parseInt("Kona", 10) throws a NumberFormatException
 * parseInt("Kona", 27) returns 411787
 * </pre></blockquote>
 *
 * @param      s   the {@code String} containing the integer
 *                  representation to be parsed
 * @param      radix   the radix to be used while parsing {@code s}.
 * @return     the integer represented by the string argument in the
 *             specified radix.
 * @exception  NumberFormatException if the {@code String}
 *             does not contain a parsable {@code int}.
 */

Method analysis

public static int parseInt(String s, int radix)  throws NumberFormatException{
    /*
     * WARNING: This method may be invoked early during VM initialization
     * before IntegerCache is initialized. Care must be taken to not use
     * the valueOf method.
     * Note: this method may be called during VM startup, before IntegerCache is
     * initialized, so it must not call valueOf().
     */
	
    // 1. Validate the arguments
    if (s == null) {
        throw new NumberFormatException("null");
    }

    if (radix < Character.MIN_RADIX) {
        throw new NumberFormatException("radix " + radix +
                                        " less than Character.MIN_RADIX");
    }

    if (radix > Character.MAX_RADIX) {
        throw new NumberFormatException("radix " + radix +
                                        " greater than Character.MAX_RADIX");
    }
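    // 2. Perform the conversion
    int result = 0;             // accumulator for the return value
    boolean negative = false;   // assume a positive number unless a '-' is found, as in "-234"
    int i = 0, len = s.length();   // i is the index where the digits start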
    int limit = -Integer.MAX_VALUE;
    int multmin;
    int digit;
    
    // (1) Determine the sign and where the digits start
    if (len > 0) {
        char firstChar = s.charAt(0);
        // '+' and '-' both sort below '0' in ASCII, so this comparison detects a sign
        if (firstChar < '0') { // Possible leading "+" or "-"
            // For a minus sign, set the flag and lower the limit to Integer.MIN_VALUE
            if (firstChar == '-') {
                negative = true;
                limit = Integer.MIN_VALUE;
            } else if (firstChar != '+')
                // Neither '+' nor '-': the format is invalid
                throw NumberFormatException.forInputString(s);

            // A sign was consumed, but a sign alone is not a number, so check the length
            if (len == 1) // Cannot have lone "+" or "-"
                throw NumberFormatException.forInputString(s);

            i++; // the first character is the sign, so the digits start at index 1;
                 // without a sign, i stays 0 and the digits start at the first character
        }

        // At this point any leading '+' or '-' has been consumed. It is still unknown
        // whether the remaining characters are digits; input such as "++23++2" is
        // rejected inside the loop below.

        // (2) Convert the digits according to the radix
        multmin = limit / radix;
        while (i < len) {
            // Accumulating negatively avoids surprises near MAX_VALUE
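            // Note: result is kept negative because the negative range of int is one
            // larger than the positive range, so "-2147483648" can be accumulated safely.

            // Character.digit(char, int) returns the numeric value of the character if
            // it is a valid digit in the given radix, and -1 otherwise.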
            digit = Character.digit(s.charAt(i++),radix);
            if (digit < 0) {
                throw NumberFormatException.forInputString(s);
            }
            if (result < multmin) {
                throw NumberFormatException.forInputString(s);
            }
            result *= radix;
            if (result < limit + digit) {
                throw NumberFormatException.forInputString(s);
            }
            result -= digit;
        }
    } else {
        // As the Javadoc says, a zero-length string such as "" throws an exception.
        throw NumberFormatException.forInputString(s);
    }
    // result was accumulated as a negative number, so negate it for positive input
    return negative ? result : -result;
}
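
The negative accumulation is the least obvious part of the method, so here is a simplified, decimal-only sketch of the same technique. It is not the JDK source: the class NegativeAccumulation and the method parseDecimal are hypothetical names, and the radix handling is dropped to keep the overflow checks visible.

```java
class NegativeAccumulation {
    // Decimal-only sketch: accumulate as a negative number so that
    // "-2147483648" fits, then flip the sign at the end if needed.
    static int parseDecimal(String s) {
        if (s == null || s.isEmpty()) throw new NumberFormatException("empty input");
        boolean negative = s.charAt(0) == '-';
        int i = (negative || s.charAt(0) == '+') ? 1 : 0;
        if (i == s.length()) throw new NumberFormatException(s); // lone sign
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int multmin = limit / 10;
        int result = 0; // stays <= 0 while parsing
        while (i < s.length()) {
            int digit = Character.digit(s.charAt(i++), 10);
            if (digit < 0) throw new NumberFormatException(s);
            if (result < multmin) throw new NumberFormatException(s);       // result * 10 would overflow
            result *= 10;
            if (result < limit + digit) throw new NumberFormatException(s); // result - digit would overflow
            result -= digit;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        System.out.println(parseDecimal("-2147483648")); // -2147483648
        System.out.println(parseDecimal("2147483647"));  // 2147483647
    }
}
```

If the digits were accumulated as a positive number instead, "-2147483648" could not be represented on the way, because 2147483648 is one larger than Integer.MAX_VALUE.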

(2) Constructor

Having analyzed parseInt(), we can now understand the constructors clearly.
The two constructors:

// 1. Takes an int argument
public Integer(int value) {
    this.value = value;
}

// 2. Takes a String argument
public Integer(String s) throws NumberFormatException {
    this.value = parseInt(s, 10);
}

The second group: valueOf() and the IntegerCache inner class

(1) valueOf()

valueOf() is the method behind autoboxing for the wrapper class.
Two of its overloads are shown here:

// Takes an int: this is the boxing step
public static Integer valueOf(int i) {
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
} 
// Takes a String: calls parseInt() to get an int first, then boxes it.
public static Integer valueOf(String s) throws NumberFormatException {
    return Integer.valueOf(parseInt(s, 10));
}

As valueOf() shows, it relies internally on the IntegerCache class. If the value lies between the lower and upper bounds of the cache, the cached object is returned; otherwise a new object is created. What does the cache class look like? Let's analyze it below.

(2) IntegerCache inner class

private static class IntegerCache {
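    static final int low = -128;    // the lower bound is fixed at -128
    static final int high;          // the upper bound is not fixed here; it is assigned in the static initializer
    static final Integer cache[];   // the array of cached Integer objects

    // The array is filled when the inner class is initialized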
    static {
        // high value may be configured by property
        // Note: the upper bound can be set through a property.

        int h = 127;  // default upper bound; the property java.lang.Integer.IntegerCache.high can raise it
        String integerCacheHighPropValue =
            sun.misc.VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
        if (integerCacheHighPropValue != null) {
            try {
                int i = parseInt(integerCacheHighPropValue);
                i = Math.max(i, 127);  // a configured value below 127 is ignored; 127 is used instead

                // Maximum array size is Integer.MAX_VALUE
                // A very large value is capped at Integer.MAX_VALUE - 129
                h = Math.min(i, Integer.MAX_VALUE - (-low) -1);
            } catch( NumberFormatException nfe) {
                // If the property cannot be parsed into an int, ignore it.
            }
        }
        high = h;

        // Finally, create the objects from low to high and store them in the array
        cache = new Integer[(high - low) + 1];
        int j = low;
        for(int k = 0; k < cache.length; k++)
            cache[k] = new Integer(j++);   // the cache is ordered from -128 up to high
           

        // range [-128, 127] must be interned (JLS7 5.1.7)
        assert IntegerCache.high >= 127;
    }

    private IntegerCache() {}
}

So, back in valueOf(): if the value passed by the caller lies between the lower and upper bounds of the cache, the corresponding object in IntegerCache is returned. Otherwise a new object is created.
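
The effect of the cache can be observed from outside by comparing references. The following is a small sketch; IntegerCacheDemo and sameObject are hypothetical names. The result for 128 depends on the configured upper bound, so no fixed output is claimed for it beyond the default case.

```java
class IntegerCacheDemo {
    // True when two valueOf calls for the same int return the same object.
    static boolean sameObject(int v) {
        return Integer.valueOf(v) == Integer.valueOf(v);
    }

    public static void main(String[] args) {
        System.out.println(sameObject(127));   // true: always inside the cache
        System.out.println(sameObject(-128));  // true: lower bound of the cache
        // Depends on the configured upper bound; false with the default of 127.
        System.out.println(sameObject(128));
        // Compare values with equals(), which does not depend on the cache.
        System.out.println(Integer.valueOf(128).equals(Integer.valueOf(128))); // true
    }
}
```

This is why Integer objects should be compared with equals() rather than ==: identity only happens to match inside the cached range.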

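The third group: toString(), getChars(), stringSize()

These three methods are grouped together because toString() has two overloads, and the second one calls getChars() and stringSize() internally. Let's look at the first overload, toString(int i, int radix), first.
Its function is to convert the input value to a string in the given radix.

(1) toString(int i, int radix) method

This method uses the digits[] array, which holds the characters 0-9 and the lowercase letters a-z, 36 characters in total.
Why define this array? Because Character.MIN_RADIX = 2 and Character.MAX_RADIX = 36,
an input value may be converted to any radix from 2 to 36. The array therefore has to provide every digit character that may be needed; each radix uses the first radix entries of it.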

final static char[] digits = {
    '0' , '1' , '2' , '3' , '4' , '5' ,
    '6' , '7' , '8' , '9' , 'a' , 'b' ,
    'c' , 'd' , 'e' , 'f' , 'g' , 'h' ,
    'i' , 'j' , 'k' , 'l' , 'm' , 'n' ,
    'o' , 'p' , 'q' , 'r' , 's' , 't' ,
    'u' , 'v' , 'w' , 'x' , 'y' , 'z'
};

 public static String toString(int i, int radix) {
 
    // A radix outside the allowed range falls back to 10
    if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX)
        radix = 10;

    /* Use the faster version */
    if (radix == 10) {
        return toString(i);
    }

    char buf[] = new char[33]; // 33 chars: up to 32 binary digits plus a '-' sign
    boolean negative = (i < 0);
    int charPos = 32; // start at index 32, the last slot of buf[], and fill backwards

    if (!negative) {
        i = -i;
    }

    while (i <= -radix) {
        buf[charPos--] = digits[-(i % radix)]; // pick the digit character for this position;
                                               // radixes above 10 use lowercase letters,
                                               // hexadecimal being the most common case
        i = i / radix;
    }
    buf[charPos] = digits[-i];

    if (negative) {
        buf[--charPos] = '-';
    }

    return new String(buf, charPos, (33 - charPos));
}
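
A few calls make the behavior of this overload easier to see, including the fallback to radix 10 and the reason the buffer has 33 slots. This is only a usage sketch; ToStringRadixDemo and maxLength are hypothetical names.

```java
class ToStringRadixDemo {
    // Length of the longest string toString can produce in the given radix.
    static int maxLength(int radix) {
        return Integer.toString(Integer.MIN_VALUE, radix).length();
    }

    public static void main(String[] args) {
        System.out.println(Integer.toString(255, 16));  // ff
        System.out.println(Integer.toString(-255, 16)); // -ff
        System.out.println(Integer.toString(255, 2));   // 11111111
        System.out.println(Integer.toString(35, 36));   // z
        System.out.println(Integer.toString(255, 99));  // 255 (invalid radix falls back to 10)
        System.out.println(maxLength(2));               // 33: a '-' sign plus 32 binary digits
    }
}
```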

(1) toString(int i) method

The second toString() method calls getChars() and stringSize() internally:

public static String toString(int i) {
    if (i == Integer.MIN_VALUE)
        return "-2147483648";
    int size = (i < 0) ? stringSize(-i) + 1 : stringSize(i);
    char[] buf = new char[size];
    getChars(i, size, buf);
    return new String(buf, true);
}
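
The special case for Integer.MIN_VALUE exists because stringSize(-i) needs a non-negative argument, and negating MIN_VALUE overflows back to MIN_VALUE. A short sketch of that overflow follows; MinValueNegation and negationOverflows are hypothetical names.

```java
class MinValueNegation {
    // True if v is negative and its negation is still negative,
    // which means the negation overflowed.
    static boolean negationOverflows(int v) {
        return v < 0 && -v < 0;
    }

    public static void main(String[] args) {
        System.out.println(-Integer.MIN_VALUE == Integer.MIN_VALUE); // true
        System.out.println(negationOverflows(Integer.MIN_VALUE));    // true
        System.out.println(negationOverflows(-2147483647));          // false
        System.out.println(Integer.toString(Integer.MIN_VALUE));     // -2147483648
    }
}
```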

(2) getChars() method

The purpose of this method is to write the character for each digit of the number into the buffer. After reading it, I realized that the masters will do anything for efficiency! It makes ingenious use of two one-dimensional arrays, carefully designed so that each iteration produces two adjacent digits. Amazing!
First look at the two arrays. Each is one-dimensional with 100 entries, laid out here as 10 rows of 10.

final static char [] DigitTens = {
    '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',
    '1', '1', '1', '1', '1', '1', '1', '1', '1', '1',
    '2', '2', '2', '2', '2', '2', '2', '2', '2', '2',
    '3', '3', '3', '3', '3', '3', '3', '3', '3', '3',
    '4', '4', '4', '4', '4', '4', '4', '4', '4', '4',
    '5', '5', '5', '5', '5', '5', '5', '5', '5', '5',
    '6', '6', '6', '6', '6', '6', '6', '6', '6', '6',
    '7', '7', '7', '7', '7', '7', '7', '7', '7', '7',
    '8', '8', '8', '8', '8', '8', '8', '8', '8', '8',
    '9', '9', '9', '9', '9', '9', '9', '9', '9', '9',
    } ;

final static char [] DigitOnes = {
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    } ;

Method body:
static void getChars(int i, int index, char[] buf) {
    int q, r;
    int charPos = index;
    char sign = 0;

    if (i < 0) {
        sign = '-';
        i = -i;
    }

    // Generate two digits per iteration
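    // Note: each pass handles the last two decimal digits of i
    while (i >= 65536) {
        q = i / 100;
    // really: r = i - (q * 100);
        r = i - ((q << 6) + (q << 5) + (q << 2));  // shifts instead of a multiply:
                                                   // (q << 6) + (q << 5) + (q << 2) = 64q + 32q + 4q = q * 100,
                                                   // presumably to avoid a slower multiplication.
                                                   // r is the last two digits of i, for example 36
        i = q;
        buf [--charPos] = DigitOnes[r];  // for r = 36: row 3, column 6 of DigitOnes is '6', the ones digit
        buf [--charPos] = DigitTens[r];  // for r = 36: row 3, column 6 of DigitTens is '3', the tens digit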
    }

    // Fall thru to fast mode for smaller numbers
    // assert(i <= 65536, i);
    for (;;) {
        q = (i * 52429) >>> (16+3);
        r = i - ((q << 3) + (q << 1));  // r = i-(q*10) ...
        buf [--charPos] = digits [r];
        i = q;
        if (i == 0) break;
    }
    if (sign != 0) {
        buf [--charPos] = sign;
    }
}
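
The two arithmetic tricks in getChars() can be checked in isolation. The sketch below compares the multiply-and-shift expression with ordinary division for every value the fast loop can see, and checks the shift form of q * 100. DivideByTenTrick and its methods are hypothetical names; the constants are the ones from the source above.

```java
class DivideByTenTrick {
    // i / 10 for 0 <= i <= 65536: 52429 / 2^19 is slightly more than 1/10,
    // and the unsigned shift keeps the result correct even when the
    // product exceeds Integer.MAX_VALUE.
    static int div10(int i) {
        return (i * 52429) >>> (16 + 3);
    }

    // q * 100 written as shifts: 64q + 32q + 4q.
    static int times100(int q) {
        return (q << 6) + (q << 5) + (q << 2);
    }

    // Checks the multiply-and-shift result against ordinary division.
    static boolean div10MatchesUpTo(int max) {
        for (int i = 0; i <= max; i++) {
            if (div10(i) != i / 10) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(div10MatchesUpTo(65536)); // true
        System.out.println(times100(123));           // 12300
    }
}
```

The trick is only valid for small inputs, which is why getChars() first reduces i below 65536 with the two-digits-per-iteration loop.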

(3) stringSize() method

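This is also a clever design worth learning from. The table stores the largest value that has each number of digits. The method compares x against the entries in order, and the index of the first entry that is not smaller than x gives the length (index + 1). This keeps both the code and the work done at run time very small.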

 final static int [] sizeTable = { 9, 99, 999, 9999, 99999, 999999, 9999999,
                                  99999999, 999999999, Integer.MAX_VALUE };

// Requires positive x
static int stringSize(int x) {
    for (int i=0; ; i++)
        if (x <= sizeTable[i])
            return i+1;
}
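
The table lookup can be tried out on its own. This is a self-contained sketch of the same idea with the table copied from above; the class name StringSizeSketch and the constant name SIZE_TABLE are hypothetical.

```java
class StringSizeSketch {
    // Largest value with 1, 2, ..., 10 decimal digits.
    private static final int[] SIZE_TABLE = {
        9, 99, 999, 9999, 99999, 999999, 9999999,
        99999999, 999999999, Integer.MAX_VALUE
    };

    // Number of decimal digits in x; requires x >= 0.
    static int stringSize(int x) {
        for (int i = 0; ; i++) {
            if (x <= SIZE_TABLE[i]) {
                return i + 1;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(stringSize(0));                 // 1
        System.out.println(stringSize(9));                 // 1
        System.out.println(stringSize(10));                // 2
        System.out.println(stringSize(Integer.MAX_VALUE)); // 10
    }
}
```

The loop needs no upper bound because the last table entry is Integer.MAX_VALUE, so every non-negative int matches some entry.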

The fourth group: toHexString(), toOctalString(), toBinaryString(), toUnsignedString0, formatUnsignedInt()

The structure diagram again:
[Figure: structure diagram of the Integer class]
In this group of methods, the external interfaces are toHexString(), toOctalString(), and toBinaryString(). They convert an int into its hexadecimal, octal, and binary string representations.
Example:

int a = 16;
System.out.println(Integer.toBinaryString(a));  // result: 10000
System.out.println(Integer.toOctalString(a));   // result: 20
System.out.println(Integer.toHexString(a));     // result: 10

All three external interfaces delegate to the toUnsignedString0() method. Their source code is as follows.

// toBinaryString() method
public static String toBinaryString(int i) {
    return toUnsignedString0(i, 1);
}
// toOctalString() method
public static String toOctalString(int i) {
    return toUnsignedString0(i, 3);
}
// toHexString() method
public static String toHexString(int i) {
    return toUnsignedString0(i, 4);
}
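
These three methods treat the int as an unsigned 32-bit pattern, which is where they differ from toString(int i, int radix). The sketch below contrasts the two for a negative value; SignedVsUnsignedDemo and sameForNonNegative are hypothetical names.

```java
class SignedVsUnsignedDemo {
    // For non-negative values the signed and unsigned hex strings agree.
    static boolean sameForNonNegative(int v) {
        return v >= 0 && Integer.toString(v, 16).equals(Integer.toHexString(v));
    }

    public static void main(String[] args) {
        System.out.println(Integer.toString(-1, 16));            // -1 (sign plus magnitude)
        System.out.println(Integer.toHexString(-1));             // ffffffff (raw 32-bit pattern)
        System.out.println(Integer.toOctalString(-1));           // 37777777777
        System.out.println(Integer.toBinaryString(-1).length()); // 32
        System.out.println(sameForNonNegative(255));             // true
    }
}
```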

Let's analyze the core method toUnsignedString0() in detail;

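(1) toUnsignedString0()

The function of this method is to format val as an unsigned number in a power-of-two radix. The shift argument is the number of bits per digit: 1 for binary, 3 for octal, 4 for hexadecimal.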

 /**
 * Convert the integer to an unsigned number.
 */
private static String toUnsignedString0(int val, int shift) {
    // assert shift > 0 && shift <=5 : "Illegal shift value";
    
    // Integer.numberOfLeadingZeros(val) counts the zero bits in front of the highest
    // set bit of val. For example, 16 is 10000 in binary and an int has 4 bytes, or
    // 32 bits, so Integer.numberOfLeadingZeros(16) is 27.
    int mag = Integer.SIZE - Integer.numberOfLeadingZeros(val);
    
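    // Number of characters needed: mag bits at shift bits per digit, rounded up,
    // and at least 1 so that 0 is printed as "0".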
    int chars = Math.max(((mag + (shift - 1)) / shift), 1);
    char[] buf = new char[chars];

    formatUnsignedInt(val, shift, buf, 0, chars);

    // Use special constructor which takes over "buf".
    return new String(buf, true);
}
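
The article does not show formatUnsignedInt(), so the following is only a sketch of how the buffer size and the digit extraction fit together, not the JDK source. UnsignedFormatSketch, charCount, and toUnsigned are hypothetical names, and the digit table here only covers shift values from 1 to 4.

```java
class UnsignedFormatSketch {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Characters needed to print val with `shift` bits per digit.
    static int charCount(int val, int shift) {
        int mag = Integer.SIZE - Integer.numberOfLeadingZeros(val); // significant bits
        return Math.max((mag + (shift - 1)) / shift, 1);            // ceil(mag / shift), at least 1
    }

    // Assumed stand-in for the formatting step; shift must be 1..4 here.
    static String toUnsigned(int val, int shift) {
        char[] buf = new char[charCount(val, shift)];
        int mask = (1 << shift) - 1;
        int pos = buf.length;
        do {
            buf[--pos] = DIGITS[val & mask]; // the lowest `shift` bits become one digit
            val >>>= shift;                  // unsigned shift, so negative values shrink too
        } while (pos > 0);
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(toUnsigned(16, 1));  // 10000
        System.out.println(toUnsigned(16, 3));  // 20
        System.out.println(toUnsigned(16, 4));  // 10
        System.out.println(toUnsigned(-1, 4));  // ffffffff
    }
}
```

For 16 and shift = 4, mag is 5, so (5 + 3) / 4 = 2 characters are allocated, matching the two-digit result "10".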

Summary

The first group of methods covers converting strings to numbers.
The second group covers autoboxing of Integer.
The third group covers converting numbers to strings in different radixes; the return value is a string.
The fourth group also converts numbers to strings in different radixes and returns a string, but treats the value as unsigned. It overlaps somewhat with the third group.

That covers the source code of the commonly used methods in the Integer class. It is only a small part of the whole class, and there is much more to learn. The uncommon, rarely used methods can wait until we actually need them.

Origin blog.csdn.net/weixin_43901067/article/details/104467915