Discuss the execution mechanism of parseInt and parseFloat in js from the perspective of ES specifications and engines.

parseInt() and parseFloat() are two commonly used APIs that still hide quite a few "pitfalls". Let's sort them out in this article. (This article is best suited to JS developers who often deal with numbers, or to readers who are curious about how these two APIs work.)

(github: https://github.com/MichealWayne , personal blog address: https://blog.michealwayne.cn/ ), for reprinting, please contact [email protected]

Execution check

An earlier article on JS numeric values (Q3. Tell me the results of the following numeric conversions) touched on how parseInt and parseFloat operate. In fact, these two APIs still have quite a few pitfalls, although you rarely run into them.

First guess the result of each call below, then check it yourself:

/* parseInt */
parseInt('123.456.789');
parseInt('+123.456.789');
parseInt('123abc');
parseInt('abc123');
parseInt('1e6');
parseInt('    1    ');
parseInt('');
parseInt('0');
parseInt('0x');
parseInt('0x11');
parseInt(new String('123'));

parseInt('a', 16);
parseInt(123.456, -1);
parseInt(123.456, 0);
parseInt(123.456, 1);
parseInt(123.456, 2);
parseInt(123.456, 40);
parseInt(123.456, 36);
parseInt(1e6);
parseInt(1n);
parseInt();
parseInt(null);
parseInt(false);

// A few "beyond the syllabus" questions
parseInt(0.00000001);
parseInt(123.456, -99999999999999999999999999);
parseInt(123.456, 99999999999999999999999999);
parseInt(123.456, 9999999999999999999999999);
parseInt(9999999999999999);
parseInt('11111111111111111');
parseInt('11111111111111111111');
parseInt(1e21);
parseInt('123aef', 12);
parseInt('123', NaN);
parseInt('0xf', NaN);
parseInt('123', Infinity);
parseInt('111', 2 ** 32 + 2.1);
parseInt(Symbol());
parseInt(parseInt);
const objTest1 = {};
parseInt(objTest1);
objTest1.toString = () => 123;
parseInt(objTest1);

/* parseFloat */
parseFloat('123.456.789');
parseFloat('123abc.456.789');
parseFloat('    123abc    ');
parseFloat('1e6');
parseFloat('+Infinity');
parseFloat(Infinity);
parseFloat(123.456);
parseFloat(1e6);
parseFloat(0.00000001);
parseFloat(0.1 + 0.2);
parseFloat('0x1a');
parseFloat(1n);
parseFloat();
parseFloat(null);
parseFloat(false);

// A few "beyond the syllabus" questions
parseFloat(9999999999999999);
parseFloat('11111111111111111');
parseFloat('11111111111111111111');
parseFloat(Symbol());
parseFloat(parseFloat);
const objTest2 = {};
parseFloat(objTest2);
objTest2.toString = () => 123;
parseFloat(objTest2);

ECMAScript Specification

Whatever the engine, browser, or Node.js version, it follows the ECMAScript specification. So to reason about the results listed above, focus on the algorithm that the specification describes.

parseInt()

Syntax:

parseInt(string, radix)

Parameters:

  • string: The value to be parsed. If the argument is not a string, it is converted to a string. Standard input format:
    p-parseInt-input-string

  • radix: Optional. An integer between 2~36 that tells parseInt() which base string is written in (for example, string 11 read with radix 2). If radix is omitted, parseInt parses string as decimal, except that a string with a 0x or 0X prefix is parsed as hexadecimal.
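As a quick illustration of the radix parameter, the snippet below reads the same digits in several bases. It is only a minimal sketch; the variable names are made up for the example.

```javascript
// The same digits read in different bases.
const inBinary = parseInt('11', 2);    // 1*2 + 1 = 3
const inDecimal = parseInt('11');      // radix omitted, decimal: 11
const inHex = parseInt('11', 16);      // 1*16 + 1 = 17
const withPrefix = parseInt('0x11');   // radix omitted, "0x" prefix selects base 16: 17
const outOfRange = parseInt('11', 37); // radix outside 2~36: NaN

console.log(inBinary, inDecimal, inHex, withPrefix, outOfRange);
// 3 11 17 17 NaN
```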

In addition:

parseInt === Number.parseInt; // => true

ECMAScript (6.0) Specification

parseInt1

A brief translation of the execution steps:

  • 1. Define a variable inputString as the result of ToString(string). (ToString is an internal abstract operation that is not exposed to scripts; see the specification or the appendix at the end of this article for details);
  • 2. If an exception occurs during that step, return it (ReturnIfAbrupt; how ReturnIfAbrupt works is fairly involved and relies on specification terminology, so it is not covered in this article);
  • 3. Define a variable S as a substring of inputString consisting of the first code unit that is not whitespace and all code units after it. In other words, leading whitespace is removed, so parseInt('123') and parseInt(' 123') behave the same. If no such code unit is found, S is the empty string ("");
  • 4. Define the variable sign as 1;
  • 5. If S is not empty and its first code unit is 0x002D (HYPHEN-MINUS, the minus sign), set sign to -1;
  • 6. If S is not empty and its first code unit is 0x002B (PLUS SIGN, the plus sign) or 0x002D (HYPHEN-MINUS, the minus sign), remove the first code unit of S, that is, strip the '+' or '-' sign;
  • 7. Define the variable R as ToInt32(radix), that is, convert the radix argument with ToInt32, so parseInt('123', 8) and parseInt('123', '8') behave the same;
  • 8. If an exception occurs during that step, return it (ReturnIfAbrupt);
  • 9. Define the variable stripPrefix as true;
  • 10. If R is not equal to 0, then:
    • If R is less than 2 or greater than 36, return NaN;
    • If R is not equal to 16, set stripPrefix to false;
  • 11. Otherwise (R is equal to 0), set R to 10;
  • 12. If stripPrefix is true, then:
    • If the length of S is at least 2 and its first two code units are 0x or 0X, remove those two code units from S and set R to 16;
  • 13. Define the variable Z. If S contains a code unit that is not a radix-R digit, Z is the substring of S consisting of all code units before the first such code unit, so parseInt('789', 8) and parseInt('7', 8) behave the same; otherwise, Z is S;
  • 14. If Z is empty, return NaN;
  • 15. Set mathInt to the mathematical integer value that Z represents in radix-R notation, using the letters A-Z and a-z for digits with values 10 to 35. (If R is 10 and Z contains more than 20 significant digits, every significant digit after the 20th may be replaced by 0, at the option of the implementation; if R is not 2, 4, 8, 10, 16, or 32, mathInt may be an implementation-dependent approximation of the mathematical integer value that Z represents in radix-R notation.)
  • 16. If mathInt is equal to 0, then:
    • If sign is equal to -1, return -0;
    • Otherwise return +0;
  • 17. Set the variable number to the Number value for mathInt;
  • 18. Return sign * number.

It looks a bit complicated, so I drew a flow chart:
p-rule_parseInt

parseInt() may interpret only a leading portion of the string as an integer value; it ignores any code units that cannot be interpreted as part of the notation of an integer, and gives no indication that any such code units were ignored.
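The steps of the parseInt() algorithm can be sketched in plain JavaScript. This is only an illustrative sketch, not engine code: specParseInt and digitValue are hypothetical names, trimStart() and the `| 0` operator stand in for the whitespace stripping and ToInt32 of the specification, and BigInt is used so that mathInt is exact before it is converted to a Number (real engines are allowed to approximate in the cases the specification lists).

```javascript
// Hypothetical helper: value of one code unit as a digit, or -1 if it is not alphanumeric.
function digitValue(ch) {
  const code = ch.charCodeAt(0);
  if (code >= 48 && code <= 57) return code - 48;        // '0'-'9'
  if (code >= 97 && code <= 122) return code - 97 + 10;  // 'a'-'z'
  if (code >= 65 && code <= 90) return code - 65 + 10;   // 'A'-'Z'
  return -1;
}

// Hypothetical re-implementation that follows the numbered steps.
function specParseInt(string, radix) {
  const inputString = `${string}`;             // steps 1-2: ToString (throws for a Symbol)
  let S = inputString.trimStart();             // step 3: drop leading whitespace
  let sign = 1;                                // step 4
  if (S.startsWith('-')) sign = -1;            // step 5
  if (S.startsWith('+') || S.startsWith('-')) S = S.slice(1); // step 6
  let R = radix | 0;                           // steps 7-8: ToInt32(radix)
  let stripPrefix = true;                      // step 9
  if (R !== 0) {                               // step 10
    if (R < 2 || R > 36) return NaN;
    if (R !== 16) stripPrefix = false;
  } else {
    R = 10;                                    // step 11
  }
  if (stripPrefix && /^0[xX]/.test(S)) {       // step 12
    S = S.slice(2);
    R = 16;
  }
  // step 13: Z is the longest leading run of radix-R digits
  let end = 0;
  while (end < S.length) {
    const d = digitValue(S[end]);
    if (d === -1 || d >= R) break;
    end++;
  }
  const Z = S.slice(0, end);
  if (Z === '') return NaN;                    // step 14
  // step 15: the exact mathematical integer
  let mathInt = 0n;
  for (const ch of Z) mathInt = mathInt * BigInt(R) + BigInt(digitValue(ch));
  if (mathInt === 0n) return sign === -1 ? -0 : 0; // step 16
  return sign * Number(mathInt);               // steps 17-18
}

console.log(specParseInt('  -0x1Fz')); // -31
console.log(specParseInt('789', 8));   // 7
```

For inputs of ordinary size the sketch should agree with the built-in parseInt; for very long digit strings the built-in is allowed to approximate, so the two may differ there.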

parseFloat()

Syntax:

parseFloat(string)

Parameters:

  • string: The value to be parsed. If the argument is not a string, it is converted to a string. Standard input format:
    p-parseFloat-input-string

In addition:

parseFloat === Number.parseFloat; // => true

ECMAScript (6.0) Specification

Description in the official specification:

p-parseFloat-1

Steps:

  • 1. Define a variable inputString as the result of ToString(string);

  • 2. If an exception occurs during that step, return it (ReturnIfAbrupt);

  • 3. Define a variable trimmedString as a substring of inputString consisting of the first code unit that is not whitespace and all code units after it. In other words, leading whitespace is removed, so parseFloat('123.456') and parseFloat(' 123.456') behave the same. If no such code unit is found, trimmedString is the empty string ("");

  • 4. If neither trimmedString nor any prefix of trimmedString satisfies the syntax of StrDecimalLiteral, return NaN. The StrDecimalLiteral syntax:
    p-strDecimalLiteral.jpg

    The specification is not very intuitive here, so I drew a railroad diagram:
    p-strDecimalLiteral_rd

  • 5. Define a variable numberString as the longest prefix of trimmedString (possibly trimmedString itself) that satisfies the syntax of StrDecimalLiteral;

  • 6. Define the variable mathFloat as the MV (mathematical value) of numberString: the MV is derived from the literal text and then rounded (the 20-significant-digit threshold applies here as well). This step is very different from parseInt(). As for the MV itself, it is basically what is taught at university; see the specification;

  • 7. If mathFloat is equal to 0, then:

    • If the first code unit of trimmedString is "-", return -0;
    • Otherwise return +0;
  • 8. Return the Number value for mathFloat.

I also drew a flow chart:
p-rule_parseFloat

parseFloat() may interpret only a leading portion of the string as a Number value; it ignores any code units that cannot be interpreted as part of the notation of a decimal literal, and gives no indication that any such code units were ignored.
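The parseFloat() steps can be sketched the same way. Again this is only an illustration under stated assumptions: specParseFloat and STR_DECIMAL_LITERAL are hypothetical names, the regular expression is my own approximation of the StrDecimalLiteral grammar, and Number() is used to obtain the rounded value of the matched prefix instead of computing the MV by hand.

```javascript
// Approximation of StrDecimalLiteral: optional sign, then "Infinity" or decimal digits
// with an optional fraction and exponent. Anchored at the start, so any match is a prefix,
// and greedy quantifiers make it the longest such prefix.
const STR_DECIMAL_LITERAL =
  /^[+-]?(?:Infinity|(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)/;

// Hypothetical re-implementation that follows the numbered steps.
function specParseFloat(string) {
  const inputString = `${string}`;                // steps 1-2: ToString (throws for a Symbol)
  const trimmedString = inputString.trimStart();  // step 3: drop leading whitespace
  const match = STR_DECIMAL_LITERAL.exec(trimmedString);
  if (match === null) return NaN;                 // step 4: no prefix fits the grammar
  const numberString = match[0];                  // step 5: longest matching prefix
  // steps 6-8: Number() rounds the literal and keeps the sign of zero ("-0" gives -0)
  return Number(numberString);
}

console.log(specParseFloat('123.456.789'));    // 123.456
console.log(specParseFloat('  1e6 apples'));   // 1000000
console.log(specParseFloat('0x1a'));           // 0
console.log(specParseFloat('abc'));            // NaN
```

Note how '0x1a' yields 0: the grammar has no hexadecimal form, so the longest valid prefix is just '0'.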

TypeScript

The declaration file (lib.es5.d.ts) is simple:

/**
 * Converts a string to an integer.
 * @param string A string to convert into a number.
 * @param radix A value between 2 and 36 that specifies the base of the number in `string`.
 * If this argument is not supplied, strings with a prefix of '0x' are considered hexadecimal.
 * All other strings are considered decimal.
 */
declare function parseInt(string: string, radix?: number): number;

/**
 * Converts a string to a floating-point number.
 * @param string A string that contains a floating-point number.
 */
declare function parseFloat(string: string): number;

Note that NaN is also of type "number".
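Because a failed parse still returns a value of type number, the failure has to be detected explicitly. A minimal sketch follows; parseIntOr is a hypothetical helper written for this example, not part of any library.

```javascript
const failed = parseInt('abc');    // no leading digits, so the result is NaN

console.log(typeof failed);        // "number"
console.log(failed === failed);    // false: NaN is not equal to itself
console.log(Number.isNaN(failed)); // true

// Hypothetical wrapper that makes the failure case explicit.
function parseIntOr(text, fallback, radix = 10) {
  const value = parseInt(text, radix);
  return Number.isNaN(value) ? fallback : value;
}

console.log(parseIntOr('42px', 0)); // 42
console.log(parseIntOr('px42', 0)); // 0
```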


Engine implementation

Taking WebKit as a typical example, you can look at the concrete implementation and the unit tests of parseInt() and parseFloat(). The source files quoted below live under WebKit's /Source/JavaScriptCore directory; the unit tests are taken from the test/webkit directory of v8 (version: tags/9.9.56).

parseInt source code

(Main file: /Source/JavaScriptCore/runtime/ParseInt.h)

Main code:

// Entry point: method definition
ALWAYS_INLINE static double parseInt(StringView s, int radix)
{
    if (s.is8Bit())
        return parseInt(s, s.characters8(), radix);
    return parseInt(s, s.characters16(), radix);
}

// ES5.1 15.1.2.2
template <typename CharType>
ALWAYS_INLINE
static double parseInt(StringView s, const CharType* data, int radix)
{
    // 1. Let inputString be ToString(string).
    // 2. Let S be a newly created substring of inputString consisting of the first character that is not a
    //    StrWhiteSpaceChar and all characters following that character. (In other words, remove leading white
    //    space.) If inputString does not contain any such characters, let S be the empty string.
    int length = s.length();
    int p = 0;
    while (p < length && isStrWhiteSpace(data[p]))
        ++p;

    // 3. Let sign be 1.
    // 4. If S is not empty and the first character of S is a minus sign -, let sign be -1.
    // 5. If S is not empty and the first character of S is a plus sign + or a minus sign -, then remove the first character from S.
    double sign = 1;
    if (p < length) {
        if (data[p] == '+')
            ++p;
        else if (data[p] == '-') {
            sign = -1;
            ++p;
        }
    }

    // 6. Let R = ToInt32(radix).
    // 7. Let stripPrefix be true.
    // 8. If R != 0, then
    //   b. If R != 16, let stripPrefix be false.
    // 9. Else, R == 0
    //   a. Let R = 10.
    // 10. If stripPrefix is true, then
    //   a. If the length of S is at least 2 and the first two characters of S are either 0x or 0X,
    //      then remove the first two characters from S and let R = 16.
    // 11. If S contains any character that is not a radix-R digit, then let Z be the substring of S
    //     consisting of all characters before the first such character; otherwise, let Z be S.
    if ((radix == 0 || radix == 16) && length - p >= 2 && data[p] == '0' && (data[p + 1] == 'x' || data[p + 1] == 'X')) {
        radix = 16;
        p += 2;
    } else if (radix == 0)
        radix = 10;

    // 8.a If R < 2 or R > 36, then return NaN.
    if (radix < 2 || radix > 36)
        return PNaN;

    // 13. Let mathInt be the mathematical integer value that is represented by Z in radix-R notation, using the letters
    //     A-Z and a-z for digits with values 10 through 35. (However, if R is 10 and Z contains more than 20 significant
    //     digits, every significant digit after the 20th may be replaced by a 0 digit, at the option of the implementation;
    //     and if R is not 2, 4, 8, 10, 16, or 32, then mathInt may be an implementation-dependent approximation to the
    //     mathematical integer value that is represented by Z in radix-R notation.)
    // 14. Let number be the Number value for mathInt.
    int firstDigitPosition = p;
    bool sawDigit = false;
    double number = 0;
    while (p < length) {
        int digit = parseDigit(data[p], radix);
        if (digit == -1)
            break;
        sawDigit = true;
        number *= radix;
        number += digit;
        ++p;
    }

    // 12. If Z is empty, return NaN.
    if (!sawDigit)
        return PNaN;

    // Alternate code path for certain large numbers.
    if (number >= mantissaOverflowLowerBound) {
        if (radix == 10) {
            size_t parsedLength;
            number = parseDouble(s.substring(firstDigitPosition, p - firstDigitPosition), parsedLength);
        } else if (radix == 2 || radix == 4 || radix == 8 || radix == 16 || radix == 32)
            number = parseIntOverflow(s.substring(firstDigitPosition, p - firstDigitPosition), radix);
    }

    // 15. Return sign x number.
    return sign * number;
}

There is nothing flashy in the code: from the logic down to the comments, it follows the specification step by step. (The comments cite the ES5.1 numbering, so the step numbers differ slightly from the ES6 steps listed earlier.)
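The central digit loop of the C++ code can be mirrored in JavaScript to see what it does on its own. This is a rough sketch: the engine's parseDigit is not shown in the excerpt, so the version here is a stand-in, and since the value of mantissaOverflowLowerBound is not shown either, Number.MAX_SAFE_INTEGER is used as an assumed threshold for "the double may no longer be exact".

```javascript
// Stand-in for the engine's parseDigit (not shown in the excerpt):
// digit value of ch in the given radix, or -1 if ch is not a valid digit.
function parseDigit(ch, radix) {
  const code = ch.charCodeAt(0);
  let digit = -1;
  if (code >= 48 && code <= 57) digit = code - 48;            // '0'-'9'
  else if (code >= 97 && code <= 122) digit = code - 97 + 10; // 'a'-'z'
  else if (code >= 65 && code <= 90) digit = code - 65 + 10;  // 'A'-'Z'
  return digit >= 0 && digit < radix ? digit : -1;
}

// Mirrors the while loop: multiply by the radix, add the digit, stop at the first non-digit.
function accumulateDigits(text, radix) {
  let number = 0;
  let sawDigit = false;
  for (let p = 0; p < text.length; p++) {
    const digit = parseDigit(text[p], radix);
    if (digit === -1) break;
    sawDigit = true;
    number = number * radix + digit;
  }
  // Assumed threshold: below it every intermediate value is an exactly representable integer.
  return { number, sawDigit, exact: number <= Number.MAX_SAFE_INTEGER };
}

console.log(accumulateDigits('123x4', 10)); // { number: 123, sawDigit: true, exact: true }
console.log(accumulateDigits('ff', 16));    // { number: 255, sawDigit: true, exact: true }
console.log(accumulateDigits('x', 10));     // { number: 0, sawDigit: false, exact: true }
console.log(accumulateDigits('11111111111111111111', 10).exact); // false
```

When the accumulated value grows past the exact range, repeated multiply-and-add can drift from the correctly rounded result, which is presumably why the engine re-parses such inputs on the alternate code path.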

parseInt unit test

(File: chromium / v8 / v8 / 9.9.56 / . / test / webkit / parseInt-expected.txt)

PASS parseInt('123') is 123
PASS parseInt('123x4') is 123
PASS parseInt('-123') is -123
PASS parseInt('0x123') is 0x123
PASS parseInt('0x123x4') is 0x123
PASS parseInt('-0x123x4') is -0x123
PASS parseInt('-') is Number.NaN
PASS parseInt('0x') is Number.NaN
PASS parseInt('-0x') is Number.NaN
PASS parseInt('123', undefined) is 123
PASS parseInt('123', null) is 123
PASS parseInt('123', 0) is 123
PASS parseInt('123', 10) is 123
PASS parseInt('123', 16) is 0x123
PASS parseInt('0x123', undefined) is 0x123
PASS parseInt('0x123', null) is 0x123
PASS parseInt('0x123', 0) is 0x123
PASS parseInt('0x123', 10) is 0
PASS parseInt('0x123', 16) is 0x123
PASS parseInt(Math.pow(10, 20)) is 100000000000000000000
PASS parseInt(Math.pow(10, 21)) is 1
PASS parseInt(Math.pow(10, -6)) is 0
PASS parseInt(Math.pow(10, -7)) is 1
PASS parseInt(-Math.pow(10, 20)) is -100000000000000000000
PASS parseInt(-Math.pow(10, 21)) is -1
PASS parseInt(-Math.pow(10, -6)) is -0
PASS parseInt(-Math.pow(10, -7)) is -1
PASS parseInt('0') is 0
PASS parseInt('-0') is -0
PASS parseInt(0) is 0
PASS parseInt(-0) is 0
PASS parseInt(2147483647) is 2147483647
PASS parseInt(2147483648) is 2147483648
PASS parseInt('2147483647') is 2147483647
PASS parseInt('2147483648') is 2147483648
PASS state = null; try { parseInt('123', throwingRadix); } catch (e) {} state; is "throwingRadix"
PASS state = null; try { parseInt(throwingString, throwingRadix); } catch (e) {} state; is "throwingString"
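The Math.pow cases in the expectations above look odd until the ToString step is made visible. The small sketch below prints the string that parseInt actually receives for each Number; it uses the literals 1e20, 1e21, 1e-6 and 1e-7 in place of the Math.pow calls.

```javascript
// Numbers are converted to strings first; large and small magnitudes use exponent notation.
const samples = [1e20, 1e21, 1e-6, 1e-7];

for (const value of samples) {
  console.log(String(value), '->', parseInt(value));
}
// 100000000000000000000 -> 100000000000000000000
// 1e+21 -> 1
// 0.000001 -> 0
// 1e-7 -> 1
```

Once the string contains "e", parsing stops at that character, so only the leading digit survives.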

parseFloat source code

(Main file: /Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp)

static double parseFloat(StringView s)
{
    unsigned size = s.length();

    if (size == 1) {
        UChar c = s[0];
        if (isASCIIDigit(c))
            return c - '0';
        return PNaN;
    }

    if (s.is8Bit()) {
        const LChar* data = s.characters8();
        const LChar* end = data + size;

        // Skip leading white space.
        for (; data < end; ++data) {
            if (!isStrWhiteSpace(*data))
                break;
        }

        // Empty string.
        if (data == end)
            return PNaN;

        return jsStrDecimalLiteral(data, end);
    }

    const UChar* data = s.characters16();
    const UChar* end = data + size;

    // Skip leading white space.
    for (; data < end; ++data) {
        if (!isStrWhiteSpace(*data))
            break;
    }

    // Empty string.
    if (data == end)
        return PNaN;

    return jsStrDecimalLiteral(data, end);
}


// See ecma-262 6th 11.8.3
template <typename CharType>
static double jsStrDecimalLiteral(const CharType*& data, const CharType* end)
{
    RELEASE_ASSERT(data < end);

    size_t parsedLength;
    double number = parseDouble(data, end - data, parsedLength);
    if (parsedLength) {
        data += parsedLength;
        return number;
    }

    // Check for [+-]?Infinity
    switch (*data) {
    case 'I':
        if (isInfinity(data, end)) {
            data += SizeOfInfinity;
            return std::numeric_limits<double>::infinity();
        }
        break;

    case '+':
        if (isInfinity(data + 1, end)) {
            data += SizeOfInfinity + 1;
            return std::numeric_limits<double>::infinity();
        }
        break;

    case '-':
        if (isInfinity(data + 1, end)) {
            data += SizeOfInfinity + 1;
            return -std::numeric_limits<double>::infinity();
        }
        break;
    }

    // Not a number.
    return PNaN;
}

By comparison, the comments in parseInt are more complete.

parseFloat unit test
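The switch at the end of jsStrDecimalLiteral handles the Infinity literal, which is the one non-digit form parseFloat accepts. Its observable effect can be checked from script with a few calls; the object and its keys below are just labels for the example.

```javascript
const infinityResults = {
  plain: parseFloat('Infinity'),        // Infinity
  plus: parseFloat('+Infinity'),        // Infinity
  minus: parseFloat('-Infinity'),       // -Infinity
  trailing: parseFloat('Infinityabc'),  // Infinity: a matching prefix is enough
  lowercase: parseFloat('infinity'),    // NaN: the match is case-sensitive
  viaParseInt: parseInt('Infinity'),    // NaN: 'I' is not a decimal digit
};

console.log(infinityResults);
```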

(File: chromium / v8 / v8 / 9.9.56 / . / test / webkit / parseFloat-expected.txt)

PASS parseFloat() is NaN
PASS parseFloat('') is NaN
PASS parseFloat(' ') is NaN
PASS parseFloat(' 0') is 0
PASS parseFloat('0 ') is 0
PASS parseFloat('x0') is NaN
PASS parseFloat('0x') is 0
PASS parseFloat(' 1') is 1
PASS parseFloat('1 ') is 1
PASS parseFloat('x1') is NaN
PASS parseFloat('1x') is 1
PASS parseFloat(' 2.3') is 2.3
PASS parseFloat('2.3 ') is 2.3
PASS parseFloat('x2.3') is NaN
PASS parseFloat('2.3x') is 2.3
PASS parseFloat('0x2') is 0
PASS parseFloat('1' + nonASCIINonSpaceCharacter) is 1
PASS parseFloat(nonASCIINonSpaceCharacter + '1') is NaN
PASS parseFloat('1' + illegalUTF16Sequence) is 1
PASS parseFloat(illegalUTF16Sequence + '1') is NaN
PASS parseFloat(tab + '1') is 1
PASS parseFloat(nbsp + '1') is 1
PASS parseFloat(ff + '1') is 1
PASS parseFloat(vt + '1') is 1
PASS parseFloat(cr + '1') is 1
PASS parseFloat(lf + '1') is 1
PASS parseFloat(ls + '1') is 1
PASS parseFloat(ps + '1') is 1
PASS parseFloat(oghamSpaceMark + '1') is 1
PASS parseFloat(mongolianVowelSeparator + '1') is NaN
PASS parseFloat(enQuad + '1') is 1
PASS parseFloat(emQuad + '1') is 1
PASS parseFloat(enSpace + '1') is 1
PASS parseFloat(emSpace + '1') is 1
PASS parseFloat(threePerEmSpace + '1') is 1
PASS parseFloat(fourPerEmSpace + '1') is 1
PASS parseFloat(sixPerEmSpace + '1') is 1
PASS parseFloat(figureSpace + '1') is 1
PASS parseFloat(punctuationSpace + '1') is 1
PASS parseFloat(thinSpace + '1') is 1
PASS parseFloat(hairSpace + '1') is 1
PASS parseFloat(narrowNoBreakSpace + '1') is 1
PASS parseFloat(mediumMathematicalSpace + '1') is 1
PASS parseFloat(ideographicSpace + '1') is 1
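A few of the whitespace expectations above can be reproduced directly. The test script that defines names such as nbsp or ideographicSpace is not shown here, so the code points below are my assumption about what those names stand for, based on the usual Unicode names.

```javascript
// Assumed code points for some of the names used in the expectations.
const leadingChars = {
  tab: '\t',
  nbsp: '\u00A0',
  lineSeparator: '\u2028',
  ideographicSpace: '\u3000',
  mongolianVowelSeparator: '\u180E',
};

for (const [name, ch] of Object.entries(leadingChars)) {
  console.log(name, parseFloat(ch + '1'));
}
// tab 1
// nbsp 1
// lineSeparator 1
// ideographicSpace 1
// mongolianVowelSeparator NaN
```

The last line matches the one NaN in the list: in current engines U+180E is not treated as whitespace, so it is not skipped and no number can be read.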

Finally

From the ECMA specification and the code of a typical engine, we can see that parseFloat and parseInt handle many boundary cases, which is the main reason for the pitfalls.

At this point, you can go back to the questions at the beginning, and most of the results can now be explained. As for the "beyond the syllabus" questions, if you are interested, take a look at the Number and type conversion sections of the ECMA specification.
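As a hint for some of the harder questions, the two conversions involved (ToInt32 on radix, ToString on the first argument) can be observed from script. The snippet is a small sketch; it uses the bitwise OR operator as a way to see ToInt32, and the variable names are made up for the example.

```javascript
// ToInt32 can be observed with the bitwise OR operator.
console.log(NaN | 0);             // 0, so the radix is treated as if it were not supplied
console.log(Infinity | 0);        // 0
console.log((2 ** 32 + 2.1) | 0); // 2

console.log(parseInt('123', NaN));           // 123
console.log(parseInt('0xf', NaN));           // 15
console.log(parseInt('111', 2 ** 32 + 2.1)); // 7, same as parseInt('111', 2)

// ToString on an object calls its toString method.
const quizObject = {};
console.log(parseInt(quizObject)); // NaN, the string is "[object Object]"
quizObject.toString = () => 123;
console.log(parseInt(quizObject)); // 123

// ToString on a Symbol throws instead of returning a string.
let symbolError = null;
try {
  parseInt(Symbol());
} catch (e) {
  symbolError = e;
}
console.log(symbolError instanceof TypeError); // true
```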


Appendix

ToString

p-table12-tostring

MV (mathematical value)

p-mv

parseFloat input string format
Diagram(
  ZeroOrMore('Space'),
  Optional(
    Choice(0,
      '+',
      '-',
    ), 'skip'
  ),
  Choice(1,
    'Infinity',
    Sequence(
      Choice(0,
        Sequence(
          ZeroOrMore('0-9'),
          Optional('.', 'skip'),
          OneOrMore('0-9'),
        ),
      ),
      Optional(
        Sequence(
          Choice(0,
            'e',
            'E',
          ),
          Optional(
            Choice(0,
              '+',
              '-',
            ), 'skip'
          ),
          OneOrMore('0-9'),
        )
      , 'skip')
    )
  ),
  ZeroOrMore('Space'),
)

parseInt input string format
Diagram(
  ZeroOrMore('Space'),
  Optional(
    Choice(0,
      '+',
      '-',
    ), 'skip'
  ),
  ZeroOrMore('0-R'), // R is the largest digit allowed by the radix
  ZeroOrMore('Space'),
)

For suggestions or reprint requests -> [email protected]

Origin blog.csdn.net/qq_24357165/article/details/123084296