Analyzing the java.net.URI source code

The following code uses URI to validate a URL:

import org.apache.commons.validator.routines.UrlValidator;

import java.net.URI;
import java.net.URISyntaxException;

public class UrlUtils {

    public static void main(String[] args) {
        String url = "http://www.jiaobuchong.com?name=tom&request={\"hobby\":\"film\"}";
        // Validate with Apache Commons Validator first, then with java.net.URI
        System.out.println(new UrlValidator().isValid(url));
        isValidUrl(url);
    }

    public static boolean isValidUrl(String url) {
        try {
            // The constructor parses the string and throws if it is not a valid URI
            URI uri = new URI(url);
            return true;
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
        return false;
    }
}

If the URL contains characters such as {, } or ", a URISyntaxException is thrown and validation fails. Looking at the URI source code, the check behind this is quite interesting.

I. Analyzing the checkChar method

1. Getting the bit mask for the range of legal characters with highMask
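To see where the parser gives up, the exception itself can be inspected. The following is a minimal sketch, not part of the original code: the class name UriErrorDemo and the helper firstIllegalIndex are made up for illustration, while URISyntaxException.getIndex() is the standard JDK accessor for the position of the parse error.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriErrorDemo {

    // Hypothetical helper: returns -1 when the string parses as a URI,
    // otherwise the index that URISyntaxException reports for the error.
    static int firstIllegalIndex(String s) {
        try {
            new URI(s);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        String url = "http://www.jiaobuchong.com?name=tom&request={\"hobby\":\"film\"}";
        int index = firstIllegalIndex(url);
        if (index >= 0) {
            // Show which character the parser rejected
            System.out.println("rejected at index " + index + ": '" + url.charAt(index) + "'");
        } else {
            System.out.println("accepted");
        }
    }
}
```

For the URL above, the reported index should point at the first { in the query string, which is the first character the parser does not accept.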

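// Each character class is a pair of masks: L_* covers character codes 0-63,
// H_* covers character codes 64-127. Letters all live in the high half.
private static final long L_LOWALPHA = 0L;
private static final long H_LOWALPHA = highMask('a', 'z');
private static final long L_UPALPHA = 0L;
private static final long H_UPALPHA = highMask('A', 'Z');

private static final long L_ALPHA = L_LOWALPHA | L_UPALPHA;
private static final long H_ALPHA = H_LOWALPHA | H_UPALPHA;

// In the parser, the first character of the scheme is checked against this pair:
checkChar(0, L_ALPHA, H_ALPHA, "scheme name");

L_ALPHA evaluates to 0, since no letter has a character code below 64. Now for the most interesting part. Let's see what highMask does: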

    // Compute a high-order mask for the characters
    // between first and last, inclusive
    private static long highMask(char first, char last) {
        long m = 0;
        // Math.min(x, 127) keeps the value inside the ASCII range.
        // Math.max(Math.min(x, 127), 64) clamps it into [64, 127].
        // Subtracting 64 turns the character code into a bit index:
        // code 64 maps to bit 0, code 127 maps to bit 63.
        int f = Math.max(Math.min(first, 127), 64) - 64;
        int l = Math.max(Math.min(last, 127), 64) - 64;
        for (int i = f; i <= l; i++)
            m |= 1L << i;
        return m;
    }

highMask('a', 'z') sets bits 33 through 58 of m (0x07FFFFFE00000000), because 'a' is 97 and 'z' is 122, and 64 is subtracted from each.

highMask('A', 'Z') sets bits 1 through 26 of m (0x07FFFFFE), because 'A' is 65 and 'Z' is 90.

OR-ing the two gives H_ALPHA = H_LOWALPHA | H_UPALPHA, which in binary is:

11111111111111111111111111000000111111111111111111111111110

This value represents the character range a-zA-Z. Every character has its own slot in the mask, and the bit for any character outside this range is 0.

2. The match method
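The bit patterns can be reproduced outside the JDK. The sketch below copies the highMask logic shown above into a standalone class (the name HighMaskDemo is only for illustration, since the real method is private to java.net.URI) and prints the masks with Long.toBinaryString.

```java
class HighMaskDemo {

    // Same logic as the highMask excerpt: one bit per character,
    // using (character code - 64) as the bit position.
    static long highMask(char first, char last) {
        long m = 0;
        int f = Math.max(Math.min(first, 127), 64) - 64;
        int l = Math.max(Math.min(last, 127), 64) - 64;
        for (int i = f; i <= l; i++) {
            m |= 1L << i;
        }
        return m;
    }

    public static void main(String[] args) {
        long lower = highMask('a', 'z');
        long upper = highMask('A', 'Z');
        // toBinaryString omits leading zeros, so the strings differ in length
        System.out.println("a-z    : " + Long.toBinaryString(lower));
        System.out.println("A-Z    : " + Long.toBinaryString(upper));
        System.out.println("a-zA-Z : " + Long.toBinaryString(lower | upper));
    }
}
```

Each mask has exactly 26 bits set, one per letter, and the combined mask has 52.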

    // Tell whether the given character is permitted by the given mask pair
    private static boolean match(char c, long lowMask, long highMask) {
        if (c == 0) // 0 doesn't have a slot in the mask. So, it never matches.
            return false;
        if (c < 64)
            return ((1L << c) & lowMask) != 0;
        if (c < 128)
            // Shift 1 left by (c - 64) bits and AND it with highMask;
            // a non-zero result means the character is legal
            return ((1L << (c - 64)) & highMask) != 0;
        return false;
    }

II. Analyzing the checkChars method

From the analysis above, the principle URI uses to decide whether a character is legal is this: encode the set of legal characters as a bit mask, turn the incoming character into a single bit by shifting 1 left by its code (minus 64 for codes 64 and above), and AND that bit with the mask. If the result is not 0, the character is legal.

With that in mind, let's look at this call:
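As a worked example of this principle, the sketch below runs a few characters through a copy of the match method against the a-zA-Z mask pair. The class name MatchDemo is illustrative, and H_ALPHA is written as a hex literal equal to highMask('a', 'z') | highMask('A', 'Z').

```java
class MatchDemo {
    static final long L_ALPHA = 0L;                  // no letters below code 64
    static final long H_ALPHA = 0x07FFFFFE07FFFFFEL; // bits for a-z and A-Z

    // Copy of the match logic from the excerpt above
    static boolean match(char c, long lowMask, long highMask) {
        if (c == 0) return false;
        if (c < 64) return ((1L << c) & lowMask) != 0;
        if (c < 128) return ((1L << (c - 64)) & highMask) != 0;
        return false;
    }

    public static void main(String[] args) {
        // 'h' is 104, so bit 40 is tested; '{' is 123, so bit 59 is tested
        for (char c : new char[] {'h', 'Z', '1', '{', '"'}) {
            System.out.println("'" + c + "' (code " + (int) c + ") -> "
                    + match(c, L_ALPHA, H_ALPHA));
        }
    }
}
```

Letters land on a set bit and are accepted. The characters {, " and 1 land on cleared bits, so this particular mask pair rejects them.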

checkChars(1, p, L_SCHEME, H_SCHEME, "scheme name");

Following the definitions in the code, L_SCHEME works out to:

0 | lowMask('0', '9') | lowMask("+-.")

H_SCHEME:

H_SCHEME = highMask('a', 'z') | highMask('A', 'Z') | 0 | highMask("+-.")

Looking at the code of lowMask, the resulting m is the bit mask for the range [0-9]:

    // Compute a low-order mask for the characters
    // between first and last, inclusive
    private static long lowMask(char first, char last) {
        long m = 0;
        int f = Math.max(Math.min(first, 63), 0);
        int l = Math.max(Math.min(last, 63), 0);
        for (int i = f; i <= l; i++)
            m |= 1L << i;
        return m;
    }

Then, in the match method:

    // Tell whether the given character is permitted by the given mask pair
    private static boolean match(char c, long lowMask, long highMask) {
        if (c == 0) // 0 doesn't have a slot in the mask. So, it never matches.
            return false;
        if (c < 64)
            // For characters below 64, AND the shifted bit with lowMask;
            // a non-zero result means the character is legal
            return ((1L << c) & lowMask) != 0;
        if (c < 128)
            return ((1L << (c - 64)) & highMask) != 0;
        return false;
    }
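Putting the pieces together, the scheme check can be approximated in a standalone class. This is a simplified sketch rather than the JDK's parser: the class SchemeMaskDemo and the method isValidScheme are hypothetical names, the string-based lowMask overload is modeled on the lowMask("+-.") call shown above, and highMask("+-.") is left out because all three characters have codes below 64 and contribute nothing to the high mask.

```java
class SchemeMaskDemo {

    static long lowMask(char first, char last) {
        long m = 0;
        int f = Math.max(Math.min(first, 63), 0);
        int l = Math.max(Math.min(last, 63), 0);
        for (int i = f; i <= l; i++) m |= 1L << i;
        return m;
    }

    static long highMask(char first, char last) {
        long m = 0;
        int f = Math.max(Math.min(first, 127), 64) - 64;
        int l = Math.max(Math.min(last, 127), 64) - 64;
        for (int i = f; i <= l; i++) m |= 1L << i;
        return m;
    }

    // Assumed shape of the string overload: one bit per listed character below 64
    static long lowMask(String chars) {
        long m = 0;
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            if (c < 64) m |= 1L << c;
        }
        return m;
    }

    static boolean match(char c, long lowMask, long highMask) {
        if (c == 0) return false;
        if (c < 64) return ((1L << c) & lowMask) != 0;
        if (c < 128) return ((1L << (c - 64)) & highMask) != 0;
        return false;
    }

    static final long H_ALPHA = highMask('a', 'z') | highMask('A', 'Z');
    static final long L_SCHEME = lowMask('0', '9') | lowMask("+-.");
    static final long H_SCHEME = H_ALPHA;

    // Hypothetical helper: first character must be a letter (the checkChar step),
    // the remaining characters must be scheme characters (the checkChars step).
    static boolean isValidScheme(String s) {
        if (s.isEmpty() || !match(s.charAt(0), 0L, H_ALPHA)) return false;
        for (int i = 1; i < s.length(); i++) {
            if (!match(s.charAt(i), L_SCHEME, H_SCHEME)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"http", "svn+ssh", "1http", "ht{tp"}) {
            System.out.println(s + " -> " + isValidScheme(s));
        }
    }
}
```

The digits and the characters +, - and . are only accepted after the first position, because the first character is tested against the letter masks alone.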