Invisible characters in a database

Scenario: A product manager adds a coupon code made up of letters (for example, coupon) through the back-office system, then searches for it in the back office using the keyword coupon. The search returns nothing, and the problem also affects live use of the coupon.

 

Assume the table is named exchange_code.

Assume the column is named code and has type varchar.

 

Symptom 1: Filtering in Sequel Pro for an exact match on the code column (alone or combined with other conditions) does not find the record.

Equivalent SQL: SELECT * FROM exchange_code WHERE code = 'coupon';

 

 

Symptom 2: The record is found when the filter uses the "contains" condition instead.

Equivalent SQL: SELECT * FROM exchange_code WHERE code LIKE '%coupon%';

 

 

Symptom 3: Computing the length of code with the length function gives a value other than 6, the length of coupon.

The value looks like 6 letters, yet it behaves as described above.

 

The team eventually copied the stored value and converted it to its Unicode escape form.

It turned out to contain an invisible character, U+200B (ZERO WIDTH SPACE, written \u200b as an escape).
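The three symptoms can be reproduced in memory without a database. The sketch below is only an illustration: the class name SymptomCheck and the STORED value ("coupon" followed by U+200B) are assumptions chosen to mirror the scenario, not the team's actual data. If the database is MySQL, note that LENGTH() counts bytes while CHAR_LENGTH() counts characters, so both numbers are printed here.

```java
import java.nio.charset.StandardCharsets;

final class SymptomCheck {
    // Hypothetical stored value: "coupon" followed by U+200B (ZERO WIDTH SPACE).
    static final String STORED = "coupon\u200b";

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String keyword = "coupon";
        // Symptom 1: the exact comparison fails.
        System.out.println("exact match: " + STORED.equals(keyword));   // false
        // Symptom 2: the substring search succeeds.
        System.out.println("contains:    " + STORED.contains(keyword)); // true
        // Symptom 3: the length is not 6.
        System.out.println("characters:  " + STORED.length());          // 7
        System.out.println("UTF-8 bytes: " + utf8Length(STORED));       // 9
    }
}
```

U+200B takes three bytes in UTF-8, which is why the byte count jumps from 6 to 9 while the character count only goes from 6 to 7.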



 

coupon​

http://tool.chinaz.com/tools/unicode.aspx (select the line above from left to right, copy it, and paste it into this tool to convert it to Unicode escapes)
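The same inspection can be done locally instead of with an online tool. This is a minimal sketch, assuming a hypothetical helper class CodePointDump; it prints every code point of a string in U+XXXX form so that zero-width characters become visible.

```java
final class CodePointDump {
    // Render each code point as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input: "coupon" with a trailing zero-width space.
        System.out.println(dump("coupon\u200b"));
        // U+0063 U+006F U+0075 U+0070 U+006F U+006E U+200B
    }
}
```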

 

Reproducing the problem

Reproduce the problem in Java:

    @Test
    public void testBlog(){
        // The letter m is U+006D; U+200B is the zero-width space
        String unicode = "\\u6d\\u200b\\u6d";
        System.out.println("unicode:"+unicode);
        String string = unicode2String(unicode) ;
        System.out.println("string:"+string);
    }

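    /**
     * Converts a sequence of hex escapes (the format used in testBlog) into the string it denotes.
     */
    public static String unicode2String(String unicode) {
        StringBuilder string = new StringBuilder();

        // Splitting on the escape prefix leaves an empty first element, so start at index 1.
        String[] hex = unicode.split("\\\\u");

        for (int i = 1; i < hex.length; i++) {
            // Parse the hex digits into a code point
            int data = Integer.parseInt(hex[i], 16);

            // Append the code point to the result
            string.appendCodePoint(data);
        }

        return string.toString();
    }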

Output:

unicode:\u6d\u200b\u6d

string:m​m

 

In the output, an invisible U+200B (ZERO WIDTH SPACE) sits between the two m characters. The similar U+200C (ZERO WIDTH NON-JOINER) has the same effect.

These are not unused or reserved code points. Both are assigned Unicode format characters that render with zero width, and a common way for them to reach a form field is copy and paste from a web page or a rich-text document. See https://en.wikipedia.org/wiki/Unicode for background.
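U+200B and U+200C are only two of several zero-width characters, so checking for each one by name is fragile. One possible approach, sketched here with a hypothetical class FormatCharScan, is to test for the whole Unicode "format" category (Cf), which in current Java versions covers both of them as well as U+200D and U+FEFF.

```java
final class FormatCharScan {
    // True if any code point in the string belongs to Unicode category Cf (format).
    static boolean hasFormatChars(String s) {
        return s.codePoints()
                .anyMatch(cp -> Character.getType(cp) == Character.FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(hasFormatChars("m\u200bm")); // zero-width space
        System.out.println(hasFormatChars("m\u200cm")); // zero-width non-joiner
        System.out.println(hasFormatChars("mm"));       // plain letters only
    }
}
```

This only detects the characters. Whether to reject or strip them is a separate decision.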

 

Preventing the problem:

Validate input with a regular expression before saving it, for example by allowing only digits and upper- and lowercase letters (the pattern in the test below also allows a dot).

    @Test
    public void testBlog(){
        // The letter m is U+006D; U+200B is the zero-width space
        String unicode = "\\u6d\\u200b\\u6d";
        System.out.println("unicode:"+unicode);
        String string = unicode2String(unicode) ;
        System.out.println("string:"+string);
        // Pattern is java.util.regex.Pattern; only letters, digits and dots pass
        Pattern pattern = Pattern.compile("[a-zA-Z0-9.]+");
        System.out.println("pass? "+pattern.matcher(string).matches());
    }

Output:

unicode:\u6d\u200b\u6d

string:m​m

pass? false
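Rejecting the input is one option. Another is to strip format characters first and validate what remains, so that a pasted code with a stray zero-width space is still accepted. The sketch below assumes a hypothetical CouponCodeInput helper and a letters-and-digits-only rule; neither comes from the original system.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class CouponCodeInput {
    // Unicode category Cf (format), which includes U+200B and U+200C.
    private static final Pattern FORMAT_CHARS = Pattern.compile("\\p{Cf}");
    // Assumed rule: letters and digits only.
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9]+");

    // Returns the cleaned code, or empty if it is still invalid after cleaning.
    static Optional<String> normalize(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String cleaned = FORMAT_CHARS.matcher(raw).replaceAll("").trim();
        return ALLOWED.matcher(cleaned).matches()
                ? Optional.of(cleaned)
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(normalize("coupon\u200b")); // Optional[coupon]
        System.out.println(normalize("cou pon"));      // Optional.empty
    }
}
```

Stripping silently changes what the user typed, so strict rejection with a clear error message may be the better choice where the exact input matters.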

 

 

=== The Invisible Characters of Programmers' Daily Stories ===

Product W: "What's going on? I just added this record, so why can't I find it in the back office?"

Programmer T: "Let me take a look. It was working fine before."

Product H: "Why is this module broken again?"

Programmer Z: "Let's check the database. Why can't we find it by that field in the database either?"

A programmer: "Look it up by ID."

A programmer: "Check the length of the field."

A programmer: "Damn, why doesn't the length match what we see?"

A programmer: "Look at the Unicode encoding of the string."

A programmer: "Copy it into Java, convert it to Unicode escapes, and see."

A programmer: "Damn, how did invisible characters get in there?"

Programmer T: "Product W, how did you manage to enter invisible characters..."

Product W: "I just used it the normal way."
