Scenario: A product manager adds a coupon code made up of letters (for example: coupon) through the admin backend, then searches the backend for it with the keyword coupon. The search finds nothing, and the problem affects use of the coupon in production.
Assume the table is named exchange_code.
Assume the column is named code and has type varchar.
Symptom 1: Searching in Sequel Pro on the code field with an equals condition does not find the record.
Equivalent SQL: SELECT * FROM exchange_code WHERE code = 'coupon';
Symptom 2: The record can be found with a "contains" condition.
Equivalent SQL: SELECT * FROM exchange_code WHERE code LIKE '%coupon%';
Symptom 3: Computing the length of code with the length function returns a value that is not 6, the length of coupon.
The value looks like 6 letters, yet it behaves as described above.
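The three symptoms can be imitated in plain Java without a database. The following is only a minimal sketch: the class name is hypothetical, and placing the invisible character at the end of the value is an assumption made for illustration.

```java
// Hypothetical reproduction of the three symptoms in plain Java (no database involved).
class InvisibleCouponSymptoms {

    // The stored value: "coupon" followed by a zero-width space (U+200B).
    // Where exactly the invisible character sits is an assumption of this sketch.
    static final String STORED = "coupon\u200B";

    // Symptom 1: an exact match against the visible text fails.
    static boolean exactMatch(String keyword) {
        return STORED.equals(keyword);
    }

    // Symptom 2: a "contains" search still succeeds.
    static boolean containsMatch(String keyword) {
        return STORED.contains(keyword);
    }

    // Symptom 3: the length is larger than the number of visible letters.
    static int storedLength() {
        return STORED.length();
    }

    public static void main(String[] args) {
        System.out.println("exact match:    " + exactMatch("coupon"));    // false
        System.out.println("contains match: " + containsMatch("coupon")); // true
        System.out.println("length:         " + storedLength());          // 7
    }
}
```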
The team eventually thought of copying the stored value and converting it to Unicode escapes.
The result showed that it contained an invisible character, \u200b (U+200B, the zero-width space).
coupon
http://tool.chinaz.com/tools/unicode.aspx (copy the whole line above, from left to right, and paste it into this tool to convert it to Unicode escapes)
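The same check can be done locally instead of through a web tool. The sketch below is an assumption-laden illustration, not the tool's actual logic: the class and method names are hypothetical, and "printable ASCII" is taken to mean the range 0x20 to 0x7E.

```java
// Hypothetical helper that makes invisible characters visible by escaping them.
class CodePointDump {

    // Keeps printable ASCII as-is and writes every other char as a backslash-u hex escape.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c >= 0x20 && c < 0x7F) {
                out.append(c);
            } else {
                // The cast is needed because %x does not accept a Character argument.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The trailing zero-width space is invisible when printed directly.
        String suspicious = "coupon\u200B";
        System.out.println(escapeNonAscii(suspicious)); // prints coupon\u200b
    }
}
```

Running a suspicious value through such a helper exposes the hidden character directly in the log or console.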
Reproducing the problem
The problem can be reproduced in Java:
import org.junit.Test;

public class InvisibleCharTest {

    @Test
    public void testBlog() {
        // The letter m is U+006D; U+200B is the zero-width space.
        String unicode = "\\u6d\\u200b\\u6d";
        System.out.println("unicode:" + unicode);
        String string = unicode2String(unicode);
        System.out.println("string:" + string);
    }

    /**
     * Converts a sequence of backslash-u hex escapes into the string it encodes.
     */
    public static String unicode2String(String unicode) {
        StringBuilder string = new StringBuilder();
        // Splitting on the escape prefix leaves an empty first element, so start at index 1.
        String[] hex = unicode.split("\\\\u");
        for (int i = 1; i < hex.length; i++) {
            // Parse each hex value into its code unit.
            int data = Integer.parseInt(hex[i], 16);
            // Append the decoded character.
            string.append((char) data);
        }
        return string.toString();
    }
}
Output:
unicode:\u6d\u200b\u6d
string:mm
In the output, an invisible \u200b character (zero-width space) sits between the two m letters. The similar \u200c (zero-width non-joiner) has the same effect.
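A short sketch shows why these characters are easy to miss in code as well: String.trim() only removes characters up to U+0020, so it leaves them in place. The class name is hypothetical, and U+200D and U+FEFF are added here as further zero-width characters that the original text does not mention.

```java
// Hypothetical check that trim() does not remove zero-width characters.
class ZeroWidthFamily {

    // U+200B zero-width space, U+200C zero-width non-joiner,
    // U+200D zero-width joiner, U+FEFF zero-width no-break space (BOM).
    static final char[] ZERO_WIDTH = {'\u200B', '\u200C', '\u200D', '\uFEFF'};

    // Appends the given character to "mm", trims, and returns the resulting length.
    static int trimmedLength(char suffix) {
        return ("mm" + suffix).trim().length();
    }

    public static void main(String[] args) {
        // An ordinary trailing space is removed by trim().
        System.out.println("space: " + trimmedLength(' ')); // 2
        // Each zero-width character survives trim().
        for (char c : ZERO_WIDTH) {
            System.out.println(String.format("U+%04X: %d", (int) c, trimmedLength(c))); // 3
        }
    }
}
```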
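One possible explanation for such characters: "actually the current version of Unicode does not fully use this 16-bit encoding, but reserves a lot of space for special use or future expansion." (https://en.wikipedia.org/wiki/Unicode). U+200B itself is an assigned character, ZERO WIDTH SPACE, so it can also slip in unnoticed when text is copied from a web page or a rich-text document.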
Preventing the problem:
Validate input with a regular expression (for example, allow only digits and uppercase and lowercase letters).
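// Requires: import java.util.regex.Pattern; reuses unicode2String from the reproduction above.
@Test
public void testBlog() {
    // The letter m is U+006D; U+200B is the zero-width space.
    String unicode = "\\u6d\\u200b\\u6d";
    System.out.println("unicode:" + unicode);
    String string = unicode2String(unicode);
    System.out.println("string:" + string);
    // Whitelist of allowed characters: letters, digits and '.'.
    Pattern pattern = Pattern.compile("[a-zA-Z0-9.]+");
    // matches() must consume the whole string, so the hidden U+200B makes it fail.
    System.out.println("pass? " + pattern.matcher(string).matches());
}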
Output:
unicode:\u6d\u200b\u6d
string:mm
pass? false
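Rejecting the input is one option. Another, which the original write-up does not describe, is to strip known zero-width characters before validating and saving the code. The following is a minimal sketch under that assumption: the class and method names are hypothetical, and the list of stripped characters (U+200B to U+200D plus U+FEFF) is an illustrative choice rather than a complete list of invisible characters.

```java
import java.util.regex.Pattern;

// Hypothetical sanitizer: strip zero-width characters, then validate against a whitelist.
class CouponCodeSanitizer {

    // Zero-width space, non-joiner, joiner, and zero-width no-break space (BOM).
    private static final Pattern ZERO_WIDTH = Pattern.compile("[\\u200B-\\u200D\\uFEFF]");

    // Only letters and digits are allowed in this sketch.
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9]+");

    // Removes every zero-width character listed above from the input.
    static String stripZeroWidth(String input) {
        return ZERO_WIDTH.matcher(input).replaceAll("");
    }

    // Returns true only if the whole code consists of allowed characters.
    static boolean isValid(String code) {
        return ALLOWED.matcher(code).matches();
    }

    public static void main(String[] args) {
        String raw = "m\u200Bm";
        System.out.println("raw valid?     " + isValid(raw));                 // false
        System.out.println("cleaned valid? " + isValid(stripZeroWidth(raw))); // true
        System.out.println("cleaned length: " + stripZeroWidth(raw).length()); // 2
    }
}
```

Whether to clean silently or to reject and show an error is a product decision; rejecting makes the bad input visible to whoever entered it.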
=== The Invisible Characters of Programmers' Daily Stories ===
Product W: "What's going on? I just added this record, so why can't I find it in the admin backend?"
Programmer T: "Let me see what's going on. It was working fine before."
Product H: "Why is there a problem with this function module again?"
Programmer Z: "Let me take a look. I'll check the database. Why can't it be found by this field in the database?"
So-and-so programmer: "Check by ID."
So-and-so programmer: "Check the length of this field"
So-and-so programmer: "Damn, why is the length different from what it looks like?"
So-and-so programmer: "Look at the unicode encoding of the string"
So-and-so programmer: "Copy it into Java and convert it to unicode and see"
So-and-so programmer: "Damn, how did an invisible character get in there?"
Programmer T: "Product W, how did you enter invisible characters..."
Product W: "I am operating normally"