Fixing MySQL equality queries on emoji that return multiple rows (including other emoji): choosing the right COLLATION

1. The problem

    While entering a keyword to search for articles in the APP client, I accidentally typed an emoji, and the client reported an error. The error log on the backend showed that a query expected to return a unique result had returned 2 records:

Caused by: org.hibernate.NonUniqueResultException: query did not return a unique result: 2

2. Business logic

    The database has a tb_search_statistic table that records users' searches. Every time the client initiates a search, the backend first checks whether a search record already exists for the keyword. If not, it inserts a new record; if one exists, it increases the search count by 1. The error was reported during the lookup step, because the query returned two records.
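    The check-then-insert logic above can be sketched in SQL roughly as follows. Only the table name tb_search_statistic and the keyword field come from this article; the id and search_count columns are hypothetical names used for illustration.

```sql
-- Step 1: look up the existing record for the keyword (at most one row is expected)
SELECT id, search_count
FROM tb_search_statistic
WHERE keyword = '😀';

-- Step 2a: no row found, so insert a new record (search_count is an assumed column)
INSERT INTO tb_search_statistic (keyword, search_count)
VALUES ('😀', 1);

-- Step 2b: a row was found, so increase its search count by 1
UPDATE tb_search_statistic
SET search_count = search_count + 1
WHERE keyword = '😀';
```

    If step 1 unexpectedly matches more than one row, a lookup that requires a unique result fails, which is the exception shown in section 1.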

3. Reproduce the problem in Navicat for MySQL

    Sure enough, the query returns two records. This looks strange, since the two emoji are obviously different. The key is the "=" in the query statement: how the equals sign compares strings depends on the MySQL character set and collation.
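    The behaviour can be reproduced with a minimal sketch like the one below. The demo table name, its columns, and the two emoji are arbitrary choices for illustration, not the data from the original query.

```sql
-- Make sure the client connection sends and receives utf8mb4
SET NAMES utf8mb4;

-- Hypothetical demo table using the same character set and collation as in this article
CREATE TABLE tb_search_statistic_demo (
  id INT AUTO_INCREMENT PRIMARY KEY,
  keyword VARCHAR(64) NOT NULL
) DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_general_ci;

INSERT INTO tb_search_statistic_demo (keyword) VALUES ('😀'), ('😂');

-- The comparison uses the column's collation, utf8mb4_general_ci
SELECT * FROM tb_search_statistic_demo WHERE keyword = '😀';
```

    On a server that behaves as described in this article, the final SELECT returns both rows even though only one emoji was searched for.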

    For every character-type field, MySQL stores data under a character set (the set of characters plus their encoding) and a collation (the rules for sorting and comparing characters). Each character set has one or more collations, one of which is its default. The character set and collation are determined when a field is created; if they are not specified, the field inherits them from the table (inheritance relationship: server <- database <- table <- field).
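    As a rough illustration of these levels, the statements below list the collations that belong to utf8mb4 and show the defaults at the server and database levels. They only read metadata and should be safe to run on any schema.

```sql
-- Collations available for utf8mb4; the Default column marks the default one
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- Defaults at the server level and for the currently selected database
SELECT @@character_set_server,
       @@collation_server,
       @@character_set_database,
       @@collation_database;
```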

    Checking the table in the Navicat client shows that its character set is utf8mb4 and its collation is utf8mb4_general_ci, which is the default collation for utf8mb4 in MySQL versions before 8.0. Because no collation was specified for the keyword field, it inherits the same character set and collation as the table.
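    The same information can also be read without a GUI client. The following is a small sketch using standard MySQL metadata statements; it assumes the database containing tb_search_statistic is the currently selected one.

```sql
-- Full table definition, including the table-level DEFAULT CHARSET and COLLATE
SHOW CREATE TABLE tb_search_statistic;

-- Character set and collation of each column in the table
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'tb_search_statistic';
```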

    The crux of the problem lies here: utf8mb4_general_ci cannot distinguish between different emoji. It treats all supplementary characters (those outside the Basic Multilingual Plane, which includes most emoji) as equal to one another, so the query matches multiple records. As mentioned above, one character set can have multiple collations. The following compares two of the collations available for utf8mb4:

  • utf8mb4_bin : compares strings by the binary value of each character, so it is case-sensitive and distinguishes every character from every other.
  • utf8mb4_general_ci : ci stands for case insensitive. It does not implement the Unicode sorting rules, so sorting and comparison results may be inaccurate for some languages or special characters. In the vast majority of cases, however, the order of these special characters does not need to be that precise.
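    The difference between the two collations can be checked directly on string literals, without any table. This is a minimal sketch; the emoji and aliases are arbitrary, and the connection must use utf8mb4 for the COLLATE clauses to be valid.

```sql
SET NAMES utf8mb4;

SELECT
  -- two different emoji: expected 1 (treated as equal) under utf8mb4_general_ci
  ('😀' = '😂' COLLATE utf8mb4_general_ci) AS emoji_general_ci,
  -- the same emoji under utf8mb4_bin: 0 (different)
  ('😀' = '😂' COLLATE utf8mb4_bin)        AS emoji_bin,
  -- case-insensitive comparison: 1
  ('a' = 'A' COLLATE utf8mb4_general_ci)   AS case_general_ci,
  -- case-sensitive binary comparison: 0
  ('a' = 'A' COLLATE utf8mb4_bin)          AS case_bin;
```

    Note that switching to utf8mb4_bin also makes comparisons case-sensitive, which may or may not be what a keyword search wants.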
     

4. Solutions

    Solution 1: Change the collation of the field to utf8mb4_bin
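    One way to make this change in SQL is sketched below. MODIFY restates the whole column definition, and the article does not show the real one, so VARCHAR(255) NOT NULL is only an assumption; copy the actual type and attributes from SHOW CREATE TABLE before running anything like this.

```sql
-- Assumed column definition: replace VARCHAR(255) NOT NULL with the real one
ALTER TABLE tb_search_statistic
  MODIFY keyword VARCHAR(255)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_bin
    NOT NULL;
```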

     After the modification, executing the query statement again returns exactly the data we expected: only the record for the emoji that was searched.

 

    Solution 2: Add the BINARY keyword to the field comparison in the WHERE clause. BINARY is not a function but a type conversion operator, which forces the string that follows it to be treated as a binary string.
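    A sketch of what such a query might look like is shown below, using an arbitrary emoji as the keyword. The last statement is an alternative that is not from this article: it keeps the comparison as a character comparison but applies utf8mb4_bin to that one query only.

```sql
SET NAMES utf8mb4;

-- BINARY applied to the search value: the comparison is done byte by byte
SELECT * FROM tb_search_statistic WHERE keyword = BINARY '😀';

-- BINARY applied to the column: same result, but wrapping an indexed column
-- in a conversion may prevent MySQL from using the index efficiently
SELECT * FROM tb_search_statistic WHERE BINARY keyword = '😀';

-- Alternative (an assumption, not the article's method): per-query collation override
SELECT * FROM tb_search_statistic WHERE keyword = '😀' COLLATE utf8mb4_bin;
```

    Unlike Solution 1, none of these changes the table definition, so every query that needs exact matching has to include the keyword itself.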

    The character set and collation terms used above are behind many puzzling MySQL errors, including "garbled characters", so it is worth understanding them clearly. If you are interested, you can refer to another article by the author:

Detailed graphic explanation of MySQL character set concept, principle and configuration

Origin blog.csdn.net/crazestone0614/article/details/132457601