emoji存储相关问题汇总

版权声明:转载请注明原创地址,否则依法追究法律责任 https://blog.csdn.net/weixin_38964895/article/details/82384696

数据库存储emoji遇到的问题,过滤emoji表情,将在下面一一论述

  • emoji过滤
  • emoji存储常见问题
  • emoji存储问题解决
  • 常见emoji编码

emoji过滤

private static boolean isNotEmojiCharacter(char codePoint)
{
    return (codePoint == 0x0) ||
        (codePoint == 0x9) ||
        (codePoint == 0xA) ||
        (codePoint == 0xD) ||
        ((codePoint >= 0x20) && (codePoint <= 0xD7FF)) ||
        ((codePoint >= 0xE000) && (codePoint <= 0xFFFD)) ||
        ((codePoint >= 0x10000) && (codePoint <= 0x10FFFF));
}

/**
 * 过滤emoji 或者 其他非文字类型的字符
 * @param source
 * @return
 */
public static String filterEmoji(String source)
{
    int len = source.length();
    StringBuilder buf = new StringBuilder(len);
    for (int i = 0; i < len; i++)
    {
        char codePoint = source.charAt(i);
        if (isNotEmojiCharacter(codePoint))
        {
            buf.append(codePoint);
        }
    }
    return buf.toString();

emoji存储常见问题

前台应用抓取微博信息,每天总有几条数据插入不成功。应用日志显示:

java.sql.SQLException: Incorrect string value: '\xF0\x9F\x92\xAA",...' for column 'raw_json' at row 1, 异常:org.springframework.jdbc.UncategorizedSQLException:  

其中raw_json字段显示如下:

 image

由上图可知,插入的字段里面包含emoji表情符。基本可以判定是字符编码的问题。我们的数据库里面使用的是utf8编码,普通的字符串或者表情都是占位3个字节,所以utf8足够用了,但是移动端的表情符号占位是4个字节,普通的utf8就不够用了,为了应对无线互联网的机遇和挑战、避免 emoji 表情符号带来的问题、涉及无线相关的 MySQL 数据库建议都提前采用utf8mb4 字符集,这必须要作为移动互联网行业的一个技术选型的要点。

utf8可能是2或3或4个字节,而MySQL的utf8是3个字节,存放一个emoji是需要4个字节的,自然不够

解决方案

尝试修改此列的字符集:

ALTER TABLE xx_pnl_weibo_usershow  <br>MODIFY ` raw_json` varchar(3000) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
alter table xx_pnl_weibo_usershow CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;(修改表和字段的字符集)

改完后测试,插入失败。

修改库级别的字符集

1. 修改my.ini [mysqld] character-set-server=utf8mb4
2. 在Connector/J的连接参数中,不要加characterEncoding参数。 不加这个参数时,默认值就时autodetect。
3. 将已经建好的表也转换成utf8mb4 
   命令:ALTER TABLE `TABLE_NAME` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci; (将TABLE_NAME替换成你的表名)
4. 将需要使用emoji的字段设置类型为: 
   命令:ALTER TABLE `TABLE_NAME`MODIFY COLUMN `COLUMN_NAME`  text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

或者用下述方式:

vi /etc/my.cnf 
 
[client]  
default-character-set=utf8mb4  
[mysqld]  
character-set-server = utf8mb4 
  
service mysqld restart

改完后测试,插入成功。查看插入的数据:

root@localhost zx_prod >select id,raw_json,hex(raw_json),length(raw_json) from  xx_pnl_weibo_usershow where id=2;

+----+------------------+----------------------------------+---------------------+
| id | raw_json | hex(raw_json) | length(raw_json) |
+----+------------------+----------------------------------+---------------------+
| 2 |  | F09F9894F09F9882F09F9898F09F9894 | 16 |
+----+------------------+----------------------------------+---------------------+
1 row in set (0.00 sec)

常见emoji编码

NSArray *array = @[@"\uE415",@"\uE056",@"\uE057",@"\uE414",@"\uE405",@"\uE106",@"\uE418",@"\uE417",

                       @"\uE40d",@"\uE40a",@"\uE404",@"\uE105",@"\uE409",@"\uE40e",@"\uE402",@"\uE108",

                       @"\uE403",@"\uE058",@"\uE407",@"\uE401",@"\uE40f",@"\uE40b",@"\uE406",@"\uE413",

                       @"\uE411",@"\uE412",@"\uE410",@"\uE107",@"\uE059",@"\uE416",@"\uE408",@"\uE40c",

                       @"\uE11a",@"\uE10c",@"\uE32c",@"\uE32a",@"\uE32d",@"\uE328",@"\uE32b",@"\uE022",

                       @"\uE023",@"\uE327",@"\uE329",@"\uE32e",@"\uE32f",@"\uE335",@"\uE334",@"\uE021",

                       @"\uE336",@"\uE13c",@"\uE337",@"\uE020",@"\uE330",@"\uE331",@"\uE326",@"\uE03e",

                       @"\uE11d",@"\uE05a",@"\uE00e",@"\uE421",@"\uE420",@"\uE00d",@"\uE010",@"\uE011",

                       @"\uE41e",@"\uE012",@"\uE422",@"\uE22e",@"\uE22f",@"\uE231",@"\uE230",@"\uE427",

                       @"\uE41d",@"\uE00f",@"\uE41f",@"\uE14c",@"\uE201",@"\uE115",@"\uE428",@"\uE51f",

                       @"\uE429",@"\uE424",@"\uE423",@"\uE253",@"\uE426",@"\uE111",@"\uE425",@"\uE31e",

                       @"\uE31f",@"\uE31d",@"\uE001",@"\uE002",@"\uE005",@"\uE004",@"\uE51a",@"\uE519",

                       @"\uE518",@"\uE515",@"\uE516",@"\uE517",@"\uE51b",@"\uE152",@"\uE04e",@"\uE51c",

                       @"\uE51e",@"\uE11c",@"\uE536",@"\uE003",@"\uE41c",@"\uE41b",@"\uE419",@"\uE41a"];

    NSString *labelStr = @"";

    for (NSString *str in array) {

        labelStr = [NSString stringWithFormat:@"%@ %@",labelStr,str];

    }

    self.myLabel.text = labelStr;

image

猜你喜欢

转载自blog.csdn.net/weixin_38964895/article/details/82384696
今日推荐