Unusual icons cause transcoding failures

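Someone in our project used the iconv function to convert UTF-8 to UCS-2 but never handled the case where the conversion fails, and that produced a bug in production.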
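To make the missing failure handling concrete, here is a minimal sketch of a conversion wrapper that checks every result. The helper name utf8_to_ucs2 and the choice of "UCS-2LE" as the target name are assumptions for illustration (the project code is not shown here, and it only says UCS-2); the iconv_open/iconv/iconv_close calls are the standard ones.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not from the project code): convert UTF-8 to UCS-2
 * and report every failure explicitly.
 * Returns the number of bytes written to out, or -1 on any failure. */
long utf8_to_ucs2(const char *in, size_t in_len, char *out, size_t out_cap)
{
    /* Assumption: "UCS-2LE" pins the byte order; the post only says UCS-2. */
    iconv_t cd = iconv_open("UCS-2LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;                  /* this conversion pair is not supported */

    char *src = (char *)in;         /* iconv() takes char **, input is not modified */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    int saved_errno = errno;        /* iconv_close() may overwrite errno */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        /* EILSEQ: invalid or unconvertible input, EINVAL: truncated input,
         * E2BIG: output buffer too small. */
        errno = saved_errno;
        return -1;
    }
    if (rc != 0)
        return -1;                  /* some characters were converted non-reversibly */

    return (long)(out_cap - dst_left);
}
```

The caller can then decide what to do on -1 (reject the input, filter it, or log it) instead of continuing with a partially filled buffer.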


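After looking into it, I found that iconv_open has a built-in feature that might solve this: appending //IGNORE to the target encoding makes it skip the parts that fail to convert. The man page explains it like this: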

iconv_t iconv_open(const char *tocode, const char *fromcode);
DESCRIPTION
       The  iconv_open() function allocates a conversion descriptor suitable for converting byte sequences from character encoding fromcode to character
       encoding tocode.

       The values permitted for fromcode and tocode and the supported combinations are system-dependent.  For the GNU C library, the permitted values
       are listed by the iconv --list command, and all combinations of the listed values are supported.  Furthermore the GNU C library and the GNU
       libiconv library support the following two suffixes:

       //TRANSLIT
              When the string "//TRANSLIT" is appended to tocode, transliteration is activated.  This means that when a character cannot be  represented
              in the target character set, it can be approximated through one or several similarly looking characters.

       //IGNORE
              When the string "//IGNORE" is appended to tocode, characters that cannot be represented in the target character set will be silently
              discarded.

       The resulting conversion descriptor can be used with iconv(3) any number of times.  It remains valid until deallocated using iconv_close(3).

       A conversion descriptor contains a conversion state.  After creation using iconv_open(), the state is in the initial state.  Using iconv(3)
       modifies the descriptor’s conversion state.  (This implies that a conversion descriptor can not be used in multiple threads simultaneously.)  To
       bring the state back to the initial state, use iconv(3) with NULL as inbuf argument.

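The result was frustrating: the unusual icons still could not be filtered out, for example the flame-style icon. This site doesn't even support that icon, unbelievable!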

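Encoded as UTF-8, the unusual icon takes 4 bytes, and every one of those bytes falls inside the byte range that is legal for Chinese characters, so a regex on byte ranges cannot tell them apart. That approach was ruled out.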

In the end I solved it by relying on how UTF-8 encodes Chinese characters: a Chinese character takes 3 bytes, of the form 1110xxxx, 10xxxxxx, 10xxxxxx.
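The post does not show the final code, so the following is only a sketch of one way to apply that bit pattern: classify each sequence by its lead byte and drop the 4-byte ones (lead byte 11110xxx) before handing the text to iconv. The function name utf8_drop_4byte is hypothetical, and keeping ASCII and 2-byte sequences alongside the 3-byte ones is my assumption. It is not a full UTF-8 validator (overlong forms and surrogates are not checked).

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst, keeping ASCII and 2- or 3-byte UTF-8
 * sequences and dropping 4-byte sequences and malformed bytes.
 * dst must have room for at least len bytes. Returns the bytes written. */
size_t utf8_drop_4byte(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i = 0, o = 0;

    while (i < len) {
        unsigned char b = src[i];
        size_t n;        /* sequence length implied by the lead byte */
        int keep = 1;

        if (b < 0x80)                n = 1;               /* 0xxxxxxx: ASCII */
        else if ((b & 0xE0) == 0xC0) n = 2;               /* 110xxxxx */
        else if ((b & 0xF0) == 0xE0) n = 3;               /* 1110xxxx: e.g. Chinese */
        else if ((b & 0xF8) == 0xF0) { n = 4; keep = 0; } /* 11110xxx: drop */
        else                         { n = 1; keep = 0; } /* stray or invalid byte */

        if (n > len - i)
            break;       /* truncated sequence at the end of input: drop it */

        /* Every byte after the lead byte must look like 10xxxxxx. */
        for (size_t k = 1; k < n; k++) {
            if ((src[i + k] & 0xC0) != 0x80) {
                keep = 0;
                n = k;   /* skip the bad prefix, re-examine the byte at i + k */
                break;
            }
        }

        if (keep)
            for (size_t k = 0; k < n; k++)
                dst[o++] = src[i + k];
        i += n;
    }
    return o;
}
```

For example, given the bytes of "A", a 4-byte icon (F0 9F 94 A5) and the Chinese character E4 B8 AD, the function keeps only "A" and the 3-byte character, which UCS-2 can represent.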


Reposted from blog.csdn.net/witto_sdy/article/details/81980809