A Simple, Elegant Way to Get the Length of a Chinese String
Tonight, while writing the form validation class for my framework, I needed to check whether the length of a string fell within a given range. Naturally, my first thought was PHP's strlen function.
$str = 'Hello world!';
echo strlen($str); // outputs 12
However, strlen counts bytes, not characters. mb_strlen does count characters, but only when it uses the correct encoding: called without an encoding argument, it falls back to the internal encoding, and if that is a single-byte encoding, mb_strlen also ends up returning the byte count. The number of bytes a Chinese character occupies depends on the encoding: 2 bytes under GBK/GB2312, and (for most characters) 3 bytes under UTF-8.
$str = '你好，世界！';
echo strlen($str); // outputs 12 under GBK/GB2312, 18 under UTF-8
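To see the difference in practice, here is a minimal sketch, assuming the file is saved as UTF-8 and the mbstring extension is enabled, that compares strlen with mb_strlen when the encoding is passed explicitly:

```php
<?php
// Assumes this source file is saved as UTF-8 and mbstring is loaded.
$str = '你好，世界！';

// strlen() always counts bytes: 6 characters x 3 bytes each in UTF-8.
echo strlen($str), PHP_EOL;             // 18

// mb_strlen() counts characters when told the right encoding.
echo mb_strlen($str, 'UTF-8'), PHP_EOL; // 6

// With a single-byte encoding such as '8bit', it counts bytes again.
echo mb_strlen($str, '8bit'), PHP_EOL;  // 18
```

'8bit' is an mbstring encoding name that treats every byte as one character, which is why the last call matches strlen.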
When validating a string's length, what we usually care about is the number of characters, not the number of bytes it occupies. Consider this PHP code under UTF-8:
$name = '张耕畅';
$len = strlen($name);
// Outputs FALSE, because under UTF-8 these three Chinese characters occupy 9 bytes
if ($len >= 3 && $len <= 8) {
    echo 'TRUE';
} else {
    echo 'FALSE';
}
So is there a convenient, practical way to get the length of a Chinese string? One option is to use a regular expression to find the bytes taken up by Chinese characters, divide that count by 2 under GBK/GB2312 or by 3 under UTF-8, and then add the length of the non-Chinese part. That is too cumbersome, though. WordPress contains a more elegant snippet:
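For comparison, the cumbersome approach might look like the sketch below. It is a hypothetical helper (cn_strlen_utf8 is not a PHP or WordPress function), and it assumes UTF-8 input and that "Chinese" means the CJK Unified Ideographs block U+4E00 to U+9FFF, where each character takes 3 bytes.

```php
<?php
// Hypothetical helper sketching the "count Chinese characters separately" idea.
// Assumptions: UTF-8 input; "Chinese" = CJK Unified Ideographs U+4E00..U+9FFF,
// each of which occupies 3 bytes in UTF-8.
function cn_strlen_utf8(string $str): int
{
    // Strip the Chinese characters; what remains is the non-Chinese part.
    $nonChinese = preg_replace('/[\x{4e00}-\x{9fff}]/u', '', $str);

    // Bytes removed, divided by 3 bytes per Chinese character.
    $chineseChars = intdiv(strlen($str) - strlen($nonChinese), 3);

    // strlen() on the remainder is only right if the remainder is pure ASCII.
    return $chineseChars + strlen($nonChinese);
}

echo cn_strlen_utf8('Hello,世界!'), PHP_EOL; // 9
echo cn_strlen_utf8('张耕畅'), PHP_EOL;      // 3
```

The weak spot is the non-Chinese remainder: full-width punctuation such as '，' falls outside the assumed range and is counted as 3 bytes, so cn_strlen_utf8('你好，世界！') returns 10 rather than 6. Patching every such case is exactly the kind of trouble the regex trick avoids.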
$str = 'Hello,世界!';
preg_match_all('/./us', $str, $match);
echo count($match[0]); // outputs 9
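The idea is to split the string into individual characters with a regular expression and count the matches, which is exactly the result we want. The u modifier makes PCRE treat both the pattern and the subject as UTF-8, so . matches one whole character rather than one byte, and the s modifier lets . match newlines as well. Because of the u modifier, this trick only works on UTF-8 strings: given invalid UTF-8, preg_match_all returns false instead of a count.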
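Applied to the original form validation problem, the trick could be wrapped in a small helper. This is a minimal sketch: utf8_length_between is a hypothetical name, not part of PHP or WordPress, and it assumes the input should be UTF-8.

```php
<?php
// Hypothetical validation helper built on the preg_match_all() trick.
// Assumes UTF-8 input: with /u, invalid UTF-8 makes preg_match_all()
// return false instead of a count.
function utf8_length_between(string $str, int $min, int $max): bool
{
    // preg_match_all() returns the number of full matches, i.e. characters.
    $len = preg_match_all('/./us', $str);
    if ($len === false) {
        return false; // not valid UTF-8, treat as failing validation
    }
    return $len >= $min && $len <= $max;
}

var_dump(utf8_length_between('张耕畅', 3, 8)); // bool(true)
```

Since preg_match_all already returns the match count, the helper omits the $matches argument and skips count() entirely; only the number of characters is needed here.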