A simple and elegant way to get the length of a Chinese string

While writing my framework's form validation class tonight, I needed to check whether the length of a string falls within a specified range. Naturally, I thought of PHP's strlen function.

$str = 'Hello world!';
echo strlen($str);      // outputs 12

However, PHP's built-in strlen returns the number of bytes in the string, not the number of characters. mb_strlen does count characters, but it requires the mbstring extension and must be given (or default to) the string's actual encoding; with the wrong encoding its result is wrong as well. The number of bytes a Chinese character occupies depends on the encoding: under GBK/GB2312 a Chinese character occupies 2 bytes, while under UTF-8 a commonly used Chinese character occupies 3 bytes.

$str = '你好,世界!';   // full-width comma and exclamation mark
echo strlen($str);      // outputs 12 under GBK or GB2312, 18 under UTF-8
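
To make the byte counts above concrete, here is a minimal sketch that measures the same string in both encodings. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are illustrative only, and CP936 is the name mbstring uses for the Windows code page that implements GBK.

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is loaded.
$utf8 = '你好,世界!';   // 6 characters, including full-width punctuation

// Re-encode the same text as GBK (mbstring calls it CP936).
$gbk = mb_convert_encoding($utf8, 'CP936', 'UTF-8');

$utf8Bytes = strlen($utf8);              // bytes in the UTF-8 form
$gbkBytes  = strlen($gbk);               // bytes in the GBK form
$utf8Chars = mb_strlen($utf8, 'UTF-8');  // characters, encoding stated explicitly
$gbkChars  = mb_strlen($gbk, 'CP936');   // characters, encoding stated explicitly

echo $utf8Bytes, PHP_EOL;  // 18
echo $gbkBytes, PHP_EOL;   // 12
echo $utf8Chars, PHP_EOL;  // 6
echo $gbkChars, PHP_EOL;   // 6
```

The byte count changes with the encoding, while the character count stays at 6 as long as mb_strlen is told which encoding the string actually uses.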

When validating the length of a string, we usually care about the number of characters, not the number of bytes it occupies. Consider this PHP code running under UTF-8:

$name = '张耕畅';
$len = strlen($name);
// Outputs FALSE, because under UTF-8 three Chinese characters occupy 9 bytes
if ($len >= 3 && $len <= 8) {
    echo 'TRUE';
} else {
    echo 'FALSE';
}
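
As a point of comparison, the same range check can be sketched with a character count instead of a byte count. This is only one possible approach, assuming the input is UTF-8 and the mbstring extension is available; lengthBetween is a hypothetical helper name, not part of PHP or of any framework.

```php
<?php
// Hypothetical helper: true when $value has between $min and $max characters (inclusive).
// Assumption: $value is UTF-8 encoded and the mbstring extension is loaded.
function lengthBetween(string $value, int $min, int $max): bool
{
    $len = mb_strlen($value, 'UTF-8');  // counts characters, not bytes
    return $len >= $min && $len <= $max;
}

$name = '张耕畅';
echo lengthBetween($name, 3, 8) ? 'TRUE' : 'FALSE';  // TRUE: three characters
```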

So is there a convenient, practical way to get the length of a string that contains Chinese characters? One option is to use a regular expression to pick out the Chinese characters, divide their byte count by 2 under GBK/GB2312 or by 3 under UTF-8, and then add the length of the non-Chinese part, but that is too much trouble. WordPress has a more elegant piece of code, as follows:

$str = 'Hello,世界!';
preg_match_all('/./us', $str, $match);
echo count($match[0]);  // outputs 9

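The idea is to use a regular expression to split the string into individual characters and then use count to count the matches, which is exactly the result we want. The u modifier makes PCRE treat the pattern and the subject as UTF-8, so . matches a whole character rather than a single byte, and the s modifier lets . match newlines too. Note that this only works for UTF-8 strings.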
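
The regular-expression approach can be wrapped in a small function. The following is a minimal sketch rather than the WordPress code itself: utf8Length is a hypothetical name, and returning null for invalid input is an assumption made for illustration. It relies on the fact that preg_match_all returns the number of matches, or false when the subject is not valid UTF-8 under the u modifier.

```php
<?php
// Hypothetical helper: number of UTF-8 characters in $str,
// or null when $str is not valid UTF-8.
function utf8Length(string $str): ?int
{
    // 'u' treats pattern and subject as UTF-8; 's' lets '.' match newlines.
    $count = preg_match_all('/./us', $str);

    // preg_match_all returns false on failure, e.g. malformed UTF-8.
    return $count === false ? null : $count;
}

echo utf8Length('Hello,世界!'), PHP_EOL;  // 9
echo utf8Length('张耕畅'), PHP_EOL;        // 3
var_dump(utf8Length("\xFF"));              // NULL: a lone 0xFF byte is not valid UTF-8
```

Checking for false matters in a validation class, because user input is not guaranteed to be well-formed UTF-8, and a failed match would otherwise be read as a length of zero.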

Origin blog.csdn.net/heshihu2019/article/details/132142277