Processing Chinese strings in PHP to prevent garbled characters

The two PHP functions commonly used to measure string length are strlen and mb_strlen. Below they are compared on UTF-8 encoded strings.

Comparing strlen and mb_strlen
When a string contains only English (ASCII) characters, both functions return the same result. The comparison below therefore focuses on strings that mix Chinese and English characters (the test file is UTF-8 encoded).
The code is as follows:

<?php
$str = '中文a字1符';
echo strlen($str);              // byte count
echo '<br />';
echo mb_strlen($str, 'UTF-8');  // character count
// Output:
// 14
// 6
?>

Result analysis: strlen counts bytes, and in UTF-8 a Chinese character takes 3 bytes, so the length of '中文a字1符' (4 Chinese characters plus 2 ASCII characters) is 3*4 + 2 = 14. When mb_strlen is told the encoding is UTF-8, it counts each Chinese character as 1, so the length of the same string is 6.
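To see where the 14 and the 6 come from, here is a minimal sketch that splits the string into characters with mb_substr and reports the byte length of each one via strlen. The helper name byteBreakdown is hypothetical, and the code assumes the mbstring extension is loaded and the file is saved as UTF-8.

```php
<?php
// Hypothetical helper: list each character of a UTF-8 string together with
// the number of bytes it occupies (strlen counts bytes, not characters).
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
function byteBreakdown(string $str): array
{
    $result = [];
    $len = mb_strlen($str, 'UTF-8');            // number of characters
    for ($i = 0; $i < $len; $i++) {
        $ch = mb_substr($str, $i, 1, 'UTF-8');  // one character
        $result[] = [$ch, strlen($ch)];         // [character, byte count]
    }
    return $result;
}

foreach (byteBreakdown('中文a字1符') as [$ch, $bytes]) {
    echo $ch, ' => ', $bytes, " byte(s)\n";
}
```

Each Chinese character reports 3 bytes and each ASCII character 1, so the byte counts add up to the strlen result, while the number of rows matches mb_strlen.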
Display width of mixed Chinese and English strings:
Used together, the two functions give the display width of a mixed string, counting 2 for each Chinese character and 1 for each English character. If a string contains a Chinese characters and b English characters, then strlen returns 3a + b and mb_strlen returns a + b, so the width 2a + b equals (strlen + mb_strlen) / 2.
The code is as follows:

<?php
$str = '中文a字1符';
// Calculation:
echo (strlen($str) + mb_strlen($str, 'UTF-8')) / 2;
// Output:
// 10
?>

For example, strlen($str) is 14 and mb_strlen($str) is 6, so the display width of '中文a字1符' is (14 + 6) / 2 = 10.
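The (strlen + mb_strlen) / 2 calculation can be wrapped in a small function. The sketch below uses a hypothetical name, displayWidth, and for comparison also calls mbstring's own mb_strwidth, which counts full-width characters such as CJK ideographs as 2 and ASCII characters as 1. The formula only holds under the assumption that every non-ASCII character is a 3-byte UTF-8 Chinese character; a 2-byte accented letter such as 'é' gives 1.5, whereas mb_strwidth does not depend on byte counts.

```php
<?php
// Hypothetical helper implementing the formula above:
// strlen = 3a + b and mb_strlen = a + b, so (strlen + mb_strlen) / 2 = 2a + b.
// Only valid when every non-ASCII character is a 3-byte UTF-8 Chinese character.
function displayWidth(string $str)
{
    return (strlen($str) + mb_strlen($str, 'UTF-8')) / 2;
}

$str = '中文a字1符';
echo displayWidth($str), "\n";          // width from the byte/character formula
echo mb_strwidth($str, 'UTF-8'), "\n";  // width from mbstring's built-in function
```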
The following related article was found online:
This is another question about Chinese text. PHP's built-in string length function, strlen, does not count Chinese characters correctly: it returns the number of bytes the string occupies. For GB2312-encoded Chinese, strlen returns twice the number of Chinese characters; for UTF-8 encoded Chinese, it returns three times that number (in UTF-8, a common Chinese character occupies 3 bytes).
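The 2x versus 3x difference can be checked by converting the same text between encodings. This is a sketch under two assumptions: the file is saved as UTF-8, and GB2312 text is represented by mbstring's EUC-CN encoding, the usual byte format for the GB2312 character set. mb_convert_encoding only changes the bytes, so strlen reports each encoding's byte count.

```php
<?php
// Compare byte counts of the same two Chinese characters in UTF-8 and in
// EUC-CN (the common byte encoding of the GB2312 character set).
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$utf8 = '中文';
$gb   = mb_convert_encoding($utf8, 'EUC-CN', 'UTF-8');

echo strlen($utf8), "\n";             // UTF-8: 3 bytes per Chinese character
echo strlen($gb), "\n";               // EUC-CN/GB2312: 2 bytes per Chinese character
echo mb_strlen($gb, 'EUC-CN'), "\n";  // the character count is the same in both
```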

The mb_strlen function solves this problem. It is used like strlen, except that it takes an optional second parameter specifying the character encoding. For example, the length of a UTF-8 string $str is obtained with mb_strlen($str, 'UTF-8'). If the second parameter is omitted, PHP's internal encoding is used, which can be read with the mb_internal_encoding() function. Note that mb_strlen belongs to the mbstring extension, not the PHP core. Before using it, make sure the extension is enabled in php.ini: on Windows, the line "extension=php_mbstring.dll" must exist and must not be commented out (from PHP 7.2 on, "extension=mbstring" also works). Otherwise PHP reports a call to an undefined function.
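Because mb_strlen is only defined when mbstring is loaded, code that may run on servers without it can check first. A minimal sketch, with a hypothetical helper name utf8Length: it calls mb_strlen with an explicit 'UTF-8' argument when available and otherwise falls back to counting characters with a UTF-8 aware regular expression via preg_match_all. The regex fallback is a swapped-in technique, not something the article describes, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: character count of a UTF-8 string, with or without mbstring.
function utf8Length(string $str): int
{
    if (function_exists('mb_strlen')) {
        // Pass the encoding explicitly instead of relying on mb_internal_encoding().
        return mb_strlen($str, 'UTF-8');
    }
    // Fallback: the /u modifier makes "." match one UTF-8 character and /s lets
    // it match newlines too. preg_match_all returns the number of matches
    // (false on invalid UTF-8, which the cast turns into 0).
    return (int) preg_match_all('/./us', $str);
}

echo utf8Length('中文a字1符'), "\n";
if (function_exists('mb_internal_encoding')) {
    echo mb_internal_encoding(), "\n"; // encoding used when the 2nd argument is omitted
}
```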

String reversal
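English (single-byte) text:
strrev($a)
Chinese or other multibyte text:
(The original example was written for a GB2312-encoded file. Whatever the file's encoding, the mb_* functions must be told the encoding the string actually uses.)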

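<?php
// Reverse a multibyte string character by character.
// $encoding must match the string's actual encoding;
// when it is omitted, mb_internal_encoding() is used.
function reverse($str, $encoding = null)
{
    $encoding = $encoding ?? mb_internal_encoding();
    $arr = [];
    $len = mb_strlen($str, $encoding);
    for ($i = 0; $i < $len; $i++)
    {
        $arr[] = mb_substr($str, $i, 1, $encoding); // one character at a time
    }

    return implode("", array_reverse($arr));
}
echo reverse("你好");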
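To illustrate why strrev is limited to single-byte text, and to offer a variant that does not need mbstring, the sketch below first reverses the bytes with strrev and checks the result with mb_check_encoding, then reverses by characters using preg_split with the /u modifier. The name utf8Reverse is hypothetical, the code assumes valid UTF-8 input, and preg_split is swapped in for the mb_substr loop used above.

```php
<?php
// strrev reverses bytes, so the 3-byte UTF-8 sequences of Chinese characters
// end up in the wrong order and the result is no longer valid UTF-8.
$bytesReversed = strrev('你好');
var_dump(mb_check_encoding($bytesReversed, 'UTF-8'));

// Hypothetical helper: reverse a UTF-8 string character by character.
// preg_split('//u', ...) splits between characters; PREG_SPLIT_NO_EMPTY
// drops the empty pieces at the start and end.
function utf8Reverse(string $str): string
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_reverse($chars));
}

echo utf8Reverse('你好'), "\n";
echo utf8Reverse('中文a字1符'), "\n";
```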

