LeetCode Solution Summary 393 - UTF-8 Validation

Original problem link: LeetCode


Description:

Given an integer array data representing the data, return whether it is a valid UTF-8 encoding.

A character in UTF-8 may be 1 to 4 bytes long, subject to the following rules:

For a 1-byte character, the first bit of the byte is 0, and the remaining 7 bits hold the Unicode code point of the symbol.
For an n-byte character (n > 1), the first n bits of the first byte are all 1 and bit n+1 is 0; each of the following n-1 bytes starts with the two bits 10. All remaining bits together hold the Unicode code point of the symbol.
This is how UTF-8 encoding works:

   Char. number range  | UTF-8 octet sequence
      (hexadecimal)    | (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 0xxxxxxx
   0000 0080-0000 07FF | 110xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Note: The input is an array of integers. Only the least significant 8 bits of each integer are used to store data. This means that each integer represents only 1 byte of data.
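
The table above can be turned into a handful of bit masks. The following is a minimal sketch, not part of the original solution: the class name Utf8LeadByte and both method names are hypothetical, chosen only for illustration. It classifies a byte purely by the prefix patterns listed in the table.

```java
// Hypothetical helper for illustration; the names are not from the original post.
final class Utf8LeadByte {

    /**
     * Returns the total length (1 to 4) of the character announced by a
     * leading byte, or -1 if the byte cannot start a character under the
     * rules in the table. Only the low 8 bits of the argument are inspected.
     */
    static int expectedLength(int value) {
        int b = value & 0xFF;                            // keep the least significant 8 bits
        if ((b & 0b1000_0000) == 0) return 1;            // 0xxxxxxx
        if ((b & 0b1110_0000) == 0b1100_0000) return 2;  // 110xxxxx
        if ((b & 0b1111_0000) == 0b1110_0000) return 3;  // 1110xxxx
        if ((b & 0b1111_1000) == 0b1111_0000) return 4;  // 11110xxx
        return -1;                                       // 10xxxxxx or 11111xxx
    }

    /** True if the low 8 bits have the continuation form 10xxxxxx. */
    static boolean isContinuation(int value) {
        return (value & 0b1100_0000) == 0b1000_0000;
    }
}
```

Each mask keeps exactly the prefix bits that the corresponding table row fixes, so the comparison ignores the x positions.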

Example 1:

Input: data = [197,130,1]
Output: true
Explanation: data represents the byte sequence 11000101 10000010 00000001.
This is a valid UTF-8 encoding: a 2-byte character followed by a 1-byte character.
Example 2:

Input: data = [235,140,4]
Output: false
Explanation: data represents the byte sequence 11101011 10001100 00000100.
The first 3 bits are all 1 and the 4th bit is 0, which indicates a 3-byte character.
The next byte is a continuation byte starting with 10, which is correct.
But the second continuation byte does not start with 10, so the sequence is invalid.
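
To reproduce the binary strings used in the two explanations, a small sketch like the following can help. The class name BinaryDump and the method toOctets are hypothetical; the only library call is the standard Integer.toBinaryString.

```java
// Hypothetical helper for illustration; not part of the original solution.
final class BinaryDump {

    /** Formats the low 8 bits of each integer as a zero-padded binary octet. */
    static String toOctets(int[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            String bits = Integer.toBinaryString(data[i] & 0xFF);
            // Left-pad to 8 digits so that leading zeros stay visible.
            for (int pad = bits.length(); pad < 8; pad++) {
                sb.append('0');
            }
            sb.append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOctets(new int[] {197, 130, 1}));  // 11000101 10000010 00000001
        System.out.println(toOctets(new int[] {235, 140, 4}));  // 11101011 10001100 00000100
    }
}
```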
 

Constraints:

1 <= data.length <= 2 * 10^4
0 <= data[i] <= 255


Source: LeetCode
Link: https://leetcode-cn.com/problems/utf-8-validation
The copyright belongs to LeetCode. For commercial reprints, please contact LeetCode for official authorization; for non-commercial reprints, please cite the source.

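Problem-solving ideas:

* Keep a counter num of the continuation bytes still expected, and handle each byte in one of two cases.
* First case, num > 0: we are inside a multi-byte character, so the current byte must be a continuation byte whose first two bits are 10. If so, decrement num (num--); otherwise return false.
* Second case, num == 0: the current byte starts a new character, so count its leading 1 bits to get the character's length in bytes. A count of 0 means a 1-byte character; a count of 1 (a stray continuation byte) or more than 4 is invalid. Since the leading byte itself has already been consumed, decrement the count (num--) to get the number of continuation bytes still expected.
* At the end of the input, num must be 0; otherwise the last character is truncated.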

Code:

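public class Solution393 {

    public boolean validUtf8(int[] data) {
        int i = 0;
        // Number of continuation bytes still expected for the current character.
        int num = 0;
        while (i < data.length) {
            // Only the least significant 8 bits carry data.
            int value = data[i++] & 0xFF;
            if (num > 0) {
                // Inside a multi-byte character: the byte must look like 10xxxxxx.
                // Testing only the top bit would wrongly accept 11xxxxxx as well.
                if ((value & 0b1100_0000) == 0b1000_0000) {
                    num--;
                    continue;
                }
                return false;
            }
            // num == 0: this byte starts a new character; count its leading 1 bits.
            int flag = 0b1000_0000;
            while ((value & flag) == flag) {
                num++;
                flag = flag >> 1;
                if (num > 4) {
                    return false;
                }
            }
            // A single leading 1 (10xxxxxx) is a continuation byte and cannot start a character.
            if (num == 1) {
                return false;
            }
            // The leading byte itself is consumed; 0 leading ones means a 1-byte character.
            num = num > 0 ? num - 1 : 0;
        }
        // Continuation bytes still pending mean the last character is truncated.
        return num == 0;
    }
}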
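
As a cross-check, the same two-case idea can also be written with explicit prefix masks instead of a counting loop. This is only a sketch of an alternative, not the code above: the class name Utf8Check and the method isValid are hypothetical. Like the problem statement, it checks byte prefixes only and assumes nothing about stricter rules such as rejecting overlong encodings.

```java
// Hypothetical mask-based variant for illustration; follows only the
// prefix rules stated in the problem, nothing stricter.
final class Utf8Check {

    static boolean isValid(int[] data) {
        int remaining = 0; // continuation bytes still expected
        for (int value : data) {
            int b = value & 0xFF; // only the low 8 bits carry data
            if (remaining > 0) {
                // Must be a continuation byte: 10xxxxxx.
                if ((b & 0b1100_0000) != 0b1000_0000) {
                    return false;
                }
                remaining--;
            } else if ((b & 0b1000_0000) == 0) {
                remaining = 0;            // 0xxxxxxx: 1-byte character
            } else if ((b & 0b1110_0000) == 0b1100_0000) {
                remaining = 1;            // 110xxxxx: 2-byte character
            } else if ((b & 0b1111_0000) == 0b1110_0000) {
                remaining = 2;            // 1110xxxx: 3-byte character
            } else if ((b & 0b1111_1000) == 0b1111_0000) {
                remaining = 3;            // 11110xxx: 4-byte character
            } else {
                return false;             // 10xxxxxx or 11111xxx cannot start a character
            }
        }
        // A character cut off at the end of the input is invalid.
        return remaining == 0;
    }
}
```

Inputs such as [197, 197] are useful test cases: the second byte starts with 11 rather than 10, so a check that looks only at the top bit of a continuation byte would accept the sequence, while the two-bit check rejects it.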
