Fixing garbled Chinese text in Android ZXing scan results

Some time ago, on ZXing 3.2.1, I ran into garbled Chinese text when scanning codes with ZXing on Android, and fixed it with a solution I found online. Today the problem came back: the text was still garbled.

Here is a summary of what I found:

ZXing lets you set a default character set in the decode hints (DecodeHintType.CHARACTER_SET). This character set is used to interpret the byte data, with two rules:

1. If the scanned code does not declare a character set, the one from the hints is used by default.

2. If the scanned code declares a character set, the declared one is used.

Declaring a character set in the code is not mandatory, and Chinese text is mainly encoded as either GBK or UTF-8.

Note that many posts online mention the character set "ISO-8859-1". It is a simple character set with one character per byte, and it cannot represent Chinese.

In some cases, this one-byte-per-character property can be used to convert between bytes and characters without loss.

But because of rule 2 above, using this character set to get the bytes back is unreliable. (If the code declares UTF-8, the returned result has already been decoded as UTF-8. Getting bytes from it with ISO-8859-1 produces garbage, because ISO-8859-1 cannot represent those characters and each one is replaced by ?.)
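To make the ISO-8859-1 point concrete, here is a small sketch using only the JDK. The class and method names are made up for illustration and are not part of ZXing. It shows that raw bytes survive a trip through an ISO-8859-1 String, while text that has already been decoded to Chinese characters does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Raw bytes -> ISO-8859-1 String -> bytes.
    // Every byte maps to exactly one char, so nothing is lost.
    static byte[] viaLatin1(byte[] raw) {
        String carrier = new String(raw, StandardCharsets.ISO_8859_1);
        return carrier.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Already-decoded text -> ISO-8859-1 bytes.
    // Characters that ISO-8859-1 cannot represent are replaced with '?'.
    static byte[] latin1BytesOf(String decoded) {
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // the two characters of "Chinese"
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // The raw bytes come back unchanged.
        System.out.println(Arrays.equals(utf8, viaLatin1(utf8))); // true

        // The decoded text is reduced to question marks.
        System.out.println(new String(latin1BytesOf(text), StandardCharsets.ISO_8859_1)); // ??
    }
}
```

So the ISO-8859-1 trick only helps when the String really was built from the raw bytes with ISO-8859-1 in the first place, which rule 2 does not guarantee.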

Codes are generated in different ways: they may or may not declare a character set, the payload itself may be UTF-8 or GBK, and binary information is lost once the bytes are converted to a String.

So the idea is: if the original byte array is available, we can work out which character set displays it correctly. Getting the bytes back through a String conversion should be avoided.
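The loss during String conversion can be seen with a short JDK-only sketch. The class and method names are hypothetical, and it assumes the runtime provides the GBK charset (Android and a full desktop JDK do). GBK bytes are decoded with the wrong charset, UTF-8, and then encoded again; the original bytes do not come back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LossyStringConversion {
    // Decodes the bytes as UTF-8, then encodes the resulting String as UTF-8 again.
    // Invalid sequences are replaced with U+FFFD while decoding, so the
    // original bytes cannot be recovered from the String.
    static byte[] throughUtf8String(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] original = "\u4e2d\u6587".getBytes(gbk);

        byte[] after = throughUtf8String(original);
        System.out.println(Arrays.equals(original, after)); // false

        // With the original bytes in hand, the right charset still works.
        System.out.println("\u4e2d\u6587".equals(new String(original, gbk))); // true
    }
}
```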

The solution is fairly simple. Reading the source code, I found that the scan result already contains this information.

// From the source of QRCodeReader, method:
//   public final Result decode(BinaryBitmap image, Map<DecodeHintType,?> hints)
// Some extra information is put into the Result metadata, which is a Map.

……
result.putMetadata(ResultMetadataType.BYTE_SEGMENTS, byteSegments);
……

This result is what the scan returns. In the sample project, it is passed to the scan callback of MipcaActivityCapture:

public void handleDecode(Result result, Bitmap barcode)

byteSegments holds the original binary data, so the encoding can be judged from the bytes directly. Remember the null checks: for example, if the app also scans barcodes, those results carry no such metadata.

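// Needs these imports at the top of the file:
// import java.nio.charset.Charset;
// import java.nio.charset.StandardCharsets;

// The metadata map and the BYTE_SEGMENTS entry may both be missing (e.g. for barcodes).
Map<ResultMetadataType, Object> metadata = result.getResultMetadata();
@SuppressWarnings("unchecked")
List<byte[]> byteSegments = metadata == null ? null
        : (List<byte[]>) metadata.get(ResultMetadataType.BYTE_SEGMENTS);
if (byteSegments != null && !byteSegments.isEmpty()) {
    StringBuilder sb = new StringBuilder();
    for (byte[] segment : byteSegments) {
        // Guess the encoding: UTF-8 if the bytes look like UTF-8, otherwise GBK
        if (isUtf8(segment)) {
            sb.append(new String(segment, StandardCharsets.UTF_8));
        } else {
            sb.append(new String(segment, Charset.forName("GBK")));
        }
    }
    resultString = sb.toString();
}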

isUtf8 is a utility function copied from the Internet:

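// Returns true if every byte in the buffer belongs to a well-formed
// 1- to 4-byte UTF-8 sequence (lead byte followed by 10xxxxxx bytes).
public static boolean isUtf8(byte[] buffer) {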
        boolean isUtf8 = true;
        int end = buffer.length;
        for (int i = 0; i < end; i++) {
            byte temp = buffer[i];
            if ((temp & 0x80) == 0) {// 0xxxxxxx
                continue;
            } else if ((temp & 0xC0) == 0xC0 && (temp & 0x20) == 0) {// 110xxxxx 10xxxxxx
                if (i + 1 < end && (buffer[i + 1] & 0x80) == 0x80 && (buffer[i + 1] & 0x40) == 0) {
                    i = i + 1;
                    continue;
                }
            } else if ((temp & 0xE0) == 0xE0 && (temp & 0x10) == 0) {// 1110xxxx 10xxxxxx 10xxxxxx
                if (i + 2 < end && (buffer[i + 1] & 0x80) == 0x80 && (buffer[i + 1] & 0x40) == 0
                        && (buffer[i + 2] & 0x80) == 0x80 && (buffer[i + 2] & 0x40) == 0) {
                    i = i + 2;
                    continue;
                }
            } else if ((temp & 0xF0) == 0xF0 && (temp & 0x08) == 0) {// 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                if (i + 3 < end && (buffer[i + 1] & 0x80) == 0x80 && (buffer[i + 1] & 0x40) == 0
                        && (buffer[i + 2] & 0x80) == 0x80 && (buffer[i + 2] & 0x40) == 0
                        && (buffer[i + 3] & 0x80) == 0x80 && (buffer[i + 3] & 0x40) == 0) {
                    i = i + 3;
                    continue;
                }
            }
            isUtf8 = false;
            break;
        }
        return isUtf8;
    }
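As an alternative to a hand-written bit check, the JDK can do the validation itself. The following is only a sketch under my own assumptions: the class name StrictDecode and the helper decodeUtf8OrGbk are hypothetical, and falling back to GBK is the same guess as in the code above, not something ZXing prescribes. A CharsetDecoder configured with CodingErrorAction.REPORT throws on invalid UTF-8 instead of silently inserting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecode {
    // Assumes the runtime provides GBK (Android and a full desktop JDK do).
    private static final Charset GBK = Charset.forName("GBK");

    // Tries a strict UTF-8 decode first; if the bytes are not valid UTF-8,
    // falls back to GBK. This is a heuristic, not a guarantee.
    static String decodeUtf8OrGbk(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, GBK);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";
        // Both encodings of the same text decode back to it.
        System.out.println(text.equals(decodeUtf8OrGbk(text.getBytes(StandardCharsets.UTF_8)))); // true
        System.out.println(text.equals(decodeUtf8OrGbk(text.getBytes(GBK)))); // true
    }
}
```

Either way the result is still a guess: some GBK byte sequences are also valid UTF-8, so very short payloads can be misjudged.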

Problem solved.

 


Origin blog.csdn.net/qq_25148525/article/details/124472345