Handling Chinese SMS content with the SIM800L module

When the SIM800L module is operating in text mode (AT+CMGF=1), reading a message with AT+CMGR=1 returns non-Chinese content directly as readable text, whereas Chinese text is returned as a string of hexadecimal values, for example:

+CMGL: 1,"REC UNREAD","10655000531001147525","","20/03/15,16:01:31+32"
30104F174FE160A054C965C56E3830115C0A656C76844F1A5458FF0C60A876849A8C8BC17801662FFF1A003100370031003900370038FF0C67096548671F003100305206949FFF0C8BF75728987597624E2D586B51996B649A8C8BC17801FF0C8FDB884C540E7EED64CD4F5C3002 

// The message reads: [UTS leisurely tour] Dear member, your verification code is 171978. It is valid for 10 minutes; please enter this code on the page to continue.

  

With AT+CSDH=1, the module returns more detailed information:

+CMGR: <stat>,<oa>[,<alpha>],<scts>[,<tooa>,<fo>,<pid>,<dcs>,<sca>,<tosca>,<length>]<CR><LF><data>

With AT+CSDH=0 (the module's default), only the mandatory fields are returned:

+CMGR: <stat>,<oa>[,<alpha>],<scts><CR><LF><data>
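
To make the header format concrete, here is a minimal sketch of splitting a text-mode header line into its fields. The class name SmsHeaderParser and the method splitFields are my own illustrative names, not part of any SIM800L library. The sketch assumes the fields are comma-separated and optionally wrapped in double quotes, as in the example near the top of this post. The timestamp contains a comma inside its quotes, so a plain split(",") would cut it in two.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: split a text-mode +CMGR/+CMGL header line into its fields. */
final class SmsHeaderParser {

    /** Splits on commas that are outside double quotes and drops the quotes. */
    static List<String> splitFields(String headerLine) {
        String body = headerLine.trim();
        // Drop a leading "+CMGR:" or "+CMGL:" prefix if present.
        if (body.startsWith("+")) {
            int colon = body.indexOf(':');
            if (colon >= 0) {
                body = body.substring(colon + 1).trim();
            }
        }

        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : body.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;            // quotes delimit a field, they are not part of it
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());  // a comma outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String header =
            "+CMGL: 1,\"REC UNREAD\",\"10655000531001147525\",\"\",\"20/03/15,16:01:31+32\"";
        List<String> f = splitFields(header);
        // In a +CMGL line the first field is the message index; +CMGR has no index field.
        System.out.println("index=" + f.get(0) + " stat=" + f.get(1)
                + " oa=" + f.get(2) + " scts=" + f.get(4));
    }
}
```

The hex payload itself arrives on the line after the header, so it never passes through this splitter.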

  

When the STM32 MCU forwards the message content to our back-end server through the SIM800L, a Chinese message arrives as hexadecimal values, so the server has to convert those values back into a Chinese string.

How do we do the conversion in Java? According to [Reference 1], the parseHexBinary method of the DatatypeConverter class (javax.xml.bind) can be used:

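import javax.xml.bind.DatatypeConverter;

String input = "30104F174FE160A054C965C56E383011";
byte[] bytes = DatatypeConverter.parseHexBinary(input); // hex text -> raw bytes
String result = new String(bytes); // no charset given: the platform default is used
System.out.println(result);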

  

When I tested this, the output was garbled...

Looking at the String class constructors [Reference 2], I found that you can pass a byte array together with the character set to decode it with. I then tried "UTF-8" and "GBK"; neither worked, and the output was still garbled.
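
Trial and error with charset names can be made a little less blind. The sketch below is my own addition, not something from the references: it uses a strict CharsetDecoder that throws on malformed input instead of silently substituting replacement characters, so a wrong guess is reported rather than printed as garbage. StrictDecodeCheck and decodesCleanly are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reports whether bytes are well-formed in a given charset. */
final class StrictDecodeCheck {

    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)      // throw instead of substituting
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // First eight bytes of the example payload: 30 10 4F 17 4F E1 60 A0
        byte[] payload = {0x30, 0x10, 0x4F, 0x17, 0x4F, (byte) 0xE1, 0x60, (byte) 0xA0};
        System.out.println("UTF-8:    " + decodesCleanly(payload, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE: " + decodesCleanly(payload, StandardCharsets.UTF_16BE));
    }
}
```

This check can only rule a charset out. ISO-8859-1, for instance, accepts every byte sequence, so a clean decode does not prove the guess is right.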

Next I looked up the Charset parameter of the String constructor [Reference 3] and found that the Charset class lists these standard character sets:

Charset      Description
US-ASCII     Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
ISO-8859-1   ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
UTF-8        Eight-bit UCS Transformation Format
UTF-16BE     Sixteen-bit UCS Transformation Format, big-endian byte order
UTF-16LE     Sixteen-bit UCS Transformation Format, little-endian byte order
UTF-16       Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark

  

That left the question: how can I determine which encoding the hexadecimal values returned by the SIM800L module use?

I worked backwards: I encoded a Chinese string into a byte array with each of the six charsets, then converted each byte array to a string so I could compare it with what the module returned.
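
The experiment might look roughly like the sketch below. It is a reconstruction under my own assumptions, not the exact code I ran: the class name CharsetProbe is illustrative, and the sample string is just the first two characters of the example payload (U+3010 and U+4F17), written as Unicode escapes so the source file's own encoding cannot interfere.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetProbe {

    /** Formats bytes as upper-case hex, the same shape the module prints. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "\u3010\u4F17"; // the payload starts with 3010 4F17
        Charset[] candidates = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16
        };
        for (Charset cs : candidates) {
            // Compare each line of output with the hex text received from the module.
            System.out.println(cs.name() + " -> " + toHex(sample.getBytes(cs)));
        }
    }
}
```

Only the UTF-16BE line reproduces 30104F17 exactly. UTF-16 produces the same bytes preceded by the byte-order mark FEFF, and UTF-16LE produces them with each pair of bytes swapped.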

It turned out that the module uses "UTF-16BE". After specifying that character set in the earlier code, the test succeeded:

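import java.nio.charset.StandardCharsets;
import javax.xml.bind.DatatypeConverter;

String input = "30104F174FE160A054C965C56E383011";
byte[] bytes = DatatypeConverter.parseHexBinary(input);
// The payload is 16-bit code units with the high byte first, so decode as UTF-16BE.
String result = new String(bytes, StandardCharsets.UTF_16BE);
System.out.println(result);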
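
One caveat: DatatypeConverter belongs to the javax.xml.bind (JAXB) package, which was removed from the JDK in Java 11. On a newer JDK without the extra JAXB dependency, the hex parsing is short enough to write by hand. The following is a minimal sketch; Ucs2HexDecoder, hexToBytes and decode are hypothetical names of my own.

```java
import java.nio.charset.StandardCharsets;

final class Ucs2HexDecoder {

    /** Converts a hex string such as "30104F17" into raw bytes. */
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near offset " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Decodes the module's hex text as big-endian 16-bit code units. */
    static String decode(String hex) {
        return new String(hexToBytes(hex.trim()), StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // The verification-code digits from the middle of the example payload.
        System.out.println(decode("003100370031003900370038")); // prints 171978
    }
}
```

The trim() call is there because the data line may carry trailing whitespace or a line ending when it is read from the serial port.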

  

Then I wanted to understand how the three of them (UTF-16BE, UTF-16LE and UTF-16) relate to each other. According to [Reference 4]:

UTF-16BE: UTF-16 in big-endian byte order (the most significant byte comes first).

UTF-16LE: UTF-16 in little-endian byte order (the least significant byte comes first).

UTF-16: uses one of the two byte orders above depending on the platform default, for example UTF-16 = UTF-16BE on Apple Mac systems, and UTF-16 = UTF-16LE on Windows and Linux.
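
The difference between the two byte orders is easy to see for a single 16-bit code unit. This is a small illustrative sketch of my own (ByteOrderDemo and encodeUnit are hypothetical names) that uses java.nio.ByteBuffer to serialise one code unit both ways:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderDemo {

    /** Serialises one 16-bit code unit in the requested byte order. */
    static byte[] encodeUnit(char unit, ByteOrder order) {
        return ByteBuffer.allocate(2).order(order).putChar(unit).array();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char unit = '\u4F17'; // second code unit of the example payload
        System.out.println("big-endian:    " + hex(encodeUnit(unit, ByteOrder.BIG_ENDIAN)));    // 4F17
        System.out.println("little-endian: " + hex(encodeUnit(unit, ByteOrder.LITTLE_ENDIAN))); // 174F
    }
}
```

The module's hex text contains 4F17, not 174F, which matches the big-endian form.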

[Figure: two diagrams illustrating the two byte orders] [Source]

 

Reading about the relationship between UTF-16 and UCS-2 reminded me that the SIM800 encodes Chinese text as UCS-2.

UTF-16 can be regarded as a superset of UCS-2. Before supplementary-plane characters (encoded with surrogate code points) were introduced, UTF-16 and UCS-2 meant the same thing. Once supplementary-plane characters were added, the encoding became known as UTF-16. Today, software that claims to support UCS-2 is in effect saying that it cannot handle UTF-16 characters that take more than 2 bytes. For UCS code points below 0x10000, the UTF-16 encoding is identical to the UCS code point.
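
The 2-byte limit can be checked directly in Java, whose char type is a 16-bit UTF-16 code unit. The sketch below is illustrative only; SurrogateDemo and fitsInUcs2 are names I made up, and U+1F600 is just an arbitrary code point above 0xFFFF.

```java
import java.nio.charset.StandardCharsets;

final class SurrogateDemo {

    /** True when every character is a single 16-bit unit, i.e. representable in UCS-2. */
    static boolean fitsInUcs2(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (Character.isSurrogate(text.charAt(i))) {
                return false; // half of a surrogate pair: needs UTF-16, not UCS-2
            }
        }
        return true;
    }

    static int utf16ByteLength(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String bmp = "\u4E2D";                                         // U+4E2D, below 0x10000
        String supplementary = new String(Character.toChars(0x1F600)); // U+1F600, above 0xFFFF
        System.out.println(fitsInUcs2(bmp) + " " + utf16ByteLength(bmp));                     // true 2
        System.out.println(fitsInUcs2(supplementary) + " " + utf16ByteLength(supplementary)); // false 4
    }
}
```

Every character in the example SMS is below 0x10000, which is why decoding the UCS-2 payload as UTF-16BE gives the right result.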

Then I went back and tested the "UTF-16" charset, and found that it works as well.

 

 

According to the information [here], the Java platform defaults to big-endian for UTF-16 when the data carries no byte-order mark, so specifying the UTF-16 character set also decodes the message successfully.
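
This behaviour is easy to confirm with a few bytes. The sketch below (Utf16BomDemo is a hypothetical name) relies only on what the java.nio.charset.Charset documentation states for the "UTF-16" charset: decoding follows a byte-order mark if one is present and assumes big-endian otherwise, and encoding writes a big-endian byte-order mark first.

```java
import java.nio.charset.StandardCharsets;

final class Utf16BomDemo {

    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] noBom = {0x4E, 0x2D};                                // U+4E2D big-endian, no BOM
        byte[] leWithBom = {(byte) 0xFF, (byte) 0xFE, 0x2D, 0x4E};  // FF FE marks little-endian

        // Without a BOM the decoder assumes big-endian.
        System.out.println(decodeUtf16(noBom).equals("\u4E2D"));     // true
        // With a BOM the decoder follows it and drops it from the result.
        System.out.println(decodeUtf16(leWithBom).equals("\u4E2D")); // true
        // When encoding, a big-endian BOM is written first: FE FF 4E 2D.
        System.out.println("\u4E2D".getBytes(StandardCharsets.UTF_16).length); // 4
    }
}
```

Since the module never sends a byte-order mark, I would still lean towards naming UTF-16BE explicitly: it states the intent and does not treat leading FE FF or FF FE bytes specially.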

 

 

 

 

Reference 1: https://www.baeldung.com/java-base64-encode-and-decode

Reference 2: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html

Reference 3: https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html

Reference 4: https://zh.wikipedia.org/wiki/UTF-16

Reference 5: https://zh.wikipedia.org/wiki/%E5%AD%97%E8%8A%82%E5%BA%8F#%E5%A4%A7%E7%AB%AF%E5%BA%8F

Reference 6: https://xiaogd.net/%E7%BD%91%E9%A1%B5%E4%B8%AD%E7%9A%84%E7%BC%96%E7%A0%81%E4%B8%8E%E4%B9%B1%E7%A0%81%EF%BC%884%EF%BC%89/

Origin www.cnblogs.com/1x11/p/12507175.html