Parsing \u strings, \x strings, and hexadecimal strings in Python

Problem description

s = '\u4f60\u597d'
s = '\xe6\x97\xa9\xe4\xb8\x8a\xe5\xa5\xbd'
s = '\xc4\xe3\xba\xc3'




\u strings

Encoding

import json

print(json.dumps('你好'))  # "\u4f60\u597d"

Decoding: print it directly, since Python already interprets the \u escapes in a string literal


s = '\u4f60\u597d'
print(s)  # 你好
print(str(s))  # 你好
print(repr(s))  # '你好'
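Printing directly only works when the escapes sit in a Python string literal. When the text arrives with literal backslashes (read from a file or a network response, say), the escapes still have to be decoded. A minimal sketch, assuming the input is pure ASCII; the variable names are illustrative:

```python
import json

# Assumption: the raw string stands in for text read from a file, so the
# backslashes are literal characters and nothing has been interpreted yet.
raw = r'\u4f60\u597d'
print(len(raw))  # 12

# Option 1: the unicode_escape codec (only safe for ASCII-only input)
decoded = raw.encode('ascii').decode('unicode_escape')
print(decoded)  # 你好

# Option 2: wrap the text in double quotes and parse it as a JSON string
decoded_json = json.loads('"' + raw + '"')
print(decoded_json)  # 你好
```

The JSON route fails if the text itself contains unescaped double quotes or control characters, so the codec route is usually the more general of the two.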




\x strings

Encoding

print('你好'.encode('utf-8'))  # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print('你好'.encode('gbk'))  # b'\xc4\xe3\xba\xc3'

Use encode('raw_unicode_escape') to turn the str back into bytes, then decode those bytes with the correct encoding

s = '\xe4\xbd\xa0\xe5\xa5\xbd'
print(s.encode('raw_unicode_escape').decode('utf-8'))  # 你好
s = '\xc4\xe3\xba\xc3'
print(s.encode('raw_unicode_escape').decode('gbk'))  # 你好
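This works because every character in such a string has a code point below 256, and raw_unicode_escape writes those code points out as single bytes. The latin-1 codec does the same thing and raises an error if a character falls outside that range. A sketch of the idea; repair_byte_string is a hypothetical helper name, not a library function:

```python
def repair_byte_string(text: str, encoding: str) -> str:
    """Hypothetical helper: reinterpret a str whose code points are really byte values."""
    # latin-1 maps code points 0-255 to identical byte values and raises
    # UnicodeEncodeError for anything larger.
    raw = text.encode('latin-1')
    return raw.decode(encoding)


s = '\xe4\xbd\xa0\xe5\xa5\xbd'
# For code points below 256 both codecs produce the same bytes.
same_bytes = s.encode('latin-1') == s.encode('raw_unicode_escape')
print(same_bytes)  # True

utf8_text = repair_byte_string(s, 'utf-8')
gbk_text = repair_byte_string('\xc4\xe3\xba\xc3', 'gbk')
print(utf8_text)  # 你好
print(gbk_text)  # 你好
```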




Hexadecimal strings

Encoding

import base64

print(base64.b16encode('你好'.encode()))  # b'E4BDA0E5A5BD'

Decoding with base64

import base64

s = 'E4BDA0E5A5BD'
print(base64.b16decode(s))  # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(base64.b16decode(s).decode())  # 你好

Or use binascii

import binascii

print(binascii.b2a_hex('你好'.encode()))  # b'e4bda0e5a5bd'
print(binascii.a2b_hex('e4bda0e5a5bd').decode())  # 你好
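The same round trip can also be done without importing anything, using the built-in bytes.hex() and bytes.fromhex(). The sketch below also shows that base64.b16decode rejects lowercase digits unless casefold=True is passed:

```python
import base64

data = '你好'.encode('utf-8')

# bytes.hex() returns a lowercase str rather than bytes
hex_text = data.hex()
print(hex_text)  # e4bda0e5a5bd

# bytes.fromhex() accepts both lowercase and uppercase digits
lower_result = bytes.fromhex(hex_text).decode('utf-8')
upper_result = bytes.fromhex('E4BDA0E5A5BD').decode('utf-8')
print(lower_result)  # 你好
print(upper_result)  # 你好

# base64.b16decode needs casefold=True to accept lowercase input
folded_result = base64.b16decode(hex_text, casefold=True).decode('utf-8')
print(folded_result)  # 你好
```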




Detecting the encoding

pip install chardet

Call chardet.detect()

import chardet

s = '你好'.encode()
print(chardet.detect(s))  # {'encoding': 'utf-8', 'confidence': 0.7525, 'language': ''}

s = rb'\u4f60\u597d'  # raw bytes literal: \u is not a valid escape in bytes, so keep the backslashes literal
print(chardet.detect(s))  # {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}

s = b'\xe6\x97\xa9\xe4\xb8\x8a\xe5\xa5\xbd'
print(chardet.detect(s))  # {'encoding': 'utf-8', 'confidence': 0.87625, 'language': ''}

s = b'\xc4\xe3\xba\xc3'
print(chardet.detect(s))  # {'encoding': 'TIS-620', 'confidence': 0.3598212120361634, 'language': 'Thai'}

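The last result is wrong: these bytes are '你好' encoded as GBK, but chardet guesses TIS-620 (Thai) with low confidence.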
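For short inputs like this, statistical detection has very little to go on. One possible workaround, when the set of likely encodings is known in advance, is to try each candidate in order and keep the first that decodes cleanly. A sketch under that assumption; decode_with_candidates and its default candidate list are hypothetical, not part of chardet:

```python
def decode_with_candidates(data: bytes, candidates=('utf-8', 'gbk')):
    """Hypothetical helper: return (text, encoding) for the first candidate that decodes."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError('none of the candidate encodings could decode the data')


utf8_result = decode_with_candidates(b'\xe6\x97\xa9\xe4\xb8\x8a\xe5\xa5\xbd')
gbk_result = decode_with_candidates(b'\xc4\xe3\xba\xc3')
print(utf8_result)  # ('早上好', 'utf-8')
print(gbk_result)  # ('你好', 'gbk')
```

Order matters here: UTF-8 is strict and rejects most byte sequences from other encodings, so it goes first, while GBK accepts far more inputs and would otherwise mask genuine UTF-8 data.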






Reposted from blog.csdn.net/lly1122334/article/details/107356215