pytho basic programming: python using the method of determining encoded string chardet

This article describes the method of Example chardet determined using python encoded string. Share to you for your reference. Specific analysis is as follows:

Recent use python grab some online data, encountered problems coding. Headache, summarize the solution used.

view files encoded in the linux vim fileencoding command SET
Python code detection in a strong package chardet, very simple to use. simple installation using the linux pip install chardet

import chardet
f = open('file','r')
fencoding=chardet.detect(f.read())
print fencoding

fencoding output format { 'confidence': 0.96630842899499614, 'encoding': 'GB2312'}, the probability is determined only whether a certain code. More accurate result. Str type input parameters.

After encoding can learn the python str achieved using encoding conversion decode and encode.

Is a general flow str str coding method using the decode decodes it into unicode string type, then the use of specific encoding according to encode the particular coding unicode string type to convert. str unicode in python and belonging to two different types, as follows.

Window default encoding gbk Generally, linux default encoding utf8
concept python programming coding system, coding python, file encoding.

System Code: the default encoding editor to write source code. It represents all of the content within the source files are encoded into binary code stream according to the word way. Stored to disk. View by locale command under linux.

python coding: decoding means provided in the python. If not set, then, python default is ascii decoding mode. If the Chinese do not appear python source code file, then this place is how the set should not be a problem.

Setting method: at the beginning of the source file (it must be the first line): # - -coding: UTF-8- -, provided the source file decoding scheme is UTF-8, or

import sys
reload(sys)
sys.setdefaultencoding('UTF-8')

File encoding: the encoding of the text, under linux vim use set fileencoding view.

The reason the output distortion is not generally encoded to the system decoder manner.

For example print s, s type str, linux system is the system default encoding utf8 encoding, s before the output should be encoded as utf8. If s is gbk coding should thus output. print s.decode ( 'gbk'). encode ( 'utf8') to output Chinese.

Following the same window case, window default encoding is gbk encoding, it must be encoded before gbk s output.

python unicode general processing process type. Thus before encoding can be directly output.

In this paper, we hope that the Python program designed to help
content on more than how many, and finally to recommend a good reputation in the number of public institutions [programmers], there are a lot of old-timers learning skills, learning experience, interview skills, and other workplace experiences to share, the more we carefully prepared the zero-based introductory information, information on actual projects, the timing has to explain the Python programmer technology every day, share some learning methods and the need to pay attention to small detailsHere Insert Picture Description

Released six original articles · won praise 0 · Views 9

Guess you like

Origin blog.csdn.net/chengxun02/article/details/104976468