Converting code file encodings with Python

When I first joined this company and was still getting familiar with the environment, my boss had me work on a migration that involved modifying code. Honestly, the work was boring: reading other people's code, changing a variable here and a file name there, all fiddly tasks with little technical depth. Still, migrating code is a decent way to get to know an environment. Enough rambling; today's topic is changing the encoding of code files. For various reasons the code had to be migrated from data center A to data center B, and the two cannot reach each other. For historical reasons, all the code in data center A is UTF-8 encoded, while data center B requires GBK. Let's see how to solve this.

Encoding problems

First, why is there an encoding problem at all? Taking the example above, every database in data center B is GBK encoded, so the data read from it is GBK. If that data is not converted after being read, then for it to display without garbling the header that is sent must declare GBK, and the output files (html, tpl, etc.) must be GBK as well. The following chain makes this clearer:

DB (GBK) => php etc. (the file encoding is not restricted, but if the code file contains Chinese characters, either convert the file to GBK or convert those characters to GBK on output) => header (GBK) => html, tpl (GBK)
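To see why every stage in that chain has to agree on one encoding, here is a small Python 3 sketch. The sample string is made up for the demo and is not from the project. It encodes the same text both ways, then reads UTF-8 bytes back as if they were GBK:

```python
# Illustrative sketch (Python 3); the sample text is an assumption for the demo.
text = "中文"

gbk_bytes = text.encode("gbk")
utf8_bytes = text.encode("utf-8")

# Same characters, different bytes.
print(gbk_bytes)   # b'\xd6\xd0\xce\xc4'
print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'

# Decoding with the matching charset round-trips cleanly.
assert gbk_bytes.decode("gbk") == text

# Decoding with the wrong charset is roughly what happens when the header
# declares GBK but the page bytes are UTF-8: the text comes out garbled.
garbled = utf8_bytes.decode("gbk", errors="replace")
print(garbled != text)  # True
```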

Alternatively, convert the data from GBK to UTF-8 in code only at the point where it comes out of the database. UTF-8 is more widely used in general and causes fewer problems:

DB (GBK) => php etc. (UTF-8, converting the data read from the database to UTF-8) => header (UTF-8) => html, tpl (UTF-8)
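The conversion step in this second chain amounts to decoding the GBK bytes and re-encoding them as UTF-8. A minimal Python 3 sketch of that step follows; `to_utf8` and `row_from_db` are hypothetical names used only for illustration, and in the real setup the PHP layer would do the equivalent:

```python
# Illustrative sketch (Python 3). `to_utf8` is a hypothetical helper name.
def to_utf8(raw, source_encoding="gbk"):
    """Re-encode bytes from source_encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

# Stand-in for a value read from the GBK database.
row_from_db = "编码".encode("gbk")

# These are the bytes that would be sent along with a UTF-8 header.
page_bytes = to_utf8(row_from_db)
print(page_bytes.decode("utf-8") == "编码")  # True
```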

As long as you follow one of the two schemes above, nothing will come out garbled. At least, I tested the first one without problems, so I expect the second is fine too. Now for a small script that converts file encodings:

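#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Filename: changeEncode.py
# Works under Python 2 and Python 3.
import os
import sys

def ChangeEncode(file, fromEncode, toEncode):
  """Re-encode one file in place. Return 0 on success, -1 on failure."""
  try:
    # Binary mode gives the raw bytes with no implicit decoding.
    with open(file, "rb") as f:
      s = f.read()
    # Convert before reopening for writing, so a failed decode
    # leaves the original file untouched.
    s = s.decode(fromEncode).encode(toEncode)
    with open(file, "wb") as f:
      f.write(s)
    return 0
  except (UnicodeError, IOError, OSError):
    return -1

def Do(dirname, fromEncode, toEncode):
  """Convert every file under dirname, reporting the result per file."""
  for root, dirs, files in os.walk(dirname):
    for _file in files:
      _file = os.path.join(root, _file)
      if ChangeEncode(_file, fromEncode, toEncode) != 0:
        print("[Conversion failed:]" + _file)
      else:
        print("[Success:]" + _file)

def CheckParam(dirname, fromEncode, toEncode):
  """Return 0 if the arguments are valid, otherwise an error code."""
  encode = ["utf-8", "gbk"]
  # Compare case-insensitively so that "GBK" and "gbk" count as the same.
  if fromEncode.lower() not in encode or toEncode.lower() not in encode:
    return 2
  if fromEncode.lower() == toEncode.lower():
    return 3
  if not os.path.isdir(dirname):
    return 1
  return 0

if __name__ == "__main__":
  error = {1: "The first argument is not a valid directory",
           3: "The source and target encodings are the same",
           2: "The requested encoding is not supported; use UTF-8 or GBK"}
  if len(sys.argv) != 4:
    print("Usage: changeEncode.py target_dir fromEncode toEncode")
    sys.exit(1)
  dirname, fromEncode, toEncode = sys.argv[1:4]
  ret = CheckParam(dirname, fromEncode, toEncode)
  if ret != 0:
    print(error[ret])
  else:
    Do(dirname, fromEncode, toEncode)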

The script is simple, and so is using it:

./changeEncode.py target_dir fromEncode toEncode
A few notes on how several common encodings relate to each other:

US-ASCII is a subset of UTF-8. I learned this from StackOverflow, where an answer reads: ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded.

I tried it myself: a file with no Chinese characters is reported as us-ascii, and once Chinese characters are added it is reported as utf-8.
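The same observation can be checked at the byte level. The following Python 3 sketch (with `is_ascii_only` as a hypothetical helper name) shows that pure ASCII text produces identical bytes under ASCII, UTF-8 and GBK, so such a file needs no conversion at all:

```python
# Illustrative sketch (Python 3); `is_ascii_only` is a hypothetical helper.
def is_ascii_only(data):
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(byte < 0x80 for byte in data)

source = "print('hello')"

# ASCII text encodes to the same bytes in all three encodings.
print(source.encode("ascii") == source.encode("utf-8") == source.encode("gbk"))  # True

print(is_ascii_only(source.encode("utf-8")))    # True
print(is_ascii_only("# 注释".encode("utf-8")))  # False
```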

There is also the ANSI encoding, which stands for the local system encoding. On a Simplified Chinese operating system, for example, ANSI means GBK. This is worth keeping in mind too.
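For reference, a short Python 3 sketch of how this looks from Python's side. It assumes the ANSI code page on Simplified Chinese Windows is code page 936, which Python's codec registry treats as an alias of `gbk`. The second line of output depends on the machine it runs on:

```python
import codecs
import locale

# Code page 936 resolves to Python's gbk codec.
print(codecs.lookup("cp936").name)  # gbk

# The default text encoding of this machine (varies by OS and locale).
print(locale.getpreferredencoding(False))
```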

One more point: the Linux command for viewing a file's encoding is:

file -i *

It shows the encoding of each file.
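Where the `file` command is not available, a rough approximation can be written in Python 3. This is only a heuristic sketch limited to the encodings discussed here, and `guess_encoding` is a hypothetical name. The order matters: GBK accepts many byte sequences, so the stricter ASCII and UTF-8 checks are tried first.

```python
# Heuristic sketch (Python 3); not a replacement for `file -i`.
def guess_encoding(data):
    """Return the first of ascii, utf-8, gbk that decodes data, else 'unknown'."""
    for name in ("ascii", "utf-8", "gbk"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "unknown"

print(guess_encoding(b"echo 1;"))               # ascii
print(guess_encoding("中文".encode("utf-8")))   # utf-8
print(guess_encoding("中文".encode("gbk")))     # gbk
```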

Of course, some files may contain special characters that make the conversion fail, but ordinary program files convert without problems.
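One way to find such files before converting anything is a dry run that only tries to decode each file. A minimal Python 3 sketch, assuming the same directory layout the script walks; `find_undecodable` and the demo file names are hypothetical:

```python
import os
import tempfile

def find_undecodable(dirname, encoding):
    """Return paths under dirname whose bytes do not decode as `encoding`."""
    bad = []
    for root, _dirs, files in os.walk(dirname):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                data = f.read()
            try:
                data.decode(encoding)
            except UnicodeDecodeError:
                bad.append(path)
    return bad

# Demo on a throwaway directory: one UTF-8 file and one GBK file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "ok.php"), "wb") as f:
        f.write("中文".encode("utf-8"))
    with open(os.path.join(tmp, "bad.php"), "wb") as f:
        f.write("中文".encode("gbk"))
    print([os.path.basename(p) for p in find_undecodable(tmp, "utf-8")])  # ['bad.php']
```

Nothing is written to the scanned files, so the check is safe to run on the real tree before the conversion.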

Origin blog.csdn.net/haoxun08/article/details/104909274