## Fixing garbled Chinese characters in Stata data with pandas

Recently, a classmate asked me to process some data in Stata's `.dta` format and export it to CSV. After writing the first version, I found that some files contained Chinese characters, and those characters came out garbled in the resulting CSV files. After looking into it, I changed the script to the following:

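```python
import os

import pandas as pd

for f in os.listdir("."):
    if f.endswith(".dta"):
        filename = os.path.splitext(f)[0]  # keeps any dots inside the base name
        data = pd.read_stata(f)
        for c in data.columns:
            # Only text columns (checked via the first value) need fixing.
            # pandas decodes strings in older .dta files as latin-1, so re-encode
            # to recover the raw bytes, then decode them as GBK.
            if not data.empty and isinstance(data[c].iloc[0], str):
                data[c] = data[c].str.encode("latin-1").str.decode("gbk")
        # utf_8_sig is the key: without it, the exported CSV shows Chinese
        # correctly in a text editor, but Excel still displays garbled text.
        data.to_csv(filename + ".csv", encoding="utf_8_sig")
        print("[OK] %s" % filename)
```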
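To see why the `latin-1` to `gbk` round trip works without needing a real `.dta` file, here is a minimal sketch. It assumes the files were saved by Stata 13 or earlier, whose `.dta` formats do not record a text encoding, so the Chinese text is stored as GBK bytes that pandas decodes as latin-1. The sample strings and the `city` column are made up for illustration. The sketch also writes the result with `utf_8_sig` and reads back the first three bytes, which are the UTF-8 byte order mark (BOM) that Excel uses to recognize the file as UTF-8.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: Chinese text whose GBK bytes were decoded as
# latin-1, mimicking what pandas returns for an older .dta file
original = ["北京", "上海"]
garbled = [s.encode("gbk").decode("latin-1") for s in original]
df = pd.DataFrame({"city": garbled})

# Same repair as the script: recover the raw bytes, then decode them as GBK
df["city"] = df["city"].str.encode("latin-1").str.decode("gbk")
fixed = df["city"].tolist()
print(fixed)  # ['北京', '上海']

# utf_8_sig writes a UTF-8 byte order mark (BOM) at the start of the file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    df.to_csv(path, encoding="utf_8_sig", index=False)
    with open(path, "rb") as fh:
        head = fh.read(3)
print(head)  # b'\xef\xbb\xbf'
```

Because latin-1 maps each of the 256 byte values to exactly one character, encoding back to latin-1 restores the original bytes unchanged, which is what makes the repair lossless. Files saved by Stata 14 or later should already be decoded correctly (pandas reads their text as UTF-8), and re-encoding their Chinese characters to latin-1 would raise a `UnicodeEncodeError`, so the fix is only meant for older files.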
