## Fixing garbled Chinese characters in Stata data with pandas
Recently, a classmate asked me to convert some Stata data files in `.dta` format to CSV. After I wrote the first version, I found that some of the files contained Chinese text, and it came out garbled in the resulting CSV files. After some research, I changed the code to the following:
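```python
import os

import pandas as pd

files = os.listdir(".")
for f in files:
    if f.endswith(".dta"):
        # splitext strips only the final extension, so "a.b.dta" becomes "a.b"
        filename = os.path.splitext(f)[0]
        data = pd.read_stata(f)
        for c in data.columns:
            # Only re-decode text columns (detected by the type of the first value)
            if len(data) > 0 and isinstance(data[c].iloc[0], str):
                # Turn the mis-decoded latin-1 text back into raw bytes, then decode them as GBK
                data[c] = data[c].str.encode('latin-1').str.decode('gbk')
        # utf_8_sig is the key: without it, the CSV shows Chinese correctly in a text editor,
        # but Excel still displays garbled characters
        data.to_csv(filename + ".csv", encoding='utf_8_sig')
        print('[OK] %s' % filename)
```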
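To see why the `encode('latin-1')` / `decode('gbk')` step works, here is a minimal sketch in plain Python, without pandas. It assumes the garbling came from GBK-encoded bytes being decoded as latin-1, which, as far as I know, is how pandas decodes strings from older (pre-Stata 14) `.dta` files. Latin-1 maps every byte from 0 to 255 to exactly one character, so encoding back to latin-1 recovers the original bytes without loss. The helper names `simulate_mojibake` and `repair` and the sample strings are made up for illustration. The second part shows what `utf_8_sig` adds: a three-byte byte order mark (BOM) in front of otherwise ordinary UTF-8, which is what lets Excel recognize the file as UTF-8.

```python
# Hypothetical helpers illustrating the two encoding tricks used in the script above.
# Assumption: the text was stored as GBK bytes and then wrongly decoded as latin-1.


def simulate_mojibake(text: str) -> str:
    """Encode text as GBK, then wrongly decode the bytes as latin-1."""
    return text.encode("gbk").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the wrong decode: latin-1 maps each character back to one byte, then decode as GBK."""
    return garbled.encode("latin-1").decode("gbk")


original = "北京大学"
garbled = simulate_mojibake(original)
print(garbled == original)          # False: the text is garbled
print(repair(garbled) == original)  # True: the round trip recovers the Chinese text

# utf_8_sig writes a BOM before the UTF-8 bytes.
plain = "城市".encode("utf-8")
with_bom = "城市".encode("utf_8_sig")
print(with_bom[:3] == b"\xef\xbb\xbf")  # True: the 3-byte UTF-8 BOM
print(with_bom[3:] == plain)            # True: the rest is ordinary UTF-8
```

One caveat: if a file was saved by Stata 14 or later, its strings are probably already UTF-8 and decoded correctly, so `.str.encode('latin-1')` may raise a `UnicodeEncodeError` on those columns. Applying the round trip only to files that actually show garbling is the safer choice.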