Article 2021_01_05: Stata 15 on Mac, .dta encoding problems and system crashes

Environment: Stata 15, macOS 10.15.3

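Problem: .dta data containing Chinese text could not be converted with unicode translate, and running unicode translate with the invalid option crashed Stata.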

Solution: the crash is caused by corrupted value labels. Translate the value labels and the data contents separately.

https://www.statalist.org/forums/forum/general-stata-discussion/general/1488060-stata-15-crashes-when-using-unicode-translate

We have had two users report a similar issue. In both cases the original dataset contained corrupted value label(s). The dataset works fine when the particular value label(s) are not used, but when -unicode translate- or another Stata command tries to use the corrupted value label(s), Stata may crash.

To work around it, you may do the following:

Code:

```stata
// I use test.dta as an example

// first, save all value labels in test.dta to a do-file, mylab.do
use "test.dta", clear
label save _all using mylab, replace

// then drop all value labels from the dataset and save it as test2.dta
// (label drop removes only the label definitions; the variables keep
// their value-label names, so running mylab.do later re-attaches them)
label drop _all
save test2.dta, replace

// now translate test2.dta
clear
unicode encoding set gb18030
unicode translate "test2.dta", invalid(mark) transutf8

// next, translate mylab.do
clear
unicode encoding set gb18030
unicode translate mylab.do, invalid(ignore)

// finally, re-attach the value labels
use "test2.dta", clear
do mylab.do

// write the re-attached labels back to disk
save "test2.dta", replace
```

test2.dta now contains the translated dataset with the fixed value labels. If the problem persists, please email Stata Technical Support (tech-support@stata.com) with the dataset attached.

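# Starting with Stata 14, datasets store text as Unicode (UTF-8). Besides the crash above, there is therefore also the problem of converting data between older and newer versions, for example between Stata 5 to 12 or Stata 13 files and Stata 14 or later.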
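The other direction, saving from Stata 14/15 so that an older version can open the file, can be handled with Stata's saveold command and its version() option. This is a minimal sketch. The output file names are hypothetical. It also assumes Chinese text stays in UTF-8 when written this way, so an older Stata may display it garbled; check this on your own data.

```stata
// run in Stata 14 or later; output file names are hypothetical
use "test.dta", clear

// write a Stata 13 format file
saveold "test_v13.dta", version(13) replace

// write a Stata 12 format file
saveold "test_v12.dta", version(12) replace
```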

First, analyze the file's encoding. If you can confirm that the original file is encoded in GB18030, set that encoding before translating.

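Common encodings for Chinese text are GB18030, GB2312, and GBK. GBK extends GB2312, and GB18030 is backward compatible with GBK, so gb18030 is usually the safest choice.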
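Before choosing an encoding, you can check which names Stata accepts. This sketch assumes the unicode encoding list and unicode encoding alias subcommands described in help unicode encoding; confirm them in your own version.

```stata
// list every encoding Stata can translate from (the output is long)
unicode encoding list

// show the alternative names Stata accepts for GB18030
unicode encoding alias gb18030
```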

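```stata
// change to the folder that contains test.dta
cd ...

// report whether the file contains strings that need translating
unicode analyze test.dta

unicode encoding set gb18030

// convert the strings to UTF-8 (Stata keeps a backup of the original)
unicode translate test.dta
```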
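You may need to translate a whole folder of old files, or undo a translation that used the wrong encoding. This is a sketch that assumes unicode analyze and unicode translate accept a wildcard file specification, and that unicode restore recovers the backup Stata made before translating. test.dta is the example file from above. The second encoding, big5, is only a hypothetical guess.

```stata
// check and translate every .dta file in the current folder
unicode analyze *.dta
unicode encoding set gb18030
unicode translate *.dta

// inspect the result
use "test.dta", clear
describe
label list

// if the Chinese text still looks wrong, restore the original from
// Stata's backup, set another encoding, and translate again
clear
unicode restore test.dta
unicode encoding set big5        // hypothetical second guess
unicode translate test.dta
```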

See also (video): https://www.bilibili.com/video/BV1pa4y1j7eF/?spm_id_from=trigger_reload

Unresolved issue: numeric values in scientific notation in Stata become character (string) values when transferred to R.
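The author does not resolve this. One possible cause, an assumption rather than something the author confirmed, is that the affected variables are already stored as strings in Stata, so R also reads them as character. The sketch below checks for string variables and converts one to numeric before saving. x_hyp and test_for_r.dta are hypothetical names.

```stata
use "test.dta", clear

// list string variables; a value like "1.23e+05" stored as text shows up here
ds, has(type string)

// hypothetical variable x_hyp: convert text such as "1.23e+05" to a number
destring x_hyp, replace

// make sure it is numeric now, then save a copy for R
confirm numeric variable x_hyp
save "test_for_r.dta", replace
```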


Reposted from blog.csdn.net/weixin_40895857/article/details/112242700