Article 2021_01_05: Stata 15 .dta encoding problem and Stata crash on Mac

Stata 15

macOS 10.15.3

Problem: A Chinese-language .dta file cannot be converted to Unicode, and Stata crashes when unicode translate is run with the invalid option.

Solution: When the crash is caused by a corrupted value label, translate the value labels and the data separately.

https://www.statalist.org/forums/forum/general-stata-discussion/general/1488060-stata-15-crashes-when-using-unicode-translate

We have had two users report a similar issue. In both cases the original dataset contains corrupted value label(s). The dataset works fine as long as those value label(s) are not used, but when -unicode translate- or another Stata command tries to use them, Stata may crash.

To work around it, you can do the following:
 

Code:

// I use test.dta as an example

// first we would like to save all value labels in test.dta to a do-file mylab.do
use "test.dta", clear
label save  _all using  mylab, replace

// then we drop all value labels from the dataset, and save the new dataset as test2.dta
label drop _all
save test2.dta, replace

// now we can translate test2.dta
clear
unicode encoding set gb18030
unicode translate "test2.dta", invalid(mark) transutf8

// now we translate the mylab.do
clear
unicode encoding set gb18030
unicode translate mylab.do, invalid(ignore)

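// now we can re-attach value labels
// (label drop removed only the definitions; the variables still refer to the label names,
// so redefining the labels with mylab.do is enough)
use "test2.dta", clear
do mylab.do

// save again so that the restored labels are written to disk
save test2.dta, replace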

Now test2.dta contains the translated dataset with repaired value labels. If the problem persists, please email Stata technical support with the dataset attached.
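
To confirm that the labels are back, the result can be inspected with a few standard commands. This is a minimal sketch and not part of the quoted reply; it assumes the file name test2.dta from the example above.

```stata
// sketch: inspect the result of the workaround (test2.dta as in the example above)
use "test2.dta", clear
label dir      // names of the value labels currently defined
describe       // the "value label" column shows which label each variable uses
label list     // print every value-to-text mapping so the Chinese text can be checked by eye
```

If label dir comes back empty, mylab.do was probably not run or failed partway through.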

 

# Stata 14 and later store text in Unicode (UTF-8), so in addition to the crash described above, conversion problems also arise between older and newer file versions, for example between Stata 5-12, Stata 13, and Stata 14 and above.

First, analyze the file to see whether it needs translation. If the original file's encoding is known to be gb18030, set that as the source encoding before translating.

Encodings commonly used for Chinese data: gb18030, gb2312, GBK.

cd ...

unicode analyze test.dta

unicode encoding set gb18030

unicode translate test.dta
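
If the translated text still looks garbled, the source encoding was probably set incorrectly. The following is a hedged sketch of how one might look up encoding names and redo the translation. It assumes the file test.dta from above, that the backup Stata keeps in the bak.stunicode directory has not been erased, and that gbk is the name reported by unicode encoding list on your installation.

```stata
// list the encodings Stata knows whose names contain "gb"
// (the pattern *gb* is just an example)
unicode encoding list *gb*

// try a different source encoding; retranslate starts again from the
// backup made by the first run instead of translating the already-translated file
unicode encoding set gbk
unicode retranslate test.dta

// alternative: undo the translation and return to the original file
// unicode restore test.dta
```

Running unicode analyze again afterwards shows whether the file still needs translation.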

https://www.bilibili.com/video/BV1pa4y1j7eF/?spm_id_from=trigger_reload

 

Unresolved problem: Values shown in scientific notation in Stata become character strings when the data are transferred to R.


Origin blog.csdn.net/weixin_40895857/article/details/112242700