Everyday Chinese NLP work often involves converting between Traditional and Simplified Chinese and annotating text with Pinyin. This article describes how to implement both.
The first is Traditional/Simplified conversion. It needs no additional Python modules; just download the following two Python files:
- langconv.py: https://raw.githubusercontent.com/skydark/nstools/master/zhtools/langconv.py
- zh_wiki.py: https://raw.githubusercontent.com/skydark/nstools/master/zhtools/zh_wiki.py
Sample code is as follows (the script must be in the same directory as langconv.py and zh_wiki.py):
from langconv import Converter

# Convert Traditional Chinese to Simplified Chinese
def cht_2_chs(line):
    return Converter('zh-hans').convert(line)

# Sample input: a news excerpt. The input should be Traditional Chinese text;
# the excerpt appears here in English translation.
line_cht = '''
Taipei Mayor Ko Wen-je went live on Facebook today, first telling viewers that
he will visit four cities in the eastern United States from March 16 to 24,
and then announced without warning that he will first visit Israel on
February 23, planning to stay 4 to 5 days. Although he stressed that Taipei
and Israel already have exchanges on information security, and that he could
also meet with local cities and visit industrial innovation sites, Ko also
said, "It is also to see how a small country in such a harsh environment,
howtosurvive, what is its secret?" These remarks were also interpreted as
aiming higher, pointing directly at the presidency.
'''
line_cht = line_cht.replace('\n', '')
ret_chs = cht_2_chs(line_cht)
print(ret_chs)

# Convert Simplified Chinese to Traditional Chinese
def chs_2_cht(sentence):
    return Converter('zh-hant').convert(sentence)

line_chs = '忧郁的台湾乌龟'  # "melancholy Taiwan tortoise"
line_cht = chs_2_cht(line_chs)
print(line_cht)
The output is as follows (the Simplified Chinese result, shown here in English translation):
Taipei Mayor Ko Wen-je went live on Facebook today, first telling viewers that he will visit four cities in the eastern United States from March 16 to 24, and then announced without warning that he will first visit Israel on February 23, planning to stay 4 to 5 days. Although he stressed that Taipei and Israel already have exchanges on information security, and that he could also meet with local cities and visit industrial innovation sites, Ko also said, "It is also to see how a small country in such a harsh environment, howtosurvive, what is its secret?" These remarks were also interpreted as aiming higher, pointing directly at the presidency.
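To give a feel for what table-driven conversion involves, here is a minimal sketch of greedy longest-match replacement. It is only an illustration and not the actual langconv.py implementation; the function name `convert_with_table` and the tiny `S2T_DEMO` table are assumptions made up for this example, whereas the real mapping data lives in zh_wiki.py.

```python
# Hypothetical, tiny Simplified -> Traditional table for demonstration only.
# Phrase entries take priority over single characters because some characters
# are ambiguous: "发" is "髮" in "头发" (hair) but "發" in "发展" (develop).
S2T_DEMO = {
    "头发": "頭髮",
    "头": "頭",
    "发": "發",
    "台湾": "臺灣",
}


def convert_with_table(text, table):
    """Replace text using the longest table key that matches at each position."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No entry matched: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(convert_with_table("头发", S2T_DEMO))
print(convert_with_table("发展", S2T_DEMO))
```

Trying phrases before single characters is what lets an ambiguous character resolve differently depending on its neighbours; a plain character-by-character lookup cannot do that.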
Next is obtaining the Pinyin of Chinese characters. Python modules for this include xpinyin and pypinyin, among others. This article uses xpinyin as an example to show how to get the Pinyin of Chinese characters. Sample code is as follows:
from xpinyin import Pinyin

p = Pinyin()

# The default separator is '-'
print(p.get_pinyin("上海"))

# Show tones
print(p.get_pinyin("上海", tone_marks='marks'))
print(p.get_pinyin("上海", tone_marks='numbers'))

# Remove the separator
print(p.get_pinyin("上海", ''))

# Use a space as the separator
print(p.get_pinyin("上海", ' '))

# Get the Pinyin initials
print(p.get_initial("上"))
print(p.get_initials("上海"))
print(p.get_initials("上海", ''))
print(p.get_initials("上海", ' '))
Output:
shang-hai
shàng-hǎi
shang4-hai3
shanghai
shang hai
S
S-H
SH
S H
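The two tone styles above carry the same information, and the relationship between them can be sketched in a few lines. The snippet below is an illustrative, standard-library-only conversion from numbered Pinyin to tone marks using the usual placement rule ("a" or "e" takes the mark, then the "o" in "ou", otherwise the last vowel). It is not how xpinyin works internally, and the names `TONE_MARKS` and `numbered_to_marked` are invented for this example.

```python
# Tone-marked forms of each vowel, indexed by tone number minus one.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def numbered_to_marked(syllable):
    """Convert one lowercase syllable such as 'shang4' to 'shàng'."""
    if not syllable or not syllable[-1].isdigit():
        return syllable  # no tone digit, nothing to do
    base, tone = syllable[:-1], int(syllable[-1])
    base = base.replace("v", "ü")  # 'v' is a common ASCII stand-in for 'ü'
    if tone not in (1, 2, 3, 4):
        return base  # neutral tone carries no mark
    if "a" in base:
        index = base.index("a")
    elif "e" in base:
        index = base.index("e")
    elif "ou" in base:
        index = base.index("o")
    else:
        positions = [i for i, ch in enumerate(base) if ch in TONE_MARKS]
        if not positions:
            return base
        index = positions[-1]  # otherwise the last vowel takes the mark
    marked = TONE_MARKS[base[index]][tone - 1]
    return base[:index] + marked + base[index + 1:]


numbered = "shang4-hai3"
print("-".join(numbered_to_marked(s) for s in numbered.split("-")))
```

In practice it is simpler to ask the library for the style you need; a helper like this is only useful when the numbered form is all that is available.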