NLP Preprocessing: Traditional/Simplified Chinese Conversion and Getting Pinyin

  Everyday Chinese NLP work often involves converting between traditional and simplified Chinese and annotating text with pinyin. This article shows how to do both.
  First, traditional/simplified conversion. It needs no extra installed Python module, only two Python source files: langconv.py and zh_wiki.py.

  Sample code (put langconv.py and zh_wiki.py in the same directory as the script):

from langconv import Converter  # langconv.py and zh_wiki.py sit next to this script

# Convert traditional Chinese to simplified Chinese
def cht_2_chs(line):
    line = Converter('zh-hans').convert(line)
    return line

# Sample input: a traditional Chinese news paragraph (rendered here in English)
line_cht = '''
Taipei Mayor Ko Wen-je went live on Facebook today. He first told netizens that he will visit 4 cities in the eastern United States from March 16 to 24, then announced without warning that he will first visit Israel on February 23 and plans to stay 4-5 days. Although he stressed that Taipei City and Israel already have exchanges on information security, and that he can also visit local cities and innovative industries there, Ko also said, "I also want to see how a small country survives in such a harsh environment. What is its secret?" The remark has been read as setting his sights higher, squarely on the presidency.
'''
line_cht = line_cht.replace('\n', '')  # drop the line breaks around the paragraph
ret_chs = cht_2_chs(line_cht)
print(ret_chs)

# Convert simplified Chinese to traditional Chinese
def chs_2_cht(sentence):
    sentence = Converter('zh-hant').convert(sentence)
    return sentence

line_chs = 'melancholy Taiwan tortoise'  # a simplified Chinese phrase (rendered here in English)
line_cht = chs_2_cht(line_chs)
print(line_cht)

  Output:

Taipei Mayor Ko Wen-je went live on Facebook today. He first told netizens that he will visit 4 cities in the eastern United States from March 16 to 24, then announced without warning that he will first visit Israel on February 23 and plans to stay 4-5 days. Although he stressed that Taipei City and Israel already have exchanges on information security, and that he can also visit local cities and innovative industries there, Ko also said, "I also want to see how a small country survives in such a harsh environment. What is its secret?" The remark has been read as setting his sights higher, squarely on the presidency.
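  Roughly speaking, langconv converts text by looking it up in the mapping tables in zh_wiki.py, preferring longer phrase entries over single characters. Phrases matter because one simplified character can correspond to several traditional ones: 发 is 發 in 发展 but 髮 in 头发. A minimal sketch of that greedy longest-match idea, assuming a tiny hypothetical table (SIMP_TO_TRAD and convert are illustrative names, not langconv's API):

```python
# Hypothetical mini table; a real table such as the ones in zh_wiki.py is far larger.
SIMP_TO_TRAD = {
    "头": "頭",
    "发": "發",      # default: 发展 -> 發展
    "头发": "頭髮",  # phrase entry overrides the default for "hair"
    "国": "國",
}


def convert(text, table):
    """Greedy longest-match conversion: at each position, try the longest key first."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No entry matched: copy the character through unchanged
            out.append(text[i])
            i += 1
    return "".join(out)


print(convert("头发发展", SIMP_TO_TRAD))  # 頭髮發展
```

  Without the 头发 phrase entry, the same input would come out as 頭發發展, which is wrong for "hair"; this is why conversion tables carry phrases and not just characters.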

  Next, getting the pinyin of Chinese characters. Python modules for this include xpinyin and pypinyin; this article uses xpinyin to show how to get the pinyin of Chinese characters. Sample code:

from xpinyin import Pinyin

p = Pinyin()

# the default separator is '-'
print(p.get_pinyin("上海"))  # 上海 = Shanghai

# show tones, as tone marks or as tone numbers
print(p.get_pinyin("上海", tone_marks='marks'))
print(p.get_pinyin("上海", tone_marks='numbers'))

# no separator
print(p.get_pinyin("上海", ''))
# use a space as the separator
print(p.get_pinyin("上海", ' '))

# get pinyin initials
print(p.get_initial("上"))
print(p.get_initials("上海"))
print(p.get_initials("上海", ''))
print(p.get_initials("上海", ' '))

  Output:

shang-hai
shàng-hǎi
shang4-hai3
shanghai
shang hai
S
S-H
SH
S H
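  The 'marks' and 'numbers' options print two spellings of the same syllables. A minimal sketch of turning the numbered form into the marked form with the standard pinyin tone-placement rule (a or e takes the mark, o takes it in ou, otherwise the last vowel). numbered_to_marked and convert_pinyin are hypothetical helpers, not part of xpinyin, and they assume lowercase input with v standing for ü:

```python
# Tone-marked vowels, indexed by tone 1-4
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def numbered_to_marked(syllable):
    """Convert one lowercase numbered syllable, e.g. 'shang4', to 'shàng'."""
    if not syllable or not syllable[-1].isdigit():
        return syllable  # no tone number: leave unchanged
    tone = int(syllable[-1])
    base = syllable[:-1].replace("v", "ü")
    if tone not in (1, 2, 3, 4):
        return base  # 5 (or 0) is the neutral tone, which has no diacritic
    # Placement rule: 'a' or 'e' takes the mark; in 'ou' the 'o' does;
    # otherwise the last vowel of the syllable does.
    if "a" in base:
        idx = base.index("a")
    elif "e" in base:
        idx = base.index("e")
    elif "ou" in base:
        idx = base.index("o")
    else:
        vowels = [i for i, ch in enumerate(base) if ch in TONE_MARKS]
        if not vowels:
            return base  # nothing to mark
        idx = vowels[-1]
    return base[:idx] + TONE_MARKS[base[idx]][tone - 1] + base[idx + 1:]


def convert_pinyin(text, splitter="-"):
    """Convert a splitter-joined string such as 'shang4-hai3' syllable by syllable."""
    return splitter.join(numbered_to_marked(s) for s in text.split(splitter))


print(convert_pinyin("shang4-hai3"))  # shàng-hǎi
```

  The "last vowel" fallback is what puts the mark on the second vowel in pairs like iu and ui (liú, guì).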

 


Source: www.cnblogs.com/chen8023miss/p/11446959.html