[Converting Chinese characters to pinyin in Java]

1. Introduction to JPinyin

JPinyin is an open-source Java class library for converting Chinese characters to pinyin. It builds on the functionality of pinyin4j and adds several improvements.

 

【Main features of JPinyin】

1. Accurate and complete character library;

Of the 20903 Chinese characters made up of the Unicode range 4E00-9FA5 plus 3007 (〇), JPinyin can convert all except 46 variant characters (variant characters have no standard pinyin);

2. Fast pinyin conversion;

In testing, JPinyin took about 100 milliseconds to convert the 20902 Chinese characters in the Unicode range 4E00-9FA5;

3. Multiple pinyin output formats;

JPinyin supports several pinyin output formats: with tone marks, without tones, with tone numbers, and first letters only;

4. Recognition of common polyphonic characters;

JPinyin recognizes common polyphonic characters from their context, including phrases, idioms, place names, etc.;

5. Conversion between simplified and traditional Chinese.
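The two character counts quoted in features 1 and 2 can be checked with a little arithmetic. The sketch below is only an illustration: the class and method names (CoverageSketch, isCovered, coveredCount) are made up for this example and are not part of JPinyin.

```java
// Illustrative only: these names are hypothetical, not JPinyin's API.
class CoverageSketch {
    static final int RANGE_START = 0x4E00; // first code point of the range
    static final int RANGE_END = 0x9FA5;   // last code point of the range
    static final int LING = 0x3007;        // the character for zero, outside the range

    // True if the code point belongs to the coverage described in feature 1.
    static boolean isCovered(int codePoint) {
        return (codePoint >= RANGE_START && codePoint <= RANGE_END) || codePoint == LING;
    }

    // Size of the range plus the single extra character U+3007.
    static int coveredCount() {
        return (RANGE_END - RANGE_START + 1) + 1;
    }

    public static void main(String[] args) {
        System.out.println(RANGE_END - RANGE_START + 1); // 20902
        System.out.println(coveredCount());              // 20903
        System.out.println(isCovered('\u4E2D'));         // true  (U+4E2D, "middle")
        System.out.println(isCovered('A'));              // false
    }
}
```

So the 20902 figure is the range alone, and 20903 is the range plus 〇.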

 

JPinyin works by storing Chinese characters, phrases, their pinyin, and the simplified/traditional character mappings in a data dictionary (dic), and then querying that dictionary in code to convert characters and phrases to pinyin and to convert between simplified and traditional Chinese. The entire data dictionary is customizable.
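The dictionary idea can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not JPinyin's actual code: the class DictionaryPinyinSketch, its two maps, the four-character phrase limit, and the handful of toneless entries are all invented for illustration. It tries the phrase dictionary first (longest match), which is one simple way to pick the right reading of a polyphonic character, and falls back to a single-character lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of dictionary-driven conversion; not JPinyin's real classes.
class DictionaryPinyinSketch {
    // Single-character dictionary: character -> default pinyin.
    static final Map<String, String> CHAR_DICT = new HashMap<>();
    // Phrase dictionary: phrase -> pinyin of each character in context.
    static final Map<String, String[]> PHRASE_DICT = new HashMap<>();
    static final int MAX_PHRASE_LEN = 4; // assumed upper bound on phrase length

    static {
        CHAR_DICT.put("\u94F6", "yin");  // U+94F6 (silver)
        CHAR_DICT.put("\u884C", "xing"); // U+884C, default reading "xing"
        CHAR_DICT.put("\u4EBA", "ren");  // U+4EBA (person)
        // In the word for "bank", U+884C is read "hang" instead of "xing".
        PHRASE_DICT.put("\u94F6\u884C", new String[] {"yin", "hang"});
    }

    static String convert(String text, String separator) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            String piece = null;
            int consumed = 1;
            // Try the longest candidate phrase starting at position i first.
            for (int len = Math.min(MAX_PHRASE_LEN, text.length() - i); len >= 2; len--) {
                String[] pinyin = PHRASE_DICT.get(text.substring(i, i + len));
                if (pinyin != null) {
                    piece = String.join(separator, pinyin);
                    consumed = len;
                    break;
                }
            }
            if (piece == null) {
                // Fall back to the single-character dictionary;
                // characters without an entry are passed through unchanged.
                String ch = text.substring(i, i + 1);
                piece = CHAR_DICT.getOrDefault(ch, ch);
            }
            if (out.length() > 0) {
                out.append(separator);
            }
            out.append(piece);
            i += consumed;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("\u94F6\u884C", " ")); // yin hang
        System.out.println(convert("\u884C\u4EBA", " ")); // xing ren
    }
}
```

Because the behaviour lives entirely in the two maps, changing or adding entries changes the output without touching the code, which is the sense in which such a dictionary is customizable.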

 

Core class description:

JPinyin has four classes:

ChineseHelper.java: simplified/traditional Chinese character conversion class

PinyinFormat.java: pinyin format class

PinyinHelper.java: Chinese character to pinyin conversion class

PinyinResource.java: resource file loading class

 

2. Sample code:

import com.github.stuxuhai.jpinyin.PinyinFormat;
import com.github.stuxuhai.jpinyin.PinyinHelper;

/**
 * Demo of JPinyin's three tone output formats.
 *
 * Maven dependency:
 * <dependency>
 *     <groupId>com.github.stuxuhai</groupId>
 *     <artifactId>jpinyin</artifactId>
 *     <version>1.1.8</version>
 * </dependency>
 *
 * @author Administrator
 */
public class JPinDemo {

    /**
     * @param args command-line arguments (unused)
     * @throws Exception if the conversion fails
     */
    public static void main(String[] args) throws Exception {
        // Text to convert. JPinyin only looks up Chinese characters,
        // so use Chinese text here to see a real conversion.
        String str = "DRDS( Distributed Relational Database Service)" +
                "It is a distributed database product independently developed by Alibaba to solve the bottleneck problem of stand-alone database services." +
                " DRDS is highly compatible with MySQL protocol and syntax, supports automatic horizontal splitting, smooth expansion, elastic expansion, " +
                "Transparent read-write separation, distributed transactions, and the ability to manage and control the entire life cycle of distributed databases." +
                "DRDS, formerly known as Taobao TDDL, is the preferred component for nearly a thousand core applications";

        // With tone marks
        String res = PinyinHelper.convertToPinyinString(str, " ", PinyinFormat.WITH_TONE_MARK);
        System.out.println(res);

        // Without tones
        res = PinyinHelper.convertToPinyinString(str, " ", PinyinFormat.WITHOUT_TONE);
        System.out.println(res);

        // With tone numbers. Mandarin has four tones:
        // yinping (first tone), written "ˉ", e.g. lā;
        // yangping (second tone), written "ˊ", e.g. lá;
        // shangsheng (third tone), written "ˇ", e.g. lǎ;
        // qusheng (fourth tone), written "ˋ", e.g. là.
        res = PinyinHelper.convertToPinyinString(str, " ", PinyinFormat.WITH_TONE_NUMBER);
        System.out.println(res);
    }
}
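To make the relationship between the three output formats concrete, here is a small self-contained sketch that derives the toneless and tone-number forms from a tone-marked syllable. It does not use JPinyin and is not how JPinyin is implemented; the class ToneFormatSketch, its lookup table, and the choice to keep ü as-is and to append no digit for the neutral tone are all assumptions of this example.

```java
// Hypothetical sketch: derive "without tone" and "tone number" forms
// from a tone-marked pinyin syllable. Not part of JPinyin.
class ToneFormatSketch {
    // Each row: the plain vowel followed by its forms for tones 1 to 4.
    static final String[] TONE_TABLE = {
        "a\u0101\u00E1\u01CE\u00E0",      // a with tones 1-4
        "e\u0113\u00E9\u011B\u00E8",      // e with tones 1-4
        "i\u012B\u00ED\u01D0\u00EC",      // i with tones 1-4
        "o\u014D\u00F3\u01D2\u00F2",      // o with tones 1-4
        "u\u016B\u00FA\u01D4\u00F9",      // u with tones 1-4
        "\u00FC\u01D6\u01D8\u01DA\u01DC"  // u-umlaut with tones 1-4
    };

    // Strips the tone mark, e.g. tone-marked "la" (any tone) -> "la".
    static String withoutTone(String syllable) {
        return convert(syllable, false);
    }

    // Strips the tone mark and appends the tone number, e.g. third-tone "la" -> "la3".
    // A syllable with no tone mark (neutral tone) gets no digit.
    static String withToneNumber(String syllable) {
        return convert(syllable, true);
    }

    private static String convert(String syllable, boolean appendNumber) {
        StringBuilder plain = new StringBuilder();
        int tone = 0; // 0 means no tone mark was found
        for (int i = 0; i < syllable.length(); i++) {
            char c = syllable.charAt(i);
            char replacement = c;
            for (String row : TONE_TABLE) {
                int idx = row.indexOf(c);
                if (idx > 0) { // index 0 is the plain vowel itself
                    replacement = row.charAt(0);
                    tone = idx;
                    break;
                }
            }
            plain.append(replacement);
        }
        if (appendNumber && tone > 0) {
            plain.append(tone);
        }
        return plain.toString();
    }

    public static void main(String[] args) {
        System.out.println(withToneNumber("l\u01CE")); // la3
        System.out.println(withoutTone("l\u00E0"));    // la
    }
}
```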

3. Verification:



 Note: When a conversion is inaccurate or a new word is not included in the dictionary, you can add your own entries in the dic directory under the data directory. For example, ICBC and CCB are not converted accurately.
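A rough picture of what such a customization amounts to is sketched below. The "word=pinyin" line format, the '#' comment rule, and the class CustomDictSketch are assumptions made for this example, so check the dictionary files shipped with your JPinyin version for the real layout. The entries shown (a two-character abbreviation whose second character should be read "hang") are illustrative only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of user-defined dictionary entries overriding built-in ones.
class CustomDictSketch {
    // Parses lines of the assumed form "word=pinyin".
    // Blank lines, lines starting with '#', and lines without '=' are skipped.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> dict = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                continue; // malformed line: no key before '='
            }
            dict.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return dict;
    }

    // Returns a new map in which user entries take precedence over built-in ones.
    static Map<String, String> merge(Map<String, String> builtIn, Map<String, String> user) {
        Map<String, String> merged = new LinkedHashMap<>(builtIn);
        merged.putAll(user);
        return merged;
    }

    public static void main(String[] args) {
        // Built-in entry with the wrong reading for this abbreviation (illustrative).
        Map<String, String> builtIn = parse(Arrays.asList("\u5DE5\u884C=gong xing"));
        // User file correcting it and adding a second abbreviation.
        Map<String, String> user = parse(Arrays.asList(
                "# custom entries",
                "\u5DE5\u884C=gong hang",
                "\u5EFA\u884C=jian hang"));
        Map<String, String> merged = merge(builtIn, user);
        System.out.println(merged.get("\u5DE5\u884C")); // gong hang
        System.out.println(merged.size());              // 2
    }
}
```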

 

 
