Understand Pinyin4j in One Article: A Getting-Started Introduction

1. Overview

pinyin4j is a popular open-source Java library for converting Chinese characters to pinyin. Its output format is customizable (letter case, tone style, and how ü is written). Besides Hanyu Pinyin, it can also produce other romanization systems such as Tongyong Pinyin, Wade-Giles, MPS2, Yale, and Gwoyeu Romatzyh.

Official website: http://pinyin4j.sourceforge.net/
Online documentation: http://pinyin4j.sourceforge.net/pinyin4j-doc/

2. Main features

  • Handles polyphonic characters: every reading of a character with multiple pronunciations is returned
  • Configurable pinyin output: letter case, tone style (tone marks, tone numbers, or no tones), and how ü is written
  • Converts both Simplified and Traditional Chinese characters to pinyin
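You can check the first feature directly. The following is a minimal sketch that assumes the pinyin4j dependency from the configuration section is on the classpath. PolyphoneDemo is a hypothetical class name used only for illustration. The sketch calls the single-argument PinyinHelper.toHanyuPinyinStringArray(char). That method returns every reading of a character in pinyin4j's raw format (lowercase, tone numbers, ü written as "u:"), or null when the character has no entry.

```java
import net.sourceforge.pinyin4j.PinyinHelper;

import java.util.Arrays;

// Hypothetical demo class: lists every reading pinyin4j knows for one character.
public class PolyphoneDemo {

    // Raw readings with tone numbers, or null if the character has no pinyin
    // in pinyin4j's table (for example, a Latin letter).
    public static String[] readings(char c) {
        return PinyinHelper.toHanyuPinyinStringArray(c);
    }

    public static void main(String[] args) {
        // 重 is polyphonic (zhong / chong), so more than one reading is returned
        System.out.println(Arrays.toString(readings('重')));
        // Not a Chinese character: Arrays.toString(null) prints "null"
        System.out.println(Arrays.toString(readings('A')));
    }
}
```

The order of the readings follows pinyin4j's internal table. Code that simply takes element 0 is relying on that order.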

3. Code examples

1. Dependency configuration (Maven)

<!-- Import pinyin4j -->
<dependency>
    <groupId>com.belerweb</groupId>
    <artifactId>pinyin4j</artifactId>
    <version>2.5.0</version>
</dependency>

2. Commonly used classes


import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

PinyinHelper: provides static utility methods that convert Chinese characters (Simplified and Traditional) to several romanization systems.
HanyuPinyinOutputFormat: defines how Hanyu Pinyin is output, combining the case, tone, and ü settings below.
HanyuPinyinCaseType: options for the letter case of the output (LOWERCASE, UPPERCASE).
HanyuPinyinToneType: options for how tones are output (WITH_TONE_NUMBER, WITHOUT_TONE, WITH_TONE_MARK).
HanyuPinyinVCharType: options for how 'ü' is output (WITH_U_AND_COLON, WITH_V, WITH_U_UNICODE).
BadHanyuPinyinOutputFormatCombination: checked exception thrown when the chosen options conflict, e.g. WITH_TONE_MARK combined with any 'ü' option other than WITH_U_UNICODE.
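The sketch below shows how these option types combine. It assumes pinyin4j 2.5.0 is on the classpath. FormatDemo and its helpers format, convert, and rejects are hypothetical names for illustration. A new HanyuPinyinOutputFormat defaults to LOWERCASE, WITH_TONE_NUMBER, and WITH_U_AND_COLON. Tone marks can only be placed on ü when WITH_U_UNICODE is used, so pinyin4j rejects WITH_TONE_MARK together with WITH_V or WITH_U_AND_COLON.

```java
import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

import java.util.Arrays;

public class FormatDemo {

    // Hypothetical helper: builds a format from the three option types.
    static HanyuPinyinOutputFormat format(HanyuPinyinCaseType caseType,
                                          HanyuPinyinToneType toneType,
                                          HanyuPinyinVCharType vCharType) {
        HanyuPinyinOutputFormat f = new HanyuPinyinOutputFormat();
        f.setCaseType(caseType);
        f.setToneType(toneType);
        f.setVCharType(vCharType);
        return f;
    }

    // Hypothetical helper: all readings of ch in format f (null if ch has no pinyin).
    static String[] convert(char ch, HanyuPinyinOutputFormat f)
            throws BadHanyuPinyinOutputFormatCombination {
        return PinyinHelper.toHanyuPinyinStringArray(ch, f);
    }

    // True if pinyin4j rejects format f when converting ch.
    static boolean rejects(char ch, HanyuPinyinOutputFormat f) {
        try {
            convert(ch, f);
            return false;
        } catch (BadHanyuPinyinOutputFormatCombination e) {
            return true;
        }
    }

    public static void main(String[] args) throws BadHanyuPinyinOutputFormatCombination {
        char nv = '女'; // its reading contains the special vowel ü

        // Tone numbers, ü written as v
        System.out.println(Arrays.toString(convert(nv, format(HanyuPinyinCaseType.LOWERCASE,
                HanyuPinyinToneType.WITH_TONE_NUMBER, HanyuPinyinVCharType.WITH_V))));

        // Tone marks with ü written directly: the only ü option allowed with tone marks
        System.out.println(Arrays.toString(convert(nv, format(HanyuPinyinCaseType.LOWERCASE,
                HanyuPinyinToneType.WITH_TONE_MARK, HanyuPinyinVCharType.WITH_U_UNICODE))));

        // Tone marks plus WITH_V: rejected with BadHanyuPinyinOutputFormatCombination
        System.out.println(rejects(nv, format(HanyuPinyinCaseType.LOWERCASE,
                HanyuPinyinToneType.WITH_TONE_MARK, HanyuPinyinVCharType.WITH_V)));
    }
}
```

Setting only WITH_TONE_MARK on a default format fails for the same reason, because the default ü option is WITH_U_AND_COLON.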

3. Conversion code example

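import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

// Utility class that other code can call directly to do the conversion
class Pinyin4jUtils {

    /**
     * Gets the first letter of the pinyin of each Chinese character.
     * Non-Chinese characters are copied unchanged.
     *
     * @param hanyu the input text
     * @return the initials, e.g. "ZGGCDWS" for "中国共产党万岁"
     */
    public static String getFirstPinYin(String hanyu) {
        HanyuPinyinOutputFormat format = new HanyuPinyinOutputFormat();
        format.setCaseType(HanyuPinyinCaseType.UPPERCASE);
        format.setToneType(HanyuPinyinToneType.WITHOUT_TONE);

        StringBuilder firstPinyin = new StringBuilder();
        char[] hanyuArr = hanyu.trim().toCharArray();
        try {
            for (char c : hanyuArr) {
                String[] pys = null;
                // Only look up characters in the basic CJK range U+4E00..U+9FA5
                if (Character.toString(c).matches("[\\u4E00-\\u9FA5]+")) {
                    pys = PinyinHelper.toHanyuPinyinStringArray(c, format);
                }
                if (pys != null && pys.length > 0) {
                    // Polyphonic characters return several readings; take the first
                    firstPinyin.append(pys[0].charAt(0));
                } else {
                    // Not a Chinese character, or no reading in pinyin4j's table: keep as-is
                    firstPinyin.append(c);
                }
            }
        } catch (BadHanyuPinyinOutputFormatCombination e) {
            e.printStackTrace();
        }
        return firstPinyin.toString();
    }

    /**
     * Converts Chinese characters to full pinyin using the output format configured below.
     * Each syllable (or non-Chinese character) is followed by a space.
     *
     * @param hanzi the input text
     * @return the pinyin with tone marks, e.g. "zhōng guó ..."
     */
    public static String getAllPinyin(String hanzi) {
        // Output format settings
        HanyuPinyinOutputFormat format = new HanyuPinyinOutputFormat();
        /*
         * Letter case
         *
         * LOWERCASE: lowercase output
         * UPPERCASE: uppercase output
         */
        format.setCaseType(HanyuPinyinCaseType.LOWERCASE);

        /*
         * Tone output
         *
         * WITH_TONE_MARK: tone marks on the vowels (requires WITH_U_UNICODE,
         *                 otherwise BadHanyuPinyinOutputFormatCombination is thrown)
         * WITH_TONE_NUMBER: tones as digits (1-4, and 5 for the neutral tone)
         * WITHOUT_TONE: no tones
         */
        format.setToneType(HanyuPinyinToneType.WITH_TONE_MARK);

        /*
         * How the special vowel ü is written
         *
         * WITH_V: write ü as v
         * WITH_U_AND_COLON: write ü as "u:"
         * WITH_U_UNICODE: write ü directly
         */
        format.setVCharType(HanyuPinyinVCharType.WITH_U_UNICODE);

        char[] hanYuArr = hanzi.trim().toCharArray();
        StringBuilder pinYin = new StringBuilder();

        try {
            for (char c : hanYuArr) {
                String[] pys = null;
                // Check whether the character is a Chinese character
                if (Character.toString(c).matches("[\\u4E00-\\u9FA5]+")) {
                    pys = PinyinHelper.toHanyuPinyinStringArray(c, format);
                }
                if (pys != null && pys.length > 0) {
                    // Polyphonic characters return several readings; only the first is used here
                    pinYin.append(pys[0]).append(" ");
                } else {
                    pinYin.append(c).append(" ");
                }
            }
        } catch (BadHanyuPinyinOutputFormatCombination e) {
            e.printStackTrace();
        }
        return pinYin.toString();
    }
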
}
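Both utility methods use the regex [\u4E00-\u9FA5] to decide what counts as a Chinese character. That range covers only the basic CJK Unified Ideographs. It misses characters such as 〇 (U+3007), CJK Extension A (e.g. 㐀, U+3400), and supplementary characters stored as surrogate pairs. The following is a standard-library-only sketch of an alternative based on Java's Character.UnicodeScript. HanCheck is a hypothetical class name. It is one possible approach, not something pinyin4j provides.

```java
// Hypothetical demo: compares the article's range check with a Unicode-script check.
public class HanCheck {

    // The check used in Pinyin4jUtils: basic CJK range U+4E00..U+9FA5 only.
    static boolean isHanByRange(char c) {
        return c >= '\u4E00' && c <= '\u9FA5';
    }

    // Script-based check: true for any code point Unicode assigns to the Han script,
    // including CJK Extension A/B and characters such as 〇 (U+3007).
    static boolean isHanByScript(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    // Counts Han characters by walking code points, so a supplementary
    // character (stored as a surrogate pair) is counted once.
    static long countHan(String s) {
        return s.codePoints().filter(HanCheck::isHanByScript).count();
    }

    public static void main(String[] args) {
        for (char c : new char[] {'中', '〇', '㐀', 'A'}) {
            System.out.println(c + " range=" + isHanByRange(c) + " script=" + isHanByScript(c));
        }
        // The last two chars form one supplementary character (U+20000)
        System.out.println(countHan("中国A〇\uD840\uDC00"));
    }
}
```

A broader check does not guarantee that pinyin4j has a reading for the character, because toHanyuPinyinStringArray can still return null. The null check in the utility methods is therefore still needed.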

Usage example:

public class Pinyin4jDemo {

    public static void main(String[] args) {
        String str = "中国共产党万岁";
        System.out.println("First letters: " + Pinyin4jUtils.getFirstPinYin(str));
        System.out.println("Full pinyin: " + Pinyin4jUtils.getAllPinyin(str));
    }
}

Output:

First letters: ZGGCDWS
Full pinyin: zhōng guó gòng chǎn dǎng wàn suì 
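Pinyin4jUtils always takes the first reading (pys[0]). That works for this sentence, but it may not give the right reading for every polyphonic word. For example, 重 is read zhong or chong, so 重庆 (Chongqing) can come out wrong depending on which reading comes first in pinyin4j's table. When every possible spelling is needed (for example, to build search keys), one common approach is to take the cartesian product of the per-character readings. The sketch below does that. PinyinCombinations and its methods are hypothetical names, not pinyin4j API, and it assumes pinyin4j 2.5.0 is on the classpath.

```java
import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper: enumerates every pinyin spelling of a string with polyphonic characters.
public class PinyinCombinations {

    // Every way to pick one option per position, concatenated in order.
    static List<String> cartesian(List<List<String>> options) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (List<String> choices : options) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                for (String choice : choices) {
                    next.add(prefix + choice);
                }
            }
            result = next;
        }
        return result;
    }

    // Lowercase, toneless readings per character, with ü written as v.
    // Duplicates are removed because dropping tones can merge readings
    // (e.g. zhong1 and zhong4 both become zhong). Characters without a
    // reading in pinyin4j's table are kept unchanged.
    static List<List<String>> readingsPerChar(String text) {
        HanyuPinyinOutputFormat format = new HanyuPinyinOutputFormat();
        format.setCaseType(HanyuPinyinCaseType.LOWERCASE);
        format.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
        format.setVCharType(HanyuPinyinVCharType.WITH_V);

        List<List<String>> options = new ArrayList<>();
        for (char c : text.toCharArray()) {
            String[] pys;
            try {
                pys = PinyinHelper.toHanyuPinyinStringArray(c, format);
            } catch (BadHanyuPinyinOutputFormatCombination e) {
                // Not expected: this format does not use tone marks
                throw new IllegalStateException(e);
            }
            if (pys == null || pys.length == 0) {
                options.add(Collections.singletonList(String.valueOf(c)));
            } else {
                options.add(new ArrayList<>(new LinkedHashSet<>(Arrays.asList(pys))));
            }
        }
        return options;
    }

    public static void main(String[] args) {
        // 重 can be read zhong or chong, so both spellings of 重庆 should appear
        System.out.println(cartesian(readingsPerChar("重庆")));
    }
}
```

The number of spellings is the product of each character's reading count, so it grows quickly with length. This approach is usually reserved for short values such as names or place names.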


Source: blog.csdn.net/m0_37899908/article/details/131263874