Converting Chinese characters to pinyin in Java and checking whether a character is a Chinese character

First, converting Chinese characters to pinyin.
Add the pinyin4j dependency, which provides the PinyinHelper and HanyuPinyinOutputFormat classes used in the code:

<dependency>
    <groupId>com.belerweb</groupId>
    <artifactId>pinyin4j</artifactId>
    <version>2.5.1</version>
</dependency>

Create a static method toPinyin for the conversion. It can also be placed in a utility class and called from there:

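import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

public static String toPinyin(String chinese) {
    StringBuilder pinyinStr = new StringBuilder();
    char[] newChar = chinese.toCharArray();
    // Output lower-case pinyin without tone marks
    HanyuPinyinOutputFormat defaultFormat = new HanyuPinyinOutputFormat();
    defaultFormat.setCaseType(HanyuPinyinCaseType.LOWERCASE);
    defaultFormat.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
    for (int i = 0; i < newChar.length; i++) {
        if (newChar[i] > 128) {
            try {
                // Take the first reading; the array is null when the character has no pinyin
                pinyinStr.append(PinyinHelper.toHanyuPinyinStringArray(newChar[i], defaultFormat)[0]);
            } catch (BadHanyuPinyinOutputFormatCombination e) {
                e.printStackTrace();
            }
        } else {
            // Characters with a code of 128 or below are copied unchanged
            pinyinStr.append(newChar[i]);
        }
    }
    return pinyinStr.toString();
}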

The method call succeeds. In my case the pinyin also has to be upper case, with a space between the pinyin of each Chinese character. This is done as follows:

String name = "西青果颗粒藏青果颗";
String[] split = name.split("");
String finalPy = "";
for (String s : split) {
    // Upper-case the pinyin of each character and separate the entries with a space
    String s1 = toPinyin(s).toUpperCase(Locale.ROOT);
    finalPy = finalPy + " " + s1;
}
System.out.println(finalPy.trim());
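The same spacing logic can also be written with String.join, which avoids the leading space and the trim() call. The following is only a minimal sketch: the class SpacedPinyin and the method joinUpper are hypothetical names, and the converter is passed in as a parameter so that the snippet runs without pinyin4j on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

class SpacedPinyin {
    // Converts each character of the input with the given converter,
    // upper-cases the result, and joins the pieces with single spaces.
    // The converter stands in for toPinyin (an assumption for this sketch).
    static String joinUpper(String text, Function<String, String> converter) {
        List<String> parts = new ArrayList<>();
        for (String s : text.split("")) {
            if (s.isEmpty()) {
                continue; // split("") yields one empty string for empty input
            }
            parts.add(converter.apply(s).toUpperCase(Locale.ROOT));
        }
        return String.join(" ", parts);
    }
}
```

With pinyin4j available, the converter argument would be a reference to the toPinyin method defined earlier.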


Everything works so far, but if the string contains special symbols outside the ASCII range, such as full-width Chinese punctuation, an error is reported: PinyinHelper.toHanyuPinyinStringArray returns null for a character that has no pinyin, so indexing the result with [0] throws a NullPointerException.
One fix is to replace the Chinese symbols with their English equivalents, using the replaceAll method of String.
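The original screenshot of the replacement code is not available, so the following is only a sketch of the idea. The class name PunctuationNormalizer and the list of symbols are assumptions, and the list is illustrative rather than complete. It uses String.replace instead of replaceAll because the symbols are literals, so no regular expression is needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PunctuationNormalizer {
    // Full-width (Chinese) punctuation mapped to ASCII equivalents.
    // Written as Unicode escapes so the file compiles under any source encoding.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("\uFF0C", ","); // full-width comma
        REPLACEMENTS.put("\u3002", "."); // ideographic full stop
        REPLACEMENTS.put("\uFF08", "("); // full-width left parenthesis
        REPLACEMENTS.put("\uFF09", ")"); // full-width right parenthesis
        REPLACEMENTS.put("\uFF1A", ":"); // full-width colon
        REPLACEMENTS.put("\uFF1B", ";"); // full-width semicolon
    }

    // Returns the text with every listed full-width symbol replaced.
    static String normalize(String text) {
        String result = text;
        for (Map.Entry<String, String> e : REPLACEMENTS.entrySet()) {
            // String.replace treats both arguments as literal text
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

Every new kind of symbol needs another entry in the map, which is the maintenance cost this approach carries.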
Later, various Roman symbols also appeared in the data, and replacing symbols one by one became too troublesome. A more general approach is to check whether each character is a Chinese character: if it is, call toPinyin to convert it; if it is not, leave it unchanged. There are many ways to do this check. The one used here tests the Unicode range:

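Pattern p2 = Pattern.compile("[\u4e00-\u9fa5]");
Matcher m2 = p2.matcher(s);
// Convert s only if it contains a Chinese character; otherwise keep it unchanged
String py = m2.find() ? toPinyin(s) : s;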
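Put together, the per-character check might look like the sketch below. ChineseOnlyConverter, isChinese and convert are hypothetical names. The converter is passed in as a parameter, standing in for toPinyin, so that the example runs without pinyin4j.

```java
import java.util.function.Function;
import java.util.regex.Pattern;

class ChineseOnlyConverter {
    // Same range as in the text: U+4E00 to U+9FA5
    private static final Pattern CHINESE = Pattern.compile("[\u4e00-\u9fa5]");

    // True if the string contains at least one character in the range.
    static boolean isChinese(String s) {
        return CHINESE.matcher(s).find();
    }

    // Applies the converter only to Chinese characters;
    // every other character is appended unchanged.
    static String convert(String text, Function<String, String> converter) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            String s = String.valueOf(c);
            out.append(isChinese(s) ? converter.apply(s) : s);
        }
        return out.toString();
    }
}
```

Because characters outside the range never reach the converter, full-width punctuation and Roman symbols no longer trigger the error described earlier.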

With this check, any symbol that is not a Chinese character is output as it is.
Other common code ranges are:

Type                 Range
Chinese character    [0x4e00, 0x9fa5]
Digit                [0x30, 0x39]
Lowercase letter     [0x61, 0x7a]
Uppercase letter     [0x41, 0x5a]
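The ranges above translate directly into comparisons on a char value. A minimal sketch, with CharKind and kindOf as hypothetical names:

```java
class CharKind {
    // Classifies a character using the ranges listed in the table.
    static String kindOf(char c) {
        if (c >= 0x4e00 && c <= 0x9fa5) return "chinese";
        if (c >= 0x30 && c <= 0x39) return "digit";
        if (c >= 0x61 && c <= 0x7a) return "lowercase";
        if (c >= 0x41 && c <= 0x5a) return "uppercase";
        return "other";
    }
}
```

Anything outside the four ranges, including full-width punctuation, falls into the "other" case.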

Reference: The method of judging whether a string is Chinese in Java


Origin blog.csdn.net/weixin_42260782/article/details/131963870