The first is to convert Chinese characters to pinyin:
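Import the dependency. The classes used below (PinyinHelper, HanyuPinyinOutputFormat and so on) belong to the pinyin4j library, so that is the artifact to add:
<dependency>
    <groupId>com.belerweb</groupId>
    <artifactId>pinyin4j</artifactId>
    <version>2.5.0</version>
</dependency>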
Create a static method toPinyin for the conversion. It can also be placed in a utility class and called from there:
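import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

public static String toPinyin(String chinese) {
    StringBuilder pinyinStr = new StringBuilder();
    char[] newChar = chinese.toCharArray();
    // Lowercase output without tone marks or tone numbers
    HanyuPinyinOutputFormat defaultFormat = new HanyuPinyinOutputFormat();
    defaultFormat.setCaseType(HanyuPinyinCaseType.LOWERCASE);
    defaultFormat.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
    for (int i = 0; i < newChar.length; i++) {
        if (newChar[i] > 128) {
            try {
                // Take the first reading of a polyphonic character.
                // The returned array is null when the character has no pinyin.
                pinyinStr.append(PinyinHelper.toHanyuPinyinStringArray(newChar[i], defaultFormat)[0]);
            } catch (BadHanyuPinyinOutputFormatCombination e) {
                e.printStackTrace();
            }
        } else {
            // Characters in the ASCII range are copied unchanged
            pinyinStr.append(newChar[i]);
        }
    }
    return pinyinStr.toString();
}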
Calling the method works. In my case the pinyin must be uppercase, with a space between the pinyin of adjacent Chinese characters, which is done as follows:
String name = "西青果颗粒藏青果颗";
String[] split = name.split("");
String finalPy = "";
for (String s : split) {
    String s1 = toPinyin(s).toUpperCase(Locale.ROOT);
    finalPy = finalPy + " " + s1;
}
System.out.println(finalPy.trim());
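Everything works up to this point, but if the string contains non-English special symbols (for example full-width Chinese punctuation), an error is reported. A likely cause is that PinyinHelper.toHanyuPinyinStringArray returns null for a character that has no pinyin reading, so indexing the result with [0] throws a NullPointerException.
One fix is to replace the various Chinese symbols with their English counterparts before converting, using the replaceAll method of String.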
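As a minimal sketch of this replacement step: the class name PunctuationNormalizer and the mapping below are illustrative assumptions, not the original code, and the mapping is only a small subset of the full-width symbols that can occur. Because every substitution is literal, the sketch uses String.replace rather than replaceAll, which avoids regex handling altogether. The symbols are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps a few full-width symbols to ASCII equivalents.
final class PunctuationNormalizer {
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("\uFF0C", ","); // full-width comma
        REPLACEMENTS.put("\u3002", "."); // ideographic full stop
        REPLACEMENTS.put("\uFF08", "("); // full-width left parenthesis
        REPLACEMENTS.put("\uFF09", ")"); // full-width right parenthesis
        REPLACEMENTS.put("\uFF1A", ":"); // full-width colon
        REPLACEMENTS.put("\uFF1B", ";"); // full-width semicolon
        REPLACEMENTS.put("\uFF01", "!"); // full-width exclamation mark
        REPLACEMENTS.put("\uFF1F", "?"); // full-width question mark
    }

    // Returns a copy of input with every mapped symbol replaced by its ASCII form.
    static String normalize(String input) {
        String result = input;
        for (Map.Entry<String, String> e : REPLACEMENTS.entrySet()) {
            // String.replace substitutes literal text, not a regular expression
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Two Chinese characters wrapped in full-width parentheses
        System.out.println(normalize("\uFF08\u897F\u9752\uFF09"));
    }
}
```

The normalized string can then be passed through the loop shown earlier. The drawback is that every new symbol needs a new map entry.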
Later, various Roman symbols turned up as well, and replacing each one by hand became too troublesome. A more general approach is to check whether each character is a Chinese character: if it is, call toPinyin to convert it; if not, leave it unchanged. There are many ways to do this check. One of them is to test against a Unicode range:
Pattern p2 = Pattern.compile("[\u4e00-\u9fa5]");
Matcher m2 = p2.matcher(s);
// m2.find() is true when s contains a character in the range U+4E00 to U+9FA5
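Put together, the per-character check might look like the sketch below. The class name SelectivePinyin is hypothetical, and the converter is passed in as a function so that the example compiles without pinyin4j. In real use it would be the toPinyin method from earlier. Joining every piece with a single space is an assumption carried over from the earlier loop.

```java
import java.util.Locale;
import java.util.StringJoiner;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical helper: converts only Chinese characters and keeps everything else.
final class SelectivePinyin {
    // Same range as in the text: U+4E00 to U+9FA5
    private static final Pattern CHINESE = Pattern.compile("[\u4e00-\u9fa5]");

    // converter is assumed to turn one Chinese character into its pinyin.
    static String convert(String name, Function<String, String> converter) {
        StringJoiner joined = new StringJoiner(" ");
        for (String s : name.split("")) {
            if (s.isEmpty()) {
                continue; // splitting an empty string yields one empty element
            }
            if (CHINESE.matcher(s).matches()) {
                joined.add(converter.apply(s).toUpperCase(Locale.ROOT));
            } else {
                joined.add(s); // symbols, digits and letters pass through unchanged
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Placeholder converter for demonstration only; it is not a real pinyin lookup.
        Function<String, String> placeholder = s -> "py";
        System.out.println(convert("\u897F\uFF08a\uFF09\u9752", placeholder));
    }
}
```

Since toPinyin is never called for a character outside the range, the null result for symbols without a pinyin reading no longer matters.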
With this check, every symbol is output unchanged, whatever it is. Other commonly used encoding ranges are:
Type | Range |
---|---|
Chinese character | [0x4e00,0x9fa5] |
Digit | [0x30,0x39] |
Lowercase letter | [0x61,0x7a] |
Uppercase letter | [0x41,0x5a] |
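The ranges in the table can also be tested with plain numeric comparisons instead of a regular expression. The following is a minimal sketch. The class name CharRanges and the labels returned by classify are made up for illustration.

```java
// Hypothetical helper built directly from the ranges in the table.
final class CharRanges {
    static boolean isChinese(char c)   { return c >= 0x4e00 && c <= 0x9fa5; }
    static boolean isDigit(char c)     { return c >= 0x30 && c <= 0x39; }
    static boolean isLowerCase(char c) { return c >= 0x61 && c <= 0x7a; }
    static boolean isUpperCase(char c) { return c >= 0x41 && c <= 0x5a; }

    // Returns a label for the first matching range, or "other" if none matches.
    static String classify(char c) {
        if (isChinese(c))   return "chinese";
        if (isDigit(c))     return "digit";
        if (isLowerCase(c)) return "lowercase";
        if (isUpperCase(c)) return "uppercase";
        return "other";
    }

    public static void main(String[] args) {
        // One sample per range, plus a full-width comma that matches none of them
        for (char c : new char[] {'\u897f', '7', 'a', 'Z', '\uff0c'}) {
            System.out.println(Integer.toHexString(c) + " -> " + classify(c));
        }
    }
}
```

Note that [0x4e00,0x9fa5] covers the basic CJK Unified Ideographs range only. Characters from the later extension blocks fall outside it and would be classified as "other" here.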
Reference link: The method of judging whether a string is Chinese in java