C# Chinese characters to Pinyin (supports polyphonic characters)

  For an earlier project I needed to convert Chinese characters to full pinyin and to first-letter abbreviations (首拼) so that records could be queried by either. This seemed like a well-solved problem, so I searched for existing code and found the following two articles.

     1. C# Chinese characters to Pinyin (supports all Chinese characters in the GB2312 character set)

     2. [Dry goods] JS version of the ultimate solution for converting between Chinese characters and pinyin, with a simple JS pinyin input method

  Thanks to both bloggers: their articles are fairly complete and detailed, and both provide source code that you can refer to.

  Given the needs of my interface, I started from the first article. Its source code basically covers converting Chinese characters to pinyin, and other special characters can be added as needed. Its drawback is that it does not support polyphonic characters (characters with more than one reading). Because my queries had to handle polyphonic characters, I looked at other articles but found nothing ready-made (perhaps my searching skills are just poor). Then I found that Microsoft already provides the Microsoft Visual Studio International Pack for converting Chinese characters to pinyin, and it is quite powerful. So I gave it a try.

First, add a reference to the package through NuGet.

 Search for PinYinConverter.

Simple demo

It is very simple to use: the ChineseChar class does the conversion.

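            // ChineseChar lives in the Microsoft.International.Converters.PinYinConverter namespace;
            // ToList() needs System.Linq.
            string ch = Console.ReadLine();
            // Only the first character of the input line is looked up.
            ChineseChar cc = new ChineseChar(ch[0]);
            var pinyins = cc.Pinyins.ToList();
            pinyins.ForEach(Console.WriteLine);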

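The result is as follows:

  As we can see, the polyphonic character 行 has three readings, hang, heng and xing, and the tone is returned as well, which is really convenient. What I need, though, is to enter 银行 and get the full pinyin yinhang, yinheng, yinxing and the first-letter abbreviations yh, yx. With the ChineseChar class available, the approach is straightforward.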
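To make the gap between the raw readings and the wanted forms concrete, here is a minimal sketch of the clean-up for a single character. It is self-contained and does not call the library: the class name ReadingNormalizer and the sample strings used with it (uppercase syllables with a trailing tone digit, plus empty slots) are assumptions for illustration, so check them against what ChineseChar.Pinyins actually returns in your version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the International Pack.
public static class ReadingNormalizer
{
    // Drops empty slots, strips tone digits, lowercases, and removes duplicates.
    public static List<string> Normalize(IEnumerable<string> rawReadings)
    {
        return rawReadings
            .Where(r => !string.IsNullOrWhiteSpace(r))
            .Select(r => Regex.Replace(r, @"\d", "").ToLowerInvariant())
            .Where(r => r.Length > 0)   // a digits-only entry becomes empty
            .Distinct()
            .ToList();
    }

    // Collects the distinct first letters of already normalized readings.
    public static List<string> FirstLetters(IEnumerable<string> readings)
    {
        return readings.Select(r => r.Substring(0, 1)).Distinct().ToList();
    }
}
```

With the assumed input HANG2, HANG4, HENG2, XING2, Normalize yields hang, heng, xing, and FirstLetters on that result yields h, x.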

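 Wrapping the conversion in a helper class

  1. First, split the input string into individual characters.

  2. Next, use ChineseChar to get the readings of each Chinese character.

  3. Then strip the tone digits, remove duplicates, extract the first letters, and combine the results.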
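The combining part of step 3 is a Cartesian product: one reading is picked per character and the picks are concatenated in order. A minimal sketch of just that step, using LINQ instead of explicit loops; the class name ReadingCombiner is made up for illustration, and the per-character readings are passed in rather than looked up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating the combination step only.
public static class ReadingCombiner
{
    // Builds every concatenation that takes one entry from each position, in order.
    // For an empty input this returns a list containing a single empty string.
    public static List<string> Combine(IEnumerable<IEnumerable<string>> perCharacter)
    {
        IEnumerable<string> seed = new[] { "" };
        return perCharacter
            .Aggregate(seed, (prefixes, options) =>
                prefixes.SelectMany(prefix => options.Select(option => prefix + option)))
            .Distinct()
            .ToList();
    }
}
```

Passing the readings from the 银行 example, { "yin" } followed by { "hang", "heng", "xing" }, gives yinhang, yinheng, yinxing; passing the first letters { "y" } and { "h", "x" } gives yh, yx. The number of results is the product of the reading counts, so long strings with many polyphonic characters can produce large lists.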

  So I wrote a helper class to do the conversion. The code is as follows:

 

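 using System.Collections.Generic;
 using System.Linq;
 using System.Text.RegularExpressions;
 using Microsoft.International.Converters.PinYinConverter;

 public class PinYinConverterHelp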
    {
        public static PingYinModel GetTotalPingYin(string str)
        {
            var chs = str.ToCharArray();
            // Full-pinyin readings of each character, keyed by its position in the input
            Dictionary<int, List<string>> totalPingYins = new Dictionary<int, List<string>>();
            for (int i = 0; i < chs.Length; i++)
            {
                var pinyins = new List<string>();
                var ch = chs[i];
                // Is this a valid Chinese character? Anything else is passed through unchanged
                if (ChineseChar.IsValidChar(ch))
                {
                    ChineseChar cc = new ChineseChar(ch);
                    pinyins = cc.Pinyins.Where(p => !string.IsNullOrWhiteSpace(p)).ToList();
                }
                else
                {
                    pinyins.Add(ch.ToString());
                }

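                // Strip the tone digits and convert to lowercase
                // (note: this also removes digits that were typed in the input itself)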
                pinyins = pinyins.ConvertAll(p => Regex.Replace(p, @"\d", "").ToLower());
                // Remove empty entries and duplicates
                pinyins = pinyins.Where(p => !string.IsNullOrWhiteSpace(p)).Distinct().ToList();
                if (pinyins.Any())
                {
                    totalPingYins[i] = pinyins;
                }
            }
            PingYinModel result = new PingYinModel();
            foreach (var pinyins in totalPingYins)
            {
                var items = pinyins.Value;
                if (result.TotalPingYin.Count <= 0)
                {
                    result.TotalPingYin = items;
                    result.FirstPingYin = items.ConvertAll(p => p.Substring(0, 1)).Distinct().ToList();
                }
                else
                {
                    // Full pinyin: append each reading of this character to every result so far
                    var newTotalPingYins = new List<string>();
                    foreach (var totalPingYin in result.TotalPingYin)
                    {
                        newTotalPingYins.AddRange(items.Select(item => totalPingYin + item));
                    }
                    newTotalPingYins = newTotalPingYins.Distinct().ToList();
                    result.TotalPingYin = newTotalPingYins;

                    // First letters: append each first letter to every abbreviation so far
                    var newFirstPingYins = new List<string>();
                    foreach (var firstPingYin in result.FirstPingYin)
                    {
                        newFirstPingYins.AddRange(items.Select(item => firstPingYin + item.Substring(0, 1)));
                    }
                    newFirstPingYins = newFirstPingYins.Distinct().ToList();
                    result.FirstPingYin = newFirstPingYins;
                }
            }
            return result;
        }
    }
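
The listing above returns a PingYinModel but does not show that class. The following is a hypothetical reconstruction, inferred only from how GetTotalPingYin uses it: both properties must be List<string>, and both must start out as empty lists because the method reads TotalPingYin.Count before assigning anything. The class in the author's repository may differ.

```csharp
using System.Collections.Generic;

// Hypothetical reconstruction: the post does not show this class.
public class PingYinModel
{
    public PingYinModel()
    {
        // Start with empty lists so callers can read Count without a null check.
        TotalPingYin = new List<string>();
        FirstPingYin = new List<string>();
    }

    // Every full-pinyin spelling of the input.
    public List<string> TotalPingYin { get; set; }

    // Every first-letter abbreviation of the input.
    public List<string> FirstPingYin { get; set; }
}
```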

 

Usage:

                Console.WriteLine("Enter Chinese text:");
                string str = Console.ReadLine();
                var pingyins = PinYinConverterHelp.GetTotalPingYin(str);
                Console.WriteLine("Full pinyin: " + String.Join(",", pingyins.TotalPingYin));
                Console.WriteLine("First letters: " + String.Join(",", pingyins.FirstPingYin));
                Console.WriteLine();

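Result:

So far the rare characters I have tried are all supported. I have not tested very obscure ones, but for ordinary Chinese-to-pinyin conversion with polyphonic characters this is already enough.

Here I only used the Chinese-to-pinyin feature of the Microsoft Visual Studio International Pack. The pack also contains language packs for Chinese, Japanese, Korean, English and other languages, with methods for converting between them, getting character counts, and even getting stroke counts. If you are interested, look through its API.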

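Source code

  Sharing is a virtue. Impressive articles can raise our technical level, but more often what we need lies at the business level, and sharing small, practical pieces of knowledge helps us solve business problems. As long as what is shared is useful and does not mislead, it is worth learning from whether it is big or small, so I hope everyone will share freely.

  Finally, here is the source code. If you find mistakes or shortcomings, corrections are welcome.

  Address: https://github.com/qq1206676756/PinYinParse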


Original article: https://www.cnblogs.com/qtqq/p/6195641.html
