Converting Chinese names to Pinyin in C#

Today I needed an algorithm for converting Chinese names to Pinyin. I first searched online for existing solutions and found that C# generally offers the following options:

  1. A Pinyin library defined by ASCII code
  2. Microsoft PinYinConverter
  3. NPinyin

I tried them out. None of them handles polyphonic characters well, and some do not cover the full character set, so they basically cannot meet production needs. Names also have their own Pinyin rules: some characters take a special reading when used in a name, such as 单 (Shàn), 乐 (Yuè), 查 (Zhā), and 万俟 (Mòqí). The libraries above do not support these either.

Since nothing suitable existed, I had to write my own. A reasonably complete name-to-Pinyin converter needs a dictionary behind it, so I searched online again and found a very useful one: CC-CEDICT. It is well suited to Chinese-to-Pinyin conversion and has the following characteristics:

  1. The dictionary is free to download
  2. It is small, only about 8 MB, and can be compressed further if it is used only for name-to-Pinyin conversion
  3. It is fairly comprehensive; rare characters are basically covered
  4. It is a plain text file and very simple to parse
  5. It lists both the traditional and the simplified form of each entry, so traditional characters are easy to convert
  6. It contains Pinyin for whole words, which helps resolve characters with more than one pronunciation
  7. It contains the special readings that characters take as surnames, and it supports compound surnames
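
To illustrate how simple the parsing is, here is a minimal sketch of reading one CC-CEDICT line. Each entry has the form `Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`, and comment lines start with `#`. The `CedictEntry` and `CedictParser` names are made up for this example, and detecting surname readings by a gloss that starts with "surname " is an assumption about how the entries are worded, not something the dictionary format guarantees.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical value type for one parsed entry (not part of any library).
public readonly struct CedictEntry
{
    public string Traditional { get; }
    public string Simplified { get; }
    public string Pinyin { get; }      // e.g. "Shan4" or "Zhong1 guo2"
    public bool IsSurname { get; }

    public CedictEntry(string traditional, string simplified, string pinyin, bool isSurname)
    {
        Traditional = traditional;
        Simplified = simplified;
        Pinyin = pinyin;
        IsSurname = isSurname;
    }
}

public static class CedictParser
{
    // Entry layout: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
    private static readonly Regex EntryPattern =
        new Regex(@"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$", RegexOptions.Compiled);

    // Returns false for blank lines, comment lines and lines that do not match the layout.
    public static bool TryParseLine(string line, out CedictEntry entry)
    {
        entry = default;
        if (string.IsNullOrWhiteSpace(line) || line[0] == '#') return false;

        Match match = EntryPattern.Match(line);
        if (!match.Success) return false;

        // Assumption: a surname reading is marked by a gloss such as "surname Shan".
        bool isSurname = false;
        foreach (string gloss in match.Groups[4].Value.Split('/'))
        {
            if (gloss.StartsWith("surname ", StringComparison.Ordinal)) isSurname = true;
        }

        entry = new CedictEntry(
            match.Groups[1].Value, match.Groups[2].Value, match.Groups[3].Value, isSurname);
        return true;
    }
}
```

Entries flagged as surnames would go into the surname dictionary and everything else into the general one; the traditional and simplified columns of the same entries can feed a traditional-to-simplified map.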

With this dictionary, name-to-Pinyin conversion is easy to implement. The steps are:

  1. Parse the CC-CEDICT dictionary and load it into memory as two separate dictionaries: a surname Pinyin dictionary and a general Pinyin dictionary
  2. Convert any traditional characters in the name to simplified
  3. Segment the name into two parts, the surname and the given name
  4. For the surname, look it up in the surname Pinyin dictionary; if it is not found, look it up character by character
  5. For the given name, look it up in the general Pinyin dictionary; if it is not found, split it into individual characters and look each one up, taking the first reading of a polyphonic character
  6. For cases the rules cannot cover, support a custom dictionary, which takes priority
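
The steps above can be sketched roughly as follows. This is an illustration rather than my actual implementation: the `NamePinyinConverter` class and its four lookup tables are hypothetical names, the tables are assumed to be already filled from CC-CEDICT (step 1), and the segmentation rule (treat the first two characters as a compound surname only when the name is longer than two characters and the pair is in the surname dictionary) is one simple assumption among several possible ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical converter; all four tables are assumed to be loaded beforehand (step 1).
public sealed class NamePinyinConverter
{
    private readonly IReadOnlyDictionary<string, string> _surnames;   // surname -> reading
    private readonly IReadOnlyDictionary<string, string> _words;      // word or character -> first reading
    private readonly IReadOnlyDictionary<string, string> _custom;     // full name -> reading
    private readonly IReadOnlyDictionary<char, char> _toSimplified;   // traditional -> simplified

    public NamePinyinConverter(
        IReadOnlyDictionary<string, string> surnames,
        IReadOnlyDictionary<string, string> words,
        IReadOnlyDictionary<string, string> custom,
        IReadOnlyDictionary<char, char> toSimplified)
    {
        _surnames = surnames;
        _words = words;
        _custom = custom;
        _toSimplified = toSimplified;
    }

    public string Convert(string name)
    {
        if (string.IsNullOrEmpty(name)) return string.Empty;

        // Step 2: map traditional characters to simplified ones
        // (works per UTF-16 char; characters outside the BMP are not handled here).
        string simplified = new string(
            name.Select(c => _toSimplified.TryGetValue(c, out var s) ? s : c).ToArray());

        // Step 6: a custom entry for the whole name takes priority.
        if (_custom.TryGetValue(simplified, out var customReading)) return customReading;

        // Step 3: split into surname and given name (assumed rule, see lead-in).
        int surnameLength =
            simplified.Length > 2 && _surnames.ContainsKey(simplified.Substring(0, 2)) ? 2 : 1;
        string surname = simplified.Substring(0, surnameLength);
        string given = simplified.Substring(surnameLength);

        // Step 4: surname dictionary first, then character by character.
        string surnameReading =
            _surnames.TryGetValue(surname, out var sr) ? sr : ReadByCharacter(surname);

        // Step 5: whole given name first, then character by character.
        string givenReading =
            _words.TryGetValue(given, out var gr) ? gr : ReadByCharacter(given);

        return givenReading.Length == 0 ? surnameReading : surnameReading + " " + givenReading;
    }

    // Looks up each character on its own; unknown characters are kept unchanged.
    private string ReadByCharacter(string text)
    {
        return string.Join(" ", text.Select(
            c => _words.TryGetValue(c.ToString(), out var r) ? r : c.ToString()));
    }
}
```

For example, with 单 mapped to "Shan" in the surname table and to "dan" in the general table, the name 單田芳 comes out with the surname reading "Shan" rather than "dan". Checking the custom table before anything else keeps overrides simple: a disputed name can be pinned to one reading without touching the other rules.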

This approach has the following advantages:

  1. Surnames have their own Pinyin dictionary, which handles their special readings
  2. Given names are matched against whole words where possible, which largely solves the polyphonic-character problem
  3. Traditional characters are supported
  4. The dictionary is fairly complete and covers uncommon characters

Compared with the existing solutions, the main drawback of this approach is memory use: in my tests it took a little over 30 MB. But in an era when 8 GB is the starting point for phone memory, that is acceptable even for a desktop program, and since I plan to expose it as a REST service, the cost is not a concern.

However, this solution is not perfect. The reading of a name can itself be disputed: for "李重", for example, there is no authoritative answer as to whether it should be read "Li Chong" or "Li Zhong".

Origin www.cnblogs.com/TianFang/p/11234392.html