Converting Chinese characters to pinyin in C/C++

First, let's look at what Unicode encoding is.
Unicode is an industry standard in computer science that defines a character set and its encoding schemes. Most of us know the ASCII table. Unicode can be seen as an extension of ASCII, because its first 128 code points (0000-007F) are exactly the ASCII characters. The code points from 0000 to FFFF (the Basic Multilingual Plane) are divided into blocks, and each block is assigned to a script or a group of symbols:
4E00-9FA5: Chinese characters (the original 20,902 CJK unified ideographs; this is the range the code below checks)
0000-007F: C0 Controls and Basic Latin
0080-00FF: C1 Controls and Latin-1 Supplement
0100-017F: Latin Extended-A
0180-024F: Latin Extended-B
0250-02AF: IPA Extensions
02B0-02FF: Spacing Modifier Letters
0300-036F: Combining Diacritical Marks
0370-03FF: Greek and Coptic
0400-04FF: Cyrillic
0500-052F: Cyrillic Supplement
0530-058F: Armenian
0590-05FF: Hebrew
0600-06FF: Arabic
0700-074F: Syriac
0750-077F: Arabic Supplement
0780-07BF: Thaana (Maldivian)
07C0-07FF: NKo
0800-083F: Samaritan
0840-085F: Mandaic
0900-097F: Devanagari
0980-09FF: Bengali
0A00-0A7F: Gurmukhi
0A80-0AFF: Gujarati
0B00-0B7F: Oriya
0B80-0BFF: Tamil
0C00-0C7F: Telugu
0C80-0CFF: Kannada
0D00-0D7F: Malayalam
0D80-0DFF: Sinhala
0E00-0E7F: Thai
0E80-0EFF: Lao
0F00-0FFF: Tibetan
1000-109F: Myanmar (Burmese)
10A0-10FF: Georgian
1100-11FF: Hangul Jamo (Korean)
1200-137F: Ethiopic
1380-139F: Ethiopic Supplement
13A0-13FF: Cherokee
1400-167F: Unified Canadian Aboriginal Syllabics
1680-169F: Ogham
16A0-16FF: Runic
1700-171F: Tagalog
1720-173F: Hanunoo
1740-175F: Buhid
1760-177F: Tagbanwa
1780-17FF: Khmer
1800-18AF: Mongolian
18B0-18FF: Unified Canadian Aboriginal Syllabics Extended
1900-194F: Limbu
1950-197F: Tai Le (Dehong Dai)
1980-19DF: New Tai Lue
19E0-19FF: Khmer Symbols
1A00-1A1F: Buginese
1A20-1AAF: Tai Tham (Lanna)
1B00-1B7F: Balinese
1B80-1BBF: Sundanese
1BC0-1BFF: Batak
1C00-1C4F: Lepcha
1C50-1C7F: Ol Chiki
1D00-1D7F: Phonetic Extensions
1D80-1DBF: Phonetic Extensions Supplement
1DC0-1DFF: Combining Diacritical Marks Supplement
1E00-1EFF: Latin Extended Additional
1F00-1FFF: Greek Extended
2000-206F: General Punctuation
2070-209F: Superscripts and Subscripts
20A0-20CF: Currency Symbols
20D0-20FF: Combining Diacritical Marks for Symbols
2100-214F: Letterlike Symbols
2150-218F: Number Forms
2190-21FF: Arrows
2200-22FF: Mathematical Operators
2300-23FF: Miscellaneous Technical
2400-243F: Control Pictures
2440-245F: Optical Character Recognition
2460-24FF: Enclosed Alphanumerics
2500-257F: Box Drawing
2580-259F: Block Elements
25A0-25FF: Geometric Shapes
2600-26FF: Miscellaneous Symbols
2700-27BF: Dingbats
27C0-27EF: Miscellaneous Mathematical Symbols-A
27F0-27FF: Supplemental Arrows-A
2800-28FF: Braille Patterns
2900-297F: Supplemental Arrows-B
2980-29FF: Miscellaneous Mathematical Symbols-B
2A00-2AFF: Supplemental Mathematical Operators
2B00-2BFF: Miscellaneous Symbols and Arrows
2C00-2C5F: Glagolitic
2C60-2C7F: Latin Extended-C
2C80-2CFF: Coptic
2D00-2D2F: Georgian Supplement
2D30-2D7F: Tifinagh
2D80-2DDF: Ethiopic Extended
2E00-2E7F: Supplemental Punctuation
2E80-2EFF: CJK Radicals Supplement
2F00-2FDF: Kangxi Radicals
2FF0-2FFF: Ideographic Description Characters
3000-303F: CJK Symbols and Punctuation
3040-309F: Hiragana
30A0-30FF: Katakana
3100-312F: Bopomofo (Zhuyin)
3130-318F: Hangul Compatibility Jamo
3190-319F: Kanbun
31A0-31BF: Bopomofo Extended
31C0-31EF: CJK Strokes
31F0-31FF: Katakana Phonetic Extensions
3200-32FF: Enclosed CJK Letters and Months
3300-33FF: CJK Compatibility
3400-4DBF: CJK Unified Ideographs Extension A
4DC0-4DFF: Yijing Hexagram Symbols
4E00-9FFF: CJK Unified Ideographs
A000-A48F: Yi Syllables
A490-A4CF: Yi Radicals
A500-A63F: Vai
A640-A69F: Cyrillic Extended-B
A6A0-A6FF: Bamum
A700-A71F: Modifier Tone Letters
A720-A7FF: Latin Extended-D
A800-A82F: Syloti Nagri
A840-A87F: Phags-pa
A880-A8DF: Saurashtra
A900-A92F: Kayah Li
A930-A95F: Rejang
A980-A9DF: Javanese
AA00-AA5F: Cham
AA60-AA7F: Myanmar Extended-A
AA80-AADF: Tai Viet
ABC0-ABFF: Meetei Mayek (Manipuri)
AC00-D7AF: Hangul Syllables
D800-DBFF: High Surrogates (UTF-16)
DC00-DFFF: Low Surrogates (UTF-16)
E000-F8FF: Private Use Area
F900-FAFF: CJK Compatibility Ideographs
FB00-FB4F: Alphabetic Presentation Forms
FB50-FDFF: Arabic Presentation Forms-A
FE00-FE0F: Variation Selectors
FE10-FE1F: Vertical Forms
FE20-FE2F: Combining Half Marks
FE30-FE4F: CJK Compatibility Forms
FE50-FE6F: Small Form Variants
FE70-FEFF: Arabic Presentation Forms-B
FF00-FFEF: Halfwidth and Fullwidth Forms
FFF0-FFFF: Specials
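Because the blocks are contiguous, non-overlapping ranges sorted by starting code point, finding the block of a character is a binary search over such a table. The sketch below is illustrative only: `UnicodeBlock`, `kBlocks` and `blockName` are hypothetical names, and the table holds just six blocks (using the full 4E00-9FFF extent of the CJK Unified Ideographs block) rather than the whole list.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Hypothetical record type: one Unicode block as an inclusive range plus a name.
struct UnicodeBlock {
    char32_t first;
    char32_t last;
    std::string_view name;
};

// Hypothetical excerpt of the block list, sorted by starting code point.
constexpr std::array<UnicodeBlock, 6> kBlocks{{
    {0x0000, 0x007F, "Basic Latin"},
    {0x3040, 0x309F, "Hiragana"},
    {0x30A0, 0x30FF, "Katakana"},
    {0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"},
    {0x4E00, 0x9FFF, "CJK Unified Ideographs"},
    {0xAC00, 0xD7AF, "Hangul Syllables"},
}};

// Returns the name of the block that contains cp, or "Unknown" if no block
// in the excerpt covers it.
std::string_view blockName(char32_t cp) {
    // First block whose start is greater than cp.
    auto it = std::upper_bound(
        kBlocks.begin(), kBlocks.end(), cp,
        [](char32_t value, const UnicodeBlock& b) { return value < b.first; });
    if (it == kBlocks.begin()) {
        return "Unknown";  // cp is below the first block
    }
    --it;  // last block starting at or before cp
    if (cp > it->last) {
        return "Unknown";  // cp falls in a gap between blocks
    }
    return it->name;
}
```

For example, `blockName(0x4E2D)` (the character 中) returns "CJK Unified Ideographs", while 0x0100, which no block in the excerpt covers, returns "Unknown".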

======================================================================

#include <QChar>

// Determine whether a character is Chinese. As the table above shows, Chinese
// characters lie in the Unicode range 0x4E00 to 0x9FA5.
bool IsChinese(QChar qch)
{
    ushort unicode = qch.unicode();
    return unicode >= 0x4E00 && unicode <= 0x9FA5;
}
// This tells us whether an input character is Chinese.
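IsChinese works on a QChar, which is a single UTF-16 code unit. That is enough here because every character in 0x4E00-0x9FA5 lies in the Basic Multilingual Plane and therefore fits in one QChar. Without Qt, text usually arrives as a UTF-8 `std::string`, and it has to be decoded into code points before the same range test can be applied. Below is a minimal sketch in standard C++17, assuming well-formed UTF-8 input; `isChinese`, `decodeUtf8` and `countChinese` are hypothetical helpers, not part of any library.

```cpp
#include <cstddef>
#include <string>

// Same test as IsChinese, applied to a full code point instead of a QChar.
bool isChinese(char32_t cp) {
    return cp >= 0x4E00 && cp <= 0x9FA5;
}

// Minimal UTF-8 decoder. It does not check continuation bytes or reject
// overlong encodings; an invalid lead byte, or a sequence cut off at the
// end of the input, becomes U+FFFD.
std::u32string decodeUtf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (c < 0x80)              { cp = c;        len = 1; }  // 0xxxxxxx
        else if ((c >> 5) == 0x06) { cp = c & 0x1F; len = 2; }  // 110xxxxx
        else if ((c >> 4) == 0x0E) { cp = c & 0x0F; len = 3; }  // 1110xxxx
        else if ((c >> 3) == 0x1E) { cp = c & 0x07; len = 4; }  // 11110xxx
        else { out += U'\uFFFD'; ++i; continue; }
        if (i + len > s.size()) { out += U'\uFFFD'; break; }
        for (std::size_t k = 1; k < len; ++k) {
            // Each continuation byte contributes its low 6 bits.
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out += cp;
        i += len;
    }
    return out;
}

// Counts the characters of a UTF-8 string that fall in 0x4E00-0x9FA5.
std::size_t countChinese(const std::string& utf8) {
    std::size_t n = 0;
    for (char32_t cp : decodeUtf8(utf8)) {
        if (isChinese(cp)) ++n;
    }
    return n;
}
```

Note that 0x9FA5 marks the end of the original 20,902 ideographs. Later Unicode versions assign more characters up to 0x9FFF, and Extension A sits at 0x3400-0x4DBF; neither this sketch nor IsChinese counts those.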
#include <QString>

// Convert a string to pinyin. Chinese characters are replaced by their pinyin,
// all other characters are kept unchanged, and a space is appended after each one.
QString Zh2PinYinUtils::Zh2PinYin(const QString &chinese)
{
    QString pinyins;
    for (int i = 0; i < chinese.length(); ++i)
    {
        QChar ch = chinese.at(i);
        if (IsChinese(ch))
        {
            // Unicode_Table (not shown in this post) is an array holding the
            // pinyin of each Chinese character, indexed by code point - 0x4E00.
            pinyins.append(Unicode_Table[ch.unicode() - 0x4E00]);
        }
        else
        {
            // Not a Chinese character: keep it unchanged.
            pinyins.append(ch);
        }
        pinyins.append(" ");
    }
    return pinyins;
}
// This gives the pinyin for the Chinese characters in the input string.
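The full Unicode_Table is not reproduced in this post. To show the lookup logic without it, the sketch below uses a hypothetical three-entry table `kPinyinSample` and a hypothetical `toPinyin` function that follows the same rules as Zh2PinYin. It takes a UTF-32 string so that each element is one code point, and it assumes non-Chinese characters worth keeping are ASCII.

```cpp
#include <string>
#include <unordered_map>

// HYPOTHETICAL sample table. The post's Unicode_Table holds one pinyin string
// per code point from 0x4E00 to 0x9FA5; only three entries are filled in here.
const std::unordered_map<char32_t, std::string> kPinyinSample = {
    {0x4E2D, "zhong"},  // 中
    {0x6587, "wen"},    // 文
    {0x56FD, "guo"},    // 国
};

// Mirrors Zh2PinYin: a Chinese character is replaced by its pinyin ("?" if
// the sample table lacks it), an ASCII character is copied as-is, any other
// character becomes "?", and a space follows every character.
std::string toPinyin(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp >= 0x4E00 && cp <= 0x9FA5) {
            auto it = kPinyinSample.find(cp);
            if (it != kPinyinSample.end()) {
                out += it->second;
            } else {
                out += '?';
            }
        } else if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {
            out += '?';
        }
        out += ' ';
    }
    return out;
}
```

For example, `toPinyin(U"\u4E2D\u6587ab")` returns "zhong wen a b ". A `std::unordered_map` is used only because the sample is sparse; a dense array indexed by `code point - 0x4E00`, as Zh2PinYin does, needs no hashing. With either layout, a table holding one reading per character cannot tell which pronunciation a polyphonic character (such as 行, read xing or hang) takes in context.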

Origin www.cnblogs.com/qq702368956/p/12652626.html