Unicode character set ranges

Introduction
Unicode is a unified character encoding standard for the world's writing systems, but it only assigns a numeric code (a code point) to each character (official website: www.unicode.org). The concrete storage forms are UTF-8, UTF-16, UTF-32 and others, and each form has its own rules for mapping a Unicode code point to bytes in memory.
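The difference between a code point and its storage forms can be sketched in Python. This is only an illustration: the helper name storage_forms is hypothetical, and the big-endian variants of UTF-16 and UTF-32 are chosen here so that no byte order mark is added.

```python
def storage_forms(ch):
    """Return the code point of one character and its bytes (as hex) in three encodings."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf-8": ch.encode("utf-8").hex(),
        # Big-endian variants are an assumption of this sketch (no BOM is emitted).
        "utf-16": ch.encode("utf-16-be").hex(),
        "utf-32": ch.encode("utf-32-be").hex(),
    }


# U+4E2D is a character from the CJK Unified Ideographs block.
print(storage_forms("\u4e2d"))
# U+20000 lies outside the 16-bit range, so UTF-16 needs two 16-bit units for it.
print(storage_forms("\U00020000"))
```

The same code point yields three different byte sequences, which is the sense in which Unicode itself only defines the number and leaves storage to the UTF forms.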


Chinese character range
The Unicode ranges for CJK characters are spread across several blocks, each with "CJK" in its name. The most commonly used range is U+4E00 ~ U+9FA5, which lies in the block named CJK Unified Ideographs. When this range was defined, the code points between U+9FA6 and U+9FFF were unassigned. They were never guaranteed to stay undefined, and later versions of the standard have assigned characters there.
Note 1: The CJK Unified Ideographs block as a whole spans 4E00-9FFF.
Note 2: The regular expression [\u4e00-\u9fa5] matches Chinese characters, but the range is hard-coded, so it does not adapt when the character set available on a platform differs or changes.
Note 3: Unicode code chart for U+4E00 ~ U+9FFF: http://www.unicode.org/charts/PDF/U4E00.pdf
Note 4: Unicode data for an individual character can be looked up at: http://www.unicode.org/cgi-bin/GetUnihanData.pl
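As a rough illustration of Note 2, the hard-coded range can be tried out with Python's re module. The names COMMON_HAN, WIDER_HAN and han_chars are hypothetical, and the wider pattern (Extension A plus the whole U+4E00 ~ U+9FFF block) is only one possible choice, not a complete definition of "Chinese character".

```python
import re

# The commonly quoted range from Note 2.
COMMON_HAN = re.compile(r"[\u4e00-\u9fa5]")
# Assumption of this sketch: a wider pattern covering Extension A and U+4E00 ~ U+9FFF.
WIDER_HAN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def han_chars(text, pattern=COMMON_HAN):
    """Return every character in text that the given pattern matches."""
    return pattern.findall(text)


sample = "Unicode \u7f16\u7801 abc"      # two Chinese characters between ASCII words
print(han_chars(sample))                 # the two Chinese characters only
print(han_chars("\u9fa6"))               # just past U+9FA5: missed by the common range
print(han_chars("\u9fa6", WIDER_HAN))    # matched by the wider range
```

A character just past U+9FA5 is silently skipped by the common pattern, which is the practical cost of hard-coding the range.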

Unicode encoding range 
  0000-007F: C0 controls and Basic Latin (C0 Controls and Basic Latin)
  0080-00FF: C1 controls and Latin-1 supplement (C1 Controls and Latin-1 Supplement)
  0100-017F: Latin Extended-A (Latin Extended-A)
  0180-024F: Latin Extended-B (Latin Extended-B)
  0250-02AF: International Phonetic Alphabet extensions (IPA Extensions)
  02B0-02FF: spacing modifier letters (Spacing Modifier Letters)
  0300-036F: combining diacritical marks (Combining Diacritical Marks)
  0370-03FF: Greek and Coptic (Greek and Coptic)
  0400-04FF: Cyrillic (Cyrillic)
  0500-052F: Cyrillic supplement (Cyrillic Supplement)
  0530-058F: Armenian (Armenian)
  0590-05FF: Hebrew (Hebrew)
  0600-06FF: Arabic (Arabic)
  0700-074F: Syriac (Syriac)
  0750-077F: Arabic supplement (Arabic supplement)
  0780-07BF: Maldivian (Thaana)
  07C0-07FF: N'Ko (NKo)
  0800-083F: Samaritan (Samaritan)
  0840-085F: Mandaic (Mandaic)
  0860-086F: Syriac supplement (Syriac Supplement)
  0870-089F: Arabic Extended-B (Arabic Extended-B)
  08A0-08FF: Arabic Extended-A (Arabic Extended-A)
  0900-097F: Devanagari (Devanagari)
  0980-09FF: Bengali (Bengali)
  0A00-0A7F: Gurmukhi, the Sikh script (Gurmukhi)
  0A80-0AFF: Gujarati (Gujarati)
  0B00-0B7F: Oriya (Oriya)
  0B80-0BFF: Tamil (Tamil)
  0C00-0C7F: Telugu (Telugu)
  0C80-0CFF: Kannada (Kannada)
  0D00-0D7F: Malayalam (Malayalam)
  0D80-0DFF: Sinhala (Sinhala)
  0E00-0E7F: Thai (Thai)
  0E80-0EFF: Lao (Lao)
  0F00-0FFF: Tibetan (Tibetan)
  1000-109F: Burmese (Myanmar)
  10A0-10FF: Georgian (Georgian)
  1100-11FF: Korean alphabet letters (Hangul Jamo)
  1200-137F: Ethiopic (Ethiopic)
  1380-139F: Ethiopic supplement (Ethiopic Supplement)
  13A0-13FF: Cherokee (Cherokee)
  1400-167F: Unified Canadian Aboriginal syllabics (Unified Canadian Aboriginal Syllabics)
  1680-169F: Ogham (Ogham)
  16A0-16FF: Runic (Runic)
  1700-171F: Tagalog (Tagalog)
  1720-173F: Hanuno'o (Hanunóo)
  1740-175F: Buhid script (Buhid)
  1760-177F: Tagbanwa script (Tagbanwa)
  1780-17FF: Khmer (Khmer)
  1800-18AF: Mongolian (Mongolian)
  18B0-18FF: Unified Canadian Aboriginal syllabics extended (Unified Canadian Aboriginal Syllabics Extended)
  1900-194F: Limbu (Limbu)
  1950-197F: Dehong Dai (Tai Le)
  1980-19DF: New Tai Lue (New Tai Lue)
  19E0-19FF: Khmer symbols (Khmer Symbols)
  1A00-1A1F: Buginese (Buginese)
  1A20-1AAF: Lanna (Tai Tham)
  1B00-1B7F: Balinese (Balinese)
  1B80-1BBF: Sundanese (Sundanese)
  1BC0-1BFF: Batak (Batak)
  1C00-1C4F: Lepcha (Lepcha)
  1C50-1C7F: Ol Chiki (Ol Chiki)
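  1C80-1C8F: Cyrillic Extended-C (Cyrillic Extended-C)
  1C90-1CBF: Georgian extended (Georgian Extended)
  1CC0-1CCF: Sundanese supplement (Sundanese Supplement)
  1CD0-1CFF: Vedic extensions (Vedic Extensions)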
  1D00-1D7F: phonetic extensions (Phonetic Extensions)
  1D80-1DBF: phonetic extensions supplement (Phonetic Extensions Supplement)
  1DC0-1DFF: combining diacritical marks supplement (Combining Diacritical Marks Supplement)
  1E00-1EFF: Latin Extended Additional (Latin Extended Additional)
  1F00-1FFF: Greek Extended (Greek Extended)
  2000-206F: general punctuation (General Punctuation)
  2070-209F: superscripts and subscripts (Superscripts and Subscripts)
  20A0-20CF: currency symbols (Currency Symbols)
  20D0-20FF: combining diacritical marks for symbols (Combining Diacritical Marks for Symbols)
  2100-214F: letterlike symbols (Letterlike Symbols)
  2150-218F: number forms (Number Forms)
  2190-21FF: arrows (Arrows)
  2200-22FF: mathematical operators (Mathematical Operators)
  2300-23FF: miscellaneous technical symbols (Miscellaneous Technical)
  2400-243F: control pictures (Control Pictures)
  2440-245F: optical character recognition (Optical Character Recognition)
  2460-24FF: enclosed alphanumerics (Enclosed Alphanumerics)
  2500-257F: box drawing (Box Drawing)
  2580-259F: block elements (Block Elements)
  25A0-25FF: geometric shapes (Geometric Shapes)
  2600-26FF: miscellaneous symbols (Miscellaneous Symbols)
  2700-27BF: dingbats (Dingbats)
  27C0-27EF: miscellaneous mathematical symbols A (Miscellaneous Mathematical Symbols-A)
  27F0-27FF: supplemental arrows A (Supplemental Arrows-A)
  2800-28FF: Braille patterns (Braille Patterns)
  2900-297F: supplemental arrows B (Supplemental Arrows-B)
  2980-29FF: miscellaneous mathematical symbols B (Miscellaneous Mathematical Symbols-B)
  2A00-2AFF: supplemental mathematical operators (Supplemental Mathematical Operators)
  2B00-2BFF: miscellaneous symbols and arrows (Miscellaneous Symbols and Arrows)
  2C00-2C5F: Glagolitic (Glagolitic)
  2C60-2C7F: Latin Extended-C (Latin Extended-C)
  2C80-2CFF: Coptic (Coptic)
  2D00-2D2F: Georgian supplement (Georgian Supplement)
  2D30-2D7F: Tifinagh (Tifinagh)
  2D80-2DDF: Ethiopic extended (Ethiopic Extended)
  2E00-2E7F: supplemental punctuation (Supplemental Punctuation)
  2E80-2EFF: CJK radicals supplement (CJK Radicals Supplement)
  2F00-2FDF: Kangxi radicals (Kangxi Radicals)
  2FF0-2FFF: ideographic description characters (Ideographic Description Characters)
  3000-303F: CJK symbols and punctuation (CJK Symbols and Punctuation)
  3040-309F: Japanese hiragana (Hiragana)
  30A0-30FF: Japanese Katakana (Katakana)
  3100-312F: Chinese phonetic symbols (Bopomofo)
  3130-318F: Hangul compatibility letters (Hangul Compatibility Jamo)
  3190-319F: Kanbun annotation marks (Kanbun)
  31A0-31BF: extended Chinese phonetic symbols (Bopomofo Extended)
  31C0-31EF: CJK strokes (CJK Strokes)
  31F0-31FF: Katakana phonetic extensions (Katakana Phonetic Extensions)
  3200-32FF: enclosed CJK letters and months (Enclosed CJK Letters and Months)
  3300-33FF: CJK compatibility (CJK Compatibility)
  3400-4DBF: CJK unified ideographs extension A (CJK Unified Ideographs Extension A)
  4DC0-4DFF: I Ching hexagram symbols (Yijing Hexagram Symbols)
  4E00-9FFF: CJK unified ideographs, i.e. Chinese characters (CJK Unified Ideographs)
  A000-A48F: Yi syllables (Yi Syllables)
  A490-A4CF: Yi radicals (Yi Radicals)
  A500-A63F: Vai (Vai)
  A640-A69F: Cyrillic Extended-B (Cyrillic Extended-B)
  A6A0-A6FF: Bamum (Bamum)
  A700-A71F: modifier tone letters (Modifier Tone Letters)
  A720-A7FF: Latin Extended-D (Latin Extended-D)
  A800-A82F: Sylheti Nagri (Syloti Nagri)
  A840-A87F: 'Phags-pa (Phags-pa)
  A880-A8DF: Saurashtra (Saurashtra)
  A900-A92F: Kayah Li (Kayah Li)
  A980-A9DF: Javanese (Javanese)
  AA00-AA5F: Cham (Cham)
  AA80-AADF: Vietnamese Tai (Tai Viet)
  ABC0-ABFF: Manipuri (Meetei Mayek)
  AC00-D7AF: Hangul syllables (Hangul Syllables)
  D800-DBFF: UTF-16 high-surrogate area (High Surrogates)
  DC00-DFFF: UTF-16 low-surrogate area (Low Surrogates)
  E000-F8FF: private use area (Private Use Area)
  F900-FAFF: CJK compatibility ideographs (CJK Compatibility Ideographs)
  FB00-FB4F: alphabetic presentation forms (Alphabetic Presentation Forms)
  FB50-FDFF: Arabic presentation forms A (Arabic Presentation Forms-A)
  FE00-FE0F: variation selectors (Variation Selectors)
  FE10-FE1F: vertical forms (Vertical Forms)
  FE20-FE2F: combining half marks (Combining Half Marks)
  FE30-FE4F: CJK compatibility forms (CJK Compatibility Forms)
  FE50-FE6F: small form variants (Small Form Variants)
  FE70-FEFF: Arabic presentation forms B (Arabic Presentation Forms-B)
  FF00-FFEF: halfwidth and fullwidth forms (Halfwidth and Fullwidth Forms)
  FFF0-FFFF: specials (Specials)
10300..1032F; Old Italic
10330..1034F; Gothic
10400..1044F; Deseret
110D0..110FF; Sora Sompeng
11100..1114F; Chakma
11400..1147F; Newa
118A0..118FF; Warang Citi
1D000..1D0FF; Byzantine Musical Symbols
1D100..1D1FF; Musical Symbols
1D400..1D7FF; Mathematical Alphanumeric Symbols
20000..2A6DF; CJK Unified Ideographs Extension B
2F800..2FA1F; CJK Compatibility Ideographs Supplement
E0000..E007F; Tags
F0000..FFFFD; Private Use
100000..10FFFD; Private Use
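
A range list like the one above can be turned into a simple lookup from character to block name. The following is a minimal sketch in Python, not a complete implementation: BLOCKS holds only a handful of entries copied from the list for illustration, and the names BLOCKS, STARTS and block_of are hypothetical.

```python
from bisect import bisect_right

# Illustrative subset of the list above: (start, end, name), sorted by start.
BLOCKS = [
    (0x0000, 0x007F, "C0 Controls and Basic Latin"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0x1D100, 0x1D1FF, "Musical Symbols"),
]
STARTS = [start for start, _, _ in BLOCKS]


def block_of(ch):
    """Return the block name for one character, or None if it is not in BLOCKS."""
    cp = ord(ch)
    # Find the last block whose start is <= cp, then check its upper bound.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        start, end, name = BLOCKS[i]
        if cp <= end:
            return name
    return None


print(block_of("A"))        # a Basic Latin letter
print(block_of("\u3042"))   # a Hiragana letter
print(block_of("\u00e9"))   # not covered by this small subset
```

Because the table is sorted by start value, a binary search finds the candidate block quickly. A real table would load every range rather than this subset.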

----------------
Disclaimer: This is an original article by CSDN blogger "thomashtq", released under the CC 4.0 BY-SA license. When reproducing it, please include the original source link and this statement.
Original link: https://blog.csdn.net/thomashtq/article/details/39081233

Origin www.cnblogs.com/alexzhang92/p/11699701.html