What encoding do strings beginning with &#x use?

When writing a web crawler, you will often fetch pages that contain many sequences such as &#dddd;, &#xhhhh;, and &name;. These are escape sequences in HTML, XML and other SGML-family languages. They are not an "encoding."

Taking HTML as an example, these three kinds of escape sequences are called character references:

  • The first two are numeric character references (NCRs), which give the Unicode code point of the target character as a number: "&#" is followed by decimal digits, while "&#x" is followed by hexadecimal digits.
  • The last is a character entity reference: the "&" is followed by the name of a predefined entity, and that entity's declaration specifies which character it refers to.
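To make the three forms concrete, here is a minimal sketch that writes the same character, "<" (U+003C), as each kind of reference and decodes it with Python 3's standard html.unescape. The variable names are chosen only for illustration.

```python
import html

# The same character "<" (U+003C) written as each kind of character reference.
decimal_ncr = "&#60;"    # "&#" followed by decimal digits
hex_ncr = "&#x3C;"       # "&#x" followed by hexadecimal digits
entity_ref = "&lt;"      # "&" followed by a predefined entity name

# html.unescape resolves all three forms to the character they refer to.
decoded = [html.unescape(s) for s in (decimal_ncr, hex_ncr, entity_ref)]
print(decoded)  # ['<', '<', '<']
```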

 

NCR stands for numeric character reference. An NCR consists of an ampersand (&) followed by a number sign (#), then the Unicode code point of the character, and finally a semicolon, as in the examples above.
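The structure just described can be sketched in a few lines of Python. The helper name to_ncr is hypothetical and exists only for this illustration; it builds the reference from the code point returned by ord().

```python
def to_ncr(ch, hexadecimal=False):
    """Return the numeric character reference for a single character."""
    code_point = ord(ch)  # the character's Unicode code point
    if hexadecimal:
        # "&#x" + hexadecimal digits + ";"
        return "&#x{:X};".format(code_point)
    # "&#" + decimal digits + ";"
    return "&#{};".format(code_point)


print(to_ncr("中"))                    # &#20013;
print(to_ncr("中", hexadecimal=True))  # &#x4E2D;
```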

 

Because a numeric character reference uses only characters from the ASCII set, it lets you display any Unicode character in a web page regardless of the encoding of the HTML file itself. Even a page encoded in gb2312 can therefore display Egyptian hieroglyphs by using NCRs.
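As a rough illustration of this point, the sketch below encodes a string as gb2312 with Python's built-in "xmlcharrefreplace" error handler, which writes any character the codec cannot represent as a decimal NCR. The sample text is an assumption made for the example, and the final step only imitates what a browser would do with the page.

```python
import html

# Chinese text plus an Egyptian hieroglyph (U+13000), which gb2312 cannot represent.
text = "中文 \U00013000"

# Characters missing from gb2312 are written as ASCII-only decimal NCRs.
page_bytes = text.encode("gb2312", errors="xmlcharrefreplace")
print(page_bytes)

# Imitate a browser: decode the bytes as gb2312, then resolve the references.
restored = html.unescape(page_bytes.decode("gb2312"))
print(restored == text)  # True
```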

 

How do you handle a string that begins with &#x in Python?

# coding=utf-8

def on_dec(a):
    # Decoder for strings that begin with &#x: a string made of "&#" or "&#x" sequences is called an NCR.
    # html.unescape() under py3.x, or HTMLParser().unescape() under py2.x, can also convert them into readable Chinese characters.
    # Note: this replace-based version assumes every reference has exactly four hex digits, and it removes every ";" in the input.
    aa = a.replace(';', '').replace('&#x', '\\u').encode('utf-8').decode('unicode_escape')
    print(aa)
    return aa
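The string-replacement approach works for four-digit references such as &#x4e2d; but not for code points above U+FFFF or for text that contains ordinary semicolons. As a hedged alternative, the sketch below uses the standard-library html.unescape mentioned in the comments, plus a regular-expression variant that touches only hexadecimal NCRs. The function names decode_ncr and decode_hex_ncr_only are hypothetical, and the regex variant assumes every matched value is a valid code point.

```python
import html
import re


def decode_ncr(s):
    """Decode every character reference in s (decimal, hex, and named)."""
    return html.unescape(s)


def decode_hex_ncr_only(s):
    """Decode only &#x...; references and leave all other text untouched."""
    # Assumes each matched value is at most 0x10FFFF; chr() raises ValueError otherwise.
    return re.sub(
        r"&#[xX]([0-9a-fA-F]+);",
        lambda m: chr(int(m.group(1), 16)),
        s,
    )


print(decode_ncr("&#x4e2d;&#x6587;"))               # 中文
print(decode_hex_ncr_only("a;b &#x4e2d; &amp; c"))  # a;b 中 &amp; c
```

The second function is useful when named entities such as &amp; must survive for a later processing step.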


For more technical information, follow: gzitcast


Origin www.cnblogs.com/heimaguangzhou/p/11464925.html