Exploring Unicode character equivalence

Demo
bａidu.com (\uff41) redirects to baidu.com.
bаidu.com (\u0430) does not redirect to baidu.com; it is treated as a different domain name.
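The behaviour in the demo can be approximated offline. The sketch below is only an assumption about what happens in the browser, not a description of its actual implementation: it relies on Python's built-in `idna` codec, which applies an NFKC-based preparation step to each label before encoding. The helper name `to_ascii_host` is made up for illustration.

```python
# -*- coding: utf-8 -*-


def to_ascii_host(host):
    """Hypothetical helper: return the ASCII form of a host name via the idna codec."""
    return host.encode("idna")


fullwidth = "b\uff41idu.com"  # contains FULLWIDTH LATIN SMALL LETTER A
cyrillic = "b\u0430idu.com"   # contains CYRILLIC SMALL LETTER A

# The fullwidth letter is folded to ASCII, so the host becomes plain baidu.com.
print(to_ascii_host(fullwidth))
# The Cyrillic letter has no ASCII equivalent, so the label is Punycode-encoded
# (it gets an "xn--" prefix) and names a different domain.
print(to_ascii_host(cyrillic))
```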

Why equivalence exists
Two different Unicode characters, or character sequences, can be related by an equivalence. Compatibility equivalence is the weaker type: the variant forms may look alike or carry the same meaning. For example, a and ａ (\uff41) look the same apart from their width, and 15 and ⑮ (\u246e) may represent the same number in a mathematical sense.
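A quick way to see this kind of weak equivalence is to compare a canonical form with a compatibility form for the two example characters. This is a minimal sketch using the standard `unicodedata` module; the variable names are illustrative only.

```python
from unicodedata import normalize

# The two example characters: fullwidth "a" and circled number fifteen.
examples = ["\uff41", "\u246e"]

for ch in examples:
    nfc = normalize("NFC", ch)    # canonical form: both characters are left unchanged
    nfkc = normalize("NFKC", ch)  # compatibility form: folded to plain ASCII
    print("U+%04X  NFC: %r  NFKC: %r" % (ord(ch), nfc, nfkc))
```

Note that the circled number normalizes to two characters, so code that keeps only the first character of the result sees just the leading digit.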

Character normalization
There are two composed normalization forms, Normalization Form C (NFC) and Normalization Form KC (NFKC). They differ in how the resulting text relates to the original unnormalized text: NFC keeps it canonically equivalent, while NFKC only keeps it compatibility-equivalent. The K stands for compatibility.
There are also two decomposed forms, Normalization Form D (NFD) and Normalization Form KD (NFKD).
The difference between NFC and NFD:
e.g. normalizing Å (\u212b) with NFD yields Å (\u0041\u030a), while NFC yields Å (\u00c5). During normalization, each character is looked up in the mapping table for the chosen form, and the conversion is applied only if a mapping exists. In the demo above, ａ (\uff41) has a compatibility mapping, so NFKC converts it to a (plain NFC leaves it unchanged); а (\u0430) has no mapping at all, so it is never converted.
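The Å example can be checked directly. The snippet below is a small sketch with the standard `unicodedata` module; `codepoints` is a hypothetical helper that just prints the code points of a string so the forms can be told apart.

```python
from unicodedata import normalize


def codepoints(text):
    """Hypothetical helper: render a string as a list of U+XXXX code points."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


angstrom = "\u212b"  # ANGSTROM SIGN

# Decomposed form: base letter A followed by COMBINING RING ABOVE.
print("NFD:", codepoints(normalize("NFD", angstrom)))
# Composed form: the single precomposed letter.
print("NFC:", codepoints(normalize("NFC", angstrom)))

# The two characters from the demo, under all four forms.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form,
          codepoints(normalize(form, "\uff41")),
          codepoints(normalize(form, "\u0430")))
```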

Exploiting equivalence
Iterate over all characters once and collect every character whose normalized form is an ASCII character different from itself. These equivalents can be used to bypass some filters.
Script from Lyle:
------------------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import json
from unicodedata import normalize


def main():
    debug = False
    # Maps each ASCII character to the characters that normalize to it.
    tables = {}
    # Scan U+0001..U+FFFF (the Basic Multilingual Plane).
    for i in range(1, 0x10000):
        src = chr(i)  # Python 3 replacement for Python 2's unichr()
        # Only the first character of the NFKC result is considered.
        dst = normalize('NFKC', src)[0]
        try:
            if ord(dst) < 128 and dst != src:
                if debug:
                    print("%s (\\u%04x) -- normalize --> %s (\\x%02x)" % (
                        src, i, dst, ord(dst)
                    ))
                tables.setdefault(dst, []).append(src)
        except Exception as e:
            print(repr(e))
    # json.dump writes text, so the file is opened in text mode.
    with open("nfctable.txt", "w") as fh:
        json.dump(tables, fh)


if __name__ == '__main__':
    main()
----------------------------------------------------------------------------------------
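To illustrate how such a table might be used, here is a minimal sketch. Everything in it is an assumption for demonstration: `sample_table` is a tiny hand-built excerpt in the same shape as the script's output (fullwidth letters only), and `disguise` and `naive_filter_allows` are hypothetical names standing in for an attacker's helper and a filter that checks input before it is normalized.

```python
from unicodedata import normalize

# Hypothetical excerpt: ASCII letter -> list of equivalent characters.
# Here each letter is paired with its fullwidth counterpart.
sample_table = {c: [chr(0xFF41 + ord(c) - ord("a"))] for c in "script"}


def disguise(text, table):
    """Replace each character with its first listed equivalent, if it has one."""
    return "".join(table[c][0] if c in table else c for c in text)


def naive_filter_allows(text):
    """Hypothetical blocklist that inspects the raw, unnormalized input."""
    return "script" not in text.lower()


payload = disguise("script", sample_table)

print(naive_filter_allows("script"))          # blocked as expected
print(naive_filter_allows(payload))           # slips past the raw-text check
print(normalize("NFKC", payload) == "script") # later normalization restores it
```

The point of the sketch is the ordering: a check that runs before normalization sees different characters from the code that runs after it.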

Origin www.cnblogs.com/cimuhuashuimu/p/11490292.html