Use a dictionary to number all symbols in Python

Symbols often need to be converted to numeric IDs before a program can work with them, for example as input to one-hot encoding or an embedding layer. So how can a given set of symbols be numbered quickly? The details are as follows:

For example, to number all uppercase and lowercase English letters and build both the forward (symbol to ID) and reverse (ID to symbol) mappings:

symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")

# Mappings from symbol to numeric ID and vice versa:
symbol_to_id = {s: i for i, s in enumerate(symbols)}
id_to_symbol = {i: s for i, s in enumerate(symbols)}

print('symbol_to_id:', symbol_to_id)
print('id_to_symbol:', id_to_symbol)
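Once the two dictionaries exist, converting text to IDs and back is a simple lookup. The following is a minimal sketch of that round trip, plus a plain-Python one-hot vector. The helper names `encode`, `decode`, and `one_hot` are hypothetical and not part of the original snippet, and the sketch assumes every character of the input is in `symbols` (an unknown character would raise `KeyError`).

```python
# Rebuild the mappings from the article so this snippet runs on its own.
symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
symbol_to_id = {s: i for i, s in enumerate(symbols)}
id_to_symbol = {i: s for i, s in enumerate(symbols)}


def encode(text):
    """Hypothetical helper: map each character of text to its numeric ID."""
    return [symbol_to_id[ch] for ch in text]


def decode(ids):
    """Hypothetical helper: map a sequence of IDs back to a string."""
    return "".join(id_to_symbol[i] for i in ids)


def one_hot(index, size):
    """Hypothetical helper: a list of zeros with a single 1 at position index."""
    vec = [0] * size
    vec[index] = 1
    return vec


ids = encode("Hello")
print(ids)          # [7, 30, 37, 37, 40]
print(decode(ids))  # Hello
print(one_hot(ids[0], len(symbols)))
```

In practice a framework would usually handle the one-hot or embedding step; the list-based version here only shows what the numeric IDs are used for.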

 

In the same way, you can also encode all phonemes, all parts of speech, and even all Chinese characters.
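As a rough illustration of that generalization, the sketch below wraps the same dictionary comprehension in a function that accepts any list of tokens, including multi-character ones. The function name `build_vocab` and the short phoneme list are made up for this example; a real phoneme inventory would come from your own data. Duplicates are dropped while keeping first-seen order, so each token gets exactly one ID.

```python
def build_vocab(tokens):
    """Hypothetical helper: number the distinct tokens in first-seen order."""
    # dict.fromkeys removes duplicates while preserving insertion order.
    unique = list(dict.fromkeys(tokens))
    token_to_id = {t: i for i, t in enumerate(unique)}
    id_to_token = {i: t for t, i in token_to_id.items()}
    return token_to_id, id_to_token


# Illustrative sample only, not a complete phoneme set.
phonemes = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]
phoneme_to_id, id_to_phoneme = build_vocab(phonemes)

print('phoneme_to_id:', phoneme_to_id)
print('id_to_phoneme:', id_to_phoneme)
```

The repeated "L" in the sample receives a single ID, which is usually what you want when the token list is collected from raw text rather than from a fixed alphabet.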

Origin blog.csdn.net/m0_46483236/article/details/123765147