Sometimes a set of symbols needs to be mapped to integer IDs so that a program can process it further, for example with one-hot encoding or an embedding layer. So how can specific symbols be encoded quickly? The details are as follows:
For example, to encode all uppercase and lowercase English letters and build both the forward and the reverse mapping:
symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
# Mappings from symbol to numeric ID and vice versa:
symbol_to_id = {s: i for i, s in enumerate(symbols)}
id_to_symbol = {i: s for i, s in enumerate(symbols)}
print('symbol_to_id:', symbol_to_id)
print('id_to_symbol:', id_to_symbol)
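With both dictionaries in place, a string can be converted to a list of IDs and back, and the IDs can then be expanded into one-hot vectors. The following is only a minimal sketch: the helper names `encode`, `decode`, and `one_hot` are illustrative and not part of the original code, and the mappings are rebuilt so the snippet runs on its own.

```python
import string

# Same symbol set as above: uppercase letters first, then lowercase.
symbols = list(string.ascii_uppercase + string.ascii_lowercase)
symbol_to_id = {s: i for i, s in enumerate(symbols)}
id_to_symbol = {i: s for i, s in enumerate(symbols)}

def encode(text):
    """Map each character to its ID (hypothetical helper).

    Raises KeyError for any character that is not in the symbol set.
    """
    return [symbol_to_id[ch] for ch in text]

def decode(ids):
    """Map a sequence of IDs back to a string (hypothetical helper)."""
    return "".join(id_to_symbol[i] for i in ids)

def one_hot(ids, size=len(symbols)):
    """Turn each ID into a list of zeros with a single 1 at position ID."""
    vectors = []
    for i in ids:
        row = [0] * size
        row[i] = 1
        vectors.append(row)
    return vectors

ids = encode("Hi")
print(ids)                   # [7, 34]
print(decode(ids))           # Hi
print(len(one_hot(ids)[0]))  # 52
```

Because the forward and reverse mappings are built from the same `enumerate(symbols)` call, `decode(encode(text))` returns the original text for any input made up only of known symbols.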
In the same way, you can encode any other symbol set, such as phonemes, part-of-speech tags, or even all Chinese characters.
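As a rough sketch of how the same idea generalizes, the dictionary comprehensions can be wrapped in a small function that accepts any iterable of symbols. The function name `build_vocab` and the sample inputs below are assumptions made for illustration. Duplicates are dropped while keeping first-seen order, so every symbol receives exactly one ID.

```python
def build_vocab(items):
    """Return (symbol_to_id, id_to_symbol) for any iterable of hashable symbols.

    Hypothetical helper: dict.fromkeys removes duplicates while preserving
    the order in which symbols first appear.
    """
    unique = list(dict.fromkeys(items))
    symbol_to_id = {s: i for i, s in enumerate(unique)}
    id_to_symbol = {i: s for i, s in enumerate(unique)}
    return symbol_to_id, id_to_symbol

# Multi-character symbols such as phonemes are passed as a list of strings.
phoneme_to_id, id_to_phoneme = build_vocab(["AA", "AE", "AA", "B"])
print(phoneme_to_id)  # {'AA': 0, 'AE': 1, 'B': 2}

# A plain string is iterated character by character, which suits Chinese text.
char_to_id, id_to_char = build_vocab("你好你")
print(char_to_id)     # {'你': 0, '好': 1}
```

Without the deduplication step, a repeated symbol would overwrite its earlier ID in `symbol_to_id` while `id_to_symbol` kept both entries, leaving the two mappings inconsistent.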