"token" is the data unit of the current language class model. The current autoregressive language model is based on tokens as the unit for data processing and calculation. Tokenization is to decompose long texts such as sentences, paragraphs, and articles into token-based data structures. After the text is segmented, each Words are represented as vectors for model computation. For example, in the English context, "happy" may be decomposed into two tokens, "hap" and "-py". a token.
Reprinted: A correction to ChatGPT computing-power estimates, with more accurate parameters and calculation methods
On BLOOM, an open-source LLM with 176 billion parameters and a scale comparable to GPT-3, four Moffett (墨芯) S30 compute cards can reach a content generation speed of 25 tokens/s, exceeding that of eight A100 cards.
P.S.: tokens/s measures the speed at which a large model generates content.
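To make the metric concrete, here is a hypothetical sketch of how a tokens/s figure is computed; the function name and the example numbers are illustrative, not from any benchmark in this article.

```python
# Hypothetical helper showing how a tokens/s throughput figure is derived.
def generation_speed(num_tokens_generated: int, elapsed_seconds: float) -> float:
    """Tokens per second = tokens produced / wall-clock time."""
    return num_tokens_generated / elapsed_seconds

# e.g. 500 tokens generated in 20 s -> 25.0 tokens/s,
# matching the rate quoted above for four S30 cards on BLOOM-176B.
print(generation_speed(500, 20.0))  # 25.0
```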