Large Model Basics 03: Embeddings in Practice for Local Knowledge Q&A

Embedding Overview

How knowledge is represented in computers is a core problem of artificial intelligence, and across the eras of databases, the Internet, and large models, the way knowledge is stored has changed with it. In a database, knowledge is stored as structured data and must be queried with a machine language such as SQL. In the Internet era, people use search engines to retrieve unstructured knowledge from the web. In a large language model, knowledge is stored in the model's parameters and can be invoked directly through natural-language prompting and question answering.

Language is discrete, and natural language representation learning seeks to represent human language in a form that is easier for computers to understand. Especially since the rise of deep learning, how to better represent natural language at the input layer of a network has become an important question. In machine learning, embedding refers to mapping high-dimensional data (such as text, images, or audio) into a low-dimensional space. An embedding maps text into a numerical vector, and words with similar semantics land at nearby positions in the vector space, which makes the text easy for a computer to process and analyze. For example, cosine similarity can be used to measure how close two embeddings are, and the embeddings of the words in a sentence can be summed (or averaged) to obtain a sentence vector.
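To make this concrete, here is a minimal sketch in Python. The word vectors below are hypothetical toy values made up for illustration; in practice they would come from a trained embedding model and have hundreds of dimensions. The sketch averages word embeddings into a sentence vector and compares sentences with cosine similarity.

```python
import numpy as np

# Hypothetical toy word embeddings for illustration only;
# a real embedding model would supply much higher-dimensional vectors.
word_vectors = {
    "cat":    np.array([0.9, 0.1, 0.0, 0.3]),
    "dog":    np.array([0.8, 0.2, 0.1, 0.3]),
    "car":    np.array([0.0, 0.9, 0.8, 0.1]),
    "sits":   np.array([0.2, 0.1, 0.0, 0.7]),
    "drives": np.array([0.1, 0.8, 0.7, 0.2]),
}

def sentence_vector(words):
    """Average the word embeddings to obtain a single sentence vector."""
    return np.mean([word_vectors[w] for w in words], axis=0)

def cosine_similarity(a, b):
    """Cosine similarity of two vectors: values near 1.0 mean similar direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = sentence_vector(["cat", "sits"])
s2 = sentence_vector(["dog", "sits"])
s3 = sentence_vector(["car", "drives"])

print(cosine_similarity(s1, s2))  # high: semantically close sentences
print(cosine_similarity(s1, s3))  # lower: unrelated sentences
```

This nearest-neighbor comparison is also the basis of embedding-based local knowledge Q&A: document chunks are embedded ahead of time, the user's question is embedded the same way, and the chunks with the highest cosine similarity are retrieved as context for the model.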

Embedding originated with word embeddings and has advanced considerably over the years. Viewed horizontally, it has grown from the original simple word embeddings into Item Embedding, Entity Embedding, Graph Embedding, Position Embedding, Segment Embedding, and more.

Origin: blog.csdn.net/LifeRiver/article/details/132327478