[Deep Learning] Let's Talk about Vectorization

Foreword

        Vectorization is more than the ability to represent any entity as a vector; it also reflects how artificial intelligence has developed. The evolution of vector representations is, in a sense, the story of the AI era in miniature.

1. Why AI needs vectorization

        How does a computer understand language? At the lowest level a computer works in binary, that is, 0s and 1s, so all text, audio, and video are stored as strings of numbers. This structure is very simple, but it has a problem: the numbers follow no meaningful pattern. For example, if we use one 32-bit number to represent "apple" and another 32-bit number to represent "fruit", then to the computer "apple" is just a string of 0s and 1s.

        The computer does not know that this string of numbers stands for a fruit. When encodings for letters or Chinese characters were designed, only storage and display were considered, not meaning. Suppose, for example, that "beauty" is numbered 39, "ugliness" 40, "love" 41, and the answer to the universe 42. The numbers have no connection to what the words mean, so they cannot carry that meaning. The computer therefore neither records meaning nor understands it.
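
        As a small illustration of this point, the sketch below prints the Unicode code points of two words using Python's built-in ord. The helper name code_points is our own and is not part of the original article. The resulting numbers identify characters for storage and display, and nothing in them says that an apple is a kind of fruit.

```python
def code_points(word: str) -> list[int]:
    """Return the Unicode code point of each character in the word."""
    return [ord(ch) for ch in word]


apple = code_points("apple")
fruit = code_points("fruit")

print(apple)  # [97, 112, 112, 108, 101]
print(fruit)  # [102, 114, 117, 105, 116]
```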

        Before the era of artificial intelligence, computers were just tools we used to store and process information, much like the refrigerators and pans we use to store and process food. A pan does not need to know what ice cream or fish-flavored shredded pork is; it only needs to be able to cook the dish. In the era of artificial intelligence, however, we need computers to process real-world information and solve problems on their own. Take machine translation: in essence, two different languages express the same real-world meaning, and the AI has to discover that correspondence by itself in order to translate.

        That is, the computer needs to understand the actual meaning of the language. How to understand it? The answer is vectorization.

2. How to vectorize

        So what is vectorization? Simply put, it means turning the thing you want to represent into a combination of numbers. As a simple example, how would you describe a person with numbers? You can define a set of measurement dimensions for them.

        For example, we use [0,180,75,20] to represent a person and give each dimension a meaning: the first is gender, the second is height, the third is weight, and the fourth is age. This 4-dimensional vector then represents a male with a height of 180 cm, a weight of 75 kg, and an age of 20. You can also add more dimensions, such as [gender, height, weight, age, bust, waist, hips, body fat percentage, skin color, hair color, hobbies, education, income...]. The more dimensions there are, the more precisely the person is described.
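
        The person example can be sketched in a few lines of Python. The class name Person and the method to_vector are hypothetical names chosen for illustration; only the four dimensions and the sample values come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Person:
    gender: int    # 0 = male, as in the example; codes for other values are not given in the text
    height: float  # 180 in the example
    weight: float  # 75 in the example
    age: int       # 20 in the example

    def to_vector(self) -> list[float]:
        """Flatten the fields into a fixed-order list of numbers."""
        return [float(self.gender), float(self.height),
                float(self.weight), float(self.age)]


person = Person(gender=0, height=180, weight=75, age=20)
print(person.to_vector())  # [0.0, 180.0, 75.0, 20.0]
```

        Adding a dimension means adding a field and appending it in to_vector; the order of the dimensions must stay the same for every person so that the vectors remain comparable.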

3. Advantages after vectorization

        What are the benefits of a vectorized representation? 1. It is convenient for computers to process. 2. The space formed by the vectors shows regularities.

        Take height and weight as the dimensions. A person at [180,76] is then very similar to the person we defined at [180,75]: whoever is closest to you in coordinates is the person most similar to you. As we add more dimensions, and with them more measurement criteria, we can understand each person's characteristics through spatial relationships in a higher-dimensional coordinate system.
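
        "Closest in coordinates" can be made concrete with a distance function. The sketch below uses Euclidean distance, which is one common choice and an assumption on our part, since the article does not name a metric. The points [180,75] and [180,76] come from the text; persons "B" and "C" and the function names are invented for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(target, candidates):
    """Return the name of the candidate whose vector is closest to target."""
    return min(candidates, key=lambda name: euclidean_distance(target, candidates[name]))


me = [180, 75]  # [height, weight]
people = {
    "A": [180, 76],  # from the text
    "B": [165, 50],  # hypothetical
    "C": [190, 95],  # hypothetical
}

print(euclidean_distance(me, people["A"]))  # 1.0
print(nearest(me, people))                  # A
```

        In practice the dimensions are usually rescaled first, so that a dimension with large raw values does not dominate the distance.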

        Words are vectorized in the same way, but the result is more abstract and the individual dimensions are hard to label. Still, when words are placed in a very high-dimensional coordinate system, the closer two words are in meaning, the closer they sit in that space.

        Vectors also support arithmetic. Once words are placed in a vector space, taking the vector for "king", subtracting the vector for "man", and adding the vector for "woman" gives a vector very close to the position of "queen". This shows that, in a suitable coordinate system, the spatial relationships between words reflect their relationships in the real world.
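
        The king/queen arithmetic can be tried on a toy example. The 3-dimensional vectors below are hand-made for illustration, with dimensions we label [royalty, masculinity, femininity]; they are not real word embeddings, which are learned from text and have far more dimensions without such tidy labels. The function names are also our own, and excluding the three input words from the search is a common convention that we assume here.

```python
import math

# Hand-made toy vectors: [royalty, masculinity, femininity]. Not learned embeddings.
toy_vectors = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors):
    """Return the word closest to vector(a) - vector(b) + vector(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


print(analogy("king", "man", "woman", toy_vectors))  # queen
```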

4. Summary

        Vectorization is a milestone of the artificial intelligence era. As the popular saying goes, artificial intelligence cannot do without vectorization, just as the West cannot do without Jerusalem.

Origin blog.csdn.net/weixin_44750512/article/details/131002913