A plain-language understanding of deep learning, well suited for beginners

Preface
I found this plain-language article on deep learning a pleasure to read, and I am sharing it here because it is well suited to readers who have just started learning machine learning and deep learning.

Special note: this article deliberately avoids mathematical formulas and mathematical arguments when explaining the concepts of deep learning.

Fundamentally, deep learning, like all machine learning methods, builds mathematical models of specific real-world problems in order to solve similar problems in that domain.

First of all, deep learning is a type of machine learning. Since it is called "learning", it is naturally similar to the way we humans learn. Think back: how does a human child learn?

How do human children learn?

How does the machine learn?

For example, many children learn characters with literacy cards. From the tracing primers used in ancient times, such as "Shang Da Ren, Kong Yi Ji", to the literacy-card apps that teach children to read on phones and tablets, the basic idea is the same: go from simple to complex. Children look again and again at the various ways each Chinese character can be written (older children even learn to recognize different calligraphic fonts), and after seeing enough examples they naturally remember them, so that the next time they see the same character it is easy to recognize.

This process of learning to read looks simple, yet it is endlessly subtle. To recognize characters, the child's brain must have been stimulated many times by similar images and summarized some regularity for each Chinese character; the next time the brain sees a pattern that fits this regularity, it knows which character it is.

Teaching a computer to read characters is much the same. The computer must first look at the image of each character many, many times and then, in its "brain" (processor plus memory), summarize a rule. When the computer later sees a similar image, as long as the image fits the rule it summarized earlier, it can tell which character the image is.

In professional terms, the pictures the computer looks at repeatedly for learning are called the "training data set"; within the training data set, the attributes or characteristics that distinguish one class of data from another are called "features"; the process by which the computer summarizes rules in its "brain" is called "modeling"; the rules the computer summarizes are what we usually call a "model"; and the process by which the computer, by looking at pictures over and over, summarizes rules and learns to recognize characters is called "machine learning".

How exactly does the computer learn? What kind of rules does it summarize? That depends on which machine learning algorithm we use.

One algorithm is very simple and imitates the way children learn to read. Parents and teachers may have had this experience:

When children start learning to read, we might first teach them to tell apart "一", "二", and "三". We tell them that the character written with one stroke is "一", the one written with two strokes is "二", and the one written with three strokes is "三". This rule is easy to remember and easy to apply. But as soon as new characters come along, the rule may stop working.

For example, "kou" is also three strokes, but it is not "three". We usually tell the children that what is enclosed in a box is "mouth", and what is arranged in horizontal rows is "three." This pattern has enriched one more layer, but the increase in literacy still cannot be suppressed. Soon, the children discovered that "Tian" is also a square, but it is not "mouth". We will tell the children at this time that there is a "ten" in the box is "Tian". In the future, we will most likely tell the children that "Tian" means "you" at the top, "A" at the bottom, and "Shen" at the top.

Guided by such feature rules that are enriched step by step, many children gradually learn to summarize rules on their own, remember new Chinese characters, and go on to master thousands of them.

There is a machine learning method called the decision tree, which closely resembles the feature-rule-based way of learning characters described above. When the computer only needs to recognize the three characters "一", "二", and "三", it only has to count the strokes of the character in front of it. When we add "口" and "田" to the set of characters to be recognized (the training data set), the computer's earlier test no longer works, and additional decision conditions must be introduced. Advancing step by step like this, the computer can recognize more and more characters.
(Figure: the computer's decision tree before and after learning the new characters)
The figure shows how the decision tree inside the computer differs before and after it learns the three new characters "由", "甲", and "申". It shows that when we "show" the computer the new characters and their features, the computer, like a child, summarizes and remembers new rules and "knows" more characters. This process is the most basic kind of machine learning.
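For readers who like to see things in code, here is a minimal sketch of such a hand-built decision tree in Python. The feature names (stroke count, whether the character is an enclosed box, where a stroke sticks out) are invented for illustration; a real system would have to extract them from images.

```python
# A hand-built "decision tree" mirroring the step-by-step rules above.
# The features are hypothetical; in practice they would come from image analysis.

def classify(stroke_count, enclosed_box=False,
             sticks_out_top=False, sticks_out_bottom=False):
    if not enclosed_box:
        # The earliest rule: just count the strokes.
        if stroke_count == 1:
            return "一"
        if stroke_count == 2:
            return "二"
        if stroke_count == 3:
            return "三"
        return "unknown"
    # Rules added after 口 and 田 joined the training set.
    if stroke_count == 3:
        return "口"
    # Characters built on 田: check where the middle stroke sticks out.
    if sticks_out_top and sticks_out_bottom:
        return "申"
    if sticks_out_top:
        return "由"
    if sticks_out_bottom:
        return "甲"
    return "田"

print(classify(3))                                            # 三
print(classify(5, enclosed_box=True))                         # 田
print(classify(5, enclosed_box=True, sticks_out_top=True))    # 由
```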

Of course, this decision-tree style of learning is too simple; it is hard to scale up and hard to adapt to the varied situations of the real world. So scientists and engineers have invented many other machine learning methods.

For example, we can map the features of the characters "由", "甲", and "申", such as whether strokes stick out of the box and the positional relationships between strokes, to points in some space (yes, "mapping" here is a mathematical term, but that does not matter; whether or not you understand its precise meaning will not affect the rest of this article). In other words, from the computer's perspective, the many different ways of writing these three characters in the training data set become a large number of points in that space. As long as we extract the features of each character well enough, the points will be roughly distributed over three different regions.

Now let the computer examine these points and see whether a simple partition, such as drawing straight lines through the space, can split it into several separate regions, so that the points corresponding to each character in the training data set all fall within the same region. If such a partition is feasible, it means the computer has "learned" how these characters are distributed in the space, that is, it has built a model for them.
(Figure: straight lines dividing the feature space into one region per character)
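As a rough sketch of this idea, the code below maps each writing sample to a point described by two made-up feature values and fits a linear model, whose decision boundaries are straight lines in the plane. Everything here (the feature meanings, the numbers, the use of scikit-learn) is an assumption for illustration, not anything prescribed by the article.

```python
# Toy illustration: writing samples become points in a 2-D feature space,
# and a linear classifier carves the plane into one region per character.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Pretend feature 0 = "how far a stroke sticks out at the top",
#         feature 1 = "how far a stroke sticks out at the bottom".
X = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.85, 0.05],   # samples of 由
    [0.1, 0.9], [0.2, 0.8], [0.05, 0.9],    # samples of 甲
    [0.9, 0.9], [0.8, 0.85], [0.95, 0.8],   # samples of 申
])
y = np.array(["由", "由", "由", "甲", "甲", "甲", "申", "申", "申"])

model = LogisticRegression().fit(X, y)   # decision boundaries are straight lines
print(model.predict([[0.9, 0.15]]))      # expected: 由
```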

From then on, when the computer sees a new image of a character, it simply converts the image into a point in the space and checks which character's region the point falls in. Doesn't that tell us which character the image is?

Many readers will have noticed that the method of drawing straight lines to divide a plane (as shown in the figure) is hard to stretch to thousands of Chinese characters and the tens of thousands, at least, of different ways they can be written. If every variant of every character is to correspond to a point in the space, it becomes extremely difficult to find a simple, mathematically clean way of partitioning the space so that each character's points end up in their own region.

For many years, mathematicians and computer scientists were troubled by problems like this, and machine learning methods kept being improved. For example, people used complicated higher-order functions to draw all kinds of curves in order to separate intermingled points, or simply found ways to turn a two-dimensional space into a three-dimensional or four-dimensional one, or even a high-dimensional space with hundreds, thousands, or tens of thousands of dimensions. Before deep learning became practical, people invented many traditional, non-deep machine learning methods. These methods achieved some success in particular domains, but the world is genuinely complex and varied; no matter how elegant a modeling method people chose for the computer, it could hardly capture the characteristic laws of everything in the world. It is like a painter trying to depict the true face of the world with a limited palette: however good his technique, it is hard to be truly lifelike.

So how can we greatly expand the basic means a computer has for describing the laws of the world? Could we design a highly flexible form of representation for the computer, and then let it keep trying and searching during large-scale learning, summarizing rules on its own, until it finally finds a representation that matches the characteristics of the real world?

Now we finally come to deep learning!

Deep learning is exactly such a machine learning method: flexible and adaptable in its expressive power, it lets the computer keep trying until it finally approaches the goal.

Mathematically, there is no essential difference between deep learning and the traditional machine learning methods above: both hope to tell apart different kinds of objects in a high-dimensional space according to their features. The expressive power of deep learning, however, is very different from that of traditional machine learning.

Simply put, deep learning treats whatever the computer is to learn as a large amount of data, throws this data into a complex, multi-layered data processing network (a deep neural network), and then checks whether the results produced by this network meet the requirements. If they do, this network is kept as the target model; if not, the network's parameter settings are adjusted again and again, stubbornly, until the output meets the requirements.

That is still too abstract and hard to picture, so let us switch to a more intuitive description.

Imagine that the data deep learning has to process is a "flow" of information, and that the deep learning network processing the data is a huge network of water pipes made of pipes and valves. The entrance to the network is a row of pipe openings, and so is the exit. The network has many layers, and each layer has many regulating valves that control the direction and volume of the water flow. Depending on the task, the number of layers and the number of valves per layer can be combined in different ways; for complex tasks, the total number of valves can reach thousands or more. In this network, every valve in one layer is connected by pipes to all the valves in the next layer, forming a water system that is fully connected from front to back, layer by layer (this is a fairly basic setup; different deep learning models differ in how the pipes are laid out and connected).
(Figure: the layered "water pipe network" with its regulating valves)
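To make the analogy a little more concrete, here is a minimal NumPy sketch of such a layered, fully connected "pipe network". Each weight matrix plays the role of one layer's valves, and pouring data through the network is just repeated matrix multiplication. The layer sizes are assumptions chosen for illustration.

```python
# A tiny fully connected "water pipe network" in NumPy.
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 128, 64, 10]          # input pipes, two hidden layers, 10 outlets
weights = [rng.normal(0.0, 0.1, (m, n))   # each matrix = one layer of valves
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Pour an input 'water flow' x through every layer of valves."""
    for w in weights:
        x = np.maximum(0.0, x @ w)        # ReLU: negative flow is simply shut off
    return x

picture = rng.random(784)                 # a fake flattened picture
print(forward(picture).shape)             # (10,): one value per outlet pipe
```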
So, how can computers use this huge network of water pipes to learn to read?

For example, when the computer sees a picture with the character "田" written on it, it simply turns all the numbers that make up the picture (in a computer, every colored dot of a picture is represented by numbers made of "0"s and "1"s) into a stream of information and pours it into the pipe network through the entrance.

We have placed a character signboard at each exit of the pipe network in advance, one for each Chinese character we want the computer to recognize. Because the input here is the character "田", once the water has flowed through the whole network, the computer goes to the exits and checks whether the outlet labeled "田" has the largest flow. If so, the network meets the requirements. If not, we give the computer an order: adjust every flow-regulating valve in the network so that the most digital "water" flows out of the "田" outlet.

Now the computer has its work cut out for it: there are so many valves to adjust! Fortunately, computers are fast, and brute-force computation plus algorithmic optimization (in reality, mostly some delicate mathematical methods, but we will not discuss formulas here; just picture the computer calculating furiously) always lets it quickly come up with a solution, adjusting all the valves so that the flow at the outlets meets the requirement.

Next, when learning the character "申", we do the same thing: turn every picture containing "申" into a stream of numbers and feed it into the pipe network, then check whether the outlet labeled "申" has the largest flow. If not, we adjust all the valves again, this time making sure not to disturb the "田" that was just learned while also handling the new character "申" correctly.

This is repeated until the water corresponding to every Chinese character flows through the network in the desired way. At that point, we say the pipe network is a trained deep learning model.

For example, the figure shows the "water flow" of information for the character "田" being poured into the pipe network. To make more water flow out of the outlet labeled "田", the computer must adjust all the flow-regulating valves, almost frantically, in a particular way, experimenting and exploring until the flow meets the requirement.
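Sketched as code, this repeated valve-turning is an ordinary training loop. The example below assumes PyTorch and random stand-in data; the article itself does not prescribe any framework, characters, or layer sizes.

```python
# A highly simplified training loop: the optimizer "turns the valves"
# so that the outlet for the correct character receives the most water.
import torch
import torch.nn as nn

net = nn.Sequential(                      # the pipe network
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),                    # 10 outlets, one per character
)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Stand-in training data: 32 random "pictures" and their character labels (0..9).
images = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))

for step in range(100):                   # keep adjusting until the flow fits
    optimizer.zero_grad()
    outflow = net(images)                 # how much water reaches each outlet
    loss = loss_fn(outflow, labels)       # how far from "most water at the right outlet"
    loss.backward()                       # work out which way to turn each valve
    optimizer.step()                      # turn every valve a little
```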

Once a large number of literacy cards have passed through the pipe network and all the valves have been adjusted into place, the entire network can be used to recognize Chinese characters. At that point we can "weld" all the adjusted valves in place and wait for new water to arrive.

As during training, an unknown picture is turned into a data stream by the computer and fed into the trained pipe network. The computer then only needs to observe which outlet has the largest flow of water to know which character is written in the picture.
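Recognition is then a single pass through the network followed by picking the outlet with the most water. A self-contained sketch follows, with the same assumed architecture as above and a made-up list of outlet labels; in practice the trained ("welded") weights would be loaded rather than freshly initialized.

```python
import torch
import torch.nn as nn

# Same assumed architecture as the training sketch; a real system would
# load the trained weights instead of starting from scratch.
net = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
characters = ["一", "二", "三", "口", "田", "由", "甲", "申", "人", "大"]  # made-up outlet labels

with torch.no_grad():
    unknown_picture = torch.randn(1, 784)            # stand-in for a real, unseen picture
    outflow = net(unknown_picture)                   # one pass through the pipe network
    print(characters[outflow.argmax(dim=1).item()])  # outlet with the most water
```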

Simple, isn't it? And rather magical? Is deep learning really just a learning method that "forces out" the best model by frantically adjusting valves? Why should each valve in the pipe network be adjusted in just this way and to just this degree, and is that entirely determined by how much water finally flows out of each outlet? Is there really no deeper principle behind it?

Deep learning is roughly this kind of half-theoretical, half-empirical modeling method: it relies on human mathematical knowledge and computer algorithms, and on adjusting the internal parameters as much as possible, to close in on the goal.

The basic guiding principle of deep learning is a kind of pragmatic thinking.

Do you want to capture more complex laws of the world? Then keep increasing the number of adjustable valves in the pipe network (add more layers, or more valves per layer). Do you have plenty of training data and large-scale computing power? Then let many CPUs and many GPUs (graphics processing units, commonly known as graphics chips, originally designed for rendering and games, which happen to be especially well suited to deep learning computations) form a huge computing array, and let the computer learn the laws hidden in the training data while frantically adjusting countless valves. Perhaps it is precisely because of this pragmatism that the perceptual (modeling) ability of deep learning is far stronger than that of traditional machine learning methods.

Pragmatism also means not insisting on a deep understanding. Even when a deep learning model has been trained to be very "smart" and solves a problem very well, in many cases even the person who designed the whole pipe network cannot clearly explain why each valve has to be adjusted exactly this way. In other words, people usually only know whether a deep learning model works; it is hard to say what the causal relationship is between the value of any particular parameter in the model and the final model's perceptual ability.

This is genuinely interesting. In many people's eyes, the most effective machine learning method in history is a "black box" that can be grasped intuitively but not explained.

A philosophical question follows from this: if people only know what the computer has learned, but cannot say what kind of laws it has mastered in the learning process, could this learning itself get out of control?

For example, many people worry: if things keep developing this way, will computers quietly learn things we do not want them to learn? And, in principle, if the number of layers in deep learning models keeps increasing without limit, could the computer's modeling ability eventually match the full complexity of the real world? If so, then given enough data, the computer could learn all the knowledge there is to learn in the universe, and then what? Are you starting to worry about computers becoming wiser than humans? Fortunately, experts have not yet reached a consensus on whether deep learning can express knowledge of such cosmic complexity, so for the foreseeable future at least, humanity is relatively safe.

One more point: some visualization tools have appeared that help us "see" what deep learning looks like while it performs large-scale computation. For example, Google's well-known deep learning framework TensorFlow offers a small web tool, the TensorFlow Neural Network Playground, which uses easy-to-understand diagrams to show, in real time, the characteristics of a network as it carries out deep learning computations.
(Figure: the TensorFlow Playground visualizing a network with 4 hidden layers)
The figure shows what a deep neural network with four intermediate (hidden) layers looks like while learning from a particular training data set. In it we can see, at a glance, the direction and magnitude of the data "flow" between each layer and the next. We can also change the basic settings of the framework on the web page at any time and observe the deep learning algorithm from different angles, which is very helpful for learning and understanding deep learning.

Article Source
This article comes from: "It turns out that understanding deep learning is this simple!"
Author: Wang Yonggang
Excerpted from the book "Artificial Intelligence" by Kai-fu Lee and Wang Yonggang


Origin blog.csdn.net/qq_16488989/article/details/109199641