Transfer Learning - Training Deep Learning Models When Data Is Not Enough


  Andrew Ng (Wu Enda), a deep learning expert, once said that doing AI research is like building a spaceship: in addition to sufficient fuel, a powerful engine is essential. If the fuel is insufficient, the spacecraft cannot reach its intended orbit; if the engine is not powerful enough, the spacecraft cannot even lift off. In this analogy, the deep learning model is the engine and massive training data is the fuel, and both are indispensable for AI.

  As deep learning has become widely applied in fields such as machine translation, strategy games, and autonomous driving, a common problem hindering its further adoption has grown increasingly prominent: the massive data needed to train models is difficult to obtain.

  The following are some popular machine learning models and the amounts of data they require. As model complexity increases, the number of parameters and the amount of training data required become staggering.

[Figure: parameter counts and training data volumes for several popular deep learning models]

  Given this situation, this article starts from the layered structure of deep learning models, introduces the relationship between the amount of training data and model size, then uses a concrete example to show how transfer learning can greatly reduce the amount of data required, and finally recommends a cloud tool that simplifies the implementation of transfer learning: NanoNets.

  The Layered Structure of Deep Learning Models

  A deep learning model is a large neural network, which can be viewed as a flow chart: data enters at one end and the trained result comes out the other. Because of this layered structure, you can also break the network apart, separating it into layers, and take the output of any layer as the input to another system.

[Figure: a deep neural network as a layered pipeline from input to output]
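
  To make this concrete, below is a minimal sketch of cutting a network at an intermediate layer, assuming TensorFlow/Keras and its ImageNet-pre-trained ResNet50 (the layer name avg_pool and the 2048-dimensional output are specific to that model):

```python
# Minimal sketch: reuse the output of an intermediate layer of a
# pre-trained network as input features for another system.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Model

base = ResNet50(weights="imagenet")  # downloads ImageNet-trained weights

# Cut the network at its global-average-pooling layer and expose that
# layer's output instead of the final 1000-way classification.
feature_extractor = Model(inputs=base.input,
                          outputs=base.get_layer("avg_pool").output)

# features = feature_extractor.predict(images)  # shape (n, 2048)
```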

  Data volume, model size, and problem complexity

  There is an interesting, roughly linear positive relationship between the amount of training data a model needs and the model's size. One rationale is that the model should be large enough to adequately capture the relationships between different parts of the data (e.g., textures and shapes in images, grammar in text, phonemes in speech) along with the details of the problem being solved (e.g., the number of categories). The early layers of a model typically capture broad relationships in the input data (e.g., image edges and shapes), while the later layers typically capture information that helps make the final decision (usually the details that distinguish the desired outputs). Therefore, the higher the complexity of the problem (such as image classification), the larger the number of parameters and the amount of training data required.

[Figure: the relationship between problem complexity, model size, and required training data]

  Introducing transfer learning

  In most cases, it is a well-recognized fact in the industry that sufficient training data for a specific problem in a given field simply cannot be found. Fortunately, thanks to one technology, models trained on other data sources can, after certain modifications and refinements, be reused in similar fields, which greatly alleviates the problems caused by insufficient data. That key technology is transfer learning.

  According to the "Most Cited Deep Learning Papers" list published on GitHub, more than 50% of high-quality papers in the deep learning field use transfer learning or pre-training in some way. Transfer learning has gradually become the technology of choice for AI projects with insufficient resources (too little data or computing power). Yet in reality there is still a large number of AI projects well suited to transfer learning whose teams do not even know it exists. As shown in the figure below, transfer learning attracts far less attention than machine learning and deep learning.

[Figure: search interest in transfer learning compared with machine learning and deep learning]

  The basic idea of transfer learning is to start from a pre-trained model, that is, a model already trained on a ready-made dataset (the pre-training dataset can correspond to a completely different problem, for example one with the same inputs but different outputs). The developer then finds a layer in the pre-trained model whose output is reusable as features, and uses that output as the input for training a smaller neural network that requires far fewer parameters. Because the pre-trained model has already learned the general patterns in the data, this smaller network only needs to learn the relationships specific to the problem at hand. The once-popular photo-editing app Prisma is a good example: it pre-learned Van Gogh's painting style and can successfully apply it to any user-uploaded image.

[Figure: the basic idea of transfer learning: reusing layers of a pre-trained model]
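
  Continuing the hedged sketch above, the smaller network can be as simple as one dense layer trained on the 2048-dimensional features coming out of the pre-trained model; train_images and train_labels here are hypothetical placeholders:

```python
# Minimal sketch: a tiny task-specific classifier trained on frozen
# features from the pre-trained model, instead of on raw images.
from tensorflow.keras import layers, models

classifier = models.Sequential([
    layers.Input(shape=(2048,)),            # reusable pre-trained features
    layers.Dense(1, activation="sigmoid"),  # binary decision for the new task
])
classifier.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])

# classifier.fit(feature_extractor.predict(train_images), train_labels)
```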

  It is worth mentioning that the benefits of transfer learning are not limited to reducing the scale of the training data; it can also effectively prevent overfitting, in which the model fits quirks of the training data beyond the basic scope of the problem, so that testing on other samples is likely to produce unforeseen errors. Because transfer learning lets the model learn from different types of data, it is better at capturing the underlying structure of the problem to be solved. As shown in the figure below, a model using transfer learning performs better overall.

[Figure: performance of models trained with and without transfer learning]

  How much training data can transfer learning reduce?

[Figure: the dress photo used in the example below]

  Consider the photo of a dress that once went viral on the Internet. As shown in the figure, if you want to use deep learning to determine whether the dress is blue and black or white and gold, you must collect a large number of dress images in both color schemes. Referring to the correspondence between problem scale and parameter scale mentioned above, building an image recognition model of this accuracy from scratch would require roughly 140M parameters and 1.2M labeled training images, which is almost an impossible task.

  Now introduce transfer learning. The number of parameters the model needs under transfer learning can be obtained from the following formula:

  No. of parameters = [Size(inputs) + 1] × [Size(outputs) + 1] = [2048 + 1] × [1 + 1] = 4098 parameters

  As can be seen, by introducing transfer learning, the number of parameters for the same problem drops from 140M to 4,098, a reduction of more than four orders of magnitude. This reduction in parameters, and in the training data needed to fit them, is staggering.
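
  The arithmetic behind this can be checked in a couple of lines; the formula counts [outputs + 1] = 2 output units, each with 2048 weights plus a bias:

```python
# Parameter count for a small classifier on 2048-dimensional features.
inputs, outputs = 2048, 1
params = (inputs + 1) * (outputs + 1)
print(params)  # 4098, versus ~140M for training from scratch
```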

  A concrete implementation example of transfer learning

  In this example, we use deep learning to perform sentiment analysis on short movie reviews, classifying, for example, "It was great, loved it." as a positive review and "It was really stupid." as a negative one.

  Suppose only 72 examples are available: 62 of them are unlabeled and used for pre-training, 8 are labeled with sentiment and used to train the model, and 2 are labeled and used to test the model.

  Since we have only 8 labeled training examples, directly training the model on this amount of data would undoubtedly yield very low test accuracy. (Because the judgment is binary, positive or negative, the expected test accuracy would be only about 50%, no better than chance.)

  To resolve this dilemma, we introduce transfer learning: first use the 62 unlabeled examples to pre-train the model on general language patterns, then build on that pre-training, reusing some of its layers, to tackle the specific problem in this example. The test accuracy rises to 100%. The analysis proceeds in the following three steps.

  Step 1

  Create a pre-trained model to analyze word-to-word relationships. Here, given one word from an unlabeled sentence, we try to predict the other words that appear in the same sentence.

[Figure: the pre-training task: predicting the surrounding words in a sentence]
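
  As a hedged sketch of this step, the pre-training task can be approximated with a skip-gram Word2Vec model from the gensim library; the two tokenized reviews shown are illustrative stand-ins for the 62 unlabeled sentences:

```python
# Minimal sketch: learn word vectors by predicting, from one word,
# the other words that co-occur with it in the same sentence.
from gensim.models import Word2Vec

unlabeled_reviews = [                       # illustrative, not the real data
    ["it", "was", "great", "loved", "it"],
    ["it", "was", "really", "stupid"],
]
w2v = Word2Vec(sentences=unlabeled_reviews, vector_size=50,
               window=3, sg=1, min_count=1)  # sg=1 selects skip-gram
```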

  Step 2

  The model is trained so that words appearing in similar contexts receive similar vector representations. In this step, stop words are first removed from the 62 unlabeled sentences and the sentences are tokenized. Then, for each word, the system tries to reduce the distance between its vector representation and those of related words while increasing its distance from unrelated words.

[Figure: word embeddings: words appearing in similar contexts receive similar vectors]
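
  Continuing the sketch, the preprocessing for this step might look as follows; the stop-word list is an illustrative assumption:

```python
# Minimal sketch: stop-word removal (in the pipeline above, this happens
# before the embedding training shown in Step 1's sketch).
stop_words = {"it", "was", "a", "the"}     # illustrative stop-word list
cleaned = [[w for w in sent if w not in stop_words]
           for sent in unlabeled_reviews]

# After training, words from similar contexts receive similar vectors:
# w2v.wv.most_similar("great")  # nearest words by vector similarity
```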

  Step 3

  Predict the sentiment of a sentence. Since the pre-training has already produced vector representations for all the words, and these vectors numerically encode each word's contextual properties, sentiment analysis becomes much easier to implement.

[Figure: predicting sentence sentiment from the pre-trained word vectors]

  Note that instead of feeding the 10 labeled sentences into the model directly, each sentence's vector is first computed as the average of the vectors of all its words (in a real task, something along the lines of an LSTM recurrent network would be used instead). The averaged sentence vector is fed into the model as input, and a positive or negative judgment is produced as output. It should also be emphasized that a hidden layer is added between the pre-trained model and the 10 labeled sentences to adapt the model to the specific scenario of sentiment analysis. As the results show, 100% prediction accuracy is achieved with only 10 labeled examples.
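
  Here is a hedged sketch of this final step, reusing the w2v model from the earlier sketches; the labeled reviews are illustrative stand-ins for the 10 pre-labeled sentences, and the 16-unit hidden layer is an assumed size:

```python
# Minimal sketch: average each labeled sentence's word vectors, add one
# hidden layer, and train a positive/negative classifier on the result.
import numpy as np
from tensorflow.keras import layers, models

def sentence_vector(tokens, wv):
    """Average the vectors of the words we have embeddings for."""
    return np.mean([wv[w] for w in tokens if w in wv], axis=0)

labeled_reviews = [["great", "loved"], ["really", "stupid"]]  # illustrative
labels = np.array([1, 0])                  # 1 = positive, 0 = negative

X = np.stack([sentence_vector(s, w2v.wv) for s in labeled_reviews])

sentiment_model = models.Sequential([
    layers.Input(shape=(50,)),             # averaged 50-d sentence vector
    layers.Dense(16, activation="relu"),   # hidden layer adapting to this task
    layers.Dense(1, activation="sigmoid"),
])
sentiment_model.compile(optimizer="adam", loss="binary_crossentropy",
                        metrics=["accuracy"])
# sentiment_model.fit(X, labels, epochs=50)
```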

  Of course, it must be pointed out that the model shown here is very simple, and there are only 2 test cases. But it is undeniable that, thanks to the introduction of transfer learning, the sentiment-prediction accuracy in this example did indeed rise from 50% to 100%.

  For the complete code of this example, see the following link: https://gist.github.com/prats226/9fffe8ba08e378e3d027610921c51a78

  Difficulties in implementing transfer learning

  Although introducing transfer learning can significantly reduce the amount of training data a model requires, it also demands more specialized expertise. As the example above shows, the sheer number of parameters that must be hand-set and the tedious debugging surrounding them are daunting enough, and this remains one of the major obstacles to the wider adoption of transfer learning in practice. Here we summarize 8 common difficulties in implementing it:

1. Obtaining a relatively large pre-training dataset

2. Choosing a suitable pre-trained model

3. Difficulty debugging which of the two models is not working

4. Not knowing how much additional data is needed to train the model

5. Difficulty judging when to stop pre-training

6. Deciding which layers and parameters of the pre-trained model to use

7. Hosting and serving the combined model

8. Updating the pre-trained model when more data or better algorithms become available

  NanoNets Tools

  NanoNets is a simple and convenient cloud-based transfer learning tool. It contains a set of ready-made pre-trained models, each with millions of trained parameters. Users can upload their own data or search for data on the web; NanoNets automatically selects the best pre-trained model for the problem at hand, builds a NanoNet (a small network) on top of that model, and fits it to the user's data. The relationship between the NanoNet and the pre-trained model is shown below.

[Figure: how a NanoNet sits on top of a pre-trained model]

  Taking the blue-and-black versus white-and-gold dress mentioned above as an example, the user only needs to specify the categories to classify and then upload training data or search for it on the web. NanoNets then automatically fits the pre-trained model and generates a web page for testing, along with API endpoints for further development. The figure below shows the system's analysis of a dress photo.

[Figure: NanoNets classification result for a dress photo]

  For specific usage, please refer to the NanoNets official website: http://nanonets.ai/ . It is worth mentioning that, as part of a launch promotion, the NanoNets API is free to use before March 1; interested readers may wish to try it.
