Machine learning classic paper-reading notes: deep learning (Zhou Zhihua)

Notes on a classic English paper: Deep Forest (Zhou Zhihua)


Abstract

Personal translation:
Most current deep learning models are based on neural networks, i.e., multi-layer parameterized differentiable nonlinear modules that can be trained by backpropagation. In this paper, we explore how to build deep models from non-differentiable modules such as decision trees. The paper discusses the mystery behind deep neural networks; in particular, after comparing deep neural networks with traditional machine learning techniques such as basic neural networks, decision trees, and boosting algorithms, we conjecture that the success of deep neural networks can be attributed to three factors: first, layer-by-layer processing; second, in-model feature transformation; third, sufficient model complexity. On the one hand, this conjecture may help the understanding of deep learning; on the other hand, to verify it, we propose a deep forest algorithm that has all three characteristics. Experiments show that its performance is robust to hyperparameter settings; in many cases, across different domains and datasets, it achieves good performance with default parameters. This study opens the door to deep models that can be built without gradient computation and shows the feasibility of building deep learning models without backpropagation.

Introduction


The first paragraph of the introduction gives an overview of DNNs (deep neural networks).

Then comes the turning point: DNNs have many shortcomings, which the paper lists. My own summary of the shortcomings of DNNs:

  1. Too many hyperparameters; training requires considerable skill, and the model has too many internal factors that are hard to analyze.
  2. Requires a large amount of labeled training data (costly to obtain).
  3. The model structure is fixed before training starts; some problems do not need such a complex model, which wastes computing power.
  4. A neural network is a black box: we currently cannot explain why neural networks underperform random forests and XGBoost on some problems.

A contradiction emerges: to improve accuracy, we must make models deeper, but current training algorithms are all gradient-based.

Three questions are raised:
1. Is it necessary to use differentiable components to build deeper models?
2. Is it feasible to train deep models without backpropagation?
3. How can deep learning win on problems where XGBoost and random forests currently perform better?

The next paragraph explains how the results of the 2017 paper relate to this article.
Finally, the introduction concisely describes what each part of the article does.

1. Ensemble learning

The first paragraph outlines how weak classifiers are combined into a strong classifier. The second paragraph gives the mathematical formulation and discusses how to characterize diversity. The third paragraph lists four approaches used in practice (a sketch follows the list below):

  1. Bagging strategy (sample-level resampling)
  2. Partitioning the feature space into random subspaces
  3. Using different initial learning parameters
  4. Using different output representations for different classifiers, e.g., ECOC
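As an illustration of the first two diversity-injection strategies, here is a minimal sketch of my own (not from the paper), assuming scikit-learn >= 1.2, where `BaggingClassifier` takes an `estimator` argument:

```python
# Minimal sketch (not from the paper): two common ways to inject
# diversity into an ensemble, illustrated with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 1. Bagging: each base tree sees a bootstrap sample of the data.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,          # sample-level resampling
    random_state=0,
).fit(X, y)

# 2. Random subspaces: each base tree sees a random subset of features.
subspace = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=False,         # all samples, but...
    max_features=0.5,        # ...only half of the features per tree
    random_state=0,
).fit(X, y)

print(bagging.score(X, y), subspace.score(X, y))
```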

2. The key to deep learning

This section argues why layer-by-layer processing is the key to deep learning; this observation is what inspired gcForest.
The comparison also shows that model complexity is very important.

3. The gcForest algorithm

1. Cascade structure

Translated figure caption:
The structure of the cascade forest. Suppose each level of the cascade consists of two completely random forests and two ordinary random forests, and suppose there are three classes to predict; each forest therefore outputs a 3-dimensional class vector, which is then concatenated with the input feature vector and passed on.
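A quick dimension check (my own arithmetic, consistent with the caption): with 4 forests per level and 3 classes, each level appends 4 × 3 = 12 augmented features, so the input to the next level is the original raw feature vector plus this 12-dimensional block of class vectors.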

Explanation of completely random forests and ordinary random forests:
A completely random tree picks a feature at random for the split at each node and keeps growing until its leaf nodes are pure; n completely random trees form a completely random forest.

Given d input features, an ordinary random tree randomly selects √d features as split candidates at each node and picks the best split by Gini index; n such ordinary random trees form an ordinary random forest.

Here n is a hyperparameter of the gcForest algorithm.
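A minimal sketch of the two forest types (my own approximation, not the authors' code): in scikit-learn, `ExtraTreesClassifier` with `max_features=1` approximates a completely random forest (one random candidate feature per split), while `RandomForestClassifier` with `max_features="sqrt"` is the ordinary Gini-based random forest:

```python
# Sketch (assumption: scikit-learn approximations of the two forest types).
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = load_iris(return_X_y=True)
n = 100  # number of trees per forest, a gcForest hyperparameter

# Completely random forest: one randomly chosen feature per split,
# trees grown until leaves are pure (min_samples_leaf=1 by default).
complete_random = ExtraTreesClassifier(
    n_estimators=n, max_features=1, random_state=0).fit(X, y)

# Ordinary random forest: sqrt(d) candidate features per split,
# best split chosen by Gini index.
ordinary = RandomForestClassifier(
    n_estimators=n, max_features="sqrt", random_state=0).fit(X, y)

# Each forest outputs a class-probability vector per sample
# (3-dimensional here, since iris has 3 classes).
print(complete_random.predict_proba(X[:1]), ordinary.predict_proba(X[:1]))
```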

For example (see the figure): leaf nodes of different shapes (triangle, circle, square) indicate different classes, and the red line marks the path an instance follows through each tree.
For each tree, count the proportions of the three classes in the leaf node the instance reaches; averaging these over all trees in a forest gives that forest's class-probability estimate for the instance.

This 3-dimensional estimate is concatenated with the estimates of the other forests to form an augmented feature vector, which is passed to the next level of the cascade.
The number of cascade levels is determined automatically by k-fold cross validation, which is an advantage over DNNs, whose depth must be fixed in advance.
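A toy sketch of this cascade logic, under the same scikit-learn assumptions as above (hypothetical helper names, not the authors' implementation):

```python
# Toy cascade sketch (hypothetical, not the authors' code).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)

def level_forests(seed):
    """Two completely random + two ordinary forests per cascade level."""
    return [ExtraTreesClassifier(100, max_features=1, random_state=seed),
            ExtraTreesClassifier(100, max_features=1, random_state=seed + 1),
            RandomForestClassifier(100, random_state=seed + 2),
            RandomForestClassifier(100, random_state=seed + 3)]

features, best_acc = X, 0.0
for level in range(5):                      # grow at most 5 levels
    class_vecs, accs = [], []
    for forest in level_forests(level * 10):
        # k-fold CV gives out-of-fold class vectors and an accuracy estimate
        proba = cross_val_predict(forest, features, y, cv=3,
                                  method="predict_proba")
        class_vecs.append(proba)
        accs.append((proba.argmax(axis=1) == y).mean())
    acc = float(np.mean(accs))
    if acc <= best_acc:                     # stop growing when CV accuracy
        break                               # no longer improves
    best_acc = acc
    # augmented features: raw input + 4 forests x 3 classes = 12 extra dims
    features = np.hstack([X] + class_vecs)
    print(f"level {level}: CV accuracy {acc:.3f}, dim {features.shape[1]}")
```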

2. Multi-grained scanning

Inspired by CNNs and RNNs, gcForest extracts features in a similar spirit, by sliding a window over the raw input.

Hyperparameters (compared with CNN): the paper lists them in a comparison table.
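A minimal sketch of the sliding-window idea (my own illustration; the 400-dim input and 100-dim window here are example sizes, which may differ from the paper's figures):

```python
# Sketch of multi-grained scanning on a 1-D input (my own illustration).
import numpy as np

def sliding_windows(x, window):
    """Cut a 1-D feature vector into overlapping window-sized instances."""
    return np.stack([x[i:i + window]
                     for i in range(len(x) - window + 1)])

x = np.random.rand(400)            # a 400-dim raw feature vector
instances = sliding_windows(x, window=100)
print(instances.shape)             # (301, 100): 301 instances of 100 dims

# Each instance is sent through a forest that outputs a class vector
# (3 dims for 3 classes); concatenating all of them yields a
# 301 x 3 = 903-dim transformed representation per forest.
```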

Origin: blog.csdn.net/qq_42138927/article/details/105773138