"Neural Networks and Deep Learning" Learning Summary
This article summarizes my notes from studying the book "Neural Networks and Deep Learning".
Machine learning: a subfield of artificial intelligence
Neural network: a model with (artificial) neurons as its basic unit
Deep learning: a family of machine learning methods whose central challenge is the (module) contribution assignment problem, i.e., determining how much each component contributes to the final output
machine learning
Shallow learning: does not involve feature learning; features are mainly extracted by hand, based on human experience, or by feature transformation methods.
Representation learning: learning high-level semantic features through deep models
Deep learning = representation learning + decision (prediction) learning. The contribution assignment problem is solved with neural networks, which are continuous functions whose partial derivatives can be computed.
Common types of machine learning
Supervised learning: regression, classification
Unsupervised learning: clustering, dimensionality reduction, density estimation
Reinforcement learning
Four elements of machine learning
Data, models, learning criteria, optimization algorithms
Loss function: a non-negative real function that quantifies the difference between model predictions and true labels.
Expected risk, empirical risk
Gradient descent method for optimization problems
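The update rule behind gradient descent can be sketched in a few lines. This is a minimal illustration on a hand-picked quadratic objective; the learning rate, starting point, and step count are illustrative choices, not values from the book.

```python
# Minimal gradient descent sketch: minimize f(w) = (w - 3)^2 by repeatedly
# stepping against the gradient. Objective and hyperparameters are illustrative.

def gradient_descent(grad, w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)  # w <- w - learning_rate * f'(w)
    return w

# f'(w) = 2 * (w - 3); the minimum is at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_star, 4))  # converges to 3.0
```

The same rule applied to a model's loss function, with the gradient supplied by backpropagation, is how neural networks are trained.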
Linear classifier
In supervised learning: logistic regression, Softmax regression, the perceptron, support vector machines...
Neural Networks
Three elements: neuron activation rules, network topology, and learning algorithms.
Common activation functions: S-shaped function (Logistic/sigmoid), ramp function (ReLU), self-gated composite function (Swish)
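The three activation functions named above can be sketched directly from their definitions (the `beta` parameter of Swish is shown with an illustrative default of 1.0):

```python
import math

def logistic(x):
    """S-shaped (sigmoid) function: squashes x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Ramp function: max(0, x)."""
    return max(0.0, x)

def swish(x, beta=1.0):
    """Self-gated composite function: x * sigmoid(beta * x)."""
    return x * logistic(beta * x)

print(logistic(0.0), relu(-2.0), relu(2.0))  # 0.5 0.0 2.0
```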
Feedforward neural network (fully connected neural network, multi-layer perceptron)
- Each neuron belongs to one layer, and there are no connections within a layer.
- All neurons in two adjacent layers are pairwise connected.
- The network contains no feedback: signals propagate in one direction, from the input layer to the output layer, so the network can be represented by a directed acyclic graph.
Gradient calculation method: chain derivation, backpropagation algorithm, automatic differentiation
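The chain rule at the heart of backpropagation can be shown on a tiny two-weight network. The scalar weights, sigmoid activations, and squared-error loss below are illustrative choices, not the book's notation.

```python
import math

# Chain-rule backpropagation on a tiny network (illustrative sketch):
# y = sigmoid(w2 * sigmoid(w1 * x)), loss L = (y - t)^2.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(x, t, w1, w2):
    # Forward pass, keeping intermediate values for the backward pass.
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    loss = (y - t) ** 2
    # Backward pass: multiply local derivatives along the computation path.
    dL_dy = 2 * (y - t)
    dy_dz2 = y * (1 - y)              # sigmoid'(z2) with z2 = w2 * h
    dL_dw2 = dL_dy * dy_dz2 * h
    dL_dh = dL_dy * dy_dz2 * w2
    dh_dz1 = h * (1 - h)              # sigmoid'(z1) with z1 = w1 * x
    dL_dw1 = dL_dh * dh_dz1 * x
    return loss, dL_dw1, dL_dw2
```

Automatic differentiation frameworks apply the same per-node local-derivative bookkeeping mechanically, so gradients never have to be derived by hand.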
convolutional neural network
- A feedforward neural network
- Inspired by the biological receptive-field mechanism (in the visual nervous system, a neuron's receptive field is a specific region of the retina; only stimulation within that region can activate the neuron)
- Local connection, weight sharing
Use convolutional layers instead of fully connected layers.
A convolutional network is built by interleaving convolutional layers, pooling layers, and fully connected layers.
Typical convolutional neural networks: AlexNet, GoogLeNet, ResNet
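Local connection and weight sharing can both be seen in a minimal 1-D convolution: the same small kernel slides across the input, so each output depends only on a local window and all windows share one set of weights. (As in most deep learning libraries, this computes cross-correlation, i.e., the kernel is not flipped; the example kernel is an illustrative choice.)

```python
# 1-D discrete convolution, "valid" mode: each output is a dot product of the
# shared kernel with one local window of the input.

def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# An edge-detecting kernel [1, -1] responds where neighboring values differ.
print(conv1d([0, 0, 1, 1, 0], [1, -1]))  # [0, -1, 0, 1]
```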
recurrent neural network
Recurrent neural networks can process time series data of any length by using neurons with self-feedback.
Recurrent neural networks match the structure of biological neural networks more closely than feedforward networks do, and have been widely used in tasks such as speech recognition, language modeling, and natural language generation.
Gated RNN
Gating mechanism: Controls the speed of accumulation of information, including selectively adding new information and selectively forgetting previously accumulated information.
Gated recurrent unit (GRU), long short-term memory network (LSTM)
Recurrent neural networks are mainly used in language models: natural language understanding, machine translation, writing, dialogue systems, etc.
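The self-feedback idea can be sketched as a minimal recurrent cell: the hidden state from the previous step feeds back into the current one, so a sequence of any length is processed step by step. The scalar weights below are illustrative, not learned values.

```python
import math

# Minimal recurrent cell: h_t = tanh(w_in * x_t + w_rec * h_{t-1}).
# The hidden state h carries information forward through the sequence.

def rnn(sequence, w_in=0.5, w_rec=0.8):
    h = 0.0
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)  # self-feedback through h
    return h
```

Gated variants (GRU, LSTM) add learned gates around exactly this recurrence to control how much new information enters `h` and how much old information is forgotten.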
Attention mechanism and external memory
attention mechanism
Score the input information, compute the attention distribution, and select input information according to that probability distribution.
Self-attention model: the connection weights are dynamically generated by the attention mechanism.
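The score / distribution / selection pipeline can be sketched directly. Dot-product scoring against a query vector is one common scoring choice (an assumption here, not the only option in the book), and "selection" is realized softly as a weighted sum.

```python
import math

# Attention sketch: score each input vector, softmax the scores into a
# probability (attention) distribution, then take the weighted sum.

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, inputs):
    scores = [sum(q * x for q, x in zip(query, v)) for v in inputs]  # dot-product scoring
    weights = softmax(scores)                                        # attention distribution
    dim = len(inputs[0])
    return [sum(w * v[d] for w, v in zip(weights, inputs)) for d in range(dim)]
```

In a self-attention model, the queries themselves come from the same input sequence, so these weights become dynamically generated connection weights between positions.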
external memory
Memory-augmented neural network: adds an external memory unit to the main network.
Structured external memory; associative memory based on neural dynamics (Hopfield network).
unsupervised learning
clustering
Assign similar samples in the sample set to the same class/cluster and dissimilar samples to different clusters, so that within-cluster distances are small and between-cluster distances are large.
Common tasks: image segmentation, text clustering, social network analysis
Common clustering methods: K-means clustering, hierarchical clustering, density-based clustering
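K-means alternates the two steps the definition above implies: assign each sample to its nearest center, then move each center to the mean of its assigned samples. The 1-D data and the naive first-k initialization below are illustrative simplifications.

```python
# Bare-bones K-means sketch on 1-D points (illustrative data and
# initialization; real implementations use random or k-means++ seeding).

def kmeans(points, k, iters=20):
    centers = list(points[:k])  # naive initialization: the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step: nearest center wins
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):  # update step: move to cluster mean
            if cl:
                centers[i] = sum(cl) / len(cl)
    return sorted(centers)

# Two well-separated groups around 1.0 and 10.0:
print(kmeans([1.0, 10.0, 0.9, 1.1, 9.8, 10.2], k=2))  # centers near [1.0, 10.0]
```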
(Unsupervised) Feature Learning
Learn useful features from unlabeled data (feature extraction, denoising, dimensionality reduction, data visualization)
Principal component analysis (commonly used dimensionality reduction method), sparse coding, autoencoder, self-supervised learning
Probability Density Estimation
Parametric density estimation: based on prior knowledge, assume the random variable follows a certain distribution, then estimate the parameters of that distribution from training samples.
Non-parametric density estimation: Without assuming that the data obeys a certain distribution, the probability density function of the data is approximated by dividing the sample space into different regions and estimating the probability of each region.
Histogram method, kernel density estimation, K nearest neighbor method
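The histogram method is the simplest of the three non-parametric approaches: divide the sample range into equal-width bins and estimate the density in each bin from the fraction of samples falling into it. The bin count below is an illustrative choice.

```python
# Histogram density estimation: density in bin i = (count_i / n) / bin_width,
# so the estimated density integrates to 1 over the sample range.

def histogram_density(samples, bins=5):
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        i = min(int((x - lo) / width), bins - 1)  # clamp the max sample into the last bin
        counts[i] += 1
    n = len(samples)
    return [c / (n * width) for c in counts]
```

Kernel density estimation refines this idea by replacing the hard bin boundaries with a smooth kernel placed at every sample.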
semi-supervised learning
Self-training
Co-training