"Neural Networks and Deep Learning" Learning Summary

This article summarizes my notes from studying the book "Neural Networks and Deep Learning".

Machine learning: a subfield of artificial intelligence

Neural network: a model with (artificial) neurons as its basic unit

Deep learning: a type of machine learning that mainly solves the credit assignment problem (how much each module contributes to the system's final output)


Machine learning


Shallow learning: does not involve feature learning; features are mainly extracted by hand based on human experience, or by feature transformation methods.

Representation learning: learning high-level semantic features through deep models

Deep learning = representation learning + decision (prediction) learning. The key to solving the credit assignment problem is to use neural networks: continuous functions whose partial derivatives can be computed.

Common types of machine learning

Supervised learning: regression, classification

Unsupervised learning: clustering, dimensionality reduction, density estimation

Reinforcement learning

Four elements of machine learning

Data, model, learning criterion, optimization algorithm

Loss function: a non-negative real-valued function that quantifies the difference between the model's predictions and the true labels.

Expected risk (the average loss over the true data distribution) vs. empirical risk (the average loss over the training set)

Gradient descent method for optimization problems
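As a concrete illustration (a minimal sketch of my own, not code from the book), here is vanilla gradient descent minimizing the empirical risk of a one-parameter least-squares model; the data, learning rate, and iteration count are all illustrative choices:

```python
import numpy as np

# Minimal gradient descent on a 1-D least-squares problem.
# Model: y = w * x; empirical risk: mean squared error over the samples.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])  # roughly y = 2x plus noise

w = 0.0    # initial parameter
lr = 0.01  # learning rate (step size), a hyperparameter
for step in range(200):
    pred = w * x
    grad = np.mean(2 * (pred - y) * x)  # d/dw of mean((w*x - y)^2)
    w -= lr * grad                      # step against the gradient

print(w)  # converges near 2.0
```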

Linear classifier

Linear classifiers in supervised learning: logistic regression, softmax regression, the perceptron, support vector machines, ...
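As one worked example, logistic regression can be trained with plain gradient descent on the cross-entropy loss. A minimal NumPy sketch on synthetic data (every name and value here is illustrative, not from the book):

```python
import numpy as np

# Logistic regression: a linear classifier with a sigmoid output,
# trained by gradient descent on the cross-entropy loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # linearly separable labels

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    w -= lr * X.T @ (p - y) / len(y)        # cross-entropy gradient in w
    b -= lr * np.mean(p - y)                # ... and in the bias

print(np.mean((p > 0.5) == y))  # training accuracy, close to 1.0
```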

Neural Networks

Three elements: neuron activation rules, network topology, and learning algorithms.


Common activation functions: S-shaped functions (e.g. the Logistic function), ramp functions (e.g. the ReLU function), composite functions (e.g. the Swish function)
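Written out in NumPy (a quick illustrative sketch), the three families look like this:

```python
import numpy as np

def logistic(x):          # S-shaped (sigmoid): squashes inputs to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):              # ramp function: max(0, x)
    return np.maximum(0.0, x)

def swish(x, beta=1.0):   # composite: x * logistic(beta * x)
    return x * logistic(beta * x)

x = np.linspace(-3.0, 3.0, 7)
print(logistic(x), relu(x), swish(x), sep="\n")
```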

Feedforward neural network (fully connected neural network, multi-layer perceptron)

  • Each neuron belongs to exactly one layer, and there are no connections within a layer.

  • All neurons in two adjacent layers are pairwise (fully) connected

  • There is no feedback in the entire network, and the signal propagates in one direction from the input layer to the output layer, which can be represented by a directed acyclic graph.

    Gradient computation methods: the chain rule, the backpropagation algorithm, automatic differentiation
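To make the chain-rule view concrete, here is a two-layer feedforward network with the backward pass written by hand (a sketch of my own; the data, layer sizes, and learning rate are arbitrary):

```python
import numpy as np

# Two-layer MLP; gradients computed manually via the chain rule,
# which is exactly what backpropagation automates layer by layer.
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))                  # 64 samples, 3 features
y = np.sin(X.sum(axis=1, keepdims=True))      # toy regression target

W1 = rng.normal(scale=0.5, size=(3, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.05

for _ in range(300):
    # forward pass: signal flows input -> output, no feedback
    h = np.maximum(0.0, X @ W1 + b1)          # hidden layer (ReLU)
    out = h @ W2 + b2                         # linear output layer
    # backward pass: chain rule from the MSE loss back to each weight
    d_out = 2 * (out - y) / len(y)
    dW2 = h.T @ d_out;  db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (h > 0)            # ReLU's gate in the chain
    dW1 = X.T @ d_h;    db1 = d_h.sum(axis=0)
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= lr * g                           # gradient descent update

print(np.mean((out - y) ** 2))  # training loss after 300 steps
```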

Convolutional neural network

  • A feedforward neural network
  • Inspired by the biological receptive-field mechanism (in the visual nervous system, a neuron's receptive field is a specific region of the retina; only stimulation within that region can activate the neuron)
  • Local connection, weight sharing

Use convolutional layers instead of fully connected layers.

Convolutional networks are built by alternately stacking convolutional layers, pooling layers, and fully connected layers.
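Local connection and weight sharing are easiest to see in a direct implementation. Below is a minimal 2-D convolution (strictly, cross-correlation, which is what deep-learning libraries actually compute); the input and kernel are illustrative:

```python
import numpy as np

# One small kernel slides over the whole input, so every output value
# uses the same weights (weight sharing) over a local window only
# (local connection).
def conv2d(image, kernel):
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
edge = np.array([[1.0, -1.0]])  # a tiny horizontal-difference kernel
print(conv2d(img, edge))
```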

Typical convolutional neural networks: AlexNet, GoogLeNet, ResNet

Recurrent neural network


Recurrent neural networks can process time series data of any length by using neurons with self-feedback.

Recurrent neural networks are closer in structure to biological neural networks than feedforward networks are, and are widely used in tasks such as speech recognition, language modeling, and natural language generation.

Gated RNN

Gating mechanism: controls the rate at which information accumulates, including selectively adding new information and selectively forgetting previously accumulated information.

Gated recurrent unit (GRU); long short-term memory (LSTM) network
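A single LSTM step makes the gating concrete: the forget, input, and output gates decide what to discard, what new information to add, and what to expose. A self-contained sketch with illustrative dimensions (not the book's notation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One LSTM step: f = forget gate, i = input gate, o = output gate,
# g = candidate new information.
def lstm_step(x, h, c, W, b):
    z = W @ np.concatenate([x, h]) + b       # all four gates at once
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    c_new = f * c + i * np.tanh(g)           # selectively forget + add
    h_new = o * np.tanh(c_new)               # selectively expose state
    return h_new, c_new

d_in, d_h = 3, 4
rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(4 * d_h, d_in + d_h))
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):         # a length-5 input sequence
    h, c = lstm_step(x, h, c, W, b)
print(h)
```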

Recurrent neural networks are mainly used in language models: natural language understanding, machine translation, writing, dialogue systems, etc.

Attention mechanism and external memory

Attention mechanism

Score the input information, compute an attention distribution from the scores, and select input information according to that probability distribution.

Self-attention model: the connection weights are generated dynamically by the attention mechanism.
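Scaled dot-product self-attention implements exactly this score / normalize / select pipeline. A minimal sketch (the projection matrices and sizes are illustrative choices):

```python
import numpy as np

# Score every query-key pair, softmax the scores into an attention
# distribution, then take the weighted sum of the values.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)       # attention distribution
    return w @ V                             # select by probability

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 8))                  # 4 tokens, dimension 8
# Self-attention: Q, K, V all come from the same input, so the
# effective connection weights are generated dynamically by the data.
Wq, Wk, Wv = (rng.normal(scale=0.3, size=(8, 8)) for _ in range(3))
print(attention(X @ Wq, X @ Wk, X @ Wv).shape)  # (4, 8)
```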

External memory

Memory-augmented neural network: adds a structured external memory unit to the main network.

Associative memory based on neural dynamics (e.g. the Hopfield network).

Unsupervised learning

Clustering

Assign similar samples in the sample set to the same class/cluster and dissimilar samples to different classes/clusters, so that within-class distances are small and between-class distances are large.

Common tasks: image segmentation, text clustering, social network analysis.

Common clustering methods: K-means clustering, hierarchical clustering, density-based clustering.
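K-means is the simplest of these to write down: alternately assign each point to its nearest centroid, then move each centroid to its cluster mean. A NumPy sketch on synthetic blobs (all values illustrative):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # random init
    for _ in range(iters):
        # assignment step: each sample goes to its nearest centroid
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # update step: each centroid moves to the mean of its cluster
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2)
print(centers)  # one centroid near (0, 0), one near (3, 3)
```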

(Unsupervised) Feature Learning

Learn useful features from unlabeled data (feature extraction, denoising, dimensionality reduction, data visualization)

Principal component analysis (a commonly used dimensionality-reduction method), sparse coding, autoencoders, self-supervised learning
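PCA, for example, can be computed from the SVD of the centered data matrix; the top right-singular vectors are the directions of maximum variance. A minimal sketch:

```python
import numpy as np

def pca(X, k):
    Xc = X - X.mean(axis=0)                 # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                    # project onto top-k directions

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 5))
print(pca(X, k=2).shape)                    # (100, 2): reduced features
```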

Probability Density Estimation

Parametric density estimation: based on prior knowledge, assume the random variable follows some distribution, then estimate the distribution's parameters from the training samples

Non-parametric density estimation: without assuming the data follows a particular distribution, approximate the probability density function by dividing the sample space into regions and estimating the probability of each region.

Histogram method, kernel density estimation, K-nearest-neighbor method
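Kernel density estimation, for instance, places a small kernel on every sample and averages them, with no distributional form assumed. A Gaussian-kernel sketch (the bandwidth is an illustrative choice):

```python
import numpy as np

# Gaussian KDE: estimated density at x is the average of Gaussian bumps
# centered on the training samples, scaled by the bandwidth.
def kde(x, samples, bandwidth=0.3):
    u = (x[:, None] - samples[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return k.mean(axis=1) / bandwidth

rng = np.random.default_rng(6)
samples = rng.normal(loc=1.0, size=500)   # data drawn near 1.0
grid = np.linspace(-2.0, 4.0, 7)
print(kde(grid, samples))                 # density estimate peaks near 1.0
```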

Semi-supervised learning

Self-training

Co-training
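Self-training is simple enough to sketch end to end. Below is a toy version using a nearest-centroid classifier of my own choosing (not the book's algorithm): fit on the labeled data, pseudo-label the unlabeled points the model is most confident about, then refit with them included.

```python
import numpy as np

def fit_centroids(X, y):
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def predict_proba(X, centroids):             # softmax over -distance
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    p = np.exp(-d)
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(7)
Xl = np.vstack([rng.normal(0, 0.5, (5, 2)), rng.normal(3, 0.5, (5, 2))])
yl = np.array([0] * 5 + [1] * 5)             # small labeled set
Xu = np.vstack([rng.normal(0, 0.5, (45, 2)), rng.normal(3, 0.5, (45, 2))])

for _ in range(3):                           # a few self-training rounds
    centroids = fit_centroids(Xl, yl)
    p = predict_proba(Xu, centroids)
    conf = p.max(axis=1) > 0.9               # keep confident pseudo-labels
    Xl = np.vstack([Xl, Xu[conf]])
    yl = np.concatenate([yl, p[conf].argmax(axis=1)])
    Xu = Xu[~conf]

print(len(yl), "samples labeled after self-training")
```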


Source: blog.csdn.net/qq_43570515/article/details/130102533