
# Definition of Artificial Neural Network

An Artificial Neural Network (ANN), often shortened to Neural Network (NN), is a mathematical model whose structure and function imitate biological neural networks (an animal's central nervous system, especially the brain), used to estimate or approximate functions.

ps: Like other machine learning methods, neural networks have been used to solve a wide variety of problems, such as machine vision, natural language processing, and multimodal tasks. **These problems are difficult to solve with traditional rule-based programming, and they are where neural networks show great promise.**

# Definition of a Neuron

**In a biological neural network**, each neuron is connected to other neurons. When a neuron is "excited," it sends chemicals to the neurons it connects to, changing the electrical potential inside them; if a neuron's potential exceeds a threshold, it is activated (it "fires") and sends chemicals on to other neurons in turn.

**In an artificial neural network**, this behavior is abstracted. In 1943, McCulloch and Pitts distilled the situation above into the simple model shown in the figure above, the **MP neuron model**, which is still used today. An artificial neural network is obtained by connecting many such neurons in a hierarchical structure.

# Function of a Neuron

An input vector $X$ arrives and its inner product with the weight vector $W$ (the transpose of $W$ times $X$) produces a scalar; a bias term $b$ is added; finally a nonlinear activation function $f$ is applied to obtain a scalar output $y$.

Expressed as:

$y=f(W^{T}X+b)$

ps: here $W^{T}X$ (the matrix product of the transposed $W$ with $X$) can also be written $W \cdot X$ (the inner/dot product of $W$ and $X$), but do not write $WX$: both $W$ and $X$ are column vectors, so the product $WX$ is not defined.
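As a minimal sketch of this computation (using NumPy; the sigmoid activation and the concrete numbers are illustrative assumptions, not prescribed by the text), a single neuron can be written as:

```python
import numpy as np

def sigmoid(z):
    """Nonlinear activation f; sigmoid is one common choice."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(W, X, b):
    """y = f(W^T X + b): inner product of weights and input, plus bias."""
    return sigmoid(W.T @ X + b)

W = np.array([0.5, -0.3, 0.8])  # weight vector
X = np.array([1.0, 2.0, 3.0])   # input vector
y = neuron(W, X, b=0.1)         # scalar output in (0, 1)
```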

Expressed graphically, the same computation is the classic neuron diagram: inputs flow through the weights and bias into the activation function, which produces the output.

# Perceptron

**A single neuron can also be called a perceptron.**

- The perceptron is a linear model for binary classification. Its input is an instance's feature vector and its output is the instance's class, +1 or -1; it is a discriminative model.
- Assuming the training data set is linearly separable, the goal of perceptron learning is to find a separating hyperplane that completely and correctly separates the positive instances from the negative instances of the training set. If the data is not linearly separable, no such hyperplane can be found.

ps:

- The perceptron is a **supervised** learning algorithm.
- The perceptron was proposed by Rosenblatt in 1957 and is the basis of **neural networks** and **support vector machines**.
- The perceptron cannot solve the XOR (nonlinear) problem; in fact, no linear model can solve the XOR problem.
- **Hyperplane**: a hyperplane divides a high-dimensional space. In three-dimensional space the dividers are planes; in higher-dimensional spaces they are hyperplanes.
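A minimal sketch of the classic perceptron learning rule described above (the learning rate, epoch count, and toy data are illustrative assumptions):

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=100):
    """Perceptron rule: nudge the hyperplane toward each misclassified
    point until all points are correctly separated. Labels are +1 / -1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:                  # converged: data fully separated
            break
    return w, b

# Linearly separable toy data (AND-style labels in {+1, -1});
# XOR labels (-1, 1, 1, -1) would never converge here.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
preds = np.sign(X @ w + b)   # all four points classified correctly
```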

# Single-Layer Neural Network

This is the most basic form of neural network: a finite number of neurons that all receive the same input vector. Since each neuron produces a scalar result, the output of a single layer of neurons is a vector whose dimension equals the number of neurons.

ps: A single-layer neural network is equivalent to a row of perceptrons
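A sketch of a single layer as one matrix operation (the shapes, the concrete numbers, and the ReLU activation are illustrative assumptions):

```python
import numpy as np

def layer(W, X, b):
    """One layer of n neurons over an m-dimensional input:
    W has shape (n, m), X shape (m,), b shape (n,).
    The output is a vector of length n, one scalar per neuron."""
    return np.maximum(0.0, W @ X + b)   # ReLU activation, element-wise

W = np.array([[0.5, -0.3], [0.8, 0.2], [-0.1, 0.4]])  # 3 neurons, 2 inputs
X = np.array([1.0, 2.0])
b = np.array([0.1, 0.0, -0.2])
y = layer(W, X, b)   # shape (3,): one output per neuron
```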

# Multilayer Neural Network (Multilayer Perceptron MLP)

A multi-layer neural network is obtained by stacking single-layer neural networks, which is where the concept of a "layer" comes from. A common multi-layer network has the following structure:

- **Input layer**: many neurons accept a large number of input signals; the incoming signal is called the input vector.
- **Hidden layer**: one or more layers of neurons and links sitting between the input layer and the output layer. The number of nodes (neurons) per hidden layer is variable, but the more there are, the more pronounced the nonlinearity of the network, and thus the more pronounced its robustness.
- **Output layer**: signals are transmitted, analyzed, and weighed through the neuron links to form the output result; the outgoing signal is called the output vector.

ps:

- Since every neuron in the current layer is connected to every neuron in the previous layer, such a network is also called a **fully connected neural network**, and such a layer is called **a fully connected layer**.
- The deeper the network, the better the approximation and the higher the training cost. Many problems do not need a particularly complicated network, so depth and width can be chosen to fit the situation (something of a dark art).
- Each layer is really just one matrix multiplication, so it is very fast: passing from a layer of width m to a layer of width n goes through an m×n weight matrix W.
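A sketch of a full forward pass through a small fully connected network (all widths and the random weights are illustrative assumptions; with column-vector inputs, the width-m to width-n step uses a matrix of shape (n, m)):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Widths: 4 inputs -> 8 hidden -> 8 hidden -> 2 outputs.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(2, 8)), np.zeros(2)

def forward(x):
    h1 = relu(W1 @ x + b1)    # hidden layer 1
    h2 = relu(W2 @ h1 + b2)   # hidden layer 2
    return W3 @ h2 + b3       # output layer

y = forward(rng.normal(size=4))   # output vector of length 2
```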

# Activation Function

Multi-layer neural networks can handle the XOR (nonlinear) problem, but not because they have many layers: the credit belongs to the nonlinear activation function attached to each neuron. If no activation function were used, then no matter how many layers the network had, the input to each layer's nodes would be a linear function of the previous layer's output, and the final output would be a linear combination of the inputs, so the network's approximation power would be quite limited. **Once a nonlinear activation function is introduced, the network can approximate almost any function.**

At the same time, adding an activation function has three advantages:

- Improve model robustness
- Alleviate the vanishing gradient problem (mitigate but not solve)
- Accelerate Model Convergence
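The claim that stacked layers without activation functions collapse into a single linear map can be checked directly (a small NumPy demonstration; the matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(5, 3)), rng.normal(size=(4, 5))
x = rng.normal(size=3)

# Two stacked linear layers (no activation)...
deep = W2 @ (W1 @ x)
# ...equal one linear layer with the combined matrix W2 @ W1.
shallow = (W2 @ W1) @ x

assert np.allclose(deep, shallow)   # stacking adds no expressive power
```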

Common activation functions are shown below; the most common are **ReLU** and **Sigmoid**. Sigmoid is not used much anymore, because it is **slow to compute** and prone to **vanishing and exploding gradients**.

ps:

- Sigmoid maps numbers from negative infinity to positive infinity into the range 0 to 1 (the closer the input is to infinity, the slower the output changes, giving a soft binarization; it is slow to compute).
- ReLU maps numbers from negative infinity to 0 onto 0, and leaves numbers from 0 to positive infinity unchanged (discarding part of the noise; it is fast to compute).
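The two functions, plus sigmoid's gradient, as a sketch (NumPy; the derivative formula $\sigma'(z)=\sigma(z)(1-\sigma(z))$ is the standard one):

```python
import numpy as np

def sigmoid(z):
    """Squashes (-inf, +inf) into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative s(z)(1 - s(z)) peaks at only 0.25, so gradients
    shrink as they multiply through many layers (vanishing gradient)."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    """Zero for negative inputs, identity for non-negative inputs."""
    return np.maximum(0.0, z)

z = np.array([-5.0, 0.0, 5.0])
sigmoid(z)        # ~[0.0067, 0.5, 0.9933]
relu(z)           # [0.0, 0.0, 5.0]
sigmoid_grad(z)   # ~[0.0066, 0.25, 0.0066]
```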

# Definition of Linear and Nonlinear Functions

A linear function is not simply anything of the form y = ax + b; it must satisfy two conditions:

- $f(x_{1}+x_{2})=f(x_{1})+f(x_{2})$
- $f(kx)=kf(x)$

A function satisfying both conditions is linear; otherwise it is nonlinear. (Note that by this definition, $y = ax + b$ with $b \neq 0$ is not linear but affine: the constant term breaks additivity.)
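The two conditions can be spot-checked numerically (a small sketch; the sample points and example functions are arbitrary):

```python
def is_linear(f, x1=2.0, x2=3.0, k=5.0, tol=1e-9):
    """Spot-check the two linearity conditions at sample points."""
    additive = abs(f(x1 + x2) - (f(x1) + f(x2))) < tol
    homogeneous = abs(f(k * x1) - k * f(x1)) < tol
    return additive and homogeneous

is_linear(lambda x: 3 * x)       # True: passes both conditions
is_linear(lambda x: 3 * x + 1)   # False: the +1 breaks additivity
is_linear(lambda x: x * x)       # False: nonlinear
```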

# Tensor

- Zero-order tensor: **scalar**
- First-order tensor: **vector**
- Second-order tensor: **matrix**
- Tensors of order three and above are collectively called **N-th order tensors**
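In NumPy terms, a tensor's order is simply the array's number of dimensions (`ndim`); a small sketch:

```python
import numpy as np

scalar = np.array(3.14)                       # order 0: a scalar
vector = np.array([1.0, 2.0, 3.0])            # order 1: a vector
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # order 2: a matrix
tensor3 = np.zeros((2, 3, 4))                 # order 3: an N-th order tensor

[t.ndim for t in (scalar, vector, matrix, tensor3)]  # [0, 1, 2, 3]
```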