Basics of Neural Networks
1. Binary classification——logistic regression
Logistic regression model:
- Problem: Given \(x\), want \(\hat{y}=P(y=1\mid x)\), with \(0 \leq \hat{y} \leq 1\)
- Parameters: \(\omega\in R^{n_x},b \in R\)
- Output: \(\hat{y}=\sigma(\omega^Tx+b)\), where \(\sigma(z)=\frac{1}{1+e^{-z}}\) is the sigmoid function
Training set: (\(m\) is the number of training examples)
\(\left\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),…,(x^{(m)},y^{(m)})\right\}\)
the \(i\)-th sample input vector \(x^{(i)}\) is an \((n_x,1)\) column vector
the input matrix: \(X=[x^{(1)},x^{(2)},…,x^{(m)}]\), of shape \((n_x,m)\)
the label matrix: \(Y=[y^{(1)},y^{(2)},…,y^{(m)}]\), of shape \((1,m)\)
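The vectorized output \(\hat{y}=\sigma(\omega^TX+b)\) over all \(m\) examples can be sketched in NumPy as follows (the sizes `n_x = 3`, `m = 4` and the random inputs are illustrative assumptions, not from the notes):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: n_x = 3 features, m = 4 training examples.
n_x, m = 3, 4
rng = np.random.default_rng(0)
X = rng.standard_normal((n_x, m))   # input matrix, one column per example
w = np.zeros((n_x, 1))              # parameters w in R^{n_x}
b = 0.0                             # bias b in R

# Vectorized forward pass over all m examples at once:
y_hat = sigmoid(w.T @ X + b)        # shape (1, m), each entry in (0, 1)
```

With zero-initialized parameters every prediction is \(\sigma(0)=0.5\); training (below) moves them toward the labels.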
2. Cost function——to train the parameters \(\omega\) and \(b\)
loss function (measures how well you're doing on a single example): \(L(\hat y,y)=-\big(y\log\hat y+(1-y)\log(1-\hat y)\big)\)
cost function: \(J(\omega,b)=\frac{1}{m}\sum\limits_{i=1}^m L(\hat y^{(i)},y^{(i)})=-\frac{1}{m}\sum\limits_{i=1}^m\left[y^{(i)}\log\hat y^{(i)}+(1-y^{(i)})\log(1-\hat y^{(i)})\right]\)
The cost function measures how well you're doing on the entire training set.
Note: the squared-error loss \(L(\hat y,y)=\frac{1}{2}(\hat y-y)^2\) is not a good choice here: it makes the resulting optimization problem non-convex, so gradient descent can get stuck in local optima.
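The cost \(J(\omega,b)\) above translates directly into a vectorized sum (the example labels and predictions here are made-up values to exercise the formula):

```python
import numpy as np

def cost(y_hat, Y):
    """Cross-entropy cost J(w, b): average loss over the m training examples."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat)) / m

# Hypothetical labels and predictions for m = 3 examples:
Y = np.array([[1, 0, 1]])
y_hat = np.array([[0.9, 0.1, 0.8]])
J = cost(y_hat, Y)   # small, since the predictions agree with the labels
```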
3.Gradient descent——to make the cost function \(J\) as small as possible
Repeat: \(\omega := \omega - \alpha \frac{\partial J(\omega,b)}{\partial \omega}\), \(b := b - \alpha \frac{\partial J(\omega,b)}{\partial b}\)
(\(\alpha\) is the learning rate)
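Putting the forward pass, cost gradients, and update rule together gives a minimal training loop. This is a sketch, not the course's reference code; the toy dataset, the gradient formulas \(d\omega = \frac{1}{m}X(\hat y - y)^T\) and \(db = \frac{1}{m}\sum(\hat y - y)\), and the hyperparameters `alpha=0.1`, `iters=1000` are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, alpha=0.1, iters=1000):
    """Batch gradient descent for logistic regression, fully vectorized."""
    n_x, m = X.shape
    w = np.zeros((n_x, 1))
    b = 0.0
    for _ in range(iters):
        y_hat = sigmoid(w.T @ X + b)   # forward pass, shape (1, m)
        dz = y_hat - Y                 # error term, shape (1, m)
        dw = X @ dz.T / m              # dJ/dw, shape (n_x, 1)
        db = np.sum(dz) / m            # dJ/db, scalar
        w -= alpha * dw                # gradient descent updates
        b -= alpha * db
    return w, b

# Toy 1-D data: label is 1 iff the single feature is positive.
X = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y = np.array([[0, 0, 1, 1]])
w, b = train(X, Y)
preds = (sigmoid(w.T @ X + b) > 0.5).astype(int)
```

On this linearly separable toy set the learned classifier recovers the labels exactly.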