Deep Neural Network (DNN) Optimization

First, the exponentially weighted average

 

Note: before looking at the new algorithms, you need to understand the exponentially weighted average; it is the foundation of the three basic optimization algorithms Momentum, RMSprop, and Adam.

 

1. Introduction to the exponentially weighted average:

The figure shows daily temperatures in degrees Fahrenheit, with the temperature readings listed on the right; $\theta_{i}$ denotes the temperature on day $i$:

We now fit a curve to this scatter plot: the curve's value on a given day serves as a local mean that stands in for that day's temperature. Suppose we already know the temperatures up through day $i-1$. To estimate the temperature $\theta _{i}$ on day $i$, we can use the average of the previous $k$ days, i.e.: $\theta _{i}^{'}=\frac{\theta _{i-1}+...+\theta _{i-k}}{k}$.
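As a minimal sketch of this plain moving average (the function name and sample temperatures below are illustrative, not from the original post):

```python
def k_day_average(temps, i, k):
    """Estimate day i's temperature as the mean of the previous k days.

    temps[0] holds day 1's reading, so the predecessors of day i are
    temps[i-1-k : i-1] in 0-based slicing.
    """
    window = temps[i - 1 - k : i - 1]  # theta_{i-k} ... theta_{i-1}
    return sum(window) / k

temps = [10, 11, 12, 13, 14]           # illustrative daily readings
print(k_day_average(temps, i=6, k=5))  # (10+11+12+13+14)/5 = 12.0
```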

But such an average can be problematic. Suppose the recent values are 10, 11, 12, 13, 14, 30: the last reading is anomalously large, and an ordinary mean over the window fluctuates sharply when it arrives, making the fitted values error-prone.
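To see the fluctuation concretely (using the sample values above): with $k=5$, just before the outlier the estimate is $\frac{10+11+12+13+14}{5}=12$, but once 30 enters the window it becomes $\frac{11+12+13+14+30}{5}=16$, a jump of 4 degrees caused by a single bad reading.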

The solution is to change how each day influences the mean: when looking back over the previous $k$ days, we assign each day a weight, so that outliers no longer cause excessive error. This yields the exponentially weighted average formula:

$$V_{i}=\beta V_{i-1}+(1-\beta )\theta _{i}$$

$V_{i}$ is the approximation to the temperature on day $i$, with the initial value $V_{0}=0$; $\theta _{i}$ is the actual temperature on day $i$; and $\beta$ is the weight, usually set to 0.9. Here is the step-by-step computation of the approximation $V_{i}$ for day $i$:

\begin{matrix}
& V_{0}=0 \\
& V_{1}=\beta V_{0}+(1-\beta )\theta _{1} \\
& V_{2}=\beta V_{1}+(1-\beta )\theta _{2} \\
& ... \\
& V_{i}=\beta V_{i-1}+(1-\beta )\theta _{i}
\end{matrix}

and so:

\begin{align*}
V_{i} &= \beta V_{i-1}+(1-\beta )\theta _{i}\\
&= \beta \left (\beta V_{i-2}+(1-\beta )\theta _{i-1} \right )+(1-\beta )\theta _{i}\\
&= \beta^{2}V_{i-2}+\beta(1-\beta )\theta _{i-1}+\beta^{0}(1-\beta )\theta _{i}\\
&= \beta^{2}\left (\beta V_{i-3}+(1-\beta )\theta _{i-2} \right )+\beta(1-\beta )\theta _{i-1}+\beta^{0}(1-\beta )\theta _{i}\\
&= \beta^{3}V_{i-3}+\beta^{2}(1-\beta )\theta _{i-2}+\beta(1-\beta )\theta _{i-1}+\beta^{0}(1-\beta )\theta _{i}\\
&= \beta^{i}V_{0}+\beta^{i-1}(1-\beta )\theta_{1}+...+\beta^{0}(1-\beta )\theta _{i}\\
\end{align*}

\begin{matrix}
\because V_{0}=0\\
\therefore V_{i} = \beta^{i-1}(1-\beta )\theta_{1}+\beta^{i-2}(1-\beta )\theta_{2}+...+\beta^{0}(1-\beta )\theta _{i}=\sum_{k=1}^{i}\beta^{i-k}(1-\beta )\theta_{k}
\end{matrix}

Thus we obtain the final formula for $V_{i}$. Solving it for every day and connecting all the resulting points gives a better-fitting red curve:

 
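A minimal Python sketch (the names and sample data are my own, not the post's) that computes $V_{i}$ both by the recurrence and by the closed-form sum above, confirming they agree:

```python
def ewa_recurrence(thetas, beta=0.9):
    """Exponentially weighted average via V_i = beta*V_{i-1} + (1-beta)*theta_i."""
    v = 0.0                      # V_0 = 0
    vs = []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
        vs.append(v)
    return vs

def ewa_closed_form(thetas, beta=0.9):
    """Same values via V_i = sum_{k=1}^{i} beta^(i-k) * (1-beta) * theta_k."""
    n = len(thetas)
    return [sum(beta ** (i - k) * (1 - beta) * thetas[k - 1]
                for k in range(1, i + 1))
            for i in range(1, n + 1)]

temps = [10, 11, 12, 13, 14, 30]  # illustrative data
assert all(abs(a - b) < 1e-12
           for a, b in zip(ewa_recurrence(temps), ewa_closed_form(temps)))
```

Note how the recurrence absorbs the outlier: the final smoothed value is $0.9 \times 5.0 + 0.1 \times 30 = 7.5$, rather than jumping to the raw 30.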

Why is this method called an exponentially weighted average? Looking at the final expression for $V_{i}$, the temperature $\theta_{k}$ of day $k$ enters with weight $\beta^{i-k}(1-\beta )$: the further back the day, the higher the power of $\beta$, so the weights decay exponentially with age.
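As a concrete illustration, take the usual $\beta =0.9$. The weights on the current day and the days before it are

$$(1-\beta )=0.1,\quad \beta (1-\beta )=0.09,\quad \beta ^{2}(1-\beta )=0.081,\quad ...$$

so each day's weight is $\beta$ times that of the day after it. (A standard rule of thumb, not derived here, is that $\beta =0.9$ behaves roughly like an average over the last $1/(1-\beta )=10$ days.)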

 
