Machine Learning Notes (X): Large-Scale Machine Learning

Large-scale machine learning

1. Determining whether it is necessary to increase the amount of data

Use the learning curve: observe whether the error is still decreasing significantly as m increases. If it is, the model is in the high-variance (overfitting) regime and collecting more data should help.
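A minimal sketch of plotting such a learning curve with scikit-learn; the estimator (SGDRegressor) and the synthetic dataset are only stand-ins, not part of the original notes:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_regression
    from sklearn.linear_model import SGDRegressor
    from sklearn.model_selection import learning_curve

    # synthetic stand-in data; replace with your own X, y
    X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)

    sizes, train_scores, cv_scores = learning_curve(
        SGDRegressor(max_iter=1000), X, y,
        train_sizes=np.linspace(0.1, 1.0, 10),
        cv=5, scoring="neg_mean_squared_error")

    # scores are negated MSE; flip the sign and average over the CV folds
    train_err = -train_scores.mean(axis=1)
    cv_err = -cv_scores.mean(axis=1)

    plt.plot(sizes, train_err, label="training error")
    plt.plot(sizes, cv_err, label="cross-validation error")
    plt.xlabel("m (number of training examples)")
    plt.ylabel("error")
    plt.legend()
    plt.show()
    # If the cross-validation error is still dropping as m grows (high variance),
    # collecting more data is worth the effort; if it has flattened out, it is not.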

 

2. Stochastic gradient descent (SGD):

(1) Background:

When the dataset is too large, it cannot be read into memory all at once, and summing over every example in each gradient step becomes too expensive.

 

(2) Comparison with batch gradient descent:

① Algorithm difference:

Batch gradient descent sums over all m examples for every update of θ, while stochastic gradient descent shuffles the training set and then updates θ one example at a time.

The outer loop over the training set is generally run 1 to 10 times.
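For reference, the two update rules, written in the same notation as the online-learning update in section 4 (x(i), y(i) denotes the i-th training example), are:

Batch gradient descent (each update uses all m examples):

  Repeat {
    θj := θj - α (1/m) Σi=1..m (hθ(x(i)) - y(i)) xj(i)   (j = 0, 1, ..., n)
  }

Stochastic gradient descent (each update uses one example):

  Randomly shuffle the training set;
  Repeat 1 to 10 times {
    for i = 1, ..., m {
      θj := θj - α (hθ(x(i)) - y(i)) xj(i)   (j = 0, 1, ..., n)
    }
  }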

② Difference in convergence behavior:

Batch gradient descent: moves toward the global minimum along a fairly direct path;

Stochastic gradient descent: takes a more winding path and ends up hovering in a region around the global minimum.

 

(3) Check convergence:

Every 1,000 iterations, compute the cost averaged over the last 1,000 examples and plot it. The resulting curve may be noisy, but an overall downward trend indicates that the algorithm is converging.

In the figure below, the red line corresponds to a smaller learning rate, so the oscillation is smaller:

If instead the cost is averaged once every 5,000 iterations (the red line), the curve is smoother, provided the total amount of data is very large:

If the amount of data averaged over for each plotted point is too small, the noise will be large and the downward trend will not be obvious, as shown below:

The figure below shows the algorithm diverging; in that case, try a smaller learning rate:
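A minimal sketch of the monitoring idea from this subsection, on synthetic data (the dataset, learning rate, and averaging window are illustrative, not from the original notes):

    import numpy as np
    import matplotlib.pyplot as plt

    # synthetic linear-regression data, standing in for a real large dataset
    rng = np.random.default_rng(0)
    m, n = 100_000, 5
    X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # x0 = 1 intercept term
    true_theta = rng.normal(size=n + 1)
    y = X @ true_theta + rng.normal(scale=0.5, size=m)

    theta = np.zeros(n + 1)
    alpha = 0.01
    window = 1000                      # average the cost over every 1,000 examples
    costs, avg_costs = [], []

    for i in rng.permutation(m):       # one pass over a shuffled training set
        err = X[i] @ theta - y[i]
        costs.append(0.5 * err ** 2)   # cost of θ on the single example (x(i), y(i))
        theta -= alpha * err * X[i]    # SGD update using that one example
        if len(costs) == window:
            avg_costs.append(np.mean(costs))
            costs = []

    plt.plot(avg_costs)
    plt.xlabel("iterations (x1,000)")
    plt.ylabel("average cost over the last 1,000 examples")
    plt.show()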

 

 

3. Mini-batch gradient descent:

Batch gradient descent uses all m examples in each iteration; stochastic gradient descent uses a single example per iteration.

Mini-batch gradient descent uses b examples per iteration.

b is the mini-batch size, typically between 2 and 100.

 

4. Online learning:

Example: in a shipping service, a customer asks to ship goods by truck from A to B, the service quotes a price x, and the customer either accepts the offer (y = 1) or rejects it (y = 0). The goal is to learn p(y = 1 | x; θ) and use it to optimize the price x.

Here y is predicted from x and the current θ, and updating θ in turn requires x and y. Ordinary problems separate the data into training and test sets; online learning instead takes each newly arriving example as a fresh training example and updates θ with it.

Repeat forever{

  Get (x, y) corresponding to user;

  Update θ using (x, y):

    θj := θj - α (hθ(x) - y) xj   (j = 0, 1, ..., n)

}

Effect of online learning: the model adapts to changing user preferences.
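A runnable sketch of the loop above, assuming hθ is the logistic (sigmoid) hypothesis so that hθ(x) = p(y = 1 | x; θ); the stream of prices and accept/reject decisions is simulated, not real data:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def update(theta, price, accepted, alpha=0.1):
        """One online update of θ from a single (x, y) pair, as in the loop above."""
        x = np.array([1.0, price])        # x0 = 1, x1 = offered price
        h = sigmoid(theta @ x)            # hθ(x) = p(y = 1 | x; θ)
        return theta - alpha * (h - accepted) * x

    theta = np.zeros(2)                   # [θ0, θ1]: intercept and price coefficient
    rng = np.random.default_rng(0)

    # simulated stream of users: lower prices are more likely to be accepted
    for _ in range(10_000):
        price = rng.uniform(10, 50)
        accepted = rng.random() < sigmoid(3.0 - 0.1 * price)  # hypothetical true behavior
        theta = update(theta, price, accepted)

    print(theta)   # θ learned from the stream; each example is used once and then discarded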

 

5. MapReduce:

(1) Principle:

Suppose there are m training examples. Split the training set evenly into c parts (here assume c = 4). Send each subset to one of c different machines; each machine performs the summation over its m/c examples and sends its result to a central server:

Combine: the overall sum becomes the accumulation of the four partial sums (temp values).
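A minimal sketch of this map/combine pattern for one batch-gradient step, using Python's multiprocessing to stand in for the c = 4 machines (all data and parameters here are synthetic placeholders):

    import numpy as np
    from multiprocessing import Pool

    def partial_gradient(args):
        """Map step: one 'machine' sums (hθ(x) - y) x over its slice of the data."""
        X_part, y_part, theta = args
        err = X_part @ theta - y_part
        return X_part.T @ err              # temp: partial sum for this slice

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        m, n, c = 40_000, 5, 4             # c = 4 machines, m/c examples each
        X = rng.normal(size=(m, n))
        y = X @ rng.normal(size=n) + 0.1 * rng.normal(size=m)
        theta = np.zeros(n)

        chunks = [(X[i::c], y[i::c], theta) for i in range(c)]
        with Pool(c) as pool:
            temps = pool.map(partial_gradient, chunks)   # map: 4 partial sums

        grad = sum(temps) / m              # combine: accumulate the 4 temps
        theta = theta - 0.1 * grad         # one batch gradient descent step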

 

(2) Notes:

① If network latency and data-transfer time are ignored, the algorithm gets a factor-of-four speedup.

② When the main computational cost lies in summing over the training examples, consider using MapReduce.

 


Origin www.cnblogs.com/orangecyh/p/11772130.html