1. Adam with a learning rate of 0.00035 works very well;
2. For SGD + Momentum, search for a suitable learning-rate range; it is usually much larger than Adam's;
3. Use early stopping to prevent overfitting;
4. Ensembling can significantly improve model performance; when combining two models, giving a suitably larger weight to the better-performing model may yield better results;
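
The early stopping and weighted two-model ensemble tips can be sketched in plain Python as below. This is only a minimal illustration, not the original training code: the names `EarlyStopping`, `weighted_ensemble`, `patience`, `min_delta`, and `weight_a` are hypothetical, and the sketch assumes each model outputs per-class probabilities as nested lists.

```python
def weighted_ensemble(probs_a, probs_b, weight_a=0.5):
    """Blend two models' per-class probabilities.

    weight_a is the share given to model A (hypothetical parameter);
    model B receives 1 - weight_a. Raise weight_a above 0.5 when
    model A is the better-performing model.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    weight_b = 1.0 - weight_a
    return [
        [weight_a * pa + weight_b * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(probs_a, probs_b)
    ]


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `step` would be called once per epoch with the validation loss, and training would end when it returns True. The ensemble weight is best chosen on a held-out validation set rather than on the test data.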