ELU activation function
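For reference, ELU is defined piecewise (Clevert et al., 2015), with a hyperparameter $\alpha > 0$ (typically 1) setting the saturation value on the negative side:

$$
\mathrm{ELU}(x) =
\begin{cases}
x, & x > 0 \\
\alpha \left( e^{x} - 1 \right), & x \le 0
\end{cases}
$$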
- Combines properties of sigmoid and ReLU: soft saturation on the left (negative) side, no saturation on the right (positive) side.
- The linear portion on the right lets ELU alleviate vanishing gradients, while the soft saturation on the left makes ELU more robust to input changes and noise.
- ELU's mean output is close to zero, which leads to faster convergence (see the sketch after this list).
- On ImageNet, without Batch Normalization, ReLU networks deeper than 30 layers fail to converge, PReLU networks diverge under MSRA fan-in (Caffe) initialization, and ELU networks converge under both fan-in and fan-out initialization.
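A minimal NumPy sketch of these properties (not from the original notes; function names and the `alpha=1.0` default are illustrative). It implements ELU and its derivative, and compares the mean activation against ReLU on zero-mean inputs:

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: linear for x > 0, soft-saturates toward -alpha for x <= 0."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def elu_grad(x, alpha=1.0):
    """Derivative: 1 on the right side (no vanishing gradient for
    positive inputs), alpha * exp(x) decaying to 0 on the left."""
    return np.where(x > 0, 1.0, alpha * np.exp(x))

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)  # zero-mean inputs, e.g. after normalization

# ELU's mean output stays much closer to zero than ReLU's,
# since the negative branch partially offsets the positive one.
print(f"mean ELU(x)  = {elu(x).mean():+.3f}")   # ~ +0.16, close to zero
print(f"mean ReLU(x) = {relu(x).mean():+.3f}")  # ~ +0.40, biased positive
```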