Label smoothing: the authors argue that hard one-hot (impulse) labels lead to overfitting, so each one-hot target is blended with a uniform distribution over the classes:
new_labels = (1.0 - label_smoothing) * one_hot_labels + label_smoothing / num_classes
In Szegedy et al.'s implementation, label_smoothing = 0.1 and num_classes = 1000. Label smoothing improves the network's accuracy by about 0.2%.
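To make the effect of the formula concrete, here is a minimal pure-Python sketch. The helper name `smooth_labels` and the 4-class toy vector are illustrative assumptions, not taken from the original implementation; only the formula itself comes from the text above.

```python
def smooth_labels(one_hot_labels, label_smoothing=0.1):
    """Blend a one-hot label vector with a uniform distribution.

    Hypothetical helper for illustration; it applies
    (1 - label_smoothing) * one_hot + label_smoothing / num_classes
    to each entry of the vector.
    """
    num_classes = len(one_hot_labels)
    uniform_share = label_smoothing / num_classes
    return [(1.0 - label_smoothing) * y + uniform_share for y in one_hot_labels]


# Toy example with 4 classes (instead of 1000); class index 2 is the true class.
one_hot = [0.0, 0.0, 1.0, 0.0]
smoothed = smooth_labels(one_hot, label_smoothing=0.1)
print(smoothed)
```

With 4 classes the true class ends up at roughly 0.925 and every other class at 0.025, so the targets still sum to 1. With label_smoothing = 0.1 and num_classes = 1000, the same formula gives about 0.9001 for the true class and 0.0001 for each of the others.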