Label Smoothing is one of the many regularization techniques used when training neural networks.
Formula of Label Smoothing:

y_ls = (1 - a) * y_hot + a / k

k -> number of classes
a -> hyper-parameter which controls the extent of label smoothing (a = 0 keeps the original one-hot distribution; a = 1 gives the uniform distribution)
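As a quick sanity check, here is a minimal NumPy sketch of the formula (the helper name smooth_labels and the 4-class example are illustrative, not from the original post):

```python
import numpy as np

def smooth_labels(y_hot: np.ndarray, a: float) -> np.ndarray:
    """Apply label smoothing: y_ls = (1 - a) * y_hot + a / k."""
    k = y_hot.shape[-1]  # number of classes
    return (1.0 - a) * y_hot + a / k

# One-hot label for class 1 out of k = 4 classes
y_hot = np.array([0.0, 1.0, 0.0, 0.0])
print(smooth_labels(y_hot, a=0.1))  # [0.025 0.925 0.025 0.025]
print(smooth_labels(y_hot, a=0.0))  # original one-hot distribution
print(smooth_labels(y_hot, a=1.0))  # uniform distribution [0.25 0.25 0.25 0.25]
```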
When to use? Label smoothing is usually used when your data contains mislabeled examples, which cause the model to either learn the noise or learn incorrect features. There are two ways to handle this situation: either go back and relabel the entire dataset, or use a mathematical approach.
For an image with label 1 and prediction 1, the calculated loss would be extremely low. However, if the image is mislabeled, the loss value shoots up. You can see this from the binary cross-entropy loss used for image classification: L = -(y * log(p) + (1 - y) * log(1 - p)).
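To make the effect concrete, here is a small sketch of that loss (the bce helper and the p = 0.9 prediction are hypothetical values chosen for illustration):

```python
import numpy as np

def bce(y: float, p: float) -> float:
    """Binary cross-entropy: L = -(y*log(p) + (1 - y)*log(1 - p))."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

p = 0.9              # the model is confident the image is class 1
print(bce(1.0, p))   # correct label (y=1) -> ~0.105, very low loss
print(bce(0.0, p))   # mislabeled as y=0   -> ~2.303, loss shoots up
```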
The loss measures the deviation of the prediction from the expected target values, which are 1 and 0 in binary classification. However, if we lower the upper limit and raise the lower limit (for example, to 0.9 and 0.1), the model still learns the same classes, while the loss no longer rewards extreme confidence, so training becomes immune to mislabeling to a great extent.
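A sketch of this effect, assuming smoothed targets of 0.9 and 0.1 (the specific values are an assumption for illustration). Binary cross-entropy with a soft target y is minimized at p = y, so the smoothed target caps how confident the model is pushed to become:

```python
import numpy as np

def bce(y: float, p: float) -> float:
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# The hard target (y=1) rewards pushing p all the way to 1, while the smoothed
# target (y=0.9) is minimized at p = 0.9 and penalizes over-confidence beyond
# that, keeping the model out of the region where a mislabeled example would
# produce an unbounded loss.
for p in (0.9, 0.99, 0.999):
    print(f"p={p}: hard={bce(1.0, p):.3f}  smoothed={bce(0.9, p):.3f}")
```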
Label smoothing will help you train your model more robustly and improve performance even in the presence of noisy data.
Label smoothing comes pre-implemented in TensorFlow; you just need to pass the correct parameter value.
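In tf.keras, both built-in cross-entropy losses expose a label_smoothing argument. A minimal sketch (the toy one-layer model is just to show where the loss plugs in):

```python
import tensorflow as tf

# CategoricalCrossentropy (and BinaryCrossentropy) accept a label_smoothing
# argument; with label_smoothing=a, the hard targets are replaced according
# to y_ls = (1 - a) * y_hot + a / k before the loss is computed.
loss_fn = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)

# Illustrative toy model, not part of the original post.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
```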