Pseudo-labeling, as the name suggests, is a method of assigning labels to samples that are by default unlabeled.
The image above shows how pseudo-labeling is performed:
1. Train the model on the labeled dataset.
2. Using the trained model, predict labels for a batch of unlabeled data.
3. Treat these predicted labels as targets and compute a loss on the unlabeled data.
4. Combine the labeled loss with the unlabeled loss & backpropagate.
However, the two losses are not treated equally: a weight factor controls how much the pseudo-labels contribute to the overall loss. This lets the model focus on the labeled data early on, when the classifier's performance may still be poor. As the model's performance improves over the epochs, the weight increases and the unlabeled loss gets more emphasis in the overall loss.
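In symbols, the combined objective can be sketched as follows (the notation here is assumed for illustration, loosely following Lee's formulation: n and n' are the labeled and unlabeled batch sizes, f is the classifier, ℓ is the per-sample loss, ỹ are the pseudo-labels, and α(t) is the weight factor at epoch t):

```latex
L = \frac{1}{n}\sum_{m=1}^{n} \ell\big(f(x^{m}),\, y^{m}\big)
  \;+\; \alpha(t)\,\frac{1}{n'}\sum_{m=1}^{n'} \ell\big(f(x'^{m}),\, \tilde{y}^{m}\big)
```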
The alpha value, i.e. the weight factor, as proposed by Lee changes as per the following equation ->
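As given in the pseudo-label paper (t is the current epoch; T1, T2, and α_f are schedule hyperparameters, for which Lee reports α_f = 3, T1 = 100, T2 = 600):

```latex
\alpha(t) =
\begin{cases}
0, & t < T_1 \\[4pt]
\dfrac{t - T_1}{T_2 - T_1}\,\alpha_f, & T_1 \le t < T_2 \\[4pt]
\alpha_f, & T_2 \le t
\end{cases}
```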
Why does it work?
Clusters -> Every dataset can be represented as a set of clusters, and points lying close to a cluster are likely to share that cluster's label.
Smoothness -> Small changes to an input image shouldn't change the model's output.
Implementation ->
Train on the labeled data for the first few epochs.
For the next few epochs, train on the unlabeled data; after every 50 batches, train one epoch on the labeled data (this acts as a correcting factor). A sketch of this loop follows below.
Flowchart ->
When can pseudo-labeling fail?
The initial labeled data is too small to form meaningful clusters.
The initial labeled data doesn't include all the classes.
There is no benefit from the increased data (this can be the case for a simple model; larger models tend to benefit more from additional data).