- The document proposes a method called Class-wise Self-Knowledge Distillation (CS-KD) to improve the generalization of deep neural networks.
- CS-KD matches the predictive distributions of different samples from the same class, encouraging consistent predictions for samples of the same class.
- Experiments on image classification datasets show that CS-KD reduces error rates compared to other regularization methods and self-distillation baselines. It also improves confidence calibration and combines well with other regularization techniques.
- Concretely, CS-KD adds a class-wise regularization term that minimizes the KL divergence between the softened predictive distribution of a sample and that of another sample from the same class, where the latter is treated as a fixed soft target; this improves both generalization and calibration (a minimal sketch of the loss follows this list).
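
Below is a minimal sketch of the class-wise self-distillation loss in PyTorch, written under the assumption of a standard classifier `model` producing logits. The names `cs_kd_loss`, `x_pair`, `temperature`, and `lambda_cls` are illustrative, not taken from the authors' reference implementation, and the default hyperparameter values are placeholders.

```python
import torch
import torch.nn.functional as F

def cs_kd_loss(model, x, x_pair, y, temperature=4.0, lambda_cls=1.0):
    """Cross-entropy on x plus a KL term that pulls the prediction on x
    toward the (detached) prediction on a different same-class sample x_pair."""
    logits = model(x)                       # predictions for the first batch of samples
    with torch.no_grad():                   # stop-gradient: the pair acts as a fixed soft target
        logits_pair = model(x_pair)

    ce = F.cross_entropy(logits, y)         # standard classification loss on the first samples

    # Temperature-softened distributions; KL(target || prediction) via F.kl_div,
    # which expects log-probabilities as input and probabilities as target.
    log_p = F.log_softmax(logits / temperature, dim=1)
    q = F.softmax(logits_pair / temperature, dim=1)
    kl = F.kl_div(log_p, q, reduction="batchmean") * (temperature ** 2)

    return ce + lambda_cls * kl
```

In this sketch, `x_pair` would be built by sampling, for each label in the batch, a different training example with the same label, so the KL term encourages consistent predictions across samples of the same class.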