Self-Knowledge Distillation
35 papers with code • 0 benchmarks • 0 datasets
Most implemented papers
ProSelfLC: Progressive Self Label Correction for Training Robust Deep Neural Networks
Keywords: entropy minimisation, maximum entropy, confidence penalty, self knowledge distillation, label correction, label noise, semi-supervised learning, output regularisation
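The keywords above point at the core mechanism: the training target is gradually corrected toward the model's own prediction as confidence grows. A minimal sketch of that idea, assuming a linear trust schedule and a `max_trust` cap that are illustrative choices rather than the paper's exact settings:

```python
# Hypothetical sketch of progressive self label correction (not the authors' code):
# the target is a convex combination of the one-hot label and the model's own
# prediction, with the trust in the prediction growing over training.
import torch
import torch.nn.functional as F

def self_corrected_target(logits, onehot_labels, global_step, total_steps, max_trust=0.5):
    probs = F.softmax(logits.detach(), dim=-1)                 # model's current belief
    trust = max_trust * min(1.0, global_step / total_steps)    # grows as training progresses
    return (1.0 - trust) * onehot_labels + trust * probs

def proselflc_style_loss(logits, onehot_labels, global_step, total_steps):
    target = self_corrected_target(logits, onehot_labels, global_step, total_steps)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(target * log_probs).sum(dim=-1).mean()            # cross-entropy against the soft target
```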
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
It can simultaneously perform the three common retrieval functionalities of an embedding model: dense retrieval, multi-vector retrieval, and sparse retrieval, providing a unified model foundation for real-world IR applications.
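To make the three functionalities concrete, here is a hedged sketch of how dense, sparse, and multi-vector (late-interaction) scores could be computed and mixed at retrieval time; the tensor shapes and mixing weights are assumptions, not the released M3 implementation.

```python
# Illustrative scoring for the three retrieval modes named above.
import torch

def dense_score(q_vec, d_vec):
    # single-vector dot product (dense retrieval)
    return q_vec @ d_vec

def sparse_score(q_weights, d_weights):
    # lexical matching: sum products of weights for tokens shared by query and document
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def multi_vector_score(q_vecs, d_vecs):
    # late interaction: each query token takes its best-matching document token
    sim = q_vecs @ d_vecs.T            # [num_q_tokens, num_d_tokens]
    return sim.max(dim=1).values.sum()

def combined_score(q, d, w_dense=0.4, w_sparse=0.2, w_multi=0.4):
    # q and d are dicts holding the "dense", "sparse", and "multi" representations
    return (w_dense * dense_score(q["dense"], d["dense"])
            + w_sparse * sparse_score(q["sparse"], d["sparse"])
            + w_multi * multi_vector_score(q["multi"], d["multi"]))
```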
Preservation of the Global Knowledge by Not-True Distillation in Federated Learning
In federated learning, a strong global model is collaboratively learned by aggregating clients' locally trained models.
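The "not-true distillation" in the title refers to distilling the global model's knowledge only over the classes other than the ground truth, so local training does not overwrite global knowledge. A minimal sketch under that reading; the temperature and function name are assumptions:

```python
# Sketch: the local model matches the global model's distribution over every class
# except the ground-truth one.
import torch
import torch.nn.functional as F

def not_true_distillation_loss(local_logits, global_logits, labels, tau=1.0):
    num_classes = local_logits.size(1)
    # boolean mask that keeps only the not-true classes
    not_true = torch.ones_like(local_logits, dtype=torch.bool)
    not_true.scatter_(1, labels.unsqueeze(1), False)

    local_nt = local_logits[not_true].view(-1, num_classes - 1)
    global_nt = global_logits[not_true].view(-1, num_classes - 1)

    p_global = F.softmax(global_nt / tau, dim=-1)
    log_p_local = F.log_softmax(local_nt / tau, dim=-1)
    return F.kl_div(log_p_local, p_global, reduction="batchmean") * tau ** 2
```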
Revisiting Knowledge Distillation via Label Smoothing Regularization
Without any extra computation cost, Tf-KD achieves up to a 0.65% improvement on ImageNet over well-established baseline models, which is superior to label smoothing regularization.
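A rough sketch of the teacher-free variant suggested by the title, in which the pretrained teacher is replaced by a hand-crafted smooth distribution concentrated on the true class; the constants below are illustrative defaults, not the paper's tuned values:

```python
import torch
import torch.nn.functional as F

def handcrafted_teacher(labels, num_classes, correct_prob=0.9):
    # most mass on the true class, the rest spread uniformly
    uniform = (1.0 - correct_prob) / (num_classes - 1)
    teacher = torch.full((labels.size(0), num_classes), uniform)
    teacher.scatter_(1, labels.unsqueeze(1), correct_prob)
    return teacher

def tfkd_style_loss(student_logits, labels, alpha=0.1, tau=20.0):
    ce = F.cross_entropy(student_logits, labels)
    teacher = handcrafted_teacher(labels, student_logits.size(1)).to(student_logits.device)
    # smooth both sides with a high temperature, as in standard KD
    teacher_smooth = F.softmax(torch.log(teacher) / tau, dim=-1)
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  teacher_smooth, reduction="batchmean") * tau ** 2
    return (1 - alpha) * ce + alpha * kd
```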
FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning
FedSOL is designed to identify gradients of local objectives that are inherently orthogonal to directions affecting the proximal objective.
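As a simplified illustration of what "orthogonal to directions affecting the proximal objective" means, the snippet below removes from a local-objective gradient its component along the proximal-objective gradient; this is a pedagogical reduction, not FedSOL's actual (perturbation-based) update rule.

```python
import torch

def orthogonalize(local_grad: torch.Tensor, prox_grad: torch.Tensor, eps=1e-12):
    g, p = local_grad.flatten(), prox_grad.flatten()
    # subtract the projection of g onto p, leaving the component orthogonal to p
    coeff = (g @ p) / (p @ p + eps)
    return (g - coeff * p).view_as(local_grad)
```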
Regularizing Class-wise Predictions via Self-knowledge Distillation
Deep neural networks with millions of parameters may suffer from poor generalization due to overfitting.
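A minimal sketch of the class-wise regularization idea suggested by the title, assuming the common formulation in which predictions on one sample are pushed toward detached predictions on a different sample of the same class:

```python
import torch
import torch.nn.functional as F

def class_wise_consistency_loss(logits_a, logits_b, tau=4.0):
    """logits_a, logits_b: outputs on two different samples that share the same labels."""
    p_b = F.softmax(logits_b.detach() / tau, dim=-1)   # treat the second sample as a fixed "teacher"
    log_p_a = F.log_softmax(logits_a / tau, dim=-1)
    return F.kl_div(log_p_a, p_b, reduction="batchmean") * tau ** 2
```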
Self-Knowledge Distillation with Progressive Refinement of Targets
Hence, it can be interpreted within the framework of knowledge distillation, with the student becoming its own teacher.
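A sketch of the progressive target refinement this summary alludes to: the target mixes the hard label with the prediction of the model from a previous epoch, trusting the past self more as training proceeds. The linear schedule and `alpha_max` are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def pskd_style_target(prev_epoch_logits, onehot_labels, epoch, total_epochs, alpha_max=0.8):
    alpha = alpha_max * (epoch / total_epochs)                 # progressively trust the past self more
    past_probs = F.softmax(prev_epoch_logits.detach(), dim=-1)
    return (1.0 - alpha) * onehot_labels + alpha * past_probs

def pskd_style_loss(logits, soft_target):
    # cross-entropy against the progressively refined soft target
    return -(soft_target * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```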
Noisy Self-Knowledge Distillation for Text Summarization
In this paper we apply self-knowledge distillation to text summarization, which we argue can alleviate problems with maximum-likelihood training on single-reference and noisy datasets.
Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation
Knowledge distillation is classically a procedure where a neural network is trained on the output of another network along with the original targets in order to transfer knowledge between the architectures.
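Written out, that classic procedure is a weighted sum of a cross-entropy term on the original targets and a temperature-scaled KL term toward the other network's output; the sketch below uses illustrative defaults for the weighting and temperature.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, tau=4.0):
    ce = F.cross_entropy(student_logits, labels)                       # ground-truth term
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  F.softmax(teacher_logits.detach() / tau, dim=-1),
                  reduction="batchmean") * tau ** 2                    # soft-target term
    return alpha * ce + (1.0 - alpha) * kd
```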
Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation
Knowledge distillation is a method of transferring the knowledge from a pretrained complex teacher model to a student model, so a smaller network can replace a large teacher network at the deployment stage.
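A hedged sketch of feature-level self-distillation in the spirit of the title: intermediate features are matched to refined features produced by an auxiliary branch of the same network. The L2 objective and the 1x1 projection are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillLoss(nn.Module):
    def __init__(self, student_channels, refined_channels):
        super().__init__()
        # project student features to the refined feature's channel width if they differ
        self.proj = nn.Conv2d(student_channels, refined_channels, kernel_size=1)

    def forward(self, student_feat, refined_feat):
        aligned = self.proj(student_feat)
        if aligned.shape[-2:] != refined_feat.shape[-2:]:
            aligned = F.interpolate(aligned, size=refined_feat.shape[-2:],
                                    mode="bilinear", align_corners=False)
        # match the refined (self-teacher) features, which are treated as fixed targets
        return F.mse_loss(aligned, refined_feat.detach())
```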