CLIP has become a promising language-supervised visual pre-training framework and achieves excellent performance over a wide range of tasks.
Knowledge Distillation (KD) aims to optimize a lightweight network from the perspective of over-parameterized training.
In this report, we describe the technical details of our submission to the EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023, by Team "AcieLee" (username: Yuqi\_Li).
Most successful CIL methods incrementally train a feature extractor with the aid of stored exemplars, or estimate the feature distribution with the stored prototypes.
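To make the prototype route concrete, here is a minimal sketch of prototype-based inference, assuming each old class is summarized by the mean of its stored features; this is a common class-incremental baseline used only for illustration, not a specific method from the work above.

```python
# Minimal sketch of prototype-based inference for class-incremental learning:
# each old class is summarized by the mean of its stored features, and a test
# feature is assigned to the nearest prototype (illustrative baseline only).
import torch

def build_prototypes(features, labels):
    """features: (N, D) float tensor, labels: (N,) long tensor."""
    return {int(c): features[labels == c].mean(dim=0) for c in labels.unique()}

def classify_by_prototype(feature, prototypes):
    """Return the class id whose prototype is closest in Euclidean distance."""
    distances = {c: torch.norm(feature - p).item() for c, p in prototypes.items()}
    return min(distances, key=distances.get)
```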
Vision-Language Pretraining (VLP) has significantly improved the performance of various vision-language tasks by matching images and texts.
MixSKD mutually distills feature maps and probability distributions between the random pair of original images and their mixup images in a meaningful way.
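A rough sketch of the mixup-based mutual distillation idea follows; the Beta mixing, the temperature, and the loss weighting are assumptions, and this is a simplification rather than the authors' exact loss composition.

```python
# Simplified sketch in the spirit of MixSKD: distill between predictions on a
# random image pair and on their mixup image (not the authors' exact losses).
import torch
import torch.nn.functional as F

def mixskd_style_loss(model, x, y, alpha=0.2, temperature=4.0):
    """x: (N, C, H, W) images, y: (N,) labels; alpha/temperature are assumed."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]           # mixup images

    logits = model(x)                                  # predictions on originals
    logits_mix = model(x_mix)                          # predictions on mixup images

    # Standard mixup supervision on the mixed images.
    ce = lam * F.cross_entropy(logits_mix, y) + (1.0 - lam) * F.cross_entropy(logits_mix, y[perm])

    # Mutual distillation: the mixup prediction should match the mixup of the
    # pair's softened predictions (soft targets are detached).
    t = temperature
    soft_pair = lam * F.softmax(logits / t, dim=1) + (1.0 - lam) * F.softmax(logits[perm] / t, dim=1)
    kd = F.kl_div(F.log_softmax(logits_mix / t, dim=1), soft_pair.detach(), reduction="batchmean") * t * t
    return ce + kd
```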
This enables each network to learn extra contrastive knowledge from others, leading to better feature representations, thus improving the performance of visual recognition tasks.
Current Knowledge Distillation (KD) methods for semantic segmentation often guide the student to mimic the teacher's structured information generated from individual data samples.
In detail, the proposed PGMPF selectively suppresses the gradient of those "unimportant" parameters via a prior gradient mask generated by the pruning criterion during fine-tuning.
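The following sketch illustrates the general idea of suppressing gradients of "unimportant" parameters with a precomputed mask during fine-tuning; the magnitude-based criterion and the keep ratio are assumptions, not necessarily PGMPF's actual pruning criterion.

```python
# Illustrative sketch of masking gradients of "unimportant" weights during
# fine-tuning (assumed magnitude-based criterion; the actual PGMPF mask
# construction may differ).
import torch

def build_gradient_masks(model, keep_ratio=0.7):
    """Keep the largest-magnitude weights per layer; mask the rest."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # only convolution / linear weight tensors
            k = max(1, int(keep_ratio * p.numel()))
            threshold = p.detach().abs().flatten().kthvalue(p.numel() - k + 1).values
            masks[name] = (p.detach().abs() >= threshold).float()
    return masks

def apply_gradient_masks(model, masks):
    """Zero the gradients of masked-out ("unimportant") parameters."""
    for name, p in model.named_parameters():
        if name in masks and p.grad is not None:
            p.grad.mul_(masks[name])
```

The masks would be built once before fine-tuning and applied after each `loss.backward()` call, before `optimizer.step()`.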
Each auxiliary branch is guided to learn the self-supervision augmented task and to distill this distribution from the teacher to the student.
We therefore adopt an alternative self-supervised augmented task to guide the network to learn the joint distribution of the original recognition task and the self-supervised auxiliary task.
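As an illustration of such a joint task, a common instantiation uses image rotation as the auxiliary signal, so the network predicts a joint label over (class, rotation) pairs; the choice of rotation here is an assumption made only for illustration.

```python
# Sketch of a self-supervision augmented joint task using rotation as the
# auxiliary signal (an assumed instantiation): the head predicts the joint
# (class, rotation) label.
import torch

def joint_rotation_targets(x, y):
    """x: (N, C, H, W) with H == W, y: (N,) class labels."""
    rotated = torch.cat([torch.rot90(x, k, dims=(2, 3)) for k in range(4)], dim=0)
    rot_labels = torch.arange(4, device=y.device).repeat_interleave(x.size(0))
    joint_labels = y.repeat(4) * 4 + rot_labels   # joint (class, rotation) id
    return rotated, joint_labels

# The auxiliary classifier then has num_classes * 4 outputs and is trained
# with a standard cross-entropy loss on joint_labels.
```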
We present a collaborative learning method called Mutual Contrastive Learning (MCL) for general visual representation learning.
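A minimal sketch of exchanging contrastive knowledge between two peer networks is given below, assuming an InfoNCE-style objective in which embeddings of the same image from the two networks form the positive pair; this is a simplification, not the paper's exact formulation.

```python
# Hedged sketch of a cross-network contrastive term: embeddings of the same
# image from two peer networks are positives, other images in the batch are
# negatives (simplified InfoNCE, not the full MCL objective).
import torch
import torch.nn.functional as F

def cross_network_infonce(z_a, z_b, temperature=0.1):
    """z_a, z_b: (N, D) embeddings of the same batch from two peer networks."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature           # (N, N) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetric loss: each network also learns from the other's embeddings.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```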
We note that the 2D pose estimation task is highly dependent on the contextual relationships between image patches; we therefore introduce a self-supervised method for pretraining 2D pose estimation networks.
Previous Online Knowledge Distillation (OKD) methods often mutually exchange probability distributions but neglect useful representational knowledge.
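For reference, the sketch below shows what "mutually exchanging probability distributions" typically looks like in online KD (a deep-mutual-learning-style baseline, not the proposed method); the temperature is an assumed hyperparameter.

```python
# Mutual probability-distribution exchange between two peers (baseline sketch).
import torch.nn.functional as F

def mutual_kl(logits_a, logits_b, temperature=3.0):
    """Each peer matches the other's softened class distribution."""
    t = temperature
    p_a = F.log_softmax(logits_a / t, dim=1)
    p_b = F.log_softmax(logits_b / t, dim=1)
    q_a = F.softmax(logits_a / t, dim=1).detach()
    q_b = F.softmax(logits_b / t, dim=1).detach()
    loss_a = F.kl_div(p_a, q_b, reduction="batchmean") * t * t   # A learns from B
    loss_b = F.kl_div(p_b, q_a, reduction="batchmean") * t * t   # B learns from A
    return loss_a + loss_b
```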
Deep convolutional neural networks (CNNs) typically depend on wider receptive fields (RF) and more complex non-linearities to achieve state-of-the-art performance, while suffering from increased difficulty in interpreting how relevant patches contribute to the final prediction.
For VGG16 pre-trained on ImageNet, our method yields an average accuracy improvement of 14.29\% on two-class sub-tasks.
In this paper, we propose a method for efficient automatic architecture search that targets the widths of networks rather than the connections of the neural architecture.
We propose a simple yet effective method to reduce the redundancy of DenseNet, substantially decreasing the number of stacked modules by replacing the original bottleneck with our SMG module, which is augmented by a local residual connection.
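Since the internals of the SMG module are not spelled out here, the following is only an illustrative sketch of swapping a DenseNet bottleneck for a block that carries a local residual connection; the specific layer choices are assumptions.

```python
# Illustrative stand-in for a bottleneck block augmented with a local residual
# connection (assumed design; not the actual SMG module).
import torch
import torch.nn as nn

class LocalResidualBottleneck(nn.Module):
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        inter = 4 * growth_rate
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, inter, kernel_size=1, bias=False),
            nn.BatchNorm2d(inter), nn.ReLU(inplace=True),
            nn.Conv2d(inter, growth_rate, kernel_size=3, padding=1, bias=False),
        )
        # Local residual: project the input to the output width and add it.
        self.shortcut = nn.Conv2d(in_channels, growth_rate, kernel_size=1, bias=False)

    def forward(self, x):
        return self.body(x) + self.shortcut(x)
```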
In this work, we propose a heuristic genetic algorithm (GA) for pruning convolutional neural networks (CNNs) according to the multi-objective trade-off among error, computation and sparsity.
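A toy sketch of such a multi-objective fitness function is shown below; the weights and the linear aggregation are assumptions used only to illustrate the error/computation/sparsity trade-off.

```python
# Toy fitness used to rank candidate pruning masks in a genetic algorithm
# (assumed weighted aggregation of the three objectives; lower is better).
def fitness(error, flops, flops_dense, sparsity, w_err=1.0, w_comp=0.5, w_sp=0.5):
    """Penalize error and remaining computation, reward sparsity."""
    return w_err * error + w_comp * (flops / flops_dense) - w_sp * sparsity
```

A GA would then select, cross over, and mutate channel-pruning masks, keeping the masks with the best (lowest) fitness in each generation.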
The latest algorithms for automatic neural architecture search perform remarkably well, but they are largely directionless in the search space and computationally expensive in training every intermediate architecture.