While efficient architectures and a plethora of augmentations for end-to-end image classification tasks have been suggested and heavily investigated, state-of-the-art techniques for audio classification still rely on numerous representations of the audio signal together with large architectures, fine-tuned from large datasets.
Ranked #1 on Environmental Sound Classification on UrbanSound8K (using extra training data)
Encouraged by the recent transferability results of self-supervised models, we propose a method that combines self-supervised and supervised pretraining to generate models with both high diversity and high accuracy, and, as a result, high transferability.
The scheme, named USI (Unified Scheme for ImageNet), is based on knowledge distillation and modern tricks.
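The core of such a distillation-based scheme can be summarized in a short loss function. The sketch below is a generic knowledge-distillation objective, not the exact USI recipe; the temperature `T`, the mixing weight `alpha`, and the use of a plain cross-entropy term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```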
In this paper, we introduce ML-Decoder, a new attention-based classification head.
Ranked #1 on Fine-Grained Image Classification on Stanford Cars (using extra training data)
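For readers unfamiliar with attention-based classification heads, the sketch below shows the general pattern behind the ML-Decoder entry above: learned per-class queries cross-attend to the backbone's spatial tokens, and each attended query is scored. It is a minimal illustration only; ML-Decoder itself adds group decoding and other design changes not reproduced here, and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class AttentionClassificationHead(nn.Module):
    """Simplified attention head: one learned query per class cross-attends
    to the backbone's spatial tokens, then a linear layer scores each class."""
    def __init__(self, num_classes, embed_dim=768, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, tokens):                              # tokens: (B, H*W, embed_dim)
        B = tokens.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)     # (B, num_classes, embed_dim)
        attended, _ = self.cross_attn(q, tokens, tokens)    # cross-attention to image tokens
        return self.score(attended).squeeze(-1)             # (B, num_classes) per-class logits
```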
Practical use of neural networks often involves requirements on latency, energy and memory among others.
We propose to estimate the class distribution using a dedicated temporary model, and we show its improved efficiency over a naive estimation computed using the dataset's partial annotations.
Ranked #1 on Multi-Label Classification on OpenImages-v6
We address this gap with a tailor-made solution, combining the power of CNNs for image representation and transformers for album representation to perform global reasoning on the image collection, offering a practical and efficient solution for photo album event recognition.
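A minimal sketch of such a CNN-plus-transformer pipeline is given below: a CNN embeds each photo, a transformer encoder reasons across the album, and a pooled representation is classified. The backbone choice, layer sizes, and pooling strategy are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class AlbumEventClassifier(nn.Module):
    """Per-image CNN features aggregated by a transformer over the album."""
    def __init__(self, num_events, embed_dim=512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.cnn = backbone
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.album_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(embed_dim, num_events)

    def forward(self, album):                      # album: (B, N_images, 3, H, W)
        B, N = album.shape[:2]
        feats = self.cnn(album.flatten(0, 1))      # (B*N, embed_dim) per-image embeddings
        feats = feats.view(B, N, -1)
        feats = self.album_encoder(feats)          # global reasoning across the album
        return self.head(feats.mean(dim=1))        # pool over images, classify the event
```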
ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks.
Ranked #2 on Image Classification on Stanford Cars
Methods that reach State of the Art (SotA) accuracy usually make use of 3D convolution layers as a way to abstract the temporal information from video frames.
Ranked #18 on Action Recognition on UCF101 (using extra training data)
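To make the 3D-convolution approach mentioned above concrete, the snippet below shows how a single `Conv3d` layer mixes spatial and temporal information from a stack of frames; the tensor sizes and layer hyperparameters are arbitrary illustrative values.

```python
import torch
import torch.nn as nn

# A 3D convolution mixes information across space and time in one operation:
# the kernel spans (time, height, width), so temporal context is abstracted
# directly from stacked video frames.
clip = torch.randn(2, 3, 16, 112, 112)        # (batch, channels, frames, H, W)
conv3d = nn.Conv3d(in_channels=3, out_channels=64,
                   kernel_size=(3, 7, 7), stride=(1, 2, 2), padding=(1, 3, 3))
features = conv3d(clip)
print(features.shape)                          # torch.Size([2, 64, 16, 56, 56])
```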
Realistic use of neural networks often requires adhering to multiple constraints on latency, energy and memory among others.
Ranked #20 on Neural Architecture Search on ImageNet
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, in time logarithmic in both.
To this end, we develop two novel algorithms, termed "AugDrop" and "MixLoss", to correct the data bias introduced by data augmentation.
In this paper, we introduce a novel asymmetric loss ("ASL"), which operates differently on positive and negative samples.
Ranked #4 on Multi-Label Classification on NUS-WIDE
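A condensed sketch of an asymmetric loss in the spirit of the entry above is shown below: separate focusing exponents for positive and negative samples, plus a probability margin that discards very easy negatives. The default values are illustrative, and the official implementation differs in details such as gradient handling.

```python
import torch

def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Binary multi-label loss that focuses harder on negatives than positives.
    `targets` is a {0,1} tensor with the same shape as `logits`."""
    p = torch.sigmoid(logits)
    # Probability shifting: ignore very easy negatives below the margin.
    p_neg = (p - margin).clamp(min=0)
    loss_pos = targets * torch.log(p.clamp(min=eps)) * (1 - p) ** gamma_pos
    loss_neg = (1 - targets) * torch.log((1 - p_neg).clamp(min=eps)) * p_neg ** gamma_neg
    return -(loss_pos + loss_neg).mean()
```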
In this work, we introduce a series of architecture modifications that aim to boost neural networks' accuracy, while retaining their GPU training and inference efficiency.
Ranked #5 on Fine-Grained Image Classification on Oxford 102 Flowers (using extra training data)
In this way, we produce compact architectures with the same FLOPs as EfficientNet-B0 and MobileNetV3, but with accuracy higher by $1\%$ and $0.3\%$ respectively on ImageNet, and with faster runtime on GPU.
Ranked #3 on Network Pruning on ImageNet
This paper introduces a novel optimization method for differentiable neural architecture search, based on the theory of prediction with expert advice.
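Prediction with expert advice typically boils down to a multiplicative-weights update, sketched below with each candidate operation treated as an expert. This is an illustrative toy update, not the paper's actual optimizer; the learning rate and per-operation losses are made-up values.

```python
import numpy as np

def exponentiated_gradient_step(weights, losses, eta=0.1):
    """Prediction-with-expert-advice update: each candidate operation is an
    'expert'; operations incurring lower loss receive exponentially more weight."""
    w = weights * np.exp(-eta * losses)
    return w / w.sum()            # renormalize to a distribution over operations

# Illustrative usage: three candidate operations on one searchable edge.
arch_weights = np.ones(3) / 3
op_losses = np.array([0.9, 0.4, 0.7])   # hypothetical per-operation losses this step
arch_weights = exponentiated_gradient_step(arch_weights, op_losses)
print(arch_weights)
```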
In this paper, we propose a differentiable search space that allows the annealing of architecture weights, while gradually pruning inferior operations.
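A toy sketch of this idea follows: a temperature-annealed softmax over architecture weights, with operations pruned once their relaxed weight falls below a threshold. The annealing schedule, threshold, and omission of any actual training step are assumptions made purely for illustration.

```python
import torch
import torch.nn.functional as F

# Continuous relaxation over candidate operations on a single searchable edge:
# a softmax over architecture weights is sharpened by annealing the temperature,
# and operations whose weight becomes negligible are gradually pruned.
alpha = torch.randn(5)                       # architecture weights for 5 candidate ops
candidates = list(range(5))                  # operations still in the search space
threshold = 0.05

for step in range(200):
    temperature = max(0.97 ** step, 0.05)    # illustrative annealing schedule
    probs = F.softmax(alpha[candidates] / temperature, dim=-1)
    # Prune inferior operations whose relaxed weight dropped below the threshold.
    candidates = [op for op, p in zip(candidates, probs) if p > threshold]
    # ... network weights and alpha would be trained here in a real search ...
```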