71 papers with code • 32 benchmarks • 26 datasets
The Fine-Grained Image Classification task focuses on differentiating between hard-to-distinguish object classes, such as species of birds, flowers, or animals, or the makes and models of vehicles.
(Image credit: Looking for the Devil in the Details)
Deep residual nets form the foundation of our submissions to the ILSVRC & COCO 2015 competitions, where we also won 1st place on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
Ranked #2 on Semantic Object Interaction Classification on VLOG
In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch.
Ranked #5 on Fine-Grained Image Classification on Caltech-101
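The per-image sub-policy sampling described in the AutoAugment abstract above is straightforward to sketch. Below is a minimal, hypothetical Python/torchvision version; the operation names, probabilities, and magnitudes are illustrative placeholders, not the learned policy from the paper.

```python
import random
import torchvision.transforms.functional as TF

# Hypothetical policy in the AutoAugment style: each sub-policy is a list of
# (operation, probability, magnitude) triples. Values here are placeholders,
# not the policy found by the paper's search.
SUB_POLICIES = [
    [("rotate", 0.7, 15), ("contrast", 0.3, 1.4)],
    [("brightness", 0.8, 1.2), ("rotate", 0.2, -10)],
]

OPS = {
    "rotate": lambda img, mag: TF.rotate(img, angle=mag),
    "contrast": lambda img, mag: TF.adjust_contrast(img, contrast_factor=mag),
    "brightness": lambda img, mag: TF.adjust_brightness(img, brightness_factor=mag),
}

def apply_policy(img):
    """Pick one sub-policy at random (per image, per mini-batch) and apply
    each of its operations with the stored probability."""
    sub_policy = random.choice(SUB_POLICIES)
    for op_name, prob, magnitude in sub_policy:
        if random.random() < prob:
            img = OPS[op_name](img, magnitude)
    return img
```

In the actual method, the sub-policies themselves are found by a reinforcement-learning search over the augmentation space rather than written by hand.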
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited.
Ranked #1 on Image Classification on Tiny ImageNet Classification (using extra training data)
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.
Ranked #5 on Image Classification on ImageNet V2 (using extra training data)
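As a rough illustration of what "built entirely upon multi-layer perceptrons" means, here is a minimal PyTorch sketch of a ResMLP-style block: one linear layer mixes information across patches, a two-layer MLP mixes it across channels, and each step has a residual connection. The paper replaces LayerNorm with a simpler learned affine transform; LayerNorm is used here for brevity.

```python
import torch
import torch.nn as nn

class ResMLPBlock(nn.Module):
    """Minimal sketch of a ResMLP-style block (simplified: LayerNorm
    stands in for the paper's learned affine normalization)."""
    def __init__(self, num_patches: int, dim: int, expansion: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Linear layer mixing information *across patches* (applied per channel).
        self.cross_patch = nn.Linear(num_patches, num_patches)
        self.norm2 = nn.LayerNorm(dim)
        # Two-layer MLP mixing information *across channels* (applied per patch).
        self.cross_channel = nn.Sequential(
            nn.Linear(dim, expansion * dim),
            nn.GELU(),
            nn.Linear(expansion * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim)
        y = self.norm1(x).transpose(1, 2)            # (batch, dim, num_patches)
        x = x + self.cross_patch(y).transpose(1, 2)  # residual patch mixing
        x = x + self.cross_channel(self.norm2(x))    # residual channel mixing
        return x

# e.g. 196 patches (14x14) with 384-dim embeddings:
block = ResMLPBlock(num_patches=196, dim=384)
out = block(torch.randn(2, 196, 384))  # -> (2, 196, 384)
```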
In this work, we introduce a series of architecture modifications that aim to boost neural networks' accuracy, while retaining their GPU training and inference efficiency.
Ranked #6 on Fine-Grained Image Classification on Oxford 102 Flowers (using extra training data)
Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available.
Ranked #2 on Fine-Grained Image Classification on Birdsnap (using extra training data)
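The compound scaling rule this (EfficientNet) abstract alludes to scales network depth, width, and input resolution jointly with a single coefficient, rather than scaling any one dimension alone. A minimal sketch, using the alpha, beta, gamma constants reported in the paper:

```python
def compound_scale(phi: float, alpha: float = 1.2, beta: float = 1.1,
                   gamma: float = 1.15):
    """Compound scaling from the EfficientNet paper: depth, width, and input
    resolution are scaled together by a single coefficient phi, with
    alpha * beta**2 * gamma**2 ~= 2 so FLOPs grow roughly by 2**phi."""
    depth_mult = alpha ** phi       # scales the number of layers
    width_mult = beta ** phi        # scales the number of channels
    resolution_mult = gamma ** phi  # scales the input image resolution
    return depth_mult, width_mult, resolution_mult

# phi = 1 roughly doubles FLOPs relative to the baseline (EfficientNet-B0):
print(compound_scale(1.0))  # (1.2, 1.1, 1.15)
```

In the paper, the baseline network (B0) is found by neural architecture search and then scaled with this rule to obtain the B1 through B7 variants.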
In this paper, we point out that the attention inside these local patches is also essential for building visual transformers with high performance, and we explore a new architecture, namely Transformer iN Transformer (TNT).
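A minimal PyTorch sketch of the two-level attention TNT describes: an inner transformer attends over sub-patch ("pixel") embeddings within each patch, and its output is projected into the patch embedding before an outer transformer attends across patches. The dimensions and the use of nn.TransformerEncoderLayer are illustrative simplifications, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class TNTBlock(nn.Module):
    """Sketch of a Transformer-iN-Transformer block: inner attention
    within each patch, outer attention across patches."""
    def __init__(self, patch_dim=384, pixel_dim=24, pixels_per_patch=16):
        super().__init__()
        self.inner = nn.TransformerEncoderLayer(
            d_model=pixel_dim, nhead=4, batch_first=True)
        self.outer = nn.TransformerEncoderLayer(
            d_model=patch_dim, nhead=6, batch_first=True)
        # Project flattened pixel-level features into the patch embedding.
        self.proj = nn.Linear(pixels_per_patch * pixel_dim, patch_dim)

    def forward(self, patch_tokens, pixel_tokens):
        # patch_tokens: (batch, num_patches, patch_dim)
        # pixel_tokens: (batch * num_patches, pixels_per_patch, pixel_dim)
        b, n, _ = patch_tokens.shape
        pixel_tokens = self.inner(pixel_tokens)          # attention inside patches
        fused = self.proj(pixel_tokens.flatten(1)).reshape(b, n, -1)
        patch_tokens = self.outer(patch_tokens + fused)  # attention across patches
        return patch_tokens, pixel_tokens

# e.g. 196 patches, each split into 16 sub-patch embeddings:
pt, px = torch.randn(2, 196, 384), torch.randn(2 * 196, 16, 24)
out_patches, out_pixels = TNTBlock()(pt, px)
```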
In this work, we produce a competitive convolution-free transformer by training on ImageNet only.
Ranked #2 on Image Classification on iNaturalist 2018
Vision Transformers (ViTs) and MLPs signal further efforts to replace hand-wired features or inductive biases with general-purpose neural architectures.
Ranked #1 on Domain Generalization on ImageNet-R (Top 1 Accuracy metric)
Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks.
Ranked #4 on Fine-Grained Image Classification on Birdsnap (using extra training data)