We study the problem of learning how to predict attribute-object compositions from images, and its generalization to unseen compositions missing from the training data.
We show that existing approaches either do not scale to this dataset or underperform the simple baseline of training a model on the union of data from all training domains.
However, existing approaches which rely only on image-level class labels predominantly suffer from errors due to (a) partial segmentation of objects and (b) missing object predictions.
Our key idea is to decorrelate feature representations of a category from its co-occurring context.
We also investigate the interplay between dataset granularity and a variety of factors, and find that fine-grained datasets are more difficult to learn from, more difficult to transfer to, more difficult to perform few-shot learning with, and more vulnerable to adversarial attacks.
Blind or no-reference (NR) perceptual picture quality prediction is a difficult, unsolved problem of great consequence to the social and streaming media industries that impacts billions of viewers daily.
Pre-training convolutional neural networks with weakly-supervised and self-supervised strategies is becoming increasingly popular for several computer vision tasks.
To the best of our knowledge, XDC is the first self-supervised learning method that outperforms large-scale fully-supervised pretraining for action recognition on the same architecture.
Self-supervised learning aims to learn representations from the data itself without explicit manual supervision.
This paper presents a study of semi-supervised learning with large convolutional networks.
Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?
Weakly supervised object detection aims at reducing the amount of supervision required to train detection models.
Empirical evaluations of this defense strategy on ImageNet suggest that it is very effective in attack settings in which the adversary does not have access to the image database.
The ability to capture temporal information has been critical to the development of video understanding models.
ImageNet classification is the de facto pretraining task for these models.
First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines.
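The general idea can be illustrated with a minimal sketch: if the Hessian is approximated by its diagonal, each machine can form an approximate Newton direction from purely local statistics, with no cross-machine solve. This is an illustrative example of diagonal-Newton steps, not the paper's exact diagonalization method; the function name and damping parameter are assumptions.

```python
import numpy as np

def approx_newton_direction(grad, hess_diag, damping=1e-4):
    """Approximate Newton step d = -H^{-1} g using only the Hessian
    diagonal; a small damping term keeps the step well-defined when
    some diagonal entries are near zero."""
    return -grad / (hess_diag + damping)

# Toy usage: each machine could evaluate this on its local gradient
# and local Hessian diagonal without communicating a full matrix.
grad = np.array([0.5, -2.0, 1.0])
hess_diag = np.array([1.0, 4.0, 0.5])
d = approx_newton_direction(grad, hess_diag)
```

Because only a vector of diagonal entries is involved, the per-step cost is linear in the number of parameters rather than quadratic.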
We also demonstrate their usefulness in making design choices, such as the number of classifiers in the ensemble and the size of the training-data subset needed to achieve a target generalization error.
In this paper, we study gradient boosted decision trees (GBDT) when the output space is high-dimensional and sparse.
We propose Batch-Expansion Training (BET), a framework for running a batch optimizer on a gradually expanding dataset.
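A hypothetical sketch of the expanding-dataset idea (the function name, doubling schedule, and toy objective are illustrative assumptions, not the paper's specification): run a batch optimizer to near-convergence on a prefix of the data, then grow the working set and continue.

```python
import numpy as np

def batch_expansion_train(data, init_size, step_fn, steps_per_phase=20):
    """Run a batch optimizer (`step_fn`) on a gradually expanding
    prefix of `data`, doubling the working set between phases."""
    n, size = len(data), init_size
    state = 0.0  # e.g. a scalar parameter being fit
    while True:
        batch = data[:size]
        for _ in range(steps_per_phase):
            state = step_fn(state, batch)
        if size >= n:
            break
        size = min(2 * size, n)  # expand the working set for the next phase
    return state

# Toy usage: fit the mean of the data by gradient steps on squared error.
data = np.arange(10, dtype=float)
mean_step = lambda w, b: w - 0.5 * (w - b.mean())
w = batch_expansion_train(data, init_size=2, step_fn=mean_step)
```

Early phases are cheap because batches are small, while the final phase still optimizes over the full dataset.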
Current solutions to learning from geo-distributed data sources revolve around the idea of first centralizing the data in one data center, and then training locally.
In this paper we design a distributed algorithm for $l_1$ regularization that is much better suited for such systems than existing algorithms.
In this paper we give a novel approach to the distributed training of linear classifiers (involving smooth losses and L2 regularization) that is designed to reduce the total communication costs.