Vision-language models trained with contrastive learning on large-scale noisy data are becoming increasingly popular for zero-shot recognition problems.
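To make the zero-shot recipe concrete, here is a minimal sketch of CLIP-style inference: embed the image and one text prompt per class into the shared space, then pick the class whose prompt is most similar. The encoder callables and the prompt template are illustrative assumptions, not any one paper's API.

import torch.nn.functional as F

def zero_shot_classify(image_encoder, text_encoder, image, class_names):
    # Build one natural-language prompt per class.
    prompts = [f"a photo of a {name}" for name in class_names]
    # Embed both modalities into the shared space and L2-normalize.
    img_emb = F.normalize(image_encoder(image), dim=-1)   # (1, d)
    txt_emb = F.normalize(text_encoder(prompts), dim=-1)  # (C, d)
    # Cosine similarity between the embeddings serves as the class logit.
    logits = img_emb @ txt_emb.T                          # (1, C)
    return class_names[logits.argmax(dim=-1).item()]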
1 code implementation • 4 Jan 2023 • Vignesh Ramanathan, Anmol Kalia, Vladan Petrovic, Yi Wen, Baixue Zheng, Baishan Guo, Rui Wang, Aaron Marquez, Rama Kovvuri, Abhishek Kadian, Amir Mousavi, Yiwen Song, Abhimanyu Dubey, Dhruv Mahajan
This motivates the need for large datasets which go beyond traditional object masks and provide richer annotations such as part masks and attributes.
We demonstrate through human subject evaluations that SPAMs are more interpretable in practice, and are hence an effortless replacement for DNNs when building interpretable, high-performance systems suitable for large-scale machine learning.
However, these models are typically black-box deep neural networks, explained post-hoc via methods with known faithfulness limitations.
A visual counterfactual explanation replaces image regions in a query image with regions from a distractor image such that the system's decision on the transformed image changes to the distractor class.
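A greedy variant of this search can be sketched as follows; the grid size, the classifier interface (a callable returning class probabilities over an HxWxC array), and the swap budget are assumptions for illustration, not the paper's exact procedure.

def counterfactual_edit(model, query, distractor, target_class, grid=7, max_swaps=10):
    # Both images are split into a grid x grid array of cells.
    H, W = query.shape[:2]
    h, w = H // grid, W // grid
    edited = query.copy()
    for _ in range(max_swaps):
        base = model(edited)[target_class]
        best_gain, best_swap = 0.0, None
        # Exhaustively try replacing each query cell with each distractor cell.
        for qi in range(grid):
            for qj in range(grid):
                for di in range(grid):
                    for dj in range(grid):
                        trial = edited.copy()
                        trial[qi*h:(qi+1)*h, qj*w:(qj+1)*w] = \
                            distractor[di*h:(di+1)*h, dj*w:(dj+1)*w]
                        gain = model(trial)[target_class] - base
                        if gain > best_gain:
                            best_gain, best_swap = gain, trial
        if best_swap is None:
            break                  # no single swap helps any further
        edited = best_swap
        if model(edited).argmax() == target_class:
            break                  # the decision flipped to the distractor class
    return edited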
Model pre-training is a cornerstone of modern visual recognition systems.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W (using extra training data)
In this paper, we propose a domain-adaptive approach to this problem, which operates in two steps: (a) we cluster training data within a carefully chosen feature space to create pseudo-domains, and (b) using these pseudo-domains we learn a domain-adaptive classifier that makes predictions using information about both the input and the pseudo-domain it belongs to (both steps are sketched below).
Ranked #8 on Domain Generalization on PACS
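Under stated assumptions (a frozen feature extractor, k-means for step (a), and a small learned domain embedding for step (b)), a minimal sketch of the two steps might look like this:

import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def make_pseudo_domains(feats, k=5):
    # Step (a): cluster training features into k pseudo-domains.
    return KMeans(n_clusters=k).fit_predict(feats)

class DomainAdaptiveHead(nn.Module):
    # Step (b): predict from both the input feature and its pseudo-domain.
    def __init__(self, d, n_domains, n_classes, d_dom=32):
        super().__init__()
        self.dom_emb = nn.Embedding(n_domains, d_dom)
        self.fc = nn.Linear(d + d_dom, n_classes)

    def forward(self, x, domain_id):
        return self.fc(torch.cat([x, self.dom_emb(domain_id)], dim=-1))

At test time, a sample would be assigned to the nearest cluster centroid to obtain its pseudo-domain before being passed to the head.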
We study the problem of learning how to predict attribute-object compositions from images, and its generalization to unseen compositions missing from the training data.
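One generic way to handle unseen compositions is to embed attributes and objects separately and learn a composition function, so that any (attribute, object) pair can be scored even if it never co-occurred during training. The sketch below is illustrative; the embedding sizes and the linear composition are assumptions, not the specific model studied.

import torch
import torch.nn as nn

class CompositionScorer(nn.Module):
    def __init__(self, n_attrs, n_objs, d):
        super().__init__()
        self.attr_emb = nn.Embedding(n_attrs, d)
        self.obj_emb = nn.Embedding(n_objs, d)
        self.compose = nn.Linear(2 * d, d)   # maps a pair into image-feature space

    def forward(self, img_feat, attr_id, obj_id):
        # Compose the attribute and object embeddings, then score by dot product.
        pair = torch.cat([self.attr_emb(attr_id), self.obj_emb(obj_id)], dim=-1)
        return (img_feat * self.compose(pair)).sum(-1)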
We show that the existing approaches either do not scale to this dataset or underperform compared to the simple baseline of training a model on the union of data from all training domains.
However, existing approaches which rely only on image-level class labels predominantly suffer from errors due to (a) partial segmentation of objects and (b) missing object predictions.
State-of-the-art object detection approaches typically rely on pre-trained classification models to achieve better performance and faster convergence.
Our key idea is to decorrelate feature representations of a category from its co-occurring context.
We also investigate the interplay between dataset granularity and a variety of factors, and find that fine-grained datasets are more difficult to learn from, more difficult to transfer to, more difficult to perform few-shot learning with, and more vulnerable to adversarial attacks.
Blind or no-reference (NR) perceptual picture quality prediction is a difficult, unsolved problem of great consequence to the social and streaming media industries, impacting billions of viewers daily.
Ranked #11 on Video Quality Assessment on MSU NR VQA Benchmark
Pre-training convolutional neural networks with weakly-supervised and self-supervised strategies is becoming increasingly popular for several computer vision tasks.
Ranked #49 on Image Classification on iNaturalist 2018
To the best of our knowledge, XDC is the first self-supervised learning method that outperforms large-scale fully-supervised pretraining for action recognition on the same architecture.
Self-supervised learning aims to learn representations from the data itself without explicit manual supervision.
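As one common instantiation, a contrastive objective pulls together two augmented views of the same image and pushes apart views of different images; the SimCLR-style loss below is a generic sketch, not the specific method of any paper listed here.

import torch
import torch.nn.functional as F

def ntxent_loss(z1, z2, tau=0.5):
    # z1, z2: (B, d) projections of two augmented views of the same batch.
    z = F.normalize(torch.cat([z1, z2]), dim=1)    # (2B, d)
    sim = z @ z.T / tau                            # pairwise similarities
    sim = sim.masked_fill(torch.eye(z.size(0), dtype=torch.bool), float("-inf"))
    B = z1.size(0)
    # The positive for each sample is its other augmented view.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(B)])
    return F.cross_entropy(sim, targets)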
Second, frame-based models perform quite well on action recognition: is pre-training for good image features sufficient, or is pre-training for spatio-temporal features valuable for optimal transfer learning?
Ranked #2 on Egocentric Activity Recognition on EPIC-KITCHENS-55 (Actions Top-1 (S2) metric)
This paper presents a study of semi-supervised learning with large convolutional networks.
Ranked #7 on Image Classification on OmniBenchmark (using extra training data)
Weakly supervised object detection aims at reducing the amount of supervision required to train detection models.
Ranked #1 on Weakly Supervised Object Detection on Charades
Empirical evaluations of this defense strategy on ImageNet suggest that it is very effective in attack settings in which the adversary does not have access to the image database.
The ability to capture temporal information has been critical to the development of video understanding models.
ImageNet classification is the de facto pretraining task for these models.
Ranked #184 on Image Classification on ImageNet
First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines.
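The trick can be made concrete with a small sketch: if each machine keeps only the diagonal of its local Hessian, the Newton-like direction reduces to an elementwise division and requires no cross-machine exchange. The logistic loss and function names here are illustrative assumptions.

import numpy as np

def local_newton_direction(X, y, w, lam=1.0, eps=1e-8):
    # Approximate Newton direction for L2-regularized logistic loss,
    # computed from purely local data (no communication needed).
    p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
    grad = X.T @ (p - y) + lam * w        # local gradient
    s = p * (1.0 - p)                     # per-example curvature
    hess_diag = (X ** 2).T @ s + lam      # diagonal of the local Hessian
    return -grad / (hess_diag + eps)      # elementwise "Newton" step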
We also demonstrate their usefulness for design choices such as the number of classifiers in the ensemble and the size of the training data subset needed to achieve a given generalization error.
In this paper, we study gradient boosted decision trees (GBDT) when the output space is high-dimensional and sparse.
We propose Batch-Expansion Training (BET), a framework for running a batch optimizer on a gradually expanding dataset.
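A minimal sketch of the idea, assuming an inner batch optimizer that can warm-start and a geometric expansion schedule (both are illustrative choices, not the paper's exact recipe):

def batch_expansion_training(data, optimize, init_frac=0.1, growth=2.0):
    # Run a batch optimizer on a gradually expanding subset of the data.
    n = len(data)
    m = max(1, int(init_frac * n))
    params = None
    while True:
        # Optimize on the current batch, warm-starting from the last solution.
        params = optimize(data[:m], init=params)
        if m == n:
            return params
        m = min(n, int(growth * m))       # expand the active dataset

Warm-starting is what makes the early, cheap passes pay off: each expanded problem starts close to its solution.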
Current solutions to learning from geo-distributed data sources revolve around the idea of first centralizing the data in one data center, and then training locally at that center.
In this paper we design a distributed algorithm for $l_1$ regularization that is much better suited for such systems than existing algorithms.
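Whatever the communication scheme, $l_1$-regularized solvers share one local primitive, the soft-thresholding (proximal) operator; the sketch below shows it together with a single proximal-gradient step, with the step-size convention as an assumption.

import numpy as np

def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1: shrink each coordinate toward zero.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

# One proximal-gradient step for min_w f(w) + lam * ||w||_1:
#   w = soft_threshold(w - eta * grad_f(w), eta * lam)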
In this paper we give a novel approach to the distributed training of linear classifiers (involving smooth losses and L2 regularization) that is designed to reduce the total communication costs.