1 code implementation • 5 Jan 2023 • Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari
Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming.
no code implementations • 16 Dec 2022 • Mert Bulent Sariyildiz, Karteek Alahari, Diane Larlus, Yannis Kalantidis
We show that, with minimal and class-agnostic prompt engineering, these ImageNet clones, which we denote ImageNet-SD, close a large part of the gap between models trained on synthetic images and models trained on real images, for the several standard classification benchmarks that we consider in this study.
1 code implementation • 9 Dec 2022 • Zhiqi Kang, Enrico Fini, Moin Nabi, Elisa Ricci, Karteek Alahari
Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data.
no code implementations • 1 Dec 2022 • Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari
We propose a novel antialiasing method to increase shift invariance in convolutional neural networks (CNNs).
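The paper's antialiasing method itself is novel, but the general idea it builds on can be illustrated by the standard blur-then-subsample baseline for antialiased downsampling. The sketch below is a minimal 1-D version with a binomial kernel; the function name and kernel choice are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def blur_pool_1d(x, stride=2):
    """Antialiased downsampling: low-pass blur with a [1, 2, 1]/4
    binomial kernel before subsampling, which reduces the aliasing
    that makes plain strided pooling sensitive to small shifts."""
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0
    padded = np.pad(x, 1, mode="edge")            # same-size output
    blurred = np.convolve(padded, kernel, mode="valid")
    return blurred[::stride]
```

Shifting the input by one sample changes the output of this operator far less than it changes the output of a bare strided subsampling, which is the shift-invariance effect the abstract targets.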
no code implementations • 29 Nov 2022 • Heeseung Kwon, Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil, Karteek Alahari
Vision Transformers (ViTs) have become a dominant paradigm for visual representation learning with self-attention operators.
no code implementations • 21 Oct 2022 • Jules Bourcier, Thomas Floquet, Gohar Dashyan, Tugdual Ceillier, Karteek Alahari, Jocelyn Chanussot
In defense-related remote sensing applications, such as vehicle detection in satellite imagery, supervised learning requires a huge number of labeled examples to reach operational performance.
no code implementations • 13 Oct 2022 • Jules Bourcier, Gohar Dashyan, Jocelyn Chanussot, Karteek Alahari
The application of deep neural networks to remote sensing imagery is often constrained by the lack of ground-truth annotations.
no code implementations • 19 Sep 2022 • Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari
In this paper, we aim to improve the mathematical interpretability of convolutional neural networks for image classification.
no code implementations • 30 Jun 2022 • Mert Bulent Sariyildiz, Yannis Kalantidis, Karteek Alahari, Diane Larlus
We consider the problem of training a deep neural network on a given classification task, e.g., ImageNet-1K (IN1K), so that it excels both at the training task and at other (future) transfer tasks.
1 code implementation • 27 Jun 2022 • Florent Bartoccioni, Éloi Zablocki, Andrei Bursuc, Patrick Pérez, Matthieu Cord, Karteek Alahari
Recent works in autonomous driving have widely adopted the bird's-eye-view (BEV) semantic map as an intermediate representation of the world.
no code implementations • 23 Jun 2022 • Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Karteek Alahari
Finally, we train a goal-conditioned policy network with goals sampled from the goal memory, and reward it using the reachability network and the goal memory.
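As a rough illustration of the training loop this abstract describes, the sketch below samples a goal from the memory, rolls out a goal-conditioned policy, and rewards it with a reachability score. All names (`policy`, `reachability_net`, `goal_memory`, `env`) are placeholder assumptions, not the authors' API.

```python
import random

def train_step(policy, reachability_net, goal_memory, env, horizon=50):
    """One sketch iteration: sample a goal from the goal memory, roll
    out the goal-conditioned policy, then score the final state with
    the reachability network to produce the reward."""
    goal = random.choice(goal_memory)       # goal sampled from memory
    state = env.reset()
    trajectory = []
    for _ in range(horizon):
        action = policy(state, goal)        # policy conditioned on goal
        state = env.step(action)
        trajectory.append(state)
    reward = reachability_net(state, goal)  # reachability-based reward
    return trajectory, reward
```

In the actual method the reward also depends on the goal memory itself; that interaction is omitted here for brevity.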
no code implementations • 15 Jun 2022 • Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid
Audio-visual automatic speech recognition (AV-ASR) is an extension of ASR that incorporates visual cues, often from the movements of a speaker's mouth.
Automatic Speech Recognition (ASR)
no code implementations • 28 Feb 2022 • Pia Bideau, Erik Learned-Miller, Cordelia Schmid, Karteek Alahari
In this work, we argue that the coupling of camera rotation and camera translation can create complex motion fields that are difficult for a deep network to untangle directly.
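The coupling the abstract refers to can be made concrete with the classic instantaneous motion-field model (Longuet-Higgins and Prazdny): the translational component of image motion depends on scene depth, while the rotational component does not, which is why the two are hard to untangle from flow alone. A minimal sketch, not the paper's network:

```python
def motion_field(x, y, depth, t, w, f=1.0):
    """Image motion (u, v) at normalized coordinates (x, y) with scene
    depth Z, camera translation t=(tx, ty, tz) and rotation
    w=(wx, wy, wz). The translational part scales with 1/Z; the
    rotational part is depth-independent."""
    tx, ty, tz = t
    wx, wy, wz = w
    u_trans = (-f * tx + x * tz) / depth
    v_trans = (-f * ty + y * tz) / depth
    u_rot = wx * x * y / f - wy * (f + x * x / f) + wz * y
    v_rot = wx * (f + y * y / f) - wy * x * y / f - wz * x
    return u_trans + u_rot, v_trans + v_rot
```

For a pure rotation about the optical axis, for instance, the field is independent of depth, whereas any translation mixes depth into the observed flow.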
1 code implementation • CVPR 2022 • Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal
Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale.
no code implementations • 1 Nov 2021 • Valentin Gabeur, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid
Our proposal is to pre-train a video encoder using all the available video modalities as supervision, namely, appearance, sound, and transcribed speech.
1 code implementation • NeurIPS 2021 • Đ. Khuê Lê-Huu, Karteek Alahari
We introduce regularized Frank-Wolfe, a general and effective algorithm for inference and learning of dense conditional random fields (CRFs).
Ranked #8 on Semantic Segmentation on Cityscapes test
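The regularized variant is the paper's contribution, but the underlying Frank-Wolfe skeleton is standard and easy to sketch: minimize over the probability simplex by repeatedly solving a linear subproblem at the current gradient and stepping toward its vertex solution. This plain, unregularized version is shown only as background.

```python
import numpy as np

def frank_wolfe_simplex(grad_fn, n, iters=100):
    """Plain Frank-Wolfe on the probability simplex: the linear
    subproblem min_s <grad, s> over the simplex is solved by putting
    all mass on the coordinate with the smallest gradient, then moving
    toward that vertex with the classic step size 2/(t+2)."""
    x = np.full(n, 1.0 / n)
    for t in range(iters):
        g = grad_fn(x)
        s = np.zeros(n)
        s[np.argmin(g)] = 1.0            # simplex vertex minimizing <g, s>
        gamma = 2.0 / (t + 2.0)
        x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
    return x
```

In dense-CRF inference the same iteration runs over per-pixel label distributions; the paper adds a regularization term that this sketch omits.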
1 code implementation • 8 Sep 2021 • Florent Bartoccioni, Éloi Zablocki, Patrick Pérez, Matthieu Cord, Karteek Alahari
In such a monocular setup, dense depth is obtained either with additional input from one or several expensive LiDARs, e.g., with 64 beams, or with camera-only methods, which suffer from scale-ambiguity and infinite-depth problems.
1 code implementation • 13 Jan 2021 • Lina Mezghani, Sainbayar Sukhbaatar, Thibaut Lavril, Oleksandr Maksymets, Dhruv Batra, Piotr Bojanowski, Karteek Alahari
In this work, we present a memory-augmented approach for image-goal navigation.
no code implementations • ICPR 2021 • Avijit Dasgupta, C. V. Jawahar, Karteek Alahari
Existing approaches decompose this task into feature learning and relational reasoning.
no code implementations • 1 Jan 2021 • Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari
In this paper, we target an important issue of deep convolutional neural networks (CNNs): the lack of a mathematical understanding of their properties.
1 code implementation • ICCV 2021 • Mert Bulent Sariyildiz, Yannis Kalantidis, Diane Larlus, Karteek Alahari
In this paper, we argue that the semantic relationships between seen and unseen concepts affect generalization performance and propose ImageNet-CoG, a novel benchmark on the ImageNet-21K (IN-21K) dataset that enables measuring concept generalization in a principled way.
1 code implementation • ICML 2020 • Ekaterina Iakovleva, Jakob Verbeek, Karteek Alahari
We propose a novel amortized variational inference scheme for an empirical Bayes meta-learning model, where model parameters are treated as latent variables.
1 code implementation • 3 Aug 2020 • Samuel Albanie, Yang Liu, Arsha Nagrani, Antoine Miech, Ernesto Coto, Ivan Laptev, Rahul Sukthankar, Bernard Ghanem, Andrew Zisserman, Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid, Shi-Zhe Chen, Yida Zhao, Qin Jin, Kaixu Cui, Hui Liu, Chen Wang, Yudong Jiang, Xiaoshuai Hao
This report summarizes the results of the first edition of the challenge together with the findings of the participants.
1 code implementation • ECCV 2020 • Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid
In this paper, we present a multi-modal transformer to jointly encode the different modalities in video, which allows each of them to attend to the others.
Ranked #14 on Video Retrieval on ActivityNet (using extra training data)
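The core mechanism the abstract describes, letting tokens of each modality attend to all the others, can be sketched by concatenating per-modality token sequences and running a single self-attention pass over the result. This numpy toy (function names and shapes are assumptions) is far simpler than the paper's multi-modal transformer but shows the joint-encoding idea.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(modalities):
    """Concatenate token sequences from several modalities (e.g.,
    appearance, sound, transcribed speech) and apply one scaled
    dot-product self-attention pass, so every token can attend to
    tokens of every other modality. Each dict value: (num_tokens, dim)."""
    tokens = np.concatenate(list(modalities.values()), axis=0)
    d = tokens.shape[1]
    attn = softmax(tokens @ tokens.T / np.sqrt(d))  # (N, N) attention
    return attn @ tokens                            # jointly encoded tokens
```

A real implementation would add learned query/key/value projections, modality and position embeddings, and multiple heads and layers.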
no code implementations • 12 Mar 2020 • Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Karteek Alahari
Eye movement and the strategic placement of the visual field onto the retina give animals increased resolution of the scene and suppress distracting information.
no code implementations • 25 Sep 2019 • Yu-Xiong Wang, Yuki Uchiyama, Martial Hebert, Karteek Alahari
Learning to hallucinate additional examples has recently been shown as a promising direction to address few-shot learning tasks, which aim to learn novel concepts from very few examples.
no code implementations • NeurIPS 2019 • Thomas Lucas, Konstantin Shmelkov, Karteek Alahari, Cordelia Schmid, Jakob Verbeek
We show that our model significantly improves over existing hybrid models: it offers GAN-like samples, Inception Score (IS) and Fréchet Inception Distance (FID) scores that are competitive with fully adversarial models, and improved likelihood scores.
no code implementations • 27 Sep 2018 • Thomas Lucas, Konstantin Shmelkov, Karteek Alahari, Cordelia Schmid, Jakob Verbeek
First, we propose a model that extends variational autoencoders by using deterministic invertible transformation layers to map samples from the decoder to the image space.
no code implementations • ECCV 2018 • Konstantin Shmelkov, Cordelia Schmid, Karteek Alahari
Generative adversarial networks (GANs) are one of the most popular methods for generating images today.
5 code implementations • ECCV 2018 • Francisco M. Castro, Manuel J. Marín-Jiménez, Nicolás Guil, Cordelia Schmid, Karteek Alahari
Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally.
Ranked #2 on Incremental Learning on ImageNet - 10 steps (# M Params metric)
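A common remedy for the catastrophic forgetting this abstract describes is to pair the classification loss on new classes with a distillation term that keeps the updated model's predictions on old classes close to the frozen previous model's. The sketch below follows that general recipe; the exact weighting, temperature, and function names are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def softmax(z, t=1.0):
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def incremental_loss(new_logits, old_logits, label, num_old, temp=2.0):
    """Cross-entropy over all classes, plus a distillation term that
    matches the new model's temperature-softened distribution over the
    old classes to the frozen old model's, mitigating forgetting."""
    p = softmax(new_logits)
    ce = -np.log(p[label] + 1e-12)
    q_old = softmax(old_logits[:num_old], t=temp)   # teacher (frozen model)
    q_new = softmax(new_logits[:num_old], t=temp)   # student (updated model)
    distill = -np.sum(q_old * np.log(q_new + 1e-12))
    return ce + distill
```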
1 code implementation • CVPR 2018 • Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari
Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor).
no code implementations • 25 Apr 2018 • Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari
In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first- and third-person video, making it one of the largest and most diverse egocentric datasets available.
no code implementations • 1 Dec 2017 • Pavel Tokmakov, Cordelia Schmid, Karteek Alahari
We formulate this as a learning problem and design our framework with three cues: (i) independent object motion between a pair of frames, which complements object recognition, (ii) object appearance, which helps to correct errors in motion estimation, and (iii) temporal consistency, which imposes additional constraints on the segmentation.
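The three cues enumerated in this abstract can be illustrated by a naive score-fusion toy: combine per-pixel motion and appearance scores with the previous frame's mask as a temporal-consistency prior. The linear weighting and threshold below are illustrative assumptions; the paper learns this combination rather than hand-coding it.

```python
import numpy as np

def fuse_cues(motion, appearance, prev_mask, weights=(0.4, 0.4, 0.2)):
    """Toy fusion of the three cues: (i) independent object motion,
    (ii) object appearance, and (iii) temporal consistency with the
    previous frame's mask, thresholded into a binary segmentation."""
    w_m, w_a, w_t = weights
    score = w_m * motion + w_a * appearance + w_t * prev_mask
    return (score > 0.5).astype(np.uint8)
```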
3 code implementations • ICCV 2017 • Konstantin Shmelkov, Cordelia Schmid, Karteek Alahari
Despite their success for object detection, convolutional neural networks are ill-equipped for incremental learning, i.e., adapting the original model trained on a set of classes to additionally detect objects of new classes, in the absence of the initial training data.
no code implementations • 19 Jul 2017 • Nicolas Chesneau, Grégory Rogez, Karteek Alahari, Cordelia Schmid
In this paper, we propose a new framework for action localization that tracks people in videos and extracts full-body human tubes, i.e., spatio-temporal regions localizing actions, even in the case of occlusions or truncations.
no code implementations • ICCV 2017 • Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
The module that builds a "visual memory" in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences.
no code implementations • CVPR 2017 • Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
The problem of determining whether an object is in motion, irrespective of camera motion, is far from being solved.
no code implementations • 23 Mar 2016 • Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
We also demonstrate that the performance of M-CNN learned with 150 weak video annotations is on par with state-of-the-art weakly-supervised methods trained with thousands of images.
Image Segmentation
Weakly supervised Semantic Segmentation
no code implementations • 13 Jan 2016 • Anand Mishra, Karteek Alahari, C. V. Jawahar
We build a conditional random field model on these detections to jointly model the strength of the detections and the interactions between them.
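The CRF described here scores a joint labeling by summing unary terms (detection strengths) and pairwise terms (interactions between detections). A minimal energy function along those lines, using a hypothetical dict-based representation rather than the paper's actual model:

```python
def crf_energy(labels, unary, pairwise, edges):
    """Energy of a labeling under a pairwise CRF: unary terms encode
    the strength of each detection's label, pairwise terms encode
    interactions between neighboring detections over the given edges."""
    energy = sum(unary[i][labels[i]] for i in range(len(labels)))
    energy += sum(pairwise[(i, j)][labels[i]][labels[j]] for (i, j) in edges)
    return energy
```

Inference then seeks the labeling that minimizes (or maximizes, depending on sign convention) this energy.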
no code implementations • ICCV 2015 • Yang Hua, Karteek Alahari, Cordelia Schmid
Tracking-by-detection approaches are some of the most successful object trackers in recent years.
no code implementations • CVPR 2014 • Anoop Cherian, Julien Mairal, Karteek Alahari, Cordelia Schmid
In this paper, we present a method for estimating articulated human poses in videos.
no code implementations • CVPR 2013 • Florent Couzinie-Devy, Jian Sun, Karteek Alahari, Jean Ponce
This paper addresses the problem of restoring images subjected to unknown and spatially varying blur caused by defocus or linear (say, horizontal) motion.