Search Results for author: Karteek Alahari

Found 50 papers, 20 papers with code

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

no code implementations • 20 Dec 2023 • Tariq Berrada, Jakob Verbeek, Camille Couprie, Karteek Alahari

Semantic image synthesis, i.e., generating images from user-provided semantic label maps, is an important conditional image generation task as it allows control over both the content and the spatial layout of generated images.

Ranked #1 on Image-to-Image Translation on ADE20K Labels-to-Photos

Conditional Image Generation Image Classification +1

Overcoming Label Noise for Source-free Unsupervised Video Domain Adaptation

no code implementations • 30 Nov 2023 • Avijit Dasgupta, C. V. Jawahar, Karteek Alahari

We use the source pre-trained model to generate pseudo-labels for the target domain samples, which are inevitably noisy.

Domain Adaptation

On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers

1 code implementation • 18 Aug 2023 • Thomas De Min, Massimiliano Mancini, Karteek Alahari, Xavier Alameda-Pineda, Elisa Ricci

State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts, drastically reducing catastrophic forgetting.

Continual Learning Transfer Learning

Guided Distillation for Semi-Supervised Instance Segmentation

1 code implementation • 3 Aug 2023 • Tariq Berrada, Camille Couprie, Karteek Alahari, Jakob Verbeek

Although instance segmentation methods have improved considerably, the dominant paradigm is to rely on fully-annotated training images, which are tedious to obtain.

Ranked #1 on Semi-Supervised Instance Segmentation on COCO 10% labeled data

Instance Segmentation Semantic Segmentation +1

Multi-Domain Learning with Modulation Adapters

no code implementations • 17 Jul 2023 • Ekaterina Iakovleva, Karteek Alahari, Jakob Verbeek

Deep convolutional networks are ubiquitous in computer vision, due to their excellent performance across different tasks for various domains.

Image Classification

Semi-supervised learning made simple with self-supervised clustering

1 code implementation • CVPR 2023 • Enrico Fini, Pietro Astolfi, Karteek Alahari, Xavier Alameda-Pineda, Julien Mairal, Moin Nabi, Elisa Ricci

Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations.

Clustering Self-Supervised Learning

Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions

no code implementations • 18 Apr 2023 • Lina Mezghani, Piotr Bojanowski, Karteek Alahari, Sainbayar Sukhbaatar

The success of transformer models trained with a language modeling objective brings a promising opportunity to the reinforcement learning framework.

Language Modelling

Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping

1 code implementation • 5 Jan 2023 • Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari

Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming.

Continuous Control Self-Supervised Learning

Fake it till you make it: Learning transferable representations from synthetic ImageNet clones

no code implementations • CVPR 2023 • Mert Bulent Sariyildiz, Karteek Alahari, Diane Larlus, Yannis Kalantidis

We show that with minimal and class-agnostic prompt engineering, ImageNet clones are able to close a large part of the gap between models produced by synthetic images and models trained with real images, for the several standard classification benchmarks that we consider in this study.

Classification Image Generation +1

A soft nearest-neighbor framework for continual semi-supervised learning

1 code implementation • ICCV 2023 • Zhiqi Kang, Enrico Fini, Moin Nabi, Elisa Ricci, Karteek Alahari

Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data.

Continual Learning

From CNNs to Shift-Invariant Twin Models Based on Complex Wavelets

no code implementations • 1 Dec 2022 • Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari

Arguably, our approach's emphasis on retaining high-frequency details contributes to a better balance between shift invariance and information preservation, resulting in improved performance.

Lightweight Structure-Aware Attention for Visual Understanding

no code implementations • 29 Nov 2022 • Heeseung Kwon, Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil, Karteek Alahari

Vision Transformers (ViTs) have become a dominant paradigm for visual representation learning with self-attention operators.

Representation Learning

Self-Supervised Pretraining on Satellite Imagery: a Case Study on Label-Efficient Vehicle Detection

no code implementations • 21 Oct 2022 • Jules BOURCIER, Thomas Floquet, Gohar Dashyan, Tugdual Ceillier, Karteek Alahari, Jocelyn Chanussot

In defense-related remote sensing applications, such as vehicle detection on satellite imagery, supervised learning requires a huge number of labeled examples to reach operational performances.

Object Detection +2

Evaluating the Label Efficiency of Contrastive Self-Supervised Learning for Multi-Resolution Satellite Imagery

no code implementations • 13 Oct 2022 • Jules BOURCIER, Gohar Dashyan, Jocelyn Chanussot, Karteek Alahari

The application of deep neural networks to remote sensing imagery is often constrained by the lack of ground-truth annotations.

Earth Observation Representation Learning +1

On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks

no code implementations • 19 Sep 2022 • Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari

This paper focuses on improving the mathematical interpretability of convolutional neural networks (CNNs) in the context of image classification.

Image Classification

No Reason for No Supervision: Improved Generalization in Supervised Models

1 code implementation • 30 Jun 2022 • Mert Bulent Sariyildiz, Yannis Kalantidis, Karteek Alahari, Diane Larlus

We consider the problem of training a deep neural network on a given classification task, e.g., ImageNet-1K (IN1K), so that it excels at both the training task and other (future) transfer tasks.

Data Augmentation Self-Supervised Learning +1

LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation

1 code implementation • 27 Jun 2022 • Florent Bartoccioni, Éloi Zablocki, Andrei Bursuc, Patrick Pérez, Matthieu Cord, Karteek Alahari

Recent works in autonomous driving have widely adopted the bird's-eye-view (BEV) semantic map as an intermediate representation of the world.

Ranked #6 on Bird's-Eye View Semantic Segmentation on nuScenes

Autonomous Driving Bird's-Eye View Semantic Segmentation +1

Walk the Random Walk: Learning to Discover and Reach Goals Without Supervision

no code implementations • 23 Jun 2022 • Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Karteek Alahari

Finally, we train a goal-conditioned policy network with goals sampled from the goal memory and reward it by the reachability network and the goal memory.

Continuous Control

AVATAR: Unconstrained Audiovisual Speech Recognition

1 code implementation • 15 Jun 2022 • Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid

Audio-visual automatic speech recognition (AV-ASR) is an extension of ASR that incorporates visual cues, often from the movements of a speaker's mouth.

Automatic Speech Recognition (ASR) +1

119

The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields

no code implementations • 28 Feb 2022 • Pia Bideau, Erik Learned-Miller, Cordelia Schmid, Karteek Alahari

In this work, we argue that the coupling of camera rotation and camera translation can create complex motion fields that are difficult for a deep network to untangle directly.

Motion Segmentation

Self-Supervised Models are Continual Learners

1 code implementation • CVPR 2022 • Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal

Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale.

Continual Learning Representation Learning

110

Masking Modalities for Cross-modal Video Retrieval

no code implementations • 1 Nov 2021 • Valentin Gabeur, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid

Our proposal is to pre-train a video encoder using all the available video modalities as supervision, namely, appearance, sound, and transcribed speech.

Retrieval Video Retrieval

Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond

1 code implementation • NeurIPS 2021 • Đ. Khuê Lê-Huu, Karteek Alahari

We introduce regularized Frank-Wolfe, a general and effective algorithm for inference and learning of dense conditional random fields (CRFs).

Ranked #13 on Semantic Segmentation on Cityscapes test (using extra training data)

Semantic Segmentation

LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR

1 code implementation • 8 Sep 2021 • Florent Bartoccioni, Éloi Zablocki, Patrick Pérez, Matthieu Cord, Karteek Alahari

In such a monocular setup, dense depth is obtained with either additional input from one or several expensive LiDARs, e.g., with 64 beams, or camera-only methods, which suffer from scale-ambiguity and infinite-depth problems.

Depth Completion Depth Estimation

Memory-Augmented Reinforcement Learning for Image-Goal Navigation

1 code implementation • 13 Jan 2021 • Lina Mezghani, Sainbayar Sukhbaatar, Thibaut Lavril, Oleksandr Maksymets, Dhruv Batra, Piotr Bojanowski, Karteek Alahari

In this work, we present a memory-augmented approach for image-goal navigation.

Data Augmentation Navigate +2

Dual-Tree Wavelet Packet CNNs for Image Classification

no code implementations • 1 Jan 2021 • Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari

In this paper, we target an important issue of deep convolutional neural networks (CNNs) — the lack of a mathematical understanding of their properties.

Classification General Classification +1

Context Aware Group Activity Recognition

no code implementations • ICPR 2021 • Avijit Dasgupta, C. V. Jawahar, Karteek Alahari

Existing approaches decompose this task into feature learning and relational reasoning.

Group Activity Recognition Relational Reasoning

Concept Generalization in Visual Representation Learning

1 code implementation • ICCV 2021 • Mert Bulent Sariyildiz, Yannis Kalantidis, Diane Larlus, Karteek Alahari

In this paper, we argue that the semantic relationships between seen and unseen concepts affect generalization performance and propose ImageNet-CoG, a novel benchmark on the ImageNet-21K (IN-21K) dataset that enables measuring concept generalization in a principled way.

Representation Learning Self-Supervised Learning

Meta-Learning with Shared Amortized Variational Inference

1 code implementation • ICML 2020 • Ekaterina Iakovleva, Jakob Verbeek, Karteek Alahari

We propose a novel amortized variational inference scheme for an empirical Bayes meta-learning model, where model parameters are treated as latent variables.

Meta-Learning Variational Inference

The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020)

1 code implementation • 3 Aug 2020 • Samuel Albanie, Yang Liu, Arsha Nagrani, Antoine Miech, Ernesto Coto, Ivan Laptev, Rahul Sukthankar, Bernard Ghanem, Andrew Zisserman, Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid, Shi-Zhe Chen, Yida Zhao, Qin Jin, Kaixu Cui, Hui Liu, Chen Wang, Yudong Jiang, Xiaoshuai Hao

This report summarizes the results of the first edition of the challenge together with the findings of the participants.

Natural Language Queries Retrieval +3

327

Multi-modal Transformer for Video Retrieval

1 code implementation • ECCV 2020 • Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid

In this paper, we present a multi-modal transformer to jointly encode the different modalities in video, which allows each of them to attend to the others.

Ranked #1 on Zero-Shot Video Retrieval on MSR-VTT (text-to-video Mean Rank metric, using extra training data)

Natural Language Queries Retrieval +2

249

Beyond the Camera: Neural Networks in World Coordinates

no code implementations • 12 Mar 2020 • Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Karteek Alahari

Eye movement and strategic placement of the visual field onto the retina give animals increased resolution of the scene and suppress distracting information.

Action Recognition Video Stabilization +1

Meta-Learning by Hallucinating Useful Examples

no code implementations • 25 Sep 2019 • Yu-Xiong Wang, Yuki Uchiyama, Martial Hebert, Karteek Alahari

Learning to hallucinate additional examples has recently been shown as a promising direction to address few-shot learning tasks, which aim to learn novel concepts from very few examples.

Few-Shot Learning Hallucination +1

Adaptive Density Estimation for Generative Models

no code implementations • NeurIPS 2019 • Thomas Lucas, Konstantin Shmelkov, Karteek Alahari, Cordelia Schmid, Jakob Verbeek

We show that our model significantly improves over existing hybrid models: offering GAN-like samples, IS and FID scores that are competitive with fully adversarial models, and improved likelihood scores.

Density Estimation

Coverage and Quality Driven Training of Generative Image Models

no code implementations • 27 Sep 2018 • Thomas Lucas, Konstantin Shmelkov, Karteek Alahari, Cordelia Schmid, Jakob Verbeek

First, we propose a model that extends variational autoencoders by using deterministic invertible transformation layers to map samples from the decoder to the image space.

End-to-End Incremental Learning

5 code implementations • ECCV 2018 • Francisco M. Castro, Manuel J. Marín-Jiménez, Nicolás Guil, Cordelia Schmid, Karteek Alahari

Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally.

Ranked #2 on Incremental Learning on ImageNet100 - 10 steps (# M Params metric)

Image Classification Incremental Learning

494

How good is my GAN?

no code implementations • ECCV 2018 • Konstantin Shmelkov, Cordelia Schmid, Karteek Alahari

Generative adversarial networks (GANs) are one of the most popular methods for generating images today.

General Classification Image Classification

Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos

no code implementations • 25 Apr 2018 • Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari

In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first and third-person video, making it one of the largest and most diverse egocentric datasets available.

General Classification Video Classification +1

Actor and Observer: Joint Modeling of First and Third-Person Videos

1 code implementation • CVPR 2018 • Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari

Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor).

Action Recognition Temporal Action Localization

Learning to Segment Moving Objects

no code implementations • 1 Dec 2017 • Pavel Tokmakov, Cordelia Schmid, Karteek Alahari

We formulate this as a learning problem and design our framework with three cues: (i) independent object motion between a pair of frames, which complements object recognition, (ii) object appearance, which helps to correct errors in motion estimation, and (iii) temporal consistency, which imposes additional constraints on the segmentation.

Motion Estimation Motion Segmentation +4

Incremental Learning of Object Detectors without Catastrophic Forgetting

3 code implementations • ICCV 2017 • Konstantin Shmelkov, Cordelia Schmid, Karteek Alahari

Despite their success for object detection, convolutional neural networks are ill-equipped for incremental learning, i.e., adapting the original model trained on a set of classes to additionally detect objects of new classes, in the absence of the initial training data.

Incremental Learning Object +2

127

Detecting Parts for Action Localization

no code implementations • 19 Jul 2017 • Nicolas Chesneau, Grégory Rogez, Karteek Alahari, Cordelia Schmid

In this paper, we propose a new framework for action localization that tracks people in videos and extracts full-body human tubes, i.e., spatio-temporal regions localizing actions, even in the case of occlusions or truncations.

Action Localization

Learning Video Object Segmentation with Visual Memory

no code implementations • ICCV 2017 • Pavel Tokmakov, Karteek Alahari, Cordelia Schmid

The module to build a "visual memory" in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences.

Ranked #3 on Unsupervised Video Object Segmentation on SegTrack v2

Motion Segmentation Object +3

Learning Motion Patterns in Videos

no code implementations • CVPR 2017 • Pavel Tokmakov, Karteek Alahari, Cordelia Schmid

The problem of determining whether an object is in motion, irrespective of camera motion, is far from being solved.

Motion Segmentation Optical Flow Estimation +3

Weakly-Supervised Semantic Segmentation using Motion Cues

no code implementations • 23 Mar 2016 • Pavel Tokmakov, Karteek Alahari, Cordelia Schmid

We also demonstrate that the performance of M-CNN learned with 150 weak video annotations is on par with state-of-the-art weakly-supervised methods trained with thousands of images.

Image Segmentation Weakly supervised Semantic Segmentation +1

Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues

no code implementations • 13 Jan 2016 • Anand Mishra, Karteek Alahari, C. V. Jawahar

We build a conditional random field model on these detections to jointly model the strength of the detections and the interactions between them.

Scene Text Recognition

Online Object Tracking with Proposal Selection

no code implementations • ICCV 2015 • Yang Hua, Karteek Alahari, Cordelia Schmid

Tracking-by-detection approaches are some of the most successful object trackers in recent years.

Object Visual Object Tracking

Mixing Body-Part Sequences for Human Pose Estimation

no code implementations • CVPR 2014 • Anoop Cherian, Julien Mairal, Karteek Alahari, Cordelia Schmid

In this paper, we present a method for estimating articulated human poses in videos.

Pose Estimation

Learning to Estimate and Remove Non-uniform Image Blur

no code implementations • CVPR 2013 • Florent Couzinie-Devy, Jian Sun, Karteek Alahari, Jean Ponce

This paper addresses the problem of restoring images subjected to unknown and spatially varying blur caused by defocus or linear (say, horizontal) motion.

Deblurring

Scene Text Recognition using Higher Order Language Priors

1 code implementation • BMVC 2012 • Anand Mishra, Karteek Alahari, C. V. Jawahar

The problem of recognizing text in images taken in the wild has gained significant attention from the computer vision community in recent years.

Scene Text Recognition
