Search Results for author: Karteek Alahari

Found 50 papers, 20 papers with code

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

no code implementations 20 Dec 2023 Tariq Berrada, Jakob Verbeek, Camille Couprie, Karteek Alahari

Semantic image synthesis, i.e., generating images from user-provided semantic label maps, is an important conditional image generation task as it allows control over both the content and the spatial layout of generated images.

Conditional Image Generation Image Classification +1

Overcoming Label Noise for Source-free Unsupervised Video Domain Adaptation

no code implementations 30 Nov 2023 Avijit Dasgupta, C. V. Jawahar, Karteek Alahari

We use the source pre-trained model to generate pseudo-labels for the target domain samples, which are inevitably noisy.

Domain Adaptation

On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers

1 code implementation 18 Aug 2023 Thomas De Min, Massimiliano Mancini, Karteek Alahari, Xavier Alameda-Pineda, Elisa Ricci

State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts, drastically reducing catastrophic forgetting.

Continual Learning Transfer Learning

Guided Distillation for Semi-Supervised Instance Segmentation

1 code implementation 3 Aug 2023 Tariq Berrada, Camille Couprie, Karteek Alahari, Jakob Verbeek

Although instance segmentation methods have improved considerably, the dominant paradigm is to rely on fully-annotated training images, which are tedious to obtain.

Instance Segmentation Semantic Segmentation +1

Multi-Domain Learning with Modulation Adapters

no code implementations 17 Jul 2023 Ekaterina Iakovleva, Karteek Alahari, Jakob Verbeek

Deep convolutional networks are ubiquitous in computer vision, due to their excellent performance across different tasks for various domains.

Image Classification

Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions

no code implementations 18 Apr 2023 Lina Mezghani, Piotr Bojanowski, Karteek Alahari, Sainbayar Sukhbaatar

The success of transformer models trained with a language modeling objective brings a promising opportunity to the reinforcement learning framework.

Language Modelling

Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping

1 code implementation 5 Jan 2023 Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari

Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming.

Continuous Control Self-Supervised Learning

Fake it till you make it: Learning transferable representations from synthetic ImageNet clones

no code implementations CVPR 2023 Mert Bulent Sariyildiz, Karteek Alahari, Diane Larlus, Yannis Kalantidis

We show that with minimal and class-agnostic prompt engineering, ImageNet clones are able to close a large part of the gap between models produced by synthetic images and models trained with real images, for the several standard classification benchmarks that we consider in this study.

Classification Image Generation +1

A soft nearest-neighbor framework for continual semi-supervised learning

1 code implementation ICCV 2023 Zhiqi Kang, Enrico Fini, Moin Nabi, Elisa Ricci, Karteek Alahari

Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data.

Continual Learning

From CNNs to Shift-Invariant Twin Models Based on Complex Wavelets

no code implementations 1 Dec 2022 Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari

Arguably, our approach's emphasis on retaining high-frequency details contributes to a better balance between shift invariance and information preservation, resulting in improved performance.

Lightweight Structure-Aware Attention for Visual Understanding

no code implementations 29 Nov 2022 Heeseung Kwon, Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil, Karteek Alahari

Vision Transformers (ViTs) have become a dominant paradigm for visual representation learning with self-attention operators.

Representation Learning

Self-Supervised Pretraining on Satellite Imagery: a Case Study on Label-Efficient Vehicle Detection

no code implementations 21 Oct 2022 Jules BOURCIER, Thomas Floquet, Gohar Dashyan, Tugdual Ceillier, Karteek Alahari, Jocelyn Chanussot

In defense-related remote sensing applications, such as vehicle detection on satellite imagery, supervised learning requires a huge number of labeled examples to reach operational performances.

object-detection Object Detection +2

On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks

no code implementations 19 Sep 2022 Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari

This paper focuses on improving the mathematical interpretability of convolutional neural networks (CNNs) in the context of image classification.

Image Classification

No Reason for No Supervision: Improved Generalization in Supervised Models

1 code implementation 30 Jun 2022 Mert Bulent Sariyildiz, Yannis Kalantidis, Karteek Alahari, Diane Larlus

We consider the problem of training a deep neural network on a given classification task, e.g., ImageNet-1K (IN1K), so that it excels at both the training task as well as at other (future) transfer tasks.

Data Augmentation Self-Supervised Learning +1

Walk the Random Walk: Learning to Discover and Reach Goals Without Supervision

no code implementations 23 Jun 2022 Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Karteek Alahari

Finally, we train a goal-conditioned policy network with goals sampled from the goal memory and reward it by the reachability network and the goal memory.

Continuous Control

AVATAR: Unconstrained Audiovisual Speech Recognition

1 code implementation 15 Jun 2022 Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid

Audio-visual automatic speech recognition (AV-ASR) is an extension of ASR that incorporates visual cues, often from the movements of a speaker's mouth.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields

no code implementations 28 Feb 2022 Pia Bideau, Erik Learned-Miller, Cordelia Schmid, Karteek Alahari

In this work, we argue that the coupling of camera rotation and camera translation can create complex motion fields that are difficult for a deep network to untangle directly.

Motion Segmentation

Self-Supervised Models are Continual Learners

1 code implementation CVPR 2022 Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal

Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale.

Continual Learning Representation Learning

Masking Modalities for Cross-modal Video Retrieval

no code implementations 1 Nov 2021 Valentin Gabeur, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid

Our proposal is to pre-train a video encoder using all the available video modalities as supervision, namely, appearance, sound, and transcribed speech.

Retrieval Video Retrieval

Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond

1 code implementation NeurIPS 2021 Đ. Khuê Lê-Huu, Karteek Alahari

We introduce regularized Frank-Wolfe, a general and effective algorithm for inference and learning of dense conditional random fields (CRFs).

Ranked #13 on Semantic Segmentation on Cityscapes test (using extra training data)

Semantic Segmentation

LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR

1 code implementation 8 Sep 2021 Florent Bartoccioni, Éloi Zablocki, Patrick Pérez, Matthieu Cord, Karteek Alahari

In such a monocular setup, dense depth is obtained with either additional input from one or several expensive LiDARs, e.g., with 64 beams, or camera-only methods, which suffer from scale-ambiguity and infinite-depth problems.

Depth Completion Depth Estimation

Dual-Tree Wavelet Packet CNNs for Image Classification

no code implementations 1 Jan 2021 Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari

In this paper, we target an important issue of deep convolutional neural networks (CNNs) — the lack of a mathematical understanding of their properties.

Classification General Classification +1

Concept Generalization in Visual Representation Learning

1 code implementation ICCV 2021 Mert Bulent Sariyildiz, Yannis Kalantidis, Diane Larlus, Karteek Alahari

In this paper, we argue that the semantic relationships between seen and unseen concepts affect generalization performance and propose ImageNet-CoG, a novel benchmark on the ImageNet-21K (IN-21K) dataset that enables measuring concept generalization in a principled way.

Representation Learning Self-Supervised Learning

Meta-Learning with Shared Amortized Variational Inference

1 code implementation ICML 2020 Ekaterina Iakovleva, Jakob Verbeek, Karteek Alahari

We propose a novel amortized variational inference scheme for an empirical Bayes meta-learning model, where model parameters are treated as latent variables.

Meta-Learning Variational Inference

Multi-modal Transformer for Video Retrieval

1 code implementation ECCV 2020 Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid

In this paper, we present a multi-modal transformer to jointly encode the different modalities in video, which allows each of them to attend to the others.

Ranked #1 on Zero-Shot Video Retrieval on MSR-VTT (text-to-video Mean Rank metric, using extra training data)

Natural Language Queries Retrieval +2

Beyond the Camera: Neural Networks in World Coordinates

no code implementations 12 Mar 2020 Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Karteek Alahari

Eye movement and strategic placement of the visual field onto the retina give animals increased resolution of the scene and suppress distracting information.

Action Recognition Video Stabilization +1

Meta-Learning by Hallucinating Useful Examples

no code implementations 25 Sep 2019 Yu-Xiong Wang, Yuki Uchiyama, Martial Hebert, Karteek Alahari

Learning to hallucinate additional examples has recently been shown as a promising direction to address few-shot learning tasks, which aim to learn novel concepts from very few examples.

Few-Shot Learning Hallucination +1

Adaptive Density Estimation for Generative Models

no code implementations NeurIPS 2019 Thomas Lucas, Konstantin Shmelkov, Karteek Alahari, Cordelia Schmid, Jakob Verbeek

We show that our model significantly improves over existing hybrid models: offering GAN-like samples, IS and FID scores that are competitive with fully adversarial models, and improved likelihood scores.

Density Estimation

Coverage and Quality Driven Training of Generative Image Models

no code implementations 27 Sep 2018 Thomas Lucas, Konstantin Shmelkov, Karteek Alahari, Cordelia Schmid, Jakob Verbeek

First, we propose a model that extends variational autoencoders by using deterministic invertible transformation layers to map samples from the decoder to the image space.

How good is my GAN?

no code implementations ECCV 2018 Konstantin Shmelkov, Cordelia Schmid, Karteek Alahari

Generative adversarial networks (GANs) are one of the most popular methods for generating images today.

General Classification Image Classification +1

End-to-End Incremental Learning

5 code implementations ECCV 2018 Francisco M. Castro, Manuel J. Marín-Jiménez, Nicolás Guil, Cordelia Schmid, Karteek Alahari

Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally.

Image Classification Incremental Learning

Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos

no code implementations 25 Apr 2018 Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari

In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first and third-person video, making it one of the largest and most diverse egocentric datasets available.

General Classification Video Classification +1

Actor and Observer: Joint Modeling of First and Third-Person Videos

1 code implementation CVPR 2018 Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari

Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor).

Action Recognition Temporal Action Localization

Learning to Segment Moving Objects

no code implementations 1 Dec 2017 Pavel Tokmakov, Cordelia Schmid, Karteek Alahari

We formulate this as a learning problem and design our framework with three cues: (i) independent object motion between a pair of frames, which complements object recognition, (ii) object appearance, which helps to correct errors in motion estimation, and (iii) temporal consistency, which imposes additional constraints on the segmentation.

Motion Estimation Motion Segmentation +5

Incremental Learning of Object Detectors without Catastrophic Forgetting

3 code implementations ICCV 2017 Konstantin Shmelkov, Cordelia Schmid, Karteek Alahari

Despite their success for object detection, convolutional neural networks are ill-equipped for incremental learning, i.e., adapting the original model trained on a set of classes to additionally detect objects of new classes, in the absence of the initial training data.

Incremental Learning Object +2

Detecting Parts for Action Localization

no code implementations 19 Jul 2017 Nicolas Chesneau, Grégory Rogez, Karteek Alahari, Cordelia Schmid

In this paper, we propose a new framework for action localization that tracks people in videos and extracts full-body human tubes, i.e., spatio-temporal regions localizing actions, even in the case of occlusions or truncations.

Action Localization

Learning Video Object Segmentation with Visual Memory

no code implementations ICCV 2017 Pavel Tokmakov, Karteek Alahari, Cordelia Schmid

The module to build a "visual memory" in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences.

Motion Segmentation Object +3

Learning Motion Patterns in Videos

no code implementations CVPR 2017 Pavel Tokmakov, Karteek Alahari, Cordelia Schmid

The problem of determining whether an object is in motion, irrespective of camera motion, is far from being solved.

Motion Segmentation Optical Flow Estimation +3

Weakly-Supervised Semantic Segmentation using Motion Cues

no code implementations 23 Mar 2016 Pavel Tokmakov, Karteek Alahari, Cordelia Schmid

We also demonstrate that the performance of M-CNN learned with 150 weak video annotations is on par with state-of-the-art weakly-supervised methods trained with thousands of images.

Image Segmentation Weakly supervised Semantic Segmentation +1

Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues

no code implementations 13 Jan 2016 Anand Mishra, Karteek Alahari, C. V. Jawahar

We build a conditional random field model on these detections to jointly model the strength of the detections and the interactions between them.

Scene Text Recognition

Learning to Estimate and Remove Non-uniform Image Blur

no code implementations CVPR 2013 Florent Couzinie-Devy, Jian Sun, Karteek Alahari, Jean Ponce

This paper addresses the problem of restoring images subjected to unknown and spatially varying blur caused by defocus or linear (say, horizontal) motion.

Deblurring
