Search Results for author: Karteek Alahari

Found 39 papers, 12 papers with code

From CNNs to Shift-Invariant Twin Wavelet Models

no code implementations 1 Dec 2022 Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari

We propose a novel antialiasing method to increase shift invariance in convolutional neural networks (CNNs).

Lightweight Structure-Aware Attention for Visual Understanding

no code implementations 29 Nov 2022 Heeseung Kwon, Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil, Karteek Alahari

Vision Transformers (ViTs) have become a dominant paradigm for visual representation learning with self-attention operators.

Representation Learning

Self-Supervised Pretraining on Satellite Imagery: a Case Study on Label-Efficient Vehicle Detection

no code implementations 21 Oct 2022 Jules BOURCIER, Thomas Floquet, Gohar Dashyan, Tugdual Ceillier, Karteek Alahari, Jocelyn Chanussot

In defense-related remote sensing applications, such as vehicle detection on satellite imagery, supervised learning requires a huge number of labeled examples to reach operational performances.

object-detection Object Detection +2

On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks

no code implementations 19 Sep 2022 Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari

In this paper, we aim to improve the mathematical interpretability of convolutional neural networks for image classification.

Image Classification

Improving the Generalization of Supervised Models

no code implementations 30 Jun 2022 Mert Bulent Sariyildiz, Yannis Kalantidis, Karteek Alahari, Diane Larlus

Models trained with self-supervised learning (SSL) tend to generalize better than their supervised counterparts for transfer learning; yet, they still lag behind supervised models on IN1K.

Data Augmentation Self-Supervised Learning +1

Walk the Random Walk: Learning to Discover and Reach Goals Without Supervision

no code implementations 23 Jun 2022 Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Karteek Alahari

Finally, we train a goal-conditioned policy network with goals sampled from the goal memory and reward it by the reachability network and the goal memory.

Continuous Control

AVATAR: Unconstrained Audiovisual Speech Recognition

no code implementations 15 Jun 2022 Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid

Audio-visual automatic speech recognition (AV-ASR) is an extension of ASR that incorporates visual cues, often from the movements of a speaker's mouth.

Automatic Speech Recognition speech-recognition

The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields

no code implementations 28 Feb 2022 Pia Bideau, Erik Learned-Miller, Cordelia Schmid, Karteek Alahari

In this work, we argue that the coupling of camera rotation and camera translation can create complex motion fields that are difficult for a deep network to untangle directly.

Motion Segmentation

Self-Supervised Models are Continual Learners

1 code implementation CVPR 2022 Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal

Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale.

Continual Learning Representation Learning

Masking Modalities for Cross-modal Video Retrieval

no code implementations 1 Nov 2021 Valentin Gabeur, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid

Our proposal is to pre-train a video encoder using all the available video modalities as supervision, namely, appearance, sound, and transcribed speech.

Retrieval Video Retrieval

Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond

1 code implementation NeurIPS 2021 Đ. Khuê Lê-Huu, Karteek Alahari

We introduce regularized Frank-Wolfe, a general and effective algorithm for inference and learning of dense conditional random fields (CRFs).

Ranked #7 on Semantic Segmentation on Cityscapes test (using extra training data)

Semantic Segmentation

LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR

1 code implementation 8 Sep 2021 Florent Bartoccioni, Éloi Zablocki, Patrick Pérez, Matthieu Cord, Karteek Alahari

In such a monocular setup, dense depth is obtained with either additional input from one or several expensive LiDARs, e.g., with 64 beams, or camera-only methods, which suffer from scale-ambiguity and infinite-depth problems.

Depth Completion Depth Estimation

Dual-Tree Wavelet Packet CNNs for Image Classification

no code implementations 1 Jan 2021 Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari

In this paper, we target an important issue of deep convolutional neural networks (CNNs) — the lack of a mathematical understanding of their properties.

Classification General Classification +1

Concept Generalization in Visual Representation Learning

1 code implementation ICCV 2021 Mert Bulent Sariyildiz, Yannis Kalantidis, Diane Larlus, Karteek Alahari

In this paper, we argue that the semantic relationships between seen and unseen concepts affect generalization performance and propose ImageNet-CoG, a novel benchmark on the ImageNet-21K (IN-21K) dataset that enables measuring concept generalization in a principled way.

Representation Learning Self-Supervised Learning

Meta-Learning with Shared Amortized Variational Inference

1 code implementation ICML 2020 Ekaterina Iakovleva, Jakob Verbeek, Karteek Alahari

We propose a novel amortized variational inference scheme for an empirical Bayes meta-learning model, where model parameters are treated as latent variables.

Meta-Learning Variational Inference

Multi-modal Transformer for Video Retrieval

1 code implementation ECCV 2020 Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid

In this paper, we present a multi-modal transformer to jointly encode the different modalities in video, which allows each of them to attend to the others.

Ranked #11 on Video Retrieval on ActivityNet (using extra training data)

Natural Language Queries Retrieval +1

Beyond the Camera: Neural Networks in World Coordinates

no code implementations 12 Mar 2020 Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Karteek Alahari

Eye movement and strategic placement of the visual field onto the retina give animals increased resolution of the scene and suppress distracting information.

Action Recognition Video Stabilization +1

Meta-Learning by Hallucinating Useful Examples

no code implementations 25 Sep 2019 Yu-Xiong Wang, Yuki Uchiyama, Martial Hebert, Karteek Alahari

Learning to hallucinate additional examples has recently been shown as a promising direction to address few-shot learning tasks, which aim to learn novel concepts from very few examples.

Few-Shot Learning Novel Concepts

Adaptive Density Estimation for Generative Models

no code implementations NeurIPS 2019 Thomas Lucas, Konstantin Shmelkov, Karteek Alahari, Cordelia Schmid, Jakob Verbeek

We show that our model significantly improves over existing hybrid models: offering GAN-like samples, IS and FID scores that are competitive with fully adversarial models, and improved likelihood scores.

Density Estimation

Coverage and Quality Driven Training of Generative Image Models

no code implementations 27 Sep 2018 Thomas Lucas, Konstantin Shmelkov, Karteek Alahari, Cordelia Schmid, Jakob Verbeek

First, we propose a model that extends variational autoencoders by using deterministic invertible transformation layers to map samples from the decoder to the image space.

How good is my GAN?

no code implementations ECCV 2018 Konstantin Shmelkov, Cordelia Schmid, Karteek Alahari

Generative adversarial networks (GANs) are one of the most popular methods for generating images today.

General Classification Image Classification

End-to-End Incremental Learning

5 code implementations ECCV 2018 Francisco M. Castro, Manuel J. Marín-Jiménez, Nicolás Guil, Cordelia Schmid, Karteek Alahari

Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally.

Ranked #2 on Incremental Learning on ImageNet - 10 steps (# M Params metric)

Image Classification Incremental Learning

Actor and Observer: Joint Modeling of First and Third-Person Videos

1 code implementation CVPR 2018 Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari

Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor).

Action Recognition

Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos

no code implementations 25 Apr 2018 Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari

In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first and third-person video, making it one of the largest and most diverse egocentric datasets available.

General Classification Video Classification +1

Learning to Segment Moving Objects

no code implementations 1 Dec 2017 Pavel Tokmakov, Cordelia Schmid, Karteek Alahari

We formulate this as a learning problem and design our framework with three cues: (i) independent object motion between a pair of frames, which complements object recognition, (ii) object appearance, which helps to correct errors in motion estimation, and (iii) temporal consistency, which imposes additional constraints on the segmentation.

Motion Estimation Motion Segmentation +3

Incremental Learning of Object Detectors without Catastrophic Forgetting

3 code implementations ICCV 2017 Konstantin Shmelkov, Cordelia Schmid, Karteek Alahari

Despite their success for object detection, convolutional neural networks are ill-equipped for incremental learning, i.e., adapting the original model trained on a set of classes to additionally detect objects of new classes, in the absence of the initial training data.

Incremental Learning object-detection +1

Detecting Parts for Action Localization

no code implementations 19 Jul 2017 Nicolas Chesneau, Grégory Rogez, Karteek Alahari, Cordelia Schmid

In this paper, we propose a new framework for action localization that tracks people in videos and extracts full-body human tubes, i.e., spatio-temporal regions localizing actions, even in the case of occlusions or truncations.

Action Localization

Learning Video Object Segmentation with Visual Memory

no code implementations ICCV 2017 Pavel Tokmakov, Karteek Alahari, Cordelia Schmid

The module to build a "visual memory" in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences.

Motion Segmentation Semantic Segmentation +2

Learning Motion Patterns in Videos

no code implementations CVPR 2017 Pavel Tokmakov, Karteek Alahari, Cordelia Schmid

The problem of determining whether an object is in motion, irrespective of camera motion, is far from being solved.

Ranked #21 on Unsupervised Video Object Segmentation on DAVIS 2016 (using extra training data)

Motion Segmentation Optical Flow Estimation +2

Weakly-Supervised Semantic Segmentation using Motion Cues

no code implementations 23 Mar 2016 Pavel Tokmakov, Karteek Alahari, Cordelia Schmid

We also demonstrate that the performance of M-CNN learned with 150 weak video annotations is on par with state-of-the-art weakly-supervised methods trained with thousands of images.

Image Segmentation Weakly supervised Semantic Segmentation +1

Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues

no code implementations 13 Jan 2016 Anand Mishra, Karteek Alahari, C. V. Jawahar

We build a conditional random field model on these detections to jointly model the strength of the detections and the interactions between them.

Scene Text Recognition

Online Object Tracking with Proposal Selection

no code implementations ICCV 2015 Yang Hua, Karteek Alahari, Cordelia Schmid

Tracking-by-detection approaches are some of the most successful object trackers in recent years.

Visual Object Tracking

Learning to Estimate and Remove Non-uniform Image Blur

no code implementations CVPR 2013 Florent Couzinie-Devy, Jian Sun, Karteek Alahari, Jean Ponce

This paper addresses the problem of restoring images subjected to unknown and spatially varying blur caused by defocus or linear (say, horizontal) motion.

Deblurring
