Search Results for author: Anoop Cherian

Found 61 papers, 10 papers with code

Multi-level Reasoning for Robotic Assembly: From Sequence Inference to Contact Selection

no code implementations17 Dec 2023 Xinghao Zhu, Devesh K. Jha, Diego Romeres, Lingfeng Sun, Masayoshi Tomizuka, Anoop Cherian

Automating the assembly of objects from their parts is a complex problem with innumerable applications in manufacturing, maintenance, and recycling.

Motion Planning

Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis

no code implementations ICCV 2023 Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit, Ye Wang, Toshiaki Koike-Akino, Vishal M. Patel, Tim K. Marks

To this end, and capitalizing on the powerful fine-grained generative control offered by the recent diffusion-based generative models, we introduce Steered Diffusion, a generalized framework for photorealistic zero-shot conditional image generation using a diffusion model trained for unconditional generation.

Colorization Conditional Image Generation +2
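
As a rough illustration of the plug-and-play steering idea (not the paper's exact algorithm), the sketch below shifts the reverse-diffusion mean of a hypothetical unconditional noise predictor eps_model by the gradient of a differentiable guidance loss; the schedule constants, the toy model, and the colorization-style loss are placeholders.

    import torch

    def steered_ddpm_step(eps_model, x_t, t, alpha_t, alpha_bar_t, sigma_t,
                          guidance_loss, guidance_scale=1.0):
        # One reverse step of an unconditionally trained DDPM, steered by the
        # gradient of a task loss evaluated on the predicted clean image.
        x_t = x_t.detach().requires_grad_(True)
        eps = eps_model(x_t, t)                                    # unconditional noise estimate
        x0_hat = (x_t - (1 - alpha_bar_t) ** 0.5 * eps) / alpha_bar_t ** 0.5
        grad = torch.autograd.grad(guidance_loss(x0_hat), x_t)[0]  # steering direction
        mean = (x_t - (1 - alpha_t) / (1 - alpha_bar_t) ** 0.5 * eps) / alpha_t ** 0.5
        mean = mean - guidance_scale * sigma_t * grad              # shift the posterior mean
        return (mean + sigma_t * torch.randn_like(x_t)).detach()

    # toy usage: steer x0 toward matching a grayscale reference (colorization-style cue)
    gray = torch.rand(1, 1, 32, 32)
    loss = lambda x0: ((x0.mean(dim=1, keepdim=True) - gray) ** 2).sum()
    dummy_eps = lambda x, t: torch.zeros_like(x)                   # stand-in for a trained model
    x_prev = steered_ddpm_step(dummy_eps, torch.randn(1, 3, 32, 32), 10,
                               0.99, 0.5, 0.1, loss, guidance_scale=2.0)
    print(x_prev.shape)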

Pixel-Grounded Prototypical Part Networks

no code implementations25 Sep 2023 Zachariah Carmichael, Suhas Lohit, Anoop Cherian, Michael Jones, Walter Scheirer

Prototypical part neural networks (ProtoPartNNs), namely PROTOPNET and its derivatives, are an intrinsically interpretable approach to machine learning.

Object
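
As context for the entry above, here is a minimal sketch of the prototypical-part scoring scheme used by PROTOPNET-style models: class logits come from similarities between local feature-map patches and learned prototype vectors. Sizes and names are illustrative, not the paper's code.

    import torch
    import torch.nn as nn

    class ProtoPartHead(nn.Module):
        def __init__(self, feat_dim=64, n_prototypes=10, n_classes=5):
            super().__init__()
            self.prototypes = nn.Parameter(torch.randn(n_prototypes, feat_dim))
            self.classifier = nn.Linear(n_prototypes, n_classes, bias=False)

        def forward(self, fmap):                       # fmap: (B, D, H, W) conv features
            B, D, H, W = fmap.shape
            patches = fmap.permute(0, 2, 3, 1).reshape(B, H * W, D)
            d2 = torch.cdist(patches, self.prototypes.unsqueeze(0).expand(B, -1, -1)) ** 2
            sim = torch.log((d2 + 1) / (d2 + 1e-4))    # ProtoPNet-style similarity
            sim = sim.max(dim=1).values                # best patch per prototype: (B, P)
            return self.classifier(sim)                # class logits

    logits = ProtoPartHead()(torch.randn(2, 64, 7, 7))
    print(logits.shape)    # torch.Size([2, 5])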

CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments

no code implementations6 Jun 2023 Xiulong Liu, Sudipta Paul, Moitreya Chatterjee, Anoop Cherian

Audio-visual navigation of an agent towards locating an audio goal is a challenging task especially when the audio is sporadic or the environment is noisy.

Hierarchical Reinforcement Learning Navigate +5

Aligning Step-by-Step Instructional Diagrams to Video Demonstrations

1 code implementation CVPR 2023 Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, Stephen Gould

In this paper, we consider a novel setting where such an alignment is between (i) instruction steps that are depicted as assembly diagrams (commonly seen in Ikea assembly manuals) and (ii) video segments from in-the-wild videos, which comprise an enactment of the assembly actions in the real world.

Contrastive Learning Image Retrieval +2

Are Deep Neural Networks SMARTer than Second Graders?

1 code implementation CVPR 2023 Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Kevin A. Smith, Joshua B. Tenenbaum

To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children in the 6-8 age group.

Language Modelling Meta-Learning +1

Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation

no code implementations29 Oct 2022 Moitreya Chatterjee, Narendra Ahuja, Anoop Cherian

In this paper, we propose to use this connection between audio and visual dynamics for solving two challenging tasks simultaneously, namely: (i) separating audio sources from a mixture using visual cues, and (ii) predicting the 3D visual motion of a sounding source using its separated audio.

Audio Source Separation

H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions

no code implementations22 Oct 2022 Kei Ota, Hsiao-Yu Tung, Kevin A. Smith, Anoop Cherian, Tim K. Marks, Alan Sullivan, Asako Kanezaki, Joshua B. Tenenbaum

The world is filled with articulated objects whose use is difficult to determine from vision alone; e.g., a door might open inwards or outwards.

AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments

no code implementations14 Oct 2022 Sudipta Paul, Amit K. Roy-Chowdhury, Anoop Cherian

Similar to audio-visual navigation tasks, the goal of our embodied agent is to localize an audio event via navigating the 3D visual world; however, the agent may also seek help from a human (oracle), where the assistance is provided in free-form natural language.

Hierarchical Reinforcement Learning Navigate +1

(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering

no code implementations18 Feb 2022 Anoop Cherian, Chiori Hori, Tim K. Marks, Jonathan Le Roux

Spatio-temporal scene-graph approaches to video-based reasoning tasks, such as video question-answering (QA), typically construct such graphs for every video frame.

Question Answering Spatio-temporal Scene Graphs +1

Max-Margin Contrastive Learning

1 code implementation21 Dec 2021 Anshul Shah, Suvrit Sra, Rama Chellappa, Anoop Cherian

Standard contrastive learning approaches usually require a large number of negatives for effective unsupervised learning and often exhibit slow convergence.

Contrastive Learning Representation Learning +1
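
For reference, the sketch below is the standard InfoNCE objective the paper aims to improve on: each anchor is contrasted against one positive and a large set of negatives, which is why a big negative pool is usually needed. Tensor shapes are illustrative.

    import torch
    import torch.nn.functional as F

    def info_nce(anchor, positive, negatives, temperature=0.1):
        # anchor, positive: (B, D); negatives: (B, K, D); all L2-normalized
        pos = (anchor * positive).sum(-1, keepdim=True)              # (B, 1)
        neg = torch.einsum('bd,bkd->bk', anchor, negatives)          # (B, K)
        logits = torch.cat([pos, neg], dim=1) / temperature
        labels = torch.zeros(anchor.size(0), dtype=torch.long)       # positive sits at index 0
        return F.cross_entropy(logits, labels)

    z = lambda *s: F.normalize(torch.randn(*s), dim=-1)
    print(info_nce(z(8, 128), z(8, 128), z(8, 256, 128)))            # 256 negatives per anchor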

Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

no code implementations13 Oct 2021 Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori

In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8).

Region Proposal

Visual Scene Graphs for Audio Source Separation

no code implementations ICCV 2021 Moitreya Chatterjee, Jonathan Le Roux, Narendra Ahuja, Anoop Cherian

At its core, AVSGS uses a recursive neural network that emits mutually-orthogonal sub-graph embeddings of the visual graph using multi-head attention.

AudioCaps Audio Source Separation
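
One ingredient mentioned above (mutually-orthogonal sub-graph embeddings produced with multi-head attention) can be sketched as follows; standard nn.MultiheadAttention stands in for the paper's recursive graph network, and the orthogonality term is an illustrative penalty, not the exact loss.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    nodes = torch.randn(1, 12, 256)                    # 12 visual-graph node features
    attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
    queries = torch.randn(1, 3, 256)                   # one query per candidate sub-graph
    sub_emb, _ = attn(queries, nodes, nodes)           # (1, 3, 256) sub-graph embeddings

    E = F.normalize(sub_emb[0], dim=-1)
    ortho_penalty = ((E @ E.t() - torch.eye(3)) ** 2).sum()   # push embeddings toward orthogonality
    print(ortho_penalty)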

InSeGAN: A Generative Approach to Segmenting Identical Instances in Depth Images

no code implementations ICCV 2021 Anoop Cherian, Goncalo Dias Pais, Siddarth Jain, Tim K. Marks, Alan Sullivan

To use our model for instance segmentation, we propose an instance pose encoder that learns to take in a generated depth image and reproduce the pose code vectors for all of the object instances.

Generative Adversarial Network Instance Segmentation +2

Generalized One-Class Learning Using Pairs of Complementary Classifiers

no code implementations24 Jun 2021 Anoop Cherian, Jue Wang

One-class learning is the classic problem of fitting a model to the data for which annotations are available only for a single class.

Anomaly Detection
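
For context, a minimal one-class baseline in the setting described above (scikit-learn's OneClassSVM, not the paper's complementary-classifier model) looks like this; the data are synthetic placeholders.

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    normal = rng.normal(0.0, 1.0, size=(500, 16))        # annotations exist only for this class
    outliers = rng.normal(4.0, 1.0, size=(20, 16))

    clf = OneClassSVM(kernel="rbf", nu=0.05).fit(normal)
    print((clf.predict(normal) == 1).mean())             # fraction kept as inliers
    print((clf.predict(outliers) == -1).mean())          # fraction flagged as anomalies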

First-Order Optimization Algorithms via Discretization of Finite-Time Convergent Flows

no code implementations1 Jan 2021 Mouhacine Benosman, Orlando Romero, Anoop Cherian

In this paper, we investigate in the context of deep neural networks, the performance of several discretization algorithms for two first-order finite-time optimization flows.

Learning to Generate Videos Using Neural Uncertainty Priors

no code implementations1 Jan 2021 Moitreya Chatterjee, Anoop Cherian, Narendra Ahuja

Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena.

Video Generation

Tensor Representations for Action Recognition

1 code implementation28 Dec 2020 Piotr Koniusz, Lei Wang, Anoop Cherian

In this paper, we propose novel tensor representations for compactly capturing such higher-order relationships between visual features for the task of action recognition.

Action Recognition In Videos Skeleton Based Action Recognition

First-Order Optimization Inspired from Finite-Time Convergent Flows

no code implementations6 Oct 2020 Siqi Zhang, Mouhacine Benosman, Orlando Romero, Anoop Cherian

In this paper, we investigate the performance of two first-order optimization algorithms, obtained from forward Euler discretization of finite-time optimization flows.
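
A minimal sketch of the recipe above, assuming the flow is the normalized gradient flow dx/dt = -grad f(x)/||grad f(x)|| (one common finite-time convergent choice, not necessarily the paper's): forward Euler turns it into a fixed-length gradient step.

    import numpy as np

    def grad_f(x):                     # f(x) = 0.5 * ||x||^2
        return x

    x = np.array([3.0, -4.0])
    h = 0.05                           # forward Euler step size
    for _ in range(200):
        g = grad_f(x)
        n = np.linalg.norm(g)
        if n < 1e-9:
            break
        x = x - h * g / n              # x_{k+1} = x_k + h * (-grad f / ||grad f||)
    print(x, np.linalg.norm(x))        # reaches the minimizer after finitely many steps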

Sound2Sight: Generating Visual Dynamics from Sound and Context

no code implementations ECCV 2020 Anoop Cherian, Moitreya Chatterjee, Narendra Ahuja

To tackle this problem, we present Sound2Sight, a deep variational framework, that is trained to learn a per frame stochastic prior conditioned on a joint embedding of audio and past frames.

Multimodal Reasoning
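
A minimal sketch of a per-frame stochastic prior conditioned on a joint audio/past-frame embedding, with reparameterized sampling; layer sizes and names are illustrative stand-ins for the paper's architecture.

    import torch
    import torch.nn as nn

    class ConditionalPrior(nn.Module):
        def __init__(self, cond_dim=256, z_dim=32):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(cond_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 2 * z_dim))

        def forward(self, cond):
            mu, logvar = self.net(cond).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
            return z, mu, logvar

    audio_emb, past_emb = torch.randn(4, 128), torch.randn(4, 128)
    z, mu, logvar = ConditionalPrior()(torch.cat([audio_emb, past_emb], dim=-1))
    print(z.shape)   # torch.Size([4, 32]) latent driving the next-frame decoder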

Representation Learning via Adversarially-Contrastive Optimal Transport

no code implementations ICML 2020 Anoop Cherian, Shuchin Aeron

To maximize extraction of such informative cues from the data, we set the problem within the context of contrastive representation learning and to that end propose a novel objective via optimal transport.

Action Recognition Contrastive Learning +3

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers

no code implementations8 Jul 2020 Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian

Given an input video, its associated audio, and a brief caption, the audio-visual scene aware dialog (AVSD) task requires an agent to engage in a question-answer dialog with a human about the audio-visual content.

Answer Generation Graph Representation Learning

Dense Non-Rigid Structure from Motion: A Manifold Viewpoint

no code implementations15 Jun 2020 Suryansh Kumar, Luc van Gool, Carlos E. P. de Oliveira, Anoop Cherian, Yuchao Dai, Hongdong Li

Assuming that a deforming shape is composed of a union of local linear subspaces and spans a global low-rank space over multiple frames enables us to efficiently model complex non-rigid deformations.

Clustering

Spatio-Temporal Ranked-Attention Networks for Video Captioning

no code implementations17 Jan 2020 Anoop Cherian, Jue Wang, Chiori Hori, Tim K. Marks

To this end, we propose a Spatio-Temporal and Temporo-Spatial (STaTS) attention model which, conditioned on the language state, hierarchically combines spatial and temporal attention to videos in two different orders: (i) a spatio-temporal (ST) sub-model, which first attends to regions that have temporal evolution, then temporally pools the features from these regions; and (ii) a temporo-spatial (TS) sub-model, which first decides a single frame to attend to, then applies spatial attention within that frame.

Video Captioning
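
A minimal sketch of the spatio-temporal (ST) ordering described above: language-conditioned spatial attention over regions within each frame, followed by temporal pooling (a simple mean stands in for the paper's temporal pooling); module names and sizes are illustrative.

    import torch
    import torch.nn as nn

    class STAttention(nn.Module):
        def __init__(self, feat_dim, lang_dim):
            super().__init__()
            self.score = nn.Linear(feat_dim + lang_dim, 1)

        def forward(self, video, lang_state):
            # video: (T, R, D) region features; lang_state: (L,) language state
            T, R, _ = video.shape
            lang = lang_state.expand(T, R, -1)                     # broadcast language state
            logits = self.score(torch.cat([video, lang], dim=-1)).squeeze(-1)   # (T, R)
            alpha = torch.softmax(logits, dim=1)                   # spatial attention per frame
            frames = (alpha.unsqueeze(-1) * video).sum(dim=1)      # (T, D) attended frames
            return frames.mean(dim=0)                              # temporal pooling -> (D,)

    video = torch.randn(8, 6, 512)      # 8 frames, 6 regions, 512-d features
    lang_state = torch.randn(256)
    print(STAttention(512, 256)(video, lang_state).shape)          # torch.Size([512])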

Discriminative Video Representation Learning Using Support Vector Classifiers

no code implementations5 Sep 2019 Jue Wang, Anoop Cherian

With the features from the video as a positive bag and the irrelevant features as the negative bag, we cast an objective to learn a (nonlinear) hyperplane that separates the unknown useful features from the rest in a multiple instance learning formulation within a support vector machine setup.

Action Recognition In Videos Multiple Instance Learning +1
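
A minimal sketch of the multiple-instance idea described above: fit a linear SVM that separates a video's own features (positive bag) from irrelevant features (negative bag), then use the hyperplane parameters as the video descriptor; the data here are synthetic placeholders.

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    video_feats = rng.normal(1.0, 1.0, size=(200, 64))     # features from one video (positive bag)
    noise_feats = rng.normal(0.0, 1.0, size=(200, 64))     # irrelevant/background features (negative bag)

    X = np.vstack([video_feats, noise_feats])
    y = np.hstack([np.ones(200), np.zeros(200)])
    svm = LinearSVC(C=1.0).fit(X, y)
    descriptor = np.hstack([svm.coef_.ravel(), svm.intercept_])    # hyperplane as video descriptor
    print(descriptor.shape)                                        # (65,)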

GODS: Generalized One-class Discriminative Subspaces for Anomaly Detection

no code implementations ICCV 2019 Jue Wang, Anoop Cherian

One-class learning is the classic problem of fitting a model to data for which annotations are available only for a single class.

Anomaly Detection Novelty Detection +1

Game Theoretic Optimization via Gradient-based Nikaido-Isoda Function

no code implementations15 May 2019 Arvind U. Raghunathan, Anoop Cherian, Devesh K. Jha

To this end, we introduce the Gradient-based Nikaido-Isoda (GNI) function which serves: (i) as a merit function, vanishing only at the first-order stationary points of each player's optimization problem, and (ii) provides error bounds to a stationary Nash point.
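
A rough two-player sketch of a gradient-based Nikaido-Isoda-style merit function in the spirit described above: each term compares a player's cost with its cost after a small gradient step on that player's own variable, so the sum vanishes at points where both players are first-order stationary. The game, step size, and finite-difference gradients are illustrative, not the paper's exact construction.

    import numpy as np

    def f1(x1, x2):      # player 1 cost
        return (x1 - 1.0) ** 2 + x1 * x2

    def f2(x1, x2):      # player 2 cost
        return (x2 + 2.0) ** 2 - x1 * x2

    def gni(x1, x2, eta=0.1, eps=1e-6):
        g1 = (f1(x1 + eps, x2) - f1(x1 - eps, x2)) / (2 * eps)   # d f1 / d x1
        g2 = (f2(x1, x2 + eps) - f2(x1, x2 - eps)) / (2 * eps)   # d f2 / d x2
        return (f1(x1, x2) - f1(x1 - eta * g1, x2)) + (f2(x1, x2) - f2(x1, x2 - eta * g2))

    print(gni(0.0, 0.0))       # positive away from a stationary Nash point
    print(gni(1.6, -1.2))      # ~0 at the first-order stationary point of this game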

Learning Discriminative Video Representations Using Adversarial Perturbations

no code implementations ECCV 2018 Jue Wang, Anoop Cherian

As the perturbed features belong to data classes that are likely to be confused with the original features, the discriminative subspace will characterize parts of the feature space that are more representative of the original data, and thus may provide robust video representations.

Binary Classification Riemannian optimization +1

Contrastive Video Representation Learning via Adversarial Perturbations

no code implementations ECCV 2018 Jue Wang, Anoop Cherian

In this paper, we propose to use such perturbations within a novel contrastive learning setup to build negative samples, which are then used to produce improved video representations.

Action Recognition Binary Classification +4

Sem-GAN: Semantically-Consistent Image-to-Image Translation

1 code implementation12 Jul 2018 Anoop Cherian, Alan Sullivan

To this end, we present a semantically-consistent GAN framework, dubbed Sem-GAN, in which the semantics are defined by the class identities of image segments in the source domain as produced by a semantic segmentation algorithm.

Image Segmentation Image-to-Image Translation +3
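
A minimal sketch of a semantic-consistency term in the spirit described above: the segmentation predicted on the translated image is penalized for disagreeing with the segment labels of the source image; the toy generator and segmenter below are hypothetical stand-ins.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    generator = nn.Conv2d(3, 3, kernel_size=3, padding=1)     # toy image-to-image generator
    segmenter = nn.Conv2d(3, 5, kernel_size=1)                # toy 5-class semantic segmenter

    source = torch.randn(2, 3, 64, 64)
    with torch.no_grad():
        source_labels = segmenter(source).argmax(dim=1)       # (2, 64, 64) class ids of source segments

    translated = generator(source)
    sem_loss = F.cross_entropy(segmenter(translated), source_labels)   # semantic-consistency penalty
    print(sem_loss)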

Non-Linear Temporal Subspace Representations for Activity Recognition

no code implementations CVPR 2018 Anoop Cherian, Suvrit Sra, Stephen Gould, Richard Hartley

As these features are often non-linear, we propose a novel pooling method, kernelized rank pooling, that represents a given sequence compactly as the pre-image of the parameters of a hyperplane in a reproducing kernel Hilbert space, projections of data onto which captures their temporal order.

Action Recognition Riemannian optimization +3
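
For context, here is the linear rank-pooling baseline that kernelized rank pooling generalizes: regress the temporal order from (smoothed) frame features and use the learned parameters as the sequence descriptor; data and hyperparameters are illustrative.

    import numpy as np
    from sklearn.svm import LinearSVR

    rng = np.random.default_rng(0)
    T, D = 40, 128
    frames = np.cumsum(rng.normal(size=(T, D)), axis=0)       # smoothed per-frame features
    times = np.arange(1, T + 1, dtype=float)                  # temporal order targets

    svr = LinearSVR(C=1.0, epsilon=0.1).fit(frames, times)
    descriptor = np.ravel(svr.coef_)                          # (D,) rank-pooled sequence descriptor
    print(descriptor.shape, np.corrcoef(frames @ descriptor, times)[0, 1])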

Video Representation Learning Using Discriminative Pooling

no code implementations CVPR 2018 Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould

In an attempt to tackle this problem, we propose discriminative pooling, based on the notion that among the deep features generated on all short clips, there is at least one that characterizes the action.

Action Recognition In Videos Multiple Instance Learning +2

Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective

no code implementations CVPR 2018 Suryansh Kumar, Anoop Cherian, Yuchao Dai, Hongdong Li

To address these issues, in this paper, we propose a new approach for dense NRSfM by modeling the problem on a Grassmann manifold.

Neural Algebra of Classifiers

no code implementations26 Jan 2018 Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian, Stephen Gould

In this paper, we build on the compositionality principle and develop an "algebra" to compose classifiers for complex visual concepts.

Human Action Forecasting by Learning Task Grammars

no code implementations19 Sep 2017 Tengda Han, Jue Wang, Anoop Cherian, Stephen Gould

For effective human-robot interaction, it is important that a robotic assistant can forecast the next action a human will consider in a given task.

Action Recognition Temporal Action Localization

Human Pose Forecasting via Deep Markov Models

no code implementations24 Jul 2017 Sam Toyer, Anoop Cherian, Tengda Han, Stephen Gould

Human pose forecasting is an important problem in computer vision with applications to human-robot interaction, visual surveillance, and autonomous driving.

Autonomous Driving Human Pose Forecasting

Sequence Summarization Using Order-constrained Kernelized Feature Subspaces

no code implementations24 May 2017 Anoop Cherian, Suvrit Sra, Richard Hartley

As these features are often non-linear, we propose a novel pooling method, kernelized rank pooling, that represents a given sequence compactly as the pre-image of the parameters of a hyperplane in an RKHS, projections of data onto which captures their temporal order.

Action Recognition Riemannian optimization +3

Second-order Temporal Pooling for Action Recognition

no code implementations23 Apr 2017 Anoop Cherian, Stephen Gould

We also propose higher-order extensions of this scheme by computing correlations after embedding the CNN features in a reproducing kernel Hilbert space.

Action Recognition Temporal Action Localization
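
A minimal sketch of the second-order pooling idea above: summarize a clip by the temporal correlation matrix of its per-frame CNN features (the higher-order, RKHS-embedded variants in the paper build on the same principle); the features below are random placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    feats = rng.normal(size=(30, 256))                 # 30 frames of 256-d CNN features

    second_order = feats.T @ feats / feats.shape[0]    # (256, 256) second-order temporal pooling
    iu = np.triu_indices(256)                          # symmetric, so keep the upper triangle
    descriptor = second_order[iu]
    print(descriptor.shape)                            # (32896,)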

Generalized Rank Pooling for Activity Recognition

no code implementations CVPR 2017 Anoop Cherian, Basura Fernando, Mehrtash Harandi, Stephen Gould

Most popular deep models for action recognition split video sequences into short sub-sequences consisting of a few frames; frame-based features are then pooled for recognizing the activity.

Action Recognition Riemannian optimization +1

Action Representation Using Classifier Decision Boundaries

no code implementations6 Apr 2017 Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould

Applying multiple instance learning in an SVM setup, we use the parameters of this separating hyperplane as a descriptor for the video.

Action Recognition Multiple Instance Learning +1

Higher-order Pooling of CNN Features via Kernel Linearization for Action Recognition

no code implementations19 Jan 2017 Anoop Cherian, Piotr Koniusz, Stephen Gould

The HOK descriptors are then generated from the higher-order co-occurrences of these feature maps, and are then used as input to a video-level classifier.

Fine-grained Action Recognition Object Recognition +1

Ordered Pooling of Optical Flow Sequences for Action Recognition

no code implementations12 Jan 2017 Jue Wang, Anoop Cherian, Fatih Porikli

Training of Convolutional Neural Networks (CNNs) on long video sequences is computationally expensive due to the substantial memory requirements and the massive number of parameters that deep architectures demand.

Action Recognition Optical Flow Estimation +1

Sparse Coding for Third-Order Super-Symmetric Tensor Descriptors With Application to Texture Recognition

no code implementations CVPR 2016 Piotr Koniusz, Anoop Cherian

Super-symmetric tensors - a higher-order extension of scatter matrices - are becoming increasingly popular in machine learning and computer vision for modeling data statistics, co-occurrences, or even as visual descriptors.

Dictionary Learning
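
A minimal sketch of the descriptor construction described above: aggregate third-order outer products of (centered) local features, yielding a super-symmetric tensor that extends the usual scatter matrix; sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 16))                       # 500 local descriptors, 16-d
    Xc = X - X.mean(axis=0)

    scatter = Xc.T @ Xc / len(Xc)                        # second-order statistics (16, 16)
    third = np.einsum('ni,nj,nk->ijk', Xc, Xc, Xc) / len(Xc)   # third-order tensor (16, 16, 16)
    # super-symmetry: invariant to any permutation of the three indices
    print(np.allclose(third, third.transpose(1, 0, 2)), third.shape)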

Dictionary Learning and Sparse Coding for Third-order Super-symmetric Tensors

no code implementations9 Sep 2015 Piotr Koniusz, Anoop Cherian

Super-symmetric tensors - a higher-order extension of scatter matrices - are becoming increasingly popular in machine learning and computer vision for modelling data statistics, co-occurrences, or even as visual descriptors.

Dictionary Learning

Riemannian Dictionary Learning and Sparse Coding for Positive Definite Matrices

no code implementations10 Jul 2015 Anoop Cherian, Suvrit Sra

Inspired by the great success of dictionary learning and sparse coding for vector-valued data, our goal in this paper is to represent data in the form of SPD matrices as sparse conic combinations of SPD atoms from a learned dictionary via a Riemannian geometric approach.

BIG-bench Machine Learning Dictionary Learning +2
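
A minimal sketch of the coding step described above under simplifying assumptions: an SPD matrix is approximated as a nonnegative (conic) combination of fixed SPD atoms, solved here with plain nonnegative least squares in place of the paper's Riemannian optimization.

    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(0)

    def random_spd(d=6):
        A = rng.normal(size=(d, d))
        return A @ A.T + d * np.eye(d)

    atoms = [random_spd() for _ in range(10)]            # SPD dictionary atoms (fixed here)
    target = 0.7 * atoms[2] + 0.3 * atoms[5]             # an SPD matrix to encode

    D = np.stack([a.ravel() for a in atoms], axis=1)     # (36, 10) flattened atoms
    alpha, _ = nnls(D, target.ravel())                   # conic (nonnegative) codes
    print(np.round(alpha, 3))                            # peaks at atoms 2 and 5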
