Search Results for author: Dinesh Manocha

Found 184 papers, 65 papers with code

DocTime: A Document-level Temporal Dependency Graph Parser

no code implementations NAACL 2022 Puneet Mathur, Vlad Morariu, Verena Kaynig-Fittkau, Jiuxiang Gu, Franck Dernoncourt, Quan Tran, Ani Nenkova, Dinesh Manocha, Rajiv Jain

We introduce DocTime - a novel temporal dependency graph (TDG) parser that takes as input a text document and produces a temporal dependency graph.

Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time

no code implementations1 Jul 2024 Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta, Jun Chen, Mohamed Elhoseiny, Ruohan Gao, Dinesh Manocha

Leveraging Large Language Models' remarkable proficiency in text-based tasks, recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and audio.

Fact Checking Language Modelling +2

Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs

no code implementations26 Jun 2024 Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha

To enhance the plausibility of synthesis, we use an adversarial discriminator that learns to differentiate between the face and pose motions computed from the original videos and our synthesized motions based on their affective expressions.

IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning

no code implementations19 Jun 2024 Soumya Suvra Ghosal, Samyadeep Basu, Soheil Feizi, Dinesh Manocha

Notably, in a 16-shot setup, IntCoOp improves CoOp by 7.35% in average performance across 10 diverse datasets.

Attribute Few-Shot Learning +1

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

no code implementations17 Jun 2024 Sreyan Ghosh, Sonal Kumar, Ashish Seth, Chandra Kiran Reddy Evuru, Utkarsh Tyagi, S Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha

Next, we propose CompA-R (Instruction-Tuning for Complex Audio Reasoning), a synthetically generated instruction-tuning (IT) dataset with instructions that require the model to perform complex reasoning on the input audio.

Audio Question Answering Instruction Following +2

Embodied Question Answering via Multi-LLM Systems

no code implementations16 Jun 2024 Bhrij Patel, Vishnu Sashank Dorbala, Dinesh Manocha, Amrit Singh Bedi

Embodied Question Answering (EQA) is an important problem, which involves an agent exploring the environment to answer user queries.

Embodied Question Answering Feature Importance +1

MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

no code implementations CVPR 2024 Sanjoy Chowdhury, Sayan Nag, K J Joseph, Balaji Vasan Srinivasan, Dinesh Manocha

MeLFusion is a text-to-music diffusion model with a novel "visual synapse", which effectively infuses the semantics from the visual modality into the generated music.

FAD

LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

1 code implementation6 Jun 2024 Sreyan Ghosh, Sonal Kumar, Ashish Seth, Purva Chiniya, Utkarsh Tyagi, Ramani Duraiswami, Dinesh Manocha

Instead of learning the cross-modal correlation between the audio and visual modalities, we make an LLM learn the task of visually-conditioned (generative) ASR error correction.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions

1 code implementation6 Jun 2024 Sreyan Ghosh, Utkarsh Tyagi, Sonal Kumar, C. K. Evuru, S Ramaneswaran, S Sakshi, Dinesh Manocha

We present ABEX, a novel and effective generative data augmentation methodology for low-resource Natural Language Understanding (NLU) tasks.

Data Augmentation Diversity +1

Transfer Q Star: Principled Decoding for LLM Alignment

no code implementations30 May 2024 Souradip Chakraborty, Soumya Suvra Ghosal, Ming Yin, Dinesh Manocha, Mengdi Wang, Amrit Singh Bedi, Furong Huang

Hence, prior SoTA methods either approximate this $Q^*$ using $Q^{\pi_{\texttt{sft}}}$ (derived from the reference $\texttt{SFT}$ model) or rely on short-term rewards, resulting in sub-optimal decoding performance.

EM-GANSim: Real-time and Accurate EM Simulation Using Conditional GANs for 3D Indoor Scenes

no code implementations27 May 2024 Ruichen Wang, Dinesh Manocha

We present a novel machine-learning (ML) approach (EM-GANSim) for real-time electromagnetic (EM) propagation that is used for wireless communication simulation in 3D indoor environments.

Generative Adversarial Network

VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap

no code implementations24 May 2024 Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Utkarsh Tyagi, Oriol Nieto, Zeyu Jin, Dinesh Manocha

From our analysis, we show that: (1) The community's efforts have been primarily targeted towards reducing hallucinations related to visual recognition (VR) prompts (e.g., prompts that only require describing the image), thereby ignoring hallucinations for cognitive prompts (e.g., prompts that require additional skills like reasoning on contents of the image).

Hallucination

Prompt Mixing in Diffusion Models using the Black Scholes Algorithm

1 code implementation22 May 2024 Divya Kothandaraman, Ming Lin, Dinesh Manocha

We introduce a novel approach for prompt mixing, aiming to generate images at the intersection of multiple text prompts using pre-trained text-to-image diffusion models.

Denoising

S-EQA: Tackling Situational Queries in Embodied Question Answering

no code implementations8 May 2024 Vishnu Sashank Dorbala, Prasoon Goyal, Robinson Piramuthu, Michael Johnston, Dinesh Manocha, Reza Ghanadhan

To the best of our knowledge, this is the first work to introduce EQA with situational queries, and also the first to use a generative approach for query creation.

Embodied Question Answering Question Answering +3

LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

no code implementations8 May 2024 Tianrui Guan, Yurou Yang, Harry Cheng, Muyuan Lin, Richard Kim, Rajasimman Madhivanan, Arnie Sen, Dinesh Manocha

In this paper, we present LOC-ZSON, a novel Language-driven Object-Centric image representation for object navigation tasks within complex scenes.

Language Modelling Object +1

TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes

no code implementations4 May 2024 Christopher Maxey, Jaehoon Choi, Yonghan Lee, Hyungtae Lee, Dinesh Manocha, Heesung Kwon

In this paper, we present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception.

Decoder

AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales

no code implementations4 Apr 2024 Tianrui Guan, Ruiqi Xian, Xijun Wang, Xiyang Wu, Mohamed Elnoor, Daeun Song, Dinesh Manocha

We present AGL-NET, a novel learning-based method for global localization using LiDAR point clouds and satellite maps.

PoCo: Point Context Cluster for RGBD Indoor Place Recognition

no code implementations3 Apr 2024 Jing Liang, Zhuo Deng, Zheming Zhou, Omid Ghasemalizadeh, Dinesh Manocha, Min Sun, Cheng-Hao Kuo, Arnie Sen

We present a novel end-to-end algorithm (PoCo) for the indoor RGB-D place recognition task, aimed at identifying the most likely match for a given query frame within a reference database.

Do Vision-Language Models Understand Compound Nouns?

1 code implementation30 Mar 2024 Sonal Kumar, Sreyan Ghosh, S Sakshi, Utkarsh Tyagi, Dinesh Manocha

We curate Compun, a novel benchmark with 400 unique and commonly used CNs, to evaluate the effectiveness of VLMs in interpreting CNs.

Image Retrieval Language Modelling +2

CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP

1 code implementation30 Mar 2024 Chandra Kiran Reddy Evuru, Sreyan Ghosh, Sonal Kumar, Ramaneswaran S, Utkarsh Tyagi, Dinesh Manocha

We present CoDa (Constrained Generation based Data Augmentation), a controllable, effective, and training-free data augmentation technique for low-resource (data-scarce) NLP.

Data Augmentation Instruction Following

Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

no code implementations18 Mar 2024 Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha

In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods.

Policy Gradient Methods

Right Place, Right Time! Towards ObjectNav for Non-Stationary Goals

no code implementations14 Mar 2024 Vishnu Sashank Dorbala, Bhrij Patel, Amrit Singh Bedi, Dinesh Manocha

We address this concern by inferring results on two cases for object placement: one where the objects placed follow a routine or a path, and the other where they are placed at random.

Object Visual Grounding

Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning

no code implementations13 Mar 2024 Peihong Yu, Manav Mishra, Alec Koppel, Carl Busart, Priya Narayan, Dinesh Manocha, Amrit Bedi, Pratap Tokekar

Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space.

Efficient Exploration Multi-agent Reinforcement Learning +1

Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics

no code implementations15 Feb 2024 Xiyang Wu, Souradip Chakraborty, Ruiqi Xian, Jing Liang, Tianrui Guan, Fuxiao Liu, Brian M. Sadler, Dinesh Manocha, Amrit Singh Bedi

In this paper, we highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications.

Language Modelling

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences

no code implementations14 Feb 2024 Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang

Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.

Diversity Fairness +1

A Closer Look at the Limitations of Instruction Tuning

no code implementations3 Feb 2024 Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Ramaneswaran S, Deepali Aneja, Zeyu Jin, Ramani Duraiswami, Dinesh Manocha

Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets.

Hallucination

LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering

no code implementations CVPR 2024 Jaehoon Choi, Rajvi Shah, Qinbo Li, Yipeng Wang, Ayush Saraf, Changil Kim, Jia-Bin Huang, Dinesh Manocha, Suhib Alsisan, Johannes Kopf

We validate the effectiveness of the proposed method on large unbounded scenes from the mip-NeRF 360, Tanks & Temples, and Deep Blending datasets, achieving at-par rendering quality with 73x fewer triangles and an 11x reduction in memory footprint.

REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback

no code implementations22 Dec 2023 Souradip Chakraborty, Anukriti Singh, Amisha Bhaskar, Pratap Tokekar, Dinesh Manocha, Amrit Singh Bedi

Current methods to mitigate this misalignment work by learning reward functions from human preferences; however, they inadvertently introduce a risk of reward overoptimization.

Bilevel Optimization Continuous Control +2

Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition

1 code implementation20 Dec 2023 Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

Specifically, first, we perform vanilla continued pre-training on an initial SSL pre-trained model on the target domain ASR dataset and call it the teacher.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning

1 code implementation20 Dec 2023 Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

Continued pre-training (CP) offers multiple advantages, like target domain adaptation and the potential to exploit the continuous stream of unlabeled data available online.

Domain Adaptation Self-Supervised Learning

APoLLo: Unified Adapter and Prompt Learning for Vision Language Models

1 code implementation4 Dec 2023 Sanjoy Chowdhury, Sayan Nag, Dinesh Manocha

Our method is designed to substantially improve the generalization capabilities of VLP models when they are fine-tuned in a few-shot setting.

AV-RIR: Audio-Visual Room Impulse Response Estimation

no code implementations CVPR 2024 Anton Ratnarajah, Sreyan Ghosh, Sonal Kumar, Purva Chiniya, Dinesh Manocha

We propose AV-RIR, a novel multi-modal multi-task learning approach to accurately estimate the RIR from a given reverberant speech signal and the visual cues of its corresponding environment.

Multi-Task Learning Room Impulse Response (RIR) +1

HawkI: Homography & Mutual Information Guidance for 3D-free Single Image to Aerial View

2 code implementations27 Nov 2023 Divya Kothandaraman, Tianyi Zhou, Ming Lin, Dinesh Manocha

It seamlessly blends the visual features from the input image within a pretrained text-to-2D image stable diffusion model with a test-time optimization process for a careful bias-variance trade-off, which uses an Inverse Perspective Mapping (IPM) homography transformation to provide subtle cues for aerial view synthesis.

Novel View Synthesis

UAV-Sim: NeRF-based Synthetic Data Generation for UAV-based Perception

no code implementations25 Oct 2023 Christopher Maxey, Jaehoon Choi, Hyungtae Lee, Dinesh Manocha, Heesung Kwon

Synthetic renderers are widely used in conjunction with perception models to create synthetic data that augments learning in the ground-based imaging domain.

Data Augmentation Image Generation +2

Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

no code implementations23 Oct 2023 Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong Huang, Dinesh Manocha, Amrit Singh Bedi

But in parallel to the development of detection frameworks, researchers have also concentrated on designing strategies to elude detection, i.e., focusing on the impossibilities of AI-generated text detection.

Misinformation Text Detection

Indoor Wireless Signal Modeling with Smooth Surface Diffraction Effects

no code implementations16 Oct 2023 Ruichen Wang, Samuel Audia, Dinesh Manocha

We present a novel algorithm that enhances the accuracy of electromagnetic field simulations in indoor environments by incorporating the Uniform Geometrical Theory of Diffraction (UTD) for surface diffraction.

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

no code implementations12 Oct 2023 Sreyan Ghosh, Ashish Seth, Sonal Kumar, Utkarsh Tyagi, Chandra Kiran Evuru, Ramaneswaran S, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha

In this paper, we propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples, to evaluate compositional reasoning in ALMs.

Attribute Audio Classification +1

RECAP: Retrieval-Augmented Audio Captioning

1 code implementation18 Sep 2023 Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha

We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and effective audio captioning system that generates captions conditioned on an input audio and other captions similar to the audio retrieved from a datastore.

AudioCaps Caption Generation +3

VAPOR: Legged Robot Navigation in Outdoor Vegetation Using Offline Reinforcement Learning

1 code implementation14 Sep 2023 Kasun Weerakoon, Adarsh Jagan Sathyamoorthy, Mohamed Elnoor, Dinesh Manocha

We present VAPOR, a novel method for autonomous legged robot navigation in unstructured, densely vegetated outdoor environments using offline Reinforcement Learning (RL).

Offline RL reinforcement-learning +2

AdVerb: Visually Guided Audio Dereverberation

no code implementations ICCV 2023 Sanjoy Chowdhury, Sreyan Ghosh, Subhrajyoti Dasgupta, Anton Ratnarajah, Utkarsh Tyagi, Dinesh Manocha

We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues in addition to the reverberant sound to estimate clean audio.

Speaker Verification Speech Enhancement +2

ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations

1 code implementation19 Aug 2023 Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Utkarsh Tyagi, Sakshi Singh, Sanjoy Chowdhury, Dinesh Manocha

Precisely, we employ LLMs to first extract foreground and background features from textual descriptions of an image, followed by advanced language-guided image editing to discover the features that are spuriously correlated with the class label.

Classification Data Augmentation +2

PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback

no code implementations3 Aug 2023 Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Mengdi Wang, Furong Huang

We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback.

Bilevel Optimization Procedure Learning +2

LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference

no code implementations ICCV 2023 Cong Wang, Yu-Ping Wang, Dinesh Manocha

We demonstrate the effectiveness of our approach and generate state-of-the-art results on different datasets.

Human Trajectory Forecasting with Explainable Behavioral Uncertainty

no code implementations4 Jul 2023 Jiangbei Yue, Dinesh Manocha, He Wang

Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but cannot predict well.

Self-Driving Cars Trajectory Forecasting

Ada-NAV: Adaptive Trajectory Length-Based Sample Efficient Policy Learning for Robotic Navigation

no code implementations9 Jun 2023 Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha

Trajectory length stands as a crucial hyperparameter within reinforcement learning (RL) algorithms, significantly contributing to the sample inefficiency in robotics applications.

Policy Gradient Methods reinforcement-learning +1

PLAR: Prompt Learning for Action Recognition

no code implementations21 May 2023 Xijun Wang, Ruiqi Xian, Tianrui Guan, Dinesh Manocha

We evaluate our approach on datasets consisting of both ground camera videos and aerial videos, and scenes with single-agent and multi-agent actions.

Action Recognition Optical Flow Estimation

BioAug: Conditional Generation based Data Augmentation for Low-Resource Biomedical NER

1 code implementation18 May 2023 Sreyan Ghosh, Utkarsh Tyagi, Sonal Kumar, Dinesh Manocha

Though data augmentation has been shown to be highly effective for low-resource NER in general, existing data augmentation techniques fail to produce factual and diverse augmentations for BioNER.

Data Augmentation named-entity-recognition +2

On the Possibilities of AI-Generated Text Detection

no code implementations10 Apr 2023 Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang

Our work addresses the critical issue of distinguishing text generated by Large Language Models (LLMs) from human-produced text, a task essential for numerous applications.

Text Detection

PACE: Data-Driven Virtual Agent Interaction in Dense and Cluttered Environments

no code implementations24 Mar 2023 James Mullen, Dinesh Manocha

We compare our method with prior motion generating techniques and highlight the benefits of our method with a perceptual study and physical plausibility metrics.

Motion Synthesis

Dynamic EM Ray Tracing for Large Urban Scenes with Multiple Receivers

no code implementations19 Mar 2023 Ruichen Wang, Dinesh Manocha

We present a novel ray tracing-based radio propagation algorithm that can handle large urban scenes with hundreds or thousands of dynamic objects and receivers.

Blocking

Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances

1 code implementation17 Mar 2023 Arun V. Reddy, Ketul Shah, William Paul, Rohita Mocharla, Judy Hoffman, Kapil D. Katyal, Dinesh Manocha, Celso M. de Melo, Rama Chellappa

The dataset is composed of both real and synthetic videos from seven gesture classes, and is intended to support the study of synthetic-to-real domain shift for video-based action recognition.

Action Recognition Domain Adaptation +1

RE-MOVE: An Adaptive Policy Design for Robotic Navigation Tasks in Dynamic Environments via Language-Based Feedback

no code implementations14 Mar 2023 Souradip Chakraborty, Kasun Weerakoon, Prithvi Poddar, Mohamed Elnoor, Priya Narayanan, Carl Busart, Pratap Tokekar, Amrit Singh Bedi, Dinesh Manocha

Reinforcement learning-based policies for continuous control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment, which may result in catastrophic failures.

Continuous Control Zero-Shot Learning

UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation

1 code implementation10 Mar 2023 Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

Unlike prior works, which directly fine-tune a self-supervised pre-trained encoder on a target dataset, we use the encoder to generate pseudo-labels for unsupervised fine-tuning before the actual fine-tuning step.

Audio Classification Self-Supervised Learning

Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Guided Exploration for Zero-Shot Object Navigation

1 code implementation6 Mar 2023 Vishnu Sashank Dorbala, James F. Mullen Jr., Dinesh Manocha

We present LGX (Language-guided Exploration), a novel algorithm for Language-Driven Zero-Shot Object Goal Navigation (L-ZSON), where an embodied agent navigates to a uniquely described target object in a previously unseen environment.

Motion Planning Object +3

CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network

1 code implementation2 Mar 2023 Sreyan Ghosh, Manan Suri, Purva Chiniya, Utkarsh Tyagi, Sonal Kumar, Dinesh Manocha

The tremendous growth of social media users interacting in online conversations has led to significant growth in hate speech, affecting people from various demographics.

Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes

1 code implementation2 Feb 2023 Anton Ratnarajah, Dinesh Manocha

We propose a novel neural-network-based binaural sound propagation method to generate acoustic effects for indoor 3D models of real environments.

Generative Adversarial Network Graph Neural Network

Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

no code implementations28 Jan 2023 Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha

Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection.

Reinforcement Learning (RL)

LayerDoc: Layer-wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents

no code implementations IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023 Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Ani Nenkova, Dinesh Manocha, Vlad I. Morariu

Experiments show that our approach outperforms competitive baselines by 10-15% on three diverse datasets of forms and mobile app screen layouts for the tasks of spatial region classification, higher-order group identification, layout hierarchy extraction, reading order detection, and word grouping.

Reading Order Detection

Synthetic Wave-Geometric Impulse Responses for Improved Speech Dereverberation

no code implementations10 Dec 2022 Rohith Aralikatti, Zhenyu Tang, Dinesh Manocha

We present a novel approach to improve the performance of learning-based speech dereverberation using accurate synthetic datasets.

Speech Dereverberation

Towards Improved Room Impulse Response Estimation for Speech Recognition

no code implementations8 Nov 2022 Anton Ratnarajah, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Pablo Hoffmann, Dinesh Manocha, Paul Calamia

We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

SLICER: Learning universal audio representations using low-resource self-supervised pre-training

1 code implementation2 Nov 2022 Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio data that reduces the need for large amounts of labeled data for audio and speech classification.

Audio Classification Clustering +4

MAST: Multiscale Audio Spectrogram Transformers

1 code implementation2 Nov 2022 Sreyan Ghosh, Ashish Seth, S. Umesh, Dinesh Manocha

We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings the concept of multiscale feature hierarchies to the Audio Spectrogram Transformer (AST).

Audio Classification Keyword Spotting +1

MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based Robot Navigation

no code implementations19 Sep 2022 Aaron M. Roth, Jing Liang, Ram Sriram, Elham Tabassi, Dinesh Manocha

Moreover, we present efficient policy distillation and tree-modification techniques that take advantage of the decision tree structure to allow improvements to a policy without retraining.

Imitation Learning reinforcement-learning +2

Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition

no code implementations15 Sep 2022 Divya Kothandaraman, Ming Lin, Dinesh Manocha

We build a differentiable static-dynamic frequency mask prior to model the salient static and dynamic pixels in the video, crucial for the underlying task of action recognition.

Action Recognition Activity Recognition In Videos +2

Placing Human Animations into 3D Scenes by Learning Interaction- and Geometry-Driven Keyframes

no code implementations13 Sep 2022 James F. Mullen Jr, Divya Kothandaraman, Aniket Bera, Dinesh Manocha

We compare our method, which we call PAAK, with prior approaches, including POSA, PROX ground truth, and a motion synthesis method, and highlight the benefits of our method with a perceptual study.

Motion Synthesis

DC-MRTA: Decentralized Multi-Robot Task Allocation and Navigation in Complex Environments

no code implementations7 Sep 2022 Aakriti Agrawal, Senthil Hariharan, Amrit Singh Bedi, Dinesh Manocha

At the higher level, we solve the task allocation by formulating it in terms of Markov Decision Processes and choosing the appropriate rewards to minimize the Total Travel Delay (TTD).

Reinforcement Learning (RL)

Vision-Centric BEV Perception: A Survey

1 code implementation4 Aug 2022 Yuexin Ma, Tai Wang, Xuyang Bai, Huitong Yang, Yuenan Hou, Yaming Wang, Yu Qiao, Ruigang Yang, Dinesh Manocha, Xinge Zhu

In recent years, vision-centric Bird's Eye View (BEV) perception has garnered significant interest from both industry and academia due to its inherent advantages, such as providing an intuitive representation of the world and being conducive to data fusion.

A Repulsive Force Unit for Garment Collision Handling in Neural Networks

no code implementations28 Jul 2022 Qingyang Tan, Yi Zhou, Tuanfeng Wang, Duygu Ceylan, Xin Sun, Dinesh Manocha

Despite recent success, deep learning-based methods for predicting 3D garment deformation under body motion suffer from interpenetration problems between the garment and the body.

Human Trajectory Prediction via Neural Social Physics

1 code implementation21 Jul 2022 Jiangbei Yue, Dinesh Manocha, He Wang

Our new model (Neural Social Physics or NSP) is a deep neural network within which we use an explicit physics model with learnable parameters.

Inductive Bias Trajectory Prediction

D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights

1 code implementation21 Jul 2022 Yuzhen Zhang, Wentong Wang, Weizhi Guo, Pei Lv, Mingliang Xu, Wei Chen, Dinesh Manocha

We present a trajectory prediction approach with respect to traffic lights, D2-TPred, which uses a spatial dynamic interaction graph (SDG) and a behavior dependency graph (BDG) to handle the problem of discontinuous dependency in the spatial-temporal space.

Trajectory Prediction

Show Me What I Like: Detecting User-Specific Video Highlights Using Content-Based Multi-Head Attention

no code implementations18 Jul 2022 Uttaran Bhattacharya, Gang Wu, Stefano Petrangeli, Viswanathan Swaminathan, Dinesh Manocha

We propose a method to detect individualized highlights for users on given target videos based on their preferred highlight clips marked on previous videos they have watched.

Highlight Detection

FedBC: Calibrating Global and Local Models via Federated Learning Beyond Consensus

no code implementations22 Jun 2022 Amrit Singh Bedi, Chen Fan, Alec Koppel, Anit Kumar Sahu, Brian M. Sadler, Furong Huang, Dinesh Manocha

In this work, we quantitatively calibrate the performance of global and local models in federated learning through a multi-criterion optimization-based framework, which we cast as a constrained program.

Federated Learning

Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies

no code implementations12 Jun 2022 Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Pratap Tokekar, Dinesh Manocha

In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems.

Continuous Control OpenAI Gym

Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

no code implementations2 Jun 2022 Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Brian M. Sadler, Furong Huang, Pratap Tokekar, Dinesh Manocha

Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting where the transition model is Gaussian or Lipschitz, and they demand a posterior estimate whose representational complexity grows unbounded with time.

Continuous Control Model-based Reinforcement Learning +2

SALAD: Source-free Active Label-Agnostic Domain Adaptation for Classification, Segmentation and Detection

1 code implementation24 May 2022 Divya Kothandaraman, Sumit Shekhar, Abhilasha Sancheti, Manoj Ghuhan, Tripti Shukla, Dinesh Manocha

SALAD has three key benefits: (i) it is task-agnostic, and can be applied across various visual tasks such as classification, segmentation and detection; (ii) it can handle shifts in output label space from the pre-trained source network to the target domain; (iii) it does not require access to source data for adaptation.

Active Learning Domain Adaptation +2

MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes

2 code implementations 18 May 2022 Anton Ratnarajah, Zhenyu Tang, Rohith Chandrashekar Aralikatti, Dinesh Manocha

We show that the acoustic metrics of the IRs predicted from our MESH2IR match the ground truth with less than 10% error.

2k Speech Dereverberation +1

Predicting Loose-Fitting Garment Deformations Using Bone-Driven Motion Networks

1 code implementation 3 May 2022 Xiaoyu Pan, Jiaming Mai, Xinwei Jiang, Dongxue Tang, Jingxiang Li, Tianjia Shao, Kun Zhou, Xiaogang Jin, Dinesh Manocha

We present a learning algorithm that uses bone-driven motion networks to predict the deformation of loose-fitting garment meshes at interactive rates.

STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes

1 code implementation CVPR 2022 Peishan Cong, Xinge Zhu, Feng Qiao, Yiming Ren, Xidong Peng, Yuenan Hou, Lan Xu, Ruigang Yang, Dinesh Manocha, Yuexin Ma

In addition, considering the property of sparse global distribution and density-varying local distribution of pedestrians, we further propose a novel method, Density-aware Hierarchical heatmap Aggregation (DHA), to enhance pedestrian perception in crowded scenes.

Pedestrian Detection Sensor Fusion

MMER: Multimodal Multi-task Learning for Speech Emotion Recognition

1 code implementation 31 Mar 2022 Sreyan Ghosh, Utkarsh Tyagi, S Ramaneswaran, Harshvardhan Srivastava, Dinesh Manocha

In this paper, we propose MMER, a novel Multimodal Multi-task learning approach for Speech Emotion Recognition.

Ranked #3 on Multimodal Emotion Recognition on IEMOCAP (Weighted Accuracy (WA) metric, using extra training data)

Multimodal Emotion Recognition Multi-Task Learning +1

3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos

no code implementations CVPR 2022 Vikram Gupta, Trisha Mittal, Puneet Mathur, Vaibhav Mishra, Mayank Maheshwari, Aniket Bera, Debdoot Mukherjee, Dinesh Manocha

We present 3MASSIV, a multilingual, multimodal and multi-aspect, expertly-annotated dataset of diverse short videos extracted from the short-video social media platform Moj.

FAR: Fourier Aerial Video Recognition

1 code implementation 21 Mar 2022 Divya Kothandaraman, Tianrui Guan, Xijun Wang, Sean Hu, Ming Lin, Dinesh Manocha

Our formulation uses a novel Fourier object disentanglement method to innately separate out the human agent (which is typically small) from the background.

Action Recognition Disentanglement +1

SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning

no code implementations 10 Mar 2022 Jaehoon Choi, Dongki Jung, Yonghan Lee, Deokhwa Kim, Dinesh Manocha, Donghwan Lee

Given these metric poses and monocular sequences, we propose a self-supervised learning method for the pre-trained supervised monocular depth networks to enable metrically scaled depth estimation.

Monocular Depth Estimation Robot Navigation +2

Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models

no code implementations 16 Feb 2022 Sarala Padi, Seyed Omid Sadjadi, Dinesh Manocha, Ram D. Sriram

Experimental results indicate that both audio and text-based models improve the emotion recognition performance and that the proposed multimodal solution achieves state-of-the-art results on the IEMOCAP benchmark.

Data Augmentation Emotional Intelligence +3

N-Cloth: Predicting 3D Cloth Deformation with Mesh-Based Networks

no code implementations 13 Dec 2021 Yudi Li, Min Tang, Yun Yang, Zi Huang, Ruofeng Tong, Shuangcai Yang, Yao Li, Dinesh Manocha

We present a novel mesh-based learning approach (N-Cloth) for plausible 3D cloth deformation prediction.

FAST-RIR: Fast neural diffuse room impulse response generator

2 code implementations 7 Oct 2021 Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu

We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

HighlightMe: Detecting Highlights from Human-Centric Videos

no code implementations ICCV 2021 Uttaran Bhattacharya, Gang Wu, Stefano Petrangeli, Viswanathan Swaminathan, Dinesh Manocha

We train our network to map the activity- and interaction-based latent structural representations of the different modalities to per-frame highlight scores based on the representativeness of the frames.

MotionHint: Self-Supervised Monocular Visual Odometry with Motion Constraints

1 code implementation 14 Sep 2021 Cong Wang, Yu-Ping Wang, Dinesh Manocha

A key aspect of our approach is to use an appropriate motion model that can help existing self-supervised monocular VO (SSM-VO) algorithms to overcome issues related to the local minima within their self-supervised loss functions.

Monocular Visual Odometry

DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes

no code implementations ICCV 2021 Dongki Jung, Jaehoon Choi, Yonghan Lee, Deokhwa Kim, Changick Kim, Dinesh Manocha, Donghwan Lee

We present a novel approach for estimating depth from a monocular camera as it moves through complex and crowded indoor environments, e.g., a department store or a metro station.

3D Reconstruction Depth Estimation

Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation

no code implementations 5 Aug 2021 Sarala Padi, Seyed Omid Sadjadi, Dinesh Manocha, Ram D. Sriram

Automatic speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.

Emotion Classification Speaker Recognition +2

Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning

1 code implementation 31 Jul 2021 Uttaran Bhattacharya, Elizabeth Childs, Nicholas Rewkowski, Dinesh Manocha

Our network consists of two components: a generator to synthesize gestures from a joint embedding space of features encoded from the input speech and the seed poses, and a discriminator to distinguish between the synthesized pose sequences and real 3D pose sequences.

Generative Adversarial Network Gesture Generation

M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

1 code implementation 24 Apr 2021 Tianrui Guan, Jun Wang, Shiyi Lan, Rohan Chandra, Zuxuan Wu, Larry Davis, Dinesh Manocha

We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird-eye view) with different feature scales based on multi-scale feature pyramids.

3D Object Detection object-detection +1

XAI-N: Sensor-based Robot Navigation using Expert Policies and Decision Trees

1 code implementation 22 Apr 2021 Aaron M. Roth, Jing Liang, Dinesh Manocha

In order to increase the reliability and handle the failure cases of the expert policy, we combine with a policy extraction technique to transform the resulting policy into a decision tree format.

Explainable Artificial Intelligence (XAI) Robot Navigation

Scene-aware Far-field Automatic Speech Recognition

no code implementations 21 Apr 2021 Zhenyu Tang, Dinesh Manocha

We use a deep learning-based estimator to non-intrusively compute the sub-band reverberation time of an environment from its speech samples.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Robust 2D/3D Vehicle Parsing in CVIS

no code implementations 11 Mar 2021 Hui Miao, Feixiang Lu, Zongdai Liu, Liangjun Zhang, Dinesh Manocha, Bin Zhou

We combine these novel algorithms and datasets to develop a robust approach for 2D/3D vehicle parsing for CVIS.

Benchmarking Data Augmentation +4

GANav: Efficient Terrain Segmentation for Robot Navigation in Unstructured Outdoor Environments

1 code implementation 7 Mar 2021 Tianrui Guan, Divya Kothandaraman, Rohan Chandra, Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Dinesh Manocha

We interface GANav with a deep reinforcement learning-based navigation algorithm and highlight its benefits in terms of navigation in real-world unstructured terrains.

Robot Navigation Semantic Segmentation

Dynamic Graph Modeling of Simultaneous EEG and Eye-tracking Data for Reading Task Identification

no code implementations 21 Feb 2021 Puneet Mathur, Trisha Mittal, Dinesh Manocha

We present a new approach, which we call AdaGTCN, for identifying human reader intent from Electroencephalogram (EEG) and Eye movement (EM) data in order to help differentiate between normal reading and task-oriented reading.

EEG Graph Learning

Example-based Real-time Clothing Synthesis for Virtual Agents

no code implementations 8 Jan 2021 Nannan Wu, Qianwen Chao, Yanzhen Chen, Weiwei Xu, Chen Liu, Dinesh Manocha, Wenxin Sun, Yi Han, Xinran Yao, Xiaogang Jin

Given a query shape and pose of the virtual agent, we synthesize the resulting clothing deformation by blending the Taylor expansion results of nearby anchoring points.

Graphics

Fast 3D Acoustic Scattering via Discrete Laplacian Based Implicit Function Encoders

no code implementations 1 Jan 2021 Hsien-Yu Meng, Zhenyu Tang, Dinesh Manocha

Acoustic properties of objects corresponding to scattering characteristics are frequently used for 3D audio content creation, environmental acoustic effects, localization and acoustic scene analysis, etc.

Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation

1 code implementation 15 Dec 2020 Feixiang Lu, Zongdai Liu, Hui Miao, Peng Wang, Liangjun Zhang, Ruigang Yang, Dinesh Manocha, Bin Zhou

For autonomous driving, the dynamics and states of vehicle parts such as doors, the trunk, and the bonnet can provide meaningful semantic information and interaction states, which are essential to ensuring the safety of the self-driving vehicle.

Autonomous Driving Data Augmentation +3

Sound Synthesis, Propagation, and Rendering: A Survey

no code implementations 11 Nov 2020 Shiguang Liu, Dinesh Manocha

To the best of our knowledge, this is the first attempt to provide a comprehensive summary of sound research in the field of computer graphics.

Sound Graphics

B-GAP: Behavior-Rich Simulation and Navigation for Autonomous Driving

3 code implementations 7 Nov 2020 Angelos Mavrogiannis, Rohan Chandra, Dinesh Manocha

We address the problem of ego-vehicle navigation in dense simulated traffic environments populated by road agents with varying driver behaviors.

Robotics

Multi-Window Data Augmentation Approach for Speech Emotion Recognition

no code implementations 19 Oct 2020 Sarala Padi, Dinesh Manocha, Ram D. Sriram

MWA-SER is a unimodal approach that focuses on two key concepts: designing the speech augmentation method and building the deep learning model to recognize the underlying emotion of an audio signal.

Data Augmentation Speech Emotion Recognition

Multi-Agent Coverage in Urban Environments

1 code implementation 17 Aug 2020 Shivang Patel, Senthil Hariharan, Pranav Dhulipala, Ming C Lin, Dinesh Manocha, Huan Xu, Michael Otte

We study multi-agent coverage algorithms for autonomous monitoring and patrol in urban environments.

Robotics

PerMO: Perceiving More at Once from a Single Image for Autonomous Driving

no code implementations 16 Jul 2020 Feixiang Lu, Zongdai Liu, Xibin Song, Dingfu Zhou, Wei Li, Hui Miao, Miao Liao, Liangjun Zhang, Bin Zhou, Ruigang Yang, Dinesh Manocha

We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image for autonomous driving.

3D Reconstruction Autonomous Driving +3

OF-VO: Efficient Navigation among Pedestrians Using Commodity Sensors

no code implementations 23 Apr 2020 Jing Liang, Yi-Ling Qiao, Dinesh Manocha

Overall, our OF-VO algorithm using learning-based perception and model-based planning methods offers better performance than prior algorithms in terms of navigation time and success rate of collision avoidance.

Robotics

Emotions Don't Lie: An Audio-Visual Deepfake Detection Method Using Affective Cues

no code implementations 14 Mar 2020 Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha

Additionally, we extract and compare affective cues corresponding to perceived emotion from the two modalities within a video to infer whether the input video is "real" or "fake".

DeepFake Detection Face Swapping

ProxEmo: Gait-based Emotion Learning and Multi-view Proxemic Fusion for Socially-Aware Robot Navigation

1 code implementation 2 Mar 2020 Venkatraman Narayanan, Bala Murali Manoghar, Vishnu Sashank Dorbala, Dinesh Manocha, Aniket Bera

Our approach predicts the perceived emotions of a pedestrian from walking gaits, which is then used for emotion-guided navigation taking into account social and proxemic constraints.

Emotion Classification Emotion Recognition +3

SPA: Verbal Interactions between Agents and Avatars in Shared Virtual Environments using Propositional Planning

no code implementations 8 Feb 2020 Andrew Best, Sahil Narang, Dinesh Manocha

We present a novel approach for generating plausible verbal interactions between virtual human-like agents and user avatars in shared virtual environments.

Single Particle Analysis

Deep Differentiable Grasp Planner for High-DOF Grippers

no code implementations 4 Feb 2020 Min Liu, Zherong Pan, Kai Xu, Kanishka Ganguly, Dinesh Manocha

We present an end-to-end algorithm for training deep neural networks to grasp novel objects.

Robotics

The Liar's Walk: Detecting Deception with Gait and Gesture

no code implementations 14 Dec 2019 Tanmay Randhavane, Uttaran Bhattacharya, Kyra Kapsaskis, Kurt Gray, Aniket Bera, Dinesh Manocha

We present a data-driven deep neural algorithm for detecting deceptive walking behavior using nonverbal cues like gaits and gestures.

Action Classification

Reinforcement Learning-based Visual Navigation with Information-Theoretic Regularization

1 code implementation 9 Dec 2019 Qiaoyun Wu, Kai Xu, Jun Wang, Mingliang Xu, Dinesh Manocha

The regularization maximizes the mutual information between navigation actions and visual observation transforms of an agent, thus promoting more informed navigation decisions.

Robotics

Forecasting Trajectory and Behavior of Road-Agents Using Spectral Clustering in Graph-LSTMs

no code implementations arXiv 2019 Rohan Chandra, Tianrui Guan, Srujan Panuganti, Trisha Mittal, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha

In practice, our approach reduces the average prediction error by more than 54% over prior algorithms and achieves a weighted average accuracy of 91.2% for behavior prediction.

Robotics

Scene-Aware Audio Rendering via Deep Acoustic Analysis

no code implementations 14 Nov 2019 Zhenyu Tang, Nicholas J. Bryan, Dingzeyu Li, Timothy R. Langlois, Dinesh Manocha

We present a new method to capture the acoustic characteristics of real-world rooms using commodity devices, and use the captured characteristics to generate similar sounding sources with virtual models.

Sound Graphics Multimedia Audio and Speech Processing

M3ER: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, and Speech Cues

no code implementations 9 Nov 2019 Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha

Our approach combines cues from multiple co-occurring modalities (such as face, text, and speech) and also is more robust than other methods to sensor noise in any of the individual modalities.

Multimodal Emotion Recognition

Personality-Aware Probabilistic Map for Trajectory Prediction of Pedestrians

no code implementations 1 Nov 2019 Chaochao Li, Pei Lv, Mingliang Xu, Xinyu Wang, Dinesh Manocha, Bing Zhou, Meng Wang

We update this map dynamically based on the agents in the environment and prior trajectory of a pedestrian.

Trajectory Prediction

STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits

1 code implementation 28 Oct 2019 Uttaran Bhattacharya, Trisha Mittal, Rohan Chandra, Tanmay Randhavane, Aniket Bera, Dinesh Manocha

We use hundreds of annotated real-world gait videos and augment them with thousands of annotated synthetic gaits generated using a novel generative network called STEP-Gen, built on an ST-GCN based Conditional Variational Autoencoder (CVAE).

General Classification

Learning Resilient Behaviors for Navigation Under Uncertainty

no code implementations 22 Oct 2019 Tingxiang Fan, Pinxin Long, Wenxi Liu, Jia Pan, Ruigang Yang, Dinesh Manocha

Deep reinforcement learning has great potential to acquire complex, adaptive behaviors for autonomous agents automatically.

Autonomous Driving

DeepMNavigate: Deep Reinforced Multi-Robot Navigation Unifying Local & Global Collision Avoidance

no code implementations 4 Oct 2019 Qingyang Tan, Tingxiang Fan, Jia Pan, Dinesh Manocha

We present a novel algorithm (DeepMNavigate) for global multi-agent navigation in dense scenarios using deep reinforcement learning (DRL).

Collision Avoidance Position +3

Realtime Simulation of Thin-Shell Deformable Materials using CNN-Based Mesh Embedding

no code implementations 26 Sep 2019 Qingyang Tan, Zherong Pan, Lin Gao, Dinesh Manocha

We present a new algorithm to embed a high-dimensional configuration space of deformable objects in a low-dimensional feature space, where the configurations of objects and feature points have approximate one-to-one mapping.

Dimensionality Reduction Robot Manipulation

RobustTP: End-to-End Trajectory Prediction for Heterogeneous Road-Agents in Dense Traffic with Noisy Sensor Inputs

1 code implementation 20 Jul 2019 Rohan Chandra, Uttaran Bhattacharya, Christian Roncal, Aniket Bera, Dinesh Manocha

RobustTP is an approach that first computes trajectories using a combination of a non-linear motion model and a deep learning-based instance segmentation algorithm.

Robotics

Improving Reverberant Speech Training Using Diffuse Acoustic Simulation

no code implementations 9 Jul 2019 Zhenyu Tang, Lian-Wu Chen, Bo Wu, Dong Yu, Dinesh Manocha

We present an efficient and realistic geometric acoustic simulation approach for generating and augmenting training data in speech-related machine learning tasks.

BIG-bench Machine Learning Keyword Spotting +2

FVA: Modeling Perceived Friendliness of Virtual Agents Using Movement Characteristics

no code implementations 30 Jun 2019 Tanmay Randhavane, Aniket Bera, Kyra Kapsaskis, Kurt Gray, Dinesh Manocha

We also investigate the perception of a user in an AR setting and observe that an FVA has a statistically significant improvement in terms of the perceived friendliness and social presence of a user compared to an agent without the friendliness modeling.

RoadTrack: Realtime Tracking of Road Agents in Dense and Heterogeneous Environments

1 code implementation 25 Jun 2019 Rohan Chandra, Uttaran Bhattacharya, Tanmay Randhavane, Aniket Bera, Dinesh Manocha

We present a realtime tracking algorithm, RoadTrack, to track heterogeneous road-agents in dense traffic videos.

Robotics

NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations

1 code implementation 17 Jun 2019 Qiaoyun Wu, Dinesh Manocha, Jun Wang, Kai Xu

First, the latent distribution is conditioned on current observations and the target view, leading to a model-based, target-driven navigation.

Visual Navigation

Identifying Emotions from Walking using Affective and Deep Features

no code implementations 14 Jun 2019 Tanmay Randhavane, Uttaran Bhattacharya, Kyra Kapsaskis, Kurt Gray, Aniket Bera, Dinesh Manocha

We also present an EWalk (Emotion Walk) dataset that consists of videos of walking individuals with gaits and labeled emotions.

Emotion Recognition

Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks

1 code implementation 17 Apr 2019 Zhenyu Tang, John D. Kanu, Kevin Hogan, Dinesh Manocha

We present a novel learning-based approach to estimate the direction-of-arrival (DOA) of a sound source using a convolutional recurrent neural network (CRNN) trained via regression on synthetic data and Cartesian labels.

Ranked #1 on Direction of Arrival Estimation on SOFA (using extra training data)

Direction of Arrival Estimation General Classification +1

PaintBot: A Reinforcement Learning Approach for Natural Media Painting

no code implementations 3 Apr 2019 Biao Jia, Chen Fang, Jonathan Brandt, Byungmoon Kim, Dinesh Manocha

Action selection is guided by a given reference image, which the agent attempts to replicate subject to the limitations of the action space and the agent's learned policy.

reinforcement-learning Reinforcement Learning (RL)

Generating Grasp Poses for a High-DOF Gripper Using Neural Networks

no code implementations 1 Mar 2019 Min Liu, Zherong Pan, Kai Xu, Kanishka Ganguly, Dinesh Manocha

The quality of the grasp poses is on par with the groundtruth poses in the dataset.

Robotics

AADS: Augmented Autonomous Driving Simulation using Data-driven Algorithms

1 code implementation 23 Jan 2019 Wei Li, Chengwei Pan, Rong Zhang, Jiaping Ren, Yuexin Ma, Jin Fang, Feilong Yan, Qichuan Geng, Xinyu Huang, Huajun Gong, Weiwei Xu, Guoping Wang, Dinesh Manocha, Ruigang Yang

Our augmented approach combines the flexibility in a virtual environment (e.g., vehicle movements) with the richness of the real world to allow effective simulation of anywhere on earth.

Autonomous Driving

TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions

2 code implementations CVPR 2019 Rohan Chandra, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha

We evaluate the performance of our prediction algorithm, TraPHic, on the standard datasets and also introduce a new dense, heterogeneous traffic dataset corresponding to urban Asian videos and agent trajectories.

Trajectory Prediction Robotics