no code implementations • 13 Jan 2023 • Simbarashe Nyatsanga, Taras Kucherenko, Chaitanya Ahuja, Gustav Eje Henter, Michael Neff
Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human-like motion; grounding the gesture in the co-occurring speech, in interaction with other speakers, and in the environment; performing gesture evaluation; and integrating gesture synthesis into applications.
no code implementations • ICCV 2023 • Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency
We introduce three research tasks, (1) figure-to-text retrieval, (2) text-to-figure retrieval, and (3) generation of slide explanations, which are grounded in multimedia learning and psychology principles to test a vision-language model's understanding of multimodal content.
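For intuition, the two retrieval tasks reduce to ranking candidates in a shared embedding space. The sketch below is illustrative only: the `retrieve` helper, the encoders it presupposes, and all dimensions are hypothetical, not the paper's implementation.

```python
# A minimal sketch of crossmodal retrieval scoring, assuming figures and
# texts have already been embedded into a shared space by some encoders.
import torch
import torch.nn.functional as F

def retrieve(query_emb, candidate_embs, top_k=5):
    """Rank candidates by cosine similarity to the query embedding."""
    query = F.normalize(query_emb, dim=-1)       # (d,)
    cands = F.normalize(candidate_embs, dim=-1)  # (n, d)
    scores = cands @ query                       # (n,) cosine similarities
    return scores.topk(top_k).indices            # indices of best matches

# figure-to-text: embed one slide figure, rank all text-segment embeddings
fig_emb = torch.randn(512)         # stand-in for a figure encoder output
text_embs = torch.randn(100, 512)  # stand-ins for text encoder outputs
print(retrieve(fig_emb, text_embs))
```

Text-to-figure retrieval is the symmetric case: embed the text query and rank figure embeddings with the same scoring function.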
no code implementations • ICCV 2023 • Chaitanya Ahuja, Pratik Joshi, Ryo Ishii, Louis-Philippe Morency
However, in practical scenarios, speaker data arrives sequentially and in small amounts as the agent personalizes to more speakers, akin to a continual learning paradigm.
2 code implementations • 17 Aug 2022 • Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency
As a step toward developing AI that can aid student learning as an intelligent teacher assistant, we introduce the Multimodal Lecture Presentations dataset, a large-scale benchmark testing the capabilities of machine learning models in multimodal understanding of educational content.
no code implementations • CVPR 2022 • Chaitanya Ahuja, Dong Won Lee, Louis-Philippe Morency
Personalizing an avatar for co-speech gesture generation from spoken language requires learning the idiosyncrasies of a person's gesture style from a small amount of data.
1 code implementation • ACM ICMI Workshop GENEA 2021 • Dong Won Lee, Chaitanya Ahuja, Louis-Philippe Morency
Crossmodal grounding is a key challenge for the task of generating relevant and well-timed gestures from spoken language alone as input.
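One common way to encourage such crossmodal grounding is a contrastive alignment objective over paired language and gesture clips. The sketch below is an assumption: the `infonce` helper and embedding sizes are hypothetical, and the paper's actual loss may differ.

```python
# A minimal InfoNCE-style alignment sketch: time-aligned language/gesture
# pairs are pulled together, mismatched pairs in the batch pushed apart.
import torch
import torch.nn.functional as F

def infonce(lang_emb, gesture_emb, temperature=0.07):
    """lang_emb, gesture_emb: (batch, d) features of aligned clips."""
    lang = F.normalize(lang_emb, dim=-1)
    gest = F.normalize(gesture_emb, dim=-1)
    logits = lang @ gest.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(lang.size(0))    # diagonal entries = true pairs
    return F.cross_entropy(logits, targets)

print(infonce(torch.randn(8, 256), torch.randn(8, 256)))
```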
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Chaitanya Ahuja, Dong Won Lee, Ryo Ishii, Louis-Philippe Morency
We study relationships between spoken language and co-speech gestures in the context of two key challenges.
1 code implementation • ECCV 2020 • Chaitanya Ahuja, Dong Won Lee, Yukiko I. Nakano, Louis-Philippe Morency
A key challenge, called gesture style transfer, is to learn a model that generates these gestures for a speaking agent 'A' in the gesturing style of a target speaker 'B'.
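As a rough illustration, style transfer can be sketched as conditioning a gesture decoder on a learned per-speaker style embedding: content comes from speaker 'A''s speech, style from the embedding looked up for target speaker 'B'. The `StyleConditionedDecoder` below and all dimensions are hypothetical; the published model is more involved.

```python
# A minimal sketch of style-conditioned gesture generation (assumed sizes).
import torch
import torch.nn as nn

class StyleConditionedDecoder(nn.Module):
    def __init__(self, audio_dim=128, style_dim=32, pose_dim=57, n_speakers=10):
        super().__init__()
        self.style_table = nn.Embedding(n_speakers, style_dim)  # one style vector per speaker
        self.rnn = nn.GRU(audio_dim + style_dim, 256, batch_first=True)
        self.to_pose = nn.Linear(256, pose_dim)

    def forward(self, audio_feats, speaker_id):
        # audio_feats: (batch, time, audio_dim); speaker_id: (batch,)
        style = self.style_table(speaker_id)  # (batch, style_dim)
        style = style.unsqueeze(1).expand(-1, audio_feats.size(1), -1)
        h, _ = self.rnn(torch.cat([audio_feats, style], dim=-1))
        return self.to_pose(h)  # (batch, time, pose_dim)

model = StyleConditionedDecoder()
# speech content from one source, rendered in speakers 3's and 7's styles
poses = model(torch.randn(2, 100, 128), torch.tensor([3, 7]))
```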
3 code implementations • 5 Oct 2019 • Chaitanya Ahuja, Shugao Ma, Louis-Philippe Morency, Yaser Sheikh
In this paper, we introduce a neural architecture named Dyadic Residual-Attention Model (DRAM), which integrates intrapersonal (monadic) and interpersonal (dyadic) dynamics using selective attention to generate sequences of body pose, conditioned on the audio and body pose of the interlocutor and the audio of the human operating the avatar.
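A minimal sketch of the selective-attention idea follows: a monadic stream (own audio) and a dyadic stream (interlocutor audio and pose) are blended per timestep by a learned gate. Module names and dimensions below are hypothetical; this is not the published DRAM code.

```python
# A sketch of gated blending of monadic and dyadic dynamics (assumed sizes).
import torch
import torch.nn as nn

class DyadicAttentionBlend(nn.Module):
    def __init__(self, pose_dim=57, audio_dim=128, hidden=256):
        super().__init__()
        self.monadic = nn.GRU(audio_dim, hidden, batch_first=True)            # operator's own audio
        self.dyadic = nn.GRU(audio_dim + pose_dim, hidden, batch_first=True)  # interlocutor audio + pose
        self.gate = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Sigmoid())
        self.to_pose = nn.Linear(hidden, pose_dim)

    def forward(self, own_audio, other_audio, other_pose):
        m, _ = self.monadic(own_audio)
        d, _ = self.dyadic(torch.cat([other_audio, other_pose], dim=-1))
        a = self.gate(torch.cat([m, d], dim=-1))  # per-timestep mixing weights
        h = a * m + (1 - a) * d                   # attend to self vs. partner
        return self.to_pose(h)                    # predicted avatar pose sequence

model = DyadicAttentionBlend()
out = model(torch.randn(1, 50, 128), torch.randn(1, 50, 128), torch.randn(1, 50, 57))
```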
2 code implementations • 2 Jul 2019 • Chaitanya Ahuja, Louis-Philippe Morency
In this paper, we address this multimodal problem by introducing a neural architecture called Joint Language to Pose (or JL2P), which learns a joint embedding of language and pose.
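A minimal sketch of a language-pose joint embedding under assumed shapes is shown below. The `JointEmbedding` module and its training signal are illustrative; the released JL2P code is the reference implementation.

```python
# A sketch of a joint embedding: text and pose encoders map into one space,
# and a decoder reconstructs pose sequences from that space (assumed sizes).
import torch
import torch.nn as nn

class JointEmbedding(nn.Module):
    def __init__(self, word_dim=300, pose_dim=63, joint_dim=256):
        super().__init__()
        self.text_enc = nn.GRU(word_dim, joint_dim, batch_first=True)
        self.pose_enc = nn.GRU(pose_dim, joint_dim, batch_first=True)
        self.pose_dec = nn.GRU(joint_dim, pose_dim, batch_first=True)

    def embed_text(self, words):   # words: (batch, len, word_dim)
        _, h = self.text_enc(words)
        return h[-1]               # (batch, joint_dim)

    def embed_pose(self, poses):   # poses: (batch, time, pose_dim)
        _, h = self.pose_enc(poses)
        return h[-1]

    def decode(self, z, steps):    # roll out a pose sequence from an embedding
        out, _ = self.pose_dec(z.unsqueeze(1).expand(-1, steps, -1))
        return out

model = JointEmbedding()
z_text = model.embed_text(torch.randn(4, 12, 300))
z_pose = model.embed_pose(torch.randn(4, 32, 63))
# training pulls the two embeddings together and reconstructs poses from either
align_loss = (z_text - z_pose).abs().mean()
poses = model.decode(z_text, steps=32)
```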
1 code implementation • 6 Oct 2017 • Chaitanya Ahuja, Louis-Philippe Morency
We evaluate this family of new LRU models on computational convergence rates and statistical efficiency.
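For concreteness, statistical efficiency is often probed by training the same model on growing fractions of the data and comparing held-out loss; a more statistically efficient model reaches low validation loss at smaller fractions. The protocol below is an illustrative sketch on synthetic data, not the paper's exact evaluation setup.

```python
# A sketch of a learning-curve probe for statistical efficiency.
import torch
import torch.nn as nn

def val_loss_at_fraction(model_fn, X, y, X_val, y_val, frac, epochs=50):
    """Train on the first `frac` of (X, y); report held-out MSE."""
    n = max(1, int(len(X) * frac))
    model, loss_fn = model_fn(), nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):                 # fit on the data subset
        opt.zero_grad()
        loss_fn(model(X[:n]), y[:n]).backward()
        opt.step()
    with torch.no_grad():                   # held-out evaluation
        return loss_fn(model(X_val), y_val).item()

torch.manual_seed(0)
X, y = torch.randn(512, 8), torch.randn(512, 1)
X_val, y_val = torch.randn(128, 8), torch.randn(128, 1)
make = lambda: nn.Linear(8, 1)              # stand-in for the model under test
for frac in (0.1, 0.25, 0.5, 1.0):
    print(frac, val_loss_at_fraction(make, X, y, X_val, y_val, frac))
```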
no code implementations • 26 May 2017 • Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency
Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell odors, and taste flavors.