no code implementations • 13 Jan 2023 • Simbarashe Nyatsanga, Taras Kucherenko, Chaitanya Ahuja, Gustav Eje Henter, Michael Neff
Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human-like motion; grounding the gesture in the co-occurring speech, in interaction with other speakers, and in the environment; performing gesture evaluation; and integrating gesture synthesis into applications.
no code implementations • ICCV 2023 • Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency
We introduce three research tasks, (1) figure-to-text retrieval, (2) text-to-figure retrieval, and (3) generation of slide explanations, which are grounded in multimedia learning and psychology principles to test a vision-language model's understanding of multimodal content.
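For intuition, the two retrieval tasks reduce to ranking candidates in a shared embedding space. The sketch below is illustrative only: the `retrieve` helper, the encoders it presupposes, and all dimensions are hypothetical, not the paper's implementation.

```python
# A minimal sketch of crossmodal retrieval scoring, assuming figures and
# texts have already been embedded into a shared space by some encoders.
import torch
import torch.nn.functional as F

def retrieve(query_emb, candidate_embs, top_k=5):
    """Rank candidates by cosine similarity to the query embedding."""
    query = F.normalize(query_emb, dim=-1)       # (d,)
    cands = F.normalize(candidate_embs, dim=-1)  # (n, d)
    scores = cands @ query                       # (n,) cosine similarities
    return scores.topk(top_k).indices            # indices of best matches

# figure-to-text: embed one slide figure, rank all text-segment embeddings
fig_emb = torch.randn(512)         # stand-in for a figure encoder output
text_embs = torch.randn(100, 512)  # stand-ins for text encoder outputs
print(retrieve(fig_emb, text_embs))
```

Text-to-figure retrieval is the symmetric case: embed the text query and rank figure embeddings with the same scoring function.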
no code implementations • ICCV 2023 • Chaitanya Ahuja, Pratik Joshi, Ryo Ishii, Louis-Philippe Morency
However, in practical scenarios, speaker data arrives sequentially and in small amounts as the agent personalizes to more speakers, akin to a continual learning paradigm.
2 code implementations • 17 Aug 2022 • Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency
As a step toward developing AI that can aid student learning as an intelligent teacher assistant, we introduce the Multimodal Lecture Presentations dataset, a large-scale benchmark testing the capabilities of machine learning models in multimodal understanding of educational content.
no code implementations • CVPR 2022 • Chaitanya Ahuja, Dong Won Lee, Louis-Philippe Morency
Personalizing an avatar for co-speech gesture generation from spoken language requires learning the idiosyncrasies of a person's gesture style from a small amount of data.
1 code implementation • ACM ICMI Workshop GENEA 2021 • Dong Won Lee, Chaitanya Ahuja, Louis-Philippe Morency
Crossmodal grounding is a key challenge for the task of generating relevant and well-timed gestures from spoken language alone as input.
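One common way to encourage such crossmodal grounding is a contrastive alignment objective over paired language and gesture clips. The sketch below is an assumption: the `infonce` helper and embedding sizes are hypothetical, and the paper's actual loss may differ.

```python
# A minimal InfoNCE-style alignment sketch: time-aligned language/gesture
# pairs are pulled together, mismatched pairs in the batch pushed apart.
import torch
import torch.nn.functional as F

def infonce(lang_emb, gesture_emb, temperature=0.07):
    """lang_emb, gesture_emb: (batch, d) features of aligned clips."""
    lang = F.normalize(lang_emb, dim=-1)
    gest = F.normalize(gesture_emb, dim=-1)
    logits = lang @ gest.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(lang.size(0))    # diagonal entries = true pairs
    return F.cross_entropy(logits, targets)

print(infonce(torch.randn(8, 256), torch.randn(8, 256)))
```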
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Chaitanya Ahuja, Dong Won Lee, Ryo Ishii, Louis-Philippe Morency
We study relationships between spoken language and co-speech gestures in the context of two key challenges.
1 code implementation • ECCV 2020 • Chaitanya Ahuja, Dong Won Lee, Yukiko I. Nakano, Louis-Philippe Morency
A key challenge, called gesture style transfer, is to learn a model that generates these gestures for a speaking agent 'A' in the gesturing style of a target speaker 'B'.
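As a rough illustration, style transfer can be sketched as conditioning a gesture decoder on a learned per-speaker style embedding: content comes from speaker 'A''s speech, style from the embedding looked up for target speaker 'B'. The `StyleConditionedDecoder` below and all dimensions are hypothetical; the published model is more involved.

```python
# A minimal sketch of style-conditioned gesture generation (assumed sizes).
import torch
import torch.nn as nn

class StyleConditionedDecoder(nn.Module):
    def __init__(self, audio_dim=128, style_dim=32, pose_dim=57, n_speakers=10):
        super().__init__()
        self.style_table = nn.Embedding(n_speakers, style_dim)  # one style vector per speaker
        self.rnn = nn.GRU(audio_dim + style_dim, 256, batch_first=True)
        self.to_pose = nn.Linear(256, pose_dim)

    def forward(self, audio_feats, speaker_id):
        # audio_feats: (batch, time, audio_dim); speaker_id: (batch,)
        style = self.style_table(speaker_id)  # (batch, style_dim)
        style = style.unsqueeze(1).expand(-1, audio_feats.size(1), -1)
        h, _ = self.rnn(torch.cat([audio_feats, style], dim=-1))
        return self.to_pose(h)  # (batch, time, pose_dim)

model = StyleConditionedDecoder()
# speech content from one source, rendered in speakers 3's and 7's styles
poses = model(torch.randn(2, 100, 128), torch.tensor([3, 7]))
```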
3 code implementations • 5 Oct 2019 • Chaitanya Ahuja, Shugao Ma, Louis-Philippe Morency, Yaser Sheikh
In this paper, we introduce a neural architecture named Dyadic Residual-Attention Model (DRAM), which integrates intrapersonal (monadic) and interpersonal (dyadic) dynamics using selective attention to generate sequences of body pose, conditioned on the audio and body pose of the interlocutor and the audio of the human operating the avatar.
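A minimal sketch of the selective-attention idea follows: a monadic stream (own audio) and a dyadic stream (interlocutor audio and pose) are blended per timestep by a learned gate. Module names and dimensions below are hypothetical; this is not the published DRAM code.

```python
# A sketch of gated blending of monadic and dyadic dynamics (assumed sizes).
import torch
import torch.nn as nn

class DyadicAttentionBlend(nn.Module):
    def __init__(self, pose_dim=57, audio_dim=128, hidden=256):
        super().__init__()
        self.monadic = nn.GRU(audio_dim, hidden, batch_first=True)            # operator's own audio
        self.dyadic = nn.GRU(audio_dim + pose_dim, hidden, batch_first=True)  # interlocutor audio + pose
        self.gate = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Sigmoid())
        self.to_pose = nn.Linear(hidden, pose_dim)

    def forward(self, own_audio, other_audio, other_pose):
        m, _ = self.monadic(own_audio)
        d, _ = self.dyadic(torch.cat([other_audio, other_pose], dim=-1))
        a = self.gate(torch.cat([m, d], dim=-1))  # per-timestep mixing weights
        h = a * m + (1 - a) * d                   # attend to self vs. partner
        return self.to_pose(h)                    # predicted avatar pose sequence

model = DyadicAttentionBlend()
out = model(torch.randn(1, 50, 128), torch.randn(1, 50, 128), torch.randn(1, 50, 57))
```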
2 code implementations • 2 Jul 2019 • Chaitanya Ahuja, Louis-Philippe Morency
In this paper, we address this multimodal problem by introducing a neural architecture called Joint Language to Pose (or JL2P), which learns a joint embedding of language and pose.
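A minimal sketch of a language-pose joint embedding under assumed shapes is shown below. The `JointEmbedding` module and its training signal are illustrative; the released JL2P code is the reference implementation.

```python
# A sketch of a joint embedding: text and pose encoders map into one space,
# and a decoder reconstructs pose sequences from that space (assumed sizes).
import torch
import torch.nn as nn

class JointEmbedding(nn.Module):
    def __init__(self, word_dim=300, pose_dim=63, joint_dim=256):
        super().__init__()
        self.text_enc = nn.GRU(word_dim, joint_dim, batch_first=True)
        self.pose_enc = nn.GRU(pose_dim, joint_dim, batch_first=True)
        self.pose_dec = nn.GRU(joint_dim, pose_dim, batch_first=True)

    def embed_text(self, words):   # words: (batch, len, word_dim)
        _, h = self.text_enc(words)
        return h[-1]               # (batch, joint_dim)

    def embed_pose(self, poses):   # poses: (batch, time, pose_dim)
        _, h = self.pose_enc(poses)
        return h[-1]

    def decode(self, z, steps):    # roll out a pose sequence from an embedding
        out, _ = self.pose_dec(z.unsqueeze(1).expand(-1, steps, -1))
        return out

model = JointEmbedding()
z_text = model.embed_text(torch.randn(4, 12, 300))
z_pose = model.embed_pose(torch.randn(4, 32, 63))
# training pulls the two embeddings together and reconstructs poses from either
align_loss = (z_text - z_pose).abs().mean()
poses = model.decode(z_text, steps=32)
```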
1 code implementation • 6 Oct 2017 • Chaitanya Ahuja, Louis-Philippe Morency
We evaluate this family of new LRU models on computational convergence rates and statistical efficiency.
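For concreteness, statistical efficiency is often probed by training the same model on growing fractions of the data and comparing held-out loss; a more statistically efficient model reaches low validation loss at smaller fractions. The protocol below is an illustrative sketch on synthetic data, not the paper's exact evaluation setup.

```python
# A sketch of a learning-curve probe for statistical efficiency.
import torch
import torch.nn as nn

def val_loss_at_fraction(model_fn, X, y, X_val, y_val, frac, epochs=50):
    """Train on the first `frac` of (X, y); report held-out MSE."""
    n = max(1, int(len(X) * frac))
    model, loss_fn = model_fn(), nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):                 # fit on the data subset
        opt.zero_grad()
        loss_fn(model(X[:n]), y[:n]).backward()
        opt.step()
    with torch.no_grad():                   # held-out evaluation
        return loss_fn(model(X_val), y_val).item()

torch.manual_seed(0)
X, y = torch.randn(512, 8), torch.randn(512, 1)
X_val, y_val = torch.randn(128, 8), torch.randn(128, 1)
make = lambda: nn.Linear(8, 1)              # stand-in for the model under test
for frac in (0.1, 0.25, 0.5, 1.0):
    print(frac, val_loss_at_fraction(make, X, y, X_val, y_val, frac))
```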
no code implementations • 26 May 2017 • Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency
Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell odors, and taste flavors.