no code implementations • 17 Mar 2024 • Dong Won Lee, Hae Won Park, Yoon Kim, Cynthia Breazeal, Louis-Philippe Morency
We describe an approach for aligning an LLM-based dialogue agent using global (i.e., dialogue-level) rewards, while also taking into account naturally occurring multimodal signals.
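As a rough illustration of what "alignment from a global reward" can look like, the sketch below broadcasts a single dialogue-level scalar reward to all tokens of a dialogue via a REINFORCE-style surrogate loss. This is a generic baseline under assumed placeholders (GPT-2 as the policy, a precomputed reward), not the authors' method.

```python
# Minimal sketch, not the paper's method: one policy-gradient step that
# scales the dialogue's token-level log-likelihood by a single global reward.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # placeholder policy
policy = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5)

def reinforce_step(dialogue_text: str, global_reward: float) -> float:
    """One REINFORCE-style update using a dialogue-level scalar reward.
    In practice the reward would be baseline-subtracted to reduce variance."""
    ids = tokenizer(dialogue_text, return_tensors="pt").input_ids
    out = policy(ids, labels=ids)        # out.loss = mean negative log-likelihood
    loss = global_reward * out.loss      # minimizing this maximizes reward-weighted log-prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```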
no code implementations • 21 May 2023 • Yubin Kim, Dong Won Lee, Paul Pu Liang, Sharifa Algohwinem, Cynthia Breazeal, Hae Won Park
Accurately modeling affect dynamics, that is, the changes and fluctuations in emotions and affective displays during human conversations, is crucial for understanding human interactions.
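To make "affect dynamics" concrete, one simple (purely illustrative) way to quantify turn-to-turn change is a first-order Markov transition matrix over discrete affect labels; the labels below are hypothetical and this is not the paper's model.

```python
# Illustrative only: estimate how often one affect label follows another
# across adjacent conversational turns.
import numpy as np

LABELS = ["neutral", "joy", "anger", "sadness"]   # hypothetical label set
IDX = {label: i for i, label in enumerate(LABELS)}

def transition_matrix(turn_labels):
    """Row-normalized counts of affect transitions between adjacent turns."""
    counts = np.zeros((len(LABELS), len(LABELS)))
    for prev, curr in zip(turn_labels, turn_labels[1:]):
        counts[IDX[prev], IDX[curr]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

print(transition_matrix(["neutral", "joy", "joy", "anger", "neutral"]))
```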
no code implementations • 19 Apr 2023 • Dong Won Lee, Yubin Kim, Rosalind Picard, Cynthia Breazeal, Hae Won Park
As we move closer to real-world AI systems, AI agents must be able to deal with multiparty (group) conversations.
no code implementations • ICCV 2023 • Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency
We introduce three research tasks: (1) figure-to-text retrieval, (2) text-to-figure retrieval, and (3) generation of slide explanations. These tasks are grounded in multimedia-learning and psychology principles and test a vision-language model's understanding of multimodal content.
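A hedged illustration of the two retrieval tasks: an off-the-shelf vision-language model (CLIP) embeds figures and spoken-explanation text into a shared space and ranks by similarity. This is a generic baseline, not the benchmark's official protocol; image paths and captions are placeholders.

```python
# Generic CLIP-based retrieval sketch; not the benchmark's evaluation code.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

figures = [Image.open(p) for p in ["fig1.png", "fig2.png"]]        # placeholder paths
captions = ["spoken explanation of slide 1", "spoken explanation of slide 2"]

inputs = processor(text=captions, images=figures, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Rows index figures, columns index captions: ranking along a row gives
# figure-to-text retrieval, ranking along a column gives text-to-figure retrieval.
similarity = out.logits_per_image
print(similarity.argmax(dim=1))   # best caption for each figure
print(similarity.argmax(dim=0))   # best figure for each caption
```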
2 code implementations • 17 Aug 2022 • Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency
As a step toward developing AI agents that aid student learning as intelligent teaching assistants, we introduce the Multimodal Lecture Presentations dataset, a large-scale benchmark that tests the capabilities of machine learning models in multimodal understanding of educational content.
no code implementations • CVPR 2022 • Chaitanya Ahuja, Dong Won Lee, Louis-Philippe Morency
Personalizing an avatar for co-speech gesture generation from spoken language requires learning the idiosyncrasies of a person's gesture style from a small amount of data.
1 code implementation • ACM ICMI Workshop GENEA 2021 • Dong Won Lee, Chaitanya Ahuja, Louis-Philippe Morency
Crossmodal grounding is a key challenge for the task of generating relevant and well-timed gestures from spoken language alone.
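As a sketch of the general idea of crossmodal grounding (not the paper's architecture), the toy model below lets each step of a pose decoder attend into an encoding of the language input; all feature dimensions are arbitrary assumptions.

```python
# Toy speech/language-to-gesture model with crossmodal attention.
import torch
import torch.nn as nn

class SpeechToGesture(nn.Module):
    def __init__(self, text_dim=300, pose_dim=42, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(text_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(pose_dim, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.out = nn.Linear(hidden, pose_dim)

    def forward(self, text_feats, prev_poses):
        # text_feats: (B, T_text, text_dim); prev_poses: (B, T_pose, pose_dim)
        enc, _ = self.encoder(text_feats)
        dec, _ = self.decoder(prev_poses)
        # Crossmodal grounding: each pose step queries the language encoding.
        grounded, _ = self.attn(query=dec, key=enc, value=enc)
        return self.out(grounded)          # predicted pose keypoints per step

model = SpeechToGesture()
poses = model(torch.randn(2, 20, 300), torch.randn(2, 64, 42))
print(poses.shape)  # torch.Size([2, 64, 42])
```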
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Chaitanya Ahuja, Dong Won Lee, Ryo Ishii, Louis-Philippe Morency
We study relationships between spoken language and co-speech gestures in the context of two key challenges.
1 code implementation • ECCV 2020 • Chaitanya Ahuja, Dong Won Lee, Yukiko I. Nakano, Louis-Philippe Morency
A key challenge, called gesture style transfer, is to learn a model that generates these gestures for a speaking agent 'A' in the gesturing style of a target speaker 'B'.
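To illustrate the conditioning idea behind gesture style transfer (an assumption-laden sketch, not the paper's model), the decoder below takes a content encoding of speaker A's speech and a learned style embedding for speaker B; speaker count and dimensions are placeholders.

```python
# Style-conditioned gesture decoder: content from speaker A, style from speaker B.
import torch
import torch.nn as nn

class StyleConditionedGestureDecoder(nn.Module):
    def __init__(self, num_speakers=10, content_dim=256, style_dim=64, pose_dim=42):
        super().__init__()
        self.style_table = nn.Embedding(num_speakers, style_dim)   # one style vector per speaker
        self.rnn = nn.GRU(content_dim + style_dim, 256, batch_first=True)
        self.head = nn.Linear(256, pose_dim)

    def forward(self, content_seq, style_speaker_id):
        # content_seq: (B, T, content_dim) encoding of speaker A's spoken language
        style = self.style_table(style_speaker_id)                  # (B, style_dim)
        style = style.unsqueeze(1).expand(-1, content_seq.size(1), -1)
        h, _ = self.rnn(torch.cat([content_seq, style], dim=-1))
        return self.head(h)                                         # (B, T, pose_dim)

decoder = StyleConditionedGestureDecoder()
gestures = decoder(torch.randn(2, 50, 256), torch.tensor([3, 7]))   # style IDs for speaker 'B'
print(gestures.shape)  # torch.Size([2, 50, 42])
```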