2 code implementations • 22 Aug 2024 • Haojun Shi, Suyu Ye, Xinyu Fang, Chuanyang Jin, Leyla Isik, Yen-Ling Kuo, Tianmin Shu
To truly understand how and why people interact with one another, we must infer the underlying mental states that give rise to social interactions, i.e., Theory of Mind reasoning in multi-agent interactions.
no code implementations • 18 Jul 2024 • Nathan Cloos, Meagan Jens, Michelangelo Naim, Yen-Ling Kuo, Ignacio Cases, Andrei Barbu, Christopher J. Cueva
Humans solve problems by following existing rules and procedures, and also by leaps of creativity to redefine those rules and objectives.
no code implementations • 10 Apr 2024 • Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang
We introduce the Gaze-guided Action Anticipation algorithm, which establishes a visual-semantic graph from the video input.
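A minimal sketch of the general idea of a visual-semantic graph built from video, assuming per-frame object detections and a gaze target; the function name `build_visual_semantic_graph` and the frame format are hypothetical and this is not the paper's implementation.

```python
# Hypothetical sketch: build a visual-semantic graph from per-frame detections.
# Nodes are object labels, edges record co-occurrence, and gaze fixations are
# stored as node attributes. Not the Gaze-guided Action Anticipation code.
import networkx as nx

def build_visual_semantic_graph(frames):
    """frames: list of dicts like {"objects": [...], "gazed": "kettle"}."""
    g = nx.Graph()
    for t, frame in enumerate(frames):
        objs = frame["objects"]
        g.add_nodes_from(objs)
        # connect objects that are visible in the same frame
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                g.add_edge(objs[i], objs[j], relation="co-occur", frame=t)
        # count gaze fixations so anticipation can weight the gazed object
        gazed = frame.get("gazed")
        if gazed:
            g.add_node(gazed)
            g.nodes[gazed]["fixations"] = g.nodes[gazed].get("fixations", 0) + 1
    return g

frames = [{"objects": ["hand", "cup", "kettle"], "gazed": "kettle"},
          {"objects": ["hand", "kettle"], "gazed": "kettle"}]
print(build_visual_semantic_graph(frames).nodes(data=True))
```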
no code implementations • 10 Apr 2024 • Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang
Eye-tracking applications that utilize the human gaze in video understanding tasks have become increasingly important.
1 code implementation • 16 Jan 2024 • Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum, Tianmin Shu
To engineer multimodal ToM capacity, we propose a novel method, BIP-ALM (Bayesian Inverse Planning Accelerated by Language Models).
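As a rough, hypothetical illustration of Bayesian inverse planning with a language-model likelihood (the core idea named above), the sketch below maintains a posterior over candidate goals; `lm_action_logprob` is a stand-in for the LM scorer and none of this is the authors' BIP-ALM code.

```python
# Hypothetical sketch of Bayesian inverse planning for goal inference, with a
# placeholder standing in for the language-model action likelihood.
import math

def lm_action_logprob(action: str, goal: str, state: str) -> float:
    """Placeholder: log P(action | state, goal), as a language model might score it."""
    return -1.0 if goal.split()[-1] in action else -3.0

def infer_goal_posterior(actions, states, goals):
    """Inverse planning: P(goal | actions) ∝ P(goal) * prod_t P(a_t | s_t, goal)."""
    log_post = {g: math.log(1.0 / len(goals)) for g in goals}  # uniform prior over goals
    for g in goals:
        for s, a in zip(states, actions):
            log_post[g] += lm_action_logprob(a, g, s)
    z = max(log_post.values())
    norm = sum(math.exp(v - z) for v in log_post.values())
    return {g: math.exp(v - z) / norm for g, v in log_post.items()}

goals = ["get the apple", "get the cup"]
states = ["agent in kitchen", "agent near table"]
actions = ["walk to table", "reach for the apple"]
print(infer_goal_posterior(actions, states, goals))  # posterior should favor the apple goal
```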
1 code implementation • 21 Aug 2023 • Kunal Jha, Tuan Anh Le, Chuanyang Jin, Yen-Ling Kuo, Joshua B. Tenenbaum, Tianmin Shu
Multi-agent interactions, such as communication, teaching, and bluffing, often rely on higher-order social inference, i.e., understanding how others infer oneself.
no code implementations • 28 May 2023 • Justin Lidard, Oswin So, Yanxia Zhang, Jonathan DeCastro, Xiongyi Cui, Xin Huang, Yen-Ling Kuo, John Leonard, Avinash Balachandran, Naomi Leonard, Guy Rosman
Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents.
no code implementations • CVPR 2024 • Razvan-George Pasca, Alexey Gavryushin, Muhammad Hamza, Yen-Ling Kuo, Kaichun Mo, Luc van Gool, Otmar Hilliges, Xi Wang
This task requires an understanding of the spatio-temporal context formed by past actions on objects, termed the action context.
no code implementations • 6 Sep 2022 • Xi Wang, Gen Li, Yen-Ling Kuo, Muhammed Kocabas, Emre Aksan, Otmar Hilliges
We further qualitatively evaluate the effectiveness of our method on real images and demonstrate its generalizability across interaction types and object categories.
no code implementations • 19 Oct 2021 • Yen-Ling Kuo, Xin Huang, Andrei Barbu, Stephen G. McGill, Boris Katz, John J. Leonard, Guy Rosman
Language allows humans to build mental models that interpret what is happening around them, resulting in more accurate long-term predictions.
no code implementations • 7 Aug 2020 • Christopher Wang, Candace Ross, Yen-Ling Kuo, Boris Katz, Andrei Barbu
We take a step toward robots that can do the same by training a grounded semantic parser, which discovers latent linguistic representations that can be used for the execution of natural-language commands.
1 code implementation • Findings (EMNLP) 2021 • Yen-Ling Kuo, Boris Katz, Andrei Barbu
Recent work has shown that while deep networks can mimic some human language abilities when presented with novel sentences, systematic variation uncovers the limitations in the language-understanding abilities of networks.
1 code implementation • 1 Jun 2020 • Yen-Ling Kuo, Boris Katz, Andrei Barbu
We demonstrate a reinforcement learning agent which uses a compositional recurrent neural network that takes as input an LTL formula and determines satisfying actions.
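A sketch of what composing a network from an LTL formula's syntax tree can look like, assuming a PyTorch setup; the class names and toy formula encoding are illustrative and do not reproduce the paper's architecture.

```python
# Hypothetical sketch: embed an LTL formula by recursively combining sub-formula
# embeddings with one small combiner per operator. Not the authors' model.
import torch
import torch.nn as nn

class LTLNode:
    def __init__(self, op, children=(), prop=None):
        self.op, self.children, self.prop = op, list(children), prop

class CompositionalLTLEncoder(nn.Module):
    def __init__(self, num_props, dim=32):
        super().__init__()
        self.prop_emb = nn.Embedding(num_props, dim)
        # one combiner network per LTL operator (unary or binary)
        self.ops = nn.ModuleDict({
            "not": nn.Linear(dim, dim),
            "next": nn.Linear(dim, dim),
            "always": nn.Linear(dim, dim),
            "eventually": nn.Linear(dim, dim),
            "and": nn.Linear(2 * dim, dim),
            "until": nn.Linear(2 * dim, dim),
        })

    def forward(self, node):
        if node.op == "prop":
            return torch.tanh(self.prop_emb(torch.tensor(node.prop)))
        child_vecs = [self.forward(c) for c in node.children]
        return torch.tanh(self.ops[node.op](torch.cat(child_vecs, dim=-1)))

# Example: "eventually (a and b)" with propositions a=0, b=1
formula = LTLNode("eventually",
                  [LTLNode("and", [LTLNode("prop", prop=0), LTLNode("prop", prop=1)])])
encoder = CompositionalLTLEncoder(num_props=2)
z = encoder(formula)   # formula embedding, e.g. fed to an RL policy head
print(z.shape)         # torch.Size([32])
```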
no code implementations • 8 Mar 2020 • Yu-Siang Wang, Yen-Ling Kuo, Boris Katz
We evaluate our look-ahead module on three datasets of varying difficulties: IM2LATEX-100k OCR image to LaTeX, WMT16 multimodal machine translation, and WMT14 machine translation.
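A toy sketch of the general idea behind one-step look-ahead scoring during decoding: each candidate token is ranked by its own log-probability plus the best log-probability reachable at the following step. The placeholder `toy_next_logprobs` stands in for a seq2seq decoder and is not the paper's module.

```python
# Hypothetical sketch of one-step look-ahead token selection during decoding.
def toy_next_logprobs(prefix):
    """Placeholder for a decoder's next-token log-probabilities."""
    vocab = {"a": -0.5, "b": -1.0, "</s>": -2.0}
    if prefix and prefix[-1] == "a":            # after "a", ending becomes much likelier
        vocab = {"a": -3.0, "b": -2.5, "</s>": -0.3}
    return vocab

def lookahead_pick(prefix):
    """Choose the token whose score includes the best one-step continuation."""
    scores = {}
    for tok, lp in toy_next_logprobs(prefix).items():
        future = toy_next_logprobs(prefix + [tok])
        scores[tok] = lp + max(future.values())  # greedy decoding would use lp alone
    return max(scores, key=scores.get)

print(lookahead_pick(["<s>"]))
```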
no code implementations • 12 Feb 2020 • Yen-Ling Kuo, Boris Katz, Andrei Barbu
We demonstrate how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands in a continuous configuration space to move and manipulate objects.
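A highly simplified, hypothetical sketch of biasing sampled configurations with a natural-language command: a placeholder grounding score decides which random moves to keep. It only illustrates the flavor of coupling sampling-based planning with language and is not the authors' planner or grounding network.

```python
# Hypothetical sketch: keep randomly sampled moves that a placeholder
# language-grounding score prefers. Not the paper's planner.
import random

def command_score(config, command):
    """Placeholder grounding model: how well a 2-D configuration satisfies a command."""
    x, y = config
    return -abs(x - 0.8) if "right" in command else -abs(x - 0.2)

def sample_plan(start, command, steps=200, step_size=0.05):
    """Random-walk planner that greedily keeps moves the grounding model prefers."""
    path, cur = [start], start
    for _ in range(steps):
        cand = (min(1.0, max(0.0, cur[0] + random.uniform(-step_size, step_size))),
                min(1.0, max(0.0, cur[1] + random.uniform(-step_size, step_size))))
        if command_score(cand, command) >= command_score(cur, command):
            cur = cand
            path.append(cur)
    return path

path = sample_plan(start=(0.5, 0.5), command="move to the right of the block")
print(len(path), path[-1])
```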