Search Results for author: Yen-Ling Kuo

Found 15 papers, 5 papers with code

MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

2 code implementations • 22 Aug 2024 • Haojun Shi, Suyu Ye, Xinyu Fang, Chuanyang Jin, Leyla Isik, Yen-Ling Kuo, Tianmin Shu

To truly understand how and why people interact with one another, we must infer the underlying mental states that give rise to their social interactions, i.e., Theory of Mind reasoning in multi-agent interactions.

Baba Is AI: Break the Rules to Beat the Benchmark

no code implementations • 18 Jul 2024 • Nathan Cloos, Meagan Jens, Michelangelo Naim, Yen-Ling Kuo, Ignacio Cases, Andrei Barbu, Christopher J. Cueva

Humans solve problems by following existing rules and procedures, and also by leaps of creativity to redefine those rules and objectives.

Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention

no code implementations • 10 Apr 2024 • Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang

We introduce the Gaze-guided Action Anticipation algorithm, which establishes a visual-semantic graph from the video input.

Action Anticipation · Graph Neural Network +2
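
The abstract mentions building a visual-semantic graph from video input. As a hedged illustration only (the node and edge definitions below are my assumptions, not the paper's), one minimal way to link a gaze signal to detected objects in a frame is:

```python
# Toy sketch of a "visual-semantic graph" for one video frame:
# object detections become nodes, and a gaze node is connected to
# whichever object the gaze point falls inside. Illustrative only.

def build_gaze_graph(detections, gaze_point):
    """detections: {name: (x0, y0, x1, y1)} boxes; gaze_point: (x, y)."""
    nodes = set(detections) | {"gaze"}
    edges = set()
    gx, gy = gaze_point
    for name, (x0, y0, x1, y1) in detections.items():
        if x0 <= gx <= x1 and y0 <= gy <= y1:
            edges.add(("gaze", name))  # gaze fixates this object
    return nodes, edges

# Example frame: gaze lands on the cup, not the kettle.
detections = {"cup": (10, 10, 30, 30), "kettle": (50, 10, 80, 40)}
nodes, edges = build_gaze_graph(detections, (20, 20))
```

A graph neural network would then pass messages over these nodes and edges; the sketch stops at graph construction.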

A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos

no code implementations • 10 Apr 2024 • Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang

Eye-tracking applications that utilize the human gaze in video understanding tasks have become increasingly important.

Activity Recognition · Gaze Prediction +1

MMToM-QA: Multimodal Theory of Mind Question Answering

1 code implementation • 16 Jan 2024 • Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum, Tianmin Shu

To engineer multimodal ToM capacity, we propose a novel method, BIP-ALM (Bayesian Inverse Planning Accelerated by Language Models).

Question Answering · Theory of Mind Modeling
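
The "BIP" in BIP-ALM stands for Bayesian inverse planning: inferring a posterior over an agent's goals from its observed actions. As a hedged sketch (the goals, priors, and likelihood table below are illustrative toys; in BIP-ALM the likelihoods are scored by a language model):

```python
# Bayesian inverse planning in miniature:
# P(goal | actions) ∝ P(goal) * Π_t P(a_t | goal).

def posterior_over_goals(actions, goals, prior, likelihood):
    scores = {}
    for g in goals:
        p = prior[g]
        for a in actions:
            p *= likelihood(a, g)   # likelihood of each observed action
        scores[g] = p
    z = sum(scores.values())        # normalize into a distribution
    return {g: p / z for g, p in scores.items()}

# Toy example: an agent is seen walking toward the kitchen.
goals = ["get_food", "get_book"]
prior = {"get_food": 0.5, "get_book": 0.5}

def likelihood(action, goal):
    # Hand-set table standing in for a learned/LM-scored likelihood.
    table = {("walk_to_kitchen", "get_food"): 0.9,
             ("walk_to_kitchen", "get_book"): 0.2}
    return table.get((action, goal), 0.1)

post = posterior_over_goals(["walk_to_kitchen"], goals, prior, likelihood)
```

The posterior concentrates on `get_food`, since that goal best explains the observed movement.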

Neural Amortized Inference for Nested Multi-agent Reasoning

1 code implementation • 21 Aug 2023 • Kunal Jha, Tuan Anh Le, Chuanyang Jin, Yen-Ling Kuo, Joshua B. Tenenbaum, Tianmin Shu

Multi-agent interactions, such as communication, teaching, and bluffing, often rely on higher-order social inference, i.e., understanding how others infer oneself.
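
Nested "I think that you think..." reasoning is often formalized as level-k recursion: a level-k agent best-responds to a level-(k-1) model of the other agent. As a hedged sketch with a toy mismatching game (the payoffs and level-0 policy are illustrative, not from the paper):

```python
# Level-k recursion for nested social inference, in miniature.

def best_response(payoff, actions, other_action):
    """Pick my action maximizing payoff against the other's action."""
    return max(actions, key=lambda a: payoff[(a, other_action)])

def level_k_action(k, payoff, actions, level0="heads"):
    """A level-k agent best-responds to a level-(k-1) model of the other."""
    if k == 0:
        return level0                    # level-0: fixed default policy
    other = level_k_action(k - 1, payoff, actions, level0)
    return best_response(payoff, actions, other)

actions = ["heads", "tails"]
# Mismatching game: I score 1 by choosing the opposite of the other agent.
payoff = {(a, b): 1 if a != b else 0 for a in actions for b in actions}
```

Each extra level of nesting flips the prediction, which is exactly why exact nested inference gets expensive and why amortizing it with neural networks is attractive.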

Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors

no code implementations • 6 Sep 2022 • Xi Wang, Gen Li, Yen-Ling Kuo, Muhammed Kocabas, Emre Aksan, Otmar Hilliges

We further qualitatively evaluate the effectiveness of our method on real images and demonstrate its generalizability across interaction types and object categories.

Human-Object Interaction Detection · Object

Trajectory Prediction with Linguistic Representations

no code implementations • 19 Oct 2021 • Yen-Ling Kuo, Xin Huang, Andrei Barbu, Stephen G. McGill, Boris Katz, John J. Leonard, Guy Rosman

Language allows humans to build mental models that interpret what is happening around them, resulting in more accurate long-term predictions.

Trajectory Prediction

Learning a natural-language to LTL executable semantic parser for grounded robotics

no code implementations • 7 Aug 2020 • Christopher Wang, Candace Ross, Yen-Ling Kuo, Boris Katz, Andrei Barbu

We take a step toward robots that can do the same by training a grounded semantic parser, which discovers latent linguistic representations that can be used for the execution of natural-language commands.

Sentence
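
To make the natural-language-to-LTL mapping concrete: the paper's parser learns latent linguistic representations, but the input/output relation can be illustrated with a hand-written pattern rule. This toy rule and its command template are my assumptions, not the paper's grammar:

```python
# Toy stand-in for a natural-language -> LTL semantic parser.
import re

def parse_to_ltl(command):
    """'go to X avoiding Y' -> 'F(X) & G(!Y)' (eventually X, never Y)."""
    m = re.match(r"go to (\w+) avoiding (\w+)", command)
    if not m:
        raise ValueError(f"unhandled command: {command!r}")
    goal, hazard = m.groups()
    return f"F({goal}) & G(!{hazard})"

formula = parse_to_ltl("go to flag avoiding lava")
```

A learned parser replaces the regex with representations that generalize to unseen phrasings, but the target remains an executable LTL formula.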

Compositional Networks Enable Systematic Generalization for Grounded Language Understanding

1 code implementation • Findings (EMNLP) 2021 • Yen-Ling Kuo, Boris Katz, Andrei Barbu

Recent work has shown that while deep networks can mimic some human language abilities when presented with novel sentences, systematic variation uncovers the limitations in the language-understanding abilities of networks.

Systematic Generalization

Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas

1 code implementation • 1 Jun 2020 • Yen-Ling Kuo, Boris Katz, Andrei Barbu

We demonstrate a reinforcement learning agent which uses a compositional recurrent neural network that takes as input an LTL formula and determines satisfying actions.

Multi-Task Learning · Reinforcement Learning (RL) +1
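
The key idea is that the network's structure mirrors the syntax tree of the LTL formula, composing one sub-module per operator. As a hedged, non-neural illustration of that same recursion, here is how an LTL formula represented as a nested tuple can be evaluated over a finite trace (the paper composes recurrent network modules instead of boolean checks):

```python
# Compositional recursion over an LTL syntax tree, evaluated on a
# finite trace. Each state in the trace is a set of true propositions.

def holds(formula, trace, t=0):
    op = formula[0]
    if op == "atom":                 # atomic proposition
        return formula[1] in trace[t]
    if op == "not":
        return not holds(formula[1], trace, t)
    if op == "and":
        return holds(formula[1], trace, t) and holds(formula[2], trace, t)
    if op == "F":                    # "eventually"
        return any(holds(formula[1], trace, i) for i in range(t, len(trace)))
    if op == "G":                    # "always"
        return all(holds(formula[1], trace, i) for i in range(t, len(trace)))
    raise ValueError(f"unknown operator {op!r}")

# F(red) & G(!lava): eventually reach red while never touching lava.
spec = ("and", ("F", ("atom", "red")), ("G", ("not", ("atom", "lava"))))
trace = [{"start"}, {"grass"}, {"red"}]
```

Because the composition follows the formula's structure, the same sub-modules can be recombined for formulas never seen during training, which is what enables zero-shot execution.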

Investigating the Decoders of Maximum Likelihood Sequence Models: A Look-ahead Approach

no code implementations • 8 Mar 2020 • Yu-Siang Wang, Yen-Ling Kuo, Boris Katz

We evaluate our look-ahead module on three datasets of varying difficulties: IM2LATEX-100k OCR image to LaTeX, WMT16 multimodal machine translation, and WMT14 machine translation.

Multimodal Machine Translation · Optical Character Recognition (OCR) +2
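
As a hedged sketch of what a look-ahead decoding step does: instead of greedily taking the token with the highest immediate probability, each candidate is scored by its own log-probability plus the best log-probability reachable one step later. The toy `next_probs` model below is illustrative, not the paper's module:

```python
# One-step look-ahead decoding for a sequence model, in miniature.
import math

def lookahead_step(prefix, vocab, next_probs):
    """Pick argmax_a [log p(a|prefix) + max_b log p(b|prefix + a)]."""
    best, best_score = None, -math.inf
    for a in vocab:
        p_a = next_probs(prefix).get(a, 1e-9)
        future = next_probs(prefix + [a])           # peek one step ahead
        score = math.log(p_a) + math.log(max(future.values()))
        if score > best_score:
            best, best_score = a, score
    return best

def next_probs(prefix):
    # Toy model: "b" looks slightly worse now (0.4 < 0.6) but leads to a
    # confident continuation, so look-ahead prefers it over greedy "a".
    if prefix == []:
        return {"a": 0.6, "b": 0.4}
    if prefix == ["a"]:
        return {"x": 0.5, "y": 0.5}
    if prefix == ["b"]:
        return {"x": 0.9, "y": 0.1}
    return {"<eos>": 1.0}

choice = lookahead_step([], ["a", "b"], next_probs)
```

Here greedy decoding would pick `"a"` (0.6), but the look-ahead score favors `"b"` (log 0.36 > log 0.30), illustrating why peeking ahead can change the chosen token.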

Deep compositional robotic planners that follow natural language commands

no code implementations • 12 Feb 2020 • Yen-Ling Kuo, Boris Katz, Andrei Barbu

We demonstrate how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands in a continuous configuration space to move and manipulate objects.
