Search Results for author: Jesse Thomason

Found 23 papers, 13 papers with code

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

1 code implementation · ACL 2022 · Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, Xin Eric Wang

A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks.

Vision and Language Navigation

LUMINOUS: Indoor Scene Generation for Embodied AI Challenges

1 code implementation · 10 Nov 2021 · Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, Gaurav S. Sukhatme

However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts.

Indoor Scene Synthesis · Scene Generation

TEACh: Task-driven Embodied Agents that Chat

1 code implementation · 1 Oct 2021 · Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur

Robots operating in human spaces must be able to engage in natural language interaction with people, both understanding and executing instructions, and using conversation to resolve ambiguity and recover from mistakes.

Dialogue Understanding

Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion

1 code implementation · 10 Aug 2021 · Alessandro Suglia, Qiaozi Gao, Jesse Thomason, Govind Thattai, Gaurav Sukhatme

Language-guided robots performing home and office tasks must navigate in and interact with the world.

Language Grounding with 3D Objects

1 code implementation · 26 Jul 2021 · Jesse Thomason, Mohit Shridhar, Yonatan Bisk, Chris Paxton, Luke Zettlemoyer

We introduce several CLIP-based models for distinguishing objects and demonstrate that while recent advances in jointly modeling vision and language are useful for robotic language understanding, it is still the case that these image-based models are weaker at understanding the 3D nature of objects -- properties which play a key role in manipulation.
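The scoring idea behind such CLIP-based models can be sketched without the model itself: embed a language description and each candidate object, then pick the candidate whose embedding is most similar to the description. The embeddings and object names below are hand-made, hypothetical stand-ins for real encoder outputs.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_object(language_embedding, object_embeddings):
    # Return the candidate whose image embedding best matches the language
    # embedding, as a CLIP-style scorer would rank them.
    return max(object_embeddings,
               key=lambda name: cosine(language_embedding, object_embeddings[name]))

# Hypothetical embeddings standing in for trained image-encoder outputs.
objects = {
    "mug":   [0.9, 0.1, 0.0],
    "plate": [0.1, 0.8, 0.2],
}
print(match_object([0.85, 0.15, 0.05], objects))  # -> mug
```

The paper's observation is that purely image-based embeddings like these miss 3D properties of objects that matter for manipulation.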

RMM: A Recursive Mental Model for Dialogue Navigation

1 code implementation · Findings of the Association for Computational Linguistics 2020 · Homero Roman Roman, Yonatan Bisk, Jesse Thomason, Asli Celikyilmaz, Jianfeng Gao

In this paper, we go beyond instruction following and introduce a two-agent task where one agent navigates and asks questions that a second, guiding agent answers.

Answer Generation
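The navigator–guide loop of this two-agent task can be illustrated on a toy 1-D corridor; everything here (the corridor, the fixed question, the one-word answers) is a hypothetical simplification of the actual free-form-language setting.

```python
def run_dialog_navigation(start, goal, max_turns=10):
    # Toy loop: the navigating agent moves along a 1-D corridor and, when
    # unsure, asks the guiding agent (who can see the goal) which way to go.
    position = start
    transcript = []
    for _ in range(max_turns):
        if position == goal:
            break
        question = "Which way to the goal?"
        answer = "right" if goal > position else "left"  # guide's reply
        transcript.append((question, answer))
        position += 1 if answer == "right" else -1
    return position, transcript

pos, log = run_dialog_navigation(start=0, goal=3)
print(pos, len(log))  # reaches the goal after three question-answer turns
```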

The RobotSlang Benchmark: Dialog-guided Robot Localization and Navigation

no code implementations · 23 Oct 2020 · Shurjo Banerjee, Jesse Thomason, Jason J. Corso

In each trial, the pair first cooperates to localize the robot on a global map visible to the Commander, then the Driver follows Commander instructions to move the robot to a sequence of target objects.

Simultaneous Localization and Mapping

RMM: A Recursive Mental Model for Dialog Navigation

1 code implementation · 2 May 2020 · Homero Roman Roman, Yonatan Bisk, Jesse Thomason, Asli Celikyilmaz, Jianfeng Gao

In this paper, we go beyond instruction following and introduce a two-agent task where one agent navigates and asks questions that a second, guiding agent answers.

Answer Generation

Experience Grounds Language

2 code implementations · EMNLP 2020 · Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian

Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates.

Representation Learning

ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

5 code implementations · CVPR 2020 · Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox

We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.

Natural Language Visual Grounding
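A single example in a benchmark like this pairs language at two granularities (a high-level goal plus step-by-step instructions) with a grounded action sequence. The record below is a hypothetical simplification for illustration, not ALFRED's exact annotation schema.

```python
# Hypothetical, simplified trajectory record (not ALFRED's exact schema).
trajectory = {
    "goal": "Put a clean mug on the coffee table.",
    "step_instructions": [
        "Walk to the sink and pick up the mug.",
        "Rinse the mug under the faucet.",
        "Carry the mug to the coffee table and set it down.",
    ],
    "actions": [
        {"api": "GotoLocation", "arg": "sink"},
        {"api": "PickupObject", "arg": "mug"},
        {"api": "CleanObject", "arg": "mug"},
        {"api": "GotoLocation", "arg": "coffee_table"},
        {"api": "PutObject", "arg": "coffee_table"},
    ],
}

def action_sequence(traj):
    # Flatten the records into the (api, argument) tuples a model must predict.
    return [(a["api"], a["arg"]) for a in traj["actions"]]

print(action_sequence(trajectory)[0])  # -> ('GotoLocation', 'sink')
```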

Vision-and-Dialog Navigation

2 code implementations · 10 Jul 2019 · Jesse Thomason, Michael Murray, Maya Cakmak, Luke Zettlemoyer

To train agents that search an environment for a goal location, we define the Navigation from Dialog History task.

Visual Navigation

Shifting the Baseline: Single Modality Performance on Visual Navigation & QA

no code implementations · NAACL 2019 · Jesse Thomason, Daniel Gordon, Yonatan Bisk

We demonstrate the surprising strength of unimodal baselines in multimodal domains, and make concrete recommendations for best practices in future research.

Visual Navigation
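The kind of unimodal baseline the paper advocates measuring can be sketched as an agent that accepts but ignores the visual input and acts on instruction keywords alone; the episodes below are invented for illustration. If such a baseline scores well, the benchmark may not actually require the visual modality.

```python
def language_only_agent(instruction, observation=None):
    # Deliberately unimodal: the visual observation is accepted but ignored.
    if "left" in instruction:
        return "turn_left"
    if "right" in instruction:
        return "turn_right"
    return "move_forward"

# Invented evaluation episodes: (instruction, image frame, gold action).
episodes = [
    ("turn left at the hallway", "frame_023.png", "turn_left"),
    ("go right past the sofa", "frame_051.png", "turn_right"),
    ("walk forward to the door", "frame_007.png", "move_forward"),
]

correct = sum(language_only_agent(ins, obs) == gold for ins, obs, gold in episodes)
print(f"language-only accuracy: {correct}/{len(episodes)}")
```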

Improving Robot Success Detection using Static Object Data

1 code implementation · 2 Apr 2019 · Rosario Scalise, Jesse Thomason, Yonatan Bisk, Siddhartha Srinivasa

We collect over 13 hours of egocentric manipulation data for training a model to reason about whether a robot successfully placed unseen objects in or on one another.

Interpreting Black Box Models via Hypothesis Testing

1 code implementation · 29 Mar 2019 · Collin Burns, Jesse Thomason, Wesley Tansey

In science and medicine, model interpretations may be reported as discoveries of natural phenomena or used to guide patient treatments.

Two-sample testing
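The tag refers to two-sample testing; a minimal version of that machinery, a permutation test on the difference in sample means, can be sketched in plain Python. The data below are made up, and this is a generic test, not the paper's specific procedure for interpreting black-box models.

```python
import random

def permutation_test(a, b, n_permutations=2000, seed=0):
    # Two-sample permutation test: estimate a p-value for the observed
    # difference in means under the null that both samples share one
    # distribution, by repeatedly re-splitting the pooled data.
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b  # new list; the inputs are not mutated
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / n_permutations

# Made-up samples with clearly different means.
p = permutation_test([5.1, 4.8, 5.3, 5.0], [3.2, 3.1, 3.4, 3.0])
print(p < 0.05)
```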

Prospection: Interpretable Plans From Language By Predicting the Future

no code implementations · 20 Mar 2019 · Chris Paxton, Yonatan Bisk, Jesse Thomason, Arunkumar Byravan, Dieter Fox

High-level human instructions often correspond to behaviors with multiple implicit steps.

Shifting the Baseline: Single Modality Performance on Visual Navigation & QA

no code implementations · 1 Nov 2018 · Jesse Thomason, Daniel Gordon, Yonatan Bisk

We demonstrate the surprising strength of unimodal baselines in multimodal domains, and make concrete recommendations for best practices in future research.

Question Answering · Visual Navigation

Interpretable Low-Dimensional Regression via Data-Adaptive Smoothing

no code implementations · 6 Aug 2017 · Wesley Tansey, Jesse Thomason, James G. Scott

We consider the problem of estimating a regression function in the common situation where the number of features is small, where interpretability of the model is a high priority, and where simple linear or additive models fail to provide adequate performance.

Additive models · Denoising
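A classical starting point for the smoothing-based regression described here is the Nadaraya-Watson kernel estimator, sketched below with a fixed bandwidth and made-up data; the paper's contribution is choosing the degree of smoothing adaptively from data, which this toy version does not do.

```python
import math

def kernel_smooth(xs, ys, x0, bandwidth=1.0):
    # Nadaraya-Watson estimate at x0: a weighted average of the observed ys,
    # with Gaussian weights that decay with distance from x0.
    weights = [math.exp(-((x - x0) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total

# Made-up noisy samples of roughly y = x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]
print(kernel_smooth(xs, ys, 2.0, bandwidth=0.5))
```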

Guiding Interaction Behaviors for Multi-modal Grounded Language Learning

no code implementations · WS 2017 · Jesse Thomason, Jivko Sinapov, Raymond Mooney

Multi-modal grounded language learning connects language predicates to physical properties of objects in the world.

Grounded language learning
