no code implementations • 25 Mar 2024 • Ishika Singh, David Traum, Jesse Thomason
We demonstrate that LLM-based goal decomposition leads to faster planning times than solving multi-agent PDDL problems directly, while simultaneously achieving fewer plan execution steps than a single-agent plan alone and preserving execution success.
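A minimal sketch of the decomposition idea, assuming hypothetical `query_llm` and `plan` helpers rather than the paper's actual pipeline: the LLM splits one joint goal into per-agent sub-goals, and each sub-goal is then planned independently.

```python
# Sketch of LLM-based goal decomposition for multi-agent planning.
# `query_llm` and `plan` are hypothetical stand-ins, not the paper's code.
from typing import Callable

def decompose_goal(goal: str, num_agents: int,
                   query_llm: Callable[[str], str]) -> list[str]:
    """Ask an LLM to split one goal into per-agent sub-goals."""
    prompt = (
        f"Decompose the goal '{goal}' into {num_agents} independent "
        "sub-goals, one per line, one per agent."
    )
    return query_llm(prompt).strip().splitlines()[:num_agents]

def plan_multi_agent(goal: str, num_agents: int,
                     query_llm: Callable[[str], str],
                     plan: Callable[[str], list[str]]) -> list[list[str]]:
    """Plan each sub-goal separately instead of one joint PDDL problem."""
    sub_goals = decompose_goal(goal, num_agents, query_llm)
    # Each single-agent problem is far smaller than the joint search
    # space, which is where the planning-time savings come from.
    return [plan(sub_goal) for sub_goal in sub_goals]
```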
no code implementations • 16 Mar 2024 • Anthony Liang, Jesse Thomason, Erdem Biyik
Using ViSaRL to learn visual representations significantly improves the success rate, sample efficiency, and generalization of an RL agent on diverse tasks, including the DeepMind Control benchmark and robot manipulation both in simulation and on a real robot.
no code implementations • 23 Feb 2024 • Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu
Prior work on selective prediction minimizes incorrect predictions from vision-language models (VLMs) by allowing them to abstain from answering when uncertain.
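One minimal way to operationalize abstention, assuming the model exposes per-answer log-probabilities; the confidence source and threshold here are illustrative, not the paper's calibration method.

```python
# Sketch of selective prediction: abstain when the model's confidence in
# its top answer falls below a threshold.
import math

def selective_predict(answer_logprobs: dict[str, float],
                      threshold: float = 0.5) -> str | None:
    """Return the top answer, or None (abstain) if confidence is low."""
    answer, logprob = max(answer_logprobs.items(), key=lambda kv: kv[1])
    confidence = math.exp(logprob)
    return answer if confidence >= threshold else None

# Example: the model is unsure between two answers, so it abstains
# rather than risk an incorrect prediction.
print(selective_predict({"cat": math.log(0.4), "dog": math.log(0.35)}))  # None
```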
no code implementations • 21 Feb 2024 • Woojeong Jin, Tejas Srinivasan, Jesse Thomason, Xiang Ren
We present WinoViz, a text-only evaluation dataset consisting of 1,380 examples that probe the reasoning abilities of language models regarding the variant visual properties of objects under different contexts or states.
1 code implementation • 13 Feb 2024 • Wilbert Pumacay, Ishika Singh, Jiafei Duan, Ranjay Krishna, Jesse Thomason, Dieter Fox
To realize effective large-scale, real-world robotic applications, we must evaluate how well our robot policies adapt to changes in environmental conditions.
Ranked #1 on Robot Manipulation Generalization on The COLOSSEUM
no code implementations • 28 Nov 2023 • Wang Zhu, Ishika Singh, Yuan Huang, Robin Jia, Jesse Thomason
Data augmentation via back-translation is common when pretraining Vision-and-Language Navigation (VLN) models, even though the generated instructions are noisy.
no code implementations • 16 Nov 2023 • Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova
Understanding visually situated language requires recognizing text and visual elements, and interpreting complex layouts.
no code implementations • 15 Nov 2023 • Ting-Yun Chang, Jesse Thomason, Robin Jia
Large language models (LLMs) can memorize many pretrained sequences verbatim.
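A common way to operationalize verbatim memorization is to feed the model a prefix of a training sequence, decode greedily, and check whether the continuation matches exactly. The sketch below uses the public `gpt2` checkpoint and a 32-token prefix purely as illustrative choices, not necessarily the paper's definition.

```python
# Sketch of a verbatim-memorization check via greedy continuation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def is_memorized(text: str, prefix_tokens: int = 32) -> bool:
    """Greedy-decode from a prefix; check for an exact continuation match."""
    ids = tok(text, return_tensors="pt").input_ids[0]
    prefix, target = ids[:prefix_tokens], ids[prefix_tokens:]
    with torch.no_grad():
        out = model.generate(prefix.unsqueeze(0),
                             max_new_tokens=len(target),
                             do_sample=False)  # greedy decoding
    # torch.equal is False if the model stops early or diverges anywhere.
    return torch.equal(out[0, prefix_tokens:], target)
```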
no code implementations • 12 Nov 2023 • Chancharik Mitra, Abrar Anwar, Rodolfo Corona, Dan Klein, Trevor Darrell, Jesse Thomason
In this work, we consider the task of resolving object referents when given a comparative language description.
1 code implementation • 30 Sep 2023 • Lee Kezar, Riley Carlin, Tejas Srinivasan, Zed Sehyr, Naomi Caselli, Jesse Thomason
Specifically, we explore how learning strategies like multi-task and curriculum learning can leverage mutually useful information between phoneme types to facilitate better modeling of sign language phonemes.
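As a hedged illustration of the multi-task setup, one shared encoder can feed a separate classification head per phoneme type, with the per-type losses summed so each task's supervision shapes the shared representation; the layer sizes and head names below are assumptions, not the paper's architecture.

```python
# Sketch of a multi-task model over sign language phoneme types.
import torch
import torch.nn as nn

class MultiTaskPhonemeModel(nn.Module):
    def __init__(self, feat_dim: int, classes_per_type: dict[str, int]):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        # One classification head per phoneme type (e.g. handshape, movement).
        self.heads = nn.ModuleDict(
            {name: nn.Linear(256, n) for name, n in classes_per_type.items()})

    def forward(self, x: torch.Tensor) -> dict[str, torch.Tensor]:
        h = self.encoder(x)
        return {name: head(h) for name, head in self.heads.items()}

def multitask_loss(logits: dict[str, torch.Tensor],
                   labels: dict[str, torch.Tensor]) -> torch.Tensor:
    """Sum of per-type cross-entropy losses over the shared encoder."""
    return sum(nn.functional.cross_entropy(logits[t], labels[t])
               for t in logits)
```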
1 code implementation • 30 Sep 2023 • Lee Kezar, Elana Pontecorvo, Adele Daniels, Connor Baer, Ruth Ferster, Lauren Berger, Jesse Thomason, Zed Sevcikova Sehyr, Naomi Caselli
Sign language recognition and translation technologies have the potential to increase access and inclusion of deaf signing communities, but research progress is bottlenecked by a lack of representative data.
no code implementations • 24 May 2023 • Wang Zhu, Jesse Thomason, Robin Jia
We train a language model (LM) to robustly answer multi-step questions by generating and answering sub-questions.
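A minimal sketch of the decompose-then-answer loop, abstracting the LM as a callable; the prompt formats are illustrative, not the paper's.

```python
# Sketch of answering a multi-step question via sub-questions.
from typing import Callable

def answer_multistep(question: str, lm: Callable[[str], str]) -> str:
    """Decompose, answer each sub-question, then answer the full question."""
    # 1. Ask the model to decompose the question into sub-questions.
    subqs = lm("List the sub-questions needed to answer: " + question)
    context = ""
    # 2. Answer each sub-question, carrying earlier answers as context.
    for subq in subqs.splitlines():
        sub_answer = lm(context + "Q: " + subq + "\nA:")
        context += "Q: " + subq + "\nA: " + sub_answer + "\n"
    # 3. Answer the original question given the accumulated QA chain.
    return lm(context + "Q: " + question + "\nA:")
```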
1 code implementation • 4 Apr 2023 • Tejas Srinivasan, Furong Jia, Mohammad Rostami, Jesse Thomason
We propose Improvise to Initialize (I2I), a continual learning algorithm that initializes Adapters for incoming tasks by distilling knowledge from previously-learned tasks' Adapters.
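A rough sketch of adapter distillation in this spirit: train the new task's Adapter to match the outputs of previously learned Adapters on new-task inputs before fine-tuning begins. The bottleneck architecture and the mean-output distillation target are simplifying assumptions, not I2I's exact procedure.

```python
# Sketch of initializing a new Adapter by distilling from earlier ones.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project."""
    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))

    def forward(self, x):
        return x + self.net(x)  # residual connection around the bottleneck

def distill_init(new: Adapter, old: list[Adapter],
                 batches: list[torch.Tensor], steps: int = 100) -> None:
    """Train `new` to match the mean output of `old` adapters on new data."""
    opt = torch.optim.Adam(new.parameters(), lr=1e-3)
    for _ in range(steps):
        for x in batches:
            with torch.no_grad():
                target = torch.stack([a(x) for a in old]).mean(dim=0)
            loss = nn.functional.mse_loss(new(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```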
no code implementations • 25 Mar 2023 • Yuliang Cai, Jesse Thomason, Mohammad Rostami
The size and computational load of fine-tuning large-scale pre-trained neural networks are becoming two major obstacles to adopting machine learning in many applications.
1 code implementation • 27 Feb 2023 • Allen Chang, Xiaoyuan Zhu, Aarav Monga, Seoho Ahn, Tejas Srinivasan, Jesse Thomason
Benchmarks for language-guided embodied agents typically assume text-based instructions, but deployed agents will encounter spoken instructions.
2 code implementations • 11 Feb 2023 • Lee Kezar, Jesse Thomason, Zed Sevcikova Sehyr
We use insights from research on American Sign Language (ASL) phonology to train models for isolated sign language recognition (ISLR), a step towards automatic sign language understanding.
no code implementations • 30 Jan 2023 • Gunnar A. Sigurdsson, Jesse Thomason, Gaurav S. Sukhatme, Robinson Piramuthu
Armed with this intuition, using only a generic vision-language scoring model with minor modifications for 3D encoding and operating in an embodied environment, we demonstrate an absolute performance gain of 9.84% on remote object grounding above state-of-the-art models for REVERIE and of 5.04% on FAO.
no code implementations • 30 Nov 2022 • Vishnu Sashank Dorbala, Gunnar Sigurdsson, Robinson Piramuthu, Jesse Thomason, Gaurav S. Sukhatme
Our results on the coarse-grained instruction following task of REVERIE demonstrate the navigational capability of CLIP, surpassing the supervised baseline in terms of both success rate (SR) and success weighted by path length (SPL).
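SPL here is the standard metric of Anderson et al. (2018): success weighted by the ratio of shortest-path length to the length of the path actually taken. A small reference implementation:

```python
# Success weighted by Path Length (SPL): success on each episode is
# discounted by how much longer the agent's path was than the shortest one.

def spl(successes: list[bool], shortest: list[float],
        taken: list[float]) -> float:
    """Mean over episodes of S_i * l_i / max(p_i, l_i)."""
    terms = [s * l / max(p, l)
             for s, l, p in zip(successes, shortest, taken)]
    return sum(terms) / len(terms)

# An agent that succeeds but takes twice the shortest path scores 0.5
# on that episode; a failed episode scores 0 regardless of path length.
print(spl([True, False], [10.0, 8.0], [20.0, 8.0]))  # 0.25
```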
no code implementations • 26 Oct 2022 • Wang Zhu, Jesse Thomason, Robin Jia
For vision-and-language reasoning tasks, both fully connectionist, end-to-end methods and hybrid, neuro-symbolic methods have achieved high in-distribution performance.
no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu
We present a retrospective on the state of Embodied AI research.
no code implementations • CVPR 2023 • Jacob Krantz, Shurjo Banerjee, Wang Zhu, Jason Corso, Peter Anderson, Stefan Lee, Jesse Thomason
We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for evaluating language-guided agents navigating in a persistent environment over time.
no code implementations • 22 Sep 2022 • Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, Animesh Garg
To ameliorate that effort, large language models (LLMs) can be used to score potential next actions during task planning, and even generate action sequences directly, given an instruction in natural language with no additional domain information.
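One way to realize the scoring idea is to rank candidate next actions by their summed token log-probabilities under the LM, conditioned on the instruction and the plan so far; the model checkpoint and prompt handling below are illustrative assumptions, not the paper's system.

```python
# Sketch of scoring candidate actions with an LLM's token log-probabilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def action_logprob(prompt: str, action: str) -> float:
    """Sum of log p(action tokens | prompt, earlier action tokens)."""
    n = tok(prompt, return_tensors="pt").input_ids.shape[1]
    # Assumes prompt+action tokenizes so the first n tokens are the prompt
    # (roughly true for GPT-2 BPE when the action starts with a space).
    full = tok(prompt + action, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = model(full).logits.log_softmax(dim=-1)
    action_tokens = full[0, n:]
    # Position t's logits predict token t+1, hence the n-1 offset.
    scores = logprobs[0, n - 1:-1].gather(1, action_tokens.unsqueeze(1))
    return scores.sum().item()

def best_action(prompt: str, candidates: list[str]) -> str:
    """Pick the candidate action the LM finds most probable next."""
    return max(candidates, key=lambda a: action_logprob(prompt, a))
```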
1 code implementation • 18 Aug 2022 • Georgios Chochlakis, Tejas Srinivasan, Jesse Thomason, Shrikanth Narayanan
VAuLT is an extension of the popular Vision-and-Language Transformer (ViLT), and improves performance on vision-and-language (VL) tasks that involve more complex text inputs than image captions while having minimal impact on training and inference efficiency.
no code implementations • 29 Jul 2022 • Tejas Srinivasan, Xiang Ren, Jesse Thomason
Aligning image and text encoders from scratch using contrastive learning requires large amounts of paired image-text data.
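For context, a minimal version of the CLIP-style symmetric contrastive (InfoNCE) objective that such from-scratch alignment optimizes over a batch of paired embeddings:

```python
# Minimal CLIP-style contrastive loss over paired image/text embeddings;
# matched pairs sit on the diagonal of the similarity matrix.
import torch
import torch.nn.functional as F

def contrastive_loss(img: torch.Tensor, txt: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss on L2-normalized embeddings (batch, dim)."""
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    logits = img @ txt.t() / temperature   # (batch, batch) similarities
    labels = torch.arange(img.shape[0])    # i-th image matches i-th text
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```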
no code implementations • 1 Jul 2022 • Sara Mohammadinejad, Jesse Thomason, Jyotirmoy V. Deshmukh
In this work, we propose DIALOGUESTL, an interactive approach for learning correct and concise STL formulas from (often) ambiguous NL descriptions.
1 code implementation • 18 Jun 2022 • Tejas Srinivasan, Ting-Yun Chang, Leticia Leonor Pinto Alva, Georgios Chochlakis, Mohammad Rostami, Jesse Thomason
Existing CL benchmarks have facilitated research on task adaptation and mitigating "catastrophic forgetting", but are limited to vision-only and language-only tasks.
1 code implementation • ACL 2022 • Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, Xin Eric Wang
A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks.
1 code implementation • 10 Nov 2021 • Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, Gaurav S. Sukhatme
However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts.
3 code implementations • 1 Oct 2021 • Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur
Robots operating in human spaces must be able to engage in natural language interaction with people, both understanding and executing instructions, and using conversation to resolve ambiguity and recover from mistakes.
1 code implementation • 10 Aug 2021 • Alessandro Suglia, Qiaozi Gao, Jesse Thomason, Govind Thattai, Gaurav Sukhatme
Language-guided robots performing home and office tasks must navigate in and interact with the world.
2 code implementations • 26 Jul 2021 • Jesse Thomason, Mohit Shridhar, Yonatan Bisk, Chris Paxton, Luke Zettlemoyer
We introduce several CLIP-based models for distinguishing objects and demonstrate that, while recent advances in jointly modeling vision and language are useful for robotic language understanding, these image-based models remain weaker at understanding the 3D nature of objects, properties which play a key role in manipulation.
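As a hedged sketch of CLIP-based object disambiguation (not the paper's exact models): score each candidate object crop against the language description and select the best match, here using the public openai/clip-vit-base-patch32 checkpoint.

```python
# Sketch of picking a referent object with CLIP image-text similarity.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def pick_object(description: str, crops: list[Image.Image]) -> int:
    """Return the index of the crop best matching the description."""
    inputs = processor(text=[description], images=crops,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        sims = model(**inputs).logits_per_text  # (1, num_crops)
    return int(sims.argmax())
```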
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Homero Roman Roman, Yonatan Bisk, Jesse Thomason, Asli Celikyilmaz, Jianfeng Gao
In this paper, we go beyond instruction following and introduce a two-agent task where one agent navigates and asks questions that a second, guiding agent answers.
no code implementations • 23 Oct 2020 • Shurjo Banerjee, Jesse Thomason, Jason J. Corso
In each trial, the pair first cooperates to localize the robot on a global map visible to the Commander, then the Driver follows Commander instructions to move the robot to a sequence of target objects.
2 code implementations • EMNLP 2020 • Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian
Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates.
7 code implementations • CVPR 2020 • Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox
We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.
2 code implementations • 10 Jul 2019 • Jesse Thomason, Michael Murray, Maya Cakmak, Luke Zettlemoyer
To train agents that search an environment for a goal location, we define the Navigation from Dialog History task.
no code implementations • NAACL 2019 • Jesse Thomason, Daniel Gordon, Yonatan Bisk
We demonstrate the surprising strength of unimodal baselines in multimodal domains, and make concrete recommendations for best practices in future research.
1 code implementation • 2 Apr 2019 • Rosario Scalise, Jesse Thomason, Yonatan Bisk, Siddhartha Srinivasa
We collect over 13 hours of egocentric manipulation data for training a model to reason about whether a robot successfully placed unseen objects in or on one another.
1 code implementation • 29 Mar 2019 • Collin Burns, Jesse Thomason, Wesley Tansey
In science and medicine, model interpretations may be reported as discoveries of natural phenomena or used to guide patient treatments.
no code implementations • 20 Mar 2019 • Chris Paxton, Yonatan Bisk, Jesse Thomason, Arunkumar Byravan, Dieter Fox
High-level human instructions often correspond to behaviors with multiple implicit steps.
1 code implementation • 1 Mar 2019 • Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, Raymond J. Mooney
Natural language understanding for robotics can require substantial domain- and platform-specific engineering.
no code implementations • IJCNLP 2017 • Rodolfo Corona, Jesse Thomason, Raymond Mooney
Speech is a natural channel for human-computer interaction in robotics and consumer applications.
no code implementations • 6 Aug 2017 • Wesley Tansey, Jesse Thomason, James G. Scott
We consider the problem of estimating a regression function in the common situation where the number of features is small, where interpretability of the model is a high priority, and where simple linear or additive models fail to provide adequate performance.
no code implementations • WS 2017 • Jesse Thomason, Jivko Sinapov, Raymond Mooney
Multi-modal grounded language learning connects language predicates to physical properties of objects in the world.
no code implementations • EACL 2017 • Aishwarya Padmakumar, Jesse Thomason, Raymond J. Mooney
Natural language understanding and dialog management are two integral components of interactive dialog systems.