Search Results for author: Yonatan Bisk

Found 83 papers, 38 papers with code

An HDP Model for Inducing Combinatory Categorial Grammars

no code implementations TACL 2013 Yonatan Bisk, Julia Hockenmaier

We introduce a novel nonparametric Bayesian model for the induction of Combinatory Categorial Grammars from POS-tagged text.

POS

Unsupervised Neural Hidden Markov Models

2 code implementations WS 2016 Ke Tran, Yonatan Bisk, Ashish Vaswani, Daniel Marcu, Kevin Knight

In this work, we present the first results for neuralizing an Unsupervised Hidden Markov Model.

TAG
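For context on the entry above: the paper replaces the multinomial parameters of an unsupervised hidden Markov model with neural networks, but the underlying inference machinery is the classic forward algorithm. A minimal pure-Python sketch of that algorithm (illustrative only, not the paper's neural parameterization):

```python
import math

def forward_loglik(pi, A, B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm in O(T * n^2) rather than by
    summing over all n^T hidden state paths."""
    n = len(pi)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                 for i in range(n)]
    return math.log(sum(alpha))
```

Neuralizing such a model amounts to producing `pi`, `A`, and `B` from network outputs while keeping this dynamic program intact.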

Evaluating Induced CCG Parsers on Grounded Semantic Parsing

1 code implementation EMNLP 2016 Yonatan Bisk, Siva Reddy, John Blitzer, Julia Hockenmaier, Mark Steedman

We compare the effectiveness of four different syntactic CCG parsers for a semantic slot-filling task to explore how much syntactic supervision is required for downstream semantic analysis.

Semantic Parsing slot-filling +1

Natural Language Inference from Multiple Premises

no code implementations IJCNLP 2017 Alice Lai, Yonatan Bisk, Julia Hockenmaier

We define a novel textual entailment task that requires inference over multiple premise sentences.

Natural Language Inference

Synthetic and Natural Noise Both Break Neural Machine Translation

3 code implementations ICLR 2018 Yonatan Belinkov, Yonatan Bisk

Character-based neural machine translation (NMT) models alleviate out-of-vocabulary issues, learn morphology, and move us closer to completely end-to-end translation systems.

Machine Translation NMT +1
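One family of synthetic noise studied in this line of work perturbs characters inside a word. A sketch of one such perturbation, swapping a random pair of adjacent internal characters (the constraint of keeping first and last characters fixed is an assumption of this sketch, not necessarily the paper's exact recipe):

```python
import random

def swap_noise(word, rng=random):
    """Swap one random pair of adjacent internal characters; words of
    length <= 3 have no internal pair and pass through unchanged."""
    if len(word) <= 3:
        return word
    i = rng.randrange(1, len(word) - 2)  # pick an internal position
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
```

Applying such perturbations to test inputs is a cheap way to probe whether a character-based NMT model has learned robust representations or merely memorized clean spellings.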

Learning Interpretable Spatial Operations in a Rich 3D Blocks World

no code implementations 10 Dec 2017 Yonatan Bisk, Kevin J. Shih, Yejin Choi, Daniel Marcu

In this paper, we study the problem of mapping natural language instructions to complex spatial actions in a 3D blocks world.

CHALET: Cornell House Agent Learning Environment

2 code implementations 23 Jan 2018 Claudia Yan, Dipendra Misra, Andrew Bennett, Aaron Walsman, Yonatan Bisk, Yoav Artzi

We present CHALET, a 3D house simulator with support for navigation and manipulation.

Balancing Shared Autonomy with Human-Robot Communication

no code implementations 20 May 2018 Rosario Scalise, Yonatan Bisk, Maxwell Forbes, Daqing Yi, Yejin Choi, Siddhartha Srinivasa

Robotic agents that share autonomy with a human should leverage human domain knowledge and account for their preferences when completing a task.

Inducing Grammars with and for Neural Machine Translation

no code implementations ACL 2018 Ke Tran, Yonatan Bisk

To address both of these issues we introduce a model that simultaneously translates while inducing dependency trees.

Machine Translation NMT +1

SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference

1 code implementation EMNLP 2018 Rowan Zellers, Yonatan Bisk, Roy Schwartz, Yejin Choi

Given a partial description like "she opened the hood of the car," humans can reason about the situation and anticipate what might come next ("then, she examined the engine").

Common Sense Reasoning Multiple-choice +2

Shifting the Baseline: Single Modality Performance on Visual Navigation & QA

no code implementations 1 Nov 2018 Jesse Thomason, Daniel Gordon, Yonatan Bisk

We demonstrate the surprising strength of unimodal baselines in multimodal domains, and make concrete recommendations for best practices in future research.

Question Answering Visual Navigation

Early Fusion for Goal Directed Robotic Vision

no code implementations 21 Nov 2018 Aaron Walsman, Yonatan Bisk, Saadia Gabriel, Dipendra Misra, Yoav Artzi, Yejin Choi, Dieter Fox

Building perceptual systems for robotics which perform well under tight computational budgets requires novel architectures which rethink the traditional computer vision pipeline.

Imitation Learning Retrieval

From Recognition to Cognition: Visual Commonsense Reasoning

4 code implementations CVPR 2019 Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi

While this task is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world.

Multiple-choice Multiple Choice Question Answering (MCQA) +1

Character-based Surprisal as a Model of Reading Difficulty in the Presence of Error

no code implementations 2 Feb 2019 Michael Hahn, Frank Keller, Yonatan Bisk, Yonatan Belinkov

Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones.
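The reading-difficulty measure behind this entry is surprisal: the negative log-probability a model assigns to the next unit. The paper models characters; as an illustration only, here is the same quantity under a toy word-level unigram model:

```python
import math
from collections import Counter

def unigram_surprisals(corpus, sentence):
    """Per-word surprisal in bits under a unigram model estimated from
    `corpus`: rarer words get higher surprisal, which surprisal theory
    links to greater predicted reading difficulty."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return [-math.log2(counts[w] / total) for w in sentence]
```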

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

1 code implementation CVPR 2019 Liyiming Ke, Xiujun Li, Yonatan Bisk, Ari Holtzman, Zhe Gan, Jingjing Liu, Jianfeng Gao, Yejin Choi, Siddhartha Srinivasa

We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding, that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language navigation challenge of Anderson et al.

Vision and Language Navigation Vision-Language Navigation

Prospection: Interpretable Plans From Language By Predicting the Future

no code implementations 20 Mar 2019 Chris Paxton, Yonatan Bisk, Jesse Thomason, Arunkumar Byravan, Dieter Fox

High-level human instructions often correspond to behaviors with multiple implicit steps.

Improving Robot Success Detection using Static Object Data

1 code implementation 2 Apr 2019 Rosario Scalise, Jesse Thomason, Yonatan Bisk, Siddhartha Srinivasa

We collect over 13 hours of egocentric manipulation data for training a model to reason about whether a robot successfully placed unseen objects in or on one another.

Object

HellaSwag: Can a Machine Really Finish Your Sentence?

2 code implementations ACL 2019 Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi

In this paper, we show that commonsense inference still proves difficult for even state-of-the-art models, by presenting HellaSwag, a new challenge dataset.

Natural Language Inference Sentence +1

Defending Against Neural Fake News

4 code implementations NeurIPS 2019 Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi

We find that best current discriminators can classify neural fake news from real, human-written, news with 73% accuracy, assuming access to a moderate level of training data.

Computer Security Fake News Detection +1

Shifting the Baseline: Single Modality Performance on Visual Navigation & QA

no code implementations NAACL 2019 Jesse Thomason, Daniel Gordon, Yonatan Bisk

We demonstrate the surprising strength of unimodal baselines in multimodal domains, and make concrete recommendations for best practices in future research.

Visual Navigation

Benchmarking Hierarchical Script Knowledge

1 code implementation NAACL 2019 Yonatan Bisk, Jan Buys, Karl Pichotta, Yejin Choi

Understanding procedural language requires reasoning about both hierarchical and temporal relations between events.

Benchmarking

Robust Navigation with Language Pretraining and Stochastic Sampling

1 code implementation IJCNLP 2019 Xiujun Li, Chunyuan Li, Qiaolin Xia, Yonatan Bisk, Asli Celikyilmaz, Jianfeng Gao, Noah Smith, Yejin Choi

Core to the vision-and-language navigation (VLN) challenge is building robust instruction representations and action decoding schemes, which can generalize well to previously unseen instructions and environments.

Vision and Language Navigation

ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

7 code implementations CVPR 2020 Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox

We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.

Natural Language Visual Grounding

Multi-View Learning for Vision-and-Language Navigation

no code implementations 2 Mar 2020 Qiaolin Xia, Xiujun Li, Chunyuan Li, Yonatan Bisk, Zhifang Sui, Jianfeng Gao, Yejin Choi, Noah A. Smith

Learning to navigate in a visual environment following natural language instructions is a challenging task because natural language instructions are highly variable, ambiguous, and under-specified.

Multi-View Learning Navigate +1

Experience Grounds Language

2 code implementations EMNLP 2020 Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian

Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates.

Representation Learning

RMM: A Recursive Mental Model for Dialog Navigation

1 code implementation 2 May 2020 Homero Roman Roman, Yonatan Bisk, Jesse Thomason, Asli Celikyilmaz, Jianfeng Gao

In this paper, we go beyond instruction following and introduce a two-agent task where one agent navigates and asks questions that a second, guiding agent answers.

Answer Generation Instruction Following

The Return of Lexical Dependencies: Neural Lexicalized PCFGs

3 code implementations 29 Jul 2020 Hao Zhu, Yonatan Bisk, Graham Neubig

In this paper we demonstrate that context-free grammar (CFG) based methods for grammar induction benefit from modeling lexical dependencies.

ALFWorld: Aligning Text and Embodied Environments for Interactive Learning

1 code implementation 8 Oct 2020 Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht

ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions.

Natural Language Visual Grounding Scene Understanding

RMM: A Recursive Mental Model for Dialogue Navigation

1 code implementation Findings of the Association for Computational Linguistics 2020 Homero Roman Roman, Yonatan Bisk, Jesse Thomason, Asli Celikyilmaz, Jianfeng Gao

In this paper, we go beyond instruction following and introduce a two-agent task where one agent navigates and asks questions that a second, guiding agent answers.

Answer Generation Instruction Following

Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games

no code implementations COLING 2020 Alessandro Suglia, Antonio Vergari, Ioannis Konstas, Yonatan Bisk, Emanuele Bastianelli, Andrea Vanzo, Oliver Lemon

However, as shown by Suglia et al. (2020), existing models fail to learn truly multi-modal representations, relying instead on gold category labels for objects in the scene both at training and inference time.

Object

Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering

1 code implementation 7 Nov 2020 Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari

Guided by a set of hypotheses, the framework studies how to transform various pre-existing knowledge resources into a form that is most effective for pre-training models.

Language Modelling Question Answering

BUTLER: Building Understanding in TextWorld via Language for Embodied Reasoning

no code implementations ICLR 2021 Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht

ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions.

Scene Understanding

Token-Level Contrast for Video and Language Alignment

no code implementations1 Jan 2021 Jianwei Yang, Yonatan Bisk, Jianfeng Gao

Building video and language understanding models requires grounding linguistic concepts and video contents into a shared space.

Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models

no code implementations NAACL (GeBNLP) 2022 Tejas Srinivasan, Yonatan Bisk

Numerous works have analyzed biases in vision and pre-trained language models individually - however, less attention has been paid to how these biases interact in multimodal settings.

Grounding 'Grounding' in NLP

no code implementations 4 Jun 2021 Khyathi Raghavi Chandu, Yonatan Bisk, Alan W Black

And finally, (3) How to advance our current definition to bridge the gap with Cognitive Science?

Few-shot Language Coordination by Modeling Theory of Mind

no code implementations 12 Jul 2021 Hao Zhu, Graham Neubig, Yonatan Bisk

Positive results from our experiments hint at the importance of explicitly modeling communication as a socio-pragmatic process.

Language Grounding with 3D Objects

2 code implementations 26 Jul 2021 Jesse Thomason, Mohit Shridhar, Yonatan Bisk, Chris Paxton, Luke Zettlemoyer

We introduce several CLIP-based models for distinguishing objects and demonstrate that while recent advances in jointly modeling vision and language are useful for robotic language understanding, it is still the case that these image-based models are weaker at understanding the 3D nature of objects -- properties which play a key role in manipulation.

TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment

no code implementations ICCV 2021 Jianwei Yang, Yonatan Bisk, Jianfeng Gao

This is motivated by the observation that for a video-text pair, the content words in the text, such as nouns and verbs, are more likely to be aligned with the visual contents in the video than the function words.

Ranked #3 on Temporal Action Localization on CrossTask (using extra training data)

Action Segmentation Contrastive Learning +5
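The contrastive objective at the heart of video-text alignment work like this is typically an InfoNCE-style loss: pull the aligned (positive) pair together and push negatives apart. A minimal single-anchor sketch over precomputed similarity scores (the cascade and token-weighting of TACo itself are not reproduced here):

```python
import math

def info_nce(sim_scores, pos_index, temperature=0.1):
    """InfoNCE loss for one anchor: sim_scores[pos_index] is the
    similarity to the aligned pair, the rest are negatives.
    Lower loss means the positive ranks higher."""
    logits = [s / temperature for s in sim_scores]
    m = max(logits)  # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[pos_index] - log_z)
```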

WebQA: Multihop and Multimodal QA

2 code implementations CVPR 2022 Yingshan Chang, Mridu Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, Yonatan Bisk

Scaling Visual Question Answering (VQA) to the open-domain and multi-hop nature of web searches requires fundamental advances in visual representation learning, knowledge aggregation, and language generation.

Image Retrieval Multimodal Reasoning +4

Dependency Induction Through the Lens of Visual Perception

1 code implementation CoNLL (EMNLP) 2021 Ruisi Su, Shruti Rijhwani, Hao Zhu, Junxian He, Xinyu Wang, Yonatan Bisk, Graham Neubig

Our experiments find that concreteness is a strong indicator for learning dependency grammars, improving the direct attachment score (DAS) by over 50% as compared to state-of-the-art models trained on pure text.

Constituency Grammar Induction Dependency Parsing
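The DAS metric cited in the snippet above is simply the fraction of tokens whose predicted head matches the gold head. A minimal sketch, representing each parse as a list of per-token head indices:

```python
def direct_attachment_score(gold_heads, pred_heads):
    """Fraction of tokens whose predicted head index equals the gold
    head index; used to score induced dependency trees against gold."""
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)
```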

Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework

no code implementations 29 Sep 2021 Khanh Xuan Nguyen, Yonatan Bisk, Hal Daumé III

Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework: aided with an interaction policy learned by our method, a navigation policy achieves up to a 7× improvement in task success rate compared to performing tasks only by itself.

Hierarchical Reinforcement Learning reinforcement-learning +1

Shaped Rewards Bias Emergent Language

no code implementations 29 Sep 2021 Brendon Boldt, Yonatan Bisk, David R Mortensen

The second is shaped rewards which are designed specifically to make the task easier to learn by introducing biases in the learning process.

Inductive Bias

Symmetric Machine Theory of Mind

no code implementations 29 Sep 2021 Melanie Sclar, Graham Neubig, Yonatan Bisk

Theory of mind (ToM), the ability to understand others' thoughts and desires, is a cornerstone of human intelligence.

FILM: Following Instructions in Language with Modular Methods

1 code implementation ICLR 2022 So Yeon Min, Devendra Singh Chaplot, Pradeep Ravikumar, Yonatan Bisk, Ruslan Salakhutdinov

In contrast, we propose a modular method with structured representations that (1) builds a semantic map of the scene and (2) performs exploration with a semantic search policy, to achieve the natural language goal.

Imitation Learning Instruction Following

A Framework for Learning to Request Rich and Contextually Useful Information from Humans

no code implementations 14 Oct 2021 Khanh Nguyen, Yonatan Bisk, Hal Daumé III

We show that the agent can take advantage of different types of information depending on the context, and analyze the benefits and challenges of learning the assistance-requesting policy when the assistant can recursively decompose tasks into subtasks.

Decision Making Hierarchical Reinforcement Learning

KAT: A Knowledge Augmented Transformer for Vision-and-Language

1 code implementation NAACL 2022 Liangke Gui, Borui Wang, Qiuyuan Huang, Alex Hauptmann, Yonatan Bisk, Jianfeng Gao

The primary focus of recent work with largescale transformers has been on optimizing the amount of information packed into the model's parameters.

Answer Generation Retrieval +1

Training Vision-Language Transformers from Captions

1 code implementation 19 May 2022 Liangke Gui, Yingshan Chang, Qiuyuan Huang, Subhojit Som, Alex Hauptmann, Jianfeng Gao, Yonatan Bisk

Vision-Language Transformers can be learned without low-level human labels (e.g. class labels, bounding boxes, etc.).

On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization

no code implementations 24 May 2022 Shruti Palaskar, Akshita Bhagia, Yonatan Bisk, Florian Metze, Alan W Black, Ana Marasović

Combining the visual modality with pretrained language models has been surprisingly effective for simple descriptive tasks such as image captioning.

Descriptive Image Captioning +5

Transformers are Adaptable Task Planners

no code implementations 6 Jul 2022 Vidhi Jain, Yixin Lin, Eric Undersander, Yonatan Bisk, Akshara Rai

Every home is different, and every person likes things done in their particular way.

Attribute

Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue

1 code implementation 10 Oct 2022 So Yeon Min, Hao Zhu, Ruslan Salakhutdinov, Yonatan Bisk

We provide empirical comparisons of metrics, analysis of three models, and make suggestions for how the field might best progress.

Imitation Learning Instruction Following

Self-Supervised Object Goal Navigation with In-Situ Finetuning

no code implementations 9 Dec 2022 So Yeon Min, Yao-Hung Hubert Tsai, Wei Ding, Ali Farhadi, Ruslan Salakhutdinov, Yonatan Bisk, Jian Zhang

In contrast, our LocCon shows the most robust transfer in the real world among the set of models we compare to, and that the real-world performance of all models can be further improved with self-supervised LocCon in-situ training.

Contrastive Learning Navigate +2

EXCALIBUR: Encouraging and Evaluating Embodied Exploration

no code implementations CVPR 2023 Hao Zhu, Raghav Kapoor, So Yeon Min, Winson Han, Jiatai Li, Kaiwen Geng, Graham Neubig, Yonatan Bisk, Aniruddha Kembhavi, Luca Weihs

Humans constantly explore and learn about their environment out of curiosity, gather information, and update their models of the world.

The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment

1 code implementation 13 Feb 2023 Jared Fernandez, Jacob Kahn, Clara Na, Yonatan Bisk, Emma Strubell

In this work, we examine this phenomenon through a series of case studies analyzing the effects of model design decisions, framework paradigms, and hardware platforms on total model latency.

Computational Efficiency

Computational Language Acquisition with Theory of Mind

1 code implementation 2 Mar 2023 Andy Liu, Hao Zhu, Emmy Liu, Yonatan Bisk, Graham Neubig

We also find some evidence that increasing task difficulty in the training process results in more fluent and precise utterances in evaluation.

Language Acquisition

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

no code implementations NeurIPS 2023 Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang

In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos.

In-Context Learning multimodal generation

WebArena: A Realistic Web Environment for Building Autonomous Agents

1 code implementation 25 Jul 2023 Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig

Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions.

MAEA: Multimodal Attribution for Embodied AI

no code implementations 25 Jul 2023 Vidhi Jain, Jayant Sravan Tamarapalli, Sahiti Yerramilli, Yonatan Bisk

Understanding multimodal perception for embodied AI is an open question because such inputs may contain highly complementary as well as redundant information for the task.

Reasoning about the Unseen for Efficient Outdoor Object Navigation

1 code implementation 18 Sep 2023 Quanting Xie, Tianyi Zhang, Kedi Xu, Matthew Johnson-Roberson, Yonatan Bisk

We introduce a new task OUTDOOR, a new mechanism for Large Language Models (LLMs) to accurately hallucinate possible futures, and a new computationally aware success metric for pushing research forward in this more complex domain.

Navigate Object

SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

1 code implementation 18 Oct 2023 Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, Maarten Sap

We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and evaluate their social intelligence.

Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

no code implementations 14 Dec 2023 Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Shibo Zhao, Yu Quan Chong, Chen Wang, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Zsolt Kira, Fei Xia, Yonatan Bisk

Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how these existing foundation models from NLP and CV can be applied to the field of robotics, and also exploring (ii) what a robotics-specific foundation model would look like.

VISREAS: Complex Visual Reasoning with Unanswerable Questions

no code implementations 23 Feb 2024 Syeda Nahida Akter, Sangwu Lee, Yingshan Chang, Yonatan Bisk, Eric Nyberg

The unique feature of this task, validating question answerability with respect to an image before answering, and the poor performance of state-of-the-art models inspired the design of a new modular baseline, LOGIC2VISION that reasons by producing and executing pseudocode without any external modules to generate the answer.

Question Answering Visual Question Answering +1

SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents

1 code implementation 13 Mar 2024 Ruiyi Wang, Haofei Yu, Wenxin Zhang, Zhengyang Qi, Maarten Sap, Graham Neubig, Yonatan Bisk, Hao Zhu

Motivated by this gap, we propose an interactive learning method, SOTOPIA-π, improving the social intelligence of language agents.

Language Modelling Large Language Model

Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation

no code implementations 25 Mar 2024 Yingshan Chang, Yasi Zhang, Zhiyuan Fang, YingNian Wu, Yonatan Bisk, Feng Gao

We hypothesize that the underlying phenomenological coverage has not been proportionally scaled up, leading to a skew of the presented phenomenon which harms generalization.

Relational Reasoning Text-to-Image Generation

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

1 code implementation 1 Apr 2024 Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander Hauptmann, Yonatan Bisk, Yiming Yang

Preference modeling techniques, such as direct preference optimization (DPO), have proven effective in enhancing the generalization abilities of large language models (LLMs).

Instruction Following Language Modelling +3
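The DPO objective referenced above has a standard closed form: a logistic loss on the policy-versus-reference log-probability margin between the chosen and rejected responses. A per-pair sketch of that standard form (the paper's video-specific reward model is not reproduced here):

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Direct preference optimization loss for one preference pair:
    -log sigmoid(beta * [(policy - ref) margin on the chosen response
                         minus (policy - ref) margin on the rejected one]).
    Inputs are sequence log-probabilities."""
    margin = ((policy_chosen_lp - ref_chosen_lp)
              - (policy_rejected_lp - ref_rejected_lp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference exactly, the margin is zero and the loss is log 2; increasing the chosen response's relative log-probability drives the loss down.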

AgentKit: Flow Engineering with Graphs, not Coding

1 code implementation 17 Apr 2024 Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen Mcaleer, Yonatan Bisk, Ruslan Salakhutdinov, Yuanzhi Li, Tom Mitchell

The chains of nodes can be designed to explicitly enforce a naturally structured "thought process".
