Search Results for author: Joyce Chai

Found 52 papers, 36 papers with code

Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use

1 code implementation 31 Oct 2024 Jiajun Xi, Yinong He, Jianing Yang, Yinpei Dai, Joyce Chai

In real-world scenarios, it is desirable for embodied agents to have the ability to leverage human language to gain explicit or implicit knowledge for learning tasks.

Diversity Informativeness +1

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities

1 code implementation 22 Oct 2024 Zheyuan Zhang, Fengyuan Hu, Jayjun Lee, Freda Shi, Parisa Kordjamshidi, Joyce Chai, Ziqiao Ma

Spatial expressions in situated communication can be ambiguous, as their meanings vary depending on the frames of reference (FoR) adopted by speakers and listeners.

Spatial Reasoning

RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning

no code implementations 23 Sep 2024 Yinpei Dai, Jayjun Lee, Nima Fazeli, Joyce Chai

Developing robust and correctable visuomotor policies for robotic manipulation is challenging due to the lack of self-recovery mechanisms from failures and the limitations of simple language instructions in guiding robot actions.

Imitation Learning Language Modelling

Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

1 code implementation 9 Jul 2024 Yue Zhang, Ziqiao Ma, Jialu Li, Yanyuan Qiao, Zun Wang, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi

Vision-and-Language Navigation (VLN) has gained increasing attention over recent years, and many approaches have emerged to advance its development.

Vision and Language Navigation

Multi-Object Hallucination in Vision-Language Models

1 code implementation 8 Jul 2024 Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai

Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images.

Hallucination Object Hallucination

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

1 code implementation 7 Jun 2024 Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David F. Fouhey, Joyce Chai

The integration of language and 3D perception is crucial for developing embodied agents and robots that comprehend and interact with the physical world.

Hallucination

DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

1 code implementation 5 Jun 2024 Yidong Huang, Jacob Sansom, Ziqiao Ma, Felix Gervits, Joyce Chai

Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are preliminary, over-simplified, and fail to capture the complexity of real-world driving scenarios in human environments.

Autonomous Driving Language Modelling +2

Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations

1 code implementation 22 May 2024 Ziqiao Ma, Zekun Wang, Joyce Chai

In this work, we aim to examine how corrective feedback from interactions influences neural language acquisition from the ground up through systematically controlled experiments, assessing whether it contributes to learning efficiency in language models.

Language Acquisition Language Modelling

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

no code implementations CVPR 2024 Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai

Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens.

Causal Language Modeling Generalized Referring Expression Segmentation +3
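
As a rough illustration of the location-token scheme this abstract contrasts against (GROUNDHOG itself moves away from it toward holistic segmentation), the sketch below shows how a bounding box is commonly serialized into discrete location tokens for a causal language model; the 1000-bin quantization, the <loc_i> token format, and the example values are illustrative assumptions, not any particular model's vocabulary.

# Minimal sketch (not GROUNDHOG): serializing a grounded object's bounding box
# into discrete "location tokens" that a causal language model can emit.
# The bin count and token format below are assumptions for illustration.
def box_to_location_tokens(box, image_w, image_h, num_bins=1000):
    """Quantize an (x1, y1, x2, y2) pixel box into <loc_i> tokens."""
    dims = (image_w, image_h, image_w, image_h)
    bins = [min(c * num_bins // d, num_bins - 1) for c, d in zip(box, dims)]
    return [f"<loc_{b:03d}>" for b in bins]

# A grounded phrase could then be serialized as, e.g.,
# "a red mug <loc_120><loc_335><loc_410><loc_672>".
print("".join(box_to_location_tokens((96, 268, 328, 538), 800, 800)))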

Inversion-Free Image Editing with Language-Guided Diffusion Models

1 code implementation CVPR 2024 Sihan Xu, Yidong Huang, Jiayi Pan, Ziqiao Ma, Joyce Chai

Despite recent advances in inversion-based editing, text-guided image manipulation remains challenging for diffusion models.

Denoising Image Manipulation

Inversion-Free Image Editing with Natural Language

1 code implementation 7 Dec 2023 Sihan Xu, Yidong Huang, Jiayi Pan, Ziqiao Ma, Joyce Chai

We show that when the initial sample is known, a special variance schedule reduces the denoising step to the same form as the multi-step consistency sampling.

Image Manipulation Text-based Image Editing
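
A minimal sketch of the reduction this abstract alludes to, written in the standard DDIM parameterization; the specific schedule shown is an illustrative assumption and may differ in detail from the paper's exact derivation. A DDIM denoising step takes the form

$$x_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat{x}_0 \;+\; \sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\;\epsilon_\theta(x_t, t) \;+\; \sigma_t z, \qquad z \sim \mathcal{N}(0, I).$$

Choosing the variance schedule $\sigma_t^2 = 1-\bar\alpha_{t-1}$ zeroes out the middle term, leaving

$$x_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat{x}_0 + \sqrt{1-\bar\alpha_{t-1}}\,z,$$

i.e., predict $\hat{x}_0$ and re-noise it to level $t-1$, which is the same form as a multi-step consistency sampling step; when the initial sample is known, it can anchor $\hat{x}_0$, so no inversion of the source image is required.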

Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties

1 code implementation 28 Nov 2023 Keunwoo Peter Yu, Zheyuan Zhang, Fengyuan Hu, Shane Storks, Joyce Chai

Our results, analysis, and EILEV-trained models yield numerous insights about the emergence of in-context learning over video and text, creating a foundation for future work to optimize and scale VLMs for open-domain video understanding and reasoning.

In-Context Learning Video Understanding

GIPCOL: Graph-Injected Soft Prompting for Compositional Zero-Shot Learning

1 code implementation 9 Nov 2023 Guangyue Xu, Joyce Chai, Parisa Kordjamshidi

In this work, we propose GIPCOL (Graph-Injected Soft Prompting for COmpositional Learning) to better explore the compositional zero-shot learning (CZSL) ability of VLMs within the prompt-based learning framework.

Attribute Compositional Zero-Shot Learning

MetaReVision: Meta-Learning with Retrieval for Visually Grounded Compositional Concept Acquisition

no code implementations 2 Nov 2023 Guangyue Xu, Parisa Kordjamshidi, Joyce Chai

Inspired by this observation, in this paper, we propose MetaReVision, a retrieval-enhanced meta-learning model to address the visually grounded compositional concept learning problem.

Meta-Learning Retrieval

Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?

1 code implementation 1 Nov 2023 Yuwei Bao, Keunwoo Peter Yu, Yichi Zhang, Shane Storks, Itamar Bar-Yossef, Alexander De La Iglesia, Megan Su, Xiao Lin Zheng, Joyce Chai

Despite tremendous advances in AI, it remains a significant challenge to develop interactive task guidance systems that can offer situated, personalized guidance and assist humans in various tasks.

Decision Making

Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?

1 code implementation 31 Oct 2023 Yichi Zhang, Jiayi Pan, Yuchen Zhou, Rui Pan, Joyce Chai

Vision-Language Models (VLMs) are trained on vast amounts of data captured by humans emulating our understanding of the world.

Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models

1 code implementation 30 Oct 2023 Ziqiao Ma, Jacob Sansom, Run Peng, Joyce Chai

Such situated evaluation provides a more comprehensive assessment of mental states and potentially mitigates the risk of shortcuts and data leakage.

Position Theory of Mind Modeling

From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning

1 code implementation 24 Oct 2023 Zheyuan Zhang, Shane Storks, Fengyuan Hu, Sungryull Sohn, Moontae Lee, Honglak Lee, Joyce Chai

We incorporate these interlinked dual processes in fine-tuning and in-context learning with PLMs, applying them to two language understanding tasks that require coherent physical commonsense reasoning.

In-Context Learning Physical Commonsense Reasoning

CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation

1 code implementation NeurIPS 2023 Sihan Xu, Ziqiao Ma, Yidong Huang, Honglak Lee, Joyce Chai

Our empirical studies show that CycleNet is superior in translation consistency and quality, and can generate high-quality images for out-of-domain distributions with a simple change of the textual prompt.

2k Image Manipulation +1

Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation

1 code implementation 12 Oct 2023 Yinpei Dai, Run Peng, Sikai Li, Joyce Chai

To address these limitations, we introduce Zero-shot Interactive Personalized Object Navigation (ZIPON), where robots need to navigate to personalized goal objects while engaging in conversations with users.

Navigate Object +1

LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

1 code implementation 21 Sep 2023 Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David F. Fouhey, Joyce Chai

While existing approaches often rely on extensive labeled data or exhibit limitations in handling complex language queries, we propose LLM-Grounder, a novel zero-shot, open-vocabulary, Large Language Model (LLM)-based 3D visual grounding pipeline.

3D visual grounding Language Modelling +3

World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models

1 code implementation 14 Jun 2023 Ziqiao Ma, Jiayi Pan, Joyce Chai

The ability to connect language units to their referents in the physical world, referred to as grounding, is crucial to learning and understanding grounded meanings of words.

Grounded Open Vocabulary Acquisition Language Modelling

In-Context Analogical Reasoning with Pre-Trained Language Models

1 code implementation 28 May 2023 Xiaoyang Hu, Shane Storks, Richard L. Lewis, Joyce Chai

Analogical reasoning is a fundamental capacity of human cognition that allows us to reason abstractly about novel situations by relating them to past experiences.

In-Context Learning Relational Reasoning

NLP Reproducibility For All: Understanding Experiences of Beginners

3 code implementations 26 May 2023 Shane Storks, Keunwoo Peter Yu, Ziqiao Ma, Joyce Chai

As natural language processing (NLP) has recently seen an unprecedented level of excitement, and more people are eager to enter the field, it is unclear whether current research reproducibility efforts are sufficient for this group of beginners to apply the latest developments.

Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue

1 code implementation 18 May 2023 Cristian-Paul Bara, Ziqiao Ma, Yingzhuo Yu, Julie Shah, Joyce Chai

To complete these tasks, agents need to engage in situated communication with their partners and coordinate their partial plans towards a complete plan to achieve a joint task goal.

Collaborative Plan Acquisition Theory of Mind Modeling

BAD: BiAs Detection for Large Language Models in the context of candidate screening

1 code implementation 17 May 2023 Nam Ho Koh, Joseph Plata, Joyce Chai

Application Tracking Systems (ATS) have allowed talent managers, recruiters, and college admissions committees to process large volumes of potential candidate applications efficiently.

Bias Detection Fairness

Prompting Large Pre-trained Vision-Language Models For Compositional Concept Learning

no code implementations 9 Nov 2022 Guangyue Xu, Parisa Kordjamshidi, Joyce Chai

This work explores the zero-shot compositional learning ability of large pre-trained vision-language models (VLMs) within the prompt-based learning framework and proposes a model (PromptCompVL) to solve the compositional zero-shot learning (CZSL) problem.

Zero-Shot Learning

DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents

1 code implementation 22 Oct 2022 Ziqiao Ma, Ben VanDerPloeg, Cristian-Paul Bara, Yidong Huang, Eui-In Kim, Felix Gervits, Matthew Marge, Joyce Chai

To this end, we introduce Dialogue On the ROad To Handle Irregular Events (DOROTHIE), a novel interactive simulation platform that enables the creation of unexpected situations on the fly to support empirical studies on situated communication with autonomous driving agents.

Autonomous Driving Dialogue Act Classification +2

Reproducibility Beyond the Research Community: Experience from NLP Beginners

no code implementations 4 May 2022 Shane Storks, Keunwoo Peter Yu, Joyce Chai

As NLP research attracts public attention and excitement, it becomes increasingly important for it to be accessible to a broad audience.

Learning to Mediate Disparities Towards Pragmatic Communication

1 code implementation ACL 2022 Yuwei Bao, Sayan Ghosh, Joyce Chai

The PRS attempts to learn the speaker-listener disparity and adjust the speech accordingly, by adding a lightweight disparity adjustment layer into working memory on top of the speaker's long-term memory system.

Partition-Based Active Learning for Graph Neural Networks

1 code implementation 23 Jan 2022 Jiaqi Ma, Ziqiao Ma, Joyce Chai, Qiaozhu Mei

We study the problem of semi-supervised learning with Graph Neural Networks (GNNs) in an active learning setup.

Active Learning Node Classification
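
As a hedged sketch of what partition-based query selection can look like in this setting, the snippet below partitions the nodes and labels one representative per partition; it illustrates the general idea only, not the paper's exact partitioner or selection criterion, and assumes scikit-learn is available and that `graph` exposes `.nodes` and `.neighbors()` (e.g., a networkx Graph).

# Generic partition-based active learning sketch (not the paper's exact method).
import numpy as np
from sklearn.cluster import KMeans

def select_queries(graph, features, budget):
    """Partition the graph's nodes and pick one representative per partition to label."""
    nodes = list(graph.nodes)
    # Smooth raw features over the one-hop neighborhood so partitions roughly
    # respect graph structure (a crude stand-in for a dedicated graph partitioner).
    smoothed = []
    for n in nodes:
        neighborhood = list(graph.neighbors(n)) + [n]
        smoothed.append(np.mean([features[m] for m in neighborhood], axis=0))
    X = np.asarray(smoothed)

    labels = KMeans(n_clusters=budget, n_init=10).fit_predict(X)
    queries = []
    for part in range(budget):
        idx = np.where(labels == part)[0]
        centroid = X[idx].mean(axis=0)
        # Representative node: the member closest to its partition's centroid.
        best = idx[np.argmin(np.linalg.norm(X[idx] - centroid, axis=1))]
        queries.append(nodes[best])
    return queries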

Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers

1 code implementation Findings (EMNLP) 2021 Shane Storks, Joyce Chai

As large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks, statistical bias in benchmark data and probing studies have recently called into question their true capabilities.

Text Classification

Tiered Reasoning for Intuitive Physics: Toward Verifiable Commonsense Language Understanding

1 code implementation Findings (EMNLP) 2021 Shane Storks, Qiaozi Gao, Yichi Zhang, Joyce Chai

However, evaluations only based on end task performance shed little light on machines' true ability in language understanding and reasoning.

Hierarchical Task Learning from Language Instructions with Unified Transformers and Self-Monitoring

1 code implementation Findings (ACL) 2021 Yichi Zhang, Joyce Chai

On the ALFRED benchmark for task learning, the published state-of-the-art system only achieves a task success rate of less than 10% in an unseen environment, compared to the human performance of over 90%.

Experience Grounds Language

no code implementations EMNLP 2020 Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian

Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates.

Representation Learning

Commonsense Justification for Action Explanation

1 code implementation EMNLP 2018 Shaohua Yang, Qiaozi Gao, Sari Sadiya, Joyce Chai

To enable collaboration and communication between humans and agents, this paper investigates learning to acquire commonsense evidence for action justification.

Decision Making

What Action Causes This? Towards Naive Physical Action-Effect Prediction

no code implementations ACL 2018 Qiaozi Gao, Shaohua Yang, Joyce Chai, Lucy Vanderwende

Despite recent advances in knowledge representation, automated reasoning, and machine learning, artificial agents still lack the ability to understand basic action-effect relations regarding the physical world, for example, the action of cutting a cucumber most likely leads to the state where the cucumber is broken apart into smaller pieces.

Interactive Learning of Grounded Verb Semantics towards Human-Robot Communication

no code implementations ACL 2017 Lanbo She, Joyce Chai

To enable human-robot communication and collaboration, previous works represent grounded verb semantics as the potential change of state to the physical world caused by these verbs.

Reinforcement Learning
