Search Results for author: Qiaozi Gao

Found 24 papers, 9 papers with code

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

no code implementations26 Feb 2024 Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai

Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling, where grounded objects are represented by bounding boxes serialized as sequences of location tokens.
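For context, the location-token scheme this abstract contrasts with serializes each box as a few quantized coordinate tokens emitted inline with text. A minimal sketch of that serialization, assuming a 1000-bin quantizer and illustrative `<loc_*>` token names (not GROUNDHOG's actual vocabulary):

```python
# Sketch of bounding-box-to-location-token serialization, the common scheme the
# abstract contrasts with. NUM_BINS and the <loc_*> token names are assumptions.
NUM_BINS = 1000  # quantization bins per coordinate axis

def box_to_location_tokens(box, img_w, img_h, num_bins=NUM_BINS):
    """Quantize a pixel-space box (x1, y1, x2, y2) into discrete tokens a
    causal LM can emit inline with ordinary text."""
    x1, y1, x2, y2 = box
    normalized = (x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h)
    bins = [min(int(v * num_bins), num_bins - 1) for v in normalized]
    return [f"<loc_{b}>" for b in bins]

# A grounded phrase then looks like: "the mug <loc_..><loc_..><loc_..><loc_..>"
print("".join(box_to_location_tokens((206, 87, 265, 241), img_w=500, img_h=1000)))
```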

Ranked #1 on Generalized Referring Expression Segmentation on gRefCOCO (using extra training data)

Causal Language Modeling Generalized Referring Expression Segmentation +2

Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

no code implementations7 Aug 2023 Nirbhay Modhe, Qiaozi Gao, Ashwin Kalyan, Dhruv Batra, Govind Thattai, Gaurav Sukhatme

Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions.
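The "penalizing values of unseen states and actions" idea is often realized as a CQL-style regularizer added to the usual TD loss; a minimal NumPy sketch under that assumption (discrete actions and the alpha weight are illustrative, and the paper's unseen-state augmentations are not shown):

```python
import numpy as np

def conservative_penalty(q_values, behavior_action, alpha=1.0):
    """CQL-style regularizer for one state: push down a soft maximum over all
    actions (including unseen ones) while pushing up the dataset action's value."""
    soft_max_q = np.log(np.sum(np.exp(q_values)))  # log-sum-exp over all actions
    return alpha * (soft_max_q - q_values[behavior_action])

q = np.array([1.2, 0.4, -0.3])                     # Q(s, a) for 3 discrete actions
print(conservative_penalty(q, behavior_action=0))  # added to the usual TD loss
```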

Offline RL reinforcement-learning +1

LEMMA: Learning Language-Conditioned Multi-Robot Manipulation

no code implementations2 Aug 2023 Ran Gong, Xiaofeng Gao, Qiaozi Gao, Suhaila Shakiah, Govind Thattai, Gaurav S. Sukhatme

We introduce a benchmark for LanguagE-Conditioned Multi-robot MAnipulation (LEMMA) focused on task allocation and long-horizon object manipulation based on human language instructions in a tabletop setting.

LEMMA Robot Manipulation

Language-Informed Transfer Learning for Embodied Household Activities

no code implementations12 Jan 2023 Yuqian Jiang, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme

For service robots to become general-purpose in everyday household environments, they need not only a large library of primitive skills, but also the ability to quickly learn novel tasks specified by users.

Semantic Similarity Semantic Textual Similarity +1

OpenD: A Benchmark for Language-Driven Door and Drawer Opening

no code implementations10 Dec 2022 Yizhou Zhao, Qiaozi Gao, Liang Qiu, Govind Thattai, Gaurav S. Sukhatme

We introduce OpenD, a benchmark for learning how to use a hand to open cabinet doors or drawers in a photo-realistic and physically reliable simulation environment driven by language instructions.

DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following

2 code implementations27 Feb 2022 Xiaofeng Gao, Qiaozi Gao, Ran Gong, Kaixiang Lin, Govind Thattai, Gaurav S. Sukhatme

Language-guided Embodied AI benchmarks requiring an agent to navigate an environment and manipulate objects typically allow one-way communication: the human user gives a natural language command to the agent, and the agent can only follow the command passively.
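The two-way alternative DialFRED enables can be pictured as an agent that interleaves clarification questions with actions instead of executing passively. A hypothetical sketch of that loop (the `policy` and `oracle` callables and the message kinds are illustrative, not the benchmark's API):

```python
# Hypothetical sketch of a two-way instruction-following protocol: the agent may
# ask clarification questions mid-episode instead of only executing commands.
def run_episode(policy, oracle, instruction, max_steps=50):
    history = [("human", instruction)]
    for _ in range(max_steps):
        kind, payload = policy(history)   # ("ask", question) or ("act", action)
        if kind == "ask":
            history.append(("agent_q", payload))
            history.append(("human", oracle(payload)))  # oracle answers the question
        else:
            history.append(("action", payload))
            if payload == "STOP":
                break
    return history

def demo_policy(history):
    # Purely illustrative: ask one question, then stop.
    asked = any(kind == "agent_q" for kind, _ in history)
    return ("act", "STOP") if asked else ("ask", "Which mug do you mean?")

print(run_episode(demo_policy, lambda q: "The red one.", "Bring me the mug."))
```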

Instruction Following Navigate

Learning to Act with Affordance-Aware Multimodal Neural SLAM

1 code implementation24 Jan 2022 Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme

With the proposed Affordance-aware Multimodal Neural SLAM (AMSLAM) approach, we obtain more than 40% improvement over prior published work on the ALFRED benchmark and set a new state-of-the-art generalization performance at a success rate of 23.48% on the test unseen scenes.

Efficient Exploration Test unseen

Best of Both Worlds: A Hybrid Approach for Multi-Hop Explanation with Declarative Facts

no code implementations AAAI Workshop CLeaR 2022 Shane Storks, Qiaozi Gao, Aishwarya Reganti, Govind Thattai

Language-enabled AI systems can answer complex, multi-hop questions with high accuracy, but supporting answers with evidence is a more challenging task that is important for transparency and trustworthiness to users.

Explanation Generation Retrieval

LUMINOUS: Indoor Scene Generation for Embodied AI Challenges

1 code implementation10 Nov 2021 Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, Gaurav S. Sukhatme

However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts.

Indoor Scene Synthesis Scene Generation

Tiered Reasoning for Intuitive Physics: Toward Verifiable Commonsense Language Understanding

1 code implementation Findings (EMNLP) 2021 Shane Storks, Qiaozi Gao, Yichi Zhang, Joyce Chai

However, evaluations based only on end-task performance shed little light on machines' true ability in language understanding and reasoning.

Are We There Yet? Learning to Localize in Embodied Instruction Following

no code implementations9 Jan 2021 Shane Storks, Qiaozi Gao, Govind Thattai, Gokhan Tur

Embodied instruction following is a challenging problem requiring an agent to infer a sequence of primitive actions to achieve a goal environment state from complex language and visual inputs.

Instruction Following object-detection +1

Interactive Teaching for Conversational AI

no code implementations2 Dec 2020 Qing Ping, Feiyang Niu, Govind Thattai, Joel Chengottusseriyil, Qiaozi Gao, Aishwarya Reganti, Prashanth Rajagopal, Gokhan Tur, Dilek Hakkani-Tur, Prem Nataraja

Current conversational AI systems aim to understand a set of pre-designed requests and execute related actions, which limits their ability to evolve naturally and adapt based on human interactions.

Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches

3 code implementations2 Apr 2019 Shane Storks, Qiaozi Gao, Joyce Y. Chai

In the NLP community, recent years have seen a surge of research activity addressing machines' ability to perform deep language understanding, which goes beyond what is explicitly stated in text and instead relies on reasoning and knowledge of the world.

Natural Language Inference

Commonsense Justification for Action Explanation

1 code implementation EMNLP 2018 Shaohua Yang, Qiaozi Gao, Sari Sadiya, Joyce Chai

To enable collaboration and communication between humans and agents, this paper investigates learning to acquire commonsense evidence for action justification.

Decision Making

What Action Causes This? Towards Naive Physical Action-Effect Prediction

no code implementations ACL 2018 Qiaozi Gao, Shaohua Yang, Joyce Chai, Lucy Vanderwende

Despite recent advances in knowledge representation, automated reasoning, and machine learning, artificial agents still lack the ability to understand basic action-effect relations in the physical world: for example, the action of cutting a cucumber most likely leads to a state where the cucumber is broken apart into smaller pieces.

Interactive Learning of State Representation through Natural Language Instruction and Explanation

no code implementations7 Oct 2017 Qiaozi Gao, Lanbo She, Joyce Y. Chai

One significant simplification in most previous work on robot learning is the closed-world assumption where the robot is assumed to know ahead of time a complete set of predicates describing the state of the physical world.
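Concretely, the closed-world assumption means the robot's world state is a fixed set of known, grounded predicates, and anything unlisted is false rather than unknown. A minimal sketch with illustrative predicate and object names:

```python
# Closed-world state: a fixed set of grounded predicates known ahead of time.
# Predicate and object names here are illustrative.
state = {
    ("on", "cup", "table"),
    ("clear", "cup"),
    ("empty", "gripper"),
}

def holds(predicate, *args):
    """Under a closed world, an unlisted fact is false -- the robot has no way
    to represent a predicate it was never given in advance."""
    return (predicate, *args) in state

print(holds("on", "cup", "table"))  # True
print(holds("dirty", "cup"))        # False: closed world treats unknown as false
```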
