no code implementations • CVPR 2024 • Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai
Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling, where grounded objects are represented by bounding boxes encoded as sequences of location tokens (a toy sketch of this encoding follows this entry).
Ranked #2 on Referring Expression Segmentation on PhraseCut
Causal Language Modeling · Generalized Referring Expression Segmentation · +4
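As a minimal sketch of the location-token encoding described above: each bounding-box coordinate is quantized into a discrete bin and emitted as a special token that a causal language model can generate like any other. The 1000-bin quantization and the <loc_i> token format are illustrative assumptions, not the paper's actual scheme.

```python
# Toy sketch: serializing a bounding box into discrete location tokens
# for causal language modeling. The bin count and "<loc_i>" token format
# are illustrative assumptions, not the paper's actual scheme.

def box_to_tokens(box, image_w, image_h, num_bins=1000):
    """Quantize (x1, y1, x2, y2) pixel coordinates into location tokens."""
    x1, y1, x2, y2 = box
    # Normalize each coordinate to [0, 1], then map it to an integer bin.
    norm = [x1 / image_w, y1 / image_h, x2 / image_w, y2 / image_h]
    bins = [min(int(v * num_bins), num_bins - 1) for v in norm]
    return [f"<loc_{b}>" for b in bins]

# Example: a 640x480 image with an object box at (64, 48, 320, 240).
tokens = box_to_tokens((64, 48, 320, 240), 640, 480)
print(tokens)  # ['<loc_100>', '<loc_100>', '<loc_500>', '<loc_500>']
```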
no code implementations • 14 Oct 2023 • Jiachen Li, Qiaozi Gao, Michael Johnston, Xiaofeng Gao, Xuehai He, Suhaila Shakiah, Hangjie Shi, Reza Ghanadan, William Yang Wang
In this work, we tackle the problem of training a robot to understand multimodal prompts, interleaving vision signals with text descriptions.
no code implementations • 9 Aug 2023 • Hangjie Shi, Leslie Ball, Govind Thattai, Desheng Zhang, Lucy Hu, Qiaozi Gao, Suhaila Shakiah, Xiaofeng Gao, Aishwarya Padmakumar, Bofei Yang, Cadence Chung, Dinakar Guthy, Gaurav Sukhatme, Karthika Arumugam, Matthew Wen, Osman Ipek, Patrick Lange, Rohan Khanna, Shreyas Pansare, Vasu Sharma, Chao Zhang, Cris Flagg, Daniel Pressel, Lavina Vaz, Luke Dai, Prasoon Goyal, Sattvik Sahai, Shaohua Liu, Yao Lu, Anna Gottardi, Shui Hu, Yang Liu, Dilek Hakkani-Tur, Kate Bland, Heather Rocker, James Jeun, Yadunandana Rao, Michael Johnston, Akshaya Iyengar, Arindam Mandal, Prem Natarajan, Reza Ghanadan
The Alexa Prize program has empowered numerous university students to explore, experiment, and showcase their talents in building conversational agents through challenges like the SocialBot Grand Challenge and the TaskBot Challenge.
no code implementations • 7 Aug 2023 • Nirbhay Modhe, Qiaozi Gao, Ashwin Kalyan, Dhruv Batra, Govind Thattai, Gaurav Sukhatme
Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions.
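To make the conservative-penalty idea concrete, here is a CQL-style sketch for discrete actions: Q-values of all actions are softly pushed down while the dataset's actions are pushed up. The network interface, the logsumexp penalty form, and the alpha weight are illustrative assumptions, not necessarily the method this paper proposes.

```python
import torch

def conservative_q_loss(q_net, states, actions, td_targets, alpha=1.0):
    """CQL-style penalty sketch: push down Q on out-of-distribution
    actions (soft-maximum over all actions) and push up Q on actions
    actually present in the offline dataset. `alpha` trades conservatism
    against TD error; its value here is an illustrative assumption."""
    q_all = q_net(states)                                      # (B, num_actions)
    q_data = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)  # (B,)

    # Penalty: soft-maximum over all actions minus the dataset action's value.
    penalty = torch.logsumexp(q_all, dim=1) - q_data

    # Standard TD regression toward precomputed Bellman targets.
    td_error = torch.nn.functional.mse_loss(q_data, td_targets)

    return td_error + alpha * penalty.mean()
```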
no code implementations • 2 Aug 2023 • Ran Gong, Xiaofeng Gao, Qiaozi Gao, Suhaila Shakiah, Govind Thattai, Gaurav S. Sukhatme
We introduce a benchmark for LanguagE-Conditioned Multi-robot MAnipulation (LEMMA) focused on task allocation and long-horizon object manipulation based on human language instructions in a tabletop setting.
1 code implementation • NeurIPS 2023 • Qiaozi Gao, Govind Thattai, Suhaila Shakiah, Xiaofeng Gao, Shreyas Pansare, Vasu Sharma, Gaurav Sukhatme, Hangjie Shi, Bofei Yang, Desheng Zheng, Lucy Hu, Karthika Arumugam, Shui Hu, Matthew Wen, Dinakar Guthy, Cadence Chung, Rohan Khanna, Osman Ipek, Leslie Ball, Kate Bland, Heather Rocker, Yadunandana Rao, Michael Johnston, Reza Ghanadan, Arindam Mandal, Dilek Hakkani Tur, Prem Natarajan
We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research.
no code implementations • 12 Jan 2023 • Yuqian Jiang, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme
For service robots to become general-purpose in everyday household environments, they need not only a large library of primitive skills, but also the ability to quickly learn novel tasks specified by users.
no code implementations • 10 Dec 2022 • Yizhou Zhao, Qiaozi Gao, Liang Qiu, Govind Thattai, Gaurav S. Sukhatme
We introduce OPEND, a benchmark for learning how to use a hand to open cabinet doors or drawers in a photo-realistic and physics-reliable simulation environment driven by language instruction.
no code implementations • 26 Aug 2022 • Vasu Sharma, Prasoon Goyal, Kaixiang Lin, Govind Thattai, Qiaozi Gao, Gaurav S. Sukhatme
We propose a multimodal (vision-and-language) benchmark for cooperative and heterogeneous multi-agent learning.
Multi-agent Reinforcement Learning · reinforcement-learning · +2
1 code implementation • Findings (ACL) 2022 • Yi-Lin Tuan, Sajjad Beygi, Maryam Fazel-Zarandi, Qiaozi Gao, Alessandra Cervone, William Yang Wang
Our proposed method allows a single transformer model to directly walk on a large-scale knowledge graph to generate responses.
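By way of illustration, a toy sketch of walking a knowledge graph during response generation: at each hop the model scores only the relations that exist at the current entity, so generation stays on the graph. The tiny graph, the keyword-overlap scorer, and the greedy walk are stand-in assumptions, far simpler than the walker the paper's single transformer learns.

```python
# Toy sketch of "walking" a knowledge graph to ground a response.
# The graph and the keyword-overlap scorer are illustrative stand-ins.

GRAPH = {
    "Inception": {"directed_by": "Christopher Nolan", "released_in": "2010"},
    "Christopher Nolan": {"also_directed": "Interstellar"},
}

def score(relation, question):
    # Stand-in for a learned relation scorer: crude keyword overlap
    # between the question and the relation name.
    return sum(word in relation for word in question.lower().split())

def walk(start, question, max_hops=2):
    """Greedy walk: at each hop, follow the best-scoring relation that
    actually exists at the current entity, keeping generation on-graph."""
    path, entity = [], start
    for _ in range(max_hops):
        edges = GRAPH.get(entity)
        if not edges:
            break
        relation = max(edges, key=lambda r: score(r, question))
        entity = edges[relation]
        path.append((relation, entity))
    return path

print(walk("Inception", "who directed Inception?", max_hops=1))
# [('directed_by', 'Christopher Nolan')]
```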
2 code implementations • 27 Feb 2022 • Xiaofeng Gao, Qiaozi Gao, Ran Gong, Kaixiang Lin, Govind Thattai, Gaurav S. Sukhatme
Language-guided Embodied AI benchmarks requiring an agent to navigate an environment and manipulate objects typically allow one-way communication: the human user gives a natural language command to the agent, and the agent can only follow the command passively.
1 code implementation • 24 Jan 2022 • Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme
With the proposed Affordance-aware Multimodal Neural SLAM (AMSLAM) approach, we obtain more than 40% improvement over prior published work on the ALFRED benchmark and set a new state-of-the-art generalization performance at a success rate of 23.48% on the test unseen scenes.
no code implementations • AAAI Workshop CLeaR 2022 • Shane Storks, Qiaozi Gao, Aishwarya Reganti, Govind Thattai
Language-enabled AI systems can answer complex, multi-hop questions with high accuracy, but supporting those answers with evidence is a more challenging task, one that is important for transparency and trustworthiness to users.
1 code implementation • 10 Nov 2021 • Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, Gaurav S. Sukhatme
However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts.
1 code implementation • Findings (EMNLP) 2021 • Shane Storks, Qiaozi Gao, Yichi Zhang, Joyce Chai
However, evaluations only based on end task performance shed little light on machines' true ability in language understanding and reasoning.
1 code implementation • 10 Aug 2021 • Alessandro Suglia, Qiaozi Gao, Jesse Thomason, Govind Thattai, Gaurav Sukhatme
Language-guided robots performing home and office tasks must navigate in and interact with the world.
no code implementations • 9 Jan 2021 • Shane Storks, Qiaozi Gao, Govind Thattai, Gokhan Tur
Embodied instruction following is a challenging problem requiring an agent to infer a sequence of primitive actions to achieve a goal environment state from complex language and visual inputs.
no code implementations • 2 Dec 2020 • Qing Ping, Feiyang Niu, Govind Thattai, Joel Chengottusseriyil, Qiaozi Gao, Aishwarya Reganti, Prashanth Rajagopal, Gokhan Tur, Dilek Hakkani-Tur, Prem Natarajan
Current conversational AI systems aim to understand a set of pre-designed requests and execute related actions, which limits their ability to evolve naturally and adapt based on human interactions.
1 code implementation • 2 Apr 2019 • Shane Storks, Qiaozi Gao, Joyce Y. Chai
In the NLP community, recent years have seen a surge of research activity addressing machines' ability to perform deep language understanding, which goes beyond what is explicitly stated in text and instead relies on reasoning and knowledge of the world.
1 code implementation • EMNLP 2018 • Shaohua Yang, Qiaozi Gao, Sari Sadiya, Joyce Chai
To enable collaboration and communication between humans and agents, this paper investigates learning to acquire commonsense evidence for action justification.
no code implementations • ACL 2018 • Qiaozi Gao, Shaohua Yang, Joyce Chai, Lucy Vanderwende
Despite recent advances in knowledge representation, automated reasoning, and machine learning, artificial agents still lack the ability to understand basic action-effect relations regarding the physical world: for example, the action of cutting a cucumber most likely leads to a state where the cucumber is broken apart into smaller pieces.
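As a small illustration of the task framing, action-effect relations can be treated as supervised pairs of an action phrase and a resulting-state description. The schema and example entries below are illustrative assumptions, not the paper's dataset format.

```python
# Toy sketch of representing action-effect relations as supervised pairs.
# The schema and entries are illustrative assumptions, not the paper's data.

action_effect_pairs = [
    {"action": "cut cucumber",
     "effect": "the cucumber is broken apart into smaller pieces"},
    {"action": "boil water",
     "effect": "the water is hot and bubbling"},
]

def effects_of(action, pairs):
    """Look up all annotated effect descriptions for a given action phrase."""
    return [p["effect"] for p in pairs if p["action"] == action]

print(effects_of("cut cucumber", action_effect_pairs))
# ['the cucumber is broken apart into smaller pieces']
```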
no code implementations • 7 Oct 2017 • Qiaozi Gao, Lanbo She, Joyce Y. Chai
One significant simplification in most previous work on robot learning is the closed-world assumption where the robot is assumed to know ahead of time a complete set of predicates describing the state of the physical world.