no code implementations • 17 Jun 2022 • Kaizhi Zheng, Xiaotong Chen, Odest Chadwicke Jenkins, Xin Eric Wang
We hope the new simulator and benchmark will facilitate future research on language-guided robotic manipulation.
no code implementations • 28 Aug 2022 • Kaizhi Zheng, Kaiwen Zhou, Jing Gu, Yue Fan, Jialu Wang, Zonglin Di, Xuehai He, Xin Eric Wang
Building a conversational embodied agent to execute real-life tasks has been a long-standing yet challenging research goal, as it requires effective human-agent communication, multi-modal understanding, and long-range sequential decision making.
no code implementations • 30 Jan 2023 • Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, Xin Eric Wang
Such object navigation tasks usually require large-scale training in visual environments with labeled objects, an approach that generalizes poorly to novel objects in unknown environments.
no code implementations • 23 May 2023 • Yue Fan, Jing Gu, Kaizhi Zheng, Xin Eric Wang
Intelligent navigation-helper agents are critical as they can navigate users in unknown areas through environmental awareness and conversational ability, serving as potential accessibility tools for individuals with disabilities.
1 code implementation • 16 Oct 2020 • Xiaotong Chen, Kaizhi Zheng, Zhen Zeng, Cameron Kisailus, Shreshtha Basu, James Cooney, Jana Pavlasek, Odest Chadwicke Jenkins
In this work, we combine the notions of affordance and category-level pose, and introduce the Affordance Coordinate Frame (ACF).
1 code implementation • 3 Oct 2023 • Kaizhi Zheng, Xuehai He, Xin Eric Wang
Multimodal Large Language Models (MLLMs) demonstrate a profound capability in multimodal understanding.