Pandora: Towards General World Model with Natural Language Actions and Video States

12 Jun 2024 Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

This paper takes a step towards building a general world model by introducing Pandora, a hybrid autoregressive-diffusion model that simulates world states by generating videos and allows real-time control with free-text actions.

MMToM-QA: Multimodal Theory of Mind Question Answering

16 Jan 2024 Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum, Tianmin Shu

To engineer multimodal ToM capacity, we propose a novel method, BIP-ALM (Bayesian Inverse Planning Accelerated by Language Models).

Language Models Meet World Models: Embodied Experiences Enhance Language Models

NeurIPS 2023 Jiannan Xiang, Tianhua Tao, Yi Gu, Tianmin Shu, Zirui Wang, Zichao Yang, Zhiting Hu

While large language models (LMs) have shown remarkable capabilities across numerous tasks, they often struggle with simple reasoning and planning in physical environments, such as understanding object permanence or planning household activities.

ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models

9 Oct 2022 Jiannan Xiang, Zhengzhong Liu, Yucheng Zhou, Eric P. Xing, Zhiting Hu

In the data disambiguation stage, we employ the prompted GPT-3 model to understand possibly ambiguous triples from the input data and convert each into a short sentence with reduced ambiguity.

Assessing Dialogue Systems with Distribution Distances

Findings of ACL 2021 Jiannan Xiang, Yahui Liu, Deng Cai, Huayang Li, Defu Lian, Lemao Liu

An important aspect of developing dialogue systems is how to evaluate and compare the performance of different systems.

