Search Results for author: Yuxing Long

Found 6 papers, 3 papers with code

ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation

no code implementations • 24 Dec 2023 • Xiaoqi Li, Mingxu Zhang, Yiran Geng, Haoran Geng, Yuxing Long, Yan Shen, Renrui Zhang, Jiaming Liu, Hao Dong

By fine-tuning the injected adapters, we preserve the inherent common sense and reasoning ability of the MLLM while equipping it with manipulation capability (a minimal sketch of this adapter pattern follows below).

Common Sense Reasoning · Language Modelling · +4
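The ManipLLM abstract describes a parameter-efficient pattern: the backbone MLLM stays frozen so its pretrained common sense survives, and only small injected adapters are trained. Below is a minimal sketch of that generic pattern in PyTorch; the names (`Adapter`, `AdaptedBlock`, `inject_adapters`), the bottleneck width, and the residual wiring are illustrative assumptions, not ManipLLM's actual implementation.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # zero-init the up-projection so the
        nn.init.zeros_(self.up.bias)    # adapter starts as an identity map

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class AdaptedBlock(nn.Module):
    """Wraps one frozen transformer block with a trainable adapter."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        self.adapter = Adapter(dim)

    def forward(self, x):
        return self.adapter(self.block(x))

def inject_adapters(model: nn.Module, blocks: nn.ModuleList, dim: int):
    """Freeze the whole model, then wrap each block in `blocks`
    (assumed to be a ModuleList inside `model`) with an adapter."""
    for p in model.parameters():
        p.requires_grad = False
    for i, blk in enumerate(blocks):
        blocks[i] = AdaptedBlock(blk, dim)
    # Only the freshly created adapter parameters remain trainable.
    return [p for p in model.parameters() if p.requires_grad]
```

An optimizer built over the returned parameter list then updates the adapters alone, leaving the frozen backbone's pretrained reasoning intact.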

Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions

no code implementations • 20 Sep 2023 • Yuxing Long, Xiaoqi Li, Wenzhe Cai, Hao Dong

Performance on the representative VLN task R2R shows that our method surpasses the leading zero-shot VLN model by a large margin on all metrics.

Language Modelling · Large Language Model

VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue

no code implementations • 14 Sep 2023 • Yunshui Li, Binyuan Hui, Zhaochao Yin, Wanwei He, Run Luo, Yuxing Long, Min Yang, Fei Huang, Yongbin Li

Visually-grounded dialogue systems, which integrate multiple modes of communication such as text and visual inputs, have become an increasingly popular subject of investigation.

Whether you can locate or not? Interactive Referring Expression Generation

1 code implementation • 19 Aug 2023 • Fulong Ye, Yuxing Long, Fangxiang Feng, Xiaojie Wang

Referring Expression Generation (REG) aims to generate unambiguous Referring Expressions (REs) for objects in a visual scene, with a dual task of Referring Expression Comprehension (REC) to locate the referred object.

Referring Expression · Referring Expression Comprehension · +1

Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark

1 code implementation • 26 May 2023 • Yuxing Long, Binyuan Hui, Caixia Yuan, Fei Huang, Yongbin Li, Xiaojie Wang

Existing multimodal task-oriented dialog data fails to capture the diverse expressions of user subjective preferences and recommendation acts that arise in real-life shopping scenarios.

Multimodal Recommendation

SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

1 code implementation • 5 Jan 2023 • Yuxing Long, Binyuan Hui, Fulong Ye, Yanyang Li, Zhuoxin Han, Caixia Yuan, Yongbin Li, Xiaojie Wang

Existing multimodal conversation agents have shown impressive abilities to locate absolute positions or retrieve attributes in simple scenarios, but they fail to perform well when complex relative positions and information alignments are involved, which poses a bottleneck in response quality.

Question Answering
