no code implementations • ECCV 2020 • Sinan Tan, Weilai Xiang, Huaping Liu, Di Guo, Fuchun Sun
We investigate a new AI task, Multi-Agent Interactive Question Answering, in which several agents jointly explore an interactive environment to answer a question.
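As a rough illustration of the task setting, here is a minimal, hypothetical control loop in which several agents explore a shared interactive environment until they can answer the question. All names (`env`, `Agent`, `answer`) are illustrative placeholders, not the paper's code.

```python
# Hypothetical sketch of a multi-agent interactive QA episode.
# `env` and the agent interfaces are illustrative stand-ins, not the paper's API.

def run_episode(env, agents, question, max_steps=100):
    observations = env.reset(question)  # one observation per agent
    memory = []
    for _ in range(max_steps):
        # Each agent picks an exploration action from its own observation.
        actions = [agent.act(obs, question)
                   for agent, obs in zip(agents, observations)]
        observations = env.step(actions)
        # Agents pool what they have seen and attempt a joint answer.
        memory = [agent.summarize() for agent in agents]
        answer = agents[0].answer(question, memory)
        if answer is not None:  # confident enough to stop early
            return answer
    return agents[0].answer(question, memory)  # best guess at the horizon
```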
2 code implementations • 28 Sep 2023 • Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans.
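The released Qwen checkpoints can be loaded from Hugging Face. A minimal chat example, assuming the `Qwen/Qwen-7B-Chat` checkpoint and its custom `chat()` helper exposed via `trust_remote_code` (verify the exact interface against the official repository):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the Qwen/Qwen-7B-Chat checkpoint; chat() is provided by the
# model's remote code, not by the transformers library itself.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat",
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

response, history = model.chat(
    tokenizer, "Give me a short introduction to large language models.",
    history=None)
print(response)
```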
1 code implementation • 24 Aug 2023 • Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both text and images.
Ranked #2 on Visual Question Answering (VQA) on CORE-MM
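A minimal sketch of querying Qwen-VL-Chat about an image, again assuming the Hugging Face checkpoint and its remote-code helpers (`from_list_format`, `chat`); check the official repository for the exact interface:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat",
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True
).eval()

# from_list_format interleaves image references and text into one query
# string (helper provided by the model's remote code).
query = tokenizer.from_list_format([
    {"image": "demo.jpg"},  # local path or URL
    {"text": "What is shown in this picture?"},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```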
1 code implementation • 8 Dec 2022 • Jinze Bai, Rui Men, Hao Yang, Xuancheng Ren, Kai Dang, Yichang Zhang, Xiaohuan Zhou, Peng Wang, Sinan Tan, An Yang, Zeyu Cui, Yu Han, Shuai Bai, Wenbin Ge, Jianxin Ma, Junyang Lin, Jingren Zhou, Chang Zhou
As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data.
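As a purely illustrative sketch of how a multi-modal task might be declared in such a system with a single instruction string (the entry point, checkpoint name, and method names below are assumptions, not the confirmed OFASys API):

```python
# Hypothetical sketch of a declarative multi-modal task; names are
# illustrative assumptions, not the verified OFASys interface.
from ofasys import OFASys  # assumed entry point

model = OFASys.from_pretrained("multitask_10k.pt")  # assumed checkpoint name

# One instruction string binds an image input slot to a text output slot.
instruction = "[IMAGE:img] what does the image describe? -> [TEXT:cap]"
caption = model.inference(instruction, data={"img": "demo.jpg"})
print(caption)
```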
1 code implementation • ICCV 2023 • Feng Wang, Sinan Tan, Xinghang Li, Zeyue Tian, Yafei Song, Huaping Liu
In this paper, we present a novel method named MixVoxels to better represent dynamic scenes with fast training speed and competitive rendering quality.
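MixVoxels distinguishes static from dynamic voxels so that most queries hit the cheap static branch. A toy sketch of such a split using per-voxel temporal variance (illustrative thresholding, not the paper's implementation):

```python
import numpy as np

def split_static_dynamic(voxel_features, threshold=1e-3):
    """Toy static/dynamic split for a (T, X, Y, Z, C) feature volume.

    Voxels whose features barely change over time are treated as static
    and averaged once; the rest keep their full time axis. Illustrative only.
    """
    temporal_var = voxel_features.var(axis=0).mean(axis=-1)  # (X, Y, Z)
    dynamic_mask = temporal_var > threshold
    static_grid = voxel_features.mean(axis=0)                # (X, Y, Z, C)
    dynamic_grid = voxel_features[:, dynamic_mask]           # (T, N_dyn, C)
    return static_grid, dynamic_mask, dynamic_grid
```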
no code implementations • 26 Jan 2022 • Sinan Tan, Mengmeng Ge, Di Guo, Huaping Liu, Fuchun Sun
In the Vision-and-Language Navigation task, the embodied agent follows linguistic instructions and navigates to a specific goal.
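A minimal, hypothetical step loop for such an instruction-following agent (the `env` and `policy` interfaces are placeholders, not the paper's code):

```python
# Hypothetical VLN episode: the agent conditions each action on the
# instruction and the current observation. Interfaces are placeholders.

def navigate(env, policy, instruction, max_steps=30):
    obs = env.reset()
    for _ in range(max_steps):
        action = policy.act(instruction, obs)  # e.g. turn-left / forward / stop
        if action == "stop":
            break
        obs = env.step(action)
    return env.distance_to_goal()  # placeholder success metric
```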
no code implementations • 26 Jan 2022 • Sinan Tan, Hui Xue, Qiyu Ren, Huaping Liu, Jing Bai
Our framework is based on an innovative evolutionary algorithm, which is stable and well suited to multiple-dataset scenarios.
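A minimal sketch of an evolutionary search loop that scores each candidate on several datasets at once, which is one plausible way to handle a multiple-dataset scenario (all details here are illustrative, not the paper's algorithm):

```python
import random

def evolve(init_population, mutate, fitness_fns, generations=50, keep=8):
    """Toy evolutionary loop; fitness_fns holds one scorer per dataset."""
    population = list(init_population)
    for _ in range(generations):
        # Score each candidate by its mean fitness across all datasets.
        scored = [(sum(f(c) for f in fitness_fns) / len(fitness_fns), c)
                  for c in population]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        survivors = [c for _, c in scored[:keep]]
        # Refill the population with mutated copies of the survivors.
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(len(population) - keep)]
    return population[0]  # best candidate found
```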
no code implementations • 30 Apr 2020 • Sinan Tan, Huaping Liu, Di Guo, Xin-Yu Zhang, Fuchun Sun
Embodiment is an important characteristic of all intelligent agents (creatures and robots), yet existing scene description tasks mostly analyze images passively, separating semantic understanding of the scenario from the agent's interaction with the environment.
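A toy active-perception loop contrasting this with passive captioning: the agent moves to gather views before describing the scene (all interfaces are placeholders, not the paper's code):

```python
# Hypothetical interactive scene description: move first, describe after.
# `env`, `policy`, and `captioner` are illustrative placeholders.

def describe_interactively(env, policy, captioner, budget=10):
    views = [env.reset()]
    for _ in range(budget):
        action = policy.next_viewpoint(views)  # choose where to look next
        views.append(env.step(action))
    return captioner.describe(views)           # caption from all gathered views
```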