no code implementations • 2 Apr 2025 • Jing Liu, Wenxuan Wang, Yisi Zhang, Yepeng Tang, Xingjian He, Longteng Guo, Tongtian Yue, Xinlong Wang
Referring expression segmentation (RES) aims to segment the masks of the entities that match a descriptive language expression.
1 code implementation • 24 Oct 2024 • Zijia Zhao, Longteng Guo, Tongtian Yue, Erdong Hu, Shuai Shao, Zehuan Yuan, Hua Huang, Jing Liu
In this paper, we investigate the task of general conversational image retrieval on open-domain images.
no code implementations • 14 Oct 2024 • Tongtian Yue, Longteng Guo, Jie Cheng, Xuange Gao, Jing Liu
In this paper, we propose a novel Ada-K routing strategy that dynamically adjusts the number of activated experts for each token, thereby improving the balance between computational efficiency and model performance.
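As a rough illustration of the idea (the paper's actual per-token expert budget is learned, not hand-set), the PyTorch sketch below picks a variable k per token by accumulating router probability mass up to a threshold; the function name and the `mass`/`max_k` parameters are hypothetical, not the paper's API:

```python
import torch
import torch.nn.functional as F

def adaptive_topk_routing(hidden, router_weight, max_k=4, mass=0.9):
    # hidden: [num_tokens, dim]; router_weight: [dim, num_experts]
    probs = F.softmax(hidden @ router_weight, dim=-1)            # [T, E]
    sorted_p, sorted_idx = probs.sort(dim=-1, descending=True)
    cum = sorted_p.cumsum(dim=-1)
    # Smallest k whose top-k probability mass reaches `mass`, capped at max_k,
    # so "easy" tokens activate fewer experts than ambiguous ones.
    k = ((cum < mass).sum(dim=-1) + 1).clamp(max=max_k)          # [T]
    arange = torch.arange(probs.size(-1), device=probs.device)
    keep = arange.unsqueeze(0) < k.unsqueeze(1)                  # mask in sorted order
    gates = torch.zeros_like(probs).scatter(1, sorted_idx, sorted_p * keep)
    gates = gates / gates.sum(dim=-1, keepdim=True)              # renormalize kept mass
    return gates, k

# Example: 5 tokens routed over 8 experts, each activating between 1 and 4.
gates, k = adaptive_topk_routing(torch.randn(5, 64), torch.randn(64, 8))
```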
no code implementations • 14 Oct 2024 • Tongtian Yue, Shuning Xue, Xuange Gao, Yepeng Tang, Longteng Guo, Jie Jiang, Jing Liu
First, we propose an electrode-wise modeling strategy that treats each electrode as a fundamental unit, enabling the integration of diverse EEG datasets collected from up to 138 electrodes, amassing 37.5M pre-training samples.
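A minimal PyTorch sketch of what electrode-wise modeling could look like, assuming each electrode's windowed signal becomes one token and a learned electrode-identity embedding lets datasets with different montages share one model; all class and parameter names here are illustrative, not the paper's code:

```python
import torch
import torch.nn as nn

class ElectrodeWiseEncoder(nn.Module):
    """Each electrode's time series is one token, so recordings with
    different electrode layouts can be pooled into one pre-training set."""
    def __init__(self, window_len=256, dim=128, num_known_electrodes=138):
        super().__init__()
        self.proj = nn.Linear(window_len, dim)      # shared across all electrodes
        self.electrode_emb = nn.Embedding(num_known_electrodes, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, eeg, electrode_ids):
        # eeg: [batch, n_electrodes, window_len]; n_electrodes may vary per dataset
        tokens = self.proj(eeg) + self.electrode_emb(electrode_ids)  # [B, N, D]
        return self.encoder(tokens)
```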
no code implementations • 8 Jul 2024 • Erdong Hu, Longteng Guo, Tongtian Yue, Zijia Zhao, Shuning Xue, Jing Liu
This paper introduces the OneDiff model, a novel generalist approach that utilizes a robust vision-language model architecture, integrating a siamese image encoder with a Visual Delta Module.
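A schematic of the described architecture, under the assumption that "siamese" means a weight-shared image encoder and that the Visual Delta Module attends over difference features; the module and names below are illustrative, not the released OneDiff implementation:

```python
import torch
import torch.nn as nn

class SiameseDeltaSketch(nn.Module):
    """Weight-shared encoder on both images; difference features query the
    two images to localize what changed before captioning."""
    def __init__(self, encoder, dim=768):
        super().__init__()
        self.encoder = encoder                      # shared (siamese) weights
        self.delta_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, img_a, img_b):
        fa, fb = self.encoder(img_a), self.encoder(img_b)   # [B, N, D] patch tokens
        ctx = torch.cat([fa, fb], dim=1)
        # Difference features act as queries over both images.
        delta, _ = self.delta_attn(query=fb - fa, key=ctx, value=ctx)
        return delta    # fed to the language model as 'change' tokens
```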
1 code implementation • 13 Jun 2024 • Zijia Zhao, Haoyu Lu, Yuqi Huo, Yifan Du, Tongtian Yue, Longteng Guo, Bingning Wang, WeiPeng Chen, Jing Liu
In this paper, we propose VideoNIAH (Video Needle In A Haystack), a benchmark construction framework through synthetic video generation.
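A toy sketch of the needle-in-a-haystack construction: synthetic "needle" content is spliced into an unrelated "haystack" video at a random position, so the ground-truth answer is known by construction and cannot leak from training data. Function and variable names are hypothetical:

```python
import random

def build_niah_sample(haystack_frames, needle_frames):
    """Splice needle frames into a haystack video at a random position;
    the relative position is the ground truth for retrieval/ordering QA."""
    pos = random.randint(0, len(haystack_frames))
    frames = haystack_frames[:pos] + needle_frames + haystack_frames[pos:]
    relative_pos = pos / max(len(haystack_frames), 1)
    return frames, relative_pos
```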
1 code implementation • CVPR 2024 • Tongtian Yue, Jie Cheng, Longteng Guo, Xingyuan Dai, Zijia Zhao, Xingjian He, Gang Xiong, Yisheng Lv, Jing Liu
In this paper, we present and delve into the self-consistency capability of LVLMs, a crucial aspect that reflects the models' ability to both generate informative captions for specific objects and subsequently utilize these captions to accurately re-identify the objects in a closed-loop process.
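The closed loop can be pictured with the following sketch, where `lvlm.caption` and `lvlm.ground` are hypothetical interfaces standing in for caption generation and referring grounding, not the paper's API:

```python
def iou(a, b):
    """Standard box IoU; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def self_consistency_score(lvlm, image, object_boxes):
    """Caption each object, then ask the same model to ground its own
    caption; the model is self-consistent if it returns to the same box."""
    hits = 0
    for box in object_boxes:
        caption = lvlm.caption(image, box)       # describe the boxed object
        predicted = lvlm.ground(image, caption)  # re-locate from the caption
        hits += int(iou(predicted, box) > 0.5)
    return hits / len(object_boxes)
```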
1 code implementation • CVPR 2024 • Wenxuan Wang, Tongtian Yue, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu
To foster future research into fine-grained visual grounding, our benchmark RefCOCOm, the MRES-32M dataset, and the model UniRES will be publicly available at https://github.com/Rubics-Xuan/MRES.
1 code implementation • 25 May 2023 • Zijia Zhao, Longteng Guo, Tongtian Yue, Sihan Chen, Shuai Shao, Xinxin Zhu, Zehuan Yuan, Jing Liu
We show that language-paired two-modality data alone is sufficient to connect all modalities.
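One way to picture why language-paired data suffices: each modality encoder is projected into the same language-model token space using only (modality, text) pairs, after which modality combinations never seen together in training can still compose at inference. A hedged PyTorch sketch with illustrative names:

```python
import torch
import torch.nn as nn

class LanguagePivotBridge(nn.Module):
    """Each modality gets its own encoder + projection into the LLM token
    space, trained only on (modality, text) pairs; because everything is
    aligned to language, e.g. audio + video can be combined zero-shot."""
    def __init__(self, encoders: dict, enc_dim=1024, llm_dim=4096):
        super().__init__()
        self.encoders = nn.ModuleDict(encoders)   # e.g., {'image': ..., 'audio': ...}
        self.projs = nn.ModuleDict({m: nn.Linear(enc_dim, llm_dim) for m in encoders})

    def forward(self, inputs: dict):
        # inputs: {modality_name: tensor}; each encoder emits [B, N, enc_dim]
        tokens = [self.projs[m](self.encoders[m](x)) for m, x in inputs.items()]
        return torch.cat(tokens, dim=1)           # prepended to text embeddings
```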