1 code implementation • 31 Mar 2024 • Xiaorui Huang, Gen Luo, Chaoyang Zhu, Bo Tong, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji
Segment Anything Model (SAM) exhibits powerful yet versatile capabilities on (un) conditional image segmentation tasks recently.
1 code implementation • 11 Mar 2024 • Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng, Xiaoxiong Du, Gen Luo, Jun Peng, Xiaoshuai Sun, Rongrong Ji
Text-to-3D-aware face (T3D Face) generation and manipulation is an emerging research hot spot in machine learning, which still suffers from low efficiency and poor quality.
1 code implementation • 5 Mar 2024 • Gen Luo, Yiyi Zhou, Yuxin Zhang, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji
Contrary to previous works, we study this problem from the perspective of image resolution, and reveal that a combination of low- and high-resolution visual features can effectively mitigate this shortcoming.
Ranked #57 on Visual Question Answering on MM-Vet
1 code implementation • 1 Nov 2023 • Minglang Huang, Yiyi Zhou, Gen Luo, Guannan Jiang, Weilin Zhuang, Xiaoshuai Sun
To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled and weakly labeled data, e. g., referring points or grounding boxes, for efficient RES training.
1 code implementation • 31 Aug 2023 • Changli Wu, Yiwei Ma, Qi Chen, Haowei Wang, Gen Luo, Jiayi Ji, Xiaoshuai Sun
In 3D Referring Expression Segmentation (3D-RES), the earlier approach adopts a two-stage paradigm, extracting segmentation proposals and then matching them with referring expressions.
1 code implementation • NeurIPS 2023 • Gen Luo, Yiyi Zhou, Tianhe Ren, Shengxin Chen, Xiaoshuai Sun, Rongrong Ji
To validate MMA, we apply it to a recent LLM called LLaMA and term this formed large vision-language instructed model as LaVIN.
1 code implementation • CVPR 2022 • Peng Mi, Jianghang Lin, Yiyi Zhou, Yunhang Shen, Gen Luo, Xiaoshuai Sun, Liujuan Cao, Rongrong Fu, Qiang Xu, Rongrong Ji
In this paper, we study teacher-student learning from the perspective of data initialization and propose a novel algorithm called Active Teacher(Source code are available at: \url{https://github. com/HunterJ-Lin/ActiveTeacher}) for semi-supervised object detection (SSOD).
1 code implementation • 22 Feb 2023 • Gen Luo, Yiyi Zhou, Lei Jin, Xiaoshuai Sun, Rongrong Ji
In addition to this challenge, we also reveal two key issues in one-stage SSOD, which are low-quality pseudo-labeling and multi-task optimization conflict, respectively.
1 code implementation • 16 Feb 2023 • Gen Luo, Minglang Huang, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji
Experimental results show the superior performance and efficiency of RepAdapter than the state-of-the-art PETL methods.
no code implementations • CVPR 2023 • Jiamu Sun, Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji
In this paper, we present the first attempt of semi-supervised learning for REC and propose a strong baseline method called RefTeacher.
no code implementations • CVPR 2023 • Lei Jin, Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji
Based on RefCLIP, we further propose the first model-agnostic weakly supervised training scheme for existing REC models, where RefCLIP acts as a mature teacher to generate pseudo-labels for teaching common REC models.
1 code implementation • 17 Apr 2022 • Gen Luo, Yiyi Zhou, Jiamu Sun, Xiaoshuai Sun, Rongrong Ji
But the most encouraging finding is that with much less training overhead and parameters, SimREC can still achieve better performance than a set of large-scale pre-trained models, e. g., UNITER and VILLA, portraying the special role of REC in existing V&L research.
1 code implementation • 16 Apr 2022 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yan Wang, Liujuan Cao, Yongjian Wu, Feiyue Huang, Rongrong Ji
Despite the exciting performance, Transformer is criticized for its excessive parameters and computation cost.
3 code implementations • 30 Mar 2022 • Chaoyang Zhu, Yiyi Zhou, Yunhang Shen, Gen Luo, Xingjia Pan, Mingbao Lin, Chao Chen, Liujuan Cao, Xiaoshuai Sun, Rongrong Ji
In this paper, we propose a simple yet universal network termed SeqTR for visual grounding tasks, e. g., phrase localization, referring expression comprehension (REC) and segmentation (RES).
1 code implementation • 17 Oct 2021 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yongjian Wu, Yue Gao, Rongrong Ji
Based on the LaConv module, we further build the first fully language-driven convolution network, termed as LaConvNet, which can unify the visual recognition and multi-modal reasoning in one forward structure.
1 code implementation • 13 Dec 2020 • Jiayi Ji, Yunpeng Luo, Xiaoshuai Sun, Fuhai Chen, Gen Luo, Yongjian Wu, Yue Gao, Rongrong Ji
The latter contains a Global Adaptive Controller that can adaptively fuse the global information into the decoder to guide the caption generation.
1 code implementation • CVPR 2020 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Chenglin Wu, Cheng Deng, Rongrong Ji
In addition, we address a key challenge in this multi-task setup, i. e., the prediction conflict, with two innovative designs namely, Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).
Generalized Referring Expression Comprehension Referring Expression +2
1 code implementation • 7 Dec 2019 • Yiyi Zhou, Rongrong Ji, Gen Luo, Xiaoshuai Sun, Jinsong Su, Xinghao Ding, Chia-Wen Lin, Qi Tian
Referring Expression Comprehension (REC) is an emerging research spot in computer vision, which refers to detecting the target region in an image given an text description.