Search Results for author: Gen Luo

Found 18 papers, 16 papers with code

Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network

1 code implementation13 Dec 2020 Jiayi Ji, Yunpeng Luo, Xiaoshuai Sun, Fuhai Chen, Gen Luo, Yongjian Wu, Yue Gao, Rongrong Ji

The latter contains a Global Adaptive Controller that can adaptively fuse the global information into the decoder to guide the caption generation.

Caption Generation Image Captioning

Towards Efficient Visual Adaption via Structural Re-parameterization

1 code implementation16 Feb 2023 Gen Luo, Minglang Huang, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji

Experimental results show the superior performance and efficiency of RepAdapter than the state-of-the-art PETL methods.

Semantic Segmentation Transfer Learning

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

1 code implementation5 Mar 2024 Gen Luo, Yiyi Zhou, Yuxin Zhang, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji

Contrary to previous works, we study this problem from the perspective of image resolution, and reveal that a combination of low- and high-resolution visual features can effectively mitigate this shortcoming.

Visual Question Answering

Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

1 code implementation CVPR 2020 Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Chenglin Wu, Cheng Deng, Rongrong Ji

In addition, we address a key challenge in this multi-task setup, i. e., the prediction conflict, with two innovative designs namely, Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).

Generalized Referring Expression Comprehension Referring Expression +2

SeqTR: A Simple yet Universal Network for Visual Grounding

3 code implementations30 Mar 2022 Chaoyang Zhu, Yiyi Zhou, Yunhang Shen, Gen Luo, Xingjia Pan, Mingbao Lin, Chao Chen, Liujuan Cao, Xiaoshuai Sun, Rongrong Ji

In this paper, we propose a simple yet universal network termed SeqTR for visual grounding tasks, e. g., phrase localization, referring expression comprehension (REC) and segmentation (RES).

Referring Expression Referring Expression Comprehension +1

Towards End-to-end Semi-supervised Learning for One-stage Object Detection

1 code implementation22 Feb 2023 Gen Luo, Yiyi Zhou, Lei Jin, Xiaoshuai Sun, Rongrong Ji

In addition to this challenge, we also reveal two key issues in one-stage SSOD, which are low-quality pseudo-labeling and multi-task optimization conflict, respectively.

object-detection Object Detection +2

Active Teacher for Semi-Supervised Object Detection

1 code implementation CVPR 2022 Peng Mi, Jianghang Lin, Yiyi Zhou, Yunhang Shen, Gen Luo, Xiaoshuai Sun, Liujuan Cao, Rongrong Fu, Qiang Xu, Rongrong Ji

In this paper, we study teacher-student learning from the perspective of data initialization and propose a novel algorithm called Active Teacher(Source code are available at: \url{https://github. com/HunterJ-Lin/ActiveTeacher}) for semi-supervised object detection (SSOD).

Object object-detection +2

A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension

1 code implementation17 Apr 2022 Gen Luo, Yiyi Zhou, Jiamu Sun, Xiaoshuai Sun, Rongrong Ji

But the most encouraging finding is that with much less training overhead and parameters, SimREC can still achieve better performance than a set of large-scale pre-trained models, e. g., UNITER and VILLA, portraying the special role of REC in existing V&L research.

Data Augmentation Referring Expression +1

3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation

1 code implementation31 Aug 2023 Changli Wu, Yiwei Ma, Qi Chen, Haowei Wang, Gen Luo, Jiayi Ji, Xiaoshuai Sun

In 3D Referring Expression Segmentation (3D-RES), the earlier approach adopts a two-stage paradigm, extracting segmentation proposals and then matching them with referring expressions.

Navigate Referring Expression +3

A Real-time Global Inference Network for One-stage Referring Expression Comprehension

1 code implementation7 Dec 2019 Yiyi Zhou, Rongrong Ji, Gen Luo, Xiaoshuai Sun, Jinsong Su, Xinghao Ding, Chia-Wen Lin, Qi Tian

Referring Expression Comprehension (REC) is an emerging research spot in computer vision, which refers to detecting the target region in an image given an text description.

feature selection Referring Expression +1

Deep Instruction Tuning for Segment Anything Model

1 code implementation31 Mar 2024 Xiaorui Huang, Gen Luo, Chaoyang Zhu, Bo Tong, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

Segment Anything Model (SAM) exhibits powerful yet versatile capabilities on (un) conditional image segmentation tasks recently.

Image Segmentation Segmentation +1

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization

1 code implementation11 Mar 2024 Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng, Xiaoxiong Du, Gen Luo, Jun Peng, Xiaoshuai Sun, Rongrong Ji

Text-to-3D-aware face (T3D Face) generation and manipulation is an emerging research hot spot in machine learning, which still suffers from low efficiency and poor quality.

Face Generation Text to 3D

Towards Omni-supervised Referring Expression Segmentation

1 code implementation1 Nov 2023 Minglang Huang, Yiyi Zhou, Gen Luo, Guannan Jiang, Weilin Zhuang, Xiaoshuai Sun

To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled and weakly labeled data, e. g., referring points or grounding boxes, for efficient RES training.

Referring Expression Referring Expression Segmentation +1

Towards Language-guided Visual Recognition via Dynamic Convolutions

1 code implementation17 Oct 2021 Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yongjian Wu, Yue Gao, Rongrong Ji

Based on the LaConv module, we further build the first fully language-driven convolution network, termed as LaConvNet, which can unify the visual recognition and multi-modal reasoning in one forward structure.

Question Answering Referring Expression +2

RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension

no code implementations CVPR 2023 Lei Jin, Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji

Based on RefCLIP, we further propose the first model-agnostic weakly supervised training scheme for existing REC models, where RefCLIP acts as a mature teacher to generate pseudo-labels for teaching common REC models.

Referring Expression Referring Expression Comprehension +2

Cannot find the paper you are looking for? You can Submit a new open access paper.