Search Results for author: Yiyi Zhou

Found 31 papers, 25 papers with code

Deep Instruction Tuning for Segment Anything Model

1 code implementation • 31 Mar 2024 • Xiaorui Huang, Gen Luo, Chaoyang Zhu, Bo Tong, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

Recently, Segment Anything Model (SAM) has become a research hotspot in the fields of multimedia and computer vision, which exhibits powerful yet versatile capabilities on various (un) conditional image segmentation tasks.

Decoder Image Segmentation +2

Paper
Code

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models

no code implementations • 22 Mar 2024 • Qiong Wu, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

In this paper, we propose a novel parameter and computation efficient tuning method for Multi-modal Large Language Models (MLLMs), termed Efficient Attention Skipping (EAS).

Transfer Learning

Paper
Add Code

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization

1 code implementation • 11 Mar 2024 • Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng, Xiaoxiong Du, Gen Luo, Jun Peng, Xiaoshuai Sun, Rongrong Ji

Text-to-3D-aware face (T3D Face) generation and manipulation is an emerging research hot spot in machine learning, which still suffers from low efficiency and poor quality.

Face Generation Text to 3D

Paper
Code

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

1 code implementation • 5 Mar 2024 • Gen Luo, Yiyi Zhou, Yuxin Zhang, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji

Contrary to previous works, we study this problem from the perspective of image resolution, and reveal that a combination of low- and high-resolution visual features can effectively mitigate this shortcoming.

Ranked #62 on Visual Question Answering on MM-Vet

Visual Question Answering

170

Paper
Code

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks

1 code implementation • 15 Jan 2024 • Siyu Zou, Jiji Tang, Yiyi Zhou, Jing He, Chaoyi Zhao, Rongsheng Zhang, Zhipeng Hu, Xiaoshuai Sun

In particular, InstDiffEdit aims to employ the cross-modal attention ability of existing diffusion models to achieve instant mask guidance during the diffusion steps.

Paper
Code

Towards Omni-supervised Referring Expression Segmentation

1 code implementation • 1 Nov 2023 • Minglang Huang, Yiyi Zhou, Gen Luo, Guannan Jiang, Weilin Zhuang, Xiaoshuai Sun

To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled and weakly labeled data, e. g., referring points or grounding boxes, for efficient RES training.

Referring Expression Referring Expression Segmentation +1

Paper
Code

NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning

1 code implementation • 17 Oct 2023 • Haowei Wang, Jiayi Ji, Tianyu Guo, Yilong Yang, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

To address this, we introduce two cascading modules based on the barycenter of the mask, which are Coordinate Guided Aggregation (CGA) and Barycenter Driven Localization (BDL), responsible for segmentation and detection, respectively.

Segmentation Visual Grounding

Paper
Code

MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization

1 code implementation • 22 Aug 2023 • Tao Chen, Ze Lin, Hui Li, Jiayi Ji, Yiyi Zhou, Guanbin Li, Rongrong Ji

Furthermore, we model product attributes based on both text and image modalities so that multi-modal product characteristics can be manifested in the generated summaries.

Attribute

Paper
Code

Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer

1 code implementation • 30 Jun 2023 • Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Tianshuo Xu, Xiaoshuai Sun, Tongliang Liu, Rongrong Ji, DaCheng Tao

Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight.

Paper
Code

Approximated Prompt Tuning for Vision-Language Pre-trained Models

no code implementations • 27 Jun 2023 • Qiong Wu, Shubin Huang, Yiyi Zhou, Pingyang Dai, Annan Shu, Guannan Jiang, Rongrong Ji

Prompt tuning is a parameter-efficient way to deploy large-scale pre-trained models to downstream tasks by adding task-specific tokens.

Image Classification Text-to-Image Generation +1

Paper
Add Code

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting

1 code implementation • 1 Jun 2023 • Shubin Huang, Qiong Wu, Yiyi Zhou, WeiJie Chen, Rongsheng Zhang, Xiaoshuai Sun, Rongrong Ji

In addition, we also experiment DVP with the recently popular adapter approach to keep the most parameters of PLMs intact when adapting to VL tasks, helping PLMs achieve a quick shift between single- and multi-modal tasks.

Transfer Learning Visual Prompting

Paper
Code

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models

1 code implementation • NeurIPS 2023 • Gen Luo, Yiyi Zhou, Tianhe Ren, Shengxin Chen, Xiaoshuai Sun, Rongrong Ji

To validate MMA, we apply it to a recent LLM called LLaMA and term this formed large vision-language instructed model as LaVIN.

Chatbot Natural Language Understanding +1

488

Paper
Code

Active Teacher for Semi-Supervised Object Detection

1 code implementation • CVPR 2022 • Peng Mi, Jianghang Lin, Yiyi Zhou, Yunhang Shen, Gen Luo, Xiaoshuai Sun, Liujuan Cao, Rongrong Fu, Qiang Xu, Rongrong Ji

In this paper, we study teacher-student learning from the perspective of data initialization and propose a novel algorithm called Active Teacher(Source code are available at: \url{https://github. com/HunterJ-Lin/ActiveTeacher}) for semi-supervised object detection (SSOD).

Object object-detection +2

Paper
Code

Towards End-to-end Semi-supervised Learning for One-stage Object Detection

1 code implementation • 22 Feb 2023 • Gen Luo, Yiyi Zhou, Lei Jin, Xiaoshuai Sun, Rongrong Ji

In addition to this challenge, we also reveal two key issues in one-stage SSOD, which are low-quality pseudo-labeling and multi-task optimization conflict, respectively.

object-detection Object Detection +2

Paper
Code

Towards Efficient Visual Adaption via Structural Re-parameterization

1 code implementation • 16 Feb 2023 • Gen Luo, Minglang Huang, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji

Experimental results show the superior performance and efficiency of RepAdapter than the state-of-the-art PETL methods.

Semantic Segmentation Transfer Learning

183

Paper
Code

Towards Local Visual Modeling for Image Captioning

1 code implementation • 13 Feb 2023 • Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Rongrong Ji

In this paper, we study the local visual modeling with grid features for image captioning, which is critical for generating accurate and detailed captions.

Image Captioning Object Recognition

Paper
Code

Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network

1 code implementation • 9 Jan 2023 • Haowei Wang, Jiayi Ji, Yiyi Zhou, Yongjian Wu, Xiaoshuai Sun

Extensive experiments on the PNG benchmark dataset demonstrate the effectiveness and efficiency of our method.

Paper
Code

RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension

no code implementations • CVPR 2023 • Jiamu Sun, Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji

In this paper, we present the first attempt of semi-supervised learning for REC and propose a strong baseline method called RefTeacher.

Imitation Learning Pseudo Label +2

Paper
Add Code

RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension

no code implementations • CVPR 2023 • Lei Jin, Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji

Based on RefCLIP, we further propose the first model-agnostic weakly supervised training scheme for existing REC models, where RefCLIP acts as a mature teacher to generate pseudo-labels for teaching common REC models.

Referring Expression Referring Expression Comprehension +2

Paper
Add Code

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

1 code implementation • 11 Oct 2022 • Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, DaCheng Tao

One of the popular solutions is Sharpness-Aware Minimization (SAM), which smooths the loss landscape via minimizing the maximized change of training loss when adding a perturbation to the weight.

Paper
Code

CycleTrans: Learning Neutral yet Discriminative Features for Visible-Infrared Person Re-Identification

no code implementations • 21 Aug 2022 • Qiong Wu, Jiaer Xia, Pingyang Dai, Yiyi Zhou, Yongjian Wu, Rongrong Ji

Visible-infrared person re-identification (VI-ReID) is a task of matching the same individuals across the visible and infrared modalities.

Person Re-Identification

Paper
Add Code

A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension

1 code implementation • 17 Apr 2022 • Gen Luo, Yiyi Zhou, Jiamu Sun, Xiaoshuai Sun, Rongrong Ji

But the most encouraging finding is that with much less training overhead and parameters, SimREC can still achieve better performance than a set of large-scale pre-trained models, e. g., UNITER and VILLA, portraying the special role of REC in existing V&L research.

Data Augmentation Referring Expression +1

Paper
Code

Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks

1 code implementation • 16 Apr 2022 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yan Wang, Liujuan Cao, Yongjian Wu, Feiyue Huang, Rongrong Ji

Despite the exciting performance, Transformer is criticized for its excessive parameters and computation cost.

Image Classification

Paper
Code

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation

1 code implementation • 2 Apr 2022 • Jing He, Yiyi Zhou, Qi Zhang, Jun Peng, Yunhang Shen, Xiaoshuai Sun, Chao Chen, Rongrong Ji

Pixel synthesis is a promising research paradigm for image generation, which can well exploit pixel-wise prior knowledge for generation.

Image Generation regression

Paper
Code

SeqTR: A Simple yet Universal Network for Visual Grounding

3 code implementations • 30 Mar 2022 • Chaoyang Zhu, Yiyi Zhou, Yunhang Shen, Gen Luo, Xingjia Pan, Mingbao Lin, Chao Chen, Liujuan Cao, Xiaoshuai Sun, Rongrong Ji

In this paper, we propose a simple yet universal network termed SeqTR for visual grounding tasks, e. g., phrase localization, referring expression comprehension (REC) and segmentation (RES).

Decoder Referring Expression +2

123

Paper
Code

DIFNet: Boosting Visual Information Flow for Image Captioning

no code implementations • CVPR 2022 • Mingrui Wu, Xuying Zhang, Xiaoshuai Sun, Yiyi Zhou, Chao Chen, Jiaxin Gu, Xing Sun, Rongrong Ji

Current Image captioning (IC) methods predict textual words sequentially based on the input visual information from the visual feature extractor and the partially generated sentence information.

Image Captioning Sentence

Paper
Add Code

Towards Language-guided Visual Recognition via Dynamic Convolutions

1 code implementation • 17 Oct 2021 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yongjian Wu, Yue Gao, Rongrong Ji

Based on the LaConv module, we further build the first fully language-driven convolution network, termed as LaConvNet, which can unify the visual recognition and multi-modal reasoning in one forward structure.

Question Answering Referring Expression +2

Paper
Code

RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words

1 code implementation • CVPR 2021 • Xuying Zhang, Xiaoshuai Sun, Yunpeng Luo, Jiayi Ji, Yiyi Zhou, Yongjian Wu, Feiyue Huang, Rongrong Ji

Then, we build a BERTbased language model to extract language context and propose Adaptive-Attention (AA) module on top of a transformer decoder to adaptively measure the contribution of visual and language cues before making decisions for word prediction.

Decoder Image Captioning +3

118

Paper
Code

TRAR: Routing the Attention Spans in Transformer for Visual Question Answering

1 code implementation • ICCV 2021 • Yiyi Zhou, Tianhe Ren, Chaoyang Zhu, Xiaoshuai Sun, Jianzhuang Liu, Xinghao Ding, Mingliang Xu, Rongrong Ji

Due to the superior ability of global dependency modeling, Transformer and its variants have become the primary choice of many vision-and-language tasks.

Question Answering Referring Expression +2

Paper
Code

Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

2 code implementations • CVPR 2020 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Chenglin Wu, Cheng Deng, Rongrong Ji

In addition, we address a key challenge in this multi-task setup, i. e., the prediction conflict, with two innovative designs namely, Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).

Ranked #4 on Generalized Referring Expression Comprehension on gRefCOCO

Generalized Referring Expression Comprehension Referring Expression +2

132

Paper
Code

A Real-time Global Inference Network for One-stage Referring Expression Comprehension

1 code implementation • 7 Dec 2019 • Yiyi Zhou, Rongrong Ji, Gen Luo, Xiaoshuai Sun, Jinsong Su, Xinghao Ding, Chia-Wen Lin, Qi Tian

Referring Expression Comprehension (REC) is an emerging research spot in computer vision, which refers to detecting the target region in an image given an text description.

feature selection Referring Expression +1

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.