Search Results for author: Xiaoshuai Sun

Found 65 papers, 42 papers with code

Exploring Implicit Image Statistics for Visual Representativeness Modeling

no code implementations • CVPR 2013 • Xiaoshuai Sun, Xin-Jing Wang, Hongxun Yao, Lei Zhang

In this paper, we propose a computational model of visual representativeness by integrating cognitive theories of representativeness heuristics with computer vision and machine learning techniques.

Image Retrieval

The Effectiveness of Instance Normalization: a Strong Baseline for Single Image Dehazing

no code implementations • 8 May 2018 • Zheng Xu, Xitong Yang, Xue Li, Xiaoshuai Sun

We propose a novel deep neural network architecture for the challenging problem of single image dehazing, which aims to recover the clear image from a degraded hazy image.

Image Dehazing Single Image Dehazing

GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints

no code implementations • CVPR 2018 • Fuhai Chen, Rongrong Ji, Xiaoshuai Sun, Yongjian Wu, Jinsong Su

In offline optimization, we adopt an end-to-end formulation, which jointly trains the visual tree parser, the structured relevance and diversity constraints, as well as the LSTM-based captioning model.

Image Captioning

Deep Saliency Hashing

no code implementations • 4 Jul 2018 • Sheng Jin, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Lei Zhang, Xian-Sheng Hua

As the core of DSaH, the saliency loss guides the attention network to mine discriminative regions from pairs of images.

Deep Hashing Quantization

Semantic and Contrast-Aware Saliency

no code implementations • 9 Nov 2018 • Xiaoshuai Sun

The two pathways characterize both long-term and short-term attention cues and are integrated dynamically using maxima normalization.

Saliency Prediction

Towards Optimal Discrete Online Hashing with Balanced Similarity

1 code implementation • 29 Jan 2019 • Mingbao Lin, Rongrong Ji, Hong Liu, Xiaoshuai Sun, Yongjian Wu, Yunsheng Wu

In this paper, we propose a novel supervised online hashing method, termed Balanced Similarity for Online Discrete Hashing (BSODH), to solve the above problems in a unified framework.

Retrieval

Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images

5 code implementations • ICCV 2019 • Haozhe Xie, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Shengping Zhang

Then, a context-aware fusion module is introduced to adaptively select high-quality reconstructions for each part (e.g., table legs) from different coarse 3D volumes to obtain a fused 3D volume.

Ranked #4 on 3D Object Reconstruction on Data3D−R2N2

3D Object Reconstruction 3D Reconstruction +1

Hadamard Matrix Guided Online Hashing

1 code implementation • 11 May 2019 • Mingbao Lin, Rongrong Ji, Hong Liu, Xiaoshuai Sun, Shen Chen, Qi Tian

We then treat the learning of hash functions as a set of binary classification problems to fit the assigned target code.

Binary Classification

Supervised Online Hashing via Similarity Distribution Learning

no code implementations • 31 May 2019 • Mingbao Lin, Rongrong Ji, Shen Chen, Feng Zheng, Xiaoshuai Sun, Baochang Zhang, Liujuan Cao, Guodong Guo, Feiyue Huang

In this paper, we propose to model the similarity distributions between the input data and the hashing codes, upon which a novel supervised online hashing method, dubbed Similarity Distribution based Online Hashing (SDOH), is proposed to preserve the intrinsic semantic relationships in the produced Hamming space.

Retrieval

Information Competing Process for Learning Diversified Representations

1 code implementation • NeurIPS 2019 • Jie Hu, Rongrong Ji, Shengchuan Zhang, Xiaoshuai Sun, Qixiang Ye, Chia-Wen Lin, Qi Tian

Learning representations with diversified information remains an open problem.

General Classification Image Classification +2

Semi-Supervised Adversarial Monocular Depth Estimation

no code implementations • 6 Aug 2019 • Rongrong Ji, Ke Li, Yan Wang, Xiaoshuai Sun, Feng Guo, Xiaowei Guo, Yongjian Wu, Feiyue Huang, Jiebo Luo

In this paper, we address the problem of monocular depth estimation when only a limited number of training image-depth pairs are available.

Monocular Depth Estimation

Scene-based Factored Attention for Image Captioning

no code implementations • 7 Aug 2019 • Chen Shen, Rongrong Ji, Fuhai Chen, Xiaoshuai Sun, Xiangming Li

Specifically, the proposed module first embeds the scene concepts into factored weights explicitly and attends to the visual information extracted from the input image.

Caption Generation Image Captioning +1

Semantic-aware Image Deblurring

no code implementations • 9 Oct 2019 • Fuhai Chen, Rongrong Ji, Chengpeng Dai, Xiaoshuai Sun, Chia-Wen Lin, Jiayi Ji, Baochang Zhang, Feiyue Huang, Liujuan Cao

Specifically, we propose a novel Structured-Spatial Semantic Embedding model for image deblurring (termed S3E-Deblur), which introduces a novel Structured-Spatial Semantic tree model (S3-tree) to bridge two basic tasks in computer vision: image deblurring (ImD) and image captioning (ImC).

Deblurring Image Captioning +1

Deep Semantic Parsing of Freehand Sketches with Homogeneous Transformation, Soft-Weighted Loss, and Staged Learning

no code implementations • 14 Oct 2019 • Ying Zheng, Hongxun Yao, Xiaoshuai Sun

First, we propose a homogeneous transformation method to address the problem of domain adaptation.

Domain Adaptation Retrieval +2

Sketch-Specific Data Augmentation for Freehand Sketch Recognition

no code implementations • 14 Oct 2019 • Ying Zheng, Hongxun Yao, Xiaoshuai Sun, Shengping Zhang, Sicheng Zhao, Fatih Porikli

Conventional methods for this task often rely on the availability of the temporal order of sketch strokes, additional cues acquired from different modalities and supervised augmentation of sketch datasets with real images, which also limit the applicability and feasibility of these methods in real scenarios.

Data Augmentation Retrieval +2

Toward 3D Object Reconstruction from Stereo Images

1 code implementation • 18 Oct 2019 • Haozhe Xie, Hongxun Yao, Shangchen Zhou, Shengping Zhang, Xiaoshuai Sun, Wenxiu Sun

Inferring the 3D shape of an object from an RGB image has shown impressive results; however, existing methods rely primarily on recognizing the most similar 3D model from the training set to solve the problem.

3D Object Reconstruction Benchmarking +1

Hadamard Codebook Based Deep Hashing

no code implementations • 21 Oct 2019 • Shen Chen, Liujuan Cao, Mingbao Lin, Yan Wang, Xiaoshuai Sun, Chenglin Wu, Jingfei Qiu, Rongrong Ji

Specifically, we utilize an off-the-shelf algorithm to generate a binary Hadamard codebook that satisfies the requirements of bit independence and bit balance; this codebook then serves as the desired outputs for learning the hash functions.

Deep Hashing Image Retrieval
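
The snippet does not name the "off-the-shelf algorithm". A minimal sketch, assuming Sylvester's construction (a standard way to build Hadamard matrices whose size is a power of 2) as a stand-in: each row could serve as the target code of one class, any two columns are orthogonal (one reading of "bit independence"), and every column except the first holds equally many +1 and -1 entries (bit balance). The function names are hypothetical and not taken from the paper.

```python
def sylvester_hadamard(n: int) -> list[list[int]]:
    """Build an n x n Hadamard matrix with entries in {+1, -1}.

    Uses Sylvester's recursion H_2k = [[H_k, H_k], [H_k, -H_k]],
    so n must be a power of 2.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("Sylvester construction requires n to be a power of 2")
    h = [[1]]
    while len(h) < n:
        top = [row + row for row in h]                     # [H, H]
        bottom = [row + [-x for x in row] for row in h]    # [H, -H]
        h = top + bottom
    return h


def is_orthogonal(h: list[list[int]]) -> bool:
    """Check H @ H^T == n * I; for a square matrix this also makes the columns pairwise orthogonal."""
    n = len(h)
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(h[i], h[j]))
            if dot != (n if i == j else 0):
                return False
    return True


def balanced_columns(h: list[list[int]]) -> list[int]:
    """Indices of bit positions (columns) with equally many +1 and -1 entries."""
    n = len(h)
    return [j for j in range(n) if sum(row[j] for row in h) == 0]


codebook = sylvester_hadamard(8)     # 8 codewords of 8 bits each
print(is_orthogonal(codebook))       # True
print(balanced_columns(codebook))    # [1, 2, 3, 4, 5, 6, 7]
```

The first column of a Sylvester matrix is all +1, so that bit is never balanced; a sketch like this would typically drop it or accept one constant bit. How the paper handles this is not stated in the snippet.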

SSAH: Semi-supervised Adversarial Deep Hashing with Self-paced Hard Sample Generation

no code implementations • 20 Nov 2019 • Sheng Jin, Shangchen Zhou, Yao Liu, Chao Chen, Xiaoshuai Sun, Hongxun Yao, Xian-Sheng Hua

In this paper, we propose a novel Semi-supervised Self-pace Adversarial Hashing method, named SSAH to solve the above problems in a unified framework.

Deep Hashing Generative Adversarial Network

Variational Structured Semantic Inference for Diverse Image Captioning

no code implementations • NeurIPS 2019 • Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang, Yan Wang

To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap) executed in a novel structured encoder-inferer-decoder schema.

Image Captioning

A Real-time Global Inference Network for One-stage Referring Expression Comprehension

1 code implementation • 7 Dec 2019 • Yiyi Zhou, Rongrong Ji, Gen Luo, Xiaoshuai Sun, Jinsong Su, Xinghao Ding, Chia-Wen Lin, Qi Tian

Referring Expression Comprehension (REC) is an emerging research topic in computer vision, which refers to detecting the target region in an image given a text description.

feature selection Referring Expression +1

Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

1 code implementation • CVPR 2020 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Chenglin Wu, Cheng Deng, Rongrong Ji

In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).

Ranked #4 on Generalized Referring Expression Comprehension on gRefCOCO

Generalized Referring Expression Comprehension Referring Expression +2

Fast Class-wise Updating for Online Hashing

no code implementations • 1 Dec 2020 • Mingbao Lin, Rongrong Ji, Xiaoshuai Sun, Baochang Zhang, Feiyue Huang, Yonghong Tian, DaCheng Tao

To achieve fast online adaptivity, a class-wise updating method is developed to decompose the binary code learning and alternately renew the hash functions in a class-wise fashion, which effectively relieves the burden of large amounts of training batches.

Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network

1 code implementation • 13 Dec 2020 • Jiayi Ji, Yunpeng Luo, Xiaoshuai Sun, Fuhai Chen, Gen Luo, Yongjian Wu, Yue Gao, Rongrong Ji

The latter contains a Global Adaptive Controller that can adaptively fuse the global information into the decoder to guide the caption generation.

Caption Generation Image Captioning

TRAR: Routing the Attention Spans in Transformer for Visual Question Answering

1 code implementation • ICCV 2021 • Yiyi Zhou, Tianhe Ren, Chaoyang Zhu, Xiaoshuai Sun, Jianzhuang Liu, Xinghao Ding, Mingliang Xu, Rongrong Ji

Owing to their superior ability to model global dependencies, Transformer and its variants have become the primary choice for many vision-and-language tasks.

Question Answering Referring Expression +2

Dual-Level Collaborative Transformer for Image Captioning

1 code implementation • 16 Jan 2021 • Yunpeng Luo, Jiayi Ji, Xiaoshuai Sun, Liujuan Cao, Yongjian Wu, Feiyue Huang, Chia-Wen Lin, Rongrong Ji

Descriptive region features extracted by object detection networks have played an important role in the recent advancements of image captioning.

Descriptive Image Captioning +2

RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words

1 code implementation • CVPR 2021 • Xuying Zhang, Xiaoshuai Sun, Yunpeng Luo, Jiayi Ji, Yiyi Zhou, Yongjian Wu, Feiyue Huang, Rongrong Ji

Then, we build a BERT-based language model to extract language context and propose an Adaptive-Attention (AA) module on top of a transformer decoder to adaptively measure the contribution of visual and language cues before making decisions for word prediction.

Image Captioning Language Modelling +2

Towards Language-guided Visual Recognition via Dynamic Convolutions

1 code implementation • 17 Oct 2021 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yongjian Wu, Yue Gao, Rongrong Ji

Based on the LaConv module, we further build the first fully language-driven convolution network, termed LaConvNet, which can unify visual recognition and multi-modal reasoning in one forward structure.

Question Answering Referring Expression +2

DIFNet: Boosting Visual Information Flow for Image Captioning

no code implementations • CVPR 2022 • Mingrui Wu, Xuying Zhang, Xiaoshuai Sun, Yiyi Zhou, Chao Chen, Jiaxin Gu, Xing Sun, Rongrong Ji

Current image captioning (IC) methods predict textual words sequentially based on the input visual information from the visual feature extractor and the partially generated sentence information.

Image Captioning Sentence

Differentiated Relevances Embedding for Group-based Referring Expression Comprehension

no code implementations • 12 Mar 2022 • Fuhai Chen, Xuri Ge, Xiaoshuai Sun, Yue Gao, Jianzhuang Liu, Fufeng Chen, Wenjie Li

The key to referring expression comprehension lies in capturing the cross-modal visual-linguistic relevance.

Attribute Object +2

Global2Local: A Joint-Hierarchical Attention for Video Captioning

no code implementations • 13 Mar 2022 • Chengpeng Dai, Fuhai Chen, Xiaoshuai Sun, Rongrong Ji, Qixiang Ye, Yongjian Wu

Recently, automatic video captioning has attracted increasing attention, where the core challenge lies in capturing the key semantic items, like objects and actions as well as their spatial-temporal correlations from the redundant frames and semantic content.

Video Captioning

SeqTR: A Simple yet Universal Network for Visual Grounding

3 code implementations • 30 Mar 2022 • Chaoyang Zhu, Yiyi Zhou, Yunhang Shen, Gen Luo, Xingjia Pan, Mingbao Lin, Chao Chen, Liujuan Cao, Xiaoshuai Sun, Rongrong Ji

In this paper, we propose a simple yet universal network termed SeqTR for visual grounding tasks, e.g., phrase localization, referring expression comprehension (REC) and segmentation (RES).

Referring Expression Referring Expression Comprehension +1

End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation

1 code implementation • 1 Apr 2022 • Mingrui Wu, Jiaxin Gu, Yunhang Shen, Mingbao Lin, Chao Chen, Xiaoshuai Sun

Extensive experiments on HICO-Det dataset demonstrate that our model discovers potential interactive pairs and enables the recognition of unseen HOIs.

Human-Object Interaction Detection Knowledge Distillation +4

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation

1 code implementation • 2 Apr 2022 • Jing He, Yiyi Zhou, Qi Zhang, Jun Peng, Yunhang Shen, Xiaoshuai Sun, Chao Chen, Rongrong Ji

Pixel synthesis is a promising research paradigm for image generation, which can effectively exploit pixel-wise prior knowledge for generation.

Image Generation regression

Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks

1 code implementation • 16 Apr 2022 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yan Wang, Liujuan Cao, Yongjian Wu, Feiyue Huang, Rongrong Ji

Despite the exciting performance, Transformer is criticized for its excessive parameters and computation cost.

Image Classification

A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension

1 code implementation • 17 Apr 2022 • Gen Luo, Yiyi Zhou, Jiamu Sun, Xiaoshuai Sun, Rongrong Ji

But the most encouraging finding is that, with far less training overhead and far fewer parameters, SimREC can still achieve better performance than a set of large-scale pre-trained models, e.g., UNITER and VILLA, portraying the special role of REC in existing V&L research.

Data Augmentation Referring Expression +1

X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval

1 code implementation • 15 Jul 2022 • Yiwei Ma, Guohai Xu, Xiaoshuai Sun, Ming Yan, Ji Zhang, Rongrong Ji

However, cross-grained contrast, which is the contrast between coarse-grained representations and fine-grained representations, has rarely been explored in prior research.

Ranked #12 on Video Retrieval on MSVD

Contrastive Learning Retrieval +2

Clover: Towards A Unified Video-Language Alignment and Fusion Model

1 code implementation • CVPR 2023 • Jingjia Huang, Yinan Li, Jiashi Feng, Xinglong Wu, Xiaoshuai Sun, Rongrong Ji

We then introduce Clover, a Correlated Video-Language pre-training method, towards a universal Video-Language model for solving multiple video understanding tasks without compromising either performance or efficiency.

Ranked #1 on Video Question Answering on LSMDC-FiB

Language Modelling Question Answering +10

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

1 code implementation • 11 Oct 2022 • Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, DaCheng Tao

One of the popular solutions is Sharpness-Aware Minimization (SAM), which smooths the loss landscape via minimizing the maximized change of training loss when adding a perturbation to the weight.
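
The snippet describes SAM as minimizing the worst-case training loss within a small neighborhood of the weights, i.e. min over w of the max over ||ε|| ≤ ρ of L(w + ε). A minimal sketch of one SAM step on a toy two-parameter quadratic in plain Python, assuming the standard first-order approximation of the inner maximization (ε = ρ·g/||g||). The optional `mask` argument is a hypothetical illustration of a sparsified perturbation (only some coordinates are perturbed); how the paper actually selects that mask is not reproduced here.

```python
import math


def loss(w):
    # Toy anisotropic quadratic: L(w) = 0.5 * (10 * w0^2 + w1^2)
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)


def grad(w):
    # Analytic gradient of the toy loss above
    return [10.0 * w[0], w[1]]


def sam_step(w, lr=0.05, rho=0.05, mask=None):
    """One SAM update on the toy loss.

    1. Approximate the worst-case perturbation inside an L2 ball of radius rho
       with eps = rho * g / ||g|| (first-order approximation).
    2. Take a gradient-descent step from w using the gradient at w + eps.

    mask (hypothetical): optional list of 0/1 values; coordinates with 0 are
    not perturbed, a simple stand-in for a sparsified perturbation.
    """
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g)) + 1e-12  # avoid division by zero
    eps = [rho * x / norm for x in g]
    if mask is not None:
        eps = [e * m for e, m in zip(eps, mask)]
    w_adv = [wi + ei for wi, ei in zip(w, eps)]      # perturbed weights
    g_adv = grad(w_adv)                              # gradient at w + eps
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w = [1.0, 1.0]
for _ in range(100):
    w = sam_step(w)
print(f"loss after 100 SAM steps: {loss(w):.5f}")
```

With `mask=[0, 0]` the perturbation vanishes and the update reduces to plain gradient descent, which is a quick sanity check of the two-step structure. Because ε keeps a fixed length ρ, the iterates hover near the minimum rather than landing exactly on it.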

RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension

no code implementations • CVPR 2023 • Jiamu Sun, Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji

In this paper, we present the first attempt at semi-supervised learning for REC and propose a strong baseline method called RefTeacher.

Imitation Learning Pseudo Label +2

RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension

no code implementations • CVPR 2023 • Lei Jin, Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji

Based on RefCLIP, we further propose the first model-agnostic weakly supervised training scheme for existing REC models, where RefCLIP acts as a mature teacher to generate pseudo-labels for teaching common REC models.

Referring Expression Referring Expression Comprehension +2

Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network

1 code implementation • 9 Jan 2023 • Haowei Wang, Jiayi Ji, Yiyi Zhou, Yongjian Wu, Xiaoshuai Sun

Extensive experiments on the PNG benchmark dataset demonstrate the effectiveness and efficiency of our method.

Towards Local Visual Modeling for Image Captioning

1 code implementation • 13 Feb 2023 • Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Rongrong Ji

In this paper, we study local visual modeling with grid features for image captioning, which is critical for generating accurate and detailed captions.

Image Captioning Object Recognition

Towards Efficient Visual Adaption via Structural Re-parameterization

1 code implementation • 16 Feb 2023 • Gen Luo, Minglang Huang, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji

Experimental results show that RepAdapter outperforms state-of-the-art PETL methods in both performance and efficiency.

Semantic Segmentation Transfer Learning

Towards End-to-end Semi-supervised Learning for One-stage Object Detection

1 code implementation • 22 Feb 2023 • Gen Luo, Yiyi Zhou, Lei Jin, Xiaoshuai Sun, Rongrong Ji

In addition to this challenge, we also reveal two key issues in one-stage SSOD, which are low-quality pseudo-labeling and multi-task optimization conflict, respectively.

object-detection Object Detection +2

Active Teacher for Semi-Supervised Object Detection

1 code implementation • CVPR 2022 • Peng Mi, Jianghang Lin, Yiyi Zhou, Yunhang Shen, Gen Luo, Xiaoshuai Sun, Liujuan Cao, Rongrong Fu, Qiang Xu, Rongrong Ji

In this paper, we study teacher-student learning from the perspective of data initialization and propose a novel algorithm called Active Teacher (source code is available at https://github.com/HunterJ-Lin/ActiveTeacher) for semi-supervised object detection (SSOD).

Object object-detection +2

X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance

1 code implementation • ICCV 2023 • Yiwei Ma, Xiaioqing Zhang, Xiaoshuai Sun, Jiayi Ji, Haowei Wang, Guannan Jiang, Weilin Zhuang, Rongrong Ji

Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV) and computer graphics (CG), aimed at transforming a bare mesh to fit a target text.

Attribute

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models

1 code implementation • NeurIPS 2023 • Gen Luo, Yiyi Zhou, Tianhe Ren, Shengxin Chen, Xiaoshuai Sun, Rongrong Ji

To validate MMA, we apply it to a recent LLM called LLaMA and term the resulting large vision-language instructed model LaVIN.

Chatbot Natural Language Understanding +1

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting

1 code implementation • 1 Jun 2023 • Shubin Huang, Qiong Wu, Yiyi Zhou, WeiJie Chen, Rongsheng Zhang, Xiaoshuai Sun, Rongrong Ji

In addition, we also experiment with combining DVP and the recently popular adapter approach to keep most parameters of PLMs intact when adapting to VL tasks, helping PLMs achieve a quick shift between single- and multi-modal tasks.

Transfer Learning Visual Prompting

Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer

1 code implementation • 30 Jun 2023 • Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Tianshuo Xu, Xiaoshuai Sun, Tongliang Liu, Rongrong Ji, DaCheng Tao

Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight.

Towards General Visual-Linguistic Face Forgery Detection

no code implementations • 31 Jul 2023 • Ke Sun, Shen Chen, Taiping Yao, Haozhe Yang, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji

To address these issues, in this paper, we propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.

Binary Classification DeepFake Detection +2

Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation

1 code implementation • 6 Aug 2023 • Haowei Wang, Jiji Tang, Jiayi Ji, Xiaoshuai Sun, Rongsheng Zhang, Yiwei Ma, Minda Zhao, Lincheng Li, Zeng Zhao, Tangjie Lv, Rongrong Ji

Insufficient synergy neglects the idea that a robust 3D representation should align with the joint vision-language space, rather than independently aligning with each modality.

Ranked #1 on Zero-shot 3D Point Cloud Classification on ModelNet40

3D Classification 3D Part Segmentation +5

Continual Face Forgery Detection via Historical Distribution Preserving

no code implementations • 11 Aug 2023 • Ke Sun, Shen Chen, Taiping Yao, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji

In this paper, we focus on a novel and challenging problem: Continual Face Forgery Detection (CFFD), which aims to efficiently learn from new forgery attacks without forgetting previous ones.

Knowledge Distillation

3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation

1 code implementation • 31 Aug 2023 • Changli Wu, Yiwei Ma, Qi Chen, Haowei Wang, Gen Luo, Jiayi Ji, Xiaoshuai Sun

In 3D Referring Expression Segmentation (3D-RES), the earlier approach adopts a two-stage paradigm, extracting segmentation proposals and then matching them with referring expressions.

Navigate Referring Expression +3

JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues

1 code implementation • 14 Oct 2023 • Jiayi Ji, Haowei Wang, Changli Wu, Yiwei Ma, Xiaoshuai Sun, Rongrong Ji

The importance of 3D understanding, which is pivotal in computer vision, autonomous driving, and robotics, is rising.

Autonomous Driving Representation Learning

NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning

1 code implementation • 17 Oct 2023 • Haowei Wang, Jiayi Ji, Tianyu Guo, Yilong Yang, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

To address this, we introduce two cascading modules based on the barycenter of the mask, which are Coordinate Guided Aggregation (CGA) and Barycenter Driven Localization (BDL), responsible for segmentation and detection, respectively.

Segmentation Visual Grounding
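
The NICE snippet builds its cascading modules on the barycenter of a mask. As a small illustration of that quantity only (not of the CGA or BDL modules, whose internals the snippet does not describe), a mask barycenter is the value-weighted mean of pixel coordinates. The function name below is hypothetical.

```python
def mask_barycenter(mask):
    """Return the (row, col) center of mass of a 2D mask given as nested lists.

    Works for binary (0/1) or soft (non-negative) masks: each pixel's
    coordinates are weighted by its value. Returns None for an all-zero mask.
    """
    total = 0.0
    row_sum = 0.0
    col_sum = 0.0
    for r, row in enumerate(mask):
        for c, v in enumerate(row):
            total += v
            row_sum += r * v
            col_sum += c * v
    if total == 0:
        return None
    return (row_sum / total, col_sum / total)


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(mask_barycenter(mask))  # (1.5, 1.5)
```

A single point summarizing a predicted mask is one plausible way a segmentation output could feed a localization step, but that reading is an assumption, not a description of the paper's design.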

Semi-Supervised Panoptic Narrative Grounding

1 code implementation • 27 Oct 2023 • Danni Yang, Jiayi Ji, Xiaoshuai Sun, Haowei Wang, Yinan Li, Yiwei Ma, Rongrong Ji

Remarkably, our SS-PNG-NW+ outperforms fully-supervised models with only 30% and 50% supervision data, exceeding their performance by 0.8% and 1.1%, respectively.

Data Augmentation Pseudo Label

Towards Omni-supervised Referring Expression Segmentation

1 code implementation • 1 Nov 2023 • Minglang Huang, Yiyi Zhou, Gen Luo, Guannan Jiang, Weilin Zhuang, Xiaoshuai Sun

To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled and weakly labeled data, e.g., referring points or grounding boxes, for efficient RES training.

Referring Expression Referring Expression Segmentation +1

X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

1 code implementation • 30 Nov 2023 • Yiwei Ma, Yijun Fan, Jiayi Ji, Haowei Wang, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji

Nevertheless, a substantial domain gap exists between 2D images and 3D assets, primarily attributed to variations in camera-related attributes and the exclusive presence of foreground objects.

3D Generation Text to 3D

Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation

1 code implementation • 19 Dec 2023 • Sihan Liu, Yiwei Ma, Xiaoqing Zhang, Haowei Wang, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji

Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing, delineating specific regions in aerial images as described by textual queries.

Image Segmentation Segmentation +1

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks

1 code implementation • 15 Jan 2024 • Siyu Zou, Jiji Tang, Yiyi Zhou, Jing He, Chaoyi Zhao, Rongsheng Zhang, Zhipeng Hu, Xiaoshuai Sun

In particular, InstDiffEdit aims to employ the cross-modal attention ability of existing diffusion models to achieve instant mask guidance during the diffusion steps.

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

1 code implementation • 5 Mar 2024 • Gen Luo, Yiyi Zhou, Yuxin Zhang, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji

Contrary to previous works, we study this problem from the perspective of image resolution, and reveal that a combination of low- and high-resolution visual features can effectively mitigate this shortcoming.

Ranked #55 on Visual Question Answering on MM-Vet

Visual Question Answering

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization

1 code implementation • 11 Mar 2024 • Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng, Xiaoxiong Du, Gen Luo, Jun Peng, Xiaoshuai Sun, Rongrong Ji

Text-to-3D-aware face (T3D Face) generation and manipulation is an emerging research hot spot in machine learning, which still suffers from low efficiency and poor quality.

Face Generation Text to 3D

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models

no code implementations • 22 Mar 2024 • Qiong Wu, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

In this paper, we propose a novel parameter and computation efficient tuning method for Multi-modal Large Language Models (MLLMs), termed Efficient Attention Skipping (EAS).

Transfer Learning

DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis

1 code implementation • 27 Mar 2024 • Zhongxi Chen, Ke Sun, Ziyin Zhou, Xianming Lin, Xiaoshuai Sun, Liujuan Cao, Rongrong Ji

The rapid progress in deep learning has given rise to hyper-realistic facial forgery methods, leading to concerns related to misinformation and security risks.

Image Generation Misinformation

Deep Instruction Tuning for Segment Anything Model

1 code implementation • 31 Mar 2024 • Xiaorui Huang, Gen Luo, Chaoyang Zhu, Bo Tong, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

Segment Anything Model (SAM) has recently exhibited powerful yet versatile capabilities on (un)conditional image segmentation tasks.

Image Segmentation Segmentation +1
