Search Results for author: Rongrong Ji

Found 308 papers, 201 papers with code

Enabling Deep Residual Networks for Weakly Supervised Object Detection

no code implementations ECCV 2020 Yunhang Shen, Rongrong Ji, Yan Wang, Zhiwei Chen, Feng Zheng, Feiyue Huang, Yunsheng Wu

Weakly supervised object detection (WSOD) has attracted extensive research attention due to its great flexibility of exploiting large-scale image-level annotation for detector training.

Object object-detection +1

SSCGAN: Facial Attribute Editing via Style Skip Connections

no code implementations ECCV 2020 Wenqing Chu, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Rongrong Ji

Each connection extracts the style feature of the latent feature maps in the encoder and then performs a residual learning based mapping function in the global information space guided by the target attributes.

Attribute Decoder +1

API-Net: Robust Generative Classifier via a Single Discriminator

1 code implementation ECCV 2020 Xinshuai Dong, Hong Liu, Rongrong Ji, Liujuan Cao, Qixiang Ye, Jianzhuang Liu, Qi Tian

On the contrary, a discriminative classifier only models the conditional distribution of labels given inputs, but benefits from effective optimization owing to its succinct structure.

Robust classification

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

1 code implementation 7 Feb 2025 Yunhang Shen, Chaoyou Fu, Shaoqi Dong, Xiong Wang, Peixian Chen, Mengdan Zhang, Haoyu Cao, Ke Li, Xiawu Zheng, Yan Zhang, Yiyi Zhou, Rongrong Ji, Xing Sun

Establishing the long-context capability of large vision-language models is crucial for video understanding, high-resolution image understanding, multi-modal agents and reasoning.

4k General Knowledge +4

Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting

no code implementations 30 Jan 2025 Yansong Qu, Dian Chen, Xinyang Li, Xiaofan Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

It enables users to conveniently specify the desired editing region and the desired dragging direction through the input of 3D masks and pairs of control points, thereby enabling precise control over the extent of editing.

3DGS 3D scene Editing

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

1 code implementation 3 Jan 2025 Chaoyou Fu, Haojia Lin, Xiong Wang, Yi-Fan Zhang, Yunhang Shen, Xiaoyu Liu, Haoyu Cao, Zuwei Long, Heting Gao, Ke Li, Long Ma, Xiawu Zheng, Rongrong Ji, Xing Sun, Caifeng Shan, Ran He

Recent Multimodal Large Language Models (MLLMs) have typically focused on integrating visual and textual modalities, with less emphasis placed on the role of speech in enhancing interaction.

SVFR: A Unified Framework for Generalized Video Face Restoration

1 code implementation 2 Jan 2025 Zhiyao Wang, Xu Chen, Chengming Xu, Junwei Zhu, Xiaobin Hu, Jiangning Zhang, Chengjie Wang, Yuqi Liu, Yiyi Zhou, Rongrong Ji

In this paper, we propose a novel approach for the Generalized Video Face Restoration (GVFR) task, which integrates video BFR, inpainting, and colorization tasks that we empirically show to benefit each other.

Colorization Representation Learning

Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers

no code implementations 21 Dec 2024 Yunshan Zhong, Yuyao Zhou, Yuxin Zhang, Shen Li, Yong Li, Fei Chao, Zhanpeng Zeng, Rongrong Ji

Data-free quantization (DFQ), which facilitates model quantization without real data to address increasing concerns about data security, has garnered significant attention within the model compression community.

Data Free Quantization Model Compression

DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On

no code implementations 19 Dec 2024 Wengyi Zhan, Mingbao Lin, Shuicheng Yan, Rongrong Ji

We introduce DiffusionTrend for virtual fashion try-on, which forgoes the need for retraining diffusion models.

Denoising Image Generation +1

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

1 code implementation 5 Dec 2024 Bo Tong, Bokai Lai, Yiyi Zhou, Gen Luo, Yunhang Shen, Ke Li, Xiaoshuai Sun, Rongrong Ji

Despite a big leap forward in capability, multimodal large language models (MLLMs) tend to behave like a sloth in practical use, i.e., slow response and large latency.

Descriptive Visual Question Answering

AccDiffusion v2: Towards More Accurate Higher-Resolution Diffusion Extrapolation

1 code implementation 3 Dec 2024 Zhihang Lin, Mingbao Lin, Wengyi Zhan, Rongrong Ji

Finally, our analysis indicates that global semantic information is conducive to suppressing both repetitive generation and local distortion.

Image Generation Local Distortion

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

1 code implementation 29 Nov 2024 Qiong Wu, Wenhao Lin, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

In particular, we reveal that visual tokens will stop contributing to reasoning when the text tokens receive enough image information, yielding obvious visual redundancy.

Multimodal Reasoning
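The exit criterion described above can be illustrated with a toy sketch. The thresholding rule below is hypothetical (the paper derives its actual decision rule from its empirical findings); it only shows the idea of dropping visual tokens once text tokens stop attending to them:

```python
import numpy as np

def visual_tokens_exhausted(attn_text_to_visual, threshold=0.05):
    """Toy early-exit test: once the average attention mass that text tokens
    still pay to visual tokens drops below a threshold, the visual tokens
    are treated as redundant and can be skipped in later layers."""
    return float(attn_text_to_visual.sum(axis=-1).mean()) < threshold

# attn_text_to_visual: (num_text_tokens, num_visual_tokens) attention weights
early_layer = np.full((8, 16), 0.03)   # text still gathers image information
late_layer = np.full((8, 16), 0.001)   # image information already absorbed
print(visual_tokens_exhausted(early_layer))  # False
print(visual_tokens_exhausted(late_layer))   # True
```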

Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding

no code implementations 25 Nov 2024 Yubin Gu, Yuan Meng, Xiaoshuai Sun, Jiayi Ji, Weijian Ruan, Rongrong Ji

In this paper, we propose a novel multiple-in-one IR model that can effectively restore images with both single and mixed degradations.

Decoder Diversity +1

Any-to-3D Generation via Hybrid Diffusion Supervision

no code implementations 22 Nov 2024 Yijun Fan, Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji

To our knowledge, this is the first method to generate 3D objects from any modality prompts.

3D Generation Image to 3D

RLE: A Unified Perspective of Data Augmentation for Cross-Spectral Re-identification

1 code implementation 2 Nov 2024 Lei Tan, Yukang Zhang, Keke Han, Pingyang Dai, Yan Zhang, Yongjian Wu, Rongrong Ji

From this view, we unify all data augmentation strategies for cross-spectral re-identification by mimicking such local linear transformations and categorizing them into moderate transformation and radical transformation.

Data Augmentation
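The "local linear transformation" view above can be sketched as a simple channel-wise augmentation. This is a minimal stand-in assuming a random gain and offset per channel, not the authors' exact RLE operators:

```python
import numpy as np

def random_linear_enhance(img, gain=(0.5, 1.5), offset=(-0.1, 0.1), rng=None):
    """Apply an independent random linear map a*x + b to each channel of an
    HxWxC image in [0, 1], mimicking a moderate cross-spectral transformation."""
    rng = rng or np.random.default_rng()
    c = img.shape[2]
    a = rng.uniform(*gain, size=(1, 1, c))
    b = rng.uniform(*offset, size=(1, 1, c))
    return np.clip(a * img + b, 0.0, 1.0)

img = np.random.default_rng(0).random((4, 4, 3))
aug = random_linear_enhance(img, rng=np.random.default_rng(1))
print(aug.shape)  # (4, 4, 3)
```

A "radical" transformation in the paper's terminology would use more aggressive parameter ranges; only the sampling range changes, not the linear form.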

TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomalous Text from Images

no code implementations 1 Nov 2024 Mengcheng Li, Mingbao Lin, Fei Chao, Chia-Wen Lin, Rongrong Ji

In this paper, we propose TextDestroyer, the first training- and annotation-free method for scene text destruction using a pre-trained diffusion model.

Denoising

$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

no code implementations 17 Oct 2024 Yaxin Luo, Gen Luo, Jiayi Ji, Yiyi Zhou, Xiaoshuai Sun, Zhiqiang Shen, Rongrong Ji

In $\gamma$-MoD, a novel metric is proposed to guide the deployment of MoDs in the MLLM, namely rank of attention maps (ARank).

Visual Question Answering
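The ARank metric is only named in this snippet; a toy version, using the plain numerical rank of per-head attention maps as a hypothetical stand-in for the paper's definition, might look like:

```python
import numpy as np

def attention_rank(attn):
    """Average numerical rank of the per-head attention maps of one layer
    (a toy stand-in for ARank; attn has shape (heads, tokens, tokens))."""
    return float(np.mean([np.linalg.matrix_rank(a) for a in attn]))

rng = np.random.default_rng(0)
generic = rng.random((4, 16, 16))        # unstructured maps: full rank
uniform = np.ones((4, 16, 16)) / 16.0    # uniform attention: rank 1
print(attention_rank(generic))  # 16.0
print(attention_rank(uniform))  # 1.0
# Layers whose attention is close to low-rank contribute little per token
# and are natural candidates for mixture-of-depth skipping.
```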

DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion

1 code implementation 6 Oct 2024 Ke Sun, Shen Chen, Taiping Yao, Hong Liu, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji

The rapid progress of Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content.

DeepFake Detection Domain Generalization +1

Boosting CLIP Adaptation for Image Quality Assessment via Meta-Prompt Learning and Gradient Regularization

no code implementations 9 Sep 2024 Xudong Li, Zihao Huang, Runze Hu, Yan Zhang, Liujuan Cao, Rongrong Ji

Image Quality Assessment (IQA) remains an unresolved challenge in the field of computer vision, due to complex distortion conditions, diverse image content, and limited data availability.

Image Quality Assessment Meta-Learning

PartFormer: Awakening Latent Diverse Representation from Vision Transformer for Object Re-Identification

no code implementations 29 Aug 2024 Lei Tan, Pingyang Dai, Jie Chen, Liujuan Cao, Yongjian Wu, Rongrong Ji

Extracting robust feature representation is critical for object re-identification to accurately identify objects across non-overlapping cameras.

Diversity Object

I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing

1 code implementation 26 Aug 2024 Yiwei Ma, Jiayi Ji, Ke Ye, Weihuang Lin, Zhibin Wang, Yonghan Zheng, Qiang Zhou, Xiaoshuai Sun, Rongrong Ji

We will open-source I2EBench, including all instructions, input images, human annotations, edited images from all evaluated methods, and a simple script for evaluating the results from new IIE models.

TraDiffusion: Trajectory-Based Training-Free Image Generation

1 code implementation 19 Aug 2024 Mingrui Wu, Oucheng Huang, Jiayi Ji, Jiale Li, Xinyue Cai, Huafeng Kuang, Jianzhuang Liu, Xiaoshuai Sun, Rongrong Ji

In this work, we propose a training-free, trajectory-based controllable T2I approach, termed TraDiffusion.

Image Generation

CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection

no code implementations 15 Aug 2024 Xunfa Lai, Zhiyu Yang, Jie Hu, Shengchuan Zhang, Liujuan Cao, Guannan Jiang, Zhiyu Wang, Songan Zhang, Rongrong Ji

Existing camouflaged object detection (COD) methods depend heavily on large-scale pixel-level annotations. However, acquiring such annotations is laborious due to the inherent camouflage characteristics of the objects. Semi-supervised learning offers a promising solution to this challenge. Yet, its application in COD is hindered by significant pseudo-label noise, both pixel-level and instance-level. We introduce CamoTeacher, a novel semi-supervised COD framework, utilizing Dual-Rotation Consistency Learning (DRCL) to effectively address these noise issues. Specifically, DRCL minimizes pseudo-label noise by leveraging the consistency of rotation views at the pixel level and instance level. First, it employs Pixel-wise Consistency Learning (PCL) to deal with pixel-level noise by reweighting the different parts within the pseudo-label. Second, Instance-wise Consistency Learning (ICL) is used to adjust weights for pseudo-labels, which handles instance-level noise. Extensive experiments on four COD benchmark datasets demonstrate that the proposed CamoTeacher not only achieves state-of-the-art performance compared with semi-supervised learning methods, but also rivals established fully-supervised learning methods. Our code will be available soon.

object-detection Object Detection +1

Beyond Inter-Item Relations: Dynamic Adaption for Enhancing LLM-Based Sequential Recommendation

no code implementations 14 Aug 2024 CanYi Liu, Wei Li, Youchen Zhang, Hui Li, Rongrong Ji

Built on top of coarse-grained adaption for capturing inter-item relations, DARec is further enhanced with (1) context masking that models intra-item relations to help LLM better understand token and item semantics in the context of SRS, (2) collaborative knowledge injection that helps LLM incorporate long-term collaborative knowledge, and (3) a dynamic adaption mechanism that uses Bayesian optimization to flexibly choose layer-wise adapter architectures in order to better incorporate different sequential information.

Bayesian Optimization Sequential Recommendation

StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model

1 code implementation 11 Aug 2024 Ziyin Zhou, Ke Sun, Zhongxi Chen, Huafeng Kuang, Xiaoshuai Sun, Rongrong Ji

The rapid progress in generative models has given rise to the critical task of AI-Generated Content Stealth (AIGC-S), which aims to create AI-generated images that can evade both forensic detectors and human inspection.

VITA: Towards Open-Source Interactive Omni Multimodal LLM

1 code implementation 9 Aug 2024 Chaoyou Fu, Haojia Lin, Zuwei Long, Yunhang Shen, Meng Zhao, Yifan Zhang, Shaoqi Dong, Xiong Wang, Di Yin, Long Ma, Xiawu Zheng, Ran He, Rongrong Ji, Yunsheng Wu, Caifeng Shan, Xing Sun

The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas.

Language Modeling Language Modelling +3

EasyInv: Toward Fast and Better DDIM Inversion

1 code implementation 9 Aug 2024 Ziyue Zhang, Mingbao Lin, Shuicheng Yan, Rongrong Ji

This paper introduces EasyInv, an easy yet novel approach that significantly advances the field of DDIM Inversion by addressing the inherent inefficiencies and performance limitations of traditional iterative optimization methods.

Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation

1 code implementation 9 Aug 2024 Yifan Feng, Jiangang Huang, Shaoyi Du, Shihui Ying, Jun-Hai Yong, Yipeng Li, Guiguang Ding, Rongrong Ji, Yue Gao

We introduce Hyper-YOLO, a new object detection method that integrates hypergraph computations to capture the complex high-order correlations among visual features.

object-detection Object Detection

Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation

1 code implementation 7 Aug 2024 Jingjing Xie, Yuxin Zhang, Mingbao Lin, Liujuan Cao, Rongrong Ji

This method is grounded in two key innovations: (1) The learning of group-wise scale factors for quantized LLM weights to mitigate the quantization error arising from activation outliers and achieve more effective vision-language instruction tuning; (2) The implementation of a multimodal warmup that progressively integrates linguistic and multimodal training samples, thereby preventing overfitting of the quantized model to multimodal data while ensuring stable adaptation of multimodal large language models to downstream vision-language tasks.

Quantization
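The group-wise scale factors in innovation (1) can be sketched as ordinary symmetric group quantization. The sketch below uses fixed max-based scales, whereas the paper learns them; it only illustrates why grouping confines the damage of activation/weight outliers:

```python
import numpy as np

def groupwise_quantize(w, group_size=4, n_bits=4):
    """Symmetric quantization of a 1-D weight vector with one scale per group,
    so a single outlier only inflates the error of its own group."""
    qmax = 2 ** (n_bits - 1) - 1
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax  # one scale/group
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape(-1), scale.ravel()

w = np.array([0.1, -0.2, 0.05, 0.15,   # small-magnitude group
              2.0, -1.5, 0.5, 1.0])    # group containing outliers
w_hat, scales = groupwise_quantize(w)
print(np.abs(w - w_hat).max() < 0.2)  # True: per-group error stays bounded
```

With a single tensor-wide scale, the 2.0 outlier would coarsen the grid for every weight; per-group scales keep the small-magnitude group on a fine grid.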

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models

1 code implementation 31 Jul 2024 Mingrui Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji

We observe that attention, as the core module of MLLMs, connects text prompt tokens and visual tokens, ultimately determining the final results.

Domain Generalization

3D-GRES: Generalized 3D Referring Expression Segmentation

2 code implementations 30 Jul 2024 Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji

3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific instance within a 3D space based on a natural language description.

Object Referring Expression +3

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

1 code implementation 24 Jul 2024 Lirui Zhao, Tianshuo Yang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Kaipeng Zhang, Rongrong Ji

To tackle this challenge, we introduce Diffree, a Text-to-Image (T2I) model that facilitates text-guided object addition with only text control.

Image Inpainting Object

AccDiffusion: An Accurate Method for Higher-Resolution Image Generation

1 code implementation 15 Jul 2024 Zhihang Lin, Mingbao Lin, Meng Zhao, Rongrong Ji

This paper attempts to address the object repetition issue in patch-wise higher-resolution image generation.

Image Generation Object

ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

no code implementations 9 Jul 2024 Yunshan Zhong, Jiawei Hu, You Huang, Yuxin Zhang, Rongrong Ji

ERQ first introduces Activation quantization error reduction (Aqer) that strategically formulates the minimization of activation quantization error as a Ridge Regression problem, tackling it by updating weights with full-precision.

Quantization regression
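The Ridge Regression formulation mentioned above has a closed form. The following is a minimal sketch on toy data, not the authors' actual Aqer procedure; the names `X`, `y`, and `w` are illustrative stand-ins:

```python
import numpy as np

def ridge_solution(X, y, lam=1e-3):
    """Closed-form ridge regression: argmin_w ||Xw - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy stand-in: X plays the role of full-precision activations and y the
# layer output that the updated weights should reproduce.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
w_true = rng.normal(size=8)
y = X @ w_true
w = ridge_solution(X, y)
print(np.allclose(w, w_true, atol=1e-2))  # True: regularized fit recovers w
```

The ridge term `lam * np.eye(d)` keeps the normal equations well-conditioned, which matters when activation statistics are collected from only a few calibration batches.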

Multi-branch Collaborative Learning Network for 3D Visual Grounding

1 code implementation 7 Jul 2024 Zhipeng Qian, Yiwei Ma, Zhekai Lin, Jiayi Ji, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji

3D referring expression comprehension (3DREC) and segmentation (3DRES) have overlapping objectives, indicating their potential for collaboration.

3D visual grounding Referring Expression +1

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model

1 code implementation 7 Jul 2024 Danni Yang, Ruohan Dong, Jiayi Ji, Yiwei Ma, Haowei Wang, Xiaoshuai Sun, Rongrong Ji

Specifically, we introduce the DiffPNG framework, a straightforward yet effective approach that fully capitalizes on the diffusion's architecture for segmentation by decomposing the process into a sequence of localization, segmentation, and refinement steps.

Segmentation Sentence +1

AnySR: Realizing Image Super-Resolution as Any-Scale, Any-Resource

1 code implementation 5 Jul 2024 Wengyi Zhan, Mingbao Lin, Chia-Wen Lin, Rongrong Ji

As a contrast to off-the-shelf methods that solve SR tasks across various scales with the same computing costs, our AnySR innovates in: 1) building arbitrary-scale tasks as any-resource implementation, reducing resource requirements for smaller scales without additional parameters; 2) enhancing any-scale performance in a feature-interweaving fashion, inserting scale pairs into features at regular intervals and ensuring correct feature/scale processing.

Image Super-Resolution

Oracle Bone Inscriptions Multi-modal Dataset

no code implementations 4 Jul 2024 Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

Oracle bone inscriptions (OBI) are the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography.

Decipherment Denoising

HRSAM: Efficient Interactive Segmentation in High-Resolution Images

1 code implementation 2 Jul 2024 You Huang, Wenbin Lai, Jiayi Ji, Liujuan Cao, Shengchuan Zhang, Rongrong Ji

Within FLA, we implement Flash Swin attention, achieving over a 35% speedup compared to traditional Swin attention, and propose a KV-only padding mechanism to enhance extrapolation.

Data Augmentation Interactive Segmentation +2

Local Manifold Learning for No-Reference Image Quality Assessment

no code implementations 27 Jun 2024 Timin Gao, Wensheng Pan, Yan Zhang, Sicheng Zhao, Shengchuan Zhang, Xiawu Zheng, Ke Li, Liujuan Cao, Rongrong Ji

This crop is then used to cluster other crops from the same image as the positive class, while crops from different images are treated as negative classes to increase inter-class distance.

Contrastive Learning NR-IQA

HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

1 code implementation 27 Jun 2024 Liujuan Cao, Jianghang Lin, Zebo Hong, Yunhang Shen, Shaohui Lin, Chao Chen, Rongrong Ji

Most WSOD methods rely on traditional object proposals to generate candidate regions and are confronted with unstable training, which easily gets stuck in a poor local optimum.

Object object-detection +1

UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs

1 code implementation 26 Jun 2024 Wenhao Li, Mingbao Lin, Yunshan Zhong, Shuicheng Yan, Rongrong Ji

Managing long texts is challenging for large language models (LLMs) due to limited context window sizes.

4k Decoder

Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

1 code implementation 25 Jun 2024 Xinyang Li, Zhangyu Lai, Linning Xu, Yansong Qu, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji

To achieve this, (1) we first utilize a Trajectory Diffusion Transformer, acting as the Cinematographer, to model the distribution of camera trajectories based on textual descriptions.

3D Generation Denoising +2

Depth-Guided Semi-Supervised Instance Segmentation

no code implementations 25 Jun 2024 Xin Chen, Jie Hu, Xiawu Zheng, Jianghang Lin, Liujuan Cao, Rongrong Ji

Additionally, to manage the variability of depth images during training, we introduce the Depth Controller.

Depth Estimation Instance Segmentation +2

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

1 code implementation 24 Jun 2024 Mingrui Wu, Jiayi Ji, Oucheng Huang, Jiale Li, Yuhang Wu, Xiaoshuai Sun, Rongrong Ji

We identify three types of relationship co-occurrences that lead to hallucinations: relationship-relationship, subject-relationship, and relationship-object.

Common Sense Reasoning Hallucination +1

AnyTrans: Translate AnyText in the Image with Large Scale Models

no code implementations 17 Jun 2024 Zhipeng Qian, Pei Zhang, Baosong Yang, Kai Fan, Yiwei Ma, Derek F. Wong, Xiaoshuai Sun, Rongrong Ji

This paper introduces AnyTrans, an all-encompassing framework for the Translate AnyText in the Image (TATI) task, which includes multilingual text translation and text fusion within images.

Few-Shot Learning Translation

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

no code implementations 14 Jun 2024 Chenyu Zhou, Mengdan Zhang, Peixian Chen, Chaoyou Fu, Yunhang Shen, Xiawu Zheng, Xing Sun, Rongrong Ji

In support of this task, we further craft a new VEGA dataset, tailored for the IITC task on scientific content, and devised a subtask, Image-Text Association (ITA), to refine image-text correlation skills.

Reading Comprehension

Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval

no code implementations 9 Jun 2024 Yiwei Ma, Xiaoshuai Sun, Jiayi Ji, Guannan Jiang, Weilin Zhuang, Rongrong Ji

To address this issue, we propose an effective bi-directional one-to-many embedding paradigm that offers a clear optimization direction for each sample, thus mitigating the optimization problem.

Image-text Retrieval Person Retrieval +3

SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation

1 code implementation 3 Jun 2024 Danni Yang, Jiayi Ji, Yiwei Ma, Tianyu Guo, Haowei Wang, Xiaoshuai Sun, Rongrong Ji

These strategies are designed to extract the most accurate masks from SAM's output, thus guiding the training of the student model with enhanced precision.

Pseudo Label Referring Expression +1

Image Captioning via Dynamic Path Customization

1 code implementation 1 Jun 2024 Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Xiaopeng Hong, Yongjian Wu, Rongrong Ji

This paper explores a novel dynamic network for vision and language tasks, where the inferring structure is customized on the fly for different inputs.

Diversity Image Captioning

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

1 code implementation 31 May 2024 Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei LI, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun

With Video-MME, we extensively evaluate various state-of-the-art MLLMs, including the GPT-4 series and Gemini 1.5 Pro, as well as open-source image models like InternVL-Chat-V1.5 and video models like LLaVA-NeXT-Video.

MME Video MME

UniPTS: A Unified Framework for Proficient Post-Training Sparsity

1 code implementation CVPR 2024 Jingjing Xie, Yuxin Zhang, Mingbao Lin, Zhihang Lin, Liujuan Cao, Rongrong Ji

Post-training Sparsity (PTS) is a recently emerged avenue that chases efficient network sparsity with limited data in need.

FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

1 code implementation CVPR 2024 You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji

First, we propose Dynamic Window Multi-head Self-Attention (Dwin-MSA) to dynamically refocus SAM's image embeddings on the target object.

Decoder Interactive Segmentation +2

GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

no code implementations 27 May 2024 Yansong Qu, Shaohui Dai, Xinyang Li, Jianghang Lin, Liujuan Cao, Shengchuan Zhang, Rongrong Ji

To this end, we introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS) and identifies 3D Gaussians of Interest using an Optimizable Semantic-space Hyperplane.

3DGS feature selection +3

Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

no code implementations 16 May 2024 Xinyang Li, Zhangyu Lai, Linning Xu, Jianfei Guo, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji

We present Dual3D, a novel text-to-3D generation framework that generates high-quality 3D assets from texts in only 1 minute. The key component is a dual-mode multi-view latent diffusion model.

3D Generation Denoising +1

Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference

1 code implementation 9 May 2024 Zhihang Lin, Mingbao Lin, Luxi Lin, Rongrong Ji

Multimodal large language models (MLLMs) demand considerable computations for inference due to the extensive parameters and the additional input tokens needed for visual information representation.

GraCo: Granularity-Controllable Interactive Segmentation

1 code implementation CVPR 2024 Yian Zhao, Kehan Li, Zesen Cheng, Pengchong Qiao, Xiawu Zheng, Rongrong Ji, Chang Liu, Li Yuan, Jie Chen

In this work, we introduce Granularity-Controllable Interactive Segmentation (GraCo), a novel approach that allows precise control of prediction granularity by introducing additional parameters to input.

Interactive Segmentation Segmentation

ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion

1 code implementation 26 Apr 2024 Ziyue Zhang, Mingbao Lin, Rongrong Ji

We introduce ObjectAdd, a training-free diffusion modification method to add user-expected objects into user-specified area.

Image Inpainting Object

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

no code implementations 24 Apr 2024 Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models(MLLMs) and their cognitive capability.

Decision Making Logical Reasoning +1

Multi-Modal Prompt Learning on Blind Image Quality Assessment

1 code implementation 23 Apr 2024 Wensheng Pan, Timin Gao, Yan Zhang, Runze Hu, Xiawu Zheng, Enwei Zhang, Yuting Gao, Yutao Liu, Yunhang Shen, Ke Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.

CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method

1 code implementation 23 Apr 2024 Mingbao Lin, Zhihang Lin, Wengyi Zhan, Liujuan Cao, Rongrong Ji

Transforming large pre-trained low-resolution diffusion models to cater to higher-resolution demands, i.e., diffusion extrapolation, significantly improves diffusion adaptability.

Denoising

Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization

1 code implementation 17 Apr 2024 Yongdong Luo, Haojia Lin, Xiawu Zheng, Yigeng Jiang, Fei Chao, Jie Hu, Guannan Jiang, Songan Zhang, Rongrong Ji

3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in various 3D applications, which require both shared and complementary information in localization and visual-language relationships.

3D dense captioning 3D visual grounding +1

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

1 code implementation CVPR 2024 Lirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji

Text-to-image (T2I) generative models have attracted significant attention and found extensive applications within and beyond academic research.

Diversity Language Modeling +2

Deep Instruction Tuning for Segment Anything Model

1 code implementation 31 Mar 2024 Xiaorui Huang, Gen Luo, Chaoyang Zhu, Bo Tong, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

Recently, Segment Anything Model (SAM) has become a research hotspot in the fields of multimedia and computer vision, which exhibits powerful yet versatile capabilities on various (un)conditional image segmentation tasks.

Decoder Image Segmentation +3

DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis

1 code implementation 27 Mar 2024 Zhongxi Chen, Ke Sun, Ziyin Zhou, Xianming Lin, Xiaoshuai Sun, Liujuan Cao, Rongrong Ji

The rapid progress in deep learning has given rise to hyper-realistic facial forgery methods, leading to concerns related to misinformation and security risks.

Image Generation Misinformation

Toward Open-Set Human Object Interaction Detection

1 code implementation Proceedings of the AAAI Conference on Artificial Intelligence 2024 Mingrui Wu, Yuqi Liu, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji

To address this challenge, we introduce a simple Disentangled HOI Detection (DHD) model for detecting novel relationships by integrating an open-set object detector with a Visual Language Model (VLM).

Contrastive Learning Human-Object Interaction Detection +3

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models

1 code implementation 22 Mar 2024 Qiong Wu, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

In this paper, we propose a novel parameter and computation efficient tuning method for Multi-modal Large Language Models (MLLMs), termed Efficient Attention Skipping (EAS).

Transfer Learning

DMAD: Dual Memory Bank for Real-World Anomaly Detection

no code implementations 19 Mar 2024 Jianlong Hu, Xu Chen, Zhenye Gan, Jinlong Peng, Shengchuan Zhang, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Liujuan Cao, Rongrong Ji

To address the challenge of real-world anomaly detection, we propose a new framework named Dual Memory bank enhanced representation learning for Anomaly Detection (DMAD).

Anomaly Detection Representation Learning

AffineQuant: Affine Transformation Quantization for Large Language Models

1 code implementation 19 Mar 2024 Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji

Among these techniques, Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its noteworthy compression efficiency and cost-effectiveness in the context of training.

Quantization

Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers

no code implementations 15 Mar 2024 Jinxia Xie, Bineng Zhong, Zhiyi Mo, Shengping Zhang, Liangtao Shi, Shuxiang Song, Rongrong Ji

Firstly, we introduce a set of learnable and autoregressive queries to capture the instantaneous target appearance changes in a sliding window fashion.

Visual Object Tracking Visual Tracking

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization

1 code implementation 11 Mar 2024 Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng, Xiaoxiong Du, Gen Luo, Jun Peng, Xiaoshuai Sun, Rongrong Ji

Text-to-3D-aware face (T3D Face) generation and manipulation is an emerging research hot spot in machine learning, which still suffers from low efficiency and poor quality.

Face Generation Text to 3D

DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation

no code implementations 10 Mar 2024 Xiaobin Hu, Xu Peng, Donghao Luo, Xiaozhong Ji, Jinlong Peng, Zhengkai Jiang, Jiangning Zhang, Taisong Jin, Chengjie Wang, Rongrong Ji

Aiming to simultaneously generate the object and its matting annotation, we build a matting head to make a green color removal in the latent space of the VAE decoder.

Image Matting Object

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

1 code implementation 5 Mar 2024 Gen Luo, Yiyi Zhou, Yuxin Zhang, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji

Contrary to previous works, we study this problem from the perspective of image resolution, and reveal that a combination of low- and high-resolution visual features can effectively mitigate this shortcoming.

TextVQA Visual Question Answering

Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling

no code implementations 23 Feb 2024 Hui Lin, Zhiheng Ma, Rongrong Ji, YaoWei Wang, Zhou Su, Xiaopeng Hong, Deyu Meng

This paper focuses on semi-supervised crowd counting, where only a small portion of the training data are labeled.

Crowd Counting Decoder +1

EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs

1 code implementation 19 Feb 2024 Song Guo, Fan Wu, Lei Zhang, Xiawu Zheng, Shengchuan Zhang, Fei Chao, Yiyu Shi, Rongrong Ji

For instance, on the Wikitext2 dataset with LlamaV1-7B at 70% sparsity, our proposed EBFT achieves a perplexity of 16.88, surpassing the state-of-the-art DSnoT with a perplexity of 75.14.
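The block-wise fine-tuning idea behind EBFT can be illustrated with a toy sketch: prune one linear "block" by magnitude, then fine-tune only the surviving weights to minimize the reconstruction error against the dense block's outputs. Everything below (shapes, learning rate, the plain gradient-descent loop) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))   # calibration activations fed to the block
W_dense = rng.normal(size=(16, 16))
Y = X @ W_dense                 # dense block outputs to reconstruct

# Magnitude pruning to 70% sparsity; the mask stays fixed during fine-tuning
sparsity = 0.7
thresh = np.quantile(np.abs(W_dense), sparsity)
mask = (np.abs(W_dense) > thresh).astype(W_dense.dtype)
W = W_dense * mask

def recon_loss(W: np.ndarray) -> float:
    """Mean squared error between sparse and dense block outputs."""
    return float(np.mean((X @ (W * mask) - Y) ** 2))

loss_before = recon_loss(W)
lr = 1e-3
for _ in range(200):
    # gradient of the reconstruction error w.r.t. the kept weights
    # (up to a constant factor); the mask zeroes updates to pruned entries
    grad = 2 * X.T @ (X @ (W * mask) - Y) / X.shape[0] * mask
    W -= lr * grad
loss_after = recon_loss(W)
```

The point of the sketch is only that the objective is local and block-wise: each block is regressed onto its dense counterpart's outputs, without any end-to-end training loss.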

Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration

1 code implementation 24 Jan 2024 Yimin Xu, Nanxi Gao, Zhongyun Shan, Fei Chao, Rongrong Ji

In contrast to traditional image restoration methods, all-in-one image restoration techniques are gaining increased attention for their ability to restore images affected by diverse and unknown corruption types and levels.

Computational Efficiency Image Restoration

Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation

1 code implementation 18 Jan 2024 Zesen Cheng, Kehan Li, Hao Li, Peng Jin, Chang Liu, Xiawu Zheng, Rongrong Ji, Jie Chen

To mold instance queries to follow Brownian bridge and accomplish alignment with class texts, we design Bridge-Text Alignment (BTA) to learn discriminative bridge-level representations of instances via contrastive objectives.

Instance Segmentation Semantic Segmentation +1

Learning Image Demoireing from Unpaired Real Data

1 code implementation 5 Jan 2024 Yunshan Zhong, Yuyao Zhou, Yuxin Zhang, Fei Chao, Rongrong Ji

The proposed method, referred to as Unpaired Demoireing (UnDeM), synthesizes pseudo moire images from unpaired datasets, generating pairs with clean images for training demoireing models.

Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers

no code implementations CVPR 2024 Jinxia Xie, Bineng Zhong, Zhiyi Mo, Shengping Zhang, Liangtao Shi, Shuxiang Song, Rongrong Ji

Firstly, we introduce a set of learnable and autoregressive queries to capture the instantaneous target appearance changes in a sliding window fashion.

Visual Tracking

Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation

1 code implementation CVPR 2024 Sihan Liu, Yiwei Ma, Xiaoqing Zhang, Haowei Wang, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji

Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing, delineating specific regions in aerial images as described by textual queries.

ARC Image Segmentation +2

MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples

1 code implementation 11 Dec 2023 Tao Chen, Enwei Zhang, Yuting Gao, Ke Li, Xing Sun, Yan Zhang, Hui Li, Rongrong Ji

Although In-Context Learning (ICL) brings remarkable performance gains to Large Language Models (LLMs), the improvements remain lower than fine-tuning on downstream tasks.

In-Context Learning

Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity

no code implementations 11 Dec 2023 Xudong Li, Timin Gao, Runze Hu, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Jingyuan Zheng, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Rongrong Ji

Specifically, QFM-IQM enhances the semantic noise distinguishing capabilities by matching image pairs with similar quality scores but varying semantic features as adversarial semantic noise and adaptively adjusting the upstream task's features by reducing sensitivity to adversarial noise perturbation.

Contrastive Learning feature selection +1

Aligning and Prompting Everything All at Once for Universal Visual Perception

2 code implementations CVPR 2024 Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji

However, predominant paradigms, driven by casting instance-level tasks as an object-word alignment, bring heavy cross-modality interaction, which is not effective in prompting object detection and visual grounding.

Object object-detection +6

Less is More: Learning Reference Knowledge Using No-Reference Image Quality Assessment

no code implementations 1 Dec 2023 Xudong Li, Jingyuan Zheng, Xiawu Zheng, Runze Hu, Enwei Zhang, Yuting Gao, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Yan Zhang, Rongrong Ji

Concretely, by innovatively introducing a novel feature distillation method in IQA, we propose a new framework to learn comparative knowledge from non-aligned reference images.

Inductive Bias NR-IQA

X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

1 code implementation 30 Nov 2023 Yiwei Ma, Yijun Fan, Jiayi Ji, Haowei Wang, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji

Nevertheless, a substantial domain gap exists between 2D images and 3D assets, primarily attributed to variations in camera-related attributes and the exclusive presence of foreground objects.

3D Generation Text to 3D

I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

1 code implementation 16 Nov 2023 Yunshan Zhong, Jiawei Hu, Mengzhao Chen, Rongrong Ji

Despite the scalable performance of vision transformers (ViTs), the dense computational costs (training & inference) undermine their position in industrial applications.

Quantization

Semi-Supervised Panoptic Narrative Grounding

1 code implementation 27 Oct 2023 Danni Yang, Jiayi Ji, Xiaoshuai Sun, Haowei Wang, Yinan Li, Yiwei Ma, Rongrong Ji

Remarkably, our SS-PNG-NW+ outperforms fully-supervised models with only 30% and 50% supervision data, exceeding their performance by 0.8% and 1.1% respectively.

Data Augmentation Pseudo Label

NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning

1 code implementation 17 Oct 2023 Haowei Wang, Jiayi Ji, Tianyu Guo, Yilong Yang, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

To address this, we introduce two cascading modules based on the barycenter of the mask, which are Coordinate Guided Aggregation (CGA) and Barycenter Driven Localization (BDL), responsible for segmentation and detection, respectively.

Segmentation Visual Grounding

JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues

1 code implementation 14 Oct 2023 Jiayi Ji, Haowei Wang, Changli Wu, Yiwei Ma, Xiaoshuai Sun, Rongrong Ji

3D understanding is of rising importance, being pivotal to computer vision, autonomous driving, and robotics.

Autonomous Driving Representation Learning

Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs

1 code implementation 13 Oct 2023 Yuxin Zhang, Lirui Zhao, Mingbao Lin, Yunyun Sun, Yiwu Yao, Xingjia Han, Jared Tanner, Shiwei Liu, Rongrong Ji

Inspired by the Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs, in the fashion of performing iterative weight pruning-and-growing on top of sparse LLMs.

Network Pruning
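The pruning-and-growing loop can be sketched generically: at a fixed sparsity budget, repeatedly restore the pruned weight whose restoration looks most helpful for the dense-vs-sparse reconstruction error and prune the least salient kept weight, accepting a swap only if the error actually drops. The saliency scores below are illustrative heuristics, not DSnoT's actual criteria:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(128, 32))  # calibration inputs
W = rng.normal(size=(32, 8))
Y = X @ W                       # dense outputs to match

# Start from 50% magnitude pruning
k = W.size // 2
mask = np.zeros(W.size, dtype=bool)
mask[np.argsort(np.abs(W).ravel())[-k:]] = True
mask = mask.reshape(W.shape)

def err(m: np.ndarray) -> float:
    """Dense-vs-sparse reconstruction error for a given mask."""
    return float(np.mean((X @ (W * m) - Y) ** 2))

e0 = err(mask)
for _ in range(50):
    R = X @ (W * mask) - Y
    grad = X.T @ R                      # sensitivity of the error to each weight
    # grow: pruned weight whose restoration looks most beneficial (heuristic)
    grow_score = np.where(~mask, np.abs(grad * W), -np.inf)
    g = np.unravel_index(np.argmax(grow_score), W.shape)
    # prune: kept weight with the least saliency (heuristic)
    prune_score = np.where(mask, np.abs(grad * W) + np.abs(W), np.inf)
    p = np.unravel_index(np.argmin(prune_score), W.shape)
    new_mask = mask.copy()
    new_mask[g], new_mask[p] = True, False
    if err(new_mask) < err(mask):       # accept only improving swaps
        mask = new_mask
    else:
        break
e1 = err(mask)
```

The key property this preserves is that the weights themselves are never retrained: only the sparsity pattern moves, which is what makes the fine-tuning "training-free".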

AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration

1 code implementation ICCV 2023 Lijiang Li, Huixia Li, Xiawu Zheng, Jie Wu, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan, Fei Chao, Rongrong Ji

Therefore, we propose to search the optimal time steps sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training.

Image Generation single-image-generation

Towards Unified Token Learning for Vision-Language Tracking

1 code implementation 27 Aug 2023 Yaozong Zheng, Bineng Zhong, Qihua Liang, Guorong Li, Rongrong Ji, Xianxian Li

In this paper, we present a simple, flexible and effective vision-language (VL) tracking pipeline, termed MMTrack, which casts VL tracking as a token generation task.

DLIP: Distilling Language-Image Pre-training

no code implementations 24 Aug 2023 Huafeng Kuang, Jie Wu, Xiawu Zheng, Ming Li, Xuefeng Xiao, Rui Wang, Min Zheng, Rongrong Ji

Furthermore, DLIP succeeds in retaining more than 95% of the performance with 22.4% parameters and 24.8% FLOPs compared to the teacher model and accelerates inference speed by 2.7x.

Image Captioning Image-text Retrieval +5

A Unified Framework for 3D Point Cloud Visual Grounding

1 code implementation 23 Aug 2023 Haojia Lin, Yongdong Luo, Xiawu Zheng, Lijiang Li, Fei Chao, Taisong Jin, Donghao Luo, Yan Wang, Liujuan Cao, Rongrong Ji

This elaborate design enables 3DRefTR to achieve both well-performing 3DRES and 3DREC capacities with only a 6% additional latency compared to the original 3DREC model.

Referring Expression Referring Expression Comprehension +1

MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization

1 code implementation 22 Aug 2023 Tao Chen, Ze Lin, Hui Li, Jiayi Ji, Yiyi Zhou, Guanbin Li, Rongrong Ji

Furthermore, we model product attributes based on both text and image modalities so that multi-modal product characteristics can be manifested in the generated summaries.

Attribute

HODN: Disentangling Human-Object Feature for HOI Detection

no code implementations 20 Aug 2023 Shuman Fang, Zhiwen Lin, Ke Yan, Jie Li, Xianming Lin, Rongrong Ji

However, these methods ignore the relationship among humans, objects, and interactions: 1) human features are more contributive than object ones to interaction prediction; 2) interactive information disturbs the detection of objects but helps human detection.

Decoder Human Detection +4

Continual Face Forgery Detection via Historical Distribution Preserving

no code implementations 11 Aug 2023 Ke Sun, Shen Chen, Taiping Yao, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji

In this paper, we focus on a novel and challenging problem: Continual Face Forgery Detection (CFFD), which aims to efficiently learn from new forgery attacks without forgetting previous ones.

Knowledge Distillation

Pseudo-label Alignment for Semi-supervised Instance Segmentation

1 code implementation ICCV 2023 Jie Hu, Chen Chen, Liujuan Cao, Shengchuan Zhang, Annan Shu, Guannan Jiang, Rongrong Ji

Through extensive experiments conducted on the COCO and Cityscapes datasets, we demonstrate that PAIS is a promising framework for semi-supervised instance segmentation, particularly in cases where labeled data is severely limited.

Instance Segmentation Pseudo Label +3

Improving Human-Object Interaction Detection via Virtual Image Learning

no code implementations 4 Aug 2023 Shuman Fang, Shuai Liu, Jie Li, Guannan Jiang, Xianming Lin, Rongrong Ji

Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects, which plays a crucial role in high-level semantic understanding tasks.

Human-Object Interaction Detection Object

Towards General Visual-Linguistic Face Forgery Detection

no code implementations 31 Jul 2023 Ke Sun, Shen Chen, Taiping Yao, Haozhe Yang, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji

To address these issues, in this paper, we propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.

Binary Classification DeepFake Detection +2

Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer

1 code implementation 30 Jun 2023 Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Tianshuo Xu, Xiaoshuai Sun, Tongliang Liu, Rongrong Ji, DaCheng Tao

Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight.
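The SAM update described above can be sketched on a toy quadratic loss: first ascend to the worst-case weights within a ball of radius rho, then descend using the gradient taken at that perturbed point. The loss function, radius, and learning rate below are illustrative choices, not from the paper:

```python
import numpy as np

def loss(w: np.ndarray) -> float:
    """Toy convex loss with minimum at w = 1."""
    return float(np.sum((w - 1.0) ** 2))

def grad(w: np.ndarray) -> np.ndarray:
    return 2.0 * (w - 1.0)

rng = np.random.default_rng(0)
w = rng.normal(size=5)
rho, lr = 0.05, 0.1
history = [loss(w)]
for _ in range(100):
    g = grad(w)
    # ascend: move to the (approximately) sharpest point within radius rho
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # descend: update the original weights with the gradient at w + eps
    w = w - lr * grad(w + eps)
    history.append(loss(w))
```

The sparse-perturbation variant studied in the paper would restrict `eps` to a subset of coordinates; this sketch shows only the vanilla two-step SAM update.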

Approximated Prompt Tuning for Vision-Language Pre-trained Models

no code implementations 27 Jun 2023 Qiong Wu, Shubin Huang, Yiyi Zhou, Pingyang Dai, Annan Shu, Guannan Jiang, Rongrong Ji

Prompt tuning is a parameter-efficient way to deploy large-scale pre-trained models to downstream tasks by adding task-specific tokens.

Image Classification Text-to-Image Generation +1

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

4 code implementations 23 Jun 2023 Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, Yunsheng Wu, Rongrong Ji

Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image.

Benchmarking Language Modeling +6

Spatial Re-parameterization for N:M Sparsity

1 code implementation 9 Jun 2023 Yuxin Zhang, Mingliang Xu, Yonghong Tian, Rongrong Ji

This paper presents a Spatial Re-parameterization (SpRe) method for the N:M sparsity in CNNs.
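For context, N:M sparsity constrains every group of M consecutive weights to contain at most N nonzeros (the 2:4 pattern is the one accelerated by NVIDIA Ampere GPUs). A minimal magnitude-based projection onto this pattern, independent of the paper's SpRe method, looks like:

```python
import numpy as np

def nm_sparsify(w: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Zero all but the n largest-magnitude entries in each group of m weights.

    Assumes w.size is divisible by m; grouping is along the flattened tensor.
    """
    flat = w.reshape(-1, m)
    # indices of the m - n smallest-magnitude entries per group, to be pruned
    idx = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    out = flat.copy()
    np.put_along_axis(out, idx, 0.0, axis=1)
    return out.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))
w_sparse = nm_sparsify(w, n=2, m=4)  # every group of 4 keeps at most 2 nonzeros
```

The structured pattern is what makes N:M sparsity hardware-friendly: unlike unstructured pruning, the per-group budget is fixed and can be exploited by sparse tensor cores.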

STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection

1 code implementation CVPR 2023 Zhenglin Zhou, Huaxia Li, Hong Liu, Nanyang Wang, Gang Yu, Rongrong Ji

To solve this problem, we propose a Self-adapTive Ambiguity Reduction (STAR) loss by exploiting the properties of semantic ambiguity.

Face Alignment Facial Landmark Detection

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting

1 code implementation 1 Jun 2023 Shubin Huang, Qiong Wu, Yiyi Zhou, WeiJie Chen, Rongsheng Zhang, Xiaoshuai Sun, Rongrong Ji

In addition, we also experiment DVP with the recently popular adapter approach to keep the most parameters of PLMs intact when adapting to VL tasks, helping PLMs achieve a quick shift between single- and multi-modal tasks.

Transfer Learning Visual Prompting

CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models

1 code implementation 29 May 2023 Zhongxi Chen, Ke Sun, Xianming Lin, Rongrong Ji

Due to the stochastic sampling process of diffusion, our model is capable of sampling multiple possible predictions from the mask distribution, avoiding the problem of overconfident point estimation.

Denoising Object +3

MBQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization

2 code implementations 14 May 2023 Yunshan Zhong, Yuyao Zhou, Fei Chao, Rongrong Ji

However, in this paper, we investigate existing methods and observe a significant accumulation of quantization errors caused by switching weight and activations bit-widths, leading to limited performance.

Quantization

Distribution-Flexible Subset Quantization for Post-Quantizing Super-Resolution Networks

1 code implementation 10 May 2023 Yunshan Zhong, Mingbao Lin, Jingjing Xie, Yuxin Zhang, Fei Chao, Rongrong Ji

Compared to the common iterative exhaustive search algorithm, our strategy avoids the enumeration of all possible combinations in the universal set, reducing the time complexity from exponential to linear.

Quantization Super-Resolution