Search Results for author: WangMeng Zuo

Found 266 papers, 175 papers with code

Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation

1 code implementation · 15 Oct 2024 · Qizhang Li, Xiaochen Yang, WangMeng Zuo, Yiwen Guo

Our method also achieves over 90% attack success rates against Llama-2-Chat models on AdvBench, despite their outstanding resistance to jailbreak attacks.

Combining Generative and Geometry Priors for Wide-Angle Portrait Correction

1 code implementation · 13 Oct 2024 · Lan Yao, Chaofeng Chen, Xiaoming Li, Zifei Yan, WangMeng Zuo

In this work, we propose encapsulating the generative face prior as a guided natural manifold to facilitate the correction of facial regions.

VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models

1 code implementation · 2 Oct 2024 · Kailai Feng, Yabo Zhang, Haodong Yu, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, WangMeng Zuo

Artistic typography is a technique to visualize the meaning of an input character in an imaginable and readable manner.

Reblurring-Guided Single Image Defocus Deblurring: A Learning Framework with Misaligned Training Pairs

no code implementations · 26 Sep 2024 · Xinya Shu, Yu Li, Dongwei Ren, Xiaohe Wu, Jin Li, WangMeng Zuo

Then, to effectively learn the baseline defocus deblurring network with misaligned training pairs, our reblurring module ensures spatial consistency between the deblurred image, the reblurred image and the input blurry image by reconstructing spatially variant isotropic blur kernels.

Deblurring, Image Defocus Deblurring

LPT++: Efficient Training on Mixture of Long-tailed Experts

no code implementations · 17 Sep 2024 · Bowen Dong, Pan Zhou, WangMeng Zuo

We introduce LPT++, a comprehensive framework for long-tailed classification that combines parameter-efficient fine-tuning (PEFT) with a learnable model ensemble.

parameter-efficient fine-tuning

SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting

1 code implementation · 25 Aug 2024 · Wenrui Li, Fucheng Cai, Yapeng Mi, Zhe Yang, WangMeng Zuo, Xingtao Wang, Xiaopeng Fan

Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation and employs 3D Gaussian Splatting (3DGS) to ensure consistency across multi-view panoramic images.

Image Generation, Scene Generation

SelfDRSC++: Self-Supervised Learning for Dual Reversed Rolling Shutter Correction

1 code implementation · 21 Aug 2024 · Wei Shang, Dongwei Ren, Wanying Zhang, Qilong Wang, Pengfei Zhu, WangMeng Zuo

Subsequently, to effectively train the DRSC network, we propose a self-supervised learning strategy that ensures cycle consistency between input and reconstructed dual reversed RS images.

Rolling Shutter Correction, Self-Supervised Learning

AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

no code implementations · 21 Aug 2024 · Minheng Ni, Chenfei Wu, Huaying Yuan, Zhengyuan Yang, Ming Gong, Lijuan Wang, Zicheng Liu, WangMeng Zuo, Nan Duan

With the advancement of generative models, the synthesis of different sensory elements such as music, visuals, and speech has achieved significant realism.

Scheduling

Thin-Plate Spline-based Interpolation for Animation Line Inbetweening

1 code implementation · 17 Aug 2024 · Tianyi Zhu, Wei Shang, Dongwei Ren, WangMeng Zuo

Motivated by this observation, we propose a simple yet effective interpolation method for animation line inbetweening that adopts thin-plate spline-based transformation to estimate coarse motion more accurately by modeling the keypoint correspondence between two key frames, particularly for large motion scenarios.
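The snippet does not spell out how a thin-plate spline (TPS) turns keypoint correspondences into a dense coarse motion. The sketch below shows the generic 2D TPS fit that does this. It assumes NumPy, keypoints that are already matched between the two key frames, and the standard kernel U(r) = r² log r². The helpers `tps_kernel`, `fit_tps` and `apply_tps` are hypothetical names for illustration. This is not the paper's implementation or its full inbetweening pipeline.

```python
import numpy as np

def tps_kernel(r2):
    """Radial basis U(r) = r^2 log(r^2), written in terms of squared distance r2; U(0) = 0."""
    safe = np.where(r2 > 0, r2, 1.0)  # avoid log(0); those entries are zeroed below
    return np.where(r2 > 0, r2 * np.log(safe), 0.0)

def fit_tps(src, dst, reg=0.0):
    """Fit a 2D TPS mapping keypoints src (n, 2) onto dst (n, 2).

    reg > 0 adds smoothing and relaxes exact interpolation at the keypoints.
    """
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    K = tps_kernel(d2) + reg * np.eye(n)       # non-affine (bending) part
    P = np.hstack([np.ones((n, 1)), src])      # affine part: [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    params = np.linalg.solve(A, b)             # stacked [w; a], one column per output coordinate
    return src, params[:n], params[n:]

def apply_tps(model, pts):
    """Warp arbitrary points (m, 2) with a fitted TPS model, giving a dense coarse motion."""
    ctrl, w, a = model
    d2 = np.sum((pts[:, None, :] - ctrl[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return tps_kernel(d2) @ w + P @ a

# Hypothetical matched keypoints in key frame 0 (src) and key frame 1 (dst).
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
dst = np.array([[0.0, 0.0], [1.1, 0.0], [0.0, 1.1], [1.1, 1.1], [0.55, 0.58]])
model = fit_tps(src, dst)
warped_line_point = apply_tps(model, np.array([[0.25, 0.75]]))  # any stroke point can be warped
```

With `reg=0` the spline passes exactly through every keypoint and reproduces any purely affine motion. That makes TPS a reasonable choice for large, smooth non-rigid motion between sparse line keypoints.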

Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors

1 code implementation · 13 Jul 2024 · Wei Shang, Dongwei Ren, Wanying Zhang, Yuming Fang, WangMeng Zuo, Kede Ma

Arbitrary-scale video super-resolution (AVSR) aims to enhance the resolution of video frames, potentially at various scaling factors, which presents several challenges regarding spatial detail reproduction, temporal consistency, and computational complexity.

Video Super-Resolution

Multi-modal Crowd Counting via a Broker Modality

1 code implementation · 10 Jul 2024 · Haoliang Meng, Xiaopeng Hong, Chenhao Wang, Miao Shang, WangMeng Zuo

Multi-modal crowd counting involves estimating crowd density from both visual and thermal/depth images.

Crowd Counting, Denoising

Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

1 code implementation · 1 Jul 2024 · Mingxiang Liao, Hannan Lu, Xinyu Zhang, Fang Wan, Tianyu Wang, Yuzhong Zhao, WangMeng Zuo, Qixiang Ye, Jingdong Wang

For this purpose, we establish a new benchmark comprising text prompts that fully reflect multiple dynamics grades, and define a set of dynamics scores corresponding to various temporal granularities to comprehensively evaluate the dynamics of each generated video.

Text-to-Video Generation, Video Generation

LayerMatch: Do Pseudo-labels Benefit All Layers?

no code implementations · 20 Jun 2024 · Chaoqi Liang, Guanglei Yang, Lifeng Qiao, Zitong Huang, Hongliang Yan, Yunchao Wei, WangMeng Zuo

Our approach, LayerMatch, which integrates these two strategies, can avoid the severe interference of noisy pseudo-labels in the linear classification layer while accelerating the clustering capability of the feature extraction layer.

Avg, Clustering

Diffusion Models in Low-Level Vision: A Survey

1 code implementation · 17 Jun 2024 · Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, WangMeng Zuo, Zhenhua Guo, Xiu Li

Deep generative models have garnered significant attention in low-level vision tasks due to their generative capabilities.

Denoising, Survey

GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

1 code implementation · 11 Jun 2024 · Hang Yao, Ming Liu, Haolin Wang, Zhicun Yin, Zifei Yan, Xiaopeng Hong, WangMeng Zuo

Therefore, instead of utilizing the same setting for all samples, we propose to predict a particular denoising step for each sample by evaluating the difference between image contents and the priors extracted from diffusion models.

Denoising, Unsupervised Anomaly Detection

DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors

1 code implementation · 3 Jun 2024 · Tianyu Huang, Haoze Zhang, Yihan Zeng, Zhilu Zhang, Hui Li, WangMeng Zuo, Rynson W. H. Lau

In this work, combining the strengths and complementing shortcomings of the above two solutions, we propose to learn the physical properties of a material field with video diffusion priors, and then utilize a physics-based Material-Point-Method (MPM) simulator to generate 4D content with realistic motions.

Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning

1 code implementation · 30 May 2024 · Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, WangMeng Zuo

Learning a skill generally relies on both the practical experience of a doer and the insightful high-level guidance of an instructor.

Improved Generation of Adversarial Examples Against Safety-aligned LLMs

1 code implementation · 28 May 2024 · Qizhang Li, Yiwen Guo, WangMeng Zuo, Hao Chen

In addition, without introducing obvious cost, the combination achieves an absolute increase of over 30% in attack success rates compared with GCG when generating both query-specific (38% -> 68%) and universal adversarial prompts (26.68% -> 60.32%) for attacking the Llama-2-7B-Chat model on AdvBench.

Image Classification

Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets

no code implementations · 14 May 2024 · Wei Lian, Zhesen Cui, Fei Ma, Hang Pan, WangMeng Zuo

In many applications, the demand arises for algorithms capable of aligning partially overlapping point sets while remaining invariant to the corresponding transformations.

MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation

1 code implementation · 9 May 2024 · Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, Lei Zhang, WangMeng Zuo

In this work, we present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both high identity fidelity and flexible editability.

Text-to-Image Generation

Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations

2 code implementations · 3 May 2024 · Zhilu Zhang, Ruohao Wang, Hongzhi Zhang, WangMeng Zuo

In addition, we further take multiple zoomed observations to explore self-supervised RefSR, and present a progressive fusion scheme for the effective utilization of reference images.

Optical Flow Estimation, Reference-based Super-Resolution

IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks

no code implementations · 25 Apr 2024 · Zitong Huang, Ze Chen, Bowen Dong, Chaoqi Liang, Erjin Zhou, WangMeng Zuo

Model Weight Averaging (MWA) is a technique that seeks to enhance a model's performance by averaging the weights of multiple trained models.
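The basic operation behind MWA, uniform element-wise averaging of weights, can be sketched as follows. The sketch assumes each trained model's parameters are stored as a dict of NumPy arrays with matching names and shapes. That representation is an illustrative assumption, not a specific framework's API, and `average_weights` is a hypothetical helper. It leaves out the iterative scheme that IMWA layers on top of plain averaging.

```python
import numpy as np

def average_weights(state_dicts):
    """Element-wise mean of parameters across models that share one architecture."""
    if not state_dicts:
        raise ValueError("need at least one model")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("all models must have the same parameter names")
    # np.mean over a list of equally shaped arrays stacks them along a new axis 0 and averages it.
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Two toy "models", each with one weight matrix and one bias vector.
model_a = {"w": np.array([[1.0, 2.0], [3.0, 4.0]]), "b": np.array([0.0, 1.0])}
model_b = {"w": np.array([[3.0, 2.0], [1.0, 0.0]]), "b": np.array([2.0, 1.0])}
avg = average_weights([model_a, model_b])
```

Plain averaging like this is only meaningful when the models lie in a compatible region of weight space, for example models trained from the same initialization.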

Image Classification, Object Detection

NIR-Assisted Image Denoising: A Selective Fusion Approach and A Real-World Benchmark Dataset

1 code implementation · 12 Apr 2024 · Rongjian Xu, Zhilu Zhang, Renlong Wu, WangMeng Zuo

Despite the significant progress in image denoising, it is still challenging to restore fine-scale details while removing noise, especially in extremely low-light environments.

Image Denoising

TBSN: Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising

1 code implementation · 11 Apr 2024 · Junyi Li, Zhilu Zhang, WangMeng Zuo

For channel self-attention, we observe that it may leak blind-spot information when the number of channels is greater than the spatial size in the deep layers of multi-scale architectures.

Computational Efficiency, Image Denoising

SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions

1 code implementation · 9 Apr 2024 · Xiaoyu Liu, Yuxiang Wei, Ming Liu, Xianhui Lin, Peiran Ren, Xuansong Xie, WangMeng Zuo

The key idea of our SmartControl is to relax the visual condition in areas that conflict with the text prompts.

MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation

1 code implementation · 8 Apr 2024 · Jiaxiu Jiang, Yabo Zhang, Kailai Feng, Xiaohe Wu, WangMeng Zuo

Customized text-to-image generation aims to synthesize instantiations of user-specified concepts and has achieved unprecedented progress in handling individual concepts.

Text-to-Image Generation

Responsible Visual Editing

1 code implementation · 8 Apr 2024 · Minheng Ni, Yeli Shen, Lei Zhang, WangMeng Zuo

To mitigate the negative implications of harmful images on research, we create a transparent and public dataset, AltBear, which expresses harmful information using teddy bears instead of humans.

ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model

no code implementations · 7 Apr 2024 · Binghui Chen, Wenyu Li, Yifeng Geng, Xuansong Xie, WangMeng Zuo

Specifically, we propose a shoe-wearing system, called Shoe-Model, to generate plausible images of human legs interacting with the given shoes.

Image Generation, Marketing

Dual-Camera Smooth Zoom on Mobile Phones

1 code implementation · 7 Apr 2024 · Renlong Wu, Zhilu Zhang, Yu Yang, WangMeng Zuo

In this work, we introduce a new task, i.e., dual-camera smooth zoom (DCSZ), to achieve a smooth zoom preview.

Self-Supervised Video Desmoking for Laparoscopic Surgery

1 code implementation · 17 Mar 2024 · Renlong Wu, Zhilu Zhang, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen, WangMeng Zuo

On the other hand, to enhance the desmoking performance, we further feed the valuable information from the PS frame into the models, where a masking strategy and a regularization term are presented to avoid trivial solutions.

Learning Hierarchical Color Guidance for Depth Map Super-Resolution

no code implementations · 12 Mar 2024 · Runmin Cong, Ronghui Sheng, Hao Wu, Yulan Guo, Yunchao Wei, WangMeng Zuo, Yao Zhao, Sam Kwong

On the one hand, the low-level detail embedding module is designed to supplement high-frequency color information of depth features in a residual mask manner at the low-level stages.

Depth Map Super-Resolution

A self-supervised CNN for image watermark removal

1 code implementation · 9 Mar 2024 · Chunwei Tian, Menghua Zheng, Tiancai Jiao, WangMeng Zuo, Yanning Zhang, Chia-Wen Lin

Popular convolutional neural networks mainly use paired images in a supervised way for image watermark removal.

VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

1 code implementation · 8 Mar 2024 · Yabo Zhang, Yuxiang Wei, Xianhui Lin, Zheng Hui, Peiran Ren, Xuansong Xie, Xiangyang Ji, WangMeng Zuo

Different from conventional T2V sampling (i.e., temporal and spatial modeling), VideoElevator explicitly decomposes each sampling step into temporal motion refining and spatial quality elevating.

Video Generation

PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis

1 code implementation · CVPR 2024 · Zhengyao Lv, Yuxiang Wei, WangMeng Zuo, Kwan-Yee K. Wong

Extensive experiments demonstrate that our approach performs favorably in terms of visual quality, semantic consistency, and layout alignment.

Image Generation

ConSept: Continual Semantic Segmentation via Adapter-based Vision Transformer

no code implementations · 26 Feb 2024 · Bowen Dong, Guanglei Yang, WangMeng Zuo, Lei Zhang

Empirical investigations on the adaptation of existing frameworks to vanilla ViT reveal that incorporating visual adapters into ViTs or fine-tuning ViTs with distillation terms is advantageous for enhancing the segmentation capability of novel classes.

Continual Semantic Segmentation, Segmentation

A Comprehensive Survey on 3D Content Generation

1 code implementation · 2 Feb 2024 · Jian Liu, Xiaoshui Huang, Tianyu Huang, Lu Chen, Yuenan Hou, Shixiang Tang, Ziwei Liu, Wanli Ouyang, WangMeng Zuo, Junjun Jiang, Xianming Liu

Recent years have witnessed remarkable advances in artificial intelligence generated content (AIGC), with diverse input modalities, e.g., text, image, video, audio and 3D.

Survey

Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning

1 code implementation · 3 Jan 2024 · Zitong Huang, Ze Chen, Zhixing Chen, Erjin Zhou, Xinxing Xu, Rick Siow Mong Goh, Yong Liu, WangMeng Zuo, ChunMei Feng

When progressing to a new session, pseudo-features are sampled from old-class distributions combined with training images of the current session to optimize the prompt, thus enabling the model to learn new knowledge while retaining old knowledge.
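The replay step described in this snippet can be sketched as follows. The sketch assumes each old class is summarized by a Gaussian with diagonal covariance over backbone features. The diagonal form, the feature size of 8, the random stand-in features and the helpers `fit_class_gaussians` and `sample_pseudo_features` are all hypothetical choices made for illustration. It shows only how pseudo-features for old classes are sampled and mixed with current-session data. The prompt optimization itself is omitted.

```python
import numpy as np

def fit_class_gaussians(features, labels):
    """Store a mean and a diagonal variance per old class (diagonal covariance is an assumption)."""
    stats = {}
    for c in np.unique(labels):
        f = features[labels == c]
        stats[int(c)] = (f.mean(axis=0), f.var(axis=0) + 1e-6)  # small floor keeps std > 0
    return stats

def sample_pseudo_features(stats, per_class, rng):
    """Draw per_class pseudo-features from each stored class distribution."""
    feats, labs = [], []
    for c, (mu, var) in stats.items():
        feats.append(rng.normal(mu, np.sqrt(var), size=(per_class, mu.shape[0])))
        labs.append(np.full(per_class, c))
    return np.concatenate(feats), np.concatenate(labs)

rng = np.random.default_rng(0)

# Old sessions: random stand-ins for backbone features of 4 classes, 25 samples each.
old_feats = rng.normal(size=(100, 8))
old_labels = np.repeat(np.arange(4), 25)
stats = fit_class_gaussians(old_feats, old_labels)  # only these statistics are kept

# New session: replayed pseudo-features are mixed with features of the new class's images.
new_feats = rng.normal(loc=3.0, size=(10, 8))
new_labels = np.full(10, 4)
pseudo_feats, pseudo_labels = sample_pseudo_features(stats, per_class=5, rng=rng)
train_feats = np.concatenate([pseudo_feats, new_feats])
train_labels = np.concatenate([pseudo_labels, new_labels])
```

Storing only class statistics instead of raw exemplars keeps memory small and avoids retaining old images.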

Class-Incremental Learning, Few-Shot Class-Incremental Learning

Exposure Bracketing is All You Need for Unifying Image Restoration and Enhancement Tasks

2 code implementations · 1 Jan 2024 · Zhilu Zhang, Shuohao Zhang, Renlong Wu, Zifei Yan, WangMeng Zuo

It is highly desired but challenging to acquire high-quality photos with clear content in low-light environments.

Deblurring, Denoising

Improving Image Restoration through Removing Degradations in Textual Representations

1 code implementation · CVPR 2024 · Jingbo Lin, Zhilu Zhang, Yuxiang Wei, Dongwei Ren, Dongsheng Jiang, WangMeng Zuo

To address the cross-modal assistance, we propose to map the degraded images into textual representations for removing the degradations, and then convert the restored textual representations into a guidance image for assisting image restoration.

Deblurring, Denoising

FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models

1 code implementation · 28 Dec 2023 · Wan Xu, Tianyu Huang, Tianyu Qu, Guanglei Yang, Yiwen Guo, WangMeng Zuo

Few-shot class-incremental learning (FSCIL) aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.

Class-Incremental Learning, Dimensionality Reduction

Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation

1 code implementation · 26 Dec 2023 · Zixian Guo, Yuxiang Wei, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, WangMeng Zuo

Parameter-efficient fine-tuning (PEFT) methods have provided an effective way for adapting large vision-language models to specific tasks or scenarios.

parameter-efficient fine-tuning

Decoupled Textual Embeddings for Customized Image Generation

1 code implementation · 19 Dec 2023 · Yufei Cai, Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hu Han, WangMeng Zuo

To decouple irrelevant attributes (i.e., background and pose) from the subject embedding, we further present several attribute mappers that encode each image as several image-specific subject-unrelated embeddings.

Attribute, Disentanglement

VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering

1 code implementation · 19 Dec 2023 · Chun-Mei Feng, Yang Bai, Tao Luo, Zhen Li, Salman Khan, WangMeng Zuo, Xinxing Xu, Rick Siow Mong Goh, Yong Liu

By feeding the retrieved image and question to the VQA model, one can identify images that are inconsistent with the relative caption: for such images, the VQA answer disagrees with the answer in the QA pair.

Image Retrieval, Question Answering

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

1 code implementation · CVPR 2024 · Tianyu Huang, Yihan Zeng, Zhilu Zhang, Wan Xu, Hang Xu, Songcen Xu, Rynson W. H. Lau, WangMeng Zuo

The priors are then regarded as input conditions to maintain reasonable geometries, in which conditional LoRA and weighted score are further proposed to optimize detailed textures.

3D Generation, Text to 3D

Learning with Noisy Labels Using Collaborative Sample Selection and Contrastive Semi-Supervised Learning

no code implementations · 24 Oct 2023 · Qing Miao, Xiaohe Wu, Chao Xu, Yanli Ji, WangMeng Zuo, Yiwen Guo, Zhaopeng Meng

By incorporating auxiliary information from CLIP and utilizing prompt fine-tuning, we effectively eliminate noisy samples from the clean set and mitigate confirmation bias during training.

Learning with noisy labels

Learning Real-World Image De-Weathering with Imperfect Supervision

1 code implementation · 23 Oct 2023 · Xiaohui Liu, Zhilu Zhang, Xiaohe Wu, Chaoyu Feng, Xiaotao Wang, Lei Lei, WangMeng Zuo

Real-world image de-weathering aims at removing various undesirable weather-related artifacts.

Pseudo Label

A cross Transformer for image denoising

1 code implementation · 16 Oct 2023 · Chunwei Tian, Menghua Zheng, WangMeng Zuo, Shichao Zhang, Yanning Zhang, Chia-Wen Lin

To avoid loss of key information, PB uses three heterogeneous networks to implement multiple interactions of multi-level features to broadly search for extra information for improving the adaptability of an obtained denoiser for complex scenes.

Image Denoising

DualAug: Exploiting Additional Heavy Augmentation with OOD Data Rejection

1 code implementation · 12 Oct 2023 · Zehao Wang, Yiwen Guo, Qizhang Li, Guanglei Yang, WangMeng Zuo

Most existing data augmentation methods tend to find a compromise in augmenting the data, i.e., increasing the amplitude of augmentation carefully to avoid degrading some data too much and doing harm to the model performance.

Data Augmentation, Image Classification

Toward Understanding BERT-Like Pre-Training for DNA Foundation Models

no code implementations · 11 Oct 2023 · Chaoqi Liang, Lifeng Qiao, Peng Ye, Nanqing Dong, Jianle Sun, Weiqiang Bai, Yuchen Ren, Xinzhu Ma, Hongliang Yan, Chunfeng Song, Wanli Ouyang, WangMeng Zuo

However, existing pre-training methods for DNA sequences largely rely on direct adoptions of BERT pre-training from NLP, lacking a comprehensive understanding and a specifically tailored approach.

Sentence-level Prompts Benefit Composed Image Retrieval

1 code implementation · 9 Oct 2023 · Yang Bai, Xinxing Xu, Yong Liu, Salman Khan, Fahad Khan, WangMeng Zuo, Rick Siow Mong Goh, Chun-Mei Feng

Composed image retrieval (CIR) is the task of retrieving specific images by using a query that involves both a reference image and a relative caption.

Ranked #2 on Image Retrieval on Fashion IQ (using extra training data)

Attribute, Composed Image Retrieval (CoIR)

Self-Supervised High Dynamic Range Imaging with Multi-Exposure Images in Dynamic Scenes

1 code implementation · 3 Oct 2023 · Zhilu Zhang, Haoyu Wang, Shuai Liu, Xiaotao Wang, Lei Lei, WangMeng Zuo

The color component is estimated from aligned multi-exposure images, while the structure component is generated through a structure-focused network that is supervised by the color component and an input reference (e.g., medium-exposure) image.

HDR Reconstruction

Beyond Image Borders: Learning Feature Extrapolation for Unbounded Image Composition

1 code implementation · ICCV 2023 · Xiaoyu Liu, Ming Liu, Junyi Li, Shuai Liu, Xiaotao Wang, Lei Lei, WangMeng Zuo

In this paper, we circumvent this issue by presenting a joint framework for both unbounded recommendation of camera view and image composition (i.e., UNIC).

Image Cropping

MetaF2N: Blind Image Super-Resolution by Learning Efficient Model Adaptation from Faces

1 code implementation · ICCV 2023 · Zhicun Yin, Ming Liu, Xiaoming Li, Hui Yang, Longan Xiao, WangMeng Zuo

To evaluate our proposed MetaF2N, we have collected a real-world low-quality dataset with one or multiple faces in each image, and our MetaF2N achieves superior performance on both synthetic and real-world datasets.

Image Generation, Image Super-Resolution

Aggregating Long-term Sharp Features via Hybrid Transformers for Video Deblurring

1 code implementation · 13 Sep 2023 · Dongwei Ren, Wei Shang, Yi Yang, WangMeng Zuo

To aggregate long-term sharp features from detected sharp frames, we utilize a global Transformer with multi-scale matching capability.

Deblurring, Video Deblurring

Cross-Consistent Deep Unfolding Network for Adaptive All-In-One Video Restoration

no code implementations · 4 Sep 2023 · Yuanshuo Cheng, Mingwen Shao, Yecong Wan, Yuanjian Qiao, WangMeng Zuo, Deyu Meng

To empower the framework for eliminating diverse degradations, we devise a Sequence-wise Adaptive Degradation Estimator (SADE) to estimate degradation features for the input corrupted video.

Video Restoration

Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models

no code implementations · 31 Aug 2023 · Minheng Ni, Yabo Zhang, Kailai Feng, Xiaoming Li, Yiwen Guo, WangMeng Zuo

In this work, we introduce a novel Referring Diffusional segmentor (Ref-Diff) for this task, which leverages the fine-grained multi-modal information from generative models.

Image Segmentation, Instance Segmentation

VQ-Font: Few-Shot Font Generation with Structure-Aware Enhancement and Quantization

1 code implementation · 27 Aug 2023 · Mingshuai Yao, Yabo Zhang, Xianhui Lin, Xiaoming Li, WangMeng Zuo

In this paper, we propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement.

Font Generation, Quantization

UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving

1 code implementation · 21 Aug 2023 · Jian Zou, Tianyu Huang, Guanglei Yang, Zhenhua Guo, Tao Luo, Chun-Mei Feng, WangMeng Zuo

First, it projects the features from both modalities into a cohesive 3D volume space to intricately marry the bird's eye view (BEV) with the height dimension.

3D Object Detection, Autonomous Driving

Rethinking Client Drift in Federated Learning: A Logit Perspective

no code implementations · 20 Aug 2023 · Yunlu Yan, Chun-Mei Feng, Mang Ye, WangMeng Zuo, Ping Li, Rick Siow Mong Goh, Lei Zhu, C. L. Philip Chen

Concretely, FedCSD introduces a class prototype similarity distillation to align the local logits with the refined global logits that are weighted by the similarity between local logits and the global prototype.

Federated Learning

Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning

1 code implementation · ICCV 2023 · Chun-Mei Feng, Kai Yu, Yong Liu, Salman Khan, WangMeng Zuo

In this paper, we focus on a particular setting of learning adaptive prompts on the fly for each test sample from an unseen new domain, which is known as test-time prompt tuning (TPT).

Data Augmentation

Latent Code Augmentation Based on Stable Diffusion for Data-free Substitute Attacks

1 code implementation · 24 Jul 2023 · Mingwen Shao, Lingzhuang Meng, Yuanjian Qiao, Lixu Zhang, WangMeng Zuo

Specifically, we augment the latent codes of the inferred member data with LCA and use them as guidance for SD.

Improving Transferability of Adversarial Examples via Bayesian Attacks

no code implementations · 21 Jul 2023 · Qizhang Li, Yiwen Guo, Xiaochen Yang, WangMeng Zuo, Hao Chen

Our ICLR work advocated for enhancing transferability in adversarial examples by incorporating a Bayesian formulation into model parameters, which effectively emulates the ensemble of infinitely many deep neural networks, while, in this paper, we introduce a novel extension by incorporating the Bayesian formulation into the model input as well, enabling the joint diversification of both the model input and model parameters.

RBSR: Efficient and Flexible Recurrent Network for Burst Super-Resolution

1 code implementation · 30 Jun 2023 · Renlong Wu, Zhilu Zhang, Shuohao Zhang, Hongzhi Zhang, WangMeng Zuo

The main challenge of BurstSR is to effectively combine the complementary information from input frames, while existing methods still struggle with it.

Super-Resolution

Inferring and Leveraging Parts from Object Shape for Improving Semantic Image Synthesis

1 code implementation · CVPR 2023 · Yuxiang Wei, Zhilong Ji, Xiaohe Wu, Jinfeng Bai, Lei Zhang, WangMeng Zuo

Despite the progress in semantic image synthesis, it remains a challenging problem to generate photo-realistic parts from input semantic map.

Image Generation, Object

Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive

2 code implementations · ICCV 2023 · Wei Shang, Dongwei Ren, Chaoyu Feng, Xiaotao Wang, Lei Lei, WangMeng Zuo

In this paper, we propose a Self-supervised learning framework for Dual reversed RS distortions Correction (SelfDRSC), where a DRSC network can be learned to generate a high framerate GS video only based on dual RS images with reversed distortions.

Self-Supervised Learning

ControlVideo: Training-free Controllable Text-to-Video Generation

1 code implementation · 22 May 2023 · Yabo Zhang, Yuxiang Wei, Dongsheng Jiang, Xiaopeng Zhang, WangMeng Zuo, Qi Tian

Text-driven diffusion models have unlocked unprecedented abilities in image generation, whereas their video counterpart still lags behind due to the excessive training cost of temporal modeling.

Image Generation, Text-to-Video Generation

Improving Adversarial Transferability via Intermediate-level Perturbation Decay

2 code implementations · NeurIPS 2023 · Qizhang Li, Yiwen Guo, WangMeng Zuo, Hao Chen

In particular, the proposed method, named intermediate-level perturbation decay (ILPD), encourages the intermediate-level perturbation to simultaneously point in an effective adversarial direction and have a large magnitude.

Learning Federated Visual Prompt in Null Space for MRI Reconstruction

1 code implementation · CVPR 2023 · Chun-Mei Feng, Bangjun Li, Xinxing Xu, Yong Liu, Huazhu Fu, WangMeng Zuo

Federated Magnetic Resonance Imaging (MRI) reconstruction enables multiple hospitals to collaborate in a distributed manner without aggregating local data, thereby protecting patient privacy.

MRI Reconstruction

Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time

2 code implementations · CVPR 2023 · Wei Shang, Dongwei Ren, Yi Yang, Hongzhi Zhang, Kede Ma, WangMeng Zuo

Moreover, on the seemingly implausible x16 interpolation task, our method outperforms existing methods by more than 1.5 dB in terms of PSNR.

Contrastive Learning, Deblurring

Learning Generative Structure Prior for Blind Text Image Super-resolution

1 code implementation · CVPR 2023 · Xiaoming Li, WangMeng Zuo, Chen Change Loy

To restrict the generative space of StyleGAN so that it obeys the structure of characters yet remains flexible in handling different font styles, we store the discrete features for each character in a codebook.

Image Super-Resolution

Towards Universal Vision-language Omni-supervised Segmentation

no code implementations · 12 Mar 2023 · Bowen Dong, Jiaxi Gu, Jianhua Han, Hang Xu, WangMeng Zuo

To improve the open-world segmentation ability, we incorporate omni-supervised data (i.e., panoptic segmentation data, object detection data, and image-text pair data) into training, thus enriching the open-world segmentation ability and achieving better segmentation accuracy.

Instance Segmentation, Object Detection

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation

1 code implementation · ICCV 2023 · Yuxiang Wei, Yabo Zhang, Zhilong Ji, Jinfeng Bai, Lei Zhang, WangMeng Zuo

In addition to the unprecedented ability in imaginary creation, large text-to-image models are expected to take customized concepts in image generation.

Text-to-Image Generation

Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples

1 code implementation · 10 Feb 2023 · Qizhang Li, Yiwen Guo, WangMeng Zuo, Hao Chen

In this paper, by contrast, we opt for diversity in substitute models and advocate attacking a Bayesian model to achieve desirable transferability.

Diversity

NUWA-LIP: Language-Guided Image Inpainting With Defect-Free VQGAN

no code implementations · CVPR 2023 · Minheng Ni, Xiaoming Li, WangMeng Zuo

Language-guided image inpainting aims to fill the defective regions of an image under the guidance of text while keeping the non-defective regions unchanged.

Image Inpainting

Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography

1 code implementation · CVPR 2023 · Yue Cao, Ming Liu, Shuai Liu, Xiaotao Wang, Lei Lei, WangMeng Zuo

Although deep neural networks have achieved astonishing performance in many vision tasks, existing learning-based methods are far inferior to the physical model-based solutions in extreme low-light sensor noise modeling.

Image Denoising

Position-Aware Contrastive Alignment for Referring Image Segmentation

no code implementations · 27 Dec 2022 · Bo Chen, Zhiwei Hu, Zhilong Ji, Jinfeng Bai, WangMeng Zuo

The main challenge of this task is to understand the visual and linguistic content simultaneously and to find the referred object accurately among all instances in the image.

Image Segmentation, Position

Human Co-Parsing Guided Alignment for Occluded Person Re-identification

1 code implementation · IEEE Transactions on Image Processing 2022 · Shuguang Dou, Cairong Zhao, Xinyang Jiang, Shanshan Zhang, Wei-Shi Zheng, WangMeng Zuo

Most supervised methods train an extra human parsing model alongside the ReID model using cross-domain human-part annotations, which incurs expensive annotation costs and suffers from a domain gap. Unsupervised methods integrate a feature-clustering-based human parsing process into the ReID model, but the lack of supervision signals leads to less satisfactory segmentation results.

Human Parsing, Occluded Person Re-Identification

HS-Diffusion: Semantic-Mixing Diffusion for Head Swapping

1 code implementation · 13 Dec 2022 · Qinghe Wang, Lijie Liu, Miao Hua, Pengfei Zhu, WangMeng Zuo, QinGhua Hu, Huchuan Lu, Bing Cao

We blend the semantic layouts of source head and source body, and then inpaint the transition region by the semantic layout generator, achieving a coarse-grained head swapping.

Benchmark Dataset and Effective Inter-Frame Alignment for Real-World Video Super-Resolution

1 code implementation · 10 Dec 2022 · Ruohao Wang, Xiaohui Liu, Zhilu Zhang, Xiaohe Wu, Chun-Mei Feng, Lei Zhang, WangMeng Zuo

On the other hand, alignment algorithms in existing VSR methods perform poorly for real-world videos, leading to unsatisfactory results.

Optical Flow Estimation, Video Super-Resolution

Relationship Quantification of Image Degradations

no code implementations • 8 Dec 2022 • Wenxin Wang, Boyun Li, Yuanbiao Gou, Peng Hu, WangMeng Zuo, Xi Peng

To tackle the first challenge, we propose a Degradation Relationship Index (DRI), defined as the mean drop-rate difference in the validation loss between two models trained, respectively, with the anchor degradation alone and with a mixture of the anchor and auxiliary degradations.

Denoising Image Dehazing +2

Learning Single Image Defocus Deblurring with Misaligned Training Pairs

2 code implementations • 26 Nov 2022 • Yu Li, Dongwei Ren, Xinya Shu, WangMeng Zuo

First, in the deblurring module, a bi-directional optical flow-based deformation is introduced to tolerate spatial misalignment between deblurred and ground-truth images.

Deblurring Image Defocus Deblurring +1

Self-Supervised Image Restoration with Blurry and Noisy Pairs

1 code implementation • 14 Nov 2022 • Zhilu Zhang, Rongjian Xu, Ming Liu, Zifei Yan, WangMeng Zuo

By learning in a collaborative manner, the deblurring and denoising tasks in our method can benefit each other.

Deblurring Denoising +1

Learning Dual Memory Dictionaries for Blind Face Restoration

1 code implementation • 15 Oct 2022 • Xiaoming Li, Shiguang Zhang, Shangchen Zhou, Lei Zhang, WangMeng Zuo

Improving the photo-realism of blind restoration while adaptively handling both generic and specific restoration scenarios with a single unified model is a challenging and intractable task.

Blind Face Restoration

ImaginaryNet: Learning Object Detectors without Real Images and Annotations

1 code implementation • 13 Oct 2022 • Minheng Ni, Zitong Huang, Kailai Feng, WangMeng Zuo

Given a class label, the language model generates a full description of a scene containing the target object, and the text-to-image model then generates a photo-realistic image from that description.

Image Generation Language Modelling +3

From Face to Natural Image: Learning Real Degradation for Blind Image Super-Resolution

1 code implementation • 3 Oct 2022 • Xiaoming Li, Chaofeng Chen, Xianhui Lin, WangMeng Zuo, Lei Zhang

Notably, low-quality (LQ) face images, which may share the degradation process of natural images, can be robustly restored with photo-realistic textures by exploiting their strong structural priors.

Image Generation Image Super-Resolution

LPT: Long-tailed Prompt Tuning for Image Classification

1 code implementation • 3 Oct 2022 • Bowen Dong, Pan Zhou, Shuicheng Yan, WangMeng Zuo

For better effectiveness, we divide prompts into two groups: 1) a shared prompt for the whole long-tailed dataset, which learns general features and adapts a pretrained model to the target domain; and 2) group-specific prompts, which gather group-specific features for samples with similar features and give the pretrained model discriminative ability.

 Ranked #1 on Long-tail Learning on CIFAR-100-LT (ρ=100) (using extra training data)

Classification Image Classification +1

CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training

1 code implementation • ICCV 2023 • Tianyu Huang, Bowen Dong, Yunhan Yang, Xiaoshui Huang, Rynson W. H. Lau, Wanli Ouyang, WangMeng Zuo

To address this issue, we propose CLIP2Point, an image-depth pre-training method that uses contrastive learning to transfer CLIP to the 3D domain and adapts it to point cloud classification.

Contrastive Learning Few-Shot Learning +5

Multi-stage image denoising with the wavelet transform

1 code implementation • 26 Sep 2022 • Chunwei Tian, Menghua Zheng, WangMeng Zuo, Bob Zhang, Yanning Zhang, David Zhang

In this paper, we propose a multi-stage image denoising CNN with the wavelet transform (MWDCNN) that consists of three stages, i.e., a dynamic convolutional block (DCB), two cascaded wavelet transform and enhancement blocks (WEBs), and a residual block (RB).

Image Denoising
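The WEB stages in the MWDCNN entry rely on a 2D discrete wavelet transform. As a rough illustration only, the NumPy sketch below implements a one-level orthonormal Haar transform and its inverse; the Haar basis and the names `haar_dwt2`/`haar_idwt2` are assumptions, and the paper's dynamic convolution and enhancement blocks are not reproduced.

```python
import numpy as np


def haar_dwt2(x):
    """One-level orthonormal 2D Haar transform (illustrative assumption: Haar basis).

    x: (H, W) array with even H and W.
    Returns (ll, lh, hl, hh), each of shape (H/2, W/2). Sub-band naming
    conventions vary between libraries.
    """
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("height and width must be even")
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # local average (low-frequency band)
    lh = (a + b - c - d) / 2.0  # top-minus-bottom detail
    hl = (a - b + c - d) / 2.0  # left-minus-right detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds every 2x2 block exactly."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x
```

Because this transform is orthonormal, it is lossless: a network can process the sub-bands and map them back to the pixel domain without the transform itself discarding information.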

A heterogeneous group CNN for image super-resolution

1 code implementation • 26 Sep 2022 • Chunwei Tian, Yanning Zhang, WangMeng Zuo, Chia-Wen Lin, David Zhang, Yixuan Yuan

To prevent the loss of original information, a multi-level enhancement mechanism guides the CNN toward a symmetric architecture that improves the expressive ability of HGSRCNN.

Image Super-Resolution

Learning Hierarchical Dynamics with Spatial Adjacency for Image Enhancement

1 code implementation • ACMMM 2022 • Yudong Liang, Bin Wang, Wenqi Ren, Jiaying Liu, Wenjian Wang, WangMeng Zuo

In various real-world image enhancement applications, the degradations are always non-uniform (non-homogeneous) and diverse, which challenges most deep networks, whose parameters are fixed during inference.

Image Dehazing Low-Light Image Enhancement +1

Two-Stream Networks for Object Segmentation in Videos

no code implementations • 8 Aug 2022 • Hannan Lu, Zhi Tian, Lirong Yang, Haibing Ren, WangMeng Zuo

The compact instance stream effectively improves the segmentation accuracy of the unseen pixels, while fusing two streams with the adaptive routing map leads to an overall performance boost.

Object Retrieval +5

W2N: Switching From Weak Supervision to Noisy Supervision for Object Detection

1 code implementation • 25 Jul 2022 • Zitong Huang, Yiping Bao, Bowen Dong, Erjin Zhou, WangMeng Zuo

Given pseudo ground truths generated by a well-trained WSOD network, we propose a two-module iterative training algorithm that refines the pseudo labels and progressively supervises a better object detector.

Object object-detection +2

A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration

1 code implementation • 21 Jul 2022 • Ming Liu, Yuxiang Wei, Xiaohe Wu, WangMeng Zuo, Lei Zhang

Generative adversarial networks (GANs) have drawn enormous attention due to their simple yet effective training mechanism and superior image generation quality.

Image Generation Image Restoration

Towards Diverse and Faithful One-shot Adaption of Generative Adversarial Networks

1 code implementation • 18 Jul 2022 • Yabo Zhang, Mingshuai Yao, Yuxiang Wei, Zhilong Ji, Jinfeng Bai, WangMeng Zuo

In this paper, we present a novel one-shot generative domain adaption method, i.e., DiFa, for diverse generation and faithful adaptation.

Diversity Domain Adaptation

Learning Diverse Tone Styles for Image Retouching

1 code implementation • 12 Jul 2022 • Haolin Wang, Jiawei Zhang, Ming Liu, Xiaohe Wu, WangMeng Zuo

In particular, the style encoder predicts the target style representation of an input image, which serves as the conditional information in the RetouchNet for retouching, while the TSFlow maps the style representation vector into a Gaussian distribution in the forward pass.

Image Retouching

An Improved Normed-Deformable Convolution for Crowd Counting

1 code implementation • 16 Jun 2022 • Xin Zhong, Zhaoyi Yan, Jing Qin, WangMeng Zuo, Weigang Lu

However, the heads are not uniformly covered by the sampling points in the deformable convolution, resulting in loss of head information.

Crowd Counting

Robust Deep Ensemble Method for Real-world Image Denoising

1 code implementation • 8 Jun 2022 • Pengju Liu, Hongzhi Zhang, Jinghui Wang, Yuzhi Wang, Dongwei Ren, WangMeng Zuo

In particular, we put well-trained CBDNet, NBNet, HINet, Uformer and GMSNet into a denoiser pool, and adopt a U-Net to predict pixel-wise weighting maps that fuse these denoisers.

Deblurring Image Deblurring +4
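The fusion step in this entry amounts to a per-pixel weighted average of the pooled denoisers' outputs. The NumPy sketch below assumes the weighting network emits one raw score map per denoiser and normalizes the scores with a softmax across denoisers; the function name and the softmax normalization are illustrative assumptions rather than details from the paper, and the U-Net itself is omitted.

```python
import numpy as np


def fuse_denoisers(outputs, weight_logits):
    """Per-pixel weighted fusion of K denoiser outputs (hypothetical helper).

    outputs:       (K, H, W, C) denoised images, one per denoiser in the pool
    weight_logits: (K, H, W) raw per-pixel scores, e.g. predicted by a U-Net
    """
    # Softmax across the K denoisers at every pixel (assumed normalization),
    # so the weights are positive and sum to 1 at each location.
    shifted = weight_logits - weight_logits.max(axis=0, keepdims=True)
    w = np.exp(shifted)
    w /= w.sum(axis=0, keepdims=True)
    # Broadcast the weights over the channel axis and sum over denoisers.
    return (w[..., None] * outputs).sum(axis=0)
```

Because the weights vary per pixel, each image region can lean on whichever denoiser handles it best, rather than using one global blend.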

Image Super-resolution with An Enhanced Group Convolutional Neural Network

1 code implementation • 29 May 2022 • Chunwei Tian, Yixuan Yuan, Shichao Zhang, Chia-Wen Lin, WangMeng Zuo, David Zhang

In this paper, we present an enhanced super-resolution group CNN (ESRGCNN) with a shallow architecture by fully fusing deep and wide channel features to extract more accurate low-frequency information in terms of correlations of different channels in single image super-resolution (SISR).

Image Super-Resolution

Squeeze Training for Adversarial Robustness

1 code implementation • 23 May 2022 • Qizhang Li, Yiwen Guo, WangMeng Zuo, Hao Chen

The vulnerability of deep neural networks (DNNs) to adversarial examples has attracted great attention in the machine learning community.

Adversarial Robustness

Learning Dual-Pixel Alignment for Defocus Deblurring

1 code implementation • 26 Apr 2022 • Yu Li, Yaling Yi, Dongwei Ren, Qince Li, WangMeng Zuo

Generally, DPANet is an encoder-decoder with skip-connections, where two branches with shared parameters in the encoder are employed to extract and align deep features from left and right views, and one decoder is adopted to fuse aligned features for predicting the sharp image.

Deblurring Decoder

Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment

no code implementations • CVPR 2022 • Yue Cao, Zhaolin Wan, Dongwei Ren, Zifei Yan, WangMeng Zuo

In particular, by treating all labeled data as positive samples, PU learning is leveraged to identify negative samples (i.e., outliers) in the unlabeled data.

Full-Reference Image Quality Assessment

Unidirectional Video Denoising by Mimicking Backward Recurrent Modules with Look-ahead Forward Ones

1 code implementation • 12 Apr 2022 • Junyi Li, Xiaohe Wu, Zhenxing Niu, WangMeng Zuo

However, BiRNN is intrinsically offline because it uses backward recurrent modules to propagate information from the last frame to the current one, which causes high latency and large memory consumption.

Denoising Video Denoising +1

Localization Distillation for Object Detection

1 code implementation • 12 Apr 2022 • Zhaohui Zheng, Rongguang Ye, Qibin Hou, Dongwei Ren, Ping Wang, WangMeng Zuo, Ming-Ming Cheng

Combining these two new components, we show for the first time that logit mimicking can outperform feature imitation, and that the absence of localization distillation is a critical reason why logit mimicking has underperformed for years.

Knowledge Distillation Object +2
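The entry's claim concerns logit mimicking for localization. One common way to make box regression distillable in this style is to represent each bounding-box edge as a discrete probability distribution over candidate positions and to distill it with a temperature-softened KL divergence, as in standard knowledge distillation. The NumPy sketch below assumes that representation (N boxes × 4 edges × n_bins logits); the function names and the default temperature are illustrative assumptions, and details such as how the paper selects or weights regions are omitted.

```python
import numpy as np


def softmax_t(z, tau):
    """Temperature-softened softmax over the last axis."""
    z = z / tau
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def localization_distillation_loss(student_logits, teacher_logits, tau=10.0):
    """KL(teacher || student) over discretized box-edge distributions.

    student_logits, teacher_logits: (N, 4, n_bins) logits for the left, top,
    right and bottom edges of N boxes (assumed representation).
    tau: illustrative temperature, not a value taken from the paper.
    """
    p_t = softmax_t(teacher_logits, tau)
    p_s = softmax_t(student_logits, tau)
    eps = 1e-12  # avoids log(0)
    kl = (p_t * (np.log(p_t + eps) - np.log(p_s + eps))).sum(axis=-1)  # (N, 4)
    return kl.sum(axis=-1).mean()  # sum over the 4 edges, average over boxes
```

The loss is zero when the student's edge distributions match the teacher's and positive otherwise; some knowledge-distillation implementations additionally scale such a loss by τ², which is omitted here.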

Semantic-shape Adaptive Feature Modulation for Semantic Image Synthesis

1 code implementation • CVPR 2022 • Zhengyao Lv, Xiaoming Li, Zhenxing Niu, Bing Cao, WangMeng Zuo

A fine-grained, part-level semantic layout benefits the generation of object details, and it can be roughly inferred from an object's shape.

Image Generation Object

An Intermediate-level Attack Framework on The Basis of Linear Regression

1 code implementation • 21 Mar 2022 • Yiwen Guo, Qizhang Li, WangMeng Zuo, Hao Chen

This paper substantially extends our work published at ECCV, in which an intermediate-level attack was proposed to improve the transferability of some baseline adversarial examples.

regression

Self-Promoted Supervision for Few-Shot Transformer

1 code implementation • 14 Mar 2022 • Bowen Dong, Pan Zhou, Shuicheng Yan, WangMeng Zuo

The few-shot learning ability of vision transformers (ViTs) is rarely investigated, though it is highly desired.

Data Augmentation Few-Shot Learning +1

On Steering Multi-Annotations per Sample for Multi-Task Learning

no code implementations • 6 Mar 2022 • Yuanze Li, Yiwen Guo, Qizhang Li, Hongzhi Zhang, WangMeng Zuo

Despite the remarkable progress, the challenge of optimally learning different tasks simultaneously remains to be explored.

Instance Segmentation Multi-Task Learning +2

Self-Supervised Learning for Real-World Super-Resolution from Dual Zoomed Observations

2 code implementations • 2 Mar 2022 • Zhilu Zhang, Ruohao Wang, Hongzhi Zhang, Yunjin Chen, WangMeng Zuo

For this purpose, we take the telephoto image instead of an additional high-resolution image as the supervision information and select a center patch from it as the reference to super-resolve the corresponding short-focus image patch.

Reference-based Super-Resolution Self-Supervised Learning

NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN

no code implementations • 10 Feb 2022 • Minheng Ni, Chenfei Wu, Haoyang Huang, Daxin Jiang, WangMeng Zuo, Nan Duan

Language guided image inpainting aims to fill in the defective regions of an image under the guidance of text while keeping non-defective regions unchanged.

Image Inpainting

Invertible Network for Unpaired Low-light Image Enhancement

no code implementations • 24 Dec 2021 • Jize Zhang, Haolin Wang, Xiaohe Wu, WangMeng Zuo

Existing unpaired low-light image enhancement approaches prefer to employ the two-way GAN framework, in which two CNN generators are deployed for enhancement and degradation separately.

Low-Light Image Enhancement

Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds

no code implementations • 29 Sep 2021 • Fangcen Liu, Chenqiang Gao, Fang Chen, Deyu Meng, WangMeng Zuo, Xinbo Gao

We adopt the self-attention mechanism of the transformer to learn the interaction information of image features in a larger range.

Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision

1 code implementation • ICCV 2021 • Zhilu Zhang, Haolin Wang, Ming Liu, Ruohao Wang, Jiawei Zhang, WangMeng Zuo

To diminish the effect of color inconsistency on image alignment, we introduce a global color mapping (GCM) module that generates an initial sRGB image from the input raw image while keeping the spatial locations of the pixels unchanged; the target sRGB image guides GCM to shift the colors toward it.

Optical Flow Estimation

Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting

1 code implementation • ICCV 2021 • Binghui Chen, Zhaoyi Yan, Ke Li, Pengyu Li, Biao Wang, WangMeng Zuo, Lei Zhang

In crowd counting, because labelling is laborious, collecting a new large-scale dataset with plentiful images of highly diverse density, scene, etc. is perceived as intractable.

Crowd Counting Diversity

Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation

1 code implementation • ICCV 2021 • Yuxiang Wei, Yupeng Shi, Xiao Liu, Zhilong Ji, Yuan Gao, Zhongqin Wu, WangMeng Zuo

It simply encourages the output variations caused by perturbations on different latent dimensions to be orthogonal, using the Jacobian with respect to the input to represent these variations.

Disentanglement Image Generation
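The regularizer in this entry penalizes Jacobian columns (the output changes caused by individual latent dimensions) that are not mutually orthogonal. The NumPy sketch below estimates the Jacobian of a generic function `g` by central finite differences and sums the squared off-diagonal entries of JᵀJ; the finite-difference estimate and the function names are assumptions made for illustration, and the paper's exact formulation (which layers are regularized and how the Jacobian is approximated during training) is not reproduced.

```python
import numpy as np


def jacobian_fd(g, z, eps=1e-4):
    """Central finite-difference Jacobian of g at z.

    Column i approximates dg/dz_i, i.e. how the output moves when only
    latent dimension i is perturbed.
    """
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        cols.append((g(z + dz) - g(z - dz)) / (2 * eps))
    return np.stack(cols, axis=1)  # shape (output_dim, latent_dim)


def orthogonal_jacobian_penalty(g, z):
    """Sum of squared inner products between distinct Jacobian columns."""
    J = jacobian_fd(g, z)
    gram = J.T @ J  # pairwise inner products of the columns
    off_diag = gram - np.diag(np.diag(gram))
    return float((off_diag ** 2).sum())  # zero iff columns are mutually orthogonal
```

During training, a penalty of this kind would typically be added to the generator loss so that each latent dimension controls a distinct factor of variation.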

Local Patch Network with Global Attention for Infrared Small Target Detection

1 code implementation • 13 Aug 2021 • Fang Chen, Chenqiang Gao, Fangcen Liu, Yue Zhao, Yuxi Zhou, Deyu Meng, WangMeng Zuo

A local patch network (LPNet) with global attention is proposed in this paper to detect small targets by jointly considering the global and local properties of infrared small target images.

Deep Learning Semantic Segmentation

Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters

1 code implementation • ICCV 2021 • Bowen Dong, Zitong Huang, Yuelin Guo, Qilong Wang, Zhenxing Niu, WangMeng Zuo

In this paper, we defend the problem setting for improving localization performance by leveraging the bounding box regression knowledge from a well-annotated auxiliary dataset.

Object object-detection +3

Crowd Counting via Perspective-Guided Fractional-Dilation Convolution

1 code implementation • 8 Jul 2021 • Zhaoyi Yan, Ruimao Zhang, Hongzhi Zhang, Qingfu Zhang, WangMeng Zuo

One of the main issues in this task is how to handle the dramatic scale variations of pedestrians caused by the perspective effect.

Crowd Counting

Learning Scalable ℓ∞-Constrained Near-Lossless Image Compression via Joint Lossy Image and Residual Compression

no code implementations • CVPR 2021 • Yuanchao Bai, Xianming Liu, WangMeng Zuo, YaoWei Wang, Xiangyang Ji

To achieve scalable compression with the error bound larger than zero, we derive the probability model of the quantized residual by quantizing the learned probability model of the original residual, instead of training multiple networks.

Image Compression
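To keep every reconstructed residual within an error bound τ (the ℓ∞ constraint), a standard choice is the uniform near-lossless quantizer with step 2τ+1 known from JPEG-LS; that the paper uses exactly this quantizer is an assumption here, not something the entry confirms. The NumPy sketch below applies it to integer residuals and, following the entry's description, derives the distribution of the quantized residual by summing the original residual's probability mass within each quantization bin. All function names are hypothetical.

```python
import numpy as np


def quantize_residual(r, tau):
    """Uniform near-lossless quantizer (JPEG-LS style, assumed).

    For integer residuals r, |r - dequantize(q, tau)| <= tau.
    """
    step = 2 * tau + 1
    return np.sign(r) * ((np.abs(r) + tau) // step)


def dequantize(q, tau):
    """Map quantization indices back to the centers of their bins."""
    return q * (2 * tau + 1)


def quantized_pmf(pmf, values, tau):
    """Derive the pmf of the quantized residual from that of the original.

    pmf:    probabilities of the original integer residual values
    values: the integer residual values those probabilities refer to
    """
    q = quantize_residual(values, tau)
    bins = np.unique(q)
    # Each bin collects the probability mass of every residual mapped into it.
    probs = np.array([pmf[q == b].sum() for b in bins])
    return bins, probs
```

Since the bin probabilities are derived from a single learned distribution, the same model can in principle serve any τ, which matches the entry's point about not training one network per error bound. With τ = 0 the quantizer is the identity and compression is lossless.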

VirFace: Enhancing Face Recognition via Unlabeled Shallow Data

no code implementations • CVPR 2021 • Wenyu Li, Tianchu Guo, Pengyu Li, Binghui Chen, Biao Wang, WangMeng Zuo, Lei Zhang

In this paper, we propose a novel face recognition method, named VirFace, to effectively apply the unlabeled shallow data for face recognition.

Face Recognition

Image Inpainting with Edge-guided Learnable Bidirectional Attention Maps

1 code implementation • 25 Apr 2021 • Dongsheng Wang, Chaohao Xie, Shaohui Liu, Zhenxing Niu, WangMeng Zuo

In this paper, we present an edge-guided learnable bidirectional attention map (Edge-LBAM) for improving image inpainting of irregular holes with several distinct merits.

Image Inpainting valid