Search Results for author: Zhentao Tan

Found 26 papers, 14 papers with code

Origin Identification for Text-Guided Image-to-Image Diffusion Models

no code implementations 4 Jan 2025 Wenhao Wang, Yifan Sun, Zongxin Yang, Zhentao Tan, Zhengdong Hu, Yi Yang

Subsequently, it is demonstrated that such a simple linear transformation can be generalized across different diffusion models.

Misinformation

SweetTokenizer: Semantic-Aware Spatial-Temporal Tokenizer for Compact Visual Discretization

no code implementations 11 Dec 2024 Zhentao Tan, Ben Xue, Jian Jia, Junhao Wang, Wencai Ye, Shaoyun Shi, MingJie Sun, Wenjin Wu, Quan Chen, Peng Jiang

SweetTokenizer achieves comparable video reconstruction fidelity with only 25% of the tokens used in previous state-of-the-art video tokenizers, and boosts video generation results by 32.9% w.r.t. gFVD.

Image Reconstruction Representation Learning +2

Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection

1 code implementation 3 Oct 2024 Tianxiang Chen, Zhentao Tan, Tao Gong, Yue Wu, Qi Chu, Bin Liu, Jieping Ye, Nenghai Yu

We observe performance dips in question-answering benchmarks after the removal or expansion of the shallow layers, and the degradation shrinks as the layer gets deeper, indicating that the shallow layers hold the key to knowledge injection.

Math parameter-efficient fine-tuning +1

Image Copy Detection for Diffusion Models

no code implementations 30 Sep 2024 Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang

Existing Image Copy Detection (ICD) models, though accurate in detecting hand-crafted replicas, overlook the challenge from diffusion models.

Copy Detection Marketing

Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization

no code implementations 5 Aug 2024 Changtao Miao, Qi Chu, Tao Gong, Zhentao Tan, Zhenchao Jin, Wanyi Zhuang, Man Luo, Honggang Hu, Nenghai Yu

The FUP integrates detection and localization tasks using a token learning strategy and multiple forgery-aware transformers, which facilitates the use of classification information to enhance localization capability.

Face Detection Mixture-of-Experts

Replication in Visual Diffusion Models: A Survey and Outlook

1 code implementation 7 Jul 2024 Wenhao Wang, Yifan Sun, Zongxin Yang, Zhengdong Hu, Zhentao Tan, Yi Yang

In this survey, we provide the first comprehensive review of replication in visual diffusion models, marking a novel contribution to the field by systematically categorizing the existing studies into unveiling, understanding, and mitigating this phenomenon.

Benchmarking Survey

Learning Solution-Aware Transformers for Efficiently Solving Quadratic Assignment Problem

1 code implementation 14 Jun 2024 Zhentao Tan, Yadong Mu

Recently, various optimization problems, such as Mixed Integer Linear Programming Problems (MILPs), have undergone comprehensive investigation, leveraging the capabilities of machine learning.

Combinatorial Optimization

AnyPattern: Towards In-context Image Copy Detection

2 code implementations 21 Apr 2024 Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang

To accommodate the "seen → unseen" generalization scenario, we construct the first large-scale pattern dataset named AnyPattern, which has the largest number of tamper patterns (90 for training and 10 for testing) among all the existing ones.

Copy Detection In-Context Learning

MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection

1 code implementation 4 Mar 2024 Tianxiang Chen, Zi Ye, Zhentao Tan, Tao Gong, Yue Wu, Qi Chu, Bin Liu, Nenghai Yu, Jieping Ye

By aggregating the visual word and visual sentence features, our MiM-ISTD can effectively explore both global and local information.

Mamba Sentence

Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues

no code implementations 4 Feb 2024 Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Le Lu, Jieping Ye, Nenghai Yu

This bidirectional interaction narrows the modality imbalance, facilitating more effective learning of integrated audio-visual representations.

Decoder Representation Learning

TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection

no code implementations 3 Feb 2024 Tianxiang Chen, Zhentao Tan, Qi Chu, Yue Wu, Bin Liu, Nenghai Yu

We abstract this process as the directional movement of feature map pixels to target areas through convolution, pooling and interactions with surrounding pixels, which can be analogous to the movement of thermal particles constrained by surrounding variables and particles.

SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models

1 code implementation CVPR 2024 Feifei Wang, Zhentao Tan, Tianyi Wei, Yue Wu, Qidong Huang

Despite the success of diffusion-based customization methods on visual content creation, increasing concerns have been raised about such techniques from both privacy and political perspectives.

Denoising Image Generation

Towards More Unified In-context Visual Understanding

no code implementations CVPR 2024 Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu

Thanks to this design, the model is capable of handling in-context vision understanding tasks with multimodal output in a unified pipeline. Experimental results demonstrate that our model achieves competitive performance compared with specialized models and previous ICL baselines.

Decoder Image Captioning +2

Exploring the Application of Large-scale Pre-trained Models on Adverse Weather Removal

no code implementations 15 Jun 2023 Zhentao Tan, Yue Wu, Qiankun Liu, Qi Chu, Le Lu, Jieping Ye, Nenghai Yu

Inspired by the various successful applications of large-scale pre-trained models (e.g., CLIP), in this paper we explore their potential benefits for this task through both spatial feature representation learning and semantic information embedding: 1) for spatial feature representation learning, we design a Spatially-Adaptive Residual (SAR) Encoder to extract degraded areas adaptively.

Image Restoration Representation Learning

HQ-50K: A Large-scale, High-quality Dataset for Image Restoration

1 code implementation 8 Jun 2023 Qinhong Yang, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Lu Yuan, Gang Hua, Nenghai Yu

This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50,000 high-quality images with rich texture details and semantic diversity.

Denoising Diversity +3

Multi-spectral Class Center Network for Face Manipulation Detection and Localization

1 code implementation 18 May 2023 Changtao Miao, Qi Chu, Zhentao Tan, Zhenchao Jin, Tao Gong, Wanyi Zhuang, Yue Wu, Bin Liu, Honggang Hu, Nenghai Yu

To this end, a novel Multi-Spectral Class Center Network (MSCCNet) is proposed for face manipulation detection and localization.

Face Swapping

Video Action Segmentation via Contextually Refined Temporal Keypoints

no code implementations ICCV 2023 Borui Jiang, Yang Jin, Zhentao Tan, Yadong Mu

Video action segmentation refers to the task of densely casting each video frame or short segment in an untrimmed video into some pre-specified action categories.

Action Segmentation Graph Matching +1

Diverse Semantic Image Synthesis via Probability Distribution Modeling

1 code implementation CVPR 2021 Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Bin Liu, Gang Hua, Nenghai Yu

In this paper, we propose a novel diverse semantic image synthesis framework from the perspective of semantic class distributions, which naturally supports diverse generation at semantic or even instance level.

Diversity Image-to-Image Translation

Efficient Semantic Image Synthesis via Class-Adaptive Normalization

1 code implementation 8 Dec 2020 Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Gang Hua, Nenghai Yu

Spatially-adaptive normalization (SPADE) has recently been remarkably successful in conditional semantic image synthesis (Park et al., 2019); it modulates the normalized activation with spatially-varying transformations learned from semantic layouts, to prevent the semantic information from being washed away.

Image Generation

MichiGAN: Multi-Input-Conditioned Hair Image Generation for Portrait Editing

1 code implementation 30 Oct 2020 Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Lu Yuan, Sergey Tulyakov, Nenghai Yu

In this paper, we present MichiGAN (Multi-Input-Conditioned Hair Image GAN), a novel conditional image generation method for interactive portrait hair manipulation.

Conditional Image Generation

Rethinking Spatially-Adaptive Normalization

no code implementations 6 Apr 2020 Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Nenghai Yu

Despite its impressive performance, a more thorough understanding of the true advantages inside the box is still highly demanded, to help reduce the significant computation and parameter overheads introduced by these new structures.

Image Generation
