1 code implementation • 21 Apr 2024 • Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang
This paper explores in-context learning for image copy detection (ICD), i.e., prompting an ICD model to identify replicated images with new tampering patterns without the need for additional training.
1 code implementation • 31 Mar 2024 • Qiankun Liu, Yuqi Jiang, Zhentao Tan, Dongdong Chen, Ying Fu, Qi Chu, Gang Hua, Nenghai Yu
The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer.
1 code implementation • 4 Mar 2024 • Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Jieping Ye, Nenghai Yu
Inspired by Mamba, a recent basic model with linear complexity for long-distance modeling, we explore in this paper the potential of this state space model for the ISTD task in terms of both effectiveness and efficiency.
no code implementations • 4 Feb 2024 • Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Le Lu, Jieping Ye, Nenghai Yu
This bidirectional interaction narrows the modality imbalance, facilitating more effective learning of integrated audio-visual representations.
no code implementations • 3 Feb 2024 • Tianxiang Chen, Zhentao Tan, Qi Chu, Yue Wu, Bin Liu, Nenghai Yu
We abstract this process as the directional movement of feature map pixels to target areas through convolution, pooling and interactions with surrounding pixels, which is analogous to the movement of thermal particles constrained by surrounding variables and particles.
no code implementations • 13 Dec 2023 • Feifei Wang, Zhentao Tan, Tianyi Wei, Yue Wu, Qidong Huang
Despite the success of diffusion-based customization methods on visual content creation, increasing concerns have been raised about such techniques from both privacy and political perspectives.
no code implementations • 5 Dec 2023 • Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu
Thanks to this design, the model is capable of handling in-context vision understanding tasks with multimodal output in a unified pipeline. Experimental results demonstrate that our model achieves competitive performance compared with specialized models and previous ICL baselines.
no code implementations • 15 Jun 2023 • Zhentao Tan, Yue Wu, Qiankun Liu, Qi Chu, Le Lu, Jieping Ye, Nenghai Yu
Inspired by the various successful applications of large-scale pre-trained models (e.g., CLIP), in this paper we explore their potential benefits for this task from two aspects, spatial feature representation learning and semantic information embedding: 1) for spatial feature representation learning, we design a Spatially-Adaptive Residual (SAR) Encoder to extract degraded areas adaptively.
1 code implementation • 8 Jun 2023 • Qinhong Yang, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Lu Yuan, Gang Hua, Nenghai Yu
This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50,000 high-quality images with rich texture details and semantic diversity.
1 code implementation • 18 May 2023 • Changtao Miao, Qi Chu, Zhentao Tan, Zhenchao Jin, Wanyi Zhuang, Yue Wu, Bin Liu, Honggang Hu, Nenghai Yu
Next, a novel Multi-Spectral Class Center Network (MSCCNet) is proposed for face manipulation detection and localization.
no code implementations • ICCV 2023 • Borui Jiang, Yang Jin, Zhentao Tan, Yadong Mu
Video action segmentation refers to the task of densely casting each video frame or short segment in an untrimmed video into some pre-specified action categories.
no code implementations • 23 Oct 2022 • Wanyi Zhuang, Qi Chu, Zhentao Tan, Qiankun Liu, Haojie Yuan, Changtao Miao, Zixiang Luo, Nenghai Yu
UPCL is designed for learning the consistency-related representation with progressively optimized pseudo annotations.
1 code implementation • CVPR 2022 • Qiankun Liu, Zhentao Tan, Dongdong Chen, Qi Chu, Xiyang Dai, Yinpeng Chen, Mengchen Liu, Lu Yuan, Nenghai Yu
The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer.
Ranked #6 on Seeing Beyond the Visible on KITTI360-EX
1 code implementation • CVPR 2022 • Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Zhentao Tan, Lu Yuan, Weiming Zhang, Nenghai Yu
Hair editing is an interesting and challenging problem in computer vision and graphics.
1 code implementation • CVPR 2021 • Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Bin Liu, Gang Hua, Nenghai Yu
In this paper, we propose a novel diverse semantic image synthesis framework from the perspective of semantic class distributions, which naturally supports diverse generation at semantic or even instance level.
Ranked #1 on Image-to-Image Translation on ADE20K Labels-to-Photos (LPIPS metric)
1 code implementation • 8 Dec 2020 • Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Gang Hua, Nenghai Yu
Spatially-adaptive normalization (SPADE) has recently been remarkably successful in conditional semantic image synthesis (Park et al., 2019); it modulates the normalized activation with spatially-varying transformations learned from semantic layouts to prevent the semantic information from being washed away.
1 code implementation • 30 Oct 2020 • Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Lu Yuan, Sergey Tulyakov, Nenghai Yu
In this paper, we present MichiGAN (Multi-Input-Conditioned Hair Image GAN), a novel conditional image generation method for interactive portrait hair manipulation.
no code implementations • 6 Apr 2020 • Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Nenghai Yu
Despite its impressive performance, a more thorough understanding of the true advantages inside the box is still much needed to help reduce the significant computation and parameter overheads introduced by these new structures.