1 code implementation • 12 Sep 2024 • Yifu Chen, Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Zhineng Chen, Tao Mei
In this paper, we propose to decompose the typical single-stage object inpainting into two cascaded processes: 1) semantic pre-inpainting that infers the semantic features of desired objects in a multi-modal feature space; 2) high-fidelity object generation in diffusion latent space that pivots on such inpainted semantic features.
no code implementations • 11 Sep 2024 • Yang Luo, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Zhineng Chen, Yu-Gang Jiang, Tao Mei
Technically, FreeEnhance is a two-stage process that first adds random noise to the input image and then capitalizes on a pre-trained image diffusion model (i.e., Latent Diffusion Models) to denoise and enhance the image details.
1 code implementation • 11 Sep 2024 • Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Chong-Wah Ngo, Tao Mei
Despite tremendous progress in image-to-3D generation, existing methods still struggle to produce multi-view consistent images with detailed high-resolution textures, especially in the paradigm of 2D diffusion that lacks 3D awareness.
no code implementations • 11 Sep 2024 • Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Tao Mei
In the fine stage, DreamMesh jointly manipulates the mesh and refines the texture map, leading to high-quality triangle meshes with high-fidelity textured materials.
no code implementations • 11 Sep 2024 • Feiyang Jia, Zhineng Chen, Ziying Song, Lin Liu, Caiyan Jia
Super-resolution (SR) aims to enhance the quality of low-resolution images and has been widely applied in medical imaging.
1 code implementation • 11 Aug 2024 • Shuai Zhao, Yongkun Du, Zhineng Chen, Yu-Gang Jiang
Extensive experiments across various STR decoders and language recognition tasks underscore the broad applicability and remarkable performance of DPTR, providing a novel insight for STR pre-training.
1 code implementation • 4 Aug 2024 • Xin Wang, Kai Chen, Xingjun Ma, Zhineng Chen, Jingjing Chen, Yu-Gang Jiang
During this process, the queries made to the target model are intermediate adversarial examples crafted at the previous attack step, which share high similarities in the pixel space.
1 code implementation • 17 Jul 2024 • Yongkun Du, Zhineng Chen, Caiyan Jia, Xieping Gao, Yu-Gang Jiang
In this paper, we term this task Out of Length (OOL) text recognition.
no code implementations • 17 Jun 2024 • Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, Boning Wang, Jiaqi Huang, Zunnan Xu, Xiu Li, Kehong Yuan, Yanyan Zu, Jiayao Ha, Qiong Gao, Licheng Jiao
2) Open Vocabulary Object Detection: This track goes a step further, requiring algorithms to detect objects from an open set of categories, including unknown objects.
1 code implementation • CVPR 2024 • Yang Luo, Zhineng Chen, Peng Zhou, Zuxuan Wu, Xieping Gao, Yu-Gang Jiang
The results demonstrate that LTRP outperforms both supervised and other self-supervised methods due to the fair assessment of image content.
1 code implementation • 31 Jan 2024 • Yongkun Du, Zhineng Chen, Yuchen Su, Caiyan Jia, Yu-Gang Jiang
We propose a novel instruction-guided scene text recognition (IGTR) paradigm that formulates STR as an instruction learning problem and understands text images by predicting character attributes, e.g., character frequency and position.
1 code implementation • 9 Nov 2023 • Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Tao Mei
In this work, we propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D Diffusion models.
1 code implementation • 23 Jul 2023 • Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Chenxia Li, Yuning Du, Yu-Gang Jiang
We first present an empirical study of AR decoding in STR, and discover that the AR decoder not only models linguistic context, but also provides guidance on visual context perception.
Ranked #1 on Scene Text Recognition on CUTE80 (using extra training data)
1 code implementation • 27 Jun 2023 • Yuchen Su, Zhineng Chen, Zhiwen Shao, Yuning Du, Zhilong Ji, Jinfeng Bai, Yong Zhou, Yu-Gang Jiang
Next, we propose a dual assignment scheme for speed acceleration.
1 code implementation • ICCV 2023 • Tianlun Zheng, Zhineng Chen, Bingchen Huang, Wei Zhang, Yu-Gang Jiang
In this paper, we propose the Incremental MLTR (IMLTR) task in the context of incremental learning (IL), where different languages are introduced in batches.
Ranked #1 on Incremental Learning on MLT17
1 code implementation • 9 May 2023 • Tianlun Zheng, Zhineng Chen, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang
In this work, we introduce TPS++, an attention-enhanced TPS transformation that incorporates the attention mechanism to text rectification for the first time.
Ranked #1 on Scene Text Recognition on SVT-P
no code implementations • CVPR 2023 • Kexin Sun, Zhineng Chen, Gongwei Wang, Jun Liu, Xiongjun Ye, Yu-Gang Jiang
In order to eliminate the square effect, we design a bi-directional feature fusion generative adversarial network (BFF-GAN) with a global branch and a local branch.
1 code implementation • 29 Dec 2022 • Bingchen Huang, Zhineng Chen, Peng Zhou, Jiayin Chen, Zuxuan Wu
The dynamic expansion architecture is becoming popular in class incremental learning, mainly due to its advantages in alleviating catastrophic forgetting.
no code implementations • CVPR 2023 • Hui Zhang, Zuxuan Wu, Zheng Wang, Zhineng Chen, Yu-Gang Jiang
Anomaly detection and localization are widely used in industrial manufacturing for their efficiency and effectiveness.
Ranked #4 on Supervised Anomaly Detection on MVTec AD (using extra training data)
3 code implementations • 30 Apr 2022 • Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Tianlun Zheng, Chenxia Li, Yuning Du, Yu-Gang Jiang
Dominant scene text recognition models commonly contain two building blocks, a visual model for feature extraction and a sequence model for text transcription.
Ranked #16 on Scene Text Recognition on ICDAR2013
no code implementations • 31 Mar 2022 • Yang Luo, Zhineng Chen, Shengtian Zhou, Xieping Gao
In this paper, we introduce MAE and verify the effect of visible patches for histopathological image understanding.
2 code implementations • 22 Nov 2021 • Tianlun Zheng, Zhineng Chen, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang
In this paper, we propose a novel module called Multi-Domain Character Distance Perception (MDCDP) to establish a visually and semantically related position embedding.
Ranked #12 on Scene Text Recognition on ICDAR2015
no code implementations • 23 Aug 2019 • Yanhao Zhu, Zhineng Chen, Shuai Zhao, Hongtao Xie, Wenming Guo, Yongdong Zhang
Nowadays, U-net-like FCNs dominate various biomedical image segmentation applications and attain promising performance, largely due to their elegant architectures, e.g., symmetric contracting and expansive paths as well as lateral skip-connections.
3 code implementations • 4 Jun 2018 • Fenfen Sheng, Zhineng Chen, Bo Xu
Considering that scene images vary widely in text and background, we further design a modality-transform block to effectively transform 2D input images to 1D sequences, combined with the encoder to extract more discriminative features.
no code implementations • CVPR 2017 • Wei Zhang, Xiaochun Cao, Rui Wang, Yuanfang Guo, Zhineng Chen
Second, we further extend bMS to a more general form, namely contrastive binary mean shift (cbMS), which maximizes the contrastive density in binary space, for finding informative patterns that are both frequent and discriminative for the dataset.