1 code implementation • 21 Jan 2025 • Shiyue Zhang, Zheng Chong, Xi Lu, Wenqing Zhang, Haoxiang Li, Xujie Zhang, Jiehui Huang, Xiao Dong, Xiaodan Liang
Building on the success of diffusion models, significant advancements have been made in multimodal image generation tasks.
no code implementations • 20 Jan 2025 • Zheng Chong, Wenqing Zhang, Shiyue Zhang, Jun Zheng, Xiao Dong, Haoxiang Li, Yiling Wu, Dongmei Jiang, Xiaodan Liang
Comprehensive experiments demonstrate that CatV2TON outperforms existing methods in both image and video try-on tasks, offering a versatile and reliable solution for realistic virtual try-ons across diverse scenarios.
no code implementations • 4 Jan 2025 • Chaoyi Tan, Wenqing Zhang, Zhen Qi, Kowei Shih, Xinshi Li, Ao Xiang
In the field of computer vision, multimodal image generation has become a research hotspot, especially the task of integrating text, image, and style.
no code implementations • 4 Dec 2024 • Wenying Sun, Zhen Xu, Wenqing Zhang, Kunyuan Ma, You Wu, Mengfang Sun
This paper aims to study the prediction of the bank stability index based on the Time Series Transformer model.
no code implementations • 18 Nov 2024 • Han Cao, Zhaoyang Zhang, Xiangtian Li, Chufan Wu, Hansong Zhang, Wenqing Zhang
In the context of knowledge-driven seq-to-seq generation tasks, such as document-based question answering and document summarization systems, two fundamental knowledge sources play crucial roles: the inherent knowledge embedded within model parameters and the external knowledge obtained through context.
no code implementations • 4 Nov 2024 • Zihao Zhao, Yijiang Li, Yuchen Yang, Wenqing Zhang, Nuno Vasconcelos, Yinzhi Cao
Machine unlearning--enabling a trained model to forget specific data--is crucial for addressing biased data and adhering to privacy regulations like the General Data Protection Regulation (GDPR)'s "right to be forgotten".
no code implementations • 13 Sep 2024 • Wenqing Zhang, Junming Huang, Ruotong Wang, Changsong Wei, Wenqian Huang, Yuxin Qiao
Long-short range time series forecasting is essential for predicting future trends and patterns over extended periods.
no code implementations • 5 Sep 2024 • Jingyu Zhang, Wenqing Zhang, Chaoyi Tan, Xiangtian Li, Qianyi Sun
It is very important to detect traffic signs efficiently and accurately in autonomous driving systems.
no code implementations • 23 Aug 2024 • Shuzhen Yang, Wenqing Zhang
In this study, we consider asset pricing under model uncertainty with finite time and under a family of probabilities, and explore its relationship with the risk-neutral probability measure.
1 code implementation • 21 Jul 2024 • Zheng Chong, Xiao Dong, Haoxiang Li, Shiyue Zhang, Wenqing Zhang, Xujie Zhang, Hanqing Zhao, Xiaodan Liang
Virtual try-on methods based on diffusion models achieve realistic try-on effects but often replicate the backbone network as a ReferenceNet or use additional image encoders to process condition inputs, leading to high training and inference costs.
no code implementations • 19 Mar 2024 • Shige Peng, Shuzhen Yang, Wenqing Zhang
The integration and innovation of finance and technology have gradually transformed the financial system into a complex one.
no code implementations • 22 Feb 2024 • Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi
Despite its simplicity, we show that IDA is efficient and converges quickly in resolving social bias in TTI diffusion models.
no code implementations • 14 Sep 2023 • David Junhao Zhang, Heng Wang, Chuhui Xue, Rui Yan, Wenqing Zhang, Song Bai, Mike Zheng Shou
Dataset condensation aims to condense a large dataset with a lot of training samples into a small set.
no code implementations • ICCV 2023 • Xujie Zhang, BinBin Yang, Michael C. Kampffmeyer, Wenqing Zhang, Shiyue Zhang, Guansong Lu, Liang Lin, Hang Xu, Xiaodan Liang
Cross-modal garment synthesis and manipulation will significantly benefit the way fashion designers generate garments and modify their designs via flexible linguistic interfaces. Current approaches follow the general text-to-image paradigm and mine cross-modal relations via simple cross-attention modules, neglecting the structural correspondence between visual and textual representations in the fashion design domain.
no code implementations • 13 Aug 2023 • David Junhao Zhang, Mutian Xu, Chuhui Xue, Wenqing Zhang, Xiaoguang Han, Song Bai, Mike Zheng Shou
Despite the rapid advancement of unsupervised learning in visual representation, it requires training on large-scale datasets that demand costly data collection, and pose additional challenges due to concerns regarding data privacy.
no code implementations • 1 Aug 2023 • Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi
To address this challenge, we propose to harness pre-trained vision-language (VL) foundation models that encode extensive knowledge from image-text pairs to generate captions for multi-view images of 3D scenes.
Ranked #3 on 3D Open-Vocabulary Instance Segmentation on S3DIS
4 code implementations • CVPR 2024 • Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai
In this work, we extend this editing framework to diffusion models and propose a novel approach DragDiffusion.
no code implementations • 19 Jan 2023 • Shuzhen Yang, Wenqing Zhang
In this study, we develop an efficient iterative algorithm for the SVI model based on a fixed-point and least-square optimizer.
no code implementations • 13 Dec 2022 • Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Wenqing Zhang, Song Bai, Jiashi Feng, Mike Zheng Shou
While some prior works have applied such image GANs to unconditional 2D portrait video generation and static 3D portrait synthesis, there are few works successfully extending GANs for generating 3D-aware portrait videos.
1 code implementation • CVPR 2023 • Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi
Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space.
Ranked #2 on 3D Open-Vocabulary Instance Segmentation on S3DIS
no code implementations • 3 Nov 2022 • Xinyue Zhang, Genwang Wei, Ye Sheng, Jiong Yang, Caichao Ye, Wenqing Zhang
By investigating the combinations of polymer units with mobility performance, a scheme for designing polymer OSC materials by combining ML approaches and PUFp information is proposed to not only passively predict OSC mobility but also actively provide structural guidance for new high-mobility OSC material design.
1 code implementation • 14 Oct 2022 • Ruifei He, Shuyang Sun, Xin Yu, Chuhui Xue, Wenqing Zhang, Philip Torr, Song Bai, Xiaojuan Qi
Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images.
2 code implementations • 1 Oct 2022 • Yujun Shi, Jian Liang, Wenqing Zhang, Vincent Y. F. Tan, Song Bai
To remedy this problem caused by data heterogeneity, we propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
no code implementations • 1 Sep 2022 • Zhangzi Zhu, Chuhui Xue, Yu Hao, Wenqing Zhang, Song Bai
Our oCLIP-based model achieves 28.59% in h-mean, which ranks 1st in the end-to-end OOV word recognition track of the OOV Challenge in the ECCV 2022 TiE Workshop.
no code implementations • 4 Aug 2022 • Zhangzi Zhu, Yu Hao, Wenqing Zhang, Chuhui Xue, Song Bai
This report presents our 2nd place solution to the ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST): Cropped Word Recognition.
no code implementations • 6 Jul 2022 • Jiahui Zhang, Fangneng Zhan, Rongliang Wu, Yingchen Yu, Wenqing Zhang, Bai Song, Xiaoqin Zhang, Shijian Lu
With the feature transport plan as the guidance, a novel pose calibration technique is designed which rectifies the initially randomized camera poses by predicting relative pose transformations between the pair of rendered and real images.
no code implementations • CVPR 2022 • Jingqun Tang, Wenqing Zhang, Hongye Liu, Mingkun Yang, Bo Jiang, Guanglong Hu, Xiang Bai
Different from previous approaches that learn robust deep representations of scene text in a holistic manner, our method performs scene text detection based on a few representative features, which avoids the disturbance by background and reduces the computational cost.
Ranked #24 on Object Detection In Aerial Images on DOTA (using extra training data)
no code implementations • 8 Mar 2022 • Chuhui Xue, Wenqing Zhang, Yu Hao, Shijian Lu, Philip Torr, Song Bai
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features, respectively, as well as a visual-textual decoder that models the interaction among textual and visual features for learning effective scene text representations.
2 code implementations • 15 Dec 2021 • Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai
Nevertheless, we observe that a stand-alone instance query suffices for capturing a time sequence of instances in a video, but attention mechanisms should be applied to each frame independently.
Ranked #2 on Video Instance Segmentation on HQ-YTVIS
1 code implementation • 18 Nov 2021 • Xiang Bai, Hanchen Wang, Liya Ma, Yongchao Xu, Jiefeng Gan, Ziwei Fan, Fan Yang, Ke Ma, Jiehua Yang, Song Bai, Chang Shu, Xinyu Zou, Renhao Huang, Changzheng Zhang, Xiaowu Liu, Dandan Tu, Chuou Xu, Wenqing Zhang, Xi Wang, Anguo Chen, Yu Zeng, Dehua Yang, Ming-Wei Wang, Nagaraj Holalkere, Neil J. Halin, Ihab R. Kamel, Jia Wu, Xuehua Peng, Xiang Wang, Jianbo Shao, Pattanasak Mongkolwat, Jianjun Zhang, Weiyang Liu, Michael Roberts, Zhongzhao Teng, Lucian Beer, Lorena Escudero Sanchez, Evis Sala, Daniel Rubin, Adrian Weller, Joan Lasenby, Chuangsheng Zheng, Jianming Wang, Zhen Li, Carola-Bibiane Schönlieb, Tian Xia
Artificial intelligence (AI) provides a promising substitution for streamlining COVID-19 diagnoses.
no code implementations • 29 Sep 2021 • Chuhui Xue, Jiaxing Huang, Wenqing Zhang, Shijian Lu, Song Bai, Changhu Wang
This paper presents Contextual Text Detection, a new setup that detects contextual text blocks for better understanding of texts in scenes.
no code implementations • 18 May 2021 • Chuhui Xue, Jiaxing Huang, Wenqing Zhang, Shijian Lu, Changhu Wang, Song Bai
The first task focuses on image-to-character (I2C) mapping, which detects a set of character candidates from images based on different alignments of visual features in a non-sequential way.
no code implementations • 9 Dec 2020 • Wenqing Zhang, Yang Qiu, Minghui Liao, Rui Zhang, Xiaolin Wei, Xiang Bai
It is a general labeling method for texts with various shapes and requires low labeling costs.
no code implementations • 22 Jul 2020 • Wenqing Zhang, Yang Qiu, Song Bai, Rui Zhang, Xiaolin Wei, Xiang Bai
In this paper, we study how to make use of decentralized datasets for training a robust scene text recognizer while keeping them on local devices.
no code implementations • 6 Oct 2019 • Shiyu Yi, Donglin Zhan, Wenqing Zhang, Denglin Jiang, Kang An, Hao Wang
The training process of Generative Adversarial Networks (GANs), in most cases, applies uniform or Gaussian sampling in the latent space, which probably spends most of the computation on examples that are easy to generate and can already be handled properly.