no code implementations • 13 Dec 2023 • Raghav Goyal, Wan-Cyuan Fan, Mennatullah Siam, Leonid Sigal
In this work, we propose a novel, clip-based DETR-style encoder-decoder architecture, which focuses on systematically analyzing and addressing the aforementioned challenges.
no code implementations • 26 Nov 2022 • Wan-Cyuan Fan, Cheng-Fu Yang, Chiao-An Yang, Yu-Chiang Frank Wang
We tackle the problem of target-free text-guided image manipulation, which requires one to modify the input reference image based on the given text instruction, while no ground truth target image is observed during training.
no code implementations • 25 Sep 2022 • Cheng-Fu Yang, Yao-Hung Hubert Tsai, Wan-Cyuan Fan, Ruslan Salakhutdinov, Louis-Philippe Morency, Yu-Chiang Frank Wang
Since no ground truth captions are available for novel object images during training, our P2C leverages cross-modality (image-text) association modules to ensure the above caption characteristics can be properly preserved.
1 code implementation • 29 Aug 2022 • Wan-Cyuan Fan, Yen-Chun Chen, Dongdong Chen, Yu Cheng, Lu Yuan, Yu-Chiang Frank Wang
Diffusion models (DMs) have shown great potential for high-quality image synthesis.
no code implementations • CVPR 2022 • Chiao-An Yang, Cheng-Yo Tan, Wan-Cyuan Fan, Cheng-Fu Yang, Meng-Lin Wu, Yu-Chiang Frank Wang
In particular, we propose a novel Scene Graph Transformer (SGT) network, which is designed to take node and edge features as inputs for modeling the associated structural information.
no code implementations • 29 Sep 2021 • Cheng-Fu Yang, Yao-Hung Hubert Tsai, Wan-Cyuan Fan, Yu-Chiang Frank Wang, Louis-Philippe Morency, Ruslan Salakhutdinov
Novel object captioning (NOC) learns image captioning models for describing objects or visual concepts which are unseen (i.e., novel) in the training captions.
1 code implementation • CVPR 2021 • Cheng-Fu Yang, Wan-Cyuan Fan, Fu-En Yang, Yu-Chiang Frank Wang
To better exploit the text input, so that implicit objects or relationships can be properly inferred during layout generation, we propose a LayoutTransformer Network (LT-Net) in this paper.
no code implementations • 1 Jan 2021 • Cheng-Fu Yang, Wan-Cyuan Fan, Fu-En Yang, Yu-Chiang Frank Wang
In the areas of machine learning and computer vision, text-to-image synthesis aims at producing image outputs given the input text.
no code implementations • 1 Jan 2021 • Yen-Chi Hsu, Cheng-Yao Hong, Wan-Cyuan Fan, Ding-Jie Chen, Ming-Sui Lee, Davi Geiger, Tyng-Luh Liu
The Fine-Grained Visual Classification (FGVC) problem is notably characterized by two intriguing properties, significant inter-class similarity and intra-class variations, which make learning an effective FGVC classifier a challenging task.
no code implementations • 28 Oct 2019 • Yen-Chi Hsu, Cheng-Yao Hong, Wan-Cyuan Fan, Ming-Sui Lee, Davi Geiger, Tyng-Luh Liu
With the development of deep learning, good results have been achieved on standard classification problems.
Fine-Grained Image Classification • Fine-Grained Visual Recognition