no code implementations • 19 Aug 2024 • Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, Ryo Hachiuma
Large-scale vision-language models, such as CLIP, are known to contain harmful societal bias regarding protected attributes (e.g., gender and age).
no code implementations • 28 Jun 2024 • Ding-Jiun Huang, Zi-Ting Chou, Yu-Chiang Frank Wang, Cheng Sun
While NeRF-based methods can perform novel view synthesis (NVS) at arbitrary scale, high-resolution novel view synthesis (HRNVS) with low-resolution (LR) optimization often produces oversmoothed results.
1 code implementation • 27 Jun 2024 • Jr-Jen Chen, Yu-Chien Liao, Hsi-Che Lin, Yu-Chu Yu, Yen-Chun Chen, Yu-Chiang Frank Wang
This form of reasoning, requiring advanced understanding of cause-and-effect relationships across video segments, poses significant challenges to even the frontier multimodal large language models.
no code implementations • 27 Jun 2024 • Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-Yi Lee
Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities from large language models (LLMs).
no code implementations • 18 Jun 2024 • Ci-Siang Lin, I-Jieh Liu, Min-Hung Chen, Chien-Yi Wang, Sifei Liu, Yu-Chiang Frank Wang
With the proposed TAP-CL, our GroPrompt framework can generate temporally consistent yet text-aware position prompts describing locations and movements for the referred object from the video.
no code implementations • 25 May 2024 • Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, Shao-Hua Sun
Inspired by the recent dominance of diffusion models in generative modeling, this work proposes Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL, aiming to yield more precise and smoother rewards for policy learning.
no code implementations • 25 Mar 2024 • Tung-Yu Wu, Sheng-Yu Huang, Yu-Chiang Frank Wang
Vigor leverages an LLM to produce a desirable referential order from the input description for 3D visual grounding.
no code implementations • 14 Mar 2024 • Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang
Large-scale vision-language models (VLMs) have shown a strong zero-shot generalization capability on unseen-domain data.
no code implementations • CVPR 2024 • Zi-Ting Chou, Sheng-Yu Huang, I-Jieh Liu, Yu-Chiang Frank Wang
Utilizing multi-view inputs to synthesize novel-view images, Neural Radiance Fields (NeRF) have emerged as a popular research topic in 3D vision.
1 code implementation • 26 Feb 2024 • Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao, Yu-Chiang Frank Wang
To improve the robustness of the encoder for SE, a novel self-distillation mechanism combined with adversarial training is introduced.
4 code implementations • 14 Feb 2024 • Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen
By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead.
Ranked #2 on parameter-efficient fine-tuning on WinoGrande (using extra training data)
no code implementations • 22 Jan 2024 • Ci-Siang Lin, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen
In this way, SemPLeS can perform better semantic alignment between object regions and the associated class labels, resulting in desired pseudo masks for training the segmentation model.
1 code implementation • 12 Dec 2023 • I-Jieh Liu, Ci-Siang Lin, Fu-En Yang, Yu-Chiang Frank Wang
Nevertheless, it is still challenging for FL to deal with user heterogeneity in their local data distribution in the real-world FL scenario, and this issue becomes even more severe in multi-label image classification.
no code implementations • 5 Dec 2023 • Bin-Shih Wu, Hong-En Chen, Sheng-Yu Huang, Yu-Chiang Frank Wang
With only 3D shape data and their rendered 2D images observed during training, our TPA3D is designed to retrieve detailed visual descriptions for synthesizing the corresponding 3D mesh data.
no code implementations • CVPR 2024 • Cheng Sun, Wei-En Tai, Yu-Lin Shih, Kuan-Wei Chen, Yong-Jing Syu, Kent Selwyn The, Yu-Chiang Frank Wang, Hwann-Tzong Chen
State-of-the-art single-view 360-degree room layout reconstruction methods formulate the problem as a high-level 1D (per-column) regression task.
no code implementations • 29 Nov 2023 • Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang
Concept erasure in text-to-image diffusion models aims to disable pre-trained diffusion models from generating images related to a target concept.
1 code implementation • 18 Oct 2023 • Cheng-Fu Yang, Yen-Chun Chen, Jianwei Yang, Xiyang Dai, Lu Yuan, Yu-Chiang Frank Wang, Kai-Wei Chang
Additional analysis shows that the contrastive objective and meta-actions are complementary in achieving the best results, and the resulting agent better aligns its states with corresponding instructions, making it more suitable for real-world embodied agents.
no code implementations • 9 Sep 2023 • Ci-Siang Lin, Min-Hung Chen, Yu-Chiang Frank Wang
Data collected from the real world typically exhibit long-tailed distributions, where frequent classes contain abundant data while rare ones have only a limited number of samples.
no code implementations • ICCV 2023 • Fu-En Yang, Chien-Yi Wang, Yu-Chiang Frank Wang
To leverage robust representations from large-scale models while enabling efficient model personalization for heterogeneous clients, we propose a novel personalized FL framework of client-specific Prompt Generation (pFedPG), which learns to deploy a personalized prompt generator at the server for producing client-specific visual prompts that efficiently adapt frozen backbones to local data distributions.
1 code implementation • 19 Jul 2023 • Chia-Hsiang Kao, Yu-Chiang Frank Wang
In this paper, we propose FedBug (Federated Learning with Bottom-Up Gradual Unfreezing), a novel FL framework designed to effectively mitigate client drift.
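The excerpt names the idea (bottom-up gradual unfreezing) but not the schedule, so the following is only a minimal sketch under assumptions: layers are indexed from 0 (bottom, closest to the input), and one more layer becomes trainable at evenly spaced points during the first `unfreeze_steps` local steps. The function name `trainable_layers` and both parameters are hypothetical and are not taken from the FedBug paper or its code.

```python
def trainable_layers(num_layers, step, unfreeze_steps):
    """Return the indices of layers that are trainable at a local training step.

    Assumption (not from the source): layer 0 is the bottom layer, and layers
    are unfrozen one by one, bottom-up, at evenly spaced steps. From
    `unfreeze_steps` onward every layer is trainable.
    """
    if step >= unfreeze_steps:
        return list(range(num_layers))
    # The bottom layer is trainable from step 0; higher layers join later.
    unfrozen = 1 + (step * num_layers) // unfreeze_steps
    return list(range(min(unfrozen, num_layers)))


if __name__ == "__main__":
    # Print the hypothetical schedule for a 4-layer model.
    for step in range(10):
        print(step, trainable_layers(num_layers=4, step=step, unfreeze_steps=8))
```

In a real client update, the returned indices would decide which parameter groups receive gradient updates at that step; all other layers stay frozen at the values received from the server.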
1 code implementation • 30 Jun 2023 • Hsi-Che Lin, Chien-Yi Wang, Min-Hung Chen, Szu-Wei Fu, Yu-Chiang Frank Wang
This technical report describes our QuAVF@NTU-NVIDIA submission to the Ego4D Talking to Me (TTM) Challenge 2023.
no code implementations • 19 Feb 2023 • Yuan-Chia Cheng, Zu-Yun Shiau, Fu-En Yang, Yu-Chiang Frank Wang
In this paper, we present a learning framework of Tendency-and-Assignment Explainer (TAX), designed to offer interpretability at the annotator and assignment levels.
no code implementations • CVPR 2023 • Yuan-Yi Xu, Ci-Siang Lin, Yu-Chiang Frank Wang
Learning models trained on biased datasets tend to observe correlations between categorical and undesirable features, which result in degraded performance.
no code implementations • 26 Nov 2022 • Wan-Cyuan Fan, Cheng-Fu Yang, Chiao-An Yang, Yu-Chiang Frank Wang
We tackle the problem of target-free text-guided image manipulation, which requires one to modify the input reference image based on the given text instruction, while no ground truth target image is observed during training.
no code implementations • 25 Sep 2022 • Cheng-Fu Yang, Yao-Hung Hubert Tsai, Wan-Cyuan Fan, Ruslan Salakhutdinov, Louis-Philippe Morency, Yu-Chiang Frank Wang
Since no ground truth captions are available for novel object images during training, our P2C leverages cross-modality (image-text) association modules to ensure the above caption characteristics can be properly preserved.
1 code implementation • 30 Aug 2022 • Cheng-Yen Hsieh, Chih-Jung Chang, Fu-En Yang, Yu-Chiang Frank Wang
In particular, we present a cross-scale patch-level correlation learning in SS-PRL, which allows the model to aggregate and associate information learned across patch scales.
1 code implementation • 29 Aug 2022 • Wan-Cyuan Fan, Yen-Chun Chen, Dongdong Chen, Yu Cheng, Lu Yuan, Yu-Chiang Frank Wang
Diffusion models (DMs) have shown great potential for high-quality image synthesis.
no code implementations • 16 Aug 2022 • Zih-Ching Chen, Lin-Hsi Tsao, Chin-Lun Fu, Shang-Fu Chen, Yu-Chiang Frank Wang
Face anti-spoofing (FAS) aims at distinguishing face spoof attacks from the authentic ones, which is typically approached by learning proper models for performing the associated classification task.
no code implementations • CVPR 2022 • Chiao-An Yang, Cheng-Yo Tan, Wan-Cyuan Fan, Cheng-Fu Yang, Meng-Lin Wu, Yu-Chiang Frank Wang
In particular, we propose a novel network of Scene Graph Transformer (SGT), which is designed to take node and edge features as inputs for modeling the associated structural information.
1 code implementation • CVPR 2022 • Zhi-Hao Lin, Wei-Chiu Ma, Hao-Yu Hsu, Yu-Chiang Frank Wang, Shenlong Wang
We present Neural Mixtures of Planar Experts (NeurMiPs), a novel planar-based scene representation for modeling geometry and appearance.
no code implementations • 23 Mar 2022 • Shang-Fu Chen, Yu-Min Liu, Chia-Ching Lin, Trista Pei-Chun Chen, Yu-Chiang Frank Wang
By observing normal and abnormal surface data across multiple source domains, our model is expected to be generalized to an unseen textured surface of interest, in which only a small number of normal data can be observed during testing.
no code implementations • 27 Dec 2021 • Yuan-Chia Cheng, Ci-Siang Lin, Fu-En Yang, Yu-Chiang Frank Wang
Few-shot classification aims to carry out classification given only a few labeled examples for the categories of interest.
no code implementations • 27 Dec 2021 • Zu-Yun Shiau, Wei-Wei Lin, Ci-Siang Lin, Yu-Chiang Frank Wang
How to handle domain shifts when recognizing or segmenting visual data across domains has been studied by learning and vision communities.
1 code implementation • NeurIPS 2021 • Fu-En Yang, Yuan-Chia Cheng, Zu-Yun Shiau, Yu-Chiang Frank Wang
Domain generalization (DG) aims to transfer the learning task from a single or multiple source domains to unseen target domains.
no code implementations • 2 Nov 2021 • Yuan-Hao Lee, Fu-En Yang, Yu-Chiang Frank Wang
Few-shot semantic segmentation addresses the learning task in which only a few images with ground truth pixel-level labels are available for the novel classes of interest.
no code implementations • 29 Sep 2021 • Cheng-Fu Yang, Yao-Hung Hubert Tsai, Wan-Cyuan Fan, Yu-Chiang Frank Wang, Louis-Philippe Morency, Ruslan Salakhutdinov
Novel object captioning (NOC) learns image captioning models for describing objects or visual concepts which are unseen (i.e., novel) in the training captions.
1 code implementation • CVPR 2021 • Cheng-Fu Yang, Wan-Cyuan Fan, Fu-En Yang, Yu-Chiang Frank Wang
To better exploit the text input, so that implicit objects or relationships can be properly inferred during layout generation, we propose a LayoutTransformer Network (LT-Net) in this paper.
no code implementations • 3 May 2021 • Yan-Bo Lin, Yu-Chiang Frank Wang
Humans perceive rich auditory experiences through the distinct sounds heard by their ears.
no code implementations • 26 Feb 2021 • Fu-En Yang, Jing-Cheng Chang, Yuan-Hao Lee, Yu-Chiang Frank Wang
Generating videos with content and motion variations is a challenging task in computer vision.
no code implementations • 1 Jan 2021 • Cheng-Fu Yang, Wan-Cyuan Fan, Fu-En Yang, Yu-Chiang Frank Wang
In the areas of machine learning and computer vision, text-to-image synthesis aims at producing image outputs given the input text.
no code implementations • 2 Nov 2020 • Shang-Fu Chen, Jia-Wei Yan, Ya-Fan Su, Yu-Chiang Frank Wang
Representation disentanglement aims at learning interpretable features, so that the output can be recovered or manipulated accordingly.
no code implementations • 21 Oct 2020 • Jia-Wei Yan, Ci-Siang Lin, Fu-En Yang, Yu-Jhe Li, Yu-Chiang Frank Wang
Learning interpretable and interpolatable latent representations has been an emerging research direction, allowing researchers to understand and utilize the derived latent space for further applications such as visual synthesis or recognition.
no code implementations • 19 Oct 2020 • Ci-Siang Lin, Yuan-Chia Cheng, Yu-Chiang Frank Wang
That is, while a number of labeled source-domain datasets are available, we do not have access to any target-domain training data.
Domain Generalization, Generalizable Person Re-identification
no code implementations • 2 Oct 2020 • Chih-Ting Liu, Yu-Jhe Li, Shao-Yi Chien, Yu-Chiang Frank Wang
As a result, our approach is able to augment the labeled training data in the semi-supervised setting.
no code implementations • ECCV 2020 • Yun-Chun Chen, Chao-Te Chou, Yu-Chiang Frank Wang
To address semi-supervised learning from both labeled and unlabeled data, we present a novel meta-learning scheme.
no code implementations • 17 Jul 2020 • Hao-Hsiang Yang, Chao-Han Huck Yang, Yu-Chiang Frank Wang
Wavelet transform and the inverse wavelet transform are substituted for down-sampling and up-sampling so feature maps from the wavelet transform and convolutions contain different frequencies and scales.
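To illustrate why a wavelet transform can stand in for down-sampling and its inverse for up-sampling, here is a minimal pure-Python sketch. It assumes a one-level orthonormal Haar wavelet, which the excerpt does not specify, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical. Each 2x2 block of the input is turned into one low-frequency value and three detail values, so the four outputs are half-resolution maps carrying different frequency content, and the inverse restores the original exactly.

```python
def haar_dwt2(x):
    """One-level 2D Haar transform of a grid with even height and width.

    Returns four half-resolution sub-bands:
    (approx, col_diff, row_diff, diag).
    """
    h, w = len(x) // 2, len(x[0]) // 2
    approx, col_diff, row_diff, diag = (
        [[0.0] * w for _ in range(h)] for _ in range(4)
    )
    for i in range(h):
        for j in range(w):
            a, b = x[2 * i][2 * j], x[2 * i][2 * j + 1]
            c, d = x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1]
            approx[i][j] = (a + b + c + d) / 2    # low-frequency content
            col_diff[i][j] = (a - b + c - d) / 2  # change across columns
            row_diff[i][j] = (a + b - c - d) / 2  # change across rows
            diag[i][j] = (a - b - c + d) / 2      # diagonal detail
    return approx, col_diff, row_diff, diag


def haar_idwt2(approx, col_diff, row_diff, diag):
    """Inverse of haar_dwt2: rebuild the full-resolution grid."""
    h, w = len(approx), len(approx[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, p = approx[i][j], col_diff[i][j]
            q, r = row_diff[i][j], diag[i][j]
            x[2 * i][2 * j] = (s + p + q + r) / 2
            x[2 * i][2 * j + 1] = (s - p + q - r) / 2
            x[2 * i + 1][2 * j] = (s + p - q - r) / 2
            x[2 * i + 1][2 * j + 1] = (s - p - q + r) / 2
    return x


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    bands = haar_dwt2(image)        # four 2x2 maps instead of one 4x4 map
    restored = haar_idwt2(*bands)   # lossless "up-sampling"
    print(bands[0])
    print(restored)
```

Unlike strided pooling, nothing is discarded here: the detail bands keep the high-frequency information that the low-frequency band drops, which is what allows the inverse transform to act as an exact up-sampling step.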
no code implementations • 2 Jun 2020 • Yen-Ting Liu, Yu-Jhe Li, Yu-Chiang Frank Wang
Video summarization is among the challenging tasks in computer vision, which aims at identifying highlight frames or shots over a lengthy video input.
no code implementations • 19 Feb 2020 • Yu-Jhe Li, Yun-Chun Chen, Yen-Yu Lin, Yu-Chiang Frank Wang
Person re-identification (re-ID) aims at matching images of the same person across camera views.
no code implementations • ICCV 2019 • Yu-Jhe Li, Ci-Siang Lin, Yan-Bo Lin, Yu-Chiang Frank Wang
Person re-identification (re-ID) aims at recognizing the same person from images taken across different cameras.
Ranked #17 on Unsupervised Domain Adaptation on Market to Duke
no code implementations • ICCV 2019 • Yu-Jhe Li, Yun-Chun Chen, Yen-Yu Lin, Xiaofei Du, Yu-Chiang Frank Wang
Person re-identification (re-ID) aims at matching images of the same identity across camera views.
1 code implementation • 5 Aug 2019 • Chih-Ting Liu, Chih-Wei Wu, Yu-Chiang Frank Wang, Shao-Yi Chien
Video-based person re-identification (Re-ID) aims at matching video sequences of pedestrians across non-overlapping cameras.
Ranked #11 on Person Re-Identification on MARS
no code implementations • 25 Jul 2019 • Yun-Chun Chen, Yu-Jhe Li, Xiaofei Du, Yu-Chiang Frank Wang
Moreover, the extension of our model for semi-supervised re-ID further confirms the scalability of our proposed method for real-world scenarios and applications.
13 code implementations • ICLR 2019 • Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang
Few-shot classification aims to learn a classifier that recognizes classes unseen during training, given only limited labeled examples.
2 code implementations • 20 Feb 2019 • Yan-Bo Lin, Yu-Jhe Li, Yu-Chiang Frank Wang
Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level).
no code implementations • 29 Nov 2018 • Yi-Lun Liao, Yao-Cheng Yang, Yu-Chiang Frank Wang
Aiming at inferring 3D shapes from 2D images, 3D shape reconstruction has drawn huge attention from researchers in computer vision and deep learning communities.
3D Reconstruction, 3D Shape Reconstruction From A Single 2D Image
1 code implementation • NeurIPS 2018 • Alexander H. Liu, Yen-Cheng Liu, Yu-Ying Yeh, Yu-Chiang Frank Wang
We present a novel and unified deep learning framework which is capable of learning domain-invariant representation from data across multiple domains.
no code implementations • ECCV 2018 • Hsuan-I Ho, Wei-Chen Chiu, Yu-Chiang Frank Wang
Video highlighting or summarization is among the interesting topics in computer vision, which benefits a variety of applications like viewing, searching, or storage.
no code implementations • ECCV 2018 • Hong-Min Chu, Chih-Kuan Yeh, Yu-Chiang Frank Wang
In order to train learning models for multi-label classification (MLC), it is typically desirable to have a large amount of fully annotated multi-label data.
4 code implementations • 5 May 2018 • Yu-Jhe Li, Hsin-Yu Chang, Yu-Jing Lin, Po-Wei Wu, Yu-Chiang Frank Wang
Deep reinforcement learning has shown its success in game playing.
no code implementations • 25 Apr 2018 • Yu-Jhe Li, Fu-En Yang, Yen-Cheng Liu, Yu-Ying Yeh, Xiaofei Du, Yu-Chiang Frank Wang
Person re-identification (Re-ID) aims at recognizing the same person from images taken across different cameras.
Ranked #20 on Unsupervised Domain Adaptation on Duke to Market
1 code implementation • CVPR 2018 • Chung-Wei Lee, Wei Fang, Chih-Kuan Yeh, Yu-Chiang Frank Wang
In this paper, we propose a novel deep learning architecture for multi-label zero-shot learning (ML-ZSL), which is able to predict multiple unseen class labels for each input instance.
1 code implementation • 18 Jul 2017 • Shang-Fu Chen, Yi-Chen Chen, Chih-Kuan Yeh, Yu-Chiang Frank Wang
In this paper, we propose the joint learning of attention and recurrent neural network (RNN) models for multi-label classification.
1 code implementation • 3 Jul 2017 • Chih-Kuan Yeh, Wei-Chieh Wu, Wei-Jen Ko, Yu-Chiang Frank Wang
Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance.
no code implementations • 7 Jun 2017 • Chih-Kuan Yeh, Yao-Hung Hubert Tsai, Yu-Chiang Frank Wang
In other words, our GDVM casts the supervised learning task as a generative learning process, with data discrimination to be jointly exploited for improved classification.
no code implementations • CVPR 2018 • Yen-Cheng Liu, Yu-Ying Yeh, Tzu-Chien Fu, Sheng-De Wang, Wei-Chen Chiu, Yu-Chiang Frank Wang
While representation learning aims to derive interpretable features for describing visual data, representation disentanglement further results in such features so that particular image attributes can be identified and manipulated.
9 code implementations • ICCV 2017 • Yi-Hsin Chen, Wei-Yu Chen, Yu-Ting Chen, Bo-Cheng Tsai, Yu-Chiang Frank Wang, Min Sun
Despite the recent success of deep-learning-based semantic segmentation, deploying a pre-trained road scene segmenter to a city whose images are not present in the training set would not achieve satisfactory performance due to dataset biases.
no code implementations • CVPR 2016 • Yao-Hung Hubert Tsai, Yi-Ren Yeh, Yu-Chiang Frank Wang
With the goal of deriving a domain-invariant feature subspace for HDA, our CDLS is able to identify representative cross-domain data, including the unlabeled ones in the target domain, for performing adaptation.
no code implementations • ICCV 2015 • Tzu Ming Harry Hsu, Wei Yu Chen, Cheng-An Hou, Yao-Hung Hubert Tsai, Yi-Ren Yeh, Yu-Chiang Frank Wang
For standard unsupervised domain adaptation, one typically obtains labeled data in the source domain and only observes unlabeled data in the target domain.
no code implementations • CVPR 2015 • Jen-Hao Rick Chang, Yu-Chiang Frank Wang
In this paper, we propose the propagation filter as a novel image filtering operator, with the goal of smoothing over neighboring image pixels while preserving image context like edges or textural regions.