Search Results for author: Yu-Chiang Frank Wang

Found 70 papers, 21 papers with code

SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP

no code implementations · 19 Aug 2024 · Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, Ryo Hachiuma

Large-scale vision-language models, such as CLIP, are known to contain harmful societal bias regarding protected attributes (e.g., gender and age).

Attribute

ASSR-NeRF: Arbitrary-Scale Super-Resolution on Voxel Grid for High-Quality Radiance Fields Reconstruction

no code implementations · 28 Jun 2024 · Ding-Jiun Huang, Zi-Ting Chou, Yu-Chiang Frank Wang, Cheng Sun

While NeRF-based methods can perform novel view synthesis (NVS) at arbitrary scale, high-resolution novel view synthesis (HRNVS) with low-resolution (LR) optimization often suffers from oversmoothing.

Image Super-Resolution, Novel View Synthesis

ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos

1 code implementation · 27 Jun 2024 · Jr-Jen Chen, Yu-Chien Liao, Hsi-Che Lin, Yu-Chu Yu, Yen-Chun Chen, Yu-Chiang Frank Wang

Reasoning across time, which requires an advanced understanding of cause-and-effect relationships across video segments, poses significant challenges even to frontier multimodal large language models.

DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment

no code implementations · 27 Jun 2024 · Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-Yi Lee

Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities of large language models (LLMs).

Descriptive, Instruction Following

GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation

no code implementations · 18 Jun 2024 · Ci-Siang Lin, I-Jieh Liu, Min-Hung Chen, Chien-Yi Wang, Sifei Liu, Yu-Chiang Frank Wang

With the proposed TAP-CL, our GroPrompt framework can generate temporally consistent yet text-aware position prompts describing the locations and movements of the referred object in the video.

Contrastive Learning, Object, +6 more

Diffusion-Reward Adversarial Imitation Learning

no code implementations · 25 May 2024 · Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, Shao-Hua Sun

Inspired by the recent dominance of diffusion models in generative modeling, this work proposes Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL, aiming to yield more precise and smoother rewards for policy learning.

Imitation Learning

Data-Efficient 3D Visual Grounding via Order-Aware Referring

no code implementations · 25 Mar 2024 · Tung-Yu Wu, Sheng-Yu Huang, Yu-Chiang Frank Wang

Vigor leverages a large language model (LLM) to produce a desirable referential order from the input description for 3D visual grounding.

3D visual grounding, Object

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

1 code implementation · 26 Feb 2024 · Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao, Yu-Chiang Frank Wang

To improve the robustness of the encoder for speech enhancement (SE), a novel self-distillation mechanism combined with adversarial training is introduced.

Quantization, Speech Enhancement

DoRA: Weight-Decomposed Low-Rank Adaptation

4 code implementations · 14 Feb 2024 · Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen

By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead.

Ranked #2 on parameter-efficient fine-tuning on WinoGrande (using extra training data)

parameter-efficient fine-tuning

SemPLeS: Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation

no code implementations · 22 Jan 2024 · Ci-Siang Lin, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen

In this way, SemPLeS achieves better semantic alignment between object regions and the associated class labels, producing the desired pseudo masks for training the segmentation model.

Object, Segmentation, +2 more

Language-Guided Transformer for Federated Multi-Label Classification

1 code implementation · 12 Dec 2023 · I-Jieh Liu, Ci-Siang Lin, Fu-En Yang, Yu-Chiang Frank Wang

Nevertheless, it remains challenging for federated learning (FL) to handle user heterogeneity in local data distributions in real-world scenarios, and this issue becomes even more severe in multi-label image classification.

Classification, Federated Learning, +3 more

TPA3D: Triplane Attention for Fast Text-to-3D Generation

no code implementations · 5 Dec 2023 · Bin-Shih Wu, Hong-En Chen, Sheng-Yu Huang, Yu-Chiang Frank Wang

With only 3D shape data and their rendered 2D images observed during training, TPA3D is designed to retrieve detailed visual descriptions for synthesizing the corresponding 3D mesh data.

3D Generation, Sentence, +1 more

Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers

no code implementations · 29 Nov 2023 · Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang

Concept erasure in text-to-image diffusion models aims to prevent pre-trained diffusion models from generating images related to a target concept.

LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following

1 code implementation · 18 Oct 2023 · Cheng-Fu Yang, Yen-Chun Chen, Jianwei Yang, Xiyang Dai, Lu Yuan, Yu-Chiang Frank Wang, Kai-Wei Chang

Additional analysis shows that the contrastive objective and meta-actions are complementary in achieving the best results, and the resulting agent better aligns its states with corresponding instructions, making it more suitable for real-world embodied agents.

Contrastive Learning, Instruction Following

Frequency-Aware Self-Supervised Long-Tailed Learning

no code implementations · 9 Sep 2023 · Ci-Siang Lin, Min-Hung Chen, Yu-Chiang Frank Wang

Data collected from the real world typically exhibit long-tailed distributions, where frequent classes contain abundant data while rare ones have only a limited number of samples.

Self-Supervised Learning

Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation

no code implementations · ICCV 2023 · Fu-En Yang, Chien-Yi Wang, Yu-Chiang Frank Wang

To leverage robust representations from large-scale models while enabling efficient model personalization for heterogeneous clients, we propose a novel personalized FL framework of client-specific Prompt Generation (pFedPG), which learns to deploy a personalized prompt generator at the server for producing client-specific visual prompts that efficiently adapt frozen backbones to local data distributions.

Federated Learning

FedBug: A Bottom-Up Gradual Unfreezing Framework for Federated Learning

1 code implementation · 19 Jul 2023 · Chia-Hsiang Kao, Yu-Chiang Frank Wang

In this paper, we propose FedBug (Federated Learning with Bottom-Up Gradual Unfreezing), a novel FL framework designed to effectively mitigate client drift.

Federated Learning

QuAVF: Quality-aware Audio-Visual Fusion for Ego4D Talking to Me Challenge

1 code implementation · 30 Jun 2023 · Hsi-Che Lin, Chien-Yi Wang, Min-Hung Chen, Szu-Wei Fu, Yu-Chiang Frank Wang

This technical report describes our QuAVF@NTU-NVIDIA submission to the Ego4D Talking to Me (TTM) Challenge 2023.

TAX: Tendency-and-Assignment Explainer for Semantic Segmentation with Multi-Annotators

no code implementations · 19 Feb 2023 · Yuan-Chia Cheng, Zu-Yun Shiau, Fu-En Yang, Yu-Chiang Frank Wang

In this paper, we present a learning framework of Tendency-and-Assignment Explainer (TAX), designed to offer interpretability at the annotator and assignment levels.

Segmentation, Semantic Segmentation

Bias-Eliminating Augmentation Learning for Debiased Federated Learning

no code implementations · CVPR 2023 · Yuan-Yi Xu, Ci-Siang Lin, Yu-Chiang Frank Wang

Models trained on biased datasets tend to capture correlations between categorical and undesirable features, resulting in degraded performance.

Federated Learning, Image Classification

Target-Free Text-guided Image Manipulation

no code implementations · 26 Nov 2022 · Wan-Cyuan Fan, Cheng-Fu Yang, Chiao-An Yang, Yu-Chiang Frank Wang

We tackle the problem of target-free text-guided image manipulation, which requires one to modify the input reference image based on the given text instruction, while no ground truth target image is observed during training.

counterfactual, Image Manipulation

Paraphrasing Is All You Need for Novel Object Captioning

no code implementations · 25 Sep 2022 · Cheng-Fu Yang, Yao-Hung Hubert Tsai, Wan-Cyuan Fan, Ruslan Salakhutdinov, Louis-Philippe Morency, Yu-Chiang Frank Wang

Since no ground-truth captions are available for novel object images during training, P2C leverages cross-modality (image-text) association modules to ensure that these caption characteristics are properly preserved.

Language Modelling, Object

Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond

1 code implementation · 30 Aug 2022 · Cheng-Yen Hsieh, Chih-Jung Chang, Fu-En Yang, Yu-Chiang Frank Wang

In particular, we present cross-scale patch-level correlation learning in SS-PRL, which allows the model to aggregate and associate information learned across patch scales.

Instance Segmentation, Multi-Label Classification, +5 more

Learning Facial Liveness Representation for Domain Generalized Face Anti-spoofing

no code implementations · 16 Aug 2022 · Zih-Ching Chen, Lin-Hsi Tsao, Chin-Lun Fu, Shang-Fu Chen, Yu-Chiang Frank Wang

Face anti-spoofing (FAS) aims to distinguish face spoofing attacks from authentic faces and is typically approached by learning models for the associated classification task.

Face Anti-Spoofing

Scene Graph Expansion for Semantics-Guided Image Outpainting

no code implementations · CVPR 2022 · Chiao-An Yang, Cheng-Yo Tan, Wan-Cyuan Fan, Cheng-Fu Yang, Meng-Lin Wu, Yu-Chiang Frank Wang

In particular, we propose a novel network of Scene Graph Transformer (SGT), which is designed to take node and edge features as inputs for modeling the associated structural information.

Image Outpainting

NeurMiPs: Neural Mixture of Planar Experts for View Synthesis

1 code implementation · CVPR 2022 · Zhi-Hao Lin, Wei-Chiu Ma, Hao-Yu Hsu, Yu-Chiang Frank Wang, Shenlong Wang

We present Neural Mixtures of Planar Experts (NeurMiPs), a novel planar-based scene representation for modeling geometry and appearance.

Novel View Synthesis

Domain-Generalized Textured Surface Anomaly Detection

no code implementations · 23 Mar 2022 · Shang-Fu Chen, Yu-Min Liu, Chia-Ching Lin, Trista Pei-Chun Chen, Yu-Chiang Frank Wang

By observing normal and abnormal surface data across multiple source domains, our model is expected to generalize to an unseen textured surface of interest, for which only a small amount of normal data is observed during testing.

Anomaly Detection, Domain Generalization, +1 more

Meta-Learned Feature Critics for Domain Generalized Semantic Segmentation

no code implementations · 27 Dec 2021 · Zu-Yun Shiau, Wei-Wei Lin, Ci-Siang Lin, Yu-Chiang Frank Wang

Handling domain shifts when recognizing or segmenting visual data across domains has been studied by the machine learning and computer vision communities.

Disentanglement, Domain Generalization, +3 more

A Pixel-Level Meta-Learner for Weakly Supervised Few-Shot Semantic Segmentation

no code implementations · 2 Nov 2021 · Yuan-Hao Lee, Fu-En Yang, Yu-Chiang Frank Wang

Few-shot semantic segmentation addresses the learning task in which only a few images with ground-truth pixel-level labels are available for the novel classes of interest.

Few-Shot Semantic Segmentation, Meta-Learning, +2 more

Learning Visual-Linguistic Adequacy, Fidelity, and Fluency for Novel Object Captioning

no code implementations · 29 Sep 2021 · Cheng-Fu Yang, Yao-Hung Hubert Tsai, Wan-Cyuan Fan, Yu-Chiang Frank Wang, Louis-Philippe Morency, Ruslan Salakhutdinov

Novel object captioning (NOC) learns image captioning models for describing objects or visual concepts which are unseen (i.e., novel) in the training captions.

Image Captioning

LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

1 code implementation · CVPR 2021 · Cheng-Fu Yang, Wan-Cyuan Fan, Fu-En Yang, Yu-Chiang Frank Wang

To better exploit the text input, so that implicit objects or relationships can be properly inferred during layout generation, we propose a LayoutTransformer Network (LT-Net) in this paper.

Decoder, Diversity

LayoutTransformer: Relation-Aware Scene Layout Generation

no code implementations · 1 Jan 2021 · Cheng-Fu Yang, Wan-Cyuan Fan, Fu-En Yang, Yu-Chiang Frank Wang

In the areas of machine learning and computer vision, text-to-image synthesis aims at producing image outputs given the input text.

Image Generation, Object, +1 more

Representation Decomposition for Image Manipulation and Beyond

no code implementations · 2 Nov 2020 · Shang-Fu Chen, Jia-Wei Yan, Ya-Fan Su, Yu-Chiang Frank Wang

Representation disentanglement aims at learning interpretable features, so that the output can be recovered or manipulated accordingly.

Attribute, Disentanglement, +1 more

Semantics-Guided Representation Learning with Applications to Visual Synthesis

no code implementations · 21 Oct 2020 · Jia-Wei Yan, Ci-Siang Lin, Fu-En Yang, Yu-Jhe Li, Yu-Chiang Frank Wang

Learning interpretable and interpolatable latent representations has been an emerging research direction, allowing researchers to understand and utilize the derived latent space for further applications such as visual synthesis or recognition.

Representation Learning

Domain Generalized Person Re-Identification via Cross-Domain Episodic Learning

no code implementations · 19 Oct 2020 · Ci-Siang Lin, Yuan-Chia Cheng, Yu-Chiang Frank Wang

While a number of labeled source-domain datasets are available, no target-domain training data can be accessed.

Domain Generalization, Generalizable Person Re-identification, +1 more

Learning to Learn in a Semi-Supervised Fashion

no code implementations · ECCV 2020 · Yun-Chun Chen, Chao-Te Chou, Yu-Chiang Frank Wang

To address semi-supervised learning from both labeled and unlabeled data, we present a novel meta-learning scheme.

Image Retrieval, Meta-Learning, +3 more

Wavelet Channel Attention Module with a Fusion Network for Single Image Deraining

no code implementations · 17 Jul 2020 · Hao-Hsiang Yang, Chao-Han Huck Yang, Yu-Chiang Frank Wang

The wavelet transform and the inverse wavelet transform replace down-sampling and up-sampling, respectively, so that feature maps obtained from the wavelet transform and convolutions contain different frequencies and scales.

Single Image Deraining
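The entry above describes substituting the wavelet transform and its inverse for down-sampling and up-sampling. As a minimal sketch of that substitution only (not the paper's wavelet channel attention module or fusion network), the Python below assumes a single-level orthonormal 2D Haar wavelet applied to one single-channel image with even height and width; the Haar choice is an assumption, and the helper names `zeros`, `haar_dwt2`, and `haar_idwt2` are hypothetical. The forward transform halves the spatial size while splitting the image into one low-frequency and three high-frequency subbands, and the inverse restores the original exactly.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Assumption: orthonormal Haar wavelet, single channel, even height and width.
# The listing does not state which wavelet the paper uses.


def zeros(rows: int, cols: int) -> Matrix:
    """Return a rows x cols matrix of 0.0 with independent rows."""
    return [[0.0] * cols for _ in range(rows)]


def haar_dwt2(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the 2D Haar transform, used here in place of down-sampling.

    Each non-overlapping 2x2 block [[a, b], [c, d]] contributes one coefficient
    to each of four half-resolution subbands, so height and width halve while
    the total number of values is preserved (nothing is discarded).
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = (zeros(h // 2, w // 2) for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r, s = i // 2, j // 2
            ll[r][s] = (a + b + c + d) / 2  # low frequency: scaled block average
            lh[r][s] = (a - b + c - d) / 2  # detail: left column minus right column
            hl[r][s] = (a + b - c - d) / 2  # detail: top row minus bottom row
            hh[r][s] = (a - b - c + d) / 2  # detail: diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll: Matrix, lh: Matrix, hl: Matrix, hh: Matrix) -> Matrix:
    """Inverse of haar_dwt2, used here in place of up-sampling (doubles H and W)."""
    h2, w2 = len(ll), len(ll[0])
    x = zeros(2 * h2, 2 * w2)
    for r in range(h2):
        for s in range(w2):
            p, q, u, v = ll[r][s], lh[r][s], hl[r][s], hh[r][s]
            x[2 * r][2 * s] = (p + q + u + v) / 2          # recovers a
            x[2 * r][2 * s + 1] = (p - q + u - v) / 2      # recovers b
            x[2 * r + 1][2 * s] = (p + q - u - v) / 2      # recovers c
            x[2 * r + 1][2 * s + 1] = (p - q - u + v) / 2  # recovers d
    return x


if __name__ == "__main__":
    img = [[float(4 * i + j + 1) for j in range(4)] for i in range(4)]
    bands = haar_dwt2(img)
    print([(len(b), len(b[0])) for b in bands])  # four 2x2 subbands
    print(haar_idwt2(*bands) == img)             # exact reconstruction
```

The design point is invertibility: unlike max pooling or strided convolution, this down-sampling loses no information, so the inverse transform can act as an exact up-sampling step, while the LL band and the three detail bands expose different frequencies and scales to later layers.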

Transforming Multi-Concept Attention into Video Summarization

no code implementations · 2 Jun 2020 · Yen-Ting Liu, Yu-Jhe Li, Yu-Chiang Frank Wang

Video summarization is a challenging task in computer vision that aims to identify highlight frames or shots in a lengthy input video.

Diversity, Video Summarization

Learning Resolution-Invariant Deep Representations for Person Re-Identification

no code implementations · 25 Jul 2019 · Yun-Chun Chen, Yu-Jhe Li, Xiaofei Du, Yu-Chiang Frank Wang

Moreover, extending our model to semi-supervised re-ID further confirms the scalability of our proposed method for real-world scenarios and applications.

Image Super-Resolution, Person Re-Identification

Dual-modality seq2seq network for audio-visual event localization

2 code implementations · 20 Feb 2019 · Yan-Bo Lin, Yu-Jhe Li, Yu-Chiang Frank Wang

Audio-visual event localization requires one to identify the event that is both visible and audible in a video (either at a frame or video level).

audio-visual event localization

3D Shape Reconstruction from a Single 2D Image via 2D-3D Self-Consistency

no code implementations · 29 Nov 2018 · Yi-Lun Liao, Yao-Cheng Yang, Yu-Chiang Frank Wang

Aiming at inferring 3D shapes from 2D images, 3D shape reconstruction has drawn considerable attention from the computer vision and deep learning communities.

3D Reconstruction, 3D Shape Reconstruction From A Single 2D Image, +1 more

A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation

1 code implementation · NeurIPS 2018 · Alexander H. Liu, Yen-Cheng Liu, Yu-Ying Yeh, Yu-Chiang Frank Wang

We present a novel and unified deep learning framework which is capable of learning domain-invariant representation from data across multiple domains.

Translation, Unsupervised Domain Adaptation

Summarizing First-Person Videos from Third Persons' Points of View

no code implementations · ECCV 2018 · Hsuan-I Ho, Wei-Chen Chiu, Yu-Chiang Frank Wang

Video highlight detection and summarization are active topics in computer vision that benefit a variety of applications such as viewing, searching, and storage.

Deep Generative Models for Weakly-Supervised Multi-Label Classification

no code implementations · ECCV 2018 · Hong-Min Chu, Chih-Kuan Yeh, Yu-Chiang Frank Wang

In order to train learning models for multi-label classification (MLC), it is typically desirable to have a large amount of fully annotated multi-label data.

Classification, General Classification, +1 more

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

1 code implementation · CVPR 2018 · Chung-Wei Lee, Wei Fang, Chih-Kuan Yeh, Yu-Chiang Frank Wang

In this paper, we propose a novel deep learning architecture for multi-label zero-shot learning (ML-ZSL), which is able to predict multiple unseen class labels for each input instance.

General Classification, Knowledge Graphs, +3 more

Order-Free RNN with Visual Attention for Multi-Label Classification

1 code implementation · 18 Jul 2017 · Shang-Fu Chen, Yi-Chen Chen, Chih-Kuan Yeh, Yu-Chiang Frank Wang

In this paper, we propose jointly learning attention and recurrent neural network (RNN) models for multi-label classification.

Classification, General Classification, +2 more

Learning Deep Latent Spaces for Multi-Label Classification

1 code implementation · 3 Jul 2017 · Chih-Kuan Yeh, Wei-Chieh Wu, Wei-Jen Ko, Yu-Chiang Frank Wang

Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance.

Classification, General Classification, +1 more

Generative-Discriminative Variational Model for Visual Recognition

no code implementations · 7 Jun 2017 · Chih-Kuan Yeh, Yao-Hung Hubert Tsai, Yu-Chiang Frank Wang

In other words, our GDVM casts the supervised learning task as a generative learning process, with data discrimination to be jointly exploited for improved classification.

Classification, General Classification, +3 more

Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation

no code implementations · CVPR 2018 · Yen-Cheng Liu, Yu-Ying Yeh, Tzu-Chien Fu, Sheng-De Wang, Wei-Chen Chiu, Yu-Chiang Frank Wang

While representation learning aims to derive interpretable features for describing visual data, representation disentanglement further produces features in which particular image attributes can be identified and manipulated.

Attribute, Disentanglement, +2 more

No More Discrimination: Cross City Adaptation of Road Scene Segmenters

9 code implementations · ICCV 2017 · Yi-Hsin Chen, Wei-Yu Chen, Yu-Ting Chen, Bo-Cheng Tsai, Yu-Chiang Frank Wang, Min Sun

Despite the recent success of deep-learning-based semantic segmentation, a pre-trained road scene segmenter deployed to a city whose images are not present in the training set does not achieve satisfactory performance due to dataset biases.

Segmentation, Semantic Segmentation

Learning Cross-Domain Landmarks for Heterogeneous Domain Adaptation

no code implementations · CVPR 2016 · Yao-Hung Hubert Tsai, Yi-Ren Yeh, Yu-Chiang Frank Wang

With the goal of deriving a domain-invariant feature subspace for HDA, our CDLS is able to identify representative cross-domain data, including the unlabeled ones in the target domain, for performing adaptation.

Domain Adaptation

Propagated Image Filtering

no code implementations · CVPR 2015 · Jen-Hao Rick Chang, Yu-Chiang Frank Wang

In this paper, we propose the propagation filter as a novel image filtering operator, with the goal of smoothing over neighboring image pixels while preserving image context like edges or textural regions.

Image Denoising
