Search Results for author: Chih-Yao Ma

Found 26 papers, 13 papers with code

Taming Latent Diffusion Model for Neural Radiance Field Inpainting

no code implementations • 15 Apr 2024 • Chieh Hubert Lin, Changil Kim, Jia-Bin Huang, Qinbo Li, Chih-Yao Ma, Johannes Kopf, Ming-Hsuan Yang, Hung-Yu Tseng

These two problems are further reinforced with the use of pixel-distance losses.

ControlRoom3D: Room Generation using Semantic Proxy Rooms

no code implementations • 8 Dec 2023 • Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou

Central to our approach is a user-defined 3D semantic proxy room that outlines a rough room layout based on semantic bounding boxes and a textual description of the overall room style.

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

no code implementations • 27 Sep 2023 • Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, Devi Parikh

Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text.

Image Generation

Trainable Projected Gradient Method for Robust Fine-tuning

2 code implementations • CVPR 2023 • Junjiao Tian, Xiaoliang Dai, Chih-Yao Ma, Zecheng He, Yen-Cheng Liu, Zsolt Kira

To solve this problem, we propose Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed for each layer for a fine-grained fine-tuning regularization.

Transfer Learning

RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data

1 code implementation • 28 Feb 2023 • Sangwoo Mo, Jong-Chyi Su, Chih-Yao Ma, Mido Assran, Ishan Misra, Licheng Yu, Sean Bell

Semi-supervised learning aims to train a model using limited labels.

Density Estimation Image Classification +1

When does the student surpass the teacher? Federated Semi-supervised Learning with Teacher-Student EMA

no code implementations • 24 Jan 2023 • Jessica Zhao, Sayan Ghosh, Akash Bharadwaj, Chih-Yao Ma

Semi-Supervised Learning (SSL) has received extensive attention in the domain of computer vision, leading to the development of promising approaches such as FixMatch.

Federated Learning Image Classification +1

Structure-Encoding Auxiliary Tasks for Improved Visual Representation in Vision-and-Language Navigation

no code implementations • 20 Nov 2022 • Chia-Wen Kuo, Chih-Yao Ma, Judy Hoffman, Zsolt Kira

In Vision-and-Language Navigation (VLN), researchers typically take an image encoder pre-trained on ImageNet without fine-tuning on the environments that the agent will be trained or tested on.

Test unseen Vision and Language Navigation

Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks

no code implementations • 7 Oct 2022 • Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, Zsolt Kira

Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using ~10% of their trainable parameters.

Open-Set Semi-Supervised Object Detection

no code implementations • 29 Aug 2022 • Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Peter Vajda, Zijian He, Zsolt Kira

To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods.

Object object-detection +3

Unbiased Teacher v2: Semi-supervised Object Detection for Anchor-free and Anchor-based Detectors

1 code implementation • CVPR 2022 • Yen-Cheng Liu, Chih-Yao Ma, Zsolt Kira

In this paper, we present Unbiased Teacher v2, which shows the generalization of the SS-OD method to anchor-free detectors and also introduces the Listen2Student mechanism for the unsupervised regression loss.

Ranked #1 on Semi-Supervised Object Detection on COCO 0.5% labeled data

Object Detection regression +1

Cross-Domain Adaptive Teacher for Object Detection

2 code implementations • CVPR 2022 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris Kitani, Peter Vajda

To mitigate this problem, we propose a teacher-student framework named Adaptive Teacher (AT) which leverages domain adversarial learning and weak-strong data augmentation to address the domain gap.

Data Augmentation Domain Adaptation +3

Adaptive Unbiased Teacher for Cross-Domain Object Detection

no code implementations • 29 Sep 2021 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris M. Kitani, Peter Vajda

This enables the student model to capture domain-invariant features.

Data Augmentation Domain Adaptation +3

CrossMatch: Improving Semi-Supervised Object Detection via Multi-Scale Consistency

no code implementations • 29 Sep 2021 • Zhuoran Yu, Yen-Cheng Liu, Chih-Yao Ma, Zsolt Kira

Inspired by the fact that teacher/student pseudo-labeling approaches result in a weak and sparse gradient signal due to the difficulty of confidence-thresholding, CrossMatch leverages multi-scale feature extraction in object detection.

Object object-detection +2

Unbiased Teacher for Semi-Supervised Object Detection

4 code implementations • ICLR 2021 • Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, Peter Vajda

To address this, we introduce Unbiased Teacher, a simple yet effective approach that jointly trains a student and a gradually progressing teacher in a mutually-beneficial manner.

Ranked #2 on Semi-Supervised Person Bounding Box Detection on COCO 1% labeled data

Image Classification Object +4

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

2 code implementations • ECCV 2020 • Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira

Recent state-of-the-art semi-supervised learning (SSL) methods use a combination of image-based transformations and consistency regularization as core components.

Ranked #1 on Semi-Supervised Image Classification on Mini-ImageNet, 10000 Labels

Clustering Data Augmentation +1

Frustratingly Simple Domain Generalization via Image Stylization

2 code implementations • 19 Jun 2020 • Nathan Somavarapu, Chih-Yao Ma, Zsolt Kira

Convolutional Neural Networks (CNNs) show impressive performance in the standard classification setting where training and testing data are drawn i.i.d.

Ranked #48 on Domain Generalization on PACS

Domain Generalization Image Stylization

Who2com: Collaborative Perception via Learnable Handshake Communication

1 code implementation • 21 Mar 2020 • Yen-Cheng Liu, Junjiao Tian, Chih-Yao Ma, Nathan Glaser, Chia-Wen Kuo, Zsolt Kira

In this paper, we propose the problem of collaborative perception, where robots can combine their local observations with those of neighboring agents in a learnable way to improve accuracy on a perception task.

Multi-agent Reinforcement Learning Scene Understanding +1

Manifold Graph with Learned Prototypes for Semi-Supervised Image Classification

no code implementations • 12 Jun 2019 • Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira

We then show that when combined with these regularizers, the proposed method facilitates the propagation of information from generated prototypes to image data to further improve results.

Classification General Classification +1

Learning to Generate Grounded Visual Captions without Localization Supervision

2 code implementations • 1 Jun 2019 • Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira

When automatically generating a sentence description for an image or video, it often remains unclear how well the generated caption is grounded, that is, whether the model uses the correct image regions to output particular words, or if the model is hallucinating based on priors in the dataset and/or the language model.

Image Captioning Language Modelling +2

The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation

3 code implementations • CVPR 2019 • Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.

Ranked #115 on Vision and Language Navigation on VLN Challenge

Decision Making Vision and Language Navigation +2

The Regretful Navigation Agent for Vision-and-Language Navigation

1 code implementation • CVPR 2019 (Oral) • Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.

Decision Making Vision and Language Navigation +2

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

2 code implementations • ICLR 2019 • Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong

The Vision-and-Language Navigation (VLN) task entails an agent following navigational instruction in photo-realistic unknown environments.

Ranked #115 on Vision and Language Navigation on VLN Challenge

Natural Language Visual Grounding Vision and Language Navigation +2

AdaFrame: Adaptive Frame Selection for Fast Video Recognition

no code implementations • CVPR 2019 • Zuxuan Wu, Caiming Xiong, Chih-Yao Ma, Richard Socher, Larry S. Davis

We present AdaFrame, a framework that adaptively selects relevant frames on a per-input basis for fast video recognition.

Policy Gradient Methods Video Recognition

Attend and Interact: Higher-Order Object Interactions for Video Understanding

no code implementations • CVPR 2018 • Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf

Human actions often involve complex interactions across several inter-related objects in the scene.

Action Classification Action Recognition +8

Grounded Objects and Interactions for Video Captioning

no code implementations • 16 Nov 2017 • Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf

We address the problem of video captioning by grounding language generation on object interactions in the video.

Object Scene Understanding +3

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

4 code implementations • 30 Mar 2017 • Chih-Yao Ma, Min-Hung Chen, Zsolt Kira, Ghassan AlRegib

We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets applied to spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve the overall performance.

Ranked #54 on Action Recognition on UCF101

Action Classification Action Recognition +3
