no code implementations • 15 Apr 2024 • Chieh Hubert Lin, Changil Kim, Jia-Bin Huang, Qinbo Li, Chih-Yao Ma, Johannes Kopf, Ming-Hsuan Yang, Hung-Yu Tseng
These two problems are further reinforced with the use of pixel-distance losses.
no code implementations • CVPR 2024 • Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou
Central to our approach is a user-defined 3D semantic proxy room that outlines a rough room layout based on semantic bounding boxes and a textual description of the overall room style.
no code implementations • 27 Sep 2023 • Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, Devi Parikh
Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text.
2 code implementations • CVPR 2023 • Junjiao Tian, Xiaoliang Dai, Chih-Yao Ma, Zecheng He, Yen-Cheng Liu, Zsolt Kira
To solve this problem, we propose the Trainable Projected Gradient Method (TPGM), which automatically learns the constraint imposed on each layer for fine-grained fine-tuning regularization.
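As a rough illustration of what a per-layer constraint can look like, the sketch below projects one layer's fine-tuned weights back into a ball around its pre-trained weights. The L2 norm, the function name `project_to_ball`, and the fixed `radius` are assumptions made for this example; the excerpt only states that the constraint for each layer is learned, so treat this as a sketch of a single projection step and not as the paper's implementation.

```python
import math


def project_to_ball(weights, pretrained, radius):
    """Project `weights` onto the L2 ball of size `radius` centred on `pretrained`.

    Hypothetical helper: in a learned scheme, `radius` would be a per-layer
    trainable quantity instead of a fixed number.
    """
    diff = [w - p for w, p in zip(weights, pretrained)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        # Already inside the constraint set, so nothing changes.
        return list(weights)
    scale = radius / norm
    # Shrink the offset from the pre-trained weights so its norm equals radius.
    return [p + scale * d for p, d in zip(pretrained, diff)]


pretrained_layer = [0.0, 0.0]
finetuned_layer = [3.0, 4.0]  # distance 5 from the pre-trained weights
projected = project_to_ball(finetuned_layer, pretrained_layer, radius=1.0)
```

A smaller radius keeps a layer closer to its pre-trained values, while a larger one lets it move more freely during fine-tuning.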
1 code implementation • 28 Feb 2023 • Sangwoo Mo, Jong-Chyi Su, Chih-Yao Ma, Mido Assran, Ishan Misra, Licheng Yu, Sean Bell
Semi-supervised learning aims to train a model using limited labels.
no code implementations • 24 Jan 2023 • Jessica Zhao, Sayan Ghosh, Akash Bharadwaj, Chih-Yao Ma
Semi-Supervised Learning (SSL) has received extensive attention in the domain of computer vision, leading to the development of promising approaches such as FixMatch.
no code implementations • 20 Nov 2022 • Chia-Wen Kuo, Chih-Yao Ma, Judy Hoffman, Zsolt Kira
In Vision-and-Language Navigation (VLN), researchers typically take an image encoder pre-trained on ImageNet without fine-tuning on the environments that the agent will be trained or tested on.
no code implementations • 7 Oct 2022 • Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, Zsolt Kira
Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using ~10% of their trainable parameters.
no code implementations • 29 Aug 2022 • Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Peter Vajda, Zijian He, Zsolt Kira
To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods.
1 code implementation • CVPR 2022 • Yen-Cheng Liu, Chih-Yao Ma, Zsolt Kira
In this paper, we present Unbiased Teacher v2, which generalizes the SS-OD method to anchor-free detectors and also introduces the Listen2Student mechanism for the unsupervised regression loss.
2 code implementations • CVPR 2022 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris Kitani, Peter Vajda
To mitigate this problem, we propose a teacher-student framework named Adaptive Teacher (AT) which leverages domain adversarial learning and weak-strong data augmentation to address the domain gap.
no code implementations • 29 Sep 2021 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris M. Kitani, Peter Vajda
This enables the student model to capture domain-invariant features.
no code implementations • 29 Sep 2021 • Zhuoran Yu, Yen-Cheng Liu, Chih-Yao Ma, Zsolt Kira
Inspired by the fact that teacher/student pseudo-labeling approaches result in a weak and sparse gradient signal due to the difficulty of confidence-thresholding, CrossMatch leverages multi-scale feature extraction in object detection.
4 code implementations • ICLR 2021 • Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, Peter Vajda
To address this, we introduce Unbiased Teacher, a simple yet effective approach that jointly trains a student and a gradually progressing teacher in a mutually-beneficial manner.
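One common way to obtain a gradually progressing teacher is to keep its weights as an exponential moving average (EMA) of the student's weights. The sketch below assumes that mechanism; the function name `ema_update`, the flat list of weights, and the `decay` values are illustrative choices, not details taken from the excerpt above.

```python
def ema_update(teacher, student, decay=0.999):
    """Move each teacher weight a small step toward the matching student weight."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy weights; a real model would apply this to every parameter tensor.
teacher = [0.0, 0.0]
student = [1.0, -2.0]

# A low decay is used here only so the drift is visible after a few steps.
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)
```

With a decay close to 1 the teacher changes slowly, which smooths out noise in the student's individual updates.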
2 code implementations • ECCV 2020 • Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira
Recent state-of-the-art semi-supervised learning (SSL) methods use a combination of image-based transformations and consistency regularization as core components.
2 code implementations • 19 Jun 2020 • Nathan Somavarapu, Chih-Yao Ma, Zsolt Kira
Convolutional Neural Networks (CNNs) show impressive performance in the standard classification setting where training and testing data are drawn i.i.d.
Ranked #58 on Domain Generalization on PACS
1 code implementation • 21 Mar 2020 • Yen-Cheng Liu, Junjiao Tian, Chih-Yao Ma, Nathan Glaser, Chia-Wen Kuo, Zsolt Kira
In this paper, we propose the problem of collaborative perception, where robots can combine their local observations with those of neighboring agents in a learnable way to improve accuracy on a perception task.
no code implementations • 12 Jun 2019 • Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira
We then show that when combined with these regularizers, the proposed method facilitates the propagation of information from generated prototypes to image data to further improve results.
2 code implementations • 1 Jun 2019 • Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira
When automatically generating a sentence description for an image or video, it often remains unclear how well the generated caption is grounded, that is whether the model uses the correct image regions to output particular words, or if the model is hallucinating based on priors in the dataset and/or the language model.
1 code implementation • CVPR 2019 (Oral) • Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira
As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.
3 code implementations • CVPR 2019 • Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira
As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.
Ranked #115 on Vision and Language Navigation on VLN Challenge
2 code implementations • ICLR 2019 • Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong
The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in photo-realistic unknown environments.
Ranked #115 on Vision and Language Navigation on VLN Challenge
no code implementations • CVPR 2019 • Zuxuan Wu, Caiming Xiong, Chih-Yao Ma, Richard Socher, Larry S. Davis
We present AdaFrame, a framework that adaptively selects relevant frames on a per-input basis for fast video recognition.
no code implementations • 16 Nov 2017 • Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf
We address the problem of video captioning by grounding language generation on object interactions in the video.
no code implementations • CVPR 2018 • Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf
Human actions often involve complex interactions across several inter-related objects in the scene.
4 code implementations • 30 Mar 2017 • Chih-Yao Ma, Min-Hung Chen, Zsolt Kira, Ghassan AlRegib
We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets applied to spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve the overall performance.
Ranked #54 on Action Recognition on UCF101