Search Results for author: Chih-Yao Ma

Found 16 papers, 9 papers with code

Cross-Domain Adaptive Teacher for Object Detection

1 code implementation · 25 Nov 2021 · Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris Kitani, Peter Vajda

To mitigate this problem, we propose a teacher-student framework named Adaptive Teacher (AT) which leverages domain adversarial learning and weak-strong data augmentation to address the domain gap.

Data Augmentation · Domain Adaptation · +1
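The weak-strong augmentation idea behind teacher-student frameworks like this can be sketched as: the teacher sees weakly augmented inputs and its confident predictions supervise the student on strongly augmented ones. The following is a minimal numpy illustration under assumed names (`weak_aug`, `strong_aug`, `pseudo_labels` are not from the paper's code), not the actual detection pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_aug(x):
    """Weak augmentation: light jitter (stand-in for flips/crops)."""
    return x + rng.normal(0.0, 0.01, size=x.shape)

def strong_aug(x):
    """Strong augmentation: heavy noise (stand-in for color jitter/cutout)."""
    return x + rng.normal(0.0, 0.2, size=x.shape)

def pseudo_labels(teacher_logits, threshold=0.8):
    """Keep only the teacher's confident predictions as pseudo-labels."""
    e = np.exp(teacher_logits - teacher_logits.max(-1, keepdims=True))
    probs = e / e.sum(-1, keepdims=True)
    conf, labels = probs.max(-1), probs.argmax(-1)
    mask = conf >= threshold
    return labels[mask], mask
```

In the real method the student is then trained on `strong_aug` views against these filtered labels, while the domain gap is handled by an adversarial branch not shown here.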

CrossMatch: Improving Semi-Supervised Object Detection via Multi-Scale Consistency

no code implementations · 29 Sep 2021 · Zhuoran Yu, Yen-Cheng Liu, Chih-Yao Ma, Zsolt Kira

Inspired by the fact that teacher/student pseudo-labeling approaches result in a weak and sparse gradient signal due to the difficulty of confidence-thresholding, CrossMatch leverages multi-scale feature extraction in object detection.

Object Detection · Semi-Supervised Object Detection
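Multi-scale consistency can be pictured as enforcing agreement between predictions made at two feature scales. This toy numpy sketch (function names are illustrative, and real detectors compare box/class outputs rather than raw maps) just pools the fine-scale map to the coarse resolution and penalizes disagreement:

```python
import numpy as np

def avg_pool2x(p):
    """Average-pool a (H, W) prediction map by 2 to align scales."""
    H, W = p.shape
    return p.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def multiscale_consistency(pred_fine, pred_coarse):
    """MSE between the pooled fine-scale map and the coarse-scale map."""
    return float(((avg_pool2x(pred_fine) - pred_coarse) ** 2).mean())
```

Because the consistency target comes from the model's own other scale, it provides a denser training signal than hard confidence-thresholded pseudo-labels alone.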

Unbiased Teacher for Semi-Supervised Object Detection

2 code implementations · ICLR 2021 · Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, Peter Vajda

To address this, we introduce Unbiased Teacher, a simple yet effective approach that jointly trains a student and a gradually progressing teacher in a mutually-beneficial manner.

Image Classification · Object Detection · +2
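The "gradually progressing teacher" in such methods is typically maintained as an exponential moving average (EMA) of the student's weights, so the teacher improves smoothly as the student trains. A minimal sketch of that update rule (parameter names are illustrative):

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """EMA step: teacher <- alpha * teacher + (1 - alpha) * student,
    applied parameter-by-parameter over matching dicts of weights."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k]
            for k in teacher}
```

A large `alpha` keeps the teacher stable so its pseudo-labels change slowly, while the student receives gradients directly; the two roles are what makes the training "mutually beneficial."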

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

2 code implementations · ECCV 2020 · Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira

Recent state-of-the-art semi-supervised learning (SSL) methods use a combination of image-based transformations and consistency regularization as core components.

Data Augmentation · Semi-Supervised Image Classification

Frustratingly Simple Domain Generalization via Image Stylization

2 code implementations · 19 Jun 2020 · Nathan Somavarapu, Chih-Yao Ma, Zsolt Kira

Convolutional Neural Networks (CNNs) show impressive performance in the standard classification setting where training and testing data are drawn i.i.d.

Domain Generalization · Image Stylization
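Stylization-based domain generalization commonly relies on adaptive instance normalization (AdaIN): content features are re-normalized to carry the mean and standard deviation of a style source, which randomizes texture while preserving content. A minimal numpy sketch of the AdaIN statistic swap (this is the general formula, not this paper's specific pipeline):

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Shift/scale content features to match the style's mean and std."""
    c_mu, c_std = content.mean(), content.std() + eps
    s_mu, s_std = style.mean(), style.std() + eps
    return s_std * (content - c_mu) / c_std + s_mu
```

Training a classifier on many such stylized variants discourages it from latching onto texture cues that do not transfer across domains.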

Who2com: Collaborative Perception via Learnable Handshake Communication

no code implementations · 21 Mar 2020 · Yen-Cheng Liu, Junjiao Tian, Chih-Yao Ma, Nathan Glaser, Chia-Wen Kuo, Zsolt Kira

In this paper, we propose the problem of collaborative perception, where robots can combine their local observations with those of neighboring agents in a learnable way to improve accuracy on a perception task.

Multi-agent Reinforcement Learning · Scene Understanding · +1

Manifold Graph with Learned Prototypes for Semi-Supervised Image Classification

no code implementations · 12 Jun 2019 · Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira

We then show that when combined with these regularizers, the proposed method facilitates the propagation of information from generated prototypes to image data to further improve results.

General Classification · Semi-Supervised Image Classification

Learning to Generate Grounded Visual Captions without Localization Supervision

2 code implementations · 1 Jun 2019 · Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira

When automatically generating a sentence description for an image or video, it often remains unclear how well the generated caption is grounded, that is, whether the model uses the correct image regions to output particular words, or whether it is hallucinating based on priors in the dataset and/or the language model.

Image Captioning · Language Modelling · +1

The Regretful Navigation Agent for Vision-and-Language Navigation

1 code implementation · CVPR 2019 (Oral) · Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.

Decision Making · Vision and Language Navigation · +2

Grounded Objects and Interactions for Video Captioning

no code implementations · 16 Nov 2017 · Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf

We address the problem of video captioning by grounding language generation on object interactions in the video.

Scene Understanding · Text Generation · +2

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

4 code implementations · 30 Mar 2017 · Chih-Yao Ma, Min-Hung Chen, Zsolt Kira, Ghassan AlRegib

We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets operating on spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve the overall performance.

Action Classification · Action Recognition · +1
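The Temporal-ConvNet idea amounts to sliding a 1-D convolution over a T × D matrix of per-frame features to capture short-range temporal dynamics. A minimal numpy sketch of that operation (a single shared kernel with valid padding; the actual architecture stacks several such layers):

```python
import numpy as np

def temporal_conv(features, kernel):
    """Valid 1-D convolution over time, applied independently to each
    of the D feature dimensions of a (T, D) per-frame feature matrix."""
    T, D = features.shape
    K = len(kernel)
    out = np.zeros((T - K + 1, D))
    for t in range(T - K + 1):
        out[t] = (features[t:t + K] * kernel[:, None]).sum(axis=0)
    return out
```

The LSTM branch consumes the same T × D matrix sequentially; the paper's comparison is between these two ways of aggregating the temporal axis.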
