no code implementations • 19 Sep 2024 • Tsung-Han Wu, Hung-Ting Su, Shang-Tse Chen, Winston H. Hsu
The robust self-training (RST) framework has emerged as a prominent approach for semi-supervised adversarial training.
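A rough illustration of the RST recipe (not the authors' exact procedure): a model trained on labeled data pseudo-labels the unlabeled pool, and adversarial training then runs over the labeled and pseudo-labeled data together. The PyTorch setup, PGD attack, and hyperparameters below are illustrative assumptions.

```python
# Minimal sketch of robust self-training (RST); data is assumed to be simple
# iterables of (x, y) tensor pairs, and the attack settings are illustrative.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft adversarial examples with a standard PGD attack."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
    return x_adv.detach()

def robust_self_training(model, labeled, unlabeled, epochs=10, lr=1e-3):
    """1) pseudo-label the unlabeled pool, 2) adversarially train on the union."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    with torch.no_grad():
        pseudo = [(x, model(x).argmax(dim=1)) for x, _ in unlabeled]
    for _ in range(epochs):
        for x, y in list(labeled) + pseudo:
            x_adv = pgd_attack(model, x, y)
            opt.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()
    return model
```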
1 code implementation • 19 Sep 2024 • Tsung-Han Wu, Joseph E. Gonzalez, Trevor Darrell, David M. Chan
The Automated Audio Captioning (AAC) task asks models to generate natural language descriptions of an audio input.
1 code implementation • 18 Jul 2024 • Tsung-Han Wu, Giscard Biamby, Jerome Quenum, Ritwik Gupta, Joseph E. Gonzalez, Trevor Darrell, David M. Chan
MIRAGE demonstrates up to 13% performance improvement over existing open-source LMMs on VHs, sets a new state-of-the-art on the RetVQA multi-image QA benchmark, and achieves competitive performance on single-image QA with state-of-the-art LMMs.
no code implementations • CVPR 2024 • Tsung-Han Wu, Giscard Biamby, David Chan, Lisa Dunlap, Ritwik Gupta, Xudong Wang, Joseph E. Gonzalez, Trevor Darrell
Current open-source Large Multimodal Models (LMMs) excel at tasks such as open-vocabulary language grounding and segmentation but can suffer under false premises when queries imply the existence of something that is not actually present in the image.
1 code implementation • CVPR 2024 • Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell
Steered by an LLM controller, SLD turns text-to-image generation into an iterative closed-loop process, ensuring correctness in the resulting image.
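A hedged sketch of such a closed-loop process; the generator, detector, LLM controller, and editing operators are passed in as callables because their concrete interfaces are assumptions here, not the paper's actual API.

```python
# Sketch of an LLM-steered closed-loop text-to-image process under assumed interfaces.
def closed_loop_generation(prompt, generate, detect, propose_edits, apply_edits,
                           max_rounds=3):
    """Iteratively generate, check the result against the prompt, and self-correct."""
    image = generate(prompt)                       # initial text-to-image draft
    for _ in range(max_rounds):
        detections = detect(image)                 # ground what is actually in the image
        edits = propose_edits(prompt, detections)  # LLM compares prompt vs. detections
        if not edits:                              # no mismatch found: accept the image
            return image
        image = apply_edits(image, edits)          # e.g. add, move, or remove objects
    return image
```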
1 code implementation • 5 Oct 2023 • Tsung-Lin Tsou, Tsung-Han Wu, Winston H. Hsu
In the field of domain adaptation (DA) on 3D object detection, most of the work is dedicated to unsupervised domain adaptation (UDA).
no code implementations • 29 Mar 2023 • Yi-Syuan Liou, Tsung-Han Wu, Jia-Fong Yeh, Wen-Chin Chen, Winston H. Hsu
MuRAL identifies informative regions of various scales to reduce annotation costs for well-learned objects and improve training performance.
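One way to picture multi-scale region selection for active learning; the uncertainty-based scoring rule, region scales, and budget below are illustrative assumptions rather than MuRAL's actual design.

```python
# Hedged sketch: score candidate regions of several sizes by model uncertainty and
# request labels only for the most informative ones.
import numpy as np

def select_regions(uncertainty_map, scales=(32, 64, 128), budget=10):
    """uncertainty_map: (H, W) per-pixel uncertainty from the current detector."""
    candidates = []
    h, w = uncertainty_map.shape
    for s in scales:
        for y in range(0, h - s + 1, s):
            for x in range(0, w - s + 1, s):
                score = uncertainty_map[y:y + s, x:x + s].mean()
                candidates.append((score, (x, y, s)))
    candidates.sort(reverse=True)                  # most uncertain regions first
    return [region for _, region in candidates[:budget]]
```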
1 code implementation • 16 Dec 2022 • Ru-Fen Jheng, Tsung-Han Wu, Jia-Fong Yeh, Winston H. Hsu
Thus, we present a novel task named free-form 3D scene inpainting.
1 code implementation • 8 Oct 2022 • Hsin-Ying Lee, Hung-Ting Su, Bing-Chen Tsai, Tsung-Han Wu, Jia-Fong Yeh, Winston H. Hsu
While recent large-scale video-language pre-training has made great progress in video question answering, the spatial modeling of video-language models remains less fine-grained than that of image-language models, and existing temporal modeling practices suffer from weak and noisy alignment between modalities.
1 code implementation • 27 Sep 2022 • Ching-Yu Tseng, Yi-Rong Chen, Hsin-Ying Lee, Tsung-Han Wu, Wen-Chin Chen, Winston H. Hsu
To achieve accurate 3D object detection at low cost for autonomous driving, many multi-camera methods have been proposed to address the occlusion problem of monocular approaches.
no code implementations • 22 Sep 2022 • Tsung-Han Wu, Hung-Ting Su, Shang-Tse Chen, Winston H. Hsu
Fairness and robustness play vital roles in trustworthy machine learning.
1 code implementation • CVPR 2022 • Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu
Moreover, unlike conventional pixel-wise positional encodings, we introduce a novel depth positional encoding (DPE) to inject depth positional hints into transformers.
3D Object Detection From Monocular Images • Autonomous Driving
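A minimal sketch of the depth positional encoding idea from the entry above, assuming depth values are discretized into bins with one learnable embedding per bin; the bin count, depth range, and feature size are illustrative.

```python
# Sketch of a depth positional encoding: depth bins index learnable embeddings that
# are added to transformer token features.
import torch
import torch.nn as nn

class DepthPositionalEncoding(nn.Module):
    def __init__(self, num_bins=80, d_model=256, max_depth=60.0):
        super().__init__()
        self.num_bins = num_bins
        self.max_depth = max_depth
        self.embed = nn.Embedding(num_bins, d_model)   # one embedding per depth bin

    def forward(self, features, depth):
        # features: (B, N, d_model) tokens; depth: (B, N) estimated depth per token
        bins = (depth / self.max_depth * (self.num_bins - 1)).long()
        bins = bins.clamp(0, self.num_bins - 1)
        return features + self.embed(bins)             # inject depth positional hints
```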
1 code implementation • 14 Feb 2022 • Tsung-Han Wu, Yi-Syuan Liou, Shao-Ji Yuan, Hsin-Ying Lee, Tung-I Chen, Kuan-Chih Huang, Winston H. Hsu
In the field of domain adaptation, a trade-off exists between the model performance and the number of target domain annotations.
no code implementations • 29 Nov 2021 • Guan-Rong Lu, Yueh-Cheng Liu, Tung-I Chen, Hung-Ting Su, Tsung-Han Wu, Winston H. Hsu
We design a new Masked Gradient Update (MGU) module to generate auxiliary data along the boundary of in-distribution data points.
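One plausible reading of the masked gradient update, sketched below with a hypothetical step size and mask ratio: perturb in-distribution samples along a masked subset of the input gradient to synthesize auxiliary points near the data boundary. This is an assumption drawn from the abstract, not the authors' exact module.

```python
# Hedged sketch of a masked gradient step that generates auxiliary boundary samples.
import torch
import torch.nn.functional as F

def masked_gradient_auxiliary(model, x, y, step=0.05, mask_ratio=0.5):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    mask = (torch.rand_like(grad) < mask_ratio).float()    # update only some dimensions
    return (x + step * mask * grad.sign()).detach()        # auxiliary sample near boundary
```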
1 code implementation • ICCV 2021 • Tsung-Han Wu, Yueh-Cheng Liu, Yu-Kai Huang, Hsin-Ying Lee, Hung-Ting Su, Ping-Chia Huang, Winston H. Hsu
Despite the success of deep learning on supervised point cloud semantic segmentation, obtaining large-scale point-by-point manual annotations is still a significant challenge.
no code implementations • CVPR 2021 • Yu-Kai Huang, Yueh-Cheng Liu, Tsung-Han Wu, Hung-Ting Su, Yu-Cheng Chang, Tsung-Lin Tsou, Yu-An Wang, Winston H. Hsu
Dense depth estimation plays a key role in multiple applications such as robotics, 3D reconstruction, and augmented reality.
1 code implementation • 31 Oct 2020 • Yen-Hao Chen, Da-Yi Wu, Tsung-Han Wu, Hung-Yi Lee
With a proper activation as an information bottleneck on content embeddings, the trade-off between the synthesis quality and the speaker similarity of the converted speech is improved drastically.
Audio and Speech Processing • Sound
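A minimal sketch of applying an activation as an information bottleneck on content embeddings in a voice-conversion pipeline; the sigmoid choice, the encoder/decoder interfaces, and the `speaker_stats` argument are illustrative assumptions.

```python
# Sketch: an activation squashes the content embedding, limiting how much source-speaker
# information can leak through to the decoder.
import torch

def bottlenecked_conversion(content_encoder, decoder, source_mel, speaker_stats):
    content = content_encoder(source_mel)   # content embedding of the source speech
    content = torch.sigmoid(content)        # activation acts as an information bottleneck
    return decoder(content, speaker_stats)  # re-synthesize with target-speaker information
```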
no code implementations • 9 Jun 2020 • Tsung-Han Wu, Chun-Chen Hsieh, Yen-Hao Chen, Po-Han Chi, Hung-Yi Lee
In this paper, we seek solutions for reducing the computational complexity of transformer-based models for speech representation learning.
4 code implementations • 18 May 2020 • Po-Han Chi, Pei-Hung Chung, Tsung-Han Wu, Chun-Cheng Hsieh, Yen-Hao Chen, Shang-Wen Li, Hung-Yi Lee
We use the representations for two downstream tasks: speaker identification and phoneme classification.
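A minimal probing sketch for those two downstream tasks, assuming frozen pre-trained representations of shape (num_frames, feat_dim); the linear probes, label counts, and pooling are illustrative.

```python
# Sketch of linear probes on frozen speech representations: an utterance-level speaker
# classifier (mean-pooled) and a frame-level phoneme classifier.
import torch
import torch.nn as nn

feat_dim = 768
speaker_probe = nn.Linear(feat_dim, 100)   # e.g. 100 speakers, utterance-level task
phoneme_probe = nn.Linear(feat_dim, 48)    # e.g. 48 phoneme classes, frame-level task

reps = torch.randn(200, feat_dim)          # stand-in for extracted representations
speaker_logits = speaker_probe(reps.mean(dim=0, keepdim=True))   # pool over frames
phoneme_logits = phoneme_probe(reps)                             # per-frame prediction
```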
no code implementations • 24 Apr 2020 • Yu-Kai Huang, Yueh-Cheng Liu, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu
The performance of image-based stereo estimation suffers from lighting variations, repetitive patterns, and homogeneous appearance.
no code implementations • 25 Jan 2020 • Wei-Tsung Kao, Tsung-Han Wu, Po-Han Chi, Chun-Cheng Hsieh, Hung-Yi Lee
Although Bidirectional Encoder Representations from Transformers (BERT) has achieved tremendous success in many natural language processing (NLP) tasks, it remains a black box.
3 code implementations • 22 Aug 2019 • Yu-Kai Huang, Tsung-Han Wu, Yueh-Cheng Liu, Winston H. Hsu
We utilize a self-attention mechanism, previously used in image inpainting, to extract more useful information in each convolution layer so that the completed depth map is enhanced.
Ranked #2 on Depth Completion on Matterport3D
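A hedged sketch of inserting self-attention into a convolutional depth-completion backbone, in the spirit of the entry above; the channel sizes and the residual formulation are assumptions, not the paper's exact module.

```python
# Sketch: a 2D self-attention block that can be dropped between convolution layers so
# each location aggregates information from the whole feature map.
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C/8)
        k = self.key(x).flatten(2)                     # (B, C/8, HW)
        attn = torch.softmax(q @ k, dim=-1)            # (B, HW, HW) attention weights
        v = self.value(x).flatten(2)                   # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                    # attend, then residual add
```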