Search Results for author: Tsung-Han Wu

Found 20 papers, 10 papers with code

See, Say, and Segment: Teaching LMMs to Overcome False Premises

no code implementations • 13 Dec 2023 • Tsung-Han Wu, Giscard Biamby, David Chan, Lisa Dunlap, Ritwik Gupta, Xudong Wang, Joseph E. Gonzalez, Trevor Darrell

Current open-source Large Multimodal Models (LMMs) excel at tasks such as open-vocabulary language grounding and segmentation but can suffer under false premises when queries imply the existence of something that is not actually present in the image.

Self-correcting LLM-controlled Diffusion Models

no code implementations • 27 Nov 2023 • Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell

Steered by an LLM controller, SLD turns text-to-image generation into an iterative closed-loop process, ensuring correctness in the resulting image.

Attribute Text-to-Image Generation

WLST: Weak Labels Guided Self-training for Weakly-supervised Domain Adaptation on 3D Object Detection

1 code implementation • 5 Oct 2023 • Tsung-Lin Tsou, Tsung-Han Wu, Winston H. Hsu

In the field of domain adaptation (DA) on 3D object detection, most of the work is dedicated to unsupervised domain adaptation (UDA).

3D Object Detection object-detection +1

MuRAL: Multi-Scale Region-based Active Learning for Object Detection

no code implementations • 29 Mar 2023 • Yi-Syuan Liou, Tsung-Han Wu, Jia-Fong Yeh, Wen-Chin Chen, Winston H. Hsu

MuRAL identifies informative regions of various scales to reduce annotation costs for well-learned objects and improve training performance.

Active Learning Object +2

Free-form 3D Scene Inpainting with Dual-stream GAN

1 code implementation • 16 Dec 2022 • Ru-Fen Jheng, Tsung-Han Wu, Jia-Fong Yeh, Winston H. Hsu

Thus, we present a novel task named free-form 3D scene inpainting.

Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling

1 code implementation • 8 Oct 2022 • Hsin-Ying Lee, Hung-Ting Su, Bing-Chen Tsai, Tsung-Han Wu, Jia-Fong Yeh, Winston H. Hsu

While recent large-scale video-language pre-training made great progress in video question answering, the design of spatial modeling of video-language models is less fine-grained than that of image-language models; existing practices of temporal modeling also suffer from weak and noisy alignment between modalities.

Language Modelling Question Answering +1

CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection

1 code implementation • 27 Sep 2022 • Ching-Yu Tseng, Yi-Rong Chen, Hsin-Ying Lee, Tsung-Han Wu, Wen-Chin Chen, Winston H. Hsu

To achieve accurate 3D object detection at a low cost for autonomous driving, many multi-camera methods have been proposed, and they have solved the occlusion problem of monocular approaches.

3D Object Detection Autonomous Driving +5

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

1 code implementation • CVPR 2022 • Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu

Moreover, different from conventional pixel-wise positional encodings, we introduce a novel depth positional encoding (DPE) to inject depth positional hints into transformers.

Autonomous Driving Monocular 3D Object Detection +2

Anomaly-Aware Semantic Segmentation by Leveraging Synthetic-Unknown Data

no code implementations • 29 Nov 2021 • Guan-Rong Lu, Yueh-Cheng Liu, Tung-I Chen, Hung-Ting Su, Tsung-Han Wu, Winston H. Hsu

We design a new Masked Gradient Update (MGU) module to generate auxiliary data along the boundary of in-distribution data points.

Anomaly Detection Autonomous Driving +3

ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation

1 code implementation • ICCV 2021 • Tsung-Han Wu, Yueh-Cheng Liu, Yu-Kai Huang, Hsin-Ying Lee, Hung-Ting Su, Ping-Chia Huang, Winston H. Hsu

Despite the success of deep learning on supervised point cloud semantic segmentation, obtaining large-scale point-by-point manual annotations is still a significant challenge.

Active Learning Scene Understanding +1

AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization

1 code implementation • 31 Oct 2020 • Yen-Hao Chen, Da-Yi Wu, Tsung-Han Wu, Hung-Yi Lee

With a proper activation as an information bottleneck on content embeddings, the trade-off between the synthesis quality and the speaker similarity of the converted speech is improved drastically.

Audio and Speech Processing Sound

Expanding Sparse Guidance for Stereo Matching

no code implementations • 24 Apr 2020 • Yu-Kai Huang, Yueh-Cheng Liu, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu

The performance of image-based stereo estimation suffers from lighting variations, repetitive patterns, and homogeneous appearance.

Domain Adaptation Stereo Matching

BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT

no code implementations • 25 Jan 2020 • Wei-Tsung Kao, Tsung-Han Wu, Po-Han Chi, Chun-Cheng Hsieh, Hung-Yi Lee

Although Bidirectional Encoder Representations from Transformers (BERT) has achieved tremendous success in many natural language processing (NLP) tasks, it remains a black box.

Sentence

Indoor Depth Completion with Boundary Consistency and Self-Attention

3 code implementations • 22 Aug 2019 • Yu-Kai Huang, Tsung-Han Wu, Yueh-Cheng Liu, Winston H. Hsu

We utilize a self-attention mechanism, previously used in image inpainting, to extract more useful information in each convolution layer so that the completed depth map is enhanced.

Depth Completion Depth Estimation +1
