Search Results for author: Yi-Hsuan Tsai

Found 67 papers, 39 papers with code

Colorization of Depth Map via Disentanglement

1 code implementation ECCV 2020 Chung-Sheng Lai, Zunzhi You, Ching-Chun Huang, Yi-Hsuan Tsai, Wei-Chen Chiu

Vision perception is one of the most important components for a computer or robot to understand the surrounding scene and achieve autonomous applications.

Colorization Disentanglement

Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation

no code implementations 29 Sep 2024 Jingyi Xu, Hieu Le, Zhixin Shu, Yang Wang, Yi-Hsuan Tsai, Dimitris Samaras

The training signals for this predictor are obtained through our emotion-agnostic intensity pseudo-labeling method, without the need for frame-wise intensity labeling.

Talking Head Generation

Self-training Room Layout Estimation via Geometry-aware Ray-casting

no code implementations 21 Jul 2024 Bolivar Solarte, Chin-Hsuan Wu, Jin-Cheng Jhang, Jonathan Lee, Yi-Hsuan Tsai, Min Sun

In this paper, we introduce a novel geometry-aware self-training framework for room layout estimation models on unseen scenes with unlabeled data.

Room Layout Estimation

Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts

1 code implementation 9 Jul 2024 Shuangkang Fang, Yufeng Wang, Yi-Hsuan Tsai, Yi Yang, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang

Recent work on image content manipulation based on vision-language pre-training models has been effectively extended to text-driven 3D scene editing.

3D Object Editing 3D Reconstruction +4

Gaga: Group Any Gaussians via 3D-aware Memory Bank

no code implementations 11 Apr 2024 Weijie Lyu, Xueting Li, Abhijit Kundu, Yi-Hsuan Tsai, Ming-Hsuan Yang

We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models.

Scene Segmentation Scene Understanding +3

Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance

1 code implementation 12 Dec 2023 Kuan-Chih Huang, Yi-Hsuan Tsai, Ming-Hsuan Yang

Finally, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.

3D Object Detection object-detection

Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection

1 code implementation NeurIPS 2023 Cheng-Ju Ho, Chen-Hsuan Tai, Yen-Yu Lin, Ming-Hsuan Yang, Yi-Hsuan Tsai

Semi-supervised object detection is crucial for 3D scene understanding, efficiently addressing the limitation of acquiring large-scale 3D bounding box annotations.

3D Object Detection Denoising +5

Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes

1 code implementation CVPR 2024 Chi-Hsi Kung, Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen

In this paper, we introduce Action-slot, a slot attention-based approach that learns visual action-centric representations, capturing both motion and contextual information.

Action Recognition

Editing 3D Scenes via Text Prompts without Retraining

no code implementations 10 Sep 2023 Shuangkang Fang, Yufeng Wang, Yi Yang, Yi-Hsuan Tsai, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang

To tackle these issues, we introduce a text-driven editing method, termed DN2N, which allows for the direct acquisition of a NeRF model with universal editing capabilities, eliminating the requirement for retraining.

3D scene Editing 3D Scene Reconstruction +2

Delving into Motion-Aware Matching for Monocular 3D Object Tracking

1 code implementation ICCV 2023 Kuan-Chih Huang, Ming-Hsuan Yang, Yi-Hsuan Tsai

In this paper, we find that the motion cue of objects along different time frames is critical in 3D multi-object tracking, which is less explored in existing monocular-based approaches.

3D Multi-Object Tracking 3D Object Tracking +3

Multimodal Prompting with Missing Modalities for Visual Recognition

2 code implementations CVPR 2023 Yi-Lun Lee, Yi-Hsuan Tsai, Wei-Chen Chiu, Chen-Yu Lee

In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when missing-modality occurs either during training or testing in real-world situations; and 2) when the computation resources are not available to finetune on heavy transformer models.

Learning Object-level Point Augmentor for Semi-supervised 3D Object Detection

1 code implementation 19 Dec 2022 Cheng-Ju Ho, Chen-Hsuan Tai, Yi-Hsuan Tsai, Yen-Yu Lin, Ming-Hsuan Yang

In this work, we propose an object-level point augmentor (OPA) that performs local transformations for semi-supervised 3D object detection.

3D Object Detection Knowledge Distillation +4

Learning Phase Mask for Privacy-Preserving Passive Depth Estimation

no code implementations European Conference on Computer Vision (ECCV) 2022 Zaid Tasneem, Giovanni Milione, Yi-Hsuan Tsai, Xiang Yu, Ashok Veeraraghavan, Manmohan Chandraker, Francesco Pittaluga

With over a billion sold each year, cameras are not only becoming ubiquitous and omnipresent, but are driving progress in a wide range of applications such as augmented/virtual reality, robotics, surveillance, security, autonomous navigation and many others.

Autonomous Navigation Depth Estimation +2

360-MLC: Multi-view Layout Consistency for Self-training and Hyper-parameter Tuning

1 code implementation 24 Oct 2022 Bolivar Solarte, Chin-Hsuan Wu, Yueh-Cheng Liu, Yi-Hsuan Tsai, Min Sun

In addition, since ground truth annotations are not available during training nor in testing, we leverage the entropy information in multiple layout estimations as a quantitative metric to measure the geometry consistency of the scene, allowing us to evaluate any layout estimator for hyper-parameter tuning, including model selection without ground truth annotations.

Model Selection Pseudo Label

3D-PL: Domain Adaptive Depth Estimation with 3D-aware Pseudo-Labeling

1 code implementation 19 Sep 2022 Yu-Ting Yen, Chia-Ni Lu, Wei-Chen Chiu, Yi-Hsuan Tsai

In this paper, we develop a domain adaptation framework via generating reliable pseudo ground truths of depth from real data to provide direct supervisions.

Monocular Depth Estimation Point Cloud Completion +1

On Generalizing Beyond Domains in Cross-Domain Continual Learning

no code implementations CVPR 2022 Christian Simon, Masoud Faraki, Yi-Hsuan Tsai, Xiang Yu, Samuel Schulter, Yumin Suh, Mehrtash Harandi, Manmohan Chandraker

Humans have the ability to accumulate knowledge of new tasks in varying conditions, but deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.

Continual Learning Knowledge Distillation

Learning Semantic Segmentation from Multiple Datasets with Label Shifts

no code implementations 28 Feb 2022 Dongwan Kim, Yi-Hsuan Tsai, Yumin Suh, Masoud Faraki, Sparsh Garg, Manmohan Chandraker, Bohyung Han

First, a gradient conflict in training due to mismatched label spaces is identified and a class-independent binary cross-entropy loss is proposed to alleviate such label conflicts.

Diversity Semantic Segmentation

Self-Supervised Feature Learning from Partial Point Clouds via Pose Disentanglement

no code implementations 9 Jan 2022 Meng-Shiun Tsai, Pei-Ze Chiang, Yi-Hsuan Tsai, Wei-Chen Chiu

Self-supervised learning on point clouds has gained a lot of attention recently, since it addresses the label-efficiency and domain-gap problems on point cloud tasks.

Disentanglement Self-Supervised Learning

360-DFPE: Leveraging Monocular 360-Layouts for Direct Floor Plan Estimation

1 code implementation 12 Dec 2021 Bolivar Solarte, Yueh-Cheng Liu, Chin-Hsuan Wu, Yi-Hsuan Tsai, Min Sun

We present 360-DFPE, a sequential floor plan estimation method that directly takes 360-images as input without relying on active sensors or 3D information.

Visual Odometry

Semi-supervised Multi-task Learning for Semantics and Depth

no code implementations 14 Oct 2021 Yufeng Wang, Yi-Hsuan Tsai, Wei-Chih Hung, Wenrui Ding, Shuo Liu, Ming-Hsuan Yang

Multi-Task Learning (MTL) aims to enhance the model generalization by sharing representations between related tasks for better performance.

Depth Estimation Multi-Task Learning +1

Towards Interpretable Deep Networks for Monocular Depth Estimation

1 code implementation ICCV 2021 Zunzhi You, Yi-Hsuan Tsai, Wei-Chen Chiu, Guanbin Li

Based on our observations, we quantify the interpretability of a deep MDE network by the depth selectivity of its hidden units.

Monocular Depth Estimation

End-to-end Multi-modal Video Temporal Grounding

1 code implementation NeurIPS 2021 Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Specifically, we adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.

Optical Flow Estimation Self-Supervised Learning

LED2-Net: Monocular 360° Layout Estimation via Differentiable Depth Rendering

no code implementations CVPR 2021 Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

Although significant progress has been made in room layout estimation, most methods aim to reduce the loss in the 2D pixel coordinate rather than exploiting the room structure in the 3D space.

Depth Estimation Depth Prediction +1

Robust 360-8PA: Redesigning The Normalized 8-point Algorithm for 360-FoV Images

1 code implementation 22 Apr 2021 Bolivar Solarte, Chin-Hsuan Wu, Kuan-Wei Lu, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

This paper presents a novel preconditioning strategy for the classic 8-point algorithm (8-PA) for estimating an essential matrix from 360-FoV images (i.e., equirectangular images) in spherical projection.

Understanding Synonymous Referring Expressions via Contrastive Features

1 code implementation 20 Apr 2021 Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

While prior work usually treats each sentence and attends it to an object separately, we focus on learning a referring expression comprehension model that considers the property in synonymous sentences.

Object Referring Expression +3

LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering

1 code implementation 1 Apr 2021 Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

Although significant progress has been made in room layout estimation, most methods aim to reduce the loss in the 2D pixel coordinate rather than exploiting the room structure in the 3D space.

3D Room Layouts From A Single RGB Panorama Depth Estimation +2

Cross-Domain Similarity Learning for Face Recognition in Unseen Domains

no code implementations CVPR 2021 Masoud Faraki, Xiang Yu, Yi-Hsuan Tsai, Yumin Suh, Manmohan Chandraker

Intuitively, it discriminatively correlates explicit metrics derived from one domain, with triplet samples from another domain in a unified loss function to be minimized within a network, which leads to better alignment of the training domains.

Face Recognition Metric Learning +1

Dual-Stream Fusion Network for Spatiotemporal Video Super-Resolution

1 code implementation Winter Conference on Applications of Computer Vision (WACV) 2021 Min-Yuan Tseng, Yen-Chung Chen, Yi-Lun Lee, Wei-Sheng Lai, Yi-Hsuan Tsai, Wei-Chen Chiu

Our method is based on an important observation: although a direct cascade of prior methods for spatial and temporal super-resolution can achieve spatiotemporal upsampling, changing the order in which they are combined leads to results with a complementary property.

Image Super-Resolution Video Super-Resolution

Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector

1 code implementation ECCV 2020 Cheng-Chun Hsu, Yi-Hsuan Tsai, Yen-Yu Lin, Ming-Hsuan Yang

A domain adaptive object detector aims to adapt itself to unseen domains that may contain variations of object appearance, viewpoints or backgrounds.

Domain Adaptation

Object Detection with a Unified Label Space from Multiple Datasets

no code implementations ECCV 2020 Xiangyun Zhao, Samuel Schulter, Gaurav Sharma, Yi-Hsuan Tsai, Manmohan Chandraker, Ying Wu

To address this challenge, we design a framework which works with such partial annotations, and we exploit a pseudo labeling approach that we adapt for our specific case.

Object object-detection +1

Learning to Caricature via Semantic Shape Transform

1 code implementation 12 Aug 2020 Wenqing Chu, Wei-Chih Hung, Yi-Hsuan Tsai, Yu-Ting Chang, Yijun Li, Deng Cai, Ming-Hsuan Yang

Caricature is an artistic drawing created to abstract or exaggerate facial features of a person.

Caricature

Domain Adaptive Semantic Segmentation Using Weak Labels

no code implementations ECCV 2020 Sujoy Paul, Yi-Hsuan Tsai, Samuel Schulter, Amit K. Roy-Chowdhury, Manmohan Chandraker

In this work, we propose a novel framework for domain adaptation in semantic segmentation with image-level weak labels in the target domain.

Segmentation Semantic Segmentation +1

Regularizing Meta-Learning via Gradient Dropout

1 code implementation 13 Apr 2020 Hung-Yu Tseng, Yi-Wen Chen, Yi-Hsuan Tsai, Sifei Liu, Yen-Yu Lin, Ming-Hsuan Yang

With the growing attention on learning-to-learn new tasks using only a few examples, meta-learning has been widely used in numerous problems such as few-shot classification, reinforcement learning, and domain generalization.

Domain Generalization Meta-Learning +1

LayoutMP3D: Layout Annotation of Matterport3D

1 code implementation 30 Mar 2020 Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

Inferring the information of 3D layout from a single equirectangular panorama is crucial for numerous applications of virtual reality or robotics (e.g., scene understanding and navigation).

Scene Understanding

Adversarial Learning of Privacy-Preserving and Task-Oriented Representations

no code implementations 22 Nov 2019 Taihong Xiao, Yi-Hsuan Tsai, Kihyuk Sohn, Manmohan Chandraker, Ming-Hsuan Yang

For instance, there could be a potential privacy risk of machine learning systems via the model inversion attack, whose goal is to reconstruct the input data from the latent representation of deep networks.

Attribute BIG-bench Machine Learning +2

Referring Expression Object Segmentation with Caption-Aware Consistency

1 code implementation 10 Oct 2019 Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin, Ming-Hsuan Yang

To this end, we propose an end-to-end trainable comprehension network that consists of the language and visual encoders to extract feature representations from both domains.

Caption Generation Object +4

Adaptation Across Extreme Variations using Unlabeled Domain Bridges

no code implementations 5 Jun 2019 Shuyang Dai, Kihyuk Sohn, Yi-Hsuan Tsai, Lawrence Carin, Manmohan Chandraker

We tackle an unsupervised domain adaptation problem for which the domain discrepancy between labeled source and unlabeled target domains is large, due to many factors of inter and intra-domain variation.

Object Recognition Semantic Segmentation +1

Bridging Stereo Matching and Optical Flow via Spatiotemporal Correspondence

1 code implementation CVPR 2019 Hsueh-Ying Lai, Yi-Hsuan Tsai, Wei-Chen Chiu

In this paper, we propose a single and principled network to jointly learn spatiotemporal correspondence for stereo matching and flow estimation, with a newly designed geometric connection as the unsupervised signal for temporally adjacent stereo pairs.

Optical Flow Estimation Scene Understanding +2

Weakly-supervised Caricature Face Parsing through Domain Adaptation

1 code implementation 13 May 2019 Wenqing Chu, Wei-Chih Hung, Yi-Hsuan Tsai, Deng Cai, Ming-Hsuan Yang

However, current state-of-the-art face parsing methods require large amounts of pixel-level labeled data, and collecting such labels for caricatures is tedious and labor-intensive.

Attribute Caricature +3

Domain Adaptation for Structured Output via Disentangled Patch Representations

no code implementations ICLR 2019 Yi-Hsuan Tsai, Kihyuk Sohn, Samuel Schulter, Manmohan Chandraker

To this end, we propose to learn discriminative feature representations of patches based on label histograms in the source domain, through the construction of a disentangled space.

Domain Adaptation Semantic Segmentation

Active Adversarial Domain Adaptation

no code implementations 16 Apr 2019 Jong-Chyi Su, Yi-Hsuan Tsai, Kihyuk Sohn, Buyu Liu, Subhransu Maji, Manmohan Chandraker

Our approach, active adversarial domain adaptation (AADA), explores a duality between two related problems: adversarial domain alignment and importance sampling for adapting models across domains.

Active Learning Diversity +4

3D LiDAR and Stereo Fusion using Stereo Matching Network with Conditional Cost Volume Normalization

1 code implementation 5 Apr 2019 Tsun-Hsuan Wang, Hou-Ning Hu, Chieh Hubert Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

The complementary characteristics of active and passive depth sensing techniques motivate the fusion of the LiDAR sensor and stereo camera for improved depth perception.

Depth Completion Stereo-LiDAR Fusion +2

Domain Adaptation for Structured Output via Discriminative Patch Representations

8 code implementations ICCV 2019 Yi-Hsuan Tsai, Kihyuk Sohn, Samuel Schulter, Manmohan Chandraker

Predicting structured outputs such as semantic segmentation relies on expensive per-pixel annotations to learn supervised models like convolutional neural networks.

Domain Adaptation Segmentation +2

Unseen Object Segmentation in Videos via Transferable Representations

no code implementations 8 Jan 2019 Yi-Wen Chen, Yi-Hsuan Tsai, Chu-Ya Yang, Yen-Yu Lin, Ming-Hsuan Yang

The entire process is decomposed into two tasks: 1) solving a submodular function for selecting object-like segments, and 2) learning a CNN model with a transferable module for adapting seen categories in the source domain to the unseen target video.

Object Segmentation +1

Plug-and-Play: Improve Depth Estimation via Sparse Data Propagation

2 code implementations 20 Dec 2018 Tsun-Hsuan Wang, Fu-En Wang, Juan-Ting Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

We propose a novel plug-and-play (PnP) module for improving depth prediction by taking arbitrary patterns of sparse depths as input.

Depth Estimation Depth Prediction

Learning Video-Story Composition via Recurrent Neural Network

no code implementations 31 Jan 2018 Guangyu Zhong, Yi-Hsuan Tsai, Sifei Liu, Zhixun Su, Ming-Hsuan Yang

In this paper, we propose a learning-based method to compose a video-story from a group of video clips that describe an activity or experience.

Learning Binary Residual Representations for Domain-specific Video Streaming

no code implementations 14 Dec 2017 Yi-Hsuan Tsai, Ming-Yu Liu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz

Specifically, we target a streaming setting where the videos to be streamed from a server to a client are all in the same domain and they have to be compressed to a small size for low-latency transmission.

Video Compression

Scene Parsing with Global Context Embedding

1 code implementation ICCV 2017 Wei-Chih Hung, Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang

We present a scene parsing method that utilizes global context information based on both parametric and non-parametric models.

Scene Parsing

SegFlow: Joint Learning for Video Object Segmentation and Optical Flow

1 code implementation ICCV 2017 Jingchun Cheng, Yi-Hsuan Tsai, Shengjin Wang, Ming-Hsuan Yang

This paper proposes an end-to-end trainable network, SegFlow, for simultaneously predicting pixel-wise object segmentation and optical flow in videos.

Image Segmentation Object +7

Learning to Segment Instances in Videos with Spatial Propagation Network

no code implementations 14 Sep 2017 Jingchun Cheng, Sifei Liu, Yi-Hsuan Tsai, Wei-Chih Hung, Shalini De Mello, Jinwei Gu, Jan Kautz, Shengjin Wang, Ming-Hsuan Yang

In addition, we apply a filter on the refined score map that aims to recognize the best connected region using spatial and temporal consistencies in the video.

Object Segmentation +1

Video Segmentation via Object Flow

no code implementations CVPR 2016 Yi-Hsuan Tsai, Ming-Hsuan Yang, Michael J. Black

Video object segmentation is challenging due to fast moving objects, deforming shapes, and cluttered backgrounds.

Ranked #74 on Semi-Supervised Video Object Segmentation on DAVIS 2016 (using extra training data)

Object Optical Flow Estimation +5

Adaptive Region Pooling for Object Detection

no code implementations CVPR 2015 Yi-Hsuan Tsai, Onur C. Hamsici, Ming-Hsuan Yang

Learning models for object detection is a challenging problem due to the large intra-class variability of objects in appearance, viewpoints, and rigidity.

Object object-detection +1
