no code implementations • 22 Mar 2023 • Yi-Shan Lee, Wei-Cheng Tseng, Fu-En Wang, Min Sun
We propose a content-based system for matching video and background music.
no code implementations • 2 Dec 2022 • Tobias Fischer, Yung-Hsu Yang, Suryansh Kumar, Min Sun, Fisher Yu
To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings.
1 code implementation • 1 Dec 2022 • YuanFu Yang, Min Sun
In this paper, we present a novel architecture that can perform defect classification in a more efficient way.
1 code implementation • 28 Nov 2022 • Fu-En Wang, Chien-Yi Wang, Min Sun, Shang-Hong Lai
In this paper, we propose the MixFairFace framework to improve fairness in face recognition models.
no code implementations • 7 Nov 2022 • YuanFu Yang, Min Sun
Photorealistic rendering of real-world scenes is a tremendous challenge with a wide range of applications, including mixed reality (MR) and virtual reality (VR).
no code implementations • 24 Oct 2022 • Bolivar Solarte, Chin-Hsuan Wu, Yueh-Cheng Liu, Yi-Hsuan Tsai, Min Sun
In addition, since ground truth annotations are available neither during training nor during testing, we leverage the entropy information in multiple layout estimations as a quantitative metric to measure the geometry consistency of the scene, allowing us to evaluate any layout estimator for hyper-parameter tuning, including model selection, without ground truth annotations.
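The entropy-based consistency idea above can be illustrated with a minimal sketch. It is not the authors' metric: the top-down grid, the `cell_size` parameter, and the function name `layout_entropy` are assumptions made for this example. The intuition shown is only that layout estimates which agree concentrate their points in few grid cells (low entropy), while inconsistent estimates spread out (high entropy).

```python
import math
from collections import Counter


def layout_entropy(points, cell_size=0.1):
    """Shannon entropy (in nats) of 2D points binned on a regular grid.

    `points` is an iterable of (x, y) coordinates, e.g. layout boundary
    points pooled from several estimates (hypothetical input format).
    """
    # Assign each point to a grid cell; math.floor handles negative coordinates.
    cells = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    total = sum(cells.values())
    # H = -sum(p * log p) over the occupied cells.
    return -sum((n / total) * math.log(n / total) for n in cells.values())


# Hypothetical data: eight estimates that agree on one location...
consistent = [(1.0, 2.0)] * 8
# ...versus eight estimates that each fall into a different cell.
scattered = [(0.05 + 0.1 * i, 0.05) for i in range(8)]

print(layout_entropy(consistent) < layout_entropy(scattered))
```

Because the score needs no annotations, a lower value could serve as a proxy when comparing estimators or hyper-parameter settings, under the assumptions stated above.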
1 code implementation • 7 Sep 2022 • Fu-En Wang, Yu-Hsuan Yeh, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun
Thus, state-of-the-art frameworks for monocular 360 depth estimation such as bi-projection fusion in BiFuse are proposed.
Ranked #12 on Depth Estimation on Stanford2D3D Panoramic
1 code implementation • CVPR 2022 • YuanFu Yang, Min Sun
However, the massive expansion of semiconductor manufacturing and the development of new technology will produce many defective wafers.
1 code implementation • 5 Apr 2022 • An-Chieh Cheng, Xueting Li, Sifei Liu, Min Sun, Ming-Hsuan Yang
With their capacity for modeling long-range dependencies in sequential data, transformers have shown remarkable performance in a variety of generative tasks such as image, audio, and text generation.
1 code implementation • 16 Mar 2022 • Ping-Chung Yu, Cheng Sun, Min Sun
In this work, we deal with the data scarcity challenge of 3D tasks by transferring knowledge from strong 2D models via RGB-D images.
no code implementations • 1 Feb 2022 • Wei-Cheng Tseng, Hung-Ju Liao, Lin Yen-Chen, Min Sun
We propose CLA-NeRF -- a Category-Level Articulated Neural Radiance Field that can perform view synthesis, part segmentation, and articulated pose estimation.
no code implementations • 14 Dec 2021 • Wei-Cheng Tseng, Wei Wei, Da-Cheng Juan, Min Sun
The number of agents can grow or an environment sometimes needs to interact with a changing number of agents in real-world scenarios.
1 code implementation • 12 Dec 2021 • Bolivar Solarte, Yueh-Cheng Liu, Chin-Hsuan Wu, Yi-Hsuan Tsai, Min Sun
We present 360-DFPE, a sequential floor plan estimation method that directly takes 360-images as input without relying on active sensors or 3D information.
no code implementations • 1 Dec 2021 • Wei-Cheng Tseng, Po-Han Chi, Jia-Hua Wu, Min Sun
In contrast, most of the existing methods delete the rare protein functions to reduce the label space.
2 code implementations • CVPR 2022 • Cheng Sun, Min Sun, Hwann-Tzong Chen
Finally, evaluation on five inward-facing benchmarks shows that our method matches, if not surpasses, NeRF's quality, yet it only takes about 15 minutes to train from scratch for a new scene.
no code implementations • 1 Nov 2021 • Yung-Hsu Yang, Thomas E. Huang, Min Sun, Samuel Rota Bulò, Peter Kontschieder, Fisher Yu
Our experiments show consistent and significant improvements on challenging semantic segmentation benchmarks, including Cityscapes, BDD100K, and Mapillary Vistas, at negligible computational and parameter overhead.
no code implementations • ICCV 2021 • Chi-Wei Hsiao, Cheng Sun, Hwann-Tzong Chen, Min Sun
We present a novel pyramidal output representation to ensure parsimony with our "specialize and fuse" process for semantic segmentation.
no code implementations • NeurIPS 2021 • An-Chieh Cheng, Xueting Li, Min Sun, Ming-Hsuan Yang, Sifei Liu
We propose a canonical point autoencoder (CPAE) that predicts dense correspondences between 3D shapes of the same category.
1 code implementation • CVPR 2021 • Cheng Sun, Chi-Wei Hsiao, Ning-Hsu Wang, Min Sun, Hwann-Tzong Chen
Indoor panoramas typically consist of human-made structures parallel or perpendicular to gravity.
no code implementations • CVPR 2021 • Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai
Although significant progress has been made in room layout estimation, most methods aim to reduce the loss in the 2D pixel coordinate rather than exploiting the room structure in the 3D space.
1 code implementation • 22 Apr 2021 • Bolivar Solarte, Chin-Hsuan Wu, Kuan-Wei Lu, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai
This paper presents a novel preconditioning strategy for the classic 8-point algorithm (8-PA) for estimating an essential matrix from 360-FoV images (i.e., equirectangular images) in spherical projection.
1 code implementation • 1 Apr 2021 • Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai
Although significant progress has been made in room layout estimation, most methods aim to reduce the loss in the 2D pixel coordinate rather than exploiting the room structure in the 3D space.
3D Room Layouts From A Single RGB Panorama • Depth Estimation • +2
1 code implementation • 12 Mar 2021 • Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun
Experiments on our proposed simulation data and real-world benchmarks, including KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking in urban-driving scenarios.
Ranked #6 on Multiple Object Tracking on KITTI Tracking test
no code implementations • 4 Mar 2021 • Wei-Cheng Tseng, Jin-Siang Lin, Yao-Min Feng, Min Sun
We also design two regularization terms to improve the diversity and utilization rate of the primitives in the pre-training phase.
no code implementations • 12 Dec 2020 • Chun-Hung Chao, Hsien-Tzu Cheng, Tsung-Ying Ho, Le Lu, Min Sun
The proposed method is evaluated on two published radiotherapy target contouring datasets of nasopharyngeal and esophageal cancer.
no code implementations • NeurIPS 2020 • Hung-Jen Chen, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun
To preserve the knowledge we learn from previous instances, we propose a method to protect the path by restricting the gradient updates of one instance from overriding past updates calculated from previous instances if these instances are not similar.
1 code implementation • CVPR 2021 • Cheng Sun, Min Sun, Hwann-Tzong Chen
We present HoHoNet, a versatile and efficient framework for holistic understanding of an indoor 360-degree panorama using a Latent Horizontal Feature (LHFeat).
3D Room Layouts From A Single RGB Panorama • Depth Estimation • +1
no code implementations • 29 Aug 2020 • Chun-Hung Chao, Zhuotun Zhu, Dazhou Guo, Ke Yan, Tsung-Ying Ho, Jinzheng Cai, Adam P. Harrison, Xianghua Ye, Jing Xiao, Alan Yuille, Min Sun, Le Lu, Dakai Jin
Specifically, we first utilize a 3D convolutional neural network with ROI-pooling to extract the GTV$_{LN}$'s instance-wise appearance features.
no code implementations • ECCV 2020 • Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, Ming-Hsuan Yang
We also apply an off-the-shelf image-to-image translation model to generate realistic RGB images to better understand the quality of the synthesized semantic maps.
1 code implementation • 30 Mar 2020 • Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai
Inferring the information of 3D layout from a single equirectangular panorama is crucial for numerous applications of virtual reality or robotics (e.g., scene understanding and navigation).
no code implementations • 10 Jan 2020 • Shih-Han Chou, Wei-Lun Chao, Wei-Sheng Lai, Min Sun, Ming-Hsuan Yang
We then study two different VQA models on VQA 360, including one conventional model that takes an equirectangular image (with intrinsic distortion) as input and one dedicated model that first projects a 360 image onto cubemaps and subsequently aggregates the information from multiple spatial resolutions.
no code implementations • 18 Nov 2019 • Wen-Yen Chang, Wen-Huan Chiang, Shao-Hao Lu, Tingfan Wu, Min Sun
Last but not least, we investigate the generalization of the HAL policy learned on the MNIST dataset by directly applying it to MNIST-M. We show that the agent can generalize and outperform a directly learned policy under constrained labeled sets.
1 code implementation • 11 Nov 2019 • Ning-Hsu Wang, Bolivar Solarte, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun
Recently, end-to-end trainable deep neural networks have significantly improved stereo depth estimation for perspective images.
no code implementations • 3 Oct 2019 • Shih-Han Chou, Cheng Sun, Wen-Yen Chang, Wan-Ting Hsu, Min Sun, Jianlong Fu
In this paper, our goal is to provide a standard dataset to support the vision and machine learning communities in the 360° domain.
no code implementations • 29 May 2019 • Chi-Wei Hsiao, Cheng Sun, Min Sun, Hwann-Tzong Chen
This paper also constructs a benchmark for validating the performance on general layout topologies, where Flat2Layout achieves good performance on general room types.
1 code implementation • 5 Apr 2019 • Tsun-Hsuan Wang, Hou-Ning Hu, Chieh Hubert Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun
The complementary characteristics of active and passive depth sensing techniques motivate the fusion of the LiDAR sensor and stereo camera for improved depth perception.
no code implementations • 5 Apr 2019 • Chun-Hung Chao, Yen-Chi Cheng, Hsien-Tzu Cheng, Chi-Wen Huang, Tsung-Ying Ho, Chen-Kan Tseng, Le Lu, Min Sun
Instead, inspired by the treatment methodology of considering meaningful information across slices, we use a Gated Graph Neural Network to frame this problem more efficiently.
3 code implementations • ICCV 2019 • Tsun-Hsuan Wang, Yen-Chi Cheng, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun
We introduce point-to-point video generation that controls the generation process with two control points: the targeted start- and end-frames.
no code implementations • 25 Mar 2019 • Fang-I Hsiao, Jui-Hsuan Kuo, Min Sun
The encoder infers discrete latent factors corresponding to different behaviors from demonstrations.
1 code implementation • CVPR 2019 • Cheng Sun, Chi-Wei Hsiao, Min Sun, Hwann-Tzong Chen
We present a new approach to the problem of estimating the 3D room layout from a single panoramic image.
3D Room Layouts From A Single RGB Panorama • Data Augmentation
2 code implementations • 20 Dec 2018 • Tsun-Hsuan Wang, Fu-En Wang, Juan-Ting Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun
We propose a novel plug-and-play (PnP) module for improving depth prediction by taking arbitrary patterns of sparse depths as input.
1 code implementation • CVPR 2019 • Shang-Ta Yang, Fu-En Wang, Chi-Han Peng, Peter Wonka, Min Sun, Hung-Kuo Chu
We present a deep learning framework, called DuLa-Net, to predict Manhattan-world 3D room layouts from a single RGB panorama.
1 code implementation • ICCV 2019 • Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu
The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.
Ranked #11 on Multiple Object Tracking on KITTI Tracking test
2 code implementations • 26 Nov 2018 • An-Chieh Cheng, Chieh Hubert Lin, Da-Cheng Juan, Wei Wei, Min Sun
Conventional Neural Architecture Search (NAS) aims at finding a single architecture that achieves the best performance, usually by optimizing task-related learning objectives such as accuracy.
no code implementations • 13 Nov 2018 • Fu-En Wang, Hou-Ning Hu, Hsien-Tzu Cheng, Juan-Ting Lin, Shang-Ta Yang, Meng-Li Shih, Hung-Kuo Chu, Min Sun
We propose a novel self-supervised learning approach for predicting the omnidirectional depth and camera motion from a 360° video.
no code implementations • 11 Sep 2018 • Cheng Kuan Chen, Zhu Feng Pan, Min Sun, Ming-Yu Liu
It can learn to generate stylish image descriptions that are more related to image content and can be trained with an arbitrary monolingual corpus without collecting new pairs of images and stylish descriptions.
no code implementations • 29 Aug 2018 • An-Chieh Cheng, Jin-Dong Dong, Chi-Hung Hsu, Shu-Huan Chang, Min Sun, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan
Recent breakthroughs in Neural Architectural Search (NAS) have achieved state-of-the-art performance in many tasks such as image classification and language understanding.
no code implementations • ECCV 2018 • Tz-Ying Wu, Juan-Ting Lin, Tsun-Hsuang Wang, Chan-Wei Hu, Juan Carlos Niebles, Min Sun
In the closed-loop system, the ability to monitor the state of the task via rich sensory information is important but often less studied.
no code implementations • ECCV 2018 • Yu-Ting Chen, Wen-Yen Chang, Hai-Lun Lu, Ting-Fan Wu, Min Sun
Recently, a few domain adaptation and active learning approaches have been proposed to mitigate the performance drop.
1 code implementation • ECCV 2018 • Po-Yu Huang, Wan-Ting Hsu, Chun-Yueh Chiu, Ting-Fan Wu, Min Sun
Uncertainty estimation in deep learning has recently become more important.
Ranked #15 on Semantic Segmentation on CamVid
no code implementations • ECCV 2018 • Jin-Dong Dong, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun
We propose DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures, optimizing for both device-related (e.g., inference time and memory usage) and device-agnostic (e.g., accuracy and model size) objectives.
no code implementations • CVPR 2018 • Hsien-Tzu Cheng, Chun-Hung Chao, Jin-Dong Dong, Hao-Kai Wen, Tyng-Luh Liu, Min Sun
Then, we concatenate all six faces while utilizing the connectivity between faces on the cube for image padding (i.e., Cube Padding) in convolution, pooling, and convolutional LSTM layers.
1 code implementation • ACL 2018 • Wan-Ting Hsu, Chieh-Kai Lin, Ming-Ying Lee, Kerui Min, Jing Tang, Min Sun
On the one hand, a simple extractive model can obtain sentence-level attention with high ROUGE scores, but its output is less readable.
Ranked #36 on Abstractive Text Summarization on CNN / Daily Mail
no code implementations • 12 Mar 2018 • Tsun-Hsuan Wang, Hung-Jui Huang, Juan-Ting Lin, Chan-Wei Hu, Kuo-Hao Zeng, Min Sun
Given a visual input, the task of the O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and the closest place.
no code implementations • 2 Dec 2017 • Yong-Siang Shih, Kai-Yueh Chang, Hsuan-Tien Lin, Min Sun
In our learned space, we introduce a novel Projected Compatibility Distance (PCD) function which is differentiable and ensures diversity by aiming for at least one prototype to be close to a compatible item, whereas none of the prototypes are close to an incompatible item.
1 code implementation • 23 Nov 2017 • Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun
The negative log reconstruction loss of the reverse sentence (referred to as "irrelevant loss") is jointly minimized to encourage the reverse sentence to be different from the given sentence.
1 code implementation • ICCV 2017 • Tz-Ying Wu, Ting-An Chien, Cheng-Sheng Chan, Chan-Wei Hu, Min Sun
The core of the system is a novel Recurrent Neural Network (RNN) and Policy Network (PN), where the RNN encodes visual and motion observation to anticipate intention, and the PN parsimoniously triggers the process of visual observation to reduce computation requirement.
1 code implementation • 2 Oct 2017 • Yen-Chen Lin, Ming-Yu Liu, Min Sun, Jia-Bin Huang
Our core idea is that adversarial examples targeting a neural network-based policy are not effective against the frame prediction model.
no code implementations • ICCV 2017 • Kuo-Hao Zeng, William B. Shen, De-An Huang, Min Sun, Juan Carlos Niebles
This allows us to apply IRL at scale and directly imitate the dynamics in high-dimensional continuous visual sequences from the raw pixel values.
no code implementations • CVPR 2017 • Hou-Ning Hu, Yen-Chen Lin, Ming-Yu Liu, Hsien-Tzu Cheng, Yung-Ju Chang, Min Sun
Given the main object and previously selected viewing angles, our method regresses a shift in viewing angle to move to the next one.
no code implementations • CVPR 2017 • Kuo-Hao Zeng, Shih-Han Chou, Fu-Hsiang Chan, Juan Carlos Niebles, Min Sun
For survival, a living agent must have the ability to assess risk (1) by temporally anticipating accidents before they occur, and (2) by spatially localizing risky regions in the environment to move away from threats.
1 code implementation • CVPR 2017 • Hou-Ning Hu, Yen-Chen Lin, Ming-Yu Liu, Hsien-Tzu Cheng, Yung-Ju Chang, Min Sun
Watching a 360° sports video requires a viewer to continuously select a viewing angle, either through a sequence of mouse clicks or head movements.
1 code implementation • ICCV 2017 • Tseng-Hung Chen, Yuan-Hong Liao, Ching-Yao Chuang, Wan-Ting Hsu, Jianlong Fu, Min Sun
The domain critic assesses whether the generated sentences are indistinguishable from sentences in the target domain.
10 code implementations • ICCV 2017 • Yi-Hsin Chen, Wei-Yu Chen, Yu-Ting Chen, Bo-Cheng Tsai, Yu-Chiang Frank Wang, Min Sun
Despite the recent success of deep-learning-based semantic segmentation, deploying a pre-trained road scene segmenter to a city whose images are not present in the training set would not achieve satisfactory performance due to dataset biases.
no code implementations • 8 Mar 2017 • Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun
In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode.
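The timing idea described above, attacking only at a small subset of time steps, can be sketched as follows. This is an illustrative assumption, not the paper's stated procedure: the selection rule (attack when the gap between the agent's most and least preferred action probability exceeds a threshold), the `threshold` and `budget` parameters, and the function name `select_attack_steps` are all introduced here for the example.

```python
def select_attack_steps(action_probs, threshold=0.5, budget=2):
    """Return indices of time steps to attack.

    `action_probs` holds one list of action probabilities per time step
    (hypothetical input format). A step is chosen when the agent strongly
    prefers one action, up to `budget` steps per episode.
    """
    chosen = []
    for t, probs in enumerate(action_probs):
        if len(chosen) >= budget:
            break  # attack budget for this episode is used up
        gap = max(probs) - min(probs)  # how decisive the agent is at step t
        if gap > threshold:
            chosen.append(t)
    return chosen


# Hypothetical five-step episode with three actions.
episode = [
    [0.34, 0.33, 0.33],  # indifferent: not worth attacking
    [0.90, 0.05, 0.05],  # decisive
    [0.50, 0.30, 0.20],
    [0.05, 0.15, 0.80],  # decisive
    [0.97, 0.02, 0.01],  # decisive, but the budget may already be spent
]
steps = select_attack_steps(episode)
print(steps)
```

The budget keeps the attacked subset small, which matches the stated goal of perturbing only a few time steps; how the perturbation itself is crafted is outside this sketch.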
1 code implementation • 1 Feb 2017 • Yi-Ling Chen, Jan Klopp, Min Sun, Shao-Yi Chien, Kwan-Liu Ma
Photo composition is an important factor affecting the aesthetics in photography.
no code implementations • 12 Nov 2016 • Kuo-Hao Zeng, Tseng-Hung Chen, Ching-Yao Chuang, Yuan-Hong Liao, Juan Carlos Niebles, Min Sun
Then, a large number of candidate QA pairs are automatically generated from descriptions rather than manually annotated.
no code implementations • 25 Aug 2016 • Kuo-Hao Zeng, Tseng-Hung Chen, Juan Carlos Niebles, Min Sun
Finally, our sentence augmentation method also outperforms the baselines on the M-VAD dataset.
no code implementations • 7 Dec 2015 • Cheng-Sheng Chan, Shou-Zhong Chen, Pei-Xuan Xie, Chiung-Chih Chang, Min Sun
We have collected a new synchronized HandCam and HeadCam dataset with 20 videos captured in three scenes for hand state recognition.