Search Results for author: Min Sun

Found 56 papers, 23 papers with code

Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction

1 code implementation 22 Nov 2021 Cheng Sun, Min Sun, Hwann-Tzong Chen

Finally, evaluation on five inward-facing benchmarks shows that our method matches, if not surpasses, NeRF's quality, yet it only takes about 15 minutes to train from scratch for a new scene.

Novel View Synthesis

Specialize and Fuse: Pyramidal Output Representation for Semantic Segmentation

no code implementations ICCV 2021 Chi-Wei Hsiao, Cheng Sun, Hwann-Tzong Chen, Min Sun

We present a novel pyramidal output representation to ensure parsimony with our "specialize and fuse" process for semantic segmentation.

Semantic Segmentation Unity

Learning 3D Dense Correspondence via Canonical Point Autoencoder

no code implementations NeurIPS 2021 An-Chieh Cheng, Xueting Li, Min Sun, Ming-Hsuan Yang, Sifei Liu

We propose a canonical point autoencoder (CPAE) that predicts dense correspondences between 3D shapes of the same category.

LED2-Net: Monocular 360° Layout Estimation via Differentiable Depth Rendering

no code implementations CVPR 2021 Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

Although significant progress has been made in room layout estimation, most methods aim to reduce the loss in the 2D pixel coordinate rather than exploiting the room structure in the 3D space.

Depth Estimation Room Layout Estimation

Robust 360-8PA: Redesigning The Normalized 8-point Algorithm for 360-FoV Images

1 code implementation 22 Apr 2021 Bolivar Solarte, Chin-Hsuan Wu, Kuan-Wei Lu, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

This paper presents a novel preconditioning strategy for the classic 8-point algorithm (8-PA) for estimating an essential matrix from 360-FoV images (i.e., equirectangular images) in spherical projection.

LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering

1 code implementation 1 Apr 2021 Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

Although significant progress has been made in room layout estimation, most methods aim to reduce the loss in the 2D pixel coordinate rather than exploiting the room structure in the 3D space.

Depth Estimation Room Layout Estimation

Monocular Quasi-Dense 3D Object Tracking

1 code implementation 12 Mar 2021 Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun

Experiments on our proposed simulation data and real-world benchmarks, including KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking on urban-driving scenarios.

3D Object Tracking Autonomous Driving +2

Toward Robust Long Range Policy Transfer

no code implementations 4 Mar 2021 Wei-Cheng Tseng, Jin-Siang Lin, Yao-Min Feng, Min Sun

We also design two regularization terms to improve the diversity and utilization rate of the primitives in the pre-training phase.

Hierarchical structure

Interactive Radiotherapy Target Delineation with 3D-Fused Context Propagation

no code implementations 12 Dec 2020 Chun-Hung Chao, Hsien-Tzu Cheng, Tsung-Ying Ho, Le Lu, Min Sun

The proposed method is evaluated on two published radiotherapy target contouring datasets of nasopharyngeal and esophageal cancer.

Mitigating Forgetting in Online Continual Learning via Instance-Aware Parameterization

no code implementations NeurIPS 2020 Hung-Jen Chen, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun

To preserve the knowledge we learn from previous instances, we proposed a method to protect the path by restricting the gradient updates of one instance from overriding past updates calculated from previous instances if these instances are not similar.

Continual Learning Fine-tuning

HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features

1 code implementation CVPR 2021 Cheng Sun, Min Sun, Hwann-Tzong Chen

We present HoHoNet, a versatile and efficient framework for holistic understanding of an indoor 360-degree panorama using a Latent Horizontal Feature (LHFeat).

3D Room Layouts From A Single RGB Panorama Depth Estimation +1

Lymph Node Gross Tumor Volume Detection in Oncology Imaging via Relationship Learning Using Graph Neural Network

no code implementations 29 Aug 2020 Chun-Hung Chao, Zhuotun Zhu, Dazhou Guo, Ke Yan, Tsung-Ying Ho, Jinzheng Cai, Adam P. Harrison, Xianghua Ye, Jing Xiao, Alan Yuille, Min Sun, Le Lu, Dakai Jin

Specifically, we first utilize a 3D convolutional neural network with ROI-pooling to extract the GTV$_{LN}$'s instance-wise appearance features.

Controllable Image Synthesis via SegVAE

no code implementations ECCV 2020 Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, Ming-Hsuan Yang

We also apply an off-the-shelf image-to-image translation model to generate realistic RGB images to better understand the quality of the synthesized semantic maps.

Conditional Image Generation Image-to-Image Translation +1

LayoutMP3D: Layout Annotation of Matterport3D

1 code implementation 30 Mar 2020 Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

Inferring the information of 3D layout from a single equirectangular panorama is crucial for numerous applications of virtual reality or robotics (e.g., scene understanding and navigation).

Scene Understanding Virtual Reality

Visual Question Answering on 360° Images

no code implementations 10 Jan 2020 Shih-Han Chou, Wei-Lun Chao, Wei-Sheng Lai, Min Sun, Ming-Hsuan Yang

We then study two different VQA models on VQA 360, including one conventional model that takes an equirectangular image (with intrinsic distortion) as input and one dedicated model that first projects a 360 image onto cubemaps and subsequently aggregates the information from multiple spatial resolutions.

Question Answering Visual Question Answering

Bias-Aware Heapified Policy for Active Learning

no code implementations 18 Nov 2019 Wen-Yen Chang, Wen-Huan Chiang, Shao-Hao Lu, Tingfan Wu, Min Sun

Last but not least, we investigate the generalization of the HAL policy learned on MNIST dataset by directly applying it on MNIST-M. We show that the agent can generalize and outperform directly-learned policy under constrained labeled sets.

Active Learning

360SD-Net: 360° Stereo Depth Estimation with Learnable Cost Volume

1 code implementation 11 Nov 2019 Ning-Hsu Wang, Bolivar Solarte, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

Recently, end-to-end trainable deep neural networks have significantly improved stereo depth estimation for perspective images.

Stereo Depth Estimation

360-Indoor: Towards Learning Real-World Objects in 360° Indoor Equirectangular Images

no code implementations 3 Oct 2019 Shih-Han Chou, Cheng Sun, Wen-Yen Chang, Wan-Ting Hsu, Min Sun, Jianlong Fu

In this paper, our goal is to provide a standard dataset to facilitate the vision and machine learning communities in 360° domain.

Object Detection

Flat2Layout: Flat Representation for Estimating Layout of General Room Types

no code implementations 29 May 2019 Chi-Wei Hsiao, Cheng Sun, Min Sun, Hwann-Tzong Chen

This paper also constructs a benchmark for validating the performance on general layout topologies, where Flat2Layout achieves good performance on general room types.

Point-to-Point Video Generation

3 code implementations ICCV 2019 Tsun-Hsuan Wang, Yen-Chi Cheng, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun

We introduce point-to-point video generation that controls the generation process with two control points: the targeted start- and end-frames.

Image Manipulation Video Editing +1

Radiotherapy Target Contouring with Convolutional Gated Graph Neural Network

no code implementations 5 Apr 2019 Chun-Hung Chao, Yen-Chi Cheng, Hsien-Tzu Cheng, Chi-Wen Huang, Tsung-Ying Ho, Chen-Kan Tseng, Le Lu, Min Sun

Instead, inspired by the treating methodology of considering meaningful information across slices, we used Gated Graph Neural Network to frame this problem more efficiently.

3D LiDAR and Stereo Fusion using Stereo Matching Network with Conditional Cost Volume Normalization

1 code implementation 5 Apr 2019 Tsun-Hsuan Wang, Hou-Ning Hu, Chieh Hubert Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

The complementary characteristics of active and passive depth sensing techniques motivate the fusion of the LiDAR sensor and stereo camera for improved depth perception.

Depth Completion Stereo-LiDAR Fusion +2

Learning a Multi-Modal Policy via Imitating Demonstrations with Mixed Behaviors

no code implementations 25 Mar 2019 Fang-I Hsiao, Jui-Hsuan Kuo, Min Sun

The encoder infers discrete latent factors corresponding to different behaviors from demonstrations.

Plug-and-Play: Improve Depth Estimation via Sparse Data Propagation

2 code implementations 20 Dec 2018 Tsun-Hsuan Wang, Fu-En Wang, Juan-Ting Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

We propose a novel plug-and-play (PnP) module for improving depth prediction with taking arbitrary patterns of sparse depths as input.

Depth Estimation

Joint Monocular 3D Vehicle Detection and Tracking

1 code implementation ICCV 2019 Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu

The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.

3D Object Detection 3D Pose Estimation +4

InstaNAS: Instance-aware Neural Architecture Search

2 code implementations 26 Nov 2018 An-Chieh Cheng, Chieh Hubert Lin, Da-Cheng Juan, Wei Wei, Min Sun

Conventional Neural Architecture Search (NAS) aims at finding a single architecture that achieves the best performance, which usually optimizes task related learning objectives such as accuracy.

Neural Architecture Search

Self-Supervised Learning of Depth and Camera Motion from 360° Videos

no code implementations 13 Nov 2018 Fu-En Wang, Hou-Ning Hu, Hsien-Tzu Cheng, Juan-Ting Lin, Shang-Ta Yang, Meng-Li Shih, Hung-Kuo Chu, Min Sun

We propose a novel self-supervised learning approach for predicting the omnidirectional depth and camera motion from a 360° video.

Depth And Camera Motion Motion Estimation +2

Unsupervised Stylish Image Description Generation via Domain Layer Norm

no code implementations 11 Sep 2018 Cheng Kuan Chen, Zhu Feng Pan, Min Sun, Ming-Yu Liu

It can learn to generate stylish image descriptions that are more related to image content and can be trained with the arbitrary monolingual corpus without collecting new paired image and stylish descriptions.

Searching Toward Pareto-Optimal Device-Aware Neural Architectures

no code implementations 29 Aug 2018 An-Chieh Cheng, Jin-Dong Dong, Chi-Hung Hsu, Shu-Huan Chang, Min Sun, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan

Recent breakthroughs in Neural Architectural Search (NAS) have achieved state-of-the-art performance in many tasks such as image classification and language understanding.

Image Classification Language understanding

Liquid Pouring Monitoring via Rich Sensory Inputs

no code implementations ECCV 2018 Tz-Ying Wu, Juan-Ting Lin, Tsun-Hsuang Wang, Chan-Wei Hu, Juan Carlos Niebles, Min Sun

In the closed-loop system, the ability to monitor the state of the task via rich sensory information is important but often less studied.

DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures

no code implementations ECCV 2018 Jin-Dong Dong, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun

We propose DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures, optimizing for both device-related (e.g., inference time and memory usage) and device-agnostic (e.g., accuracy and model size) objectives.

Image Classification Language Modelling

Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos

no code implementations CVPR 2018 Hsien-Tzu Cheng, Chun-Hung Chao, Jin-Dong Dong, Hao-Kai Wen, Tyng-Luh Liu, Min Sun

Then, we concatenate all six faces while utilizing the connectivity between faces on the cube for image padding (i.e., Cube Padding) in convolution, pooling, convolutional LSTM layers.

Saliency Prediction

Omnidirectional CNN for Visual Place Recognition and Navigation

no code implementations 12 Mar 2018 Tsun-Hsuan Wang, Hung-Jui Huang, Juan-Ting Lin, Chan-Wei Hu, Kuo-Hao Zeng, Min Sun

Given a visual input, the task of the O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and the closest place.

Visual Place Recognition

Compatibility Family Learning for Item Recommendation and Generation

no code implementations 2 Dec 2017 Yong-Siang Shih, Kai-Yueh Chang, Hsuan-Tien Lin, Min Sun

In our learned space, we introduce a novel Projected Compatibility Distance (PCD) function which is differentiable and ensures diversity by aiming for at least one prototype to be close to a compatible item, whereas none of the prototypes are close to an incompatible item.

Self-view Grounding Given a Narrated 360° Video

1 code implementation 23 Nov 2017 Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun

The negative log reconstruction loss of the reverse sentence (referred to as "irrelevant loss") is jointly minimized to encourage the reverse sentence to be different from the given sentence.

Visual Grounding

Anticipating Daily Intention using On-Wrist Motion Triggered Sensing

1 code implementation ICCV 2017 Tz-Ying Wu, Ting-An Chien, Cheng-Sheng Chan, Chan-Wei Hu, Min Sun

The core of the system is a novel Recurrent Neural Network (RNN) and Policy Network (PN), where the RNN encodes visual and motion observation to anticipate intention, and the PN parsimoniously triggers the process of visual observation to reduce computation requirement.

Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight

1 code implementation 2 Oct 2017 Yen-Chen Lin, Ming-Yu Liu, Min Sun, Jia-Bin Huang

Our core idea is that the adversarial examples targeting at a neural network-based policy are not effective for the frame prediction model.

Autonomous Vehicles Decision Making

Visual Forecasting by Imitating Dynamics in Natural Sequences

no code implementations ICCV 2017 Kuo-Hao Zeng, William B. Shen, De-An Huang, Min Sun, Juan Carlos Niebles

This allows us to apply IRL at scale and directly imitate the dynamics in high-dimensional continuous visual sequences from the raw pixel values.

Action Anticipation

Deep 360 Pilot: Learning a Deep Agent for Piloting Through 360° Sports Videos

no code implementations CVPR 2017 Hou-Ning Hu, Yen-Chen Lin, Ming-Yu Liu, Hsien-Tzu Cheng, Yung-Ju Chang, Min Sun

Given the main object and previously selected viewing angles, our method regresses a shift in viewing angle to move to the next one.

Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization

no code implementations CVPR 2017 Kuo-Hao Zeng, Shih-Han Chou, Fu-Hsiang Chan, Juan Carlos Niebles, Min Sun

For survival, a living agent must have the ability to assess risk (1) by temporally anticipating accidents before they occur, and (2) by spatially localizing risky regions in the environment to move away from threats.

Accident Anticipation

Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video

1 code implementation CVPR 2017 Hou-Ning Hu, Yen-Chen Lin, Ming-Yu Liu, Hsien-Tzu Cheng, Yung-Ju Chang, Min Sun

Watching a 360° sports video requires a viewer to continuously select a viewing angle, either through a sequence of mouse clicks or head movements.

No More Discrimination: Cross City Adaptation of Road Scene Segmenters

9 code implementations ICCV 2017 Yi-Hsin Chen, Wei-Yu Chen, Yu-Ting Chen, Bo-Cheng Tsai, Yu-Chiang Frank Wang, Min Sun

Despite the recent success of deep-learning based semantic segmentation, deploying a pre-trained road scene segmenter to a city whose images are not presented in the training set would not achieve satisfactory performance due to dataset biases.

Semantic Segmentation

Tactics of Adversarial Attack on Deep Reinforcement Learning Agents

no code implementations 8 Mar 2017 Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun

In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode.

Adversarial Attack Atari Games

Learning to Compose with Professional Photographs on the Web

1 code implementation 1 Feb 2017 Yi-Ling Chen, Jan Klopp, Min Sun, Shao-Yi Chien, Kwan-Liu Ma

Photo composition is an important factor affecting the aesthetics in photography.

Image Cropping

Title Generation for User Generated Videos

no code implementations 25 Aug 2016 Kuo-Hao Zeng, Tseng-Hung Chen, Juan Carlos Niebles, Min Sun

Finally, our sentence augmentation method also outperforms the baselines on the M-VAD dataset.

Video Captioning

Recognition from Hand Cameras

no code implementations 7 Dec 2015 Cheng-Sheng Chan, Shou-Zhong Chen, Pei-Xuan Xie, Chiung-Chih Chang, Min Sun

We have collected a new synchronized HandCam and HeadCam dataset with 20 videos captured in three scenes for hand states recognition.
