Semiconductor Defect Pattern Classification by Self-Proliferation-and-Attention Neural Network

1 code implementation · 1 Dec 2022 · YuanFu Yang, Min Sun

In this paper, we present a novel architecture that can perform defect classification more efficiently.

MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition

1 code implementation · 28 Nov 2022 · Fu-En Wang, Chien-Yi Wang, Min Sun, Shang-Hong Lai

In this paper, we propose the MixFairFace framework to improve fairness in face recognition models.

Face Recognition, Fairness

QRF: Implicit Neural Representations with Quantum Radiance Fields

no code implementations · 7 Nov 2022 · YuanFu Yang, Min Sun

Photorealistic rendering of real-world scenes is a tremendous challenge with a wide range of applications, including mixed reality (MR) and virtual reality (VR).

Mixed Reality

360-MLC: Multi-view Layout Consistency for Self-training and Hyper-parameter Tuning

no code implementations · 24 Oct 2022 · Bolivar Solarte, Chin-Hsuan Wu, Yueh-Cheng Liu, Yi-Hsuan Tsai, Min Sun

In addition, since ground-truth annotations are available neither during training nor during testing, we leverage the entropy of multiple layout estimations as a quantitative metric of the scene's geometric consistency. This allows us to evaluate any layout estimator for hyper-parameter tuning, including model selection, without ground-truth annotations.

Model Selection, Pseudo Label
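The entropy idea in the 360-MLC snippet can be illustrated with a small sketch. It does not reproduce the paper's formulation. It assumes that each of K layout estimates gives, for every image column, a boundary position normalized to [0, 1). Positions are discretized into a hypothetical number of bins, and the per-column Shannon entropy is averaged. Lower values mean the estimates agree more closely.

```python
import math
from collections import Counter
from typing import List

def column_entropy(samples: List[float], num_bins: int = 16) -> float:
    """Shannon entropy (nats) of boundary positions in [0, 1) after binning.

    The binning scheme is an illustrative assumption, not the paper's method.
    """
    bins = Counter(min(int(s * num_bins), num_bins - 1) for s in samples)
    total = len(samples)
    # p * log(1/p) summed over occupied bins; equals 0.0 when all samples agree
    return sum((c / total) * math.log(total / c) for c in bins.values())

def layout_consistency(estimates: List[List[float]], num_bins: int = 16) -> float:
    """Mean per-column entropy across K layout estimates.

    estimates: K lists, each holding W boundary positions in [0, 1).
    """
    num_columns = len(estimates[0])
    per_column = [
        column_entropy([est[w] for est in estimates], num_bins)
        for w in range(num_columns)
    ]
    return sum(per_column) / num_columns
```

For example, three identical estimates such as `[[0.5, 0.5]] * 3` give 0.0. Two estimates that fall into different bins in every column give log 2.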

BiFuse++: Self-supervised and Efficient Bi-projection Fusion for 360 Depth Estimation

1 code implementation · 7 Sep 2022 · Fu-En Wang, Yu-Hsuan Yeh, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

Thus, state-of-the-art frameworks for monocular 360 depth estimation, such as bi-projection fusion in BiFuse, have been proposed.

Monocular Depth Estimation

Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning

1 code implementation · CVPR 2022 · YuanFu Yang, Min Sun

However, the massive expansion of semiconductor manufacturing and the development of new technology will produce many defective wafers.

Defect Detection

Autoregressive 3D Shape Generation via Canonical Mapping

1 code implementation · 5 Apr 2022 · An-Chieh Cheng, Xueting Li, Sifei Liu, Min Sun, Ming-Hsuan Yang

With the capacity to model long-range dependencies in sequential data, transformers have shown remarkable performance in a variety of generative tasks such as image, audio, and text generation.

3D Shape Generation, Point Cloud Generation

Data Efficient 3D Learner via Knowledge Transferred from 2D Model

1 code implementation · 16 Mar 2022 · Ping-Chung Yu, Cheng Sun, Min Sun

In this work, we deal with the data scarcity challenge of 3D tasks by transferring knowledge from strong 2D models via RGB-D images.

Pseudo Label, Semantic Segmentation

CLA-NeRF: Category-Level Articulated Neural Radiance Field

no code implementations · 1 Feb 2022 · Wei-Cheng Tseng, Hung-Ju Liao, Lin Yen-Chen, Min Sun

We propose CLA-NeRF, a Category-Level Articulated Neural Radiance Field that can perform view synthesis, part segmentation, and articulated pose estimation.

Pose Estimation

Meta-CPR: Generalize to Unseen Large Number of Agents with Communication Pattern Recognition Module

no code implementations · 14 Dec 2021 · Wei-Cheng Tseng, Wei Wei, Da-Cheng Juan, Min Sun

In real-world scenarios, the number of agents can grow, or an environment may need to interact with a changing number of agents.

Meta Reinforcement Learning, Reinforcement Learning

360-DFPE: Leveraging Monocular 360-Layouts for Direct Floor Plan Estimation

1 code implementation · 12 Dec 2021 · Bolivar Solarte, Yueh-Cheng Liu, Chin-Hsuan Wu, Yi-Hsuan Tsai, Min Sun

We present 360-DFPE, a sequential floor plan estimation method that directly takes 360-images as input without relying on active sensors or 3D information.

Visual Odometry

Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction

2 code implementations · CVPR 2022 · Cheng Sun, Min Sun, Hwann-Tzong Chen

Finally, evaluation on five inward-facing benchmarks shows that our method matches, if not surpasses, NeRF's quality, yet it only takes about 15 minutes to train from scratch for a new scene.

Novel View Synthesis

Dense Prediction with Attentive Feature Aggregation

no code implementations · 1 Nov 2021 · Yung-Hsu Yang, Thomas E. Huang, Min Sun, Samuel Rota Bulò, Peter Kontschieder, Fisher Yu

Our experiments show consistent and significant improvements on challenging semantic segmentation benchmarks, including Cityscapes, BDD100K, and Mapillary Vistas, at negligible computational and parameter overhead.

Boundary Detection, BSDS500

Specialize and Fuse: Pyramidal Output Representation for Semantic Segmentation

no code implementations · ICCV 2021 · Chi-Wei Hsiao, Cheng Sun, Hwann-Tzong Chen, Min Sun

We present a novel pyramidal output representation to ensure parsimony with our "specialize and fuse" process for semantic segmentation.

Semantic Segmentation, Unity

Learning 3D Dense Correspondence via Canonical Point Autoencoder

no code implementations · NeurIPS 2021 · An-Chieh Cheng, Xueting Li, Min Sun, Ming-Hsuan Yang, Sifei Liu

We propose a canonical point autoencoder (CPAE) that predicts dense correspondences between 3D shapes of the same category.

LED2-Net: Monocular 360° Layout Estimation via Differentiable Depth Rendering

no code implementations · CVPR 2021 · Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

Although significant progress has been made in room layout estimation, most methods aim to reduce the loss in 2D pixel coordinates rather than exploiting the room structure in 3D space.

Depth Estimation, Depth Prediction

Robust 360-8PA: Redesigning The Normalized 8-point Algorithm for 360-FoV Images

1 code implementation · 22 Apr 2021 · Bolivar Solarte, Chin-Hsuan Wu, Kuan-Wei Lu, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

This paper presents a novel preconditioning strategy for the classic 8-point algorithm (8-PA) for estimating an essential matrix from 360-FoV images (i.e., equirectangular images) in spherical projection.
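On 360-FoV images, the 8-point algorithm works with unit bearing vectors rather than pixel coordinates. The sketch below covers two pieces of that setup. The first lifts an equirectangular pixel onto the unit sphere. The axis convention and pixel-center offset are assumptions, because codebases differ on both. The second builds one row of the classic linear system from the epipolar constraint f2^T E f1 = 0. This is the unconditioned baseline, not the paper's preconditioning strategy.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def equirect_to_bearing(u: float, v: float, width: int, height: int) -> Vec3:
    """Lift pixel (u, v) of a width x height equirectangular image to the unit sphere.

    Assumed convention: pixel centers at (u + 0.5, v + 0.5); longitude runs from
    -pi (left edge) to pi (right edge); latitude runs from pi/2 (top) to -pi/2
    (bottom); the camera looks along +z with +y pointing down.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return (
        math.cos(lat) * math.sin(lon),
        -math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )

def epipolar_row(f1: Sequence[float], f2: Sequence[float]) -> List[float]:
    """One row of the 8-point linear system for the constraint f2^T E f1 = 0.

    With E flattened row-major as [E00, E01, ..., E22], the constraint equals
    the dot product of this row with the flattened E.
    """
    return [f2[i] * f1[j] for i in range(3) for j in range(3)]
```

The classic algorithm stacks at least eight such rows into a matrix. It then takes the right singular vector with the smallest singular value as the flattened estimate of E. The snippet's contribution is a new preconditioning of this system, which the sketch leaves out.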

Monocular Quasi-Dense 3D Object Tracking

1 code implementation · 12 Mar 2021 · Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun

Experiments on our proposed simulation data and real-world benchmarks, including KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking in urban driving scenarios.

3D Object Tracking, Association

Toward Robust Long Range Policy Transfer

no code implementations · 4 Mar 2021 · Wei-Cheng Tseng, Jin-Siang Lin, Yao-Min Feng, Min Sun

We also design two regularization terms to improve the diversity and utilization rate of the primitives in the pre-training phase.

Interactive Radiotherapy Target Delineation with 3D-Fused Context Propagation

no code implementations · 12 Dec 2020 · Chun-Hung Chao, Hsien-Tzu Cheng, Tsung-Ying Ho, Le Lu, Min Sun

The proposed method is evaluated on two published radiotherapy target contouring datasets of nasopharyngeal and esophageal cancer.

Mitigating Forgetting in Online Continual Learning via Instance-Aware Parameterization

no code implementations · NeurIPS 2020 · Hung-Jen Chen, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun

To preserve the knowledge learned from previous instances, we propose a method that protects the path by preventing the gradient updates of one instance from overriding past updates computed from previous instances when these instances are not similar.

Continual Learning

HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features

1 code implementation · CVPR 2021 · Cheng Sun, Min Sun, Hwann-Tzong Chen

We present HoHoNet, a versatile and efficient framework for holistic understanding of an indoor 360-degree panorama using a Latent Horizontal Feature (LHFeat).

3D Room Layouts From A Single RGB Panorama, Depth Estimation

Controllable Image Synthesis via SegVAE

no code implementations · ECCV 2020 · Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, Ming-Hsuan Yang

We also apply an off-the-shelf image-to-image translation model to generate realistic RGB images to better understand the quality of the synthesized semantic maps.

Conditional Image Generation, Image-to-Image Translation

LayoutMP3D: Layout Annotation of Matterport3D

1 code implementation · 30 Mar 2020 · Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

Inferring 3D layout information from a single equirectangular panorama is crucial for numerous virtual reality and robotics applications (e.g., scene understanding and navigation).

Scene Understanding

Visual Question Answering on 360° Images

no code implementations · 10 Jan 2020 · Shih-Han Chou, Wei-Lun Chao, Wei-Sheng Lai, Min Sun, Ming-Hsuan Yang

We then study two different VQA models on VQA 360, including one conventional model that takes an equirectangular image (with intrinsic distortion) as input and one dedicated model that first projects a 360 image onto cubemaps and subsequently aggregates the information from multiple spatial resolutions.

Question Answering, Visual Question Answering

Bias-Aware Heapified Policy for Active Learning

no code implementations · 18 Nov 2019 · Wen-Yen Chang, Wen-Huan Chiang, Shao-Hao Lu, Tingfan Wu, Min Sun

Last but not least, we investigate the generalization of the HAL policy learned on the MNIST dataset by directly applying it to MNIST-M. We show that the agent can generalize and outperform a directly learned policy under constrained labeled sets.

Active Learning

360SD-Net: 360° Stereo Depth Estimation with Learnable Cost Volume

1 code implementation · 11 Nov 2019 · Ning-Hsu Wang, Bolivar Solarte, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

Recently, end-to-end trainable deep neural networks have significantly improved stereo depth estimation for perspective images.

Stereo Depth Estimation

360-Indoor: Towards Learning Real-World Objects in 360° Indoor Equirectangular Images

no code implementations · 3 Oct 2019 · Shih-Han Chou, Cheng Sun, Wen-Yen Chang, Wan-Ting Hsu, Min Sun, Jianlong Fu

In this paper, our goal is to provide a standard dataset to support the vision and machine learning communities in the 360° domain.

Object Detection

Flat2Layout: Flat Representation for Estimating Layout of General Room Types

no code implementations · 29 May 2019 · Chi-Wei Hsiao, Cheng Sun, Min Sun, Hwann-Tzong Chen

This paper also constructs a benchmark for validating the performance on general layout topologies, where Flat2Layout achieves good performance on general room types.

Radiotherapy Target Contouring with Convolutional Gated Graph Neural Network

no code implementations · 5 Apr 2019 · Chun-Hung Chao, Yen-Chi Cheng, Hsien-Tzu Cheng, Chi-Wen Huang, Tsung-Ying Ho, Chen-Kan Tseng, Le Lu, Min Sun

Instead, inspired by the treatment methodology of considering meaningful information across slices, we use a Gated Graph Neural Network to frame this problem more efficiently.

Point-to-Point Video Generation

3 code implementations · ICCV 2019 · Tsun-Hsuan Wang, Yen-Chi Cheng, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun

We introduce point-to-point video generation that controls the generation process with two control points: the targeted start- and end-frames.

Image Manipulation, Video Generation

3D LiDAR and Stereo Fusion using Stereo Matching Network with Conditional Cost Volume Normalization

1 code implementation · 5 Apr 2019 · Tsun-Hsuan Wang, Hou-Ning Hu, Chieh Hubert Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

The complementary characteristics of active and passive depth sensing techniques motivate the fusion of the LiDAR sensor and stereo camera for improved depth perception.

Depth Completion, Stereo-LiDAR Fusion

Learning a Multi-Modal Policy via Imitating Demonstrations with Mixed Behaviors

no code implementations · 25 Mar 2019 · Fang-I Hsiao, Jui-Hsuan Kuo, Min Sun

The encoder infers discrete latent factors corresponding to different behaviors from demonstrations.

Plug-and-Play: Improve Depth Estimation via Sparse Data Propagation

2 code implementations · 20 Dec 2018 · Tsun-Hsuan Wang, Fu-En Wang, Juan-Ting Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

We propose a novel plug-and-play (PnP) module that improves depth prediction by taking arbitrary patterns of sparse depths as input.

Depth Estimation, Depth Prediction

Joint Monocular 3D Vehicle Detection and Tracking

1 code implementation · ICCV 2019 · Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu

The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.

3D Object Detection, 3D Pose Estimation

InstaNAS: Instance-aware Neural Architecture Search

2 code implementations · 26 Nov 2018 · An-Chieh Cheng, Chieh Hubert Lin, Da-Cheng Juan, Wei Wei, Min Sun

Conventional Neural Architecture Search (NAS) aims to find a single architecture that achieves the best performance, usually by optimizing task-related learning objectives such as accuracy.

Neural Architecture Search

Self-Supervised Learning of Depth and Camera Motion from 360° Videos

no code implementations · 13 Nov 2018 · Fu-En Wang, Hou-Ning Hu, Hsien-Tzu Cheng, Juan-Ting Lin, Shang-Ta Yang, Meng-Li Shih, Hung-Kuo Chu, Min Sun

We propose a novel self-supervised learning approach for predicting the omnidirectional depth and camera motion from a 360° video.

Depth And Camera Motion, Depth Prediction

Unsupervised Stylish Image Description Generation via Domain Layer Norm

no code implementations · 11 Sep 2018 · Cheng Kuan Chen, Zhu Feng Pan, Min Sun, Ming-Yu Liu

It can learn to generate stylish image descriptions that are more closely related to image content, and it can be trained with an arbitrary monolingual corpus without collecting new pairs of images and stylish descriptions.

Searching Toward Pareto-Optimal Device-Aware Neural Architectures

no code implementations · 29 Aug 2018 · An-Chieh Cheng, Jin-Dong Dong, Chi-Hung Hsu, Shu-Huan Chang, Min Sun, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan

Recent breakthroughs in Neural Architecture Search (NAS) have achieved state-of-the-art performance in many tasks such as image classification and language understanding.

Image Classification

Liquid Pouring Monitoring via Rich Sensory Inputs

no code implementations · ECCV 2018 · Tz-Ying Wu, Juan-Ting Lin, Tsun-Hsuan Wang, Chan-Wei Hu, Juan Carlos Niebles, Min Sun

In the closed-loop system, the ability to monitor the state of the task via rich sensory information is important but often less studied.

DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures

no code implementations · ECCV 2018 · Jin-Dong Dong, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun

We propose DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures, optimizing for both device-related (e.g., inference time and memory usage) and device-agnostic (e.g., accuracy and model size) objectives.

Image Classification, Language Modelling
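The Pareto-optimal search in the DPP-Net snippet relies on the notion of dominance between candidates. The sketch below is a generic illustration, not DPP-Net's progressive search procedure. It reduces a set of hypothetical candidate architectures to the non-dominated set. Each candidate is scored by error rate and inference latency, and both objectives are minimized.

```python
from typing import List, Tuple

# (name, error_rate, latency_ms); names and values are hypothetical
Candidate = Tuple[str, float, float]

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return candidates not dominated by any other candidate (O(n^2) scan)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
```

Take the made-up candidates `("A", 0.05, 30.0)`, `("B", 0.07, 12.0)`, `("C", 0.08, 25.0)`, and `("D", 0.04, 80.0)`. C is dropped because B has both lower error and lower latency. A, B, and D remain as trade-offs. A candidate never dominates itself, so the scan does not need to skip it explicitly.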

Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos

no code implementations · CVPR 2018 · Hsien-Tzu Cheng, Chun-Hung Chao, Jin-Dong Dong, Hao-Kai Wen, Tyng-Luh Liu, Min Sun

Then, we concatenate all six faces while utilizing the connectivity between faces on the cube for image padding (i.e., Cube Padding) in convolution, pooling, and convolutional LSTM layers.

Saliency Prediction

Omnidirectional CNN for Visual Place Recognition and Navigation

no code implementations · 12 Mar 2018 · Tsun-Hsuan Wang, Hung-Jui Huang, Juan-Ting Lin, Chan-Wei Hu, Kuo-Hao Zeng, Min Sun

Given a visual input, the task of the O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and the closest place.

Navigate, Visual Place Recognition

Compatibility Family Learning for Item Recommendation and Generation

no code implementations · 2 Dec 2017 · Yong-Siang Shih, Kai-Yueh Chang, Hsuan-Tien Lin, Min Sun

In our learned space, we introduce a novel Projected Compatibility Distance (PCD) function which is differentiable and ensures diversity by aiming for at least one prototype to be close to a compatible item, whereas none of the prototypes are close to an incompatible item.
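A soft minimum over prototype distances is one way to get a differentiable distance that is small whenever at least one prototype lies close to an item. The sketch below illustrates that idea under stated assumptions and is not the exact PCD formula from the paper. The squared Euclidean distance, the softmin weighting, and the `temperature` parameter are all hypothetical choices.

```python
import math
from typing import List, Sequence

def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def soft_min_distance(prototypes: List[Sequence[float]], item: Sequence[float],
                      temperature: float = 1.0) -> float:
    """Softmin-weighted average of squared distances from prototypes to an item.

    Weights exp(-d_k / T) / sum_j exp(-d_j / T) concentrate on the closest
    prototype, so the result is small if any single prototype is near the item
    and large only if every prototype is far away.
    """
    dists = [squared_distance(p, item) for p in prototypes]
    d_min = min(dists)  # shift by the minimum so exp() cannot underflow to all zeros
    weights = [math.exp(-(d - d_min) / temperature) for d in dists]
    return sum(w * d for w, d in zip(weights, dists)) / sum(weights)
```

The result always lies between the smallest and largest prototype distance, and it moves toward the smallest as `temperature` decreases. A training objective could therefore pull one prototype toward compatible items while pushing all prototypes away from incompatible ones.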

Self-view Grounding Given a Narrated 360° Video

1 code implementation · 23 Nov 2017 · Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun

The negative log reconstruction loss of the reverse sentence (referred to as "irrelevant loss") is jointly minimized to encourage the reverse sentence to be different from the given sentence.

Visual Grounding

Anticipating Daily Intention using On-Wrist Motion Triggered Sensing

1 code implementation · ICCV 2017 · Tz-Ying Wu, Ting-An Chien, Cheng-Sheng Chan, Chan-Wei Hu, Min Sun

The core of the system is a novel Recurrent Neural Network (RNN) and Policy Network (PN), where the RNN encodes visual and motion observation to anticipate intention, and the PN parsimoniously triggers the process of visual observation to reduce computation requirement.

Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight

1 code implementation · 2 Oct 2017 · Yen-Chen Lin, Ming-Yu Liu, Min Sun, Jia-Bin Huang

Our core idea is that adversarial examples targeting a neural network-based policy are not effective for the frame prediction model.

Autonomous Vehicles, Decision Making

Visual Forecasting by Imitating Dynamics in Natural Sequences

no code implementations · ICCV 2017 · Kuo-Hao Zeng, William B. Shen, De-An Huang, Min Sun, Juan Carlos Niebles

This allows us to apply IRL at scale and directly imitate the dynamics in high-dimensional continuous visual sequences from the raw pixel values.

Action Anticipation

Deep 360 Pilot: Learning a Deep Agent for Piloting Through 360° Sports Videos

no code implementations · CVPR 2017 · Hou-Ning Hu, Yen-Chen Lin, Ming-Yu Liu, Hsien-Tzu Cheng, Yung-Ju Chang, Min Sun

Given the main object and previously selected viewing angles, our method regresses a shift in viewing angle to move to the next one.

Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization

no code implementations · CVPR 2017 · Kuo-Hao Zeng, Shih-Han Chou, Fu-Hsiang Chan, Juan Carlos Niebles, Min Sun

For survival, a living agent must have the ability to assess risk (1) by temporally anticipating accidents before they occur, and (2) by spatially localizing risky regions in the environment to move away from threats.

Accident Anticipation

Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video

1 code implementation · CVPR 2017 · Hou-Ning Hu, Yen-Chen Lin, Ming-Yu Liu, Hsien-Tzu Cheng, Yung-Ju Chang, Min Sun

Watching a 360° sports video requires a viewer to continuously select a viewing angle, either through a sequence of mouse clicks or head movements.

No More Discrimination: Cross City Adaptation of Road Scene Segmenters

9 code implementations · ICCV 2017 · Yi-Hsin Chen, Wei-Yu Chen, Yu-Ting Chen, Bo-Cheng Tsai, Yu-Chiang Frank Wang, Min Sun

Despite the recent success of deep-learning-based semantic segmentation, a pre-trained road scene segmenter deployed to a city whose images are not present in the training set does not achieve satisfactory performance due to dataset biases.

Semantic Segmentation

Tactics of Adversarial Attack on Deep Reinforcement Learning Agents

no code implementations · 8 Mar 2017 · Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun

In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode.

Adversarial Attack, Atari Games
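In the strategically-timed attack described above, the adversary must decide when to attack. The sketch below shows one way to make that decision, and it rests on an assumption drawn from our reading of the paper rather than from the snippet. A time step is attacked only when the gap between the agent's most and least likely action probabilities reaches a threshold. The threshold value and the function names are hypothetical, and the step that crafts the adversarial perturbation is out of scope.

```python
from typing import List, Sequence

def action_preference_gap(action_probs: Sequence[float]) -> float:
    """Gap between the most and least likely actions under the agent's policy."""
    return max(action_probs) - min(action_probs)

def select_attack_steps(episode_probs: List[Sequence[float]],
                        threshold: float) -> List[int]:
    """Return the time steps whose preference gap reaches the threshold.

    Only these steps would receive an adversarial perturbation, so the attack
    touches a small subset of the episode where the agent strongly prefers
    one action.
    """
    return [
        t for t, probs in enumerate(episode_probs)
        if action_preference_gap(probs) >= threshold
    ]
```

Consider an episode whose policy outputs are `[0.25, 0.25, 0.25, 0.25]`, `[0.9, 0.05, 0.05]`, and `[0.5, 0.3, 0.2]`. With a threshold of 0.5, only step 1 is selected, because its gap of 0.85 is the only one at or above 0.5. Raising the threshold shrinks the attacked subset further.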

Learning to Compose with Professional Photographs on the Web

1 code implementation · 1 Feb 2017 · Yi-Ling Chen, Jan Klopp, Min Sun, Shao-Yi Chien, Kwan-Liu Ma

Photo composition is an important factor affecting the aesthetics of a photograph.

Image Cropping

Title Generation for User Generated Videos

no code implementations · 25 Aug 2016 · Kuo-Hao Zeng, Tseng-Hung Chen, Juan Carlos Niebles, Min Sun

Finally, our sentence augmentation method also outperforms the baselines on the M-VAD dataset.

Video Captioning

Recognition from Hand Cameras

no code implementations · 7 Dec 2015 · Cheng-Sheng Chan, Shou-Zhong Chen, Pei-Xuan Xie, Chiung-Chih Chang, Min Sun

We have collected a new synchronized HandCam and HeadCam dataset with 20 videos captured in three scenes for hand states recognition.

