Search Results for author: Min Sun

Found 86 papers, 36 papers with code

No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

no code implementations • 15 Apr 2024 • Yu-Ju Tsai, Jin-Cheng Jhang, Jingjing Zheng, Wei Wang, Albert Y. C. Chen, Min Sun, Cheng-Hao Kuo, Ming-Hsuan Yang

A unique property of our Bi-Layout model is its ability to inherently detect ambiguous regions by comparing the two predictions.

Room Layout Estimation

PoCo: Point Context Cluster for RGBD Indoor Place Recognition

no code implementations • 3 Apr 2024 • Jing Liang, Zhuo Deng, Zheming Zhou, Omid Ghasemalizadeh, Dinesh Manocha, Min Sun, Cheng-Hao Kuo, Arnie Sen

We present a novel end-to-end algorithm (PoCo) for the indoor RGB-D place recognition task, aimed at identifying the most likely match for a given query frame within a reference database.

GDA: Generalized Diffusion for Robust Test-time Adaptation

no code implementations • 29 Mar 2024 • Yun-Yun Tsai, Fu-Chen Chen, Albert Y. C. Chen, Junfeng Yang, Che-Chun Su, Min Sun, Cheng-Hao Kuo

For vision tasks, recent studies have shown that test-time adaptation employing diffusion models can achieve state-of-the-art accuracy improvements on OOD samples by generating new samples that align with the model's domain without the need to modify the model's weights.

Test-time Adaptation

Enhancing Instructional Quality: Leveraging Computer-Assisted Textual Analysis to Generate In-Depth Insights from Educational Artifacts

no code implementations • 6 Mar 2024 • Zewei Tian, Min Sun, Alex Liu, Shawon Sarkar, Jing Liu

This paper explores the transformative potential of computer-assisted textual analysis in enhancing instructional quality through in-depth insights from educational artifacts.

iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views

1 code implementation • 28 Dec 2023 • Chin-Hsuan Wu, Yen-Chun Chen, Bolivar Solarte, Lu Yuan, Min Sun

Our strategy unfolds in three steps: (1) We invert the diffusion model for camera pose estimation instead of synthesizing novel views.

3D Object Reconstruction Novel View Synthesis +2

DreaMo: Articulated 3D Reconstruction From A Single Casual Video

no code implementations • 5 Dec 2023 • Tao Tu, Ming-Feng Li, Chieh Hubert Lin, Yen-Chi Cheng, Min Sun, Ming-Hsuan Yang

In this work, we study articulated 3D shape reconstruction from a single and casually captured internet video, where the subject's view coverage is incomplete.

3D Reconstruction 3D Shape Reconstruction

From Voices to Validity: Leveraging Large Language Models (LLMs) for Textual Analysis of Policy Stakeholder Interviews

no code implementations • 2 Dec 2023 • Alex Liu, Min Sun

Obtaining stakeholders' diverse experiences and opinions about current policy in a timely manner is crucial for policymakers to identify strengths and gaps in resource allocation, thereby supporting effective policy design and implementation.

Sentiment Analysis

Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior

no code implementations • 15 Oct 2023 • Xiaotong Chen, Zheming Zhou, Zhuo Deng, Omid Ghasemalizadeh, Min Sun, Cheng-Hao Kuo, Arnie Sen

Reconstructing transparent objects using affordable RGB-D cameras is a persistent challenge in robotic perception due to inconsistent appearances across views in the RGB domain and inaccurate depth readings in each single-view.

3D Reconstruction Depth Completion +3

PanoMixSwap: Panorama Mixing via Structural Swapping for Indoor Scene Understanding

no code implementations • 18 Sep 2023 • Yu-Cheng Hsieh, Cheng Sun, Suraj Dengale, Min Sun

The volume and diversity of training data are critical for modern deep learning-based methods.

Data Augmentation Scene Understanding +1

Sparse and Privacy-enhanced Representation for Human Pose Estimation

no code implementations • 18 Sep 2023 • Ting-Ying Lin, Lin-Yung Hsieh, Fu-En Wang, Wen-Shen Wuen, Min Sun

We propose a sparse and privacy-enhanced representation for Human Pose Estimation (HPE).

Face Recognition Pose Estimation

ImGeoNet: Image-induced Geometry-aware Voxel Representation for Multi-view 3D Object Detection

no code implementations • ICCV 2023 • Tao Tu, Shun-Po Chuang, Yu-Lun Liu, Cheng Sun, Ke Zhang, Donna Roy, Cheng-Hao Kuo, Min Sun

The results demonstrate that ImGeoNet outperforms the current state-of-the-art multi-view image-based method, ImVoxelNet, on all three datasets in terms of detection accuracy.

Ranked #24 on 3D Object Detection on ScanNetV2

3D Object Detection object-detection

ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation

1 code implementation • 4 Aug 2023 • Xuefeng Hu, Ke Zhang, Lu Xia, Albert Chen, Jiajia Luo, Yuyin Sun, Ken Wang, Nan Qiao, Xiao Zeng, Min Sun, Cheng-Hao Kuo, Ram Nevatia

Large-scale Pre-Training Vision-Language Model such as CLIP has demonstrated outstanding performance in zero-shot classification, e.g., achieving 76.3% top-1 accuracy on ImageNet without seeing any example, which leads to potential benefits to many tasks that have no labeled data.

Image Classification Language Modelling +2

Medium. Permeation: SARS-COV-2 Painting Creation by Generative Model

no code implementations • 22 Apr 2023 • Yuan-Fu Yang, Iuan-Kai Fang, Min Sun, Su-Chu Hsu

We find similarities of color structure and color stacking in the Impressionist paintings and the illustrations of the novel coronavirus by artists around the world.

Generative Adversarial Network

The Study of Highway for Lifelong Multi-Agent Path Finding

no code implementations • 9 Apr 2023 • Ming-Feng Li, Min Sun

However, existing methods encounter exponential growth of runtime and undesirable phenomena of deadlocks and rerouting as the map size or agent density grows.

Multi-Agent Path Finding

VMCML: Video and Music Matching via Cross-Modality Lifting

no code implementations • 22 Mar 2023 • Yi-Shan Lee, Wei-Cheng Tseng, Fu-En Wang, Min Sun

We propose a content-based system for matching video and background music.

Music Recommendation

Bidirectional Alignment for Domain Adaptive Detection with Transformers

1 code implementation • ICCV 2023 • Liqiang He, Wei Wang, Albert Chen, Min Sun, Cheng-Hao Kuo, Sinisa Todorovic

We propose a Bidirectional Alignment for domain adaptive Detection with Transformers (BiADT) to improve cross domain object detection performance.

Object object-detection +1

CC-3DT: Panoramic 3D Object Tracking via Cross-Camera Fusion

no code implementations • 2 Dec 2022 • Tobias Fischer, Yung-Hsu Yang, Suryansh Kumar, Min Sun, Fisher Yu

To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings.

3D Object Tracking Autonomous Vehicles +2

Semiconductor Defect Pattern Classification by Self-Proliferation-and-Attention Neural Network

1 code implementation • 1 Dec 2022 • YuanFu Yang, Min Sun

In this paper, we present a novel architecture that can perform defect classification in a more efficient way.

MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition

1 code implementation • 28 Nov 2022 • Fu-En Wang, Chien-Yi Wang, Min Sun, Shang-Hong Lai

In this paper, we propose MixFairFace framework to improve the fairness in face recognition models.

Attribute Face Recognition +1

A Quantum-Powered Photorealistic Rendering

no code implementations • 7 Nov 2022 • YuanFu Yang, Min Sun

Achieving photorealistic rendering of real-world scenes poses a significant challenge with diverse applications, including mixed reality and virtual reality.

Mixed Reality Numerical Integration

360-MLC: Multi-view Layout Consistency for Self-training and Hyper-parameter Tuning

1 code implementation • 24 Oct 2022 • Bolivar Solarte, Chin-Hsuan Wu, Yueh-Cheng Liu, Yi-Hsuan Tsai, Min Sun

In addition, since ground truth annotations are not available during training nor in testing, we leverage the entropy information in multiple layout estimations as a quantitative metric to measure the geometry consistency of the scene, allowing us to evaluate any layout estimator for hyper-parameter tuning, including model selection without ground truth annotations.

Model Selection Pseudo Label

BiFuse++: Self-supervised and Efficient Bi-projection Fusion for 360 Depth Estimation

1 code implementation • 7 Sep 2022 • Fu-En Wang, Yu-Hsuan Yeh, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

Thus, state-of-the-art frameworks for monocular 360 depth estimation such as bi-projection fusion in BiFuse are proposed.

Ranked #12 on Depth Estimation on Stanford2D3D Panoramic

Monocular Depth Estimation

Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning

1 code implementation • CVPR 2022 • YuanFu Yang, Min Sun

However, the massive expansion of semiconductor manufacturing and the development of new technology will bring many defect wafers.

Defect Detection

Autoregressive 3D Shape Generation via Canonical Mapping

1 code implementation • 5 Apr 2022 • An-Chieh Cheng, Xueting Li, Sifei Liu, Min Sun, Ming-Hsuan Yang

With the capacity of modeling long-range dependencies in sequential data, transformers have shown remarkable performances in a variety of generative tasks such as image, audio, and text generation.

3D Shape Generation Point Cloud Generation +1

Data Efficient 3D Learner via Knowledge Transferred from 2D Model

1 code implementation • 16 Mar 2022 • Ping-Chung Yu, Cheng Sun, Min Sun

In this work, we deal with the data scarcity challenge of 3D tasks by transferring knowledge from strong 2D models via RGB-D images.

Pseudo Label Segmentation +1

CLA-NeRF: Category-Level Articulated Neural Radiance Field

no code implementations • 1 Feb 2022 • Wei-Cheng Tseng, Hung-Ju Liao, Lin Yen-Chen, Min Sun

We propose CLA-NeRF -- a Category-Level Articulated Neural Radiance Field that can perform view synthesis, part segmentation, and articulated pose estimation.

Inverse Rendering Object +1

Meta-CPR: Generalize to Unseen Large Number of Agents with Communication Pattern Recognition Module

no code implementations • 14 Dec 2021 • Wei-Cheng Tseng, Wei Wei, Da-Cheng Juan, Min Sun

The number of agents can grow or an environment sometimes needs to interact with a changing number of agents in real-world scenarios.

Meta Reinforcement Learning reinforcement-learning +1

360-DFPE: Leveraging Monocular 360-Layouts for Direct Floor Plan Estimation

1 code implementation • 12 Dec 2021 • Bolivar Solarte, Yueh-Cheng Liu, Chin-Hsuan Wu, Yi-Hsuan Tsai, Min Sun

We present 360-DFPE, a sequential floor plan estimation method that directly takes 360-images as input without relying on active sensors or 3D information.

Visual Odometry

Leveraging Sequence Embedding and Convolutional Neural Network for Protein Function Prediction

no code implementations • 1 Dec 2021 • Wei-Cheng Tseng, Po-Han Chi, Jia-Hua Wu, Min Sun

In contrast, most of the existing methods delete the rare protein functions to reduce the label space.

Protein Function Prediction

Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction

2 code implementations • CVPR 2022 • Cheng Sun, Min Sun, Hwann-Tzong Chen

Finally, evaluation on five inward-facing benchmarks shows that our method matches, if not surpasses, NeRF's quality, yet it only takes about 15 minutes to train from scratch for a new scene.

Novel View Synthesis

1,243

Dense Prediction with Attentive Feature Aggregation

no code implementations • 1 Nov 2021 • Yung-Hsu Yang, Thomas E. Huang, Min Sun, Samuel Rota Bulò, Peter Kontschieder, Fisher Yu

Our experiments show consistent and significant improvements on challenging semantic segmentation benchmarks, including Cityscapes, BDD100K, and Mapillary Vistas, at negligible computational and parameter overhead.

Boundary Detection Semantic Segmentation

Specialize and Fuse: Pyramidal Output Representation for Semantic Segmentation

no code implementations • ICCV 2021 • Chi-Wei Hsiao, Cheng Sun, Hwann-Tzong Chen, Min Sun

We present a novel pyramidal output representation to ensure parsimony with our "specialize and fuse" process for semantic segmentation.

Segmentation Semantic Segmentation +1

Learning 3D Dense Correspondence via Canonical Point Autoencoder

no code implementations • NeurIPS 2021 • An-Chieh Cheng, Xueting Li, Min Sun, Ming-Hsuan Yang, Sifei Liu

We propose a canonical point autoencoder (CPAE) that predicts dense correspondences between 3D shapes of the same category.

Segmentation

Indoor Panorama Planar 3D Reconstruction via Divide and Conquer

1 code implementation • CVPR 2021 • Cheng Sun, Chi-Wei Hsiao, Ning-Hsu Wang, Min Sun, Hwann-Tzong Chen

Indoor panorama typically consists of human-made structures parallel or perpendicular to gravity.

3D Reconstruction Instance Segmentation +2

LED2-Net: Monocular 360° Layout Estimation via Differentiable Depth Rendering

no code implementations • CVPR 2021 • Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

Although significant progress has been made in room layout estimation, most methods aim to reduce the loss in the 2D pixel coordinate rather than exploiting the room structure in the 3D space.

Depth Estimation Depth Prediction +1

Robust 360-8PA: Redesigning The Normalized 8-point Algorithm for 360-FoV Images

1 code implementation • 22 Apr 2021 • Bolivar Solarte, Chin-Hsuan Wu, Kuan-Wei Lu, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

This paper presents a novel preconditioning strategy for the classic 8-point algorithm (8-PA) for estimating an essential matrix from 360-FoV images (i.e., equirectangular images) in spherical projection.

LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering

1 code implementation • 1 Apr 2021 • Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

Although significant progress has been made in room layout estimation, most methods aim to reduce the loss in the 2D pixel coordinate rather than exploiting the room structure in the 3D space.

Ranked #3 on 3D Room Layouts From A Single RGB Panorama on Stanford2D3D Panoramic

3D Room Layouts From A Single RGB Panorama Depth Estimation +2

104

Monocular Quasi-Dense 3D Object Tracking

1 code implementation • 12 Mar 2021 • Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun

Experiments on our proposed simulation data and real-world benchmarks, including KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking on urban-driving scenarios.

Ranked #7 on Multiple Object Tracking on KITTI Tracking test

3D Object Tracking Autonomous Driving +3

504

Toward Robust Long Range Policy Transfer

no code implementations • 4 Mar 2021 • Wei-Cheng Tseng, Jin-Siang Lin, Yao-Min Feng, Min Sun

We also design two regularization terms to improve the diversity and utilization rate of the primitives in the pre-training phase.

Interactive Radiotherapy Target Delineation with 3D-Fused Context Propagation

no code implementations • 12 Dec 2020 • Chun-Hung Chao, Hsien-Tzu Cheng, Tsung-Ying Ho, Le Lu, Min Sun

The proposed method is evaluated on two published radiotherapy target contouring datasets of nasopharyngeal and esophageal cancer.

Mitigating Forgetting in Online Continual Learning via Instance-Aware Parameterization

no code implementations • NeurIPS 2020 • Hung-Jen Chen, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun

To preserve the knowledge we learn from previous instances, we proposed a method to protect the path by restricting the gradient updates of one instance from overriding past updates calculated from previous instances if these instances are not similar.

Continual Learning

HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features

1 code implementation • CVPR 2021 • Cheng Sun, Min Sun, Hwann-Tzong Chen

We present HoHoNet, a versatile and efficient framework for holistic understanding of an indoor 360-degree panorama using a Latent Horizontal Feature (LHFeat).

Ranked #3 on Semantic Segmentation on Stanford2D3D Panoramic - RGBD

3D Room Layouts From A Single RGB Panorama Depth Estimation +1

100

Lymph Node Gross Tumor Volume Detection in Oncology Imaging via Relationship Learning Using Graph Neural Network

no code implementations • 29 Aug 2020 • Chun-Hung Chao, Zhuotun Zhu, Dazhou Guo, Ke Yan, Tsung-Ying Ho, Jinzheng Cai, Adam P. Harrison, Xianghua Ye, Jing Xiao, Alan Yuille, Min Sun, Le Lu, Dakai Jin

Specifically, we first utilize a 3D convolutional neural network with ROI-pooling to extract the GTV$_{LN}$'s instance-wise appearance features.

Clinical Knowledge

Controllable Image Synthesis via SegVAE

no code implementations • ECCV 2020 • Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, Ming-Hsuan Yang

We also apply an off-the-shelf image-to-image translation model to generate realistic RGB images to better understand the quality of the synthesized semantic maps.

Conditional Image Generation Image-to-Image Translation +2

LayoutMP3D: Layout Annotation of Matterport3D

1 code implementation • 30 Mar 2020 • Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai

Inferring the information of 3D layout from a single equirectangular panorama is crucial for numerous applications of virtual reality or robotics (e.g., scene understanding and navigation).

Scene Understanding

Visual Question Answering on 360° Images

no code implementations • 10 Jan 2020 • Shih-Han Chou, Wei-Lun Chao, Wei-Sheng Lai, Min Sun, Ming-Hsuan Yang

We then study two different VQA models on VQA 360, including one conventional model that takes an equirectangular image (with intrinsic distortion) as input and one dedicated model that first projects a 360 image onto cubemaps and subsequently aggregates the information from multiple spatial resolutions.

Question Answering Visual Question Answering

Bias-Aware Heapified Policy for Active Learning

no code implementations • 18 Nov 2019 • Wen-Yen Chang, Wen-Huan Chiang, Shao-Hao Lu, Tingfan Wu, Min Sun

Last but not least, we investigate the generalization of the HAL policy learned on MNIST dataset by directly applying it on MNIST-M. We show that the agent can generalize and outperform directly-learned policy under constrained labeled sets.

Active Learning

360SD-Net: 360° Stereo Depth Estimation with Learnable Cost Volume

1 code implementation • 11 Nov 2019 • Ning-Hsu Wang, Bolivar Solarte, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

Recently, end-to-end trainable deep neural networks have significantly improved stereo depth estimation for perspective images.

Stereo Depth Estimation

161

360-Indoor: Towards Learning Real-World Objects in 360° Indoor Equirectangular Images

no code implementations • 3 Oct 2019 • Shih-Han Chou, Cheng Sun, Wen-Yen Chang, Wan-Ting Hsu, Min Sun, Jianlong Fu

In this paper, our goal is to provide a standard dataset to facilitate the vision and machine learning communities in 360° domain.

Object object-detection +1

Flat2Layout: Flat Representation for Estimating Layout of General Room Types

no code implementations • 29 May 2019 • Chi-Wei Hsiao, Cheng Sun, Min Sun, Hwann-Tzong Chen

This paper also constructs a benchmark for validating the performance on general layout topologies, where Flat2Layout achieves good performance on general room types.

Point-to-Point Video Generation

3 code implementations • ICCV 2019 • Tsun-Hsuan Wang, Yen-Chi Cheng, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun

We introduce point-to-point video generation that controls the generation process with two control points: the targeted start- and end-frames.

Image Manipulation Video Editing +1

Radiotherapy Target Contouring with Convolutional Gated Graph Neural Network

no code implementations • 5 Apr 2019 • Chun-Hung Chao, Yen-Chi Cheng, Hsien-Tzu Cheng, Chi-Wen Huang, Tsung-Ying Ho, Chen-Kan Tseng, Le Lu, Min Sun

Instead, inspired by the treating methodology of considering meaningful information across slices, we used Gated Graph Neural Network to frame this problem more efficiently.

3D LiDAR and Stereo Fusion using Stereo Matching Network with Conditional Cost Volume Normalization

1 code implementation • 5 Apr 2019 • Tsun-Hsuan Wang, Hou-Ning Hu, Chieh Hubert Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

The complementary characteristics of active and passive depth sensing techniques motivate the fusion of the LiDAR sensor and stereo camera for improved depth perception.

Ranked #3 on Stereo-LiDAR Fusion on KITTI Depth Completion Validation

Depth Completion Stereo-LiDAR Fusion +2

Learning a Multi-Modal Policy via Imitating Demonstrations with Mixed Behaviors

no code implementations • 25 Mar 2019 • Fang-I Hsiao, Jui-Hsuan Kuo, Min Sun

The encoder infers discrete latent factors corresponding to different behaviors from demonstrations.

HorizonNet: Learning Room Layout with 1D Representation and Pano Stretch Data Augmentation

1 code implementation • CVPR 2019 • Cheng Sun, Chi-Wei Hsiao, Min Sun, Hwann-Tzong Chen

We present a new approach to the problem of estimating the 3D room layout from a single panoramic image.

Ranked #4 on 3D Room Layouts From A Single RGB Panorama on PanoContext

3D Room Layouts From A Single RGB Panorama Data Augmentation

309

Plug-and-Play: Improve Depth Estimation via Sparse Data Propagation

2 code implementations • 20 Dec 2018 • Tsun-Hsuan Wang, Fu-En Wang, Juan-Ting Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

We propose a novel plug-and-play (PnP) module for improving depth prediction with taking arbitrary patterns of sparse depths as input.

Depth Estimation Depth Prediction

DuLa-Net: A Dual-Projection Network for Estimating Room Layouts from a Single RGB Panorama

1 code implementation • CVPR 2019 • Shang-Ta Yang, Fu-En Wang, Chi-Han Peng, Peter Wonka, Min Sun, Hung-Kuo Chu

We present a deep learning framework, called DuLa-Net, to predict Manhattan-world 3D room layouts from a single RGB panorama.

Ranked #1 on 3D Room Layouts From A Single RGB Panorama on Realtor360

3D Room Layouts From A Single RGB Panorama

101

InstaNAS: Instance-aware Neural Architecture Search

2 code implementations • 26 Nov 2018 • An-Chieh Cheng, Chieh Hubert Lin, Da-Cheng Juan, Wei Wei, Min Sun

Conventional Neural Architecture Search (NAS) aims at finding a single architecture that achieves the best performance, which usually optimizes task related learning objectives such as accuracy.

Neural Architecture Search

Joint Monocular 3D Vehicle Detection and Tracking

1 code implementation • ICCV 2019 • Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu

The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.

Ranked #12 on Multiple Object Tracking on KITTI Tracking test

3D Object Detection 3D Pose Estimation +4

652

Self-Supervised Learning of Depth and Camera Motion from 360° Videos

no code implementations • 13 Nov 2018 • Fu-En Wang, Hou-Ning Hu, Hsien-Tzu Cheng, Juan-Ting Lin, Shang-Ta Yang, Meng-Li Shih, Hung-Kuo Chu, Min Sun

We propose a novel self-supervised learning approach for predicting the omnidirectional depth and camera motion from a 360° video.

Depth And Camera Motion Depth Prediction +3

Unsupervised Stylish Image Description Generation via Domain Layer Norm

no code implementations • 11 Sep 2018 • Cheng Kuan Chen, Zhu Feng Pan, Min Sun, Ming-Yu Liu

It can learn to generate stylish image descriptions that are more related to image content and can be trained with the arbitrary monolingual corpus without collecting new paired image and stylish descriptions.

Searching Toward Pareto-Optimal Device-Aware Neural Architectures

no code implementations • 29 Aug 2018 • An-Chieh Cheng, Jin-Dong Dong, Chi-Hung Hsu, Shu-Huan Chang, Min Sun, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan

Recent breakthroughs in Neural Architectural Search (NAS) have achieved state-of-the-art performance in many tasks such as image classification and language understanding.

Image Classification

Liquid Pouring Monitoring via Rich Sensory Inputs

no code implementations • ECCV 2018 • Tz-Ying Wu, Juan-Ting Lin, Tsun-Hsuang Wang, Chan-Wei Hu, Juan Carlos Niebles, Min Sun

In the closed-loop system, the ability to monitor the state of the task via rich sensory information is important but often less studied.

Leveraging Motion Priors in Videos for Improving Human Segmentation

no code implementations • ECCV 2018 • Yu-Ting Chen, Wen-Yen Chang, Hai-Lun Lu, Ting-Fan Wu, Min Sun

Recently, a few domain adaptation and active learning approaches have been proposed to mitigate the performance drop.

Active Learning Domain Adaptation +3

Efficient Uncertainty Estimation for Semantic Segmentation in Videos

1 code implementation • ECCV 2018 • Po-Yu Huang, Wan-Ting Hsu, Chun-Yueh Chiu, Ting-Fan Wu, Min Sun

Uncertainty estimation in deep learning has recently become more important.

Ranked #16 on Semantic Segmentation on CamVid

Semantic Segmentation

DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures

no code implementations • ECCV 2018 • Jin-Dong Dong, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun

We propose DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures, optimizing for both device-related (e.g., inference time and memory usage) and device-agnostic (e.g., accuracy and model size) objectives.

Image Classification Language Modelling

Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos

no code implementations • CVPR 2018 • Hsien-Tzu Cheng, Chun-Hung Chao, Jin-Dong Dong, Hao-Kai Wen, Tyng-Luh Liu, Min Sun

Then, we concatenate all six faces while utilizing the connectivity between faces on the cube for image padding (i.e., Cube Padding) in convolution, pooling, convolutional LSTM layers.

Saliency Prediction

A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss

1 code implementation • ACL 2018 • Wan-Ting Hsu, Chieh-Kai Lin, Ming-Ying Lee, Kerui Min, Jing Tang, Min Sun

On the one hand, a simple extractive model can obtain sentence-level attention with high ROUGE scores but is less readable.

Ranked #39 on Abstractive Text Summarization on CNN / Daily Mail

Abstractive Text Summarization Sentence

125

Omnidirectional CNN for Visual Place Recognition and Navigation

no code implementations • 12 Mar 2018 • Tsun-Hsuan Wang, Hung-Jui Huang, Juan-Ting Lin, Chan-Wei Hu, Kuo-Hao Zeng, Min Sun

Given a visual input, the task of the O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and the closest place.

Navigate Visual Place Recognition

Compatibility Family Learning for Item Recommendation and Generation

1 code implementation • 2 Dec 2017 • Yong-Siang Shih, Kai-Yueh Chang, Hsuan-Tien Lin, Min Sun

In our learned space, we introduce a novel Projected Compatibility Distance (PCD) function which is differentiable and ensures diversity by aiming for at least one prototype to be close to a compatible item, whereas none of the prototypes are close to an incompatible item.

Generative Adversarial Network

Self-view Grounding Given a Narrated 360° Video

1 code implementation • 23 Nov 2017 • Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun

The negative log reconstruction loss of the reverse sentence (referred to as "irrelevant loss") is jointly minimized to encourage the reverse sentence to be different from the given sentence.

Sentence Visual Grounding

Anticipating Daily Intention using On-Wrist Motion Triggered Sensing

1 code implementation • ICCV 2017 • Tz-Ying Wu, Ting-An Chien, Cheng-Sheng Chan, Chan-Wei Hu, Min Sun

The core of the system is a novel Recurrent Neural Network (RNN) and Policy Network (PN), where the RNN encodes visual and motion observation to anticipate intention, and the PN parsimoniously triggers the process of visual observation to reduce computation requirement.

Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight

2 code implementations • 2 Oct 2017 • Yen-Chen Lin, Ming-Yu Liu, Min Sun, Jia-Bin Huang

Our core idea is that the adversarial examples targeting at a neural network-based policy are not effective for the frame prediction model.

Autonomous Vehicles Decision Making +2

Visual Forecasting by Imitating Dynamics in Natural Sequences

no code implementations • ICCV 2017 • Kuo-Hao Zeng, William B. Shen, De-An Huang, Min Sun, Juan Carlos Niebles

This allows us to apply IRL at scale and directly imitate the dynamics in high-dimensional continuous visual sequences from the raw pixel values.

Action Anticipation

Deep 360 Pilot: Learning a Deep Agent for Piloting Through 360° Sports Videos

no code implementations • CVPR 2017 • Hou-Ning Hu, Yen-Chen Lin, Ming-Yu Liu, Hsien-Tzu Cheng, Yung-Ju Chang, Min Sun

Given the main object and previously selected viewing angles, our method regresses a shift in viewing angle to move to the next one.

Object

Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization

no code implementations • CVPR 2017 • Kuo-Hao Zeng, Shih-Han Chou, Fu-Hsiang Chan, Juan Carlos Niebles, Min Sun

For survival, a living agent must have the ability to assess risk (1) by temporally anticipating accidents before they occur, and (2) by spatially localizing risky regions in the environment to move away from threats.

Accident Anticipation

Paper
Add Code

Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video

1 code implementation • CVPR 2017 • Hou-Ning Hu, Yen-Chen Lin, Ming-Yu Liu, Hsien-Tzu Cheng, Yung-Ju Chang, Min Sun

Watching a 360° sports video requires a viewer to continuously select a viewing angle, either through a sequence of mouse clicks or head movements.

Paper
Code

Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner

1 code implementation • ICCV 2017 • Tseng-Hung Chen, Yuan-Hong Liao, Ching-Yao Chuang, Wan-Ting Hsu, Jianlong Fu, Min Sun

The domain critic assesses whether the generated sentences are indistinguishable from sentences in the target domain.

Image Captioning Sentence +1

148

No More Discrimination: Cross City Adaptation of Road Scene Segmenters

9 code implementations • ICCV 2017 • Yi-Hsin Chen, Wei-Yu Chen, Yu-Ting Chen, Bo-Cheng Tsai, Yu-Chiang Frank Wang, Min Sun

Despite the recent success of deep-learning based semantic segmentation, deploying a pre-trained road scene segmenter to a city whose images are not presented in the training set would not achieve satisfactory performance due to dataset biases.

Segmentation Semantic Segmentation

837

Tactics of Adversarial Attack on Deep Reinforcement Learning Agents

no code implementations • 8 Mar 2017 • Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun

In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode.

Adversarial Attack Atari Games +2

Learning to Compose with Professional Photographs on the Web

1 code implementation • 1 Feb 2017 • Yi-Ling Chen, Jan Klopp, Min Sun, Shao-Yi Chien, Kwan-Liu Ma

Photo composition is an important factor affecting the aesthetics in photography.

Image Cropping

Leveraging Video Descriptions to Learn Video Question Answering

no code implementations • 12 Nov 2016 • Kuo-Hao Zeng, Tseng-Hung Chen, Ching-Yao Chuang, Yuan-Hong Liao, Juan Carlos Niebles, Min Sun

Then, a large number of candidate QA pairs are automatically generated from descriptions rather than manually annotated.

Question Answering Video Question Answering +1

Title Generation for User Generated Videos

no code implementations • 25 Aug 2016 • Kuo-Hao Zeng, Tseng-Hung Chen, Juan Carlos Niebles, Min Sun

Finally, our sentence augmentation method also outperforms the baselines on the M-VAD dataset.

Sentence Video Captioning

Recognition from Hand Cameras

no code implementations • 7 Dec 2015 • Cheng-Sheng Chan, Shou-Zhong Chen, Pei-Xuan Xie, Chiung-Chih Chang, Min Sun

We have collected a new synchronized HandCam and HeadCam dataset with 20 videos captured in three scenes for hand states recognition.

Learning Hierarchical Linguistic Descriptions of Visual Datasets

no code implementations • WS 2013 • Roni Mittelman, Min Sun, Benjamin Kuipers, Silvio Savarese

Image Retrieval

