Search Results for author: Yuchao Dai

Found 119 papers, 39 papers with code

Forward Flow for Novel View Synthesis of Dynamic Scenes

no code implementations ICCV 2023 Xiang Guo, Jiadai Sun, Yuchao Dai, GuanYing Chen, Xiaoqing Ye, Xiao Tan, Errui Ding, Yumeng Zhang, Jingdong Wang

This paper proposes a neural radiance field (NeRF) approach for novel view synthesis of dynamic scenes using forward warping.

Novel View Synthesis

Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion

no code implementations 5 Sep 2023 YuFei Wang, Yuxin Mao, Qi Liu, Yuchao Dai

The decomposed filters not only maintain the favorable properties of guided dynamic filters as being content-dependent and spatially-variant, but also reduce model parameters and hardware costs, as the learned adaptors are decoupled from the number of feature channels.

Depth Completion object-detection +2

Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion Modeling

no code implementations 18 Aug 2023 Haorui Ji, Hui Deng, Yuchao Dai, Hongdong Li

Most of the previous 3D human pose estimation work relied on the powerful memory capability of the network to obtain suitable 2D-3D mappings from the training data.

3D Human Pose Estimation 3D Pose Estimation

Improving Audio-Visual Segmentation with Bidirectional Generation

no code implementations 16 Aug 2023 Dawei Hao, Yuxin Mao, Bowen He, Xiaodong Han, Yuchao Dai, Yiran Zhong

In this paper, inspired by the human ability to mentally simulate the sound of an object and its visual appearance, we introduce a bidirectional generation framework.

Motion Estimation Optical Flow Estimation +1

Digging into Depth Priors for Outdoor Neural Radiance Fields

no code implementations 8 Aug 2023 Chen Wang, Jiadai Sun, Lina Liu, Chenming Wu, Zhelun Shen, Dayan Wu, Yuchao Dai, Liangjun Zhang

However, the shape-radiance ambiguity of radiance fields remains a challenge, especially in the sparse viewpoints setting.

Novel View Synthesis

Digging Into Uncertainty-based Pseudo-label for Robust Stereo Matching

1 code implementation 31 Jul 2023 Zhelun Shen, Xibin Song, Yuchao Dai, Dingfu Zhou, Zhibo Rao, Liangjun Zhang

Due to the domain differences and unbalanced disparity distribution across multiple datasets, current stereo matching approaches are commonly limited to a specific dataset and generalize poorly to others.

Monocular Depth Estimation Pseudo Label +1

Contrastive Conditional Latent Diffusion for Audio-visual Segmentation

no code implementations 31 Jul 2023 Yuxin Mao, Jing Zhang, Mochu Xiang, Yunqiu Lv, Yiran Zhong, Yuchao Dai

We propose a latent diffusion model with contrastive learning for audio-visual segmentation (AVS) to extensively explore the contribution of audio.

Contrastive Learning Denoising +1

Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation

no code implementations 19 Jul 2023 Mochu Xiang, Jing Zhang, Nick Barnes, Yuchao Dai

Effectively measuring and modeling the reliability of a trained model is essential to the real-world deployment of monocular depth estimation (MDE) models.

Monocular Depth Estimation

Linearized Relative Positional Encoding

no code implementations 18 Jul 2023 Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Meanwhile, it emphasizes a general paradigm for designing a broader range of relative positional encoding methods that are applicable to linear transformers.

Image Classification Language Modelling +2

Weakly-supervised Contrastive Learning for Unsupervised Object Discovery

1 code implementation 7 Jul 2023 Yunqiu Lv, Jing Zhang, Nick Barnes, Yuchao Dai

Unsupervised object discovery (UOD) refers to the task of discriminating the whole region of objects from the background within a scene without relying on labeled datasets, which benefits the task of bounding-box-level localization and pixel-level segmentation.

Contrastive Learning Image Reconstruction +3

Mutual Information Regularization for Weakly-supervised RGB-D Salient Object Detection

1 code implementation 6 Jun 2023 Aixuan Li, Yuxin Mao, Jing Zhang, Yuchao Dai

In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound with a mutual information minimization regularizer to encourage the disentangled representation of each modality for salient object detection.

object-detection Representation Learning +2

Toeplitz Neural Network for Sequence Modeling

1 code implementation 8 May 2023 Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Sequence modeling has important applications in natural language processing and computer vision.

Language Modelling

A Revisit to the Normalized Eight-Point Algorithm and A Self-Supervised Deep Solution

no code implementations 21 Apr 2023 Bin Fan, Yuchao Dai, Yongduek Seo, Mingyi He

The Normalized Eight-Point algorithm has been widely viewed as the cornerstone in two-view geometry computation, where the seminal Hartley's normalization greatly improves the performance of the direct linear transformation (DLT) algorithm.

Self-Supervised Learning
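
To illustrate the normalization step this abstract refers to, here is a minimal sketch of Hartley-style point normalization: the 2D points are translated so their centroid sits at the origin and scaled so their mean distance from the origin is sqrt(2). The function name `hartley_normalize` and the list-of-tuples interface are illustrative assumptions, not taken from the paper, and the sketch covers only the normalization, not the eight-point solve or the self-supervised method.

```python
import math


def hartley_normalize(points):
    """Normalize 2D points: zero centroid, mean distance sqrt(2).

    `points` is a list of (x, y) tuples (hypothetical interface).
    Returns the normalized points and the 3x3 similarity transform T
    such that [x', y', 1]^T = T @ [x, y, 1]^T.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # Mean Euclidean distance of the points from their centroid.
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n

    # Degenerate case (all points identical): skip the scaling.
    scale = math.sqrt(2) / mean_dist if mean_dist > 0 else 1.0

    transform = [
        [scale, 0.0, -scale * cx],
        [0.0, scale, -scale * cy],
        [0.0, 0.0, 1.0],
    ]
    normalized = [(scale * (x - cx), scale * (y - cy)) for x, y in points]
    return normalized, transform
```

In a two-view pipeline, each image's points would be normalized with their own transform, and the estimated matrix would be mapped back using those transforms afterwards.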

Event-guided Multi-patch Network with Self-supervision for Non-uniform Motion Deblurring

1 code implementation 14 Feb 2023 Hongguang Zhang, Limeng Zhang, Yuchao Dai, Hongdong Li, Piotr Koniusz

Contemporary deep learning multi-scale deblurring models suffer from many issues: 1) They perform poorly on non-uniformly blurred images/videos; 2) Simply increasing the model depth with finer-scale levels cannot improve deblurring; 3) Individual RGB frames contain limited motion information for deblurring; 4) Previous models have limited robustness to spatial transformations and noise.


Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction

1 code implementation CVPR 2023 Bin Fan, Yuxin Mao, Yuchao Dai, Zhexiong Wan, Qi Liu

Rolling shutter correction (RSC) is becoming increasingly popular for RS cameras that are widely used in commercial and industrial applications.

Data Augmentation Rolling Shutter Correction

Modeling the Distributional Uncertainty for Salient Object Detection Models

no code implementations CVPR 2023 Xinyu Tian, Jing Zhang, Mochu Xiang, Yuchao Dai

Most of the existing salient object detection (SOD) models focus on improving the overall model performance, without explicitly explaining the discrepancy between the training and testing distributions.

Long-tail Learning object-detection +2

Multimodal Variational Auto-encoder based Audio-Visual Segmentation

1 code implementation ICCV 2023 Yuxin Mao, Jing Zhang, Mochu Xiang, Yiran Zhong, Yuchao Dai

To achieve this, our ECMVAE factorizes the representations of each modality with a modality-shared representation and a modality-specific representation.

Representation Learning

LRRU: Long-short Range Recurrent Updating Networks for Depth Completion

no code implementations ICCV 2023 YuFei Wang, Bo Li, Ge Zhang, Qi Liu, Tao Gao, Yuchao Dai

Existing deep learning-based depth completion methods generally employ massive stacked layers to predict the dense depth map from sparse input data.

Depth Completion

Masked Representation Learning for Domain Generalized Stereo Matching

no code implementations CVPR 2023 Zhibo Rao, Bangshu Xiong, Mingyi He, Yuchao Dai, Renjie He, Zhelun Shen, Xing Li

Experimental results on multi-datasets show that: (1) our method can be easily plugged into the current various stereo matching models to improve generalization performance; (2) our method can reduce the significant volatility of generalization performance among different training epochs; (3) we find that the current methods prefer to choose the best results among different training epochs as generalization performance, but it is impossible to select the best performance by ground truth in practice.

Image Reconstruction Multi-Task Learning +2

Efficient LiDAR Point Cloud Oversegmentation Network

no code implementations ICCV 2023 Le Hui, Linghua Tang, Yuchao Dai, Jin Xie, Jian Yang

Then, to generate homogeneous superpoints from the sparse LiDAR point cloud, we propose a LiDAR point grouping algorithm that simultaneously considers the similarity of point embeddings and the Euclidean distance of points in 3D space.

LIDAR Semantic Segmentation Semantic Segmentation

Learning Dense and Continuous Optical Flow from an Event Camera

1 code implementation 16 Nov 2022 Zhexiong Wan, Yuchao Dai, Yuxin Mao

In this paper, we propose a novel deep learning-based dense and continuous optical flow estimation framework from a single image with event streams, which facilitates the accurate perception of high-speed motion.

Optical Flow Estimation

CU-Net: LiDAR Depth-Only Completion With Coupled U-Net

1 code implementation 26 Oct 2022 YuFei Wang, Yuchao Dai, Qi Liu, Peng Yang, Jiadai Sun, Bo Li

We find that existing depth-only methods can obtain satisfactory results in the areas where the measurement points are almost accurate and evenly distributed (denoted as normal areas), while the performance is limited in the areas where the foreground and background points are overlapped due to occlusion (denoted as overlap areas) and the areas where there are no measurement points around (denoted as blank areas) since the methods have no reliable input information in these areas.

Searching Dense Point Correspondences via Permutation Matrix Learning

no code implementations 26 Oct 2022 Zhiyuan Zhang, Jiadai Sun, Yuchao Dai, Bin Fan, Qi Liu

In response, this paper presents a novel end-to-end learning-based method to estimate the dense correspondence of 3D point clouds, in which the problem of point matching is formulated as a zero-one assignment problem to achieve a permutation matching matrix to implement the one-to-one principle fundamentally.

Learning a Task-specific Descriptor for Robust Matching of 3D Point Clouds

no code implementations 26 Oct 2022 Zhiyuan Zhang, Yuchao Dai, Bin Fan, Jiadai Sun, Mingyi He

In this paper, we propose to learn a robust task-specific feature descriptor to consistently describe the correct point correspondence under interference.

Linear Video Transformer with Feature Fixation

no code implementations 15 Oct 2022 Kaiyue Lu, Zexiang Liu, Jianyuan Wang, Weixuan Sun, Zhen Qin, Dong Li, Xuyang Shen, Hui Deng, Xiaodong Han, Yuchao Dai, Yiran Zhong

Therefore, we propose a feature fixation module to reweight the feature importance of the query and key before computing linear attention.

Feature Importance Video Classification

Deep Idempotent Network for Efficient Single Image Blind Deblurring

no code implementations 13 Oct 2022 Yuxin Mao, Zhexiong Wan, Yuchao Dai, Xin Yu

Single image blind deblurring is highly ill-posed as neither the latent sharp image nor the blur kernel is known.

Single-Image Blind Deblurring

Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation

1 code implementation 5 Jul 2022 Jiadai Sun, Yuchao Dai, Xianjing Zhang, Jintao Xu, Rui Ai, Weihao Gu, Xieyuanli Chen

We also use a point refinement module via 3D sparse convolution to fuse the information from both LiDAR range image and point cloud representations and reduce the artifacts on the borders of the objects.

Autonomous Driving Semantic Segmentation

Context-Aware Video Reconstruction for Rolling Shutter Cameras

1 code implementation CVPR 2022 Bin Fan, Yuchao Dai, Zhiyuan Zhang, Qi Liu, Mingyi He

Then, a refinement scheme is proposed to guide the GS frame synthesis along with bilateral occlusion masks to produce high-fidelity GS video frames at arbitrary times.

Motion Compensation Video Reconstruction

Towards Deeper Understanding of Camouflaged Object Detection

1 code implementation 23 May 2022 Yunqiu Lv, Jing Zhang, Yuchao Dai, Aixuan Li, Nick Barnes, Deng-Ping Fan

With the above understanding about camouflaged objects, we present the first triple-task learning framework to simultaneously localize, segment, and rank camouflaged objects, indicating the conspicuousness level of camouflage.

object-detection Object Detection

Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective

no code implementations 10 Apr 2022 Hui Deng, Tong Zhang, Yuchao Dai, Jiawei Shi, Yiran Zhong, Hongdong Li

In this paper, we propose to model deep NRSfM from a sequence-to-sequence translation perspective, where the input 2D frame sequence is taken as a whole to reconstruct the deforming 3D non-rigid shape sequence.

3D Reconstruction Translation

VRNet: Learning the Rectified Virtual Corresponding Points for 3D Point Cloud Registration

no code implementations 24 Mar 2022 Zhiyuan Zhang, Jiadai Sun, Yuchao Dai, Bin Fan, Mingyi He

3D point cloud registration is fragile to outliers, which are labeled as the points without corresponding points.

Point Cloud Registration

A Representation Separation Perspective to Correspondences-free Unsupervised 3D Point Cloud Registration

no code implementations 24 Mar 2022 Zhiyuan Zhang, Jiadai Sun, Yuchao Dai, Dingfu Zhou, Xibin Song, Mingyi He

Existing correspondences-free methods generally learn the holistic representation of the entire point cloud, which is fragile for partial and noisy point clouds.

Point Cloud Registration

Efficient Multi-View Stereo by Iterative Dynamic Cost Volume

1 code implementation CVPR 2022 Shaoqian Wang, Bo Li, Yuchao Dai

Specifically, a lightweight 3D CNN is utilized to generate the coarsest initial depth map which is essential to launch the GRU and guarantee a fast convergence.

MUNet: Motion Uncertainty-aware Semi-supervised Video Object Segmentation

no code implementations 29 Nov 2021 Jiadai Sun, Yuxin Mao, Yuchao Dai, Yiran Zhong, Jianyuan Wang

The task of semi-supervised video object segmentation (VOS) has been greatly advanced and state-of-the-art performance has been made by dense matching-based methods.

Semantic Segmentation Semi-Supervised Video Object Segmentation +1

A General Divergence Modeling Strategy for Salient Object Detection

no code implementations 23 Nov 2021 Xinyu Tian, Jing Zhang, Yuchao Dai

Given multiple saliency annotations, we introduce a general divergence modeling strategy via random sampling, and apply our strategy to an ensemble based framework and three latent variable model based solutions to explore the subjective nature of saliency.

object-detection Object Detection +1

Dense Uncertainty Estimation via an Ensemble-based Conditional Latent Variable Model

no code implementations 22 Nov 2021 Jing Zhang, Yuchao Dai, Mehrtash Harandi, Yiran Zhong, Nick Barnes, Richard Hartley

Uncertainty estimation has been extensively studied in recent literature, which can usually be classified as aleatoric uncertainty and epistemic uncertainty.

object-detection Object Detection

End-to-end Learning the Partial Permutation Matrix for Robust 3D Point Cloud Registration

no code implementations 28 Oct 2021 Zhiyuan Zhang, Jiadai Sun, Yuchao Dai, Dingfu Zhou, Xibin Song, Mingyi He

Even though considerable progress has been made in deep learning-based 3D point cloud processing, how to obtain accurate correspondences for robust registration remains a major challenge because existing hard assignment methods cannot deal with outliers naturally.

Point Cloud Registration

Dense Uncertainty Estimation

1 code implementation 13 Oct 2021 Jing Zhang, Yuchao Dai, Mochu Xiang, Deng-Ping Fan, Peyman Moghadam, Mingyi He, Christian Walder, Kaihao Zhang, Mehrtash Harandi, Nick Barnes

Deep neural networks can be roughly divided into deterministic neural networks and stochastic neural networks. The former is usually trained to achieve a mapping from input space to output space via maximum likelihood estimation for the weights, which leads to deterministic predictions during testing.

Decision Making

RGB-D Saliency Detection via Cascaded Mutual Information Minimization

1 code implementation ICCV 2021 Jing Zhang, Deng-Ping Fan, Yuchao Dai, Xin Yu, Yiran Zhong, Nick Barnes, Ling Shao

In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.

Saliency Detection Thermal Image Segmentation

PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion

no code implementations ICCV 2021 Haitian Zeng, Yuchao Dai, Xin Yu, Xiaohan Wang, Yi Yang

As NRSfM is a highly under-constrained problem, we propose two new pairwise regularizations to further regularize the reconstruction.

SUNet: Symmetric Undistortion Network for Rolling Shutter Correction

1 code implementation ICCV 2021 Bin Fan, Yuchao Dai, Mingyi He

The vast majority of modern consumer-grade cameras employ a rolling shutter mechanism, leading to image distortions if the camera moves during image acquisition.

Rolling Shutter Correction

Complementary Patch for Weakly Supervised Semantic Segmentation

1 code implementation ICCV 2021 Fei Zhang, Chaochen Gu, Chenyue Zhang, Yuchao Dai

Therefore, a CAM with more information related to object seeds can be obtained by narrowing down the gap between the sum of CAMs generated by the CP Pair and the original CAM.

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

Exploring Depth Contribution for Camouflaged Object Detection

no code implementations 24 Jun 2021 Mochu Xiang, Jing Zhang, Yunqiu Lv, Aixuan Li, Yiran Zhong, Yuchao Dai

In this paper, we study the depth contribution for camouflaged object detection, where the depth maps are generated with existing monocular depth estimation (MDE) methods.

Monocular Depth Estimation object-detection +3

Generative Transformer for Accurate and Reliable Salient Object Detection

2 code implementations 20 Apr 2021 Yuxin Mao, Jing Zhang, Zhexiong Wan, Yuchao Dai, Aixuan Li, Yunqiu Lv, Xinyu Tian, Deng-Ping Fan, Nick Barnes

For the former, we apply transformer to a deterministic model, and explain that the effective structure modeling and global context modeling abilities lead to its superior performance compared with the CNN based frameworks.

Camouflaged Object Segmentation Machine Translation +5

CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching

3 code implementations CVPR 2021 Zhelun Shen, Yuchao Dai, Zhibo Rao

In this paper, we propose CFNet, a Cascade and Fused cost volume based network to improve the robustness of the stereo matching network.

Disparity Estimation Stereo Matching

Uncertainty-aware Joint Salient Object and Camouflaged Object Detection

2 code implementations CVPR 2021 Aixuan Li, Jing Zhang, Yunqiu Lv, Bowen Liu, Tong Zhang, Yuchao Dai

Visual salient object detection (SOD) aims at finding the salient object(s) that attract human attention, while camouflaged object detection (COD), on the contrary, intends to discover the camouflaged object(s) that are hidden in their surroundings.

object-detection Object Detection +1

Simultaneously Localize, Segment and Rank the Camouflaged Objects

1 code implementation CVPR 2021 Yunqiu Lv, Jing Zhang, Yuchao Dai, Aixuan Li, Bowen Liu, Nick Barnes, Deng-Ping Fan

With the above understanding about camouflaged objects, we present the first ranking based COD network (Rank-Net) to simultaneously localize, segment and rank camouflaged objects.

object-detection Object Detection

Inverting a Rolling Shutter Camera: Bring Rolling Shutter Images to High Framerate Global Shutter Video

no code implementations ICCV 2021 Bin Fan, Yuchao Dai

In this paper, we propose to invert the above RS imaging mechanism, i.e., recovering a high framerate GS video from consecutive RS images to achieve RS temporal super-resolution (RSSR).

Optical Flow Estimation Super-Resolution

Neural Image Compression via Attentional Multi-Scale Back Projection and Frequency Decomposition

no code implementations ICCV 2021 Ge Gao, Pei You, Rong Pan, Shunyuan Han, Yuanyuan Zhang, Yuchao Dai, Hojae Lee

In recent years, neural image compression has emerged as a rapidly developing topic in computer vision, where the state-of-the-art approaches now exhibit better compression performance than their conventional counterparts.

Image Compression MS-SSIM +1

UASNet: Uncertainty Adaptive Sampling Network for Deep Stereo Matching

no code implementations ICCV 2021 Yamin Mao, Zhihua Liu, Weiming Li, Yuchao Dai, Qiang Wang, Yun-Tae Kim, Hong-Seok Lee

Extensive experiments show that the proposed method achieves the highest ground truth covering ratio compared with other cascade cost volume based stereo matching methods.

Stereo Matching

Class Attention Network for Semantic Segmentation of Remote Sensing Images

no code implementations 31 Dec 2020 Zhibo Rao, Mingyi He, Yuchao Dai

In this paper, we proposed a novel class attention module and decomposition-fusion strategy to cope with imbalanced labels.

Scene Parsing Semantic Segmentation

Uncertainty-Aware Deep Calibrated Salient Object Detection

no code implementations 10 Dec 2020 Jing Zhang, Yuchao Dai, Xin Yu, Mehrtash Harandi, Nick Barnes, Richard Hartley

Existing deep neural network based salient object detection (SOD) methods mainly focus on pursuing high network accuracy.

object-detection Object Detection +1

Depth Completion using Piecewise Planar Model

no code implementations 6 Dec 2020 Yiran Zhong, Yuchao Dai, Hongdong Li

More specifically, we represent the desired depth map as a collection of 3D planes, and the reconstruction problem is formulated as the optimization of planar parameters.

Depth Completion Visual Odometry

Efficient Depth Completion Using Learned Bases

no code implementations 2 Dec 2020 Yiran Zhong, Yuchao Dai, Hongdong Li

The given sparse depth points serve as a data term to constrain the weighting process.

Depth Completion

Displacement-Invariant Matching Cost Learning for Accurate Optical Flow Estimation

2 code implementations NeurIPS 2020 Jianyuan Wang, Yiran Zhong, Yuchao Dai, Kaihao Zhang, Pan Ji, Hongdong Li

Learning matching costs has been shown to be critical to the success of the state-of-the-art deep stereo matching methods, in which 3D convolutions are applied on a 4D feature volume to learn a 3D cost volume.

Optical Flow Estimation Stereo Matching

Hierarchical Neural Architecture Search for Deep Stereo Matching

1 code implementation NeurIPS 2020 Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Yuchao Dai, Xiaojun Chang, Tom Drummond, Hongdong Li, ZongYuan Ge

To reduce the human efforts in neural network design, Neural Architecture Search (NAS) has been applied with remarkable success to various high-level vision tasks such as classification and semantic segmentation.

Neural Architecture Search Semantic Segmentation +3

PRAFlow_RVC: Pyramid Recurrent All-Pairs Field Transforms for Optical Flow Estimation in Robust Vision Challenge 2020

no code implementations 14 Sep 2020 Zhexiong Wan, Yuxin Mao, Yuchao Dai

Optical flow estimation is an important computer vision task, which aims at estimating the dense correspondences between two frames.

Optical Flow Estimation

Uncertainty Inspired RGB-D Saliency Detection

4 code implementations 7 Sep 2020 Jing Zhang, Deng-Ping Fan, Yuchao Dai, Saeed Anwar, Fatemeh Saleh, Sadegh Aliakbarian, Nick Barnes

Our framework includes two main models: 1) a generator model, which maps the input image and latent variable to stochastic saliency prediction, and 2) an inference model, which gradually updates the latent variable by sampling it from the true or approximate posterior distribution.

RGB-D Salient Object Detection RGB Salient Object Detection +1

PCW-Net: Pyramid Combination and Warping Cost Volume for Stereo Matching

2 code implementations 23 Jun 2020 Zhelun Shen, Yuchao Dai, Xibin Song, Zhibo Rao, Dingfu Zhou, Liangjun Zhang

First, we construct combination volumes on the upper levels of the pyramid and develop a cost volume fusion module to integrate them for initial disparity estimation.

Disparity Estimation Domain Generalization +1

Dense Non-Rigid Structure from Motion: A Manifold Viewpoint

no code implementations 15 Jun 2020 Suryansh Kumar, Luc van Gool, Carlos E. P. de Oliveira, Anoop Cherian, Yuchao Dai, Hongdong Li

Assuming that a deforming shape is composed of a union of local linear subspaces and spans a global low-rank space over multiple frames enables us to efficiently model complex non-rigid deformations.


Relative Pose Estimation for Stereo Rolling Shutter Cameras

no code implementations 14 Jun 2020 Ke Wang, Bin Fan, Yuchao Dai

In this paper, we present a novel linear algorithm to estimate the 6 DoF relative pose from consecutive frames of stereo rolling shutter (RS) cameras.

Pose Estimation

Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution

no code implementations CVPR 2020 Xibin Song, Yuchao Dai, Dingfu Zhou, Liu Liu, Wei Li, Hongdong Li, Ruigang Yang

Second, we propose a new framework for real-world DSR, which consists of four modules: 1) An iterative residual learning module with deep supervision to learn effective high-frequency components of depth maps in a coarse-to-fine manner; 2) A channel attention strategy to enhance channels with abundant high-frequency components; 3) A multi-stage fusion module to effectively re-exploit the results in the coarse-to-fine process; and 4) A depth refinement module to improve the depth map by TGV regularization and input loss.

Benchmarking Depth Map Super-Resolution

Superpixel Soup: Monocular Dense 3D Reconstruction of a Complex Dynamic Scene

no code implementations 19 Nov 2019 Suryansh Kumar, Yuchao Dai, Hongdong Li

We assume that a dynamic scene can be approximated by numerous piecewise planar surfaces, where each planar surface enjoys its own rigid motion, and the global change in the scene between two frames is as-rigid-as-possible (ARAP).

3D Reconstruction

Joint Stereo Video Deblurring, Scene Flow Estimation and Moving Object Segmentation

no code implementations 6 Oct 2019 Liyuan Pan, Yuchao Dai, Miaomiao Liu, Fatih Porikli, Quan Pan

Under our model, these three tasks are naturally connected and expressed as the parameter estimation of 3D scene structure and camera motion (structure and motion for the dynamic scenes).

Deblurring Scene Flow Estimation +1

MVS^2: Deep Unsupervised Multi-view Stereo with Multi-View Symmetry

no code implementations 30 Aug 2019 Yuchao Dai, Zhidong Zhu, Zhibo Rao, Bo Li

The success of existing deep-learning based multi-view stereo (MVS) approaches greatly depends on the availability of large-scale supervision in the form of dense depth maps.


IoU Loss for 2D/3D Object Detection

1 code implementation 11 Aug 2019 Dingfu Zhou, Jin Fang, Xibin Song, Chenye Guan, Junbo Yin, Yuchao Dai, Ruigang Yang

In 2D/3D object detection task, Intersection-over-Union (IoU) has been widely employed as an evaluation metric to evaluate the performance of different detectors in the testing stage.

2D Object Detection 3D Object Detection +1
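
For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of Intersection-over-Union for two axis-aligned 2D boxes. The `(x1, y1, x2, y2)` corner format and the function name `box_iou` are assumptions made for illustration; this shows the evaluation metric only, not the loss proposed in the paper.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format.

    The corner format is an assumption for this sketch; x1 <= x2 and
    y1 <= y2 are expected.
    """
    # Corners of the intersection rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Two zero-area boxes have no meaningful overlap; return 0.
    return inter / union if union > 0 else 0.0
```

For example, `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` has an intersection of area 1 and a union of area 7.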

MSDC-Net: Multi-Scale Dense and Contextual Networks for Automated Disparity Map for Stereo Matching

no code implementations 25 Apr 2019 Zhibo Rao, Mingyi He, Yuchao Dai, Zhidong Zhu, Bo Li, Renjie He

The multi-scale residual 3D convolution module learns the different scale geometry context from the cost volume, which is aggregated by the multi-scale fusion 2D convolution module.

Autonomous Driving object-detection +3

Multi-scale Cross-form Pyramid Network for Stereo Matching

no code implementations 25 Apr 2019 Zhidong Zhu, Mingyi He, Yuchao Dai, Zhibo Rao, Bo Li

The network consists of three modules: Multi-Scale 2D local feature extraction module, Cross-form spatial pyramid module and Multi-Scale 3D Feature Matching and Fusion module.

3D Feature Matching 3D Scene Reconstruction +3

Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes

no code implementations CVPR 2019 Yiran Zhong, Pan Ji, Jianyuan Wang, Yuchao Dai, Hongdong Li

In this paper, we propose Deep Epipolar Flow, an unsupervised optical flow method which incorporates global geometric constraints into network learning.

Benchmarking Optical Flow Estimation

High Frame Rate Video Reconstruction based on an Event Camera

1 code implementation 12 Mar 2019 Liyuan Pan, Richard Hartley, Cedric Scheerlinck, Miaomiao Liu, Xin Yu, Yuchao Dai

Based on the abundant event data alongside a low frame rate, easily blurred images, we propose a simple yet effective approach to reconstruct high-quality and high frame rate sharp videos.

Video Generation Video Reconstruction +1

Ground Plane based Absolute Scale Estimation for Monocular Visual Odometry

no code implementations 3 Mar 2019 Dingfu Zhou, Yuchao Dai, Hongdong Li

Recovering the absolute metric scale from a monocular camera is a challenging but highly desirable problem for monocular camera-based systems.

Monocular Visual Odometry

Single Image Deblurring and Camera Motion Estimation with Depth Map

no code implementations 1 Mar 2019 Liyuan Pan, Yuchao Dai, Miaomiao Liu

Camera shake during exposure is a major problem in hand-held photography, as it causes image blur that destroys details in the captured images. In the real world, such blur is mainly caused by both the camera motion and the complex scene structure. While considerable existing approaches have been proposed based on various assumptions regarding the scene structure or the camera motion, few existing methods could handle the real 6 DoF camera motion. In this paper, we propose to jointly estimate the 6 DoF camera motion and remove the non-uniform blur caused by camera motion by exploiting their underlying geometric relationships, with a single blurry image and its depth map (either direct depth measurements, or a learned depth map) as input. We formulate our joint deblurring and 6 DoF camera motion estimation as an energy minimization problem which is solved in an alternative manner.

Deblurring Image Deblurring +1

Dense Depth Estimation of a Complex Dynamic Scene without Explicit 3D Motion Estimation

no code implementations 11 Feb 2019 Suryansh Kumar, Ram Srivatsav Ghorakavi, Yuchao Dai, Hongdong Li

Given per-pixel optical flow correspondences between two consecutive frames and the sparse depth prior for the reference frame, we show that we can effectively recover the dense depth map for the successive frames without solving for 3D motion parameters.

Depth Estimation Motion Estimation +1

ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving

no code implementations CVPR 2019 Xibin Song, Peng Wang, Dingfu Zhou, Rui Zhu, Chenye Guan, Yuchao Dai, Hao Su, Hongdong Li, Ruigang Yang

Specifically, we first segment each car with a pre-trained Mask R-CNN, and then regress towards its 3D pose and shape based on a deformable 3D car model with or without using semantic keypoints.

3D Car Instance Understanding Autonomous Driving

Bringing a Blurry Frame Alive at High Frame-Rate with an Event Camera

1 code implementation CVPR 2019 Liyuan Pan, Cedric Scheerlinck, Xin Yu, Richard Hartley, Miaomiao Liu, Yuchao Dai

In this paper, we propose a simple and effective approach, the Event-based Double Integral (EDI) model, to reconstruct a high frame-rate, sharp video from a single blurry frame and its event data.

Video Generation

Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization

5 code implementations ICCV 2019 Liu Liu, Hongdong Li, Yuchao Dai

This paper tackles the problem of large-scale image-based localization (IBL) where the spatial location of a query image is determined by finding out the most similar reference images in a large database.

Image-Based Localization Representation Learning +1

Stereo Computation for a Single Mixture Image

no code implementations ECCV 2018 Yiran Zhong, Yuchao Dai, Hongdong Li

This paper proposes an original problem of stereo computation from a single mixture image, a challenging problem that had not been researched before.

Stereo Matching Stereo Matching Hand

Deeply Supervised Depth Map Super-Resolution as Novel View Synthesis

no code implementations 27 Aug 2018 Xibin Song, Yuchao Dai, Xueying Qin

However, there still exist two major issues with these DCNN based depth map super-resolution methods that hinder the performance: i) The low-resolution depth maps either need to be up-sampled before feeding into the network or substantial deconvolution has to be used; and ii) The supervision (high-resolution depth maps) is only applied at the end of the network, thus it is difficult to handle large up-sampling factors, such as $\times 8, \times 16$.

Benchmarking Blocking +2

3D Geometry-Aware Semantic Labeling of Outdoor Street Scenes

no code implementations 13 Aug 2018 Yiran Zhong, Yuchao Dai, Hongdong Li

This paper is concerned with the problem of how to better exploit 3D geometric information for dense semantic image labeling.

Open-World Stereo Video Matching with Deep RNN

no code implementations ECCV 2018 Yiran Zhong, Hongdong Li, Yuchao Dai

Deep Learning based stereo matching methods have shown great successes and achieved top scores across different benchmarks.

Stereo Matching Stereo Matching Hand

Occluded Joints Recovery in 3D Human Pose Estimation based on Distance Matrix

no code implementations 30 Jul 2018 Xiang Guo, Yuchao Dai

In this paper, we propose to address the problem of single image 3D human pose estimation with occluded measurements by exploiting the Euclidean distance matrix (EDM).

3D Human Pose Estimation

Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective

no code implementations CVPR 2018 Suryansh Kumar, Anoop Cherian, Yuchao Dai, Hongdong Li

To address these issues, in this paper, we propose a new approach for dense NRSfM by modeling the problem on a Grassmann manifold.

Depth Map Completion by Jointly Exploiting Blurry Color Images and Sparse Depth Maps

no code implementations 27 Nov 2017 Liyuan Pan, Yuchao Dai, Miaomiao Liu, Fatih Porikli

In this paper, we propose to tackle the problem of depth map completion by jointly exploiting the blurry color image sequences and the sparse depth map measurements, and present an energy minimization based formulation to simultaneously complete the depth maps, estimate the scene flow and deblur the color images.

Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map

no code implementations ICCV 2017 Liu Liu, Hongdong Li, Yuchao Dai

In this paper, we introduce a global method which harnesses global contextual information exhibited both within the query image and among all the 3D points in the map.

3D Feature Matching Camera Localization

Self-Supervised Learning for Stereo Matching with Self-Improving Ability

no code implementations 4 Sep 2017 Yiran Zhong, Yuchao Dai, Hongdong Li

Existing deep-learning based dense stereo matching methods often rely on ground-truth disparity maps as the training signals, which, however, are not always available.

Self-Supervised Learning Stereo Matching +1

Deep Edge-Aware Saliency Detection

no code implementations 15 Aug 2017 Jing Zhang, Yuchao Dai, Fatih Porikli, Mingyi He

There has been profound progress in visual saliency thanks to deep learning architectures; however, there still exist three major challenges that hinder the detection performance for scenes with complex compositions, multiple salient objects, and salient objects of diverse scales.

Descriptive Saliency Detection

Monocular Depth Estimation with Hierarchical Fusion of Dilated CNNs and Soft-Weighted-Sum Inference

1 code implementation 2 Aug 2017 Bo Li, Yuchao Dai, Mingyi He

Extensive experiments on the NYU Depth V2 and KITTI datasets show the superiority of our method compared with current state-of-the-art methods.

Monocular Depth Estimation Quantization +1

Pixel-variant Local Homography for Fisheye Stereo Rectification Minimizing Resampling Distortion

no code implementations 12 Jul 2017 Dingfu Zhou, Yuchao Dai, Hongdong Li

First, we prove that there indeed exist enough degrees of freedom to apply pixel-wise local homography for stereo rectification.

3D Reconstruction Stereo Matching +1

Dense Non-rigid Structure-from-Motion Made Easy - A Spatial-Temporal Smoothness based Solution

no code implementations 27 Jun 2017 Yuchao Dai, Huizhong Deng, Mingyi He

Second, we propose to exploit the spatial smoothness by resorting to the Laplacian of the 3D non-rigid shape.

Integrated Deep and Shallow Networks for Salient Object Detection

no code implementations 2 Jun 2017 Jing Zhang, Bo Li, Yuchao Dai, Fatih Porikli, Mingyi He

Then the results from deep FCNN and RBD are concatenated to feed into a shallow network to map the concatenated feature maps to saliency maps.

object-detection RGB Salient Object Detection +2

Spatial-Temporal Union of Subspaces for Multi-body Non-rigid Structure-from-Motion

no code implementations 14 May 2017 Suryansh Kumar, Yuchao Dai, Hongdong Li

This spatio-temporal representation not only provides competitive 3D reconstruction but also outputs robust segmentation of multiple non-rigid objects.

3D Reconstruction

Single image depth estimation by dilated deep residual convolutional neural network and soft-weight-sum inference

1 code implementation 27 Apr 2017 Bo Li, Yuchao Dai, Huahui Chen, Mingyi He

This paper proposes a new residual convolutional neural network (CNN) architecture for single image depth estimation.

Depth Estimation

Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep cnn

no code implementations 19 Apr 2017 Bo Li, Mingyi He, Xuelian Cheng, Yu-cheng Chen, Yuchao Dai

Especially on the large and challenging NTU RGB+D, UTD-MHAD, and MSRC-12 datasets, our method outperforms other methods by a large margin, which demonstrates the efficacy of the proposed method.

Action Recognition Image Classification +3

Skeleton Boxes: Solving skeleton based action detection with a single deep convolutional neural network

no code implementations 19 Apr 2017 Bo Li, Huahui Chen, Yu-cheng Chen, Yuchao Dai, Mingyi He

However, due to the difficulty in representing the 3D skeleton video and the lack of training data, action detection from streaming 3D skeleton video still lags far behind its recognition counterpart and image based object detection.

Action Detection Action Recognition +3

Simultaneous Stereo Video Deblurring and Scene Flow Estimation

no code implementations CVPR 2017 Liyuan Pan, Yuchao Dai, Miaomiao Liu, Fatih Porikli

Unlike the existing approach [31], which used a pre-computed scene flow, we propose a single framework to jointly estimate the scene flow and deblur the image, where the motion cues from scene flow estimation and the blur information reinforce each other and produce results superior to those of conventional scene flow estimation or stereo deblurring methods.

Deblurring Scene Flow Estimation

Multi-body Non-rigid Structure-from-Motion

no code implementations 15 Jul 2016 Suryansh Kumar, Yuchao Dai, Hongdong Li

Recent progress has extended SFM to the areas of multi-body SFM (where there are multiple rigid relative motions in the scene), as well as non-rigid SFM (where there is a single non-rigid, deformable object or scene).

3D Reconstruction Clustering

Deep Depth Super-Resolution : Learning Depth Super-Resolution using Deep Convolutional Neural Network

no code implementations 7 Jul 2016 Xibin Song, Yuchao Dai, Xueying Qin

In this paper, we bridge the gap and extend the success of deep convolutional neural networks to depth super-resolution.

Image Super-Resolution

Robust and Efficient Relative Pose with a Multi-camera System for Autonomous Vehicle in Highly Dynamic Environments

no code implementations 12 May 2016 Liu Liu, Hongdong Li, Yuchao Dai

When the solver is used in combination with RANSAC, we are able to quickly prune unpromising hypotheses, significantly improving the chance of finding inliers.

Motion Estimation

Robust Optical Flow Estimation of Double-Layer Images under Transparency or Reflection

no code implementations CVPR 2016 Jiaolong Yang, Hongdong Li, Yuchao Dai, Robby T. Tan

This paper deals with a challenging, frequently encountered, yet not properly investigated problem in two-frame optical flow estimation.

Optical Flow Estimation

Rolling Shutter Camera Relative Pose: Generalized Epipolar Geometry

no code implementations CVPR 2016 Yuchao Dai, Hongdong Li, Laurent Kneip

The vast majority of modern consumer-grade cameras employ a rolling shutter mechanism.
