1 code implementation • 21 Jun 2022 • Weixuan Sun, Zhen Qin, Hui Deng, Jianyuan Wang, Yi Zhang, Kaihao Zhang, Nick Barnes, Stan Birchfield, Lingpeng Kong, Yiran Zhong
Based on this observation, we present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity.
no code implementations • 10 Apr 2022 • Hui Deng, Tong Zhang, Yuchao Dai, Jiawei Shi, Yiran Zhong, Hongdong Li
In this paper, we propose to model deep NRSfM from a sequence-to-sequence translation perspective, where the input 2D frame sequence is taken as a whole to reconstruct the deforming 3D non-rigid shape sequence.
no code implementations • CVPR 2022 • Xuelian Cheng, Huan Xiong, Deng-Ping Fan, Yiran Zhong, Mehrtash Harandi, Tom Drummond, ZongYuan Ge
We propose a new video camouflaged object detection (VCOD) framework that can exploit both short-term dynamics and long-term temporal consistency to detect camouflaged objects from video frames.
2 code implementations • ICLR 2022 • Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong
As one of its core components, the softmax attention helps to capture long-range dependencies yet prohibits its scale-up due to the quadratic space and time complexity with respect to the sequence length.
no code implementations • 17 Dec 2021 • Dongxu Li, Chenchen Xu, Liu Liu, Yiran Zhong, Rong Wang, Lars Petersson, Hongdong Li
This work studies the task of glossification, the aim of which is to transcribe natural spoken language sentences for the Deaf (hard-of-hearing) community into ordered sign language glosses.
1 code implementation • 6 Dec 2021 • Weixuan Sun, Jing Zhang, Zheyuan Liu, Yiran Zhong, Nick Barnes
To bridge their gap, a Class Activation Map (CAM) is usually generated to provide pixel-level pseudo labels.
no code implementations • 29 Nov 2021 • Jiadai Sun, Yuxin Mao, Yuchao Dai, Yiran Zhong, Jianyuan Wang
The task of semi-supervised video object segmentation (VOS) has been greatly advanced, and state-of-the-art performance has been achieved by dense matching-based methods.
Semantic Segmentation
Semi-Supervised Video Object Segmentation
no code implementations • 22 Nov 2021 • Jing Zhang, Yuchao Dai, Mehrtash Harandi, Yiran Zhong, Nick Barnes, Richard Hartley
Uncertainty estimation has been extensively studied in recent literature, where uncertainty is usually classified as either aleatoric or epistemic.
no code implementations • 29 Sep 2021 • Xuelian Cheng, Huan Xiong, Deng-Ping Fan, Yiran Zhong, Mehrtash Harandi, Tom Drummond, ZongYuan Ge
The proposed SLT-Net leverages both short-term dynamics and long-term temporal consistency to detect concealed objects in continuous video frames.
1 code implementation • ICCV 2021 • Jing Zhang, Deng-Ping Fan, Yuchao Dai, Xin Yu, Yiran Zhong, Nick Barnes, Ling Shao
In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.
1 code implementation • 1 Sep 2021 • Xiaomeng Xin, Yiran Zhong, Yunzhong Hou, Jinjun Wang, Liang Zheng
In the absence of old task images, they often assume that old knowledge is well preserved if the classifier produces similar output on new images.
no code implementations • 24 Jun 2021 • Mochu Xiang, Jing Zhang, Yunqiu Lv, Aixuan Li, Yiran Zhong, Yuchao Dai
In this paper, we study the depth contribution for camouflaged object detection, where the depth maps are generated with existing monocular depth estimation (MDE) methods.
1 code implementation • 16 Jun 2021 • Jiajun Zha, Yiran Zhong, Jing Zhang, Richard Hartley, Liang Zheng
Attention has proven to be an efficient mechanism for capturing long-range dependencies.
no code implementations • 27 May 2021 • Wenjia Niu, Kaihao Zhang, Wenhan Luo, Yiran Zhong
Single-image super-resolution (SR) and multi-frame SR are two ways to super resolve low-resolution images.
1 code implementation • CVPR 2021 • Jianyuan Wang, Yiran Zhong, Yuchao Dai, Stan Birchfield, Kaihao Zhang, Nikolai Smolyanskiy, Hongdong Li
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
Ranked #5 on Monocular Depth Estimation on KITTI Eigen split
2 code implementations • CVPR 2021 • Jinxing Zhou, Liang Zheng, Yiran Zhong, Shijie Hao, Meng Wang
To encourage the network to extract highly correlated features for positive samples, a new audio-visual pair similarity loss is proposed.
no code implementations • CVPR 2021 • Dongxu Li, Chenchen Xu, Kaihao Zhang, Xin Yu, Yiran Zhong, Wenqi Ren, Hanna Suominen, Hongdong Li
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
no code implementations • 6 Dec 2020 • Yiran Zhong, Yuchao Dai, Hongdong Li
More specifically, we represent the desired depth map as a collection of 3D planes, and the reconstruction problem is formulated as the optimization of the plane parameters.
no code implementations • 2 Dec 2020 • Yiran Zhong, Yuchao Dai, Hongdong Li
The given sparse depth points serve as a data term to constrain the weighting process.
no code implementations • 1 Dec 2020 • Yiran Zhong, Charles Loop, Wonmin Byeon, Stan Birchfield, Yuchao Dai, Kaihao Zhang, Alexey Kamenev, Thomas Breuel, Hongdong Li, Jan Kautz
A common way to speed up the computation is to downsample the feature volume, but this loses high-frequency details.
2 code implementations • NeurIPS 2020 • Jianyuan Wang, Yiran Zhong, Yuchao Dai, Kaihao Zhang, Pan Ji, Hongdong Li
Learning matching costs has been shown to be critical to the success of the state-of-the-art deep stereo matching methods, in which 3D convolutions are applied on a 4D feature volume to learn a 3D cost volume.
1 code implementation • NeurIPS 2020 • Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Yuchao Dai, Xiaojun Chang, Tom Drummond, Hongdong Li, ZongYuan Ge
To reduce the human efforts in neural network design, Neural Architecture Search (NAS) has been applied with remarkable success to various high-level vision tasks such as classification and semantic segmentation.
Ranked #2 on Stereo Disparity Estimation on Scene Flow
1 code implementation • CVPR 2020 • Kaihao Zhang, Wenhan Luo, Yiran Zhong, Lin Ma, Bjorn Stenger, Wei Liu, Hongdong Li
To address this problem, we propose a new method which combines two GAN models, i.e., a learning-to-Blur GAN (BGAN) and learning-to-DeBlur GAN (DBGAN), in order to learn a better model for image deblurring by primarily learning how to blur images.
Ranked #11 on Deblurring on HIDE (trained on GOPRO)
no code implementations • CVPR 2019 • Yiran Zhong, Pan Ji, Jianyuan Wang, Yuchao Dai, Hongdong Li
In this paper, we propose Deep Epipolar Flow, an unsupervised optical flow method which incorporates global geometric constraints into network learning.
3 code implementations • CVPR 2019 • Xuelian Cheng, Yiran Zhong, Yuchao Dai, Pan Ji, Hongdong Li
In this paper, we present LidarStereoNet, the first unsupervised Lidar-stereo fusion network, which can be trained in an end-to-end manner without the need of ground truth depth maps.
no code implementations • ECCV 2018 • Yiran Zhong, Yuchao Dai, Hongdong Li
This paper proposes an original problem of stereo computation from a single mixture image, a challenging problem that had not been researched before.
no code implementations • 13 Aug 2018 • Yiran Zhong, Yuchao Dai, Hongdong Li
This paper is concerned with the problem of how to better exploit 3D geometric information for dense semantic image labeling.
no code implementations • ECCV 2018 • Yiran Zhong, Hongdong Li, Yuchao Dai
Deep learning-based stereo matching methods have shown great success and achieved top scores across different benchmarks.
1 code implementation • 28 Mar 2018 • Kaihao Zhang, Wenhan Luo, Yiran Zhong, Lin Ma, Wei Liu, Hongdong Li
To tackle the second challenge, we leverage the developed DBLRNet as a generator in the GAN (generative adversarial network) architecture, and employ a content loss in addition to an adversarial loss for efficient adversarial training.
no code implementations • 4 Sep 2017 • Yiran Zhong, Yuchao Dai, Hongdong Li
Existing deep-learning-based dense stereo matching methods often rely on ground-truth disparity maps as the training signals, which are, however, not always available.
no code implementations • CVPR 2016 • Pan Ji, Hongdong Li, Mathieu Salzmann, Yiran Zhong
Feature tracking is a fundamental problem in computer vision, with applications in many computer vision tasks, such as visual SLAM and action recognition.