1 code implementation • 26 Nov 2024 • Junyuan Deng, Wei Yin, Xiaoyang Guo, Qian Zhang, Xiaotao Hu, Weiqiang Ren, Xiaoxiao Long, Ping Tan
Monocular camera calibration is essential for many 3D vision tasks.
1 code implementation • 29 Oct 2024 • Bo Jiang, Shaoyu Chen, Bencheng Liao, Xingyu Zhang, Wei Yin, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang
In contrast, Large Vision-Language Models (LVLMs) excel in scene understanding and reasoning.
1 code implementation • 14 Oct 2024 • Honghui Yang, Di Huang, Wei Yin, Chunhua Shen, Haifeng Liu, Xiaofei He, Binbin Lin, Wanli Ouyang, Tong He
Video depth estimation has long been hindered by the scarcity of consistent and scalable ground truth data, leading to inconsistent and unreliable results.
no code implementations • 14 Oct 2024 • Songen Gu, Wei Yin, Bu Jin, Xiaoyang Guo, Junming Wang, Haodong Li, Qian Zhang, Xiaoxiao Long
The ability of this world model to capture the evolution of the environment is crucial for planning in autonomous driving.
no code implementations • 7 Oct 2024 • Junming Wang, Xingyu Zhang, Zebin Xing, Songen Gu, Xiaoyang Guo, Yang Hu, Ziying Song, Qian Zhang, Xiaoxiao Long, Wei Yin
In this paper, we propose HE-Drive: the first human-like-centric end-to-end autonomous driving system to generate trajectories that are both temporally consistent and comfortable.
no code implementations • 30 Sep 2024 • Junming Wang, Wei Yin, Xiaoxiao Long, Xingyu Zhang, Zebin Xing, Xiaoyang Guo, Qian Zhang
In this paper, we introduce OccRWKV, an efficient semantic occupancy network inspired by Receptance Weighted Key Value (RWKV).
no code implementations • 26 Sep 2024 • Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Zhang, Bingbing Liu, Ying-Cong Chen
In this paper, we provide a systematic analysis of the diffusion formulation for dense prediction, focusing on both quality and efficiency.
1 code implementation • 27 May 2024 • Linhan Wang, Kai Cheng, Shuo Lei, Shengkun Wang, Wei Yin, Chenyang Lei, Xiaoxiao Long, Chang-Tien Lu
Dash cam videos often suffer from severe obstructions such as reflections and occlusions on the windshields, which significantly impede the application of neural rendering techniques.
1 code implementation • Under review for Transaction 2024 • Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen
For metric depth estimation, we show that the key to a zero-shot single-view model lies in resolving the metric ambiguity from various camera models and large-scale data training.
Ranked #1 on Surface Normals Estimation on NYU Depth v2 (using extra training data)
1 code implementation • 20 Mar 2024 • Peishan Cong, Ziyi Wang, Zhiyang Dou, Yiming Ren, Wei Yin, Kai Cheng, Yujing Sun, Xiaoxiao Long, Xinge Zhu, Yuexin Ma
Language-guided scene-aware human motion generation has great significance for entertainment and robotics.
1 code implementation • 18 Mar 2024 • Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long
We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes, e.g., depth and normals, from single images.
1 code implementation • CVPR 2024 • Junda Cheng, Wei Yin, Kaixuan Wang, Xiaozhi Chen, Shijie Wang, Xin Yang
In this work, we propose a new robustness benchmark to evaluate the depth estimation system under various noisy pose settings.
Ranked #1 on Monocular Depth Estimation on DDAD
no code implementations • 22 Feb 2024 • Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, Xuejin Chen
The advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality renderings at real-time speed.
no code implementations • 19 Feb 2024 • Jialei Xu, Wei Yin, Dong Gong, Junjun Jiang, Xianming Liu
We suggest building virtual pinhole cameras to resolve the distortion problem of fisheye cameras and unify the processing for the two types of 360$^\circ$ cameras.
1 code implementation • 16 Feb 2024 • Xuelun Shen, Zhipeng Cai, Wei Yin, Matthias Müller, Zijun Li, Kaixuan Wang, Xiaozhi Chen, Cheng Wang
Given an architecture, GIM first trains it on standard domain-specific datasets and then combines it with complementary matching methods to create dense labels on nearby frames of novel videos.
no code implementations • CVPR 2024 • Ying-Tian Liu, Yuan-Chen Guo, Guan Luo, Heyi Sun, Wei Yin, Song-Hai Zhang
However, the generation quality and generalization ability of 3D diffusion models are hindered by the scarcity of high-quality and large-scale 3D datasets.
no code implementations • 28 Nov 2023 • Kai Cheng, Xiaoxiao Long, Wei Yin, Jin Wang, Zhiqiang Wu, Yuexin Ma, Kaixuan Wang, Xiaozhi Chen, Xuejin Chen
Multi-camera setups find widespread use across various applications, such as autonomous driving, as they greatly expand sensing capabilities.
1 code implementation • 26 Nov 2023 • Junhui Yin, Wei Yin, Hao Chen, Xuqian Ren, Zhanyu Ma, Jun Guo, Yifan Liu
These priors ensure that the color rendered along rays is robust to view direction and reduce the inherent ambiguities of the density estimated along rays.
no code implementations • ICCV 2023 • Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Joey Tianyi Zhou, Chunhua Shen
In this paper, we propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
no code implementations • ICCV 2023 • Guangkai Xu, Wei Yin, Hao Chen, Chunhua Shen, Kai Cheng, Feng Zhao
3D scene reconstruction is a long-standing vision task.
1 code implementation • ICCV 2023 • Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen
State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity.
Ranked #25 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)
1 code implementation • CVPR 2023 • Rui Li, Dong Gong, Wei Yin, Hao Chen, Yu Zhu, Kaixuan Wang, Xiaozhi Chen, Jinqiu Sun, Yanning Zhang
To let the geometric perception learned from multi-view cues in static areas propagate to the monocular representation in dynamic areas and let monocular cues enhance the representation of multi-view cost volume, we propose a cross-cue fusion (CCF) module, which includes the cross-cue attention (CCA) to encode the spatially non-local relative intra-relations from each source to enhance the representation of the other.
no code implementations • 14 Apr 2023 • Jaime Spencer, C. Stella Qian, Michaela Trescakova, Chris Russell, Simon Hadfield, Erich W. Graf, Wendy J. Adams, Andrew J. Schofield, James Elder, Richard Bowden, Ali Anwar, Hao Chen, Xiaozhi Chen, Kai Cheng, Yuchao Dai, Huynh Thai Hoa, Sadat Hossain, Jianmian Huang, Mohan Jing, Bo Li, Chao Li, Baojun Li, Zhiwen Liu, Stefano Mattoccia, Siegfried Mercelis, Myungwoo Nam, Matteo Poggi, Xiaohua Qi, Jiahui Ren, Yang Tang, Fabio Tosi, Linh Trinh, S. M. Nadim Uddin, Khan Muhammad Umair, Kaixuan Wang, YuFei Wang, Yixing Wang, Mochu Xiang, Guangkai Xu, Wei Yin, Jun Yu, Qi Zhang, Chaoqiang Zhao
This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC).
1 code implementation • 28 Mar 2023 • Peng Fang, Arijit Khan, Siqiang Luo, Fang Wang, Dan Feng, Zhenli Li, Wei Yin, Yuchao Cao
Graph embedding maps graph nodes to low-dimensional vectors, and is widely adopted in machine learning tasks.
2 code implementations • 7 Nov 2022 • Libo Sun, Jia-Wang Bian, Huangying Zhan, Wei Yin, Ian Reid, Chunhua Shen
Self-supervised monocular depth estimation has shown impressive results in static scenes.
no code implementations • 18 Oct 2022 • Chi Zhang, Wei Yin, Zhibin Wang, Gang Yu, Bin Fu, Chunhua Shen
In this paper, we address monocular depth estimation with deep neural networks.
Ranked #7 on Monocular Depth Estimation on ETH3D
1 code implementation • 28 Aug 2022 • Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Simon Chen, Yifan Liu, Chunhua Shen
To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes.
1 code implementation • 29 Jul 2022 • Guangkai Xu, Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Simon Chen, Jia-Wang Bian
Our method leverages a data-driven prior in the form of a single image depth prediction network trained on large-scale datasets, the output of which is used as an input to our model.
no code implementations • 12 Jul 2022 • Yichen Sheng, Yifan Liu, Jianming Zhang, Wei Yin, A. Cengiz Oztireli, He Zhang, Zhe Lin, Eli Shechtman, Bedrich Benes
It can be used to calculate hard shadows in a 2D image based on the projective geometry, providing precise control of the shadows' direction and shape.
no code implementations • 5 May 2022 • Kai Cheng, Hao Chen, Wei Yin, Guangkai Xu, Xuejin Chen
However, multi-view depth estimation is fundamentally a correspondence-based optimization problem. Previous learning-based methods mainly rely on predefined depth hypotheses to build correspondence as the cost volume and implicitly regularize it to fit depth prediction, deviating from the essence of iterative optimization based on stereo correspondence.
no code implementations • 25 Apr 2022 • Tong He, Wei Yin, Chunhua Shen, Anton Van Den Hengel
The current state-of-the-art methods in 3D instance segmentation typically involve a clustering step, despite the tendency towards heuristics, greedy algorithms, and a lack of robustness to the changes in data statistics.
no code implementations • 4 Apr 2022 • Libo Sun, Wei Yin, Enze Xie, Zhengrong Li, Changming Sun, Chunhua Shen
The core of our framework is a monocular depth estimation module with a strong generalization capability for diverse scenes.
no code implementations • CVPR 2022 • Alexander Long, Wei Yin, Thalaiyasingam Ajanthan, Vu Nguyen, Pulak Purkait, Ravi Garg, Alan Blair, Chunhua Shen, Anton Van Den Hengel
We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module.
Ranked #6 on Long-tail Learning on iNaturalist 2018
no code implementations • 4 Feb 2022 • Wei Yin, Yifan Liu, Chunhua Shen, Baichuan Sun, Anton Van Den Hengel
The resulting merged semantic segmentation dataset of over 2 Million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets, despite not using any images therefrom.
Ranked #1 on Semantic Segmentation on WildDash
no code implementations • 3 Feb 2022 • Guangkai Xu, Wei Yin, Hao Chen, Chunhua Shen, Kai Cheng, Feng Wu, Feng Zhao
However, in some video-based scenarios such as video depth estimation and 3D scene reconstruction from a video, the unknown scale and shift in per-frame predictions may cause depth inconsistency.
no code implementations • 28 Jul 2021 • Libo Sun, Haokui Zhang, Wei Yin
Specifically, we exploit pseudo-LiDAR using depth estimation, and propose a feature fusion network where RGB and learned depth information are fused for improved road detection.
no code implementations • CVPR 2021 • Yifan Liu, Hao Chen, Yu Chen, Wei Yin, Chunhua Shen
We hope that this simple, extended perceptual loss may serve as a generic structured-output loss that is applicable to most structured output learning tasks.
3 code implementations • 7 Mar 2021 • Wei Yin, Yifan Liu, Chunhua Shen
In this work, we show the importance of the high-order 3D geometric constraints for depth prediction.
1 code implementation • CVPR 2021 • Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen
Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length.
Ranked #1 on Indoor Monocular Depth Estimation on DIODE (using extra training data)
no code implementations • 15 Oct 2020 • Jiahui Wen, Jingwei Ma, Hongkui Tu, Wei Yin, Jian Fang
At review level, we mutually propagate textual features between the user and item, and capture the informative reviews.
2 code implementations • 3 Feb 2020 • Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, Dou Renyin
Compared with previous learning objectives, i.e., learning metric depth or relative depth, we propose to learn the affine-invariant depth using our diverse dataset to ensure both generalization and high-quality geometric shapes of scenes.
1 code implementation • 17 Sep 2019 • Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei LI, Chunhua Shen
In this paper, we first analyse the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground depth and background depth using separate optimization objectives and depth decoders.
no code implementations • 5 Sep 2019 • Yifan Liu, Bohan Zhuang, Chunhua Shen, Hao Chen, Wei Yin
Most current methods can be categorized as either (i) hard parameter sharing, where a subset of the parameters is shared among tasks while other parameters are task-specific; or (ii) soft parameter sharing, where all parameters are task-specific but are jointly regularized.
3 code implementations • ICCV 2019 • Wei Yin, Yifan Liu, Chunhua Shen, Youliang Yan
Monocular depth prediction plays a crucial role in understanding 3D scene geometry.
Ranked #10 on Depth Estimation on NYU-Depth V2