TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes

1 code implementation 28 Mar 2024 Bu Jin, Yupeng Zheng, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, Kun Zhan, Peng Jia, Xiaoxiao Long, Yilun Chen, Hao Zhao

However, the exploration of 3D dense captioning in outdoor scenes is hindered by two major challenges: 1) the domain gap between indoor and outdoor scenes, such as dynamics and sparse visual inputs, makes it difficult to directly adapt existing indoor methods; 2) the lack of data with comprehensive box-caption pair annotations specifically tailored for outdoor scenes.

3D dense captioning, Dense Captioning

Tri-Perspective View Decomposition for Geometry-Aware Depth Completion

no code implementations CVPR 2024 Zhiqiang Yan, Yuankai Lin, Kun Wang, Yupeng Zheng, YuFei Wang, Zhenyu Zhang, Jun Li, Jian Yang

Depth completion is a vital task for autonomous driving, as it involves reconstructing the precise 3D geometry of a scene from sparse and noisy depth measurements.

3D geometry, Autonomous Driving

GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping

1 code implementation 14 Mar 2024 Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, Chao Yang, Dawei Wang, Zhen Chen, Xiaoxiao Long, Meiqing Wang

In particular, we propose an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently and accurately distill language embeddings derived from foundational models.

Contrastive Learning, Robotic Grasping

MonoOcc: Digging into Monocular Semantic Occupancy Prediction

1 code implementation 13 Mar 2024 Yupeng Zheng, Xiang Li, Pengfei Li, Yuhang Zheng, Bu Jin, Chengliang Zhong, Xiaoxiao Long, Hao Zhao, Qichao Zhang

However, existing methods rely on a complex cascaded framework with relatively limited information to restore 3D scenes, including a dependency on supervision solely on the whole network's output, single-frame input, and the utilization of a small backbone.

3D geometry, Autonomous Vehicles

Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images

no code implementations 8 Feb 2024 Xiaoxiao Long, Yuhang Zheng, Yupeng Zheng, Beiwen Tian, Cheng Lin, Lingjie Liu, Hao Zhao, Guyue Zhou, Wenping Wang

We introduce a novel approach to learn geometries such as depth and surface normal from images while incorporating geometric context.

3D geometry, Depth Estimation

Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics

no code implementations 10 Jan 2024 Beiwen Tian, Huan-ang Gao, Leiyao Cui, Yupeng Zheng, Lan Luo, Baofeng Wang, Rong Zhi, Guyue Zhou, Hao Zhao

We believe the latter is valuable as it measures whether an anomaly segmentation algorithm can truly prevent a car from crashing in a temporally informed setting.

Anomaly Segmentation, Autonomous Driving

3D Implicit Transporter for Temporally Consistent Keypoint Discovery

1 code implementation ICCV 2023 Chengliang Zhong, Yuhang Zheng, Yupeng Zheng, Hao Zhao, Li Yi, Xiaodong Mu, Ling Wang, Pengfei Li, Guyue Zhou, Chao Yang, Xinliang Zhang, Jian Zhao

To address this issue, the Transporter method was introduced for 2D data, which reconstructs the target frame from the source frame to incorporate both spatial and temporal information.

Learnable Differencing Center for Nighttime Depth Perception

1 code implementation 26 Jun 2023 Zhiqiang Yan, Yupeng Zheng, Chongyi Li, Jun Li, Jian Yang

Depth completion is the task of recovering dense depth maps from sparse ones, usually with the help of color images.

Depth Completion, Depth Estimation

DPF: Learning Dense Prediction Fields with Weak Supervision

1 code implementation CVPR 2023 Xiaoxue Chen, Yuhang Zheng, Yupeng Zheng, Qiang Zhou, Hao Zhao, Guyue Zhou, Ya-Qin Zhang

We showcase the effectiveness of DPFs using two substantially different tasks: high-level semantic parsing and low-level intrinsic image decomposition.

Intrinsic Image Decomposition, Scene Understanding

ADAPT: Action-aware Driving Caption Transformer

1 code implementation 1 Feb 2023 Bu Jin, Xinyu Liu, Yupeng Zheng, Pengfei Li, Hao Zhao, Tong Zhang, Yuhang Zheng, Guyue Zhou, Jingjing Liu

To bridge the gap, we propose an end-to-end transformer-based architecture, ADAPT (Action-aware Driving cAPtion Transformer), which provides user-friendly natural language narrations and reasoning for each decision making step of autonomous vehicular control and action.

Autonomous Driving, Decision Making

