1 code implementation • 5 Sep 2024 • Yunze Man, Shuhong Zheng, Zhipeng Bao, Martial Hebert, Liang-Yan Gui, Yu-Xiong Wang
To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios.
no code implementations • 26 Jul 2024 • Yunze Man, Yichen Sheng, Jianming Zhang, Liang-Yan Gui, Yu-Xiong Wang
Recent advancements in 3D object reconstruction from single images have primarily focused on improving the accuracy of object shapes.
1 code implementation • CVPR 2024 • Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
To address this challenge, we introduce SIG3D, an end-to-end Situation-Grounded model for 3D vision language reasoning.
Ranked #2 on Question Answering on SQA3D
2 code implementations • 19 Oct 2023 • Ziqi Pang, Ziyang Xie, Yunze Man, Yu-Xiong Wang
This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language.
Ranked #3 on Question Answering on SQA3D
no code implementations • 5 May 2023 • Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
In this work, we propose DualCross, a cross-modality cross-domain adaptation framework that facilitates the learning of a more robust monocular bird's-eye-view (BEV) perception model, transferring point cloud knowledge from a LiDAR sensor in one domain during training to a camera-only testing scenario in a different domain.
no code implementations • CVPR 2023 • Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
We design a BEV-guided multi-sensor attention block to take queries from BEV embeddings and learn the BEV representation from sensor-specific features.
no code implementations • 4 Dec 2021 • Shunhua Jiang, Yunze Man, Zhao Song, Zheng Yu, Danyang Zhuo
Given a kernel matrix of $n$ graphs, using sketching to solve the kernel regression can reduce the running time to $o(n^3)$.
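To illustrate the sub-cubic claim above, here is a minimal, assumed sketch (not the paper's actual algorithm) of the general idea: a low-rank Nyström-style approximation replaces the $O(n^3)$ solve on the full $n \times n$ kernel matrix with an $O(nm^2)$ solve in an $m$-dimensional feature space, $m \ll n$. All names and the RBF kernel choice here are illustrative.

```python
import numpy as np

# Illustrative sketch: Nystrom-style low-rank approximation of an
# n x n kernel matrix, so kernel ridge regression costs O(n m^2)
# instead of O(n^3). Not the paper's exact method.
rng = np.random.default_rng(0)
n, m, lam = 500, 50, 1e-2

X = rng.normal(size=(n, 3))      # toy inputs standing in for graph features
y = rng.normal(size=n)           # toy regression targets

def rbf(A, B):
    """Gaussian (RBF) kernel between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2)

idx = rng.choice(n, size=m, replace=False)   # m landmark points
C = rbf(X, X[idx])                           # n x m kernel columns
W = C[idx] + 1e-8 * np.eye(m)                # m x m landmark block (jittered)

# Feature map Z with Z @ Z.T ~ K (the full kernel matrix)
L = np.linalg.cholesky(W)
Z = C @ np.linalg.inv(L).T                   # n x m

# Ridge regression in the m-dimensional feature space: O(n m^2)
alpha = np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ y)
pred = Z @ alpha                             # approximate kernel predictions
```

The key design point is that only `m` kernel columns are ever formed, so neither the memory nor the solve cost touches the full `n x n` matrix.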
no code implementations • ICCV 2021 • Yunze Man, Xinshuo Weng, Prasanna Kumar Sivakumar, Matthew O'Toole, Kris Kitani
LiDAR sensors can be used to obtain a wide range of measurement signals other than a simple 3D point cloud, and those signals can be leveraged to improve perception tasks like 3D object detection.
1 code implementation • 8 Jul 2021 • Jinhyung Park, Xinshuo Weng, Yunze Man, Kris Kitani
To provide a more integrated approach, we propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions, which are then used to further refine the 3D boxes.
no code implementations • 1 Jan 2021 • Shunhua Jiang, Yunze Man, Zhao Song, Danyang Zhuo
Theoretically, we present two techniques to speed up GNTK training while preserving the generalization error: (1) We use a novel matrix decoupling method to reduce matrix dimensions during kernel solving.
no code implementations • 20 Aug 2020 • Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani
3D multi-object tracking (MOT) is crucial to autonomous systems.
1 code implementation • 12 Jun 2020 • Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani
As a result, the feature of one object is informed by the features of other objects, so that each object's feature can lean towards objects with similar features (i.e., likely the same ID) and deviate from objects with dissimilar features (i.e., likely different IDs), leading to a more discriminative feature for each object; (2) instead of obtaining the feature from either 2D or 3D space as in prior work, we propose a novel joint feature extractor to learn appearance and motion features from 2D and 3D space simultaneously.
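The feature interaction described above can be sketched as a simple softmax-affinity aggregation; this is an assumed toy illustration of the mechanism, not the paper's implementation, and all array names are hypothetical.

```python
import numpy as np

# Toy sketch of affinity-based feature aggregation for tracking:
# each object's feature is refined using features of other objects,
# weighted by pairwise similarity (likely same-ID objects contribute more).
rng = np.random.default_rng(1)
feats = rng.normal(size=(4, 8))              # 4 detected objects, 8-dim features

sim = feats @ feats.T                        # pairwise affinity scores
np.fill_diagonal(sim, -np.inf)               # exclude self-affinity
w = np.exp(sim) / np.exp(sim).sum(1, keepdims=True)  # row-wise softmax weights
refined = feats + w @ feats                  # residual update toward similar objects
```

Objects with similar features receive large mutual weights and thus pull each other's refined features together, which is the discriminative effect the abstract describes.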
no code implementations • 19 Apr 2019 • Yunze Man, Yangsibo Huang, Junyi Feng, Xi Li, Fei Wu
Segmentation of the pancreas is important for medical image analysis, yet it faces great challenges from class imbalance, background distractions, and non-rigid geometrical features.
no code implementations • 17 Nov 2018 • Yunze Man, Xinshuo Weng, Xi Li, Kris Kitani
We focus on estimating the 3D orientation of the ground plane from a single image.