Search Results for author: Yunze Man

Found 14 papers, 5 papers with code

Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

1 code implementation • 5 Sep 2024 • Yunze Man, Shuhong Zheng, Zhipeng Bao, Martial Hebert, Liang-Yan Gui, Yu-Xiong Wang

To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios.

Scene Understanding Visual Grounding

Floating No More: Object-Ground Reconstruction from a Single Image

no code implementations • 26 Jul 2024 • Yunze Man, Yichen Sheng, Jianming Zhang, Liang-Yan Gui, Yu-Xiong Wang

Recent advancements in 3D object reconstruction from single images have primarily focused on improving the accuracy of object shapes.

3D Object Reconstruction 3D Reconstruction +1

Situational Awareness Matters in 3D Vision Language Reasoning

1 code implementation • CVPR 2024 • Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

To address this challenge, we introduce SIG3D, an end-to-end Situation-Grounded model for 3D vision language reasoning.

Question Answering

Frozen Transformers in Language Models Are Effective Visual Encoder Layers

2 code implementations • 19 Oct 2023 • Ziqi Pang, Ziyang Xie, Yunze Man, Yu-Xiong Wang

This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language.

Action Recognition Image-text Retrieval +5

DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception

no code implementations • 5 May 2023 • Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

In this work, we propose DualCross, a cross-modality cross-domain adaptation framework to facilitate the learning of a more robust monocular bird's-eye-view (BEV) perception model, which transfers the point cloud knowledge from a LiDAR sensor in one domain during the training phase to the camera-only testing scenario in a different domain.

Domain Adaptation

BEV-Guided Multi-Modality Fusion for Driving Perception

no code implementations • CVPR 2023 • Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

We design a BEV-guided multi-sensor attention block to take queries from BEV embeddings and learn the BEV representation from sensor-specific features.

Autonomous Driving Representation Learning
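The abstract describes a BEV-guided multi-sensor attention block in which BEV embeddings supply the queries and sensor-specific features supply the keys and values. As a minimal, hypothetical illustration of that cross-attention pattern (not the paper's implementation; all shapes, weights, and names below are invented for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16              # embedding dimension (hypothetical)
n_bev = 4 * 4       # BEV grid flattened into 16 query tokens
n_feat = 32         # sensor-specific feature tokens (e.g., camera + LiDAR)

bev = rng.standard_normal((n_bev, d))      # BEV query embeddings
feats = rng.standard_normal((n_feat, d))   # sensor-specific features

# Random (untrained) projection matrices, standing in for learned weights
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

def softmax(x, axis=-1):
    # numerically stable row-wise softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Queries come from the BEV embeddings; keys/values from sensor features
Q, K, V = bev @ Wq, feats @ Wk, feats @ Wv
attn = softmax(Q @ K.T / np.sqrt(d))   # (n_bev, n_feat) attention weights
bev_out = attn @ V                     # updated BEV representation
```

Each BEV cell thus pools information from all sensor tokens, weighted by query-key similarity, which is what lets the BEV representation be learned directly from heterogeneous sensor features.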

Fast Graph Neural Tangent Kernel via Kronecker Sketching

no code implementations • 4 Dec 2021 • Shunhua Jiang, Yunze Man, Zhao Song, Zheng Yu, Danyang Zhuo

Given a kernel matrix over $n$ graphs, applying sketching to the kernel regression solve can reduce the running time to $o(n^3)$.

regression
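To make the idea concrete, here is a minimal sketched kernel ridge regression in plain numpy. This is only an illustration of sketching a kernel solve, not the paper's GNTK-specific Kronecker-sketching algorithm; the RBF kernel, Gaussian sketch, and all function names and parameters are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, Z, gamma=1.0):
    # K_ij = exp(-gamma * ||x_i - z_j||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def sketched_krr(X, y, s=20, lam=1e-2, gamma=1.0):
    """Solve kernel ridge regression restricted to a random sketch.

    Instead of solving the full n x n system (K + lam*I) a = y,
    restrict the coefficients to alpha = S @ b for a Gaussian sketch
    S (n x s) and solve the much smaller s x s normal equations.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)                   # n x n kernel matrix
    S = rng.standard_normal((n, s)) / np.sqrt(s)  # Gaussian sketch
    KS = K @ S                                    # n x s
    # normal equations of min_b ||K S b - y||^2 + lam * b^T S^T K S b
    A = KS.T @ KS + lam * (S.T @ KS)              # s x s system
    b = np.linalg.solve(A, KS.T @ y)
    alpha = S @ b                                 # lift back to n coefficients
    return alpha, K

n = 100
X = rng.standard_normal((n, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
alpha, K = sketched_krr(X, y)
pred = K @ alpha
```

The expensive step shrinks from an $n \times n$ solve to an $s \times s$ solve with $s \ll n$, which is the source of the subcubic running time.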

Multi-Echo LiDAR for 3D Object Detection

no code implementations • ICCV 2021 • Yunze Man, Xinshuo Weng, Prasanna Kumar Sivakumar, Matthew O'Toole, Kris Kitani

LiDAR sensors can be used to obtain a wide range of measurement signals other than a simple 3D point cloud, and those signals can be leveraged to improve perception tasks like 3D object detection.

3D Object Detection Graph Neural Network +2

Multi-Modality Task Cascade for 3D Object Detection

1 code implementation • 8 Jul 2021 • Jinhyung Park, Xinshuo Weng, Yunze Man, Kris Kitani

To provide a more integrated approach, we propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions, which are then used to further refine the 3D boxes.

3D Object Detection Object +3

Graph Neural Network Acceleration via Matrix Dimension Reduction

no code implementations • 1 Jan 2021 • Shunhua Jiang, Yunze Man, Zhao Song, Danyang Zhuo

Theoretically, we present two techniques to speed up GNTK training while preserving the generalization error: (1) We use a novel matrix decoupling method to reduce matrix dimensions during the kernel solving.

Dimensionality Reduction Graph Neural Network

GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning

1 code implementation • 12 Jun 2020 • Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani

As a result, the feature of each object is informed of the features of other objects, so that it can lean toward objects with similar features (i.e., objects likely sharing the same ID) and deviate from objects with dissimilar features (i.e., objects likely with different IDs), leading to a more discriminative feature for each object; (2) instead of obtaining features from either 2D or 3D space alone as in prior work, we propose a novel joint feature extractor that learns appearance and motion features from 2D and 3D space simultaneously.

3D Multi-Object Tracking Graph Neural Network +1

Deep Q Learning Driven CT Pancreas Segmentation with Geometry-Aware U-Net

no code implementations • 19 Apr 2019 • Yunze Man, Yangsibo Huang, Junyi Feng, Xi Li, Fei Wu

Segmentation of pancreas is important for medical image analysis, yet it faces great challenges of class imbalance, background distractions and non-rigid geometrical features.

Pancreas Segmentation Q-Learning +1
