Search Results for author: Wufei Ma

Found 22 papers, 7 papers with code

SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models

no code implementations CVPR 2025 Wufei Ma, Luoxin Ye, Nessa McWeeney, Celso M de Melo, Alan Yuille, Jieneng Chen

In this paper, we systematically study the impact of 3D-informed data, architecture, and training setups, introducing SpatialLLM, a large multi-modal model with advanced 3D spatial reasoning abilities.

Spatial Reasoning Visual Question Answering (VQA)

SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning

no code implementations 28 Apr 2025 Wufei Ma, Yu-Cheng Chou, Qihao Liu, Xingrui Wang, Celso de Melo, Jianwen Xie, Alan Yuille

Despite recent advances on multi-modal models, 3D spatial reasoning remains a challenging task for state-of-the-art open-source and proprietary models.

Question Answering Spatial Reasoning

DINeMo: Learning Neural Mesh Models with no 3D Annotations

no code implementations 26 Mar 2025 Weijie Guo, Guofeng Zhang, Wufei Ma, Alan Yuille

In this work, we present DINeMo, a novel neural mesh model that is trained with no 3D annotations by leveraging pseudo-correspondence obtained from large visual foundation models.

3D Pose Estimation 6D Pose Estimation +2

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models

1 code implementation 12 Feb 2025 Xingrui Wang, Wufei Ma, Tiezheng Zhang, Celso M de Melo, Jieneng Chen, Alan Yuille

Although large multimodal models (LMMs) have demonstrated remarkable capabilities in visual scene interpretation and reasoning, their capacity for complex and precise 3-dimensional spatial reasoning remains uncertain.

Attribute Diagnostic +2

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models

no code implementations CVPR 2025 Xingrui Wang, Wufei Ma, Tiezheng Zhang, Celso M de Melo, Jieneng Chen, Alan Yuille

Although large multimodal models (LMMs) have demonstrated remarkable capabilities in visual scene interpretation and reasoning, their capacity for complex and precise 3-dimensional spatial reasoning remains uncertain.

Attribute Diagnostic +2

3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark

no code implementations 10 Dec 2024 Wufei Ma, Haoyu Chen, Guofeng Zhang, Yu-Cheng Chou, Celso M de Melo, Alan Yuille

We benchmark a wide range of open-sourced and proprietary LMMs, uncovering their limitations in various aspects of 3D awareness, such as height, orientation, location, and multi-object reasoning, as well as their degraded performance on images with uncommon camera viewpoints.

Autonomous Navigation Spatial Reasoning +1

Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data

no code implementations 18 Jul 2024 Wufei Ma, Kai Li, Zhongshi Jiang, Moustafa Meshry, Qihao Liu, Huiyu Wang, Christian Häne, Alan Yuille

In order to narrow the gap between video-text models and human performance on RCAD, we identify a key limitation of current contrastive approaches on video-text data and introduce LLM-teacher, a more effective approach to learn action semantics by leveraging knowledge obtained from a pretrained large language model.

Language Modelling Large Language Model +2

ImageNet3D: Towards General-Purpose Object-Level 3D Understanding

1 code implementation 13 Jun 2024 Wufei Ma, Guanning Zeng, Guofeng Zhang, Qihao Liu, Letian Zhang, Adam Kortylewski, Yaoyao Liu, Alan Yuille

A vision model with general-purpose object-level 3D understanding should be capable of inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location and 3D viewpoint) for arbitrary rigid objects in natural images.

Image Captioning Linear Probing Object-Level 3D Awareness +2

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

1 code implementation 2 Jun 2024 Xingrui Wang, Wufei Ma, Angtian Wang, Shuo Chen, Adam Kortylewski, Alan Yuille

To demonstrate the importance of an explicit 4D dynamics representation of the scenes in understanding world dynamics, we further propose NS-4Dynamics, a Neural-Symbolic model for reasoning on 4D Dynamics properties under explicit scene representation from videos.

counterfactual Counterfactual Reasoning +3

Uncertainty-Aware Deep Video Compression with Ensembles

no code implementations 28 Mar 2024 Wufei Ma, Jiahao Li, Bin Li, Yan Lu

Deep learning-based video compression is a challenging task, and many previous state-of-the-art learning-based video codecs use optical flows to exploit the temporal correlation between successive frames and then compress the residual error.

Diversity Motion Estimation +2

3D-Aware Visual Question Answering about Parts, Poses and Occlusions

2 code implementations NeurIPS 2023 Xingrui Wang, Wufei Ma, Zhuowan Li, Adam Kortylewski, Alan Yuille

In this work, we introduce the task of 3D-aware VQA, which focuses on challenging questions that require a compositional reasoning over the 3D structure of visual scenes.

Question Answering Visual Question Answering

Generating Images with 3D Annotations Using Diffusion Models

no code implementations 13 Jun 2023 Wufei Ma, Qihao Liu, Jiahao Wang, Angtian Wang, Xiaoding Yuan, Yi Zhang, Zihao Xiao, Guofeng Zhang, Beijia Lu, Ruxiao Duan, Yongrui Qi, Adam Kortylewski, Yaoyao Liu, Alan Yuille

With explicit 3D geometry control, we can easily change the 3D structures of the objects in the generated images and obtain ground-truth 3D annotations automatically.

3D geometry 3D Pose Estimation +1

Neural Textured Deformable Meshes for Robust Analysis-by-Synthesis

no code implementations 31 May 2023 Angtian Wang, Wufei Ma, Alan Yuille, Adam Kortylewski

Human vision demonstrates higher robustness than current AI algorithms under out-of-distribution scenarios.

Robust Category-Level 3D Pose Estimation from Synthetic Data

no code implementations 25 May 2023 Jiahao Yang, Wufei Ma, Angtian Wang, Xiaoding Yuan, Alan Yuille, Adam Kortylewski

In this work, we aim to narrow the performance gap between models trained on synthetic data and few real images and fully supervised models trained on large-scale data.

3D Pose Estimation 3D Reconstruction +4

NOVUM: Neural Object Volumes for Robust Object Classification

1 code implementation 24 May 2023 Artur Jesslen, Guofeng Zhang, Angtian Wang, Wufei Ma, Alan Yuille, Adam Kortylewski

Discriminative models for object classification typically learn image-based representations that do not capture the compositional and 3D nature of objects.

Classification image-classification +4

OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

no code implementations 29 Nov 2021 Bingchen Zhao, Shaozuo Yu, Wufei Ma, Mingxin Yu, Shenxiao Mei, Angtian Wang, Ju He, Alan Yuille, Adam Kortylewski

One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors.

3D Pose Estimation Benchmarking +6

Image-driven discriminative and generative machine learning algorithms for establishing microstructure-processing relationships

no code implementations 27 Jul 2020 Wufei Ma, Elizabeth Kautz, Arun Baskaran, Aritra Chowdhury, Vineet Joshi, Bülent Yener, Daniel Lewis

A binary alloy (uranium-molybdenum) that is currently under development as a nuclear fuel was studied for the purpose of developing an improved machine learning approach to image recognition, characterization, and building predictive capabilities linking microstructure to processing conditions.

BIG-bench Machine Learning

An image-driven machine learning approach to kinetic modeling of a discontinuous precipitation reaction

no code implementations 13 Jun 2019 Elizabeth Kautz, Wufei Ma, Saumyadeep Jana, Arun Devaraj, Vineet Joshi, Bülent Yener, Daniel Lewis

Here, we apply these well-established methods to develop an approach to microstructure quantification for kinetic modeling of a discontinuous precipitation reaction in a case study on the uranium-molybdenum system.

BIG-bench Machine Learning Classification +3
