1 code implementation • 5 Sep 2024 • Yunze Man, Shuhong Zheng, Zhipeng Bao, Martial Hebert, Liang-Yan Gui, Yu-Xiong Wang
To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios.
no code implementations • 26 Jul 2024 • Yunze Man, Yichen Sheng, Jianming Zhang, Liang-Yan Gui, Yu-Xiong Wang
Recent advancements in 3D object reconstruction from single images have primarily focused on improving the accuracy of object shapes.
1 code implementation • CVPR 2024 • Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
To address this challenge, we introduce SIG3D, an end-to-end Situation-Grounded model for 3D vision language reasoning.
Ranked #2 on Question Answering on SQA3D
no code implementations • 18 Apr 2024 • Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang
Using raw images as the sole training data, our method achieves unprecedented performance in self-supervised open-world segmentation, marking a significant milestone towards high-quality open-world entity segmentation in the absence of human-annotated masks.
no code implementations • 28 Mar 2024 • Sirui Xu, Ziyin Wang, Yu-Xiong Wang, Liang-Yan Gui
However, extending such success to 3D dynamic human-object interaction (HOI) generation faces notable challenges, primarily due to the lack of large-scale interaction data and comprehensive descriptions that align with these interactions.
1 code implementation • NeurIPS 2023 • Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang
The human visual perception system demonstrates exceptional capabilities in learning without explicit supervision and understanding the part-to-whole composition of objects.
no code implementations • 25 Sep 2023 • Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell
Large Multimodal Models (LMMs) are built across modalities, and misalignment between the modalities can result in "hallucination": generating textual outputs that are not grounded by the multimodal information in context.
1 code implementation • ICCV 2023 • Sirui Xu, Zhengyuan Li, Yu-Xiong Wang, Liang-Yan Gui
This paper addresses a novel task of anticipating 3D human-object interactions (HOIs).
no code implementations • 17 Aug 2023 • Shengcao Cao, Mengtian Li, James Hays, Deva Ramanan, Yu-Xiong Wang, Liang-Yan Gui
To distill knowledge from a highly accurate but complex teacher model, we construct a sequence of teachers to help the student gradually adapt.
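As a rough illustration of this idea, the sketch below distills a student through a sequence of teachers in turn; the temperature, soft-label KL loss, and teacher ordering are common distillation choices assumed here, not details taken from the paper.

```python
# Hedged sketch: progressive distillation through a sequence of teachers.
# Loss weights, temperature, and loaders are illustrative assumptions.
import torch
import torch.nn.functional as F

def distill_through_teacher_sequence(student, teachers, loader, optimizer,
                                     T=4.0, epochs_per_teacher=1):
    """Adapt the student to each teacher in turn, e.g., easiest to hardest."""
    for teacher in teachers:
        teacher.eval()
        for _ in range(epochs_per_teacher):
            for x, _ in loader:
                with torch.no_grad():
                    t_logits = teacher(x)
                s_logits = student(x)
                # Soft-label KL distillation loss at temperature T
                loss = F.kl_div(
                    F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean",
                ) * (T * T)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    return student
```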
1 code implementation • 8 Jun 2023 • Sirui Xu, Yu-Xiong Wang, Liang-Yan Gui
This paper tackles real-world complexities overlooked by prior work on human motion forecasting, emphasizing the social properties of multi-person motion, the diversity of motion and social interactions, and the complexity of articulated motion.
no code implementations • 5 May 2023 • Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
In this work, we propose DualCross, a cross-modality, cross-domain adaptation framework that learns a more robust monocular bird's-eye-view (BEV) perception model by transferring point cloud knowledge from a LiDAR sensor in one domain during training to a camera-only testing scenario in a different domain.
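A minimal sketch of the cross-modality transfer idea, assuming a simple BEV feature-mimicry loss from a frozen LiDAR branch to the camera branch; the function name and the L2 objective are illustrative assumptions, not the paper's actual loss.

```python
# Hedged sketch: the camera branch is trained to mimic LiDAR BEV features,
# so no LiDAR is needed at test time. Shapes and names are assumptions.
import torch
import torch.nn.functional as F

def bev_mimic_loss(camera_bev, lidar_bev):
    # camera_bev, lidar_bev: (B, C, H, W) BEV feature maps from each branch
    # detach() keeps the LiDAR teacher frozen; only the camera branch learns
    return F.mse_loss(camera_bev, lidar_bev.detach())
```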
1 code implementation • CVPR 2023 • Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang
Object detectors often suffer from the domain gap between training (source domain) and real-world applications (target domain).
1 code implementation • 9 Feb 2023 • Sirui Xu, Yu-Xiong Wang, Liang-Yan Gui
Predicting diverse human motions given a sequence of historical poses has received increasing attention.
Ranked #1 on Human Pose Forecasting on Human3.6M (MMADE metric)
no code implementations • CVPR 2023 • Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
We design a BEV-guided multi-sensor attention block to take queries from BEV embeddings and learn the BEV representation from sensor-specific features.
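A minimal sketch of such a block, assuming standard cross-attention in which BEV embeddings supply the queries and sensor-specific features supply the keys and values; the dimensions, names, and residual update are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a BEV-guided cross-attention block.
import torch
import torch.nn as nn

class BEVGuidedAttention(nn.Module):
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, bev_queries, sensor_feats):
        # bev_queries: (B, H*W, C) flattened BEV grid embeddings
        # sensor_feats: (B, N, C) sensor-specific tokens (e.g., camera, LiDAR)
        out, _ = self.attn(query=bev_queries, key=sensor_feats, value=sensor_feats)
        # Residual update of the BEV representation from sensor features
        return self.norm(bev_queries + out)
```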
no code implementations • 12 Nov 2019 • Beatriz Quintino Ferreira, João P. Costeira, Ricardo G. Sousa, Liang-Yan Gui, João P. Gomes
We propose a compact framework with guided attention for multi-label classification in the fashion domain.
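As a hedged illustration, one common way to realize attention-guided multi-label classification is per-label attention pooling over backbone features, sketched below; the architecture and all names are assumptions, not the paper's design.

```python
# Hedged sketch: learnable per-label queries attend over spatial features,
# yielding one logit per label for a BCE-style multi-label loss.
import torch
import torch.nn as nn

class GuidedMultiLabelHead(nn.Module):
    def __init__(self, feat_dim=512, num_labels=30):
        super().__init__()
        self.label_queries = nn.Parameter(torch.randn(num_labels, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.cls = nn.Linear(feat_dim, 1)

    def forward(self, feats):
        # feats: (B, N, feat_dim) spatial feature tokens from a CNN backbone
        q = self.label_queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        pooled, _ = self.attn(q, feats, feats)   # per-label attention pooling
        return self.cls(pooled).squeeze(-1)      # (B, num_labels) logits
```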
no code implementations • ECCV 2018 • Liang-Yan Gui, Yu-Xiong Wang, Deva Ramanan, José M. F. Moura
This paper addresses the problem of few-shot human motion prediction, in the spirit of the recent progress on few-shot learning and meta-learning.
no code implementations • ECCV 2018 • Liang-Yan Gui, Yu-Xiong Wang, Xiaodan Liang, José M. F. Moura
We explore an approach to forecasting human motion a few milliseconds into the future, given an input 3D skeleton sequence, based on a recurrent encoder-decoder framework.
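A minimal sketch of a recurrent encoder-decoder for this setting, assuming a GRU encoder, an autoregressive GRU decoder, and residual pose prediction; the dimensions and the residual decoding choice are illustrative assumptions, not necessarily the paper's exact design.

```python
# Hedged sketch: encode observed skeleton poses, then decode future poses
# autoregressively as residuals over the previous pose.
import torch
import torch.nn as nn

class MotionSeq2Seq(nn.Module):
    def __init__(self, pose_dim=63, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(pose_dim, hidden, batch_first=True)
        self.decoder = nn.GRUCell(pose_dim, hidden)
        self.out = nn.Linear(hidden, pose_dim)

    def forward(self, history, horizon):
        # history: (B, T, pose_dim) observed 3D skeleton sequence
        _, h = self.encoder(history)
        h = h[-1]                          # (B, hidden) final encoder state
        pose = history[:, -1]              # start from the last observed pose
        preds = []
        for _ in range(horizon):
            h = self.decoder(pose, h)
            pose = pose + self.out(h)      # residual over the previous pose
            preds.append(pose)
        return torch.stack(preds, dim=1)   # (B, horizon, pose_dim)
```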