Search Results for author: Liang-Yan Gui

Found 17 papers, 7 papers with code

Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

1 code implementation · 5 Sep 2024 · Yunze Man, Shuhong Zheng, Zhipeng Bao, Martial Hebert, Liang-Yan Gui, Yu-Xiong Wang

To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios.

Scene Understanding Visual Grounding

Floating No More: Object-Ground Reconstruction from a Single Image

no code implementations · 26 Jul 2024 · Yunze Man, Yichen Sheng, Jianming Zhang, Liang-Yan Gui, Yu-Xiong Wang

Recent advancements in 3D object reconstruction from single images have primarily focused on improving the accuracy of object shapes.

3D Object Reconstruction 3D Reconstruction +1

Situational Awareness Matters in 3D Vision Language Reasoning

1 code implementation · CVPR 2024 · Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

To address this challenge, we introduce SIG3D, an end-to-end Situation-Grounded model for 3D vision language reasoning.

Question Answering

SOHES: Self-supervised Open-world Hierarchical Entity Segmentation

no code implementations · 18 Apr 2024 · Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang

Using raw images as the sole training data, our method achieves unprecedented performance in self-supervised open-world segmentation, marking a significant milestone towards high-quality open-world entity segmentation in the absence of human-annotated masks.

Segmentation

InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction

no code implementations · 28 Mar 2024 · Sirui Xu, Ziyin Wang, Yu-Xiong Wang, Liang-Yan Gui

However, extending such success to 3D dynamic human-object interaction (HOI) generation faces notable challenges, primarily due to the lack of large-scale interaction data and comprehensive descriptions that align with these interactions.

Human-Object Interaction Detection Language Modelling +2

HASSOD: Hierarchical Adaptive Self-Supervised Object Detection

1 code implementation · NeurIPS 2023 · Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang

The human visual perception system demonstrates exceptional capabilities in learning without explicit supervision and understanding the part-to-whole composition of objects.

Object object-detection +2

Aligning Large Multimodal Models with Factually Augmented RLHF

no code implementations · 25 Sep 2023 · Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell

Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in "hallucination", generating textual outputs that are not grounded by the multimodal information in context.

Hallucination Image Captioning +1

Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation

no code implementations · 17 Aug 2023 · Shengcao Cao, Mengtian Li, James Hays, Deva Ramanan, Yu-Xiong Wang, Liang-Yan Gui

To distill knowledge from a highly accurate but complex teacher model, we construct a sequence of teachers to help the student gradually adapt.

Edge-computing Instance Segmentation +5

Stochastic Multi-Person 3D Motion Forecasting

1 code implementation · 8 Jun 2023 · Sirui Xu, Yu-Xiong Wang, Liang-Yan Gui

This paper aims to deal with the ignored real-world complexities in prior work on human motion forecasting, emphasizing the social properties of multi-person motion, the diversity of motion and social interactions, and the complexity of articulated motion.

Diversity Motion Forecasting +1

DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception

no code implementations · 5 May 2023 · Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

In this work, we propose DualCross, a cross-modality cross-domain adaptation framework to facilitate the learning of a more robust monocular bird's-eye-view (BEV) perception model, which transfers the point cloud knowledge from a LiDAR sensor in one domain during the training phase to the camera-only testing scenario in a different domain.

Domain Adaptation

Contrastive Mean Teacher for Domain Adaptive Object Detectors

1 code implementation · CVPR 2023 · Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang

Object detectors often suffer from the domain gap between training (source domain) and real-world applications (target domain).

Contrastive Learning Object +4

Diverse Human Motion Prediction Guided by Multi-Level Spatial-Temporal Anchors

1 code implementation · 9 Feb 2023 · Sirui Xu, Yu-Xiong Wang, Liang-Yan Gui

Predicting diverse human motions given a sequence of historical poses has received increasing attention.

Ranked #1 on Human Pose Forecasting on Human3.6M (MMADE metric)

Diversity Human Pose Forecasting +2

BEV-Guided Multi-Modality Fusion for Driving Perception

no code implementations · CVPR 2023 · Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

We design a BEV-guided multi-sensor attention block to take queries from BEV embeddings and learn the BEV representation from sensor-specific features.

Autonomous Driving Representation Learning

Few-Shot Human Motion Prediction via Meta-Learning

no code implementations · ECCV 2018 · Liang-Yan Gui, Yu-Xiong Wang, Deva Ramanan, Jose M. F. Moura

This paper addresses the problem of few-shot human motion prediction, in the spirit of the recent progress on few-shot learning and meta-learning.

Few-Shot Learning Human motion prediction +1

Adversarial Geometry-Aware Human Motion Prediction

no code implementations · ECCV 2018 · Liang-Yan Gui, Yu-Xiong Wang, Xiaodan Liang, Jose M. F. Moura

We explore an approach to forecasting human motion in a few milliseconds given an input 3D skeleton sequence based on a recurrent encoder-decoder framework.

Decoder Human motion prediction +1
