Search Results for author: Guangyao Zhai

Found 25 papers, 11 papers with code

SG-Tailor: Inter-Object Commonsense Relationship Reasoning for Scene Graph Manipulation

1 code implementation • 23 Mar 2025 • Haoliang Shang, Hanyu Wu, Guangyao Zhai, Boyang Sun, Fangjinhua Wang, Federico Tombari, Marc Pollefeys

Scene graphs capture complex relationships among objects, serving as strong priors for content generation and manipulation.

Scene Generation

Think Before Refusal: Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior

no code implementations • 22 Mar 2025 • Shengyun Si, Xinpeng Wang, Guangyao Zhai, Nassir Navab, Barbara Plank

Recent advancements in large language models (LLMs) have demonstrated that fine-tuning and human alignment can render LLMs harmless.

MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation

1 code implementation • 9 Feb 2025 • Zhifei Yang, Keyang Lu, Chao Zhang, Jiaxing Qi, Hanqi Jiang, Ruifei Ma, Shenglin Yin, Yifan Xu, Mingzhe Xing, Zhen Xiao, Jieyi Long, Guangyao Zhai

Controllable 3D scene generation has extensive applications in virtual reality and interior design, where the generated scenes should exhibit high levels of realism and controllability in terms of geometry.

Scene Generation

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs

1 code implementation • 30 Sep 2024 • Ruotong Liao, Max Erler, Huiyu Wang, Guangyao Zhai, Gengyuan Zhang, Yunpu Ma, Volker Tresp

The challenge of information redundancy in long videos prompts the question of what specific information is essential for large language models (LLMs) and how to leverage them for complex spatial-temporal reasoning in long-form video analysis.

EgoSchema, Language Modelling, +5

EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion

1 code implementation • 2 May 2024 • Guangyao Zhai, Evin Pınar Örnek, Dave Zhenyu Chen, Ruotong Liao, Yan Di, Nassir Navab, Federico Tombari, Benjamin Busam

The scheme ensures that the denoising processes are influenced by a holistic understanding of the scene graph, facilitating the generation of globally coherent scenes.

3D Object Retrieval, Denoising, +2

GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering

no code implementations • 17 Mar 2024 • Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari

During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces.

Novel View Synthesis

ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation

1 code implementation • CVPR 2024 • Yan Di, Chenyangguang Zhang, Chaowei Wang, Ruida Zhang, Guangyao Zhai, Yanyan Li, Bowen Fu, Xiangyang Ji, Shan Gao

Finally, we deform the retrieved shape in the deformation module to tightly fit the input object by harnessing part-center-guided neural cage deformation.

Object Retrieval, +2

SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation

1 code implementation • CVPR 2024 • Yamei Chen, Yan Di, Guangyao Zhai, Fabian Manhardt, Chenyangguang Zhang, Ruida Zhang, Federico Tombari, Nassir Navab, Benjamin Busam

Leveraging the advantage of DINOv2 in providing SE(3)-consistent semantic features, we hierarchically extract two types of SE(3)-invariant geometric features to further encapsulate local-to-global object-specific information.

Object Pose Estimation

ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation

1 code implementation • 18 Nov 2023 • Yan Di, Chenyangguang Zhang, Chaowei Wang, Ruida Zhang, Guangyao Zhai, Yanyan Li, Bowen Fu, Xiangyang Ji, Shan Gao

In this paper, we present ShapeMatcher, a unified self-supervised learning framework for joint shape canonicalization, segmentation, retrieval and deformation.

Object Retrieval, +2

SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs

no code implementations • 21 Sep 2023 • Guangyao Zhai, Xiaoni Cai, Dianye Huang, Yan Di, Fabian Manhardt, Federico Tombari, Nassir Navab, Benjamin Busam

In this paper, we present SG-Bot, a novel rearrangement framework that utilizes a coarse-to-fine scheme with a scene graph as the scene representation.

Object Rearrangement

CCD-3DR: Consistent Conditioning in Diffusion for Single-Image 3D Reconstruction

no code implementations • 15 Aug 2023 • Yan Di, Chenyangguang Zhang, Pengyuan Wang, Guangyao Zhai, Ruida Zhang, Fabian Manhardt, Benjamin Busam, Xiangyang Ji, Federico Tombari

However, such strategies fail to consistently align the denoised point cloud with the given image, leading to unstable conditioning and inferior performance.

3D Reconstruction

OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection

no code implementations • 2 Nov 2022 • Yongzhi Su, Yan Di, Fabian Manhardt, Guangyao Zhai, Jason Rambach, Benjamin Busam, Didier Stricker, Federico Tombari

Despite monocular 3D object detection having recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery, such two-stage methods typically suffer from overfitting and are incapable of explicitly encapsulating the geometric relation between depth and object bounding box.

Monocular 3D Object Detection, Object, +1

DA$^2$ Dataset: Toward Dexterity-Aware Dual-Arm Grasping

no code implementations • 31 Jul 2022 • Guangyao Zhai, Yu Zheng, Ziwei Xu, Xin Kong, Yong Liu, Benjamin Busam, Yi Ren, Nassir Navab, Zhengyou Zhang

In this paper, we introduce DA$^2$, the first large-scale dual-arm dexterity-aware dataset for the generation of optimal bimanual grasping pairs for arbitrary large objects.

FlowMOT: 3D Multi-Object Tracking by Scene Flow Association

no code implementations • 14 Dec 2020 • Guangyao Zhai, Xin Kong, Jinhao Cui, Yong Liu, Zhen Yang

Most end-to-end Multi-Object Tracking (MOT) methods face the problems of low accuracy and poor generalization ability.

3D Multi-Object Tracking, motion prediction, +1

Semantic Graph Based Place Recognition for 3D Point Clouds

1 code implementation • 26 Aug 2020 • Xin Kong, Xuemeng Yang, Guangyao Zhai, Xiangrui Zhao, Xianfang Zeng, Mengmeng Wang, Yong Liu, Wanlong Li, Feng Wen

First, we propose a novel semantic graph representation for the point cloud scenes by reserving the semantic and topological information of the raw point cloud.

Graph Matching, Graph Similarity

PoseConvGRU: A Monocular Approach for Visual Ego-motion Estimation by Learning

no code implementations • 19 Jun 2019 • Guangyao Zhai, Liang Liu, Linjian Zhang, Yong Liu

The feature-encoding module encodes the short-term motion feature in an image pair, while the memory-propagating module captures the long-term motion feature in the consecutive image pairs.

Camera Calibration, Motion Estimation, +2
