1 code implementation • 23 Mar 2025 • Haoliang Shang, Hanyu Wu, Guangyao Zhai, Boyang Sun, Fangjinhua Wang, Federico Tombari, Marc Pollefeys
Scene graphs capture complex relationships among objects, serving as strong priors for content generation and manipulation.
no code implementations • 22 Mar 2025 • Shengyun Si, Xinpeng Wang, Guangyao Zhai, Nassir Navab, Barbara Plank
Recent advancements in large language models (LLMs) have demonstrated that fine-tuning and human alignment can render LLMs harmless.
no code implementations • 7 Mar 2025 • Ling Team, Binwei Zeng, Chao Huang, Chao Zhang, Changxin Tian, Cong Chen, Dingnan Jin, Feng Yu, Feng Zhu, Feng Yuan, Fakang Wang, Gangshan Wang, Guangyao Zhai, HaiTao Zhang, Huizhong Li, Jun Zhou, Jia Liu, Junpeng Fang, Junjie Ou, Jun Hu, Ji Luo, Ji Zhang, Jian Liu, Jian Sha, Jianxue Qian, Jiewei Wu, Junping Zhao, Jianguo Li, Jubao Feng, Jingchao Di, Junming Xu, Jinghua Yao, Kuan Xu, Kewei Du, Longfei Li, Lei Liang, Lu Yu, Li Tang, Lin Ju, Peng Xu, Qing Cui, Song Liu, Shicheng Li, Shun Song, Song Yan, Tengwei Cai, Tianyi Chen, Ting Guo, Ting Huang, Tao Feng, Tao Wu, Wei Wu, Xiaolu Zhang, Xueming Yang, Xin Zhao, Xiaobo Hu, Xin Lin, Yao Zhao, Yilong Wang, Yongzhen Guo, Yuanyuan Wang, Yue Yang, Yang Cao, Yuhao Fu, Yi Xiong, Yanzhe Li, Zhe Li, Zhiqiang Zhang, Ziqi Liu, ZhaoXin Huan, Zujie Wen, Zhenhang Sun, Zhuoxuan Du, Zhengyu He
Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving comparable performance to models of a similar scale, including dense and MoE models.
1 code implementation • 9 Feb 2025 • Zhifei Yang, Keyang Lu, Chao Zhang, Jiaxing Qi, Hanqi Jiang, Ruifei Ma, Shenglin Yin, Yifan Xu, Mingzhe Xing, Zhen Xiao, Jieyi Long, Guangyao Zhai
Controllable 3D scene generation has extensive applications in virtual reality and interior design, where the generated scenes should exhibit high levels of realism and controllability in terms of geometry.
1 code implementation • 30 Sep 2024 • Ruotong Liao, Max Erler, Huiyu Wang, Guangyao Zhai, Gengyuan Zhang, Yunpu Ma, Volker Tresp
The challenge of information redundancy in long videos prompts the question of what specific information is essential for large language models (LLMs) and how to leverage them for complex spatial-temporal reasoning in long-form video analysis.
1 code implementation • 2 May 2024 • Guangyao Zhai, Evin Pınar Örnek, Dave Zhenyu Chen, Ruotong Liao, Yan Di, Nassir Navab, Federico Tombari, Benjamin Busam
The scheme ensures that the denoising processes are influenced by a holistic understanding of the scene graph, facilitating the generation of globally coherent scenes.
no code implementations • 17 Mar 2024 • Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari
During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces.
1 code implementation • CVPR 2024 • Yan Di, Chenyangguang Zhang, Chaowei Wang, Ruida Zhang, Guangyao Zhai, Yanyan Li, Bowen Fu, Xiangyang Ji, Shan Gao
Finally, we deform the retrieved shape in the deformation module to tightly fit the input object by harnessing part-center-guided neural cage deformation.
no code implementations • CVPR 2024 • HyunJun Jung, Shun-Cheng Wu, Patrick Ruhkamp, Guangyao Zhai, Hannah Schieber, Giulia Rizzoli, Pengyuan Wang, Hongcheng Zhao, Lorenzo Garattoni, Sven Meier, Daniel Roth, Nassir Navab, Benjamin Busam
Estimating 6D object poses is a major challenge in 3D computer vision.
1 code implementation • CVPR 2024 • Yamei Chen, Yan Di, Guangyao Zhai, Fabian Manhardt, Chenyangguang Zhang, Ruida Zhang, Federico Tombari, Nassir Navab, Benjamin Busam
Leveraging the advantage of DINOv2 in providing SE(3)-consistent semantic features, we hierarchically extract two types of SE(3)-invariant geometric features to further encapsulate local-to-global object-specific information.
1 code implementation • 18 Nov 2023 • Yan Di, Chenyangguang Zhang, Chaowei Wang, Ruida Zhang, Guangyao Zhai, Yanyan Li, Bowen Fu, Xiangyang Ji, Shan Gao
In this paper, we present ShapeMatcher, a unified self-supervised learning framework for joint shape canonicalization, segmentation, retrieval and deformation.
no code implementations • 21 Sep 2023 • Guangyao Zhai, Xiaoni Cai, Dianye Huang, Yan Di, Fabian Manhardt, Federico Tombari, Nassir Navab, Benjamin Busam
In this paper, we present SG-Bot, a novel rearrangement framework that utilizes a coarse-to-fine scheme with a scene graph as the scene representation.
no code implementations • 15 Aug 2023 • Yan Di, Chenyangguang Zhang, Pengyuan Wang, Guangyao Zhai, Ruida Zhang, Fabian Manhardt, Benjamin Busam, Xiangyang Ji, Federico Tombari
However, such strategies fail to consistently align the denoised point cloud with the given image, leading to unstable conditioning and inferior performance.
1 code implementation • NeurIPS 2023 • Guangyao Zhai, Evin Pınar Örnek, Shun-Cheng Wu, Yan Di, Federico Tombari, Nassir Navab, Benjamin Busam
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
1 code implementation • CVPR 2023 • HyunJun Jung, Patrick Ruhkamp, Guangyao Zhai, Nikolas Brasch, Yitong Li, Yannick Verdie, Jifei Song, Yiren Zhou, Anil Armagan, Slobodan Ilic, Ales Leonardis, Nassir Navab, Benjamin Busam
Learning-based methods to solve dense 3D vision problems typically train on 3D sensor data.
no code implementations • CVPR 2023 • Dekai Zhu, Guangyao Zhai, Yan Di, Fabian Manhardt, Hendrik Berkemeyer, Tuan Tran, Nassir Navab, Federico Tombari, Benjamin Busam
Reliable multi-agent trajectory prediction is crucial for the safe planning and control of autonomous systems.
1 code implementation • 20 Dec 2022 • HyunJun Jung, Guangyao Zhai, Shun-Cheng Wu, Patrick Ruhkamp, Hannah Schieber, Giulia Rizzoli, Pengyuan Wang, Hongcheng Zhao, Lorenzo Garattoni, Sven Meier, Daniel Roth, Nassir Navab, Benjamin Busam
Estimating 6D object poses is a major challenge in 3D computer vision.
no code implementations • 2 Nov 2022 • Yongzhi Su, Yan Di, Fabian Manhardt, Guangyao Zhai, Jason Rambach, Benjamin Busam, Didier Stricker, Federico Tombari
Despite monocular 3D object detection having recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery, such two-stage methods typically suffer from overfitting and are incapable of explicitly encapsulating the geometric relation between depth and object bounding box.
no code implementations • 26 Sep 2022 • Guangyao Zhai, Dianye Huang, Shun-Cheng Wu, HyunJun Jung, Yan Di, Fabian Manhardt, Federico Tombari, Nassir Navab, Benjamin Busam
6-DoF robotic grasping is a long-standing but unsolved problem.
no code implementations • 31 Jul 2022 • Guangyao Zhai, Yu Zheng, Ziwei Xu, Xin Kong, Yong Liu, Benjamin Busam, Yi Ren, Nassir Navab, Zhengyou Zhang
In this paper, we introduce DA$^2$, the first large-scale dual-arm dexterity-aware dataset for the generation of optimal bimanual grasping pairs for arbitrary large objects.
no code implementations • 9 May 2022 • HyunJun Jung, Patrick Ruhkamp, Guangyao Zhai, Nikolas Brasch, Yitong Li, Yannick Verdie, Jifei Song, Yiren Zhou, Anil Armagan, Slobodan Ilic, Ales Leonardis, Benjamin Busam
Depth estimation is a core task in 3D computer vision.
no code implementations • 14 Dec 2020 • Guangyao Zhai, Xin Kong, Jinhao Cui, Yong Liu, Zhen Yang
Most end-to-end Multi-Object Tracking (MOT) methods face the problems of low accuracy and poor generalization ability.
1 code implementation • 26 Aug 2020 • Xin Kong, Xuemeng Yang, Guangyao Zhai, Xiangrui Zhao, Xianfang Zeng, Mengmeng Wang, Yong Liu, Wanlong Li, Feng Wen
First, we propose a novel semantic graph representation for point cloud scenes by preserving the semantic and topological information of the raw point cloud.
no code implementations • 4 Sep 2019 • Xin Kong, Guangyao Zhai, Baoquan Zhong, Yong Liu
In this paper, we propose PASS3D to achieve point-wise semantic segmentation for 3D point clouds.
no code implementations • 19 Jun 2019 • Guangyao Zhai, Liang Liu, Linjian Zhang, Yong Liu
The feature-encoding module encodes the short-term motion feature in an image pair, while the memory-propagating module captures the long-term motion feature across consecutive image pairs.