1 code implementation • 29 Aug 2024 • Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Chengzhuo Tong, Peng Gao, Chunyuan Li, Pheng-Ann Heng
We introduce SAM2Point, a preliminary exploration adapting Segment Anything Model 2 (SAM 2) for zero-shot and promptable 3D segmentation.
1 code implementation • 5 Jun 2024 • Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao
Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions.
3 code implementations • CVPR 2024 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao
To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning.
5 code implementations • 1 Sep 2023 • Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Yiwen Tang, Xianzheng Ma, Jiaming Han, Kexin Chen, Peng Gao, Xianzhi Li, Hongsheng Li, Pheng-Ann Heng
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image, language, audio, and video.
Ranked #5 on
3D Question Answering (3D-QA)
on 3D MM-Vet
1 code implementation • 24 Aug 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Hao Dong, Peng Gao
However, the prior pre-training stage not only introduces excessive time overhead, but also incurs a significant domain gap on `unseen' classes.
3D Semantic Segmentation
Few-shot 3D semantic segmentation
+1
no code implementations • 19 Aug 2023 • Xiangyang Zhu, Yiling Pan, Bailin Deng, Bin Wang
In this paper, we introduce a novel hybrid differentiable rendering method to efficiently reconstruct the 3D geometry and reflectance of a scene from multi-view images captured by conventional hand-held cameras.
1 code implementation • ICCV 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Aojun Zhou, Dong Wang, Bin Zhao, Peng Gao
The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its application to diverse downstream vision tasks.
2 code implementations • ICCV 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao
In this paper, we first collaborate CLIP and GPT to be a unified 3D open-world learner, named as PointCLIP V2, which fully unleashes their potential for zero-shot 3D classification, segmentation, and detection.
Ranked #2 on
3D Open-Vocabulary Instance Segmentation
on STPLS3D
1 code implementation • 3 Nov 2022 • Shihan Ma, Alexander Kenneth Clarke, Kostiantyn Maksymenko, Samuel Deslauriers-Gauthier, Xinjun Sheng, Xiangyang Zhu, Dario Farina
As a solution to this problem, we propose a transfer learning approach, in which a conditional generative model is trained to mimic the output of an advanced numerical model.
2 code implementations • 20 Jan 2022 • Xiangyang Zhu, Kede Ma, Wufeng Xue
Predicting cardiac indices has long been a focal point in the medical imaging community.
no code implementations • 13 Oct 2019 • Gang Chen, Hongzhe Yu, Wei Dong, Xinjun Sheng, Xiangyang Zhu, Han Ding
While training an end-to-end navigation network in the real world is usually of high cost, simulation provides a safe and cheap environment in this training stage.