Search Results for author: Ziyu Guo

Found 25 papers, 16 papers with code

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

4 code implementations • 5 Apr 2024 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao

To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning.

Few-Shot Learning Scene Segmentation +1

435

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

no code implementations • 21 Mar 2024 • Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li

To this end, we introduce MathVerse, an all-around visual math benchmark designed for an equitable and in-depth evaluation of MLLMs.

Math Mathematical Reasoning

LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery

no code implementations • 26 Feb 2024 • Kexin Chen, Yuyang Du, Tao You, Mobarakol Islam, Ziyu Guo, Yueming Jin, Guangyong Chen, Pheng-Ann Heng

We further design an adaptive weight assignment approach that balances the generalization ability of the LLM and the domain expertise of the old CL model.

Continual Learning Language Modelling +3

SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning

no code implementations • 22 Jan 2024 • Hao Chen, Jiaze Wang, Ziyu Guo, Jinpeng Li, Donghao Zhou, Bian Wu, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng

Sign language recognition (SLR) plays a vital role in facilitating communication for the hearing-impaired community.

Contrastive Learning Language Modelling +3

HIGT: Hierarchical Interaction Graph-Transformer for Whole Slide Image Analysis

1 code implementation • 14 Sep 2023 • Ziyu Guo, Weiqin Zhao, Shujun Wang, Lequan Yu

Considering that the information from different resolutions is complementary and can benefit each other during the learning process, we further design a novel Bidirectional Interaction block to establish communication between different levels within the WSI pyramids.

whole slide images

ImageBind-LLM: Multi-modality Instruction Tuning

2 code implementations • 7 Sep 2023 • Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao

During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder.

Instruction Following Text Generation

5,485

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

5 code implementations • 1 Sep 2023 • Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Yiwen Tang, Xianzheng Ma, Jiaming Han, Kexin Chen, Peng Gao, Xianzhi Li, Hongsheng Li, Pheng-Ann Heng

We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image, language, audio, and video.

Ranked #5 on 3D Question Answering (3D-QA) on 3D MM-Vet

3D Generation 3D Question Answering (3D-QA) +4

377

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks

1 code implementation • 24 Aug 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Hao Dong, Peng Gao

However, the prior pre-training stage not only introduces excessive time overhead, but also incurs a significant domain gap on 'unseen' classes.

3D Semantic Segmentation Few-shot 3D semantic segmentation +1

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

1 code implementation • 25 May 2023 • Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei Zhang, Hongyang Li, Yu Qiao, Hao Dong, Zhongjiang He, Peng Gao

In this paper, we propose MUTR, a Multi-modal Unified Temporal transformer for Referring video object segmentation.

Ranked #1 on Referring Expression Segmentation on Referring Expressions for DAVIS 2016 & 2017

Object Referring Expression Segmentation +3

Personalize Segment Anything Model with One Shot

1 code implementation • 4 May 2023 • Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Xianzheng Ma, Hao Dong, Peng Gao, Hongsheng Li

Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models.

Ranked #1 on Personalized Segmentation on PerSeg

Personalized Segmentation Segmentation +4

1,413

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis

2 code implementations • 14 Mar 2023 • Renrui Zhang, Liuhui Wang, Ziyu Guo, Yali Wang, Peng Gao, Hongsheng Li, Jianbo Shi

We present a Non-parametric Network for 3D point cloud analysis, Point-NN, which consists of purely non-learnable components: farthest point sampling (FPS), k-nearest neighbors (k-NN), and pooling operations, with trigonometric functions.

Ranked #1 on Training-free 3D Part Segmentation on ShapeNet-Part

3D Point Cloud Classification Training-free 3D Part Segmentation +1

435

Nearest Neighbors Meet Deep Neural Networks for Point Cloud Analysis

no code implementations • 1 Mar 2023 • Renrui Zhang, Liuhui Wang, Ziyu Guo, Jianbo Shi

Performances on standard 3D point cloud benchmarks have plateaued, resulting in oversized models and complex network design to make a fractional improvement.

3D Object Detection object-detection

Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training

no code implementations • 27 Feb 2023 • Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzhi Li, Pheng-Ann Heng

In this paper, we explore how the 2D modality can benefit 3D masked autoencoding, and propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training.

Point Cloud Pre-training Representation Learning

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning

2 code implementations • ICCV 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao

In this paper, we first combine CLIP and GPT into a unified 3D open-world learner, named PointCLIP V2, which fully unleashes their potential for zero-shot 3D classification, segmentation, and detection.

Ranked #2 on 3D Open-Vocabulary Instance Segmentation on STPLS3D

3D Classification 3D Object Detection +11

290

Low-Cost Beamforming and DOA Estimation Based on One-Bit Reconfigurable Intelligent Surface

no code implementations • 15 Nov 2022 • Zihan Yang, Peng Chen, Ziyu Guo, Dahai Ni

In this work, we consider the Direction-of-Arrival (DOA) estimation problem in a low-cost architecture where only one antenna as the receiver is aided by a reconfigurable intelligent surface (RIS).

CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

1 code implementation • 28 Sep 2022 • Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzheng Ma, Xupeng Miao, Xuming He, Bin Cui

Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual representations with great transferability, which achieves promising accuracy for zero-shot classification.

Ranked #4 on Training-free 3D Point Cloud Classification on ScanObjectNN (using extra training data)

Training-free 3D Point Cloud Classification Transfer Learning +1

Can Language Understand Depth?

1 code implementation • 3 Jul 2022 • Renrui Zhang, Ziyao Zeng, Ziyu Guo, Yafeng Li

To the best of our knowledge, we are the first to conduct zero-shot adaptation from the semantic language knowledge to quantified downstream tasks and perform zero-shot monocular depth estimation.

Image Classification Monocular Depth Estimation

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training

3 code implementations • 28 May 2022 • Renrui Zhang, Ziyu Guo, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li, Peng Gao

By fine-tuning on downstream tasks, Point-M2AE achieves 86.43% accuracy on ScanObjectNN, +3.36% to the second-best, and largely benefits the few-shot classification, part segmentation and 3D object detection with the hierarchical pre-training scheme.

Ranked #4 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)

3D Object Detection 3D Point Cloud Linear Classification +5

197

A RIS-Based Vehicle DOA Estimation Method With Integrated Sensing and Communication System

1 code implementation • 25 Apr 2022 • Zhimin Chen, Peng Chen, Ziyu Guo, Yudong Zhang, Xianbin Wang

A novel estimation method is proposed in the scenario with a receiver using only one full-functional channel, where multiple measurements for the DOA estimation are achieved by controlling the reflection matrix (measurement matrix) in the RIS.

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection

1 code implementation • ICCV 2023 • Renrui Zhang, Han Qiu, Tai Wang, Ziyu Guo, Xuanzhuo Xu, Ziteng Cui, Yu Qiao, Peng Gao, Hongsheng Li

In this paper, we introduce the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR.

Ranked #9 on 3D Object Detection From Monocular Images on KITTI-360

3D Object Detection From Monocular Images Autonomous Driving +3

308

Reconfigurable Intelligent Surface Aided Sparse DOA Estimation Method With Non-ULA

no code implementations • 19 Mar 2022 • Peng Chen, Zihan Yang, Zhimin Chen, Ziyu Guo

The direction of arrival (DOA) estimation problem is addressed in this letter.

VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts

no code implementations • 4 Dec 2021 • Longtian Qiu, Renrui Zhang, Ziyu Guo, Ziyao Zeng, Zilu Guo, Yafeng Li, Guangnan Zhang

Contrastive Language-Image Pre-training (CLIP) has drawn increasing attention recently for its transferable visual representation learning.

Language Modelling Representation Learning +1

PointCLIP: Point Cloud Understanding by CLIP

2 code implementations • CVPR 2022 • Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li

On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D.

Ranked #3 on 3D Open-Vocabulary Instance Segmentation on STPLS3D

3D Open-Vocabulary Instance Segmentation Few-Shot Learning +6

290

DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion

1 code implementation • 19 Nov 2021 • Renrui Zhang, Ziyao Zeng, Ziyu Guo, Xinben Gao, Kexue Fu, Jianbo Shi

We reverse the conventional design of applying convolution on voxels and attention to points.

Ranked #36 on 3D Part Segmentation on ShapeNet-Part

3D Part Segmentation 3D Point Cloud Classification +3

Improved Heatmap-based Landmark Detection

no code implementations • 12 Oct 2021 • Huifeng Yao, Ziyu Guo, Yatao Zhang, Xiaomeng Li

This paper proposes a landmark detection network for detecting sutures in endoscopic pictures, which solves the problem of a variable number of suture points in the images.
