Search Results for author: Ziyao Zeng

Found 7 papers, 5 papers with code

WorDepth: Variational Language Prior for Monocular Depth Estimation

1 code implementation • 4 Apr 2024 • Ziyao Zeng, Daniel Wang, Fengyu Yang, Hyoungseob Park, Yangchao Wu, Stefano Soatto, Byung-Woo Hong, Dong Lao, Alex Wong

To test this, we focus on monocular depth estimation, the problem of predicting a dense depth map from a single image, but with an additional text caption describing the scene.

3D Reconstruction, Monocular Depth Estimation


Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

no code implementations • 31 Jan 2024 • Fengyu Yang, Chao Feng, Ziyang Chen, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong

We introduce UniTouch, a unified tactile model for vision-based touch sensors connected to multiple modalities, including vision, language, and sound.

Question Answering, Visual Question Answering (VQA)


iQuery: Instruments as Queries for Audio-Visual Sound Separation

1 code implementation • CVPR 2023 • Jiaben Chen, Renrui Zhang, Dongze Lian, Jiaqi Yang, Ziyao Zeng, Jianbo Shi

To generalize to a new instrument or event class, drawing inspiration from the text-prompt design, we insert an additional query as an audio prompt while freezing the attention mechanism.

Disentanglement


PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning

2 code implementations • ICCV 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao

In this paper, we first collaborate CLIP and GPT to be a unified 3D open-world learner, named as PointCLIP V2, which fully unleashes their potential for zero-shot 3D classification, segmentation, and detection.

Ranked #2 on 3D Open-Vocabulary Instance Segmentation on STPLS3D

3D Classification, 3D Object Detection, +11


Can Language Understand Depth?

1 code implementation • 3 Jul 2022 • Renrui Zhang, Ziyao Zeng, Ziyu Guo, Yafeng Li

To our best knowledge, we are the first to conduct zero-shot adaptation from the semantic language knowledge to quantified downstream tasks and perform zero-shot monocular depth estimation.

Image Classification, Monocular Depth Estimation