Zero-shot 3D classification

9 papers with code • 1 benchmark • 1 dataset


Most implemented papers

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

qizekun/ShapeLLM 27 Feb 2024

This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages.

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning

yangyangyang127/pointclip_v2 ICCV 2023

In this paper, we combine CLIP and GPT into a unified 3D open-world learner, named PointCLIP V2, which fully unleashes their potential for zero-shot 3D classification, segmentation, and detection.

Uni3D: Exploring Unified 3D Representation at Scale

baaivision/uni3d 10 Oct 2023

Scaling up representations for images or text has been extensively investigated in the past few years and has led to revolutions in learning vision and language.

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

salesforce/ulip CVPR 2023

ULIP learns a 3D representation space aligned with the common image-text space, using a small number of automatically synthesized triplets.

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

salesforce/ulip 14 May 2023

It achieves a new SOTA of 50.6% (top-1) on Objaverse-LVIS and 84.7% (top-1) on ModelNet40 in zero-shot classification.

OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding

Colin97/OpenShape_code NeurIPS 2023

Due to their alignment with CLIP embeddings, our learned shape representations can also be integrated with off-the-shelf CLIP-based models for various applications, such as point cloud captioning and point cloud-conditioned image generation.

Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation

mr-neko/jm3d 6 Aug 2023

Insufficient synergy neglects the idea that a robust 3D representation should align with the joint vision-language space, rather than independently aligning with each modality.

ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights

TencentARC/ViT-Lens 20 Aug 2023

A well-trained lens with a ViT backbone has the potential to serve as one of these foundation models, supervising the learning of subsequent modalities.

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

ucsc-vlaa/mixcon3d 3 Nov 2023

Contrastive learning has emerged as a promising paradigm for 3D open-world understanding, i.e., aligning point cloud representation to image and text embedding space individually.
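
Once point cloud embeddings live in the same space as text embeddings, zero-shot classification typically reduces to a nearest-neighbor lookup by cosine similarity. The sketch below illustrates that step only. It is not taken from any of the papers listed here: the function names, the three-dimensional toy vectors, and the label set are all hypothetical stand-ins for the outputs of a real point cloud encoder and text encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_classify(shape_embedding, text_embeddings):
    """Pick the label whose text embedding is most similar to the shape.

    shape_embedding: list of floats, hypothetical point cloud encoder output.
    text_embeddings: dict mapping label -> list of floats, hypothetical
        text encoder output for a prompt describing that label.
    Returns (best_label, scores), where scores maps label -> cosine similarity.
    """
    shape = l2_normalize(shape_embedding)
    scores = {}
    for label, text_vec in text_embeddings.items():
        text = l2_normalize(text_vec)
        # Dot product of unit vectors equals cosine similarity.
        scores[label] = sum(s * t for s, t in zip(shape, text))
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy embeddings, made up for illustration only.
text_embeddings = {
    "chair": [0.9, 0.1, 0.0],
    "table": [0.1, 0.9, 0.1],
    "lamp": [0.0, 0.2, 0.9],
}
shape_embedding = [0.8, 0.3, 0.1]

label, scores = zero_shot_classify(shape_embedding, text_embeddings)
print(label)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

No classifier is trained on the target categories here: adding a new class only requires embedding a new text prompt, which is what makes the setup zero-shot.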