Open-world 3D part segmentation is pivotal in diverse applications such as robotics and AR/VR.
Recent advancements in open-world 3D object generation have been remarkable, with image-to-3D methods offering finer-grained control than their text-to-3D counterparts.
We report Zero123++, an image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view.
Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution.
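As a rough illustration of distillation, the sketch below computes a soft-target loss between teacher and student logits. It assumes the classic temperature-scaled KL formulation, which the entry above does not confirm as the method used; the function names and the default temperature are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: soft-target distillation; other distillation losses exist.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# A student that matches the teacher exactly incurs zero loss.
same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
# A student that disagrees with the teacher incurs a positive loss.
different = distillation_loss([-1.0, 0.5, 2.0], [2.0, 0.5, -1.0])
```

The T^2 factor keeps gradient magnitudes comparable across temperatures in this formulation.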
Single-image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world.
Due to their alignment with CLIP embeddings, our learned shape representations can also be integrated with off-the-shelf CLIP-based models for various applications, such as point cloud captioning and point cloud-conditioned image generation.
Ranked #5 on Zero-Shot Transfer 3D Point Cloud Classification on ModelNet40 (using extra training data)
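One way a CLIP-aligned shape embedding could support zero-shot classification is by picking the class whose text embedding is closest in cosine similarity. The sketch below uses toy 3-dimensional vectors in place of real encoder outputs; the function names and the embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_label(shape_embedding, text_embeddings):
    """Return the class name whose text embedding best matches the shape."""
    return max(
        text_embeddings,
        key=lambda name: cosine(shape_embedding, text_embeddings[name]),
    )


# Toy stand-ins for text-encoder outputs (assumed values, not real CLIP features).
text_embeddings = {"chair": [1.0, 0.0, 0.0], "lamp": [0.0, 1.0, 0.0]}
# Toy stand-in for a point cloud encoder output in the same space.
shape_embedding = [0.9, 0.1, 0.0]
label = zero_shot_label(shape_embedding, text_embeddings)
```

Because the shape and text embeddings live in one space, no classifier head has to be trained for new class names.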
Generalizable 3D part segmentation is important but challenging in vision and robotics.
We study how choices of input point cloud coordinate frames impact learning of manipulation skills from 3D point clouds.
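To make the notion of a coordinate frame concrete, the sketch below re-expresses world-frame points in a local frame such as an end-effector frame. The choice of frame, the function name, and the pose convention (rotation columns are the local axes written in world coordinates) are assumptions for illustration, not details from the entry above.

```python
def world_to_local(points, rotation, translation):
    """Express world-frame points in a local frame.

    rotation: 3x3 nested list whose columns are the local axes in world coordinates.
    translation: origin of the local frame in world coordinates.
    Computes R^T (p - t) for every point p.
    """
    local_points = []
    for p in points:
        d = [p[i] - translation[i] for i in range(3)]
        local_points.append(
            [sum(rotation[i][j] * d[i] for i in range(3)) for j in range(3)]
        )
    return local_points


# Local frame rotated 90 degrees about z and shifted along world x (assumed pose).
rotation = [[0.0, -1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0]]
translation = [1.0, 0.0, 0.0]
cloud_world = [[1.0, 1.0, 0.0], [1.0, 0.0, 2.0]]
cloud_local = world_to_local(cloud_world, rotation, translation)
```

The same geometry yields different network inputs depending on the frame, which is why the choice can matter for learning.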
Our method co-designs an efficient labeling process with semi/weakly supervised learning and is applicable to nearly any 3D semantic segmentation backbone.
Approximate convex decomposition aims to decompose a 3D shape into a set of almost convex components, whose convex hulls can then be used to represent the input shape.
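The idea can be illustrated with a simplified 2D analogue: measure how far a shape is from convex by the gap between its convex hull area and its own area, then check that splitting it into parts closes the gap. The concavity measure used here and the hand-picked split are assumptions for illustration, not the metric or algorithm of the entry above.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given in boundary order."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def concavity(vertices):
    """Hull area minus shape area; zero for a convex polygon."""
    return polygon_area(convex_hull(vertices)) - polygon_area(vertices)


# An L-shaped polygon is not convex: its hull covers extra area.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(concavity(l_shape))  # 0.5

# Splitting it into two rectangles gives parts with zero concavity.
parts = [[(0, 0), (2, 0), (2, 1), (0, 1)], [(0, 1), (1, 1), (1, 2), (0, 2)]]
print([concavity(part) for part in parts])  # [0.0, 0.0]
```

An approximate decomposition would stop splitting once every part's concavity falls below a chosen tolerance, rather than requiring exactly zero.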
Given a collection of 3D meshes of a category and their deformation handles (control points), our method learns a set of meta-handles for each shape, which are represented as combinations of the given handles.
To achieve this task, a simulated environment with physically realistic simulation, sufficient articulated objects, and transferability to the real robot is indispensable.
3D point cloud completion, the task of inferring the complete geometric shape from a partial point cloud, has been attracting attention in the community.
Ranked #5 on Point Cloud Completion on ShapeNet
Combining ideas from Batch RL and Meta RL, we propose tiMe, which learns distillation of multiple value functions and MDP embeddings from only existing data.
To perform well, the policy must infer the task identity from collected transitions by modelling its dependency on states, actions and rewards.