We instead propose to decompose shapes using a library of 3D parts provided by the user, giving full control over the choice of parts.
In this paper, we present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision.
We introduce the task of spotting temporally precise, fine-grained events in video (detecting the precise moment at which events occur).
We present ASSET, a neural architecture for automatically modifying an input high-resolution image according to a user's edits on its semantic segmentation map.
Approaches for single-view reconstruction typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry.
We show that, in contrast to previous image-based methods, the use of a geometric representation of 3D shape and 2D strokes allows the model to transfer important aspects of shape and texture style while preserving contours.
This leads to poor accuracy when downstream tasks, such as action recognition, depend on pose.
The novel graph constructor maps a glyph's latent code to a graph representation that matches expert knowledge, and is trained to aid the translation task.
Artists and video game designers often construct 2D animations using libraries of sprites -- textured patches of objects and characters.
Our model generates novel poses based on keypoint locations, which can be modified in real time while providing interactive feedback, allowing for intuitive reposing and animation.
In particular, retrieval results from our network better match human judgement of structural layout similarity than both IoU and other baselines, including a state-of-the-art method based on graph neural networks and image convolution.
We introduce COALESCE, the first data-driven framework for component-based shape assembly which employs deep learning to synthesize part connections.
We propose a novel neural architecture for representing 3D surfaces, which harnesses two complementary shape representations: (i) an explicit representation via an atlas, i.e., embeddings of 2D domains into 3D; (ii) an implicit-function representation, i.e., a scalar function over the 3D volume, with its levels denoting surfaces.
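To make the two representations concrete, here is a minimal PyTorch sketch (module names and sizes are our own illustration, not the paper's implementation): an atlas decoder embeds 2D parameter points into 3D conditioned on a shape code, while an implicit decoder predicts a scalar field over the volume whose level set denotes the surface.

```python
import torch
import torch.nn as nn

class AtlasDecoder(nn.Module):
    """Explicit branch: embeds 2D parameter points into 3D, conditioned on a shape code."""
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # a 3D point on the surface
        )

    def forward(self, z, uv):
        # z: (B, latent_dim) shape code; uv: (B, N, 2) points in the 2D domain
        z = z.unsqueeze(1).expand(-1, uv.shape[1], -1)
        return self.mlp(torch.cat([z, uv], dim=-1))  # (B, N, 3)

class ImplicitDecoder(nn.Module):
    """Implicit branch: scalar field over the 3D volume whose level set is the surface."""
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # signed distance / occupancy value
        )

    def forward(self, z, xyz):
        # z: (B, latent_dim); xyz: (B, M, 3) query points in the volume
        z = z.unsqueeze(1).expand(-1, xyz.shape[1], -1)
        return self.mlp(torch.cat([z, xyz], dim=-1)).squeeze(-1)  # (B, M)
```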
People often create art by following an artistic workflow involving multiple stages that inform the overall design.
Recently there has been interest in the potential of learning generative models from a single image, as opposed to a large dataset.
We introduce UprightNet, a learning-based approach for estimating 2DoF camera orientation from a single RGB image of an indoor scene.
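For intuition, the two degrees of freedom are the camera's pitch and roll relative to gravity, which can be read off from the world up vector expressed in camera coordinates. A minimal numpy sketch under that convention (ours, not UprightNet's code):

```python
import numpy as np

def pitch_roll_from_up(up_cam):
    """Recover pitch and roll (radians) from the world up vector
    expressed in camera coordinates (x right, y down, z forward)."""
    u = up_cam / np.linalg.norm(up_cam)
    # pitch: tilt about the camera x-axis; roll: rotation about the z (forward) axis
    pitch = np.arcsin(np.clip(u[2], -1.0, 1.0))
    roll = np.arctan2(u[0], -u[1])
    return pitch, roll

# A level camera with no roll sees the up vector as (0, -1, 0)
print(pitch_roll_from_up(np.array([0.0, -1.0, 0.0])))  # ~ (0.0, 0.0)
```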
We propose to represent shapes as the deformation and combination of learnable elementary 3D structures, which are primitives resulting from training over a collection of shapes.
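A minimal sketch of this representation, assuming a single shared deformation network and made-up sizes (the actual method may differ in detail): K learnable point sets are deformed conditioned on a shape code and unioned into the output shape.

```python
import torch
import torch.nn as nn

class ElementaryStructures(nn.Module):
    """Shapes as deformed, combined learnable primitives: K learned point sets,
    each carried onto the target shape by a shape-conditioned MLP."""
    def __init__(self, latent_dim=128, k=10, pts_per_elem=250, hidden=256):
        super().__init__()
        # K learnable elementary structures, shared across the shape collection
        self.elements = nn.Parameter(torch.rand(k, pts_per_elem, 3))
        self.deform = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, z):
        # z: (latent_dim,) shape code -> (K * pts_per_elem, 3) output point cloud
        pts = self.elements.reshape(-1, 3)
        zz = z.unsqueeze(0).expand(pts.shape[0], -1)
        return self.deform(torch.cat([zz, pts], dim=-1))
```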
Many tasks in graphics and vision demand machinery for converting shapes into consistent representations with sparse sets of parameters; these representations facilitate rendering, editing, and storage.
The unsupervised BAE-NET is trained on a collection of unsegmented shapes using a shape reconstruction loss, without any ground-truth labels.
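The mechanism can be sketched as a branched implicit decoder: each branch predicts an occupancy field for one emergent part, and the max over branches reconstructs the whole shape, so segmentation falls out of the reconstruction loss alone. A hypothetical PyTorch sketch:

```python
import torch
import torch.nn as nn

class BranchedImplicitDecoder(nn.Module):
    """Each branch is an implicit field for one emergent part; max-pooling
    over branches gives the whole-shape occupancy for the reconstruction loss."""
    def __init__(self, latent_dim=128, hidden=256, n_branches=8):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.branches = nn.Linear(hidden, n_branches)  # one occupancy per branch

    def forward(self, z, xyz):
        # z: (B, latent_dim) shape code; xyz: (B, N, 3) query points
        z = z.unsqueeze(1).expand(-1, xyz.shape[1], -1)
        per_part = torch.sigmoid(self.branches(self.trunk(torch.cat([z, xyz], -1))))
        occupancy = per_part.max(dim=-1).values   # (B, N) whole-shape occupancy
        segment = per_part.argmax(dim=-1)         # (B, N) emergent part labels
        return occupancy, segment
```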
In this paper, we address the problem of 3D object mesh reconstruction from RGB videos.
By predicting this feature for a new shape, we implicitly predict correspondences between this shape and the template.
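Concretely, two shapes can then be put in correspondence by routing through the template: carry each template point onto both shapes and pair the images of the same template point. A sketch, where deform_a and deform_b are hypothetical stand-ins for the learned mappings:

```python
import torch

def correspond_via_template(deform_a, deform_b, template_pts, query_pts):
    """Match points on shape A to shape B through a shared template.
    deform_a / deform_b: callables mapping (T, 3) template points onto each shape."""
    on_a = deform_a(template_pts)  # (T, 3) template carried onto shape A
    on_b = deform_b(template_pts)  # (T, 3) template carried onto shape B
    # For each query point on A, find the nearest template image on A ...
    d = torch.cdist(query_pts, on_a)          # (Q, T) pairwise distances
    nearest = d.argmin(dim=1)                 # (Q,) index of matching template point
    # ... and read off where that same template point lands on B.
    return on_b[nearest]                      # (Q, 3) corresponding points on B
```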
We introduce a method for learning to generate the surface of 3D shapes.
In this work, we focus on the challenge of taking partial observations of highly stylized text and generalizing the observations to generate unobserved glyphs in the ornamented typeface.
Many graphics and vision problems can be expressed as non-linear least squares optimizations of objective functions over visual data, such as images and meshes.
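Such problems take the form E(x) = sum_i ||r_i(x)||^2, which Gauss-Newton-style solvers minimize by repeatedly linearizing the residuals. A generic numpy sketch (not the paper's solver):

```python
import numpy as np

def gauss_newton(residuals, jacobian, x0, iters=20):
    """Minimize E(x) = ||r(x)||^2 by repeated linearization:
    at each step solve min ||J dx + r||^2 and update x <- x + dx."""
    x = x0.astype(float)
    for _ in range(iters):
        r = residuals(x)            # (M,) stacked residuals
        J = jacobian(x)             # (M, N) Jacobian dr/dx
        dx = np.linalg.lstsq(J, -r, rcond=None)[0]
        x = x + dx
        if np.linalg.norm(dx) < 1e-10:
            break
    return x

# Toy usage: fit y = exp(a*t) to samples by least squares over a.
t = np.linspace(0, 1, 50)
y = np.exp(0.7 * t)
res = lambda x: np.exp(x[0] * t) - y
jac = lambda x: (t * np.exp(x[0] * t))[:, None]
print(gauss_newton(res, jac, np.array([0.0])))  # ~ [0.7]
```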
To amass training data for our model, we propose a self-supervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions.
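One standard way to exploit such correspondence labels is a contrastive objective: descriptors of patches that come from the same surface point in two overlapping RGB-D frames are pulled together, all others pushed apart. A sketch under that assumption (not necessarily the paper's exact loss):

```python
import torch
import torch.nn.functional as F

def correspondence_contrastive_loss(desc_a, desc_b, match, margin=1.0):
    """desc_a, desc_b: (N, D) descriptors of patch pairs from two overlapping
    RGB-D frames; match: (N,) bool, True where a pair comes from the same
    surface point in the reconstruction."""
    d = F.pairwise_distance(desc_a, desc_b)
    pos = d[match] ** 2                      # pull true correspondences together
    neg = F.relu(margin - d[~match]) ** 2    # push non-correspondences apart
    return torch.cat([pos, neg]).mean()
```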