The method learns two components in an end-to-end fashion: a soft partition of a given category-specific 3D template mesh into rigid parts, and a monocular reconstruction network that predicts the part motions so that they reproject correctly onto 2D DensePose-like surface annotations of the object.
Inspired by these advances in geometric understanding, we aim to imbue image-based perception with representations learned under geometric constraints.
The rapid progress in 3D scene understanding has come with growing demand for data; however, collecting and annotating 3D scenes (e.g. point clouds) is notoriously hard.
Using a set of high-quality sparse keypoint matches, we optimize over the per-frame linear combinations of depth planes and camera poses to form a geometrically consistent cloud of keypoints.
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views.
A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
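The forward/backward asymmetry described here can be sketched in a few lines. This is a minimal, illustrative toy (a linear model rather than a real network, with a hypothetical `fake_quantize` helper): the forward pass uses uniformly quantized weights, while the update treats the quantizer's gradient as the identity, which is exactly the Straight-Through Estimator.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Uniformly quantize w over its observed range, then dequantize."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (qmax - qmin) or 1.0   # avoid 0 for constant w
    q = np.round((w - lo) / scale)
    return q * scale + lo                      # values used in the forward pass

# Toy linear regression trained with quantized weights (illustrative setup).
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = x @ true_w

w = np.zeros(4)                                # full-precision "shadow" weights
lr = 0.1
for _ in range(200):
    wq = fake_quantize(w)                      # forward: quantized weights
    grad = x.T @ (x @ wq - y) / len(x)         # gradient w.r.t. wq
    w -= lr * grad                             # STE: treat d(wq)/dw as identity

print(fake_quantize(w))                        # close to true_w despite 8-bit weights
```

The key design point is that the full-precision copy `w` keeps accumulating small gradient steps that the round-to-nearest quantizer would otherwise discard.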
We propose C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images.
In this paper, we address the problem of reducing the memory footprint of convolutional network architectures.
We use spatially-sparse two-, three- and four-dimensional convolutional autoencoder networks to model sparse structures in 2D space, 3D space, and (3+1)-dimensional space-time.
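The core idea behind spatially-sparse convolutions can be sketched as follows. This is a naive illustrative implementation (the dictionary layout and function name are assumptions, not the paper's API): features are stored only at active sites, and, as in submanifold-style sparse convolutions, outputs are produced only at active sites, so the sparsity pattern is preserved from layer to layer.

```python
import numpy as np

def sparse_conv2d(active, kernel):
    """active: dict {(y, x): feature vector}; kernel: (kh, kw, c_in, c_out)."""
    kh, kw, c_in, c_out = kernel.shape
    out = {}
    for (y, x) in active:                      # outputs only at active sites
        acc = np.zeros(c_out)
        for dy in range(kh):
            for dx in range(kw):
                nb = (y + dy - kh // 2, x + dx - kw // 2)
                f = active.get(nb)
                if f is not None:              # inactive neighbours cost nothing
                    acc += f @ kernel[dy, dx]
        out[(y, x)] = acc
    return out

# A sparse "pen stroke": three active pixels with 1-channel features.
active = {(0, 0): np.array([1.0]),
          (0, 1): np.array([1.0]),
          (1, 1): np.array([1.0])}
kernel = np.ones((3, 3, 1, 2))                 # 3x3 kernel, 1 -> 2 channels
out = sparse_conv2d(active, kernel)
print(sorted(out))                             # same active set as the input
```

Because work scales with the number of active sites rather than the volume of the grid, the same scheme remains affordable in 3D and (3+1)-dimensional space-time, where dense grids would be prohibitively large.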
Iterated-integral signatures and log signatures are vectors calculated from a path that characterise its shape.
Data Structures and Algorithms, Mathematical Software, Rings and Algebras
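For piecewise-linear paths the depth-2 signature has a simple closed form, accumulated segment by segment with Chen's identity. The sketch below is a minimal illustration (function name and path are mine, not the paper's): level 1 is the total increment, and for a single linear segment with increment d the level-2 term is outer(d, d)/2.

```python
import numpy as np

def signature_depth2(points):
    """points: (n, d) array of path samples; returns (level1, level2)."""
    points = np.asarray(points, dtype=float)
    d = points.shape[1]
    s1 = np.zeros(d)                 # level 1: total increment of the path
    s2 = np.zeros((d, d))            # level 2: iterated integrals
    for a, b in zip(points[:-1], points[1:]):
        inc = b - a
        # Chen's identity: S2(X*Y) = S2(X) + S1(X) (x) S1(Y) + S2(Y)
        s2 += np.outer(s1, inc) + np.outer(inc, inc) / 2.0
        s1 += inc
    return s1, s2

# An L-shaped path in the plane: one unit right, then one unit up.
s1, s2 = signature_depth2([[0, 0], [1, 0], [1, 1]])
print(s1)                            # total displacement (1, 1)
levy_area = (s2[0, 1] - s2[1, 0]) / 2.0
print(levy_area)                     # signed area between path and chord: 0.5
```

Note how the symmetric part of level 2 is determined by level 1, while the antisymmetric part (the Lévy area) captures genuinely new shape information, which is what makes signatures useful path descriptors.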
1 code implementation • 17 Oct 2017 • Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, Narasimha Murthy, Bhargava Ramu, Bharadwaj Manda, M. Ramanathan, Gautam Kumar, P Preetham, Siddharth Srivastava, Swati Bhugra, Brejesh Lall, Christian Haene, Shubham Tulsiani, Jitendra Malik, Jared Lafer, Ramsey Jones, Siyuan Li, Jie Lu, Shi Jin, Jingyi Yu, Qi-Xing Huang, Evangelos Kalogerakis, Silvio Savarese, Pat Hanrahan, Thomas Funkhouser, Hao Su, Leonidas Guibas
We introduce a large-scale 3D shape understanding benchmark using data and annotations from the ShapeNet 3D object database.
Convolutional networks are the de facto standard for analysing spatio-temporal data such as images, videos, and 3D shapes.
Ranked #20 on 3D Part Segmentation on ShapeNet-Part (Instance Average IoU metric)
Deep convolutional neural networks have become the gold standard for image recognition tasks, setting many current state-of-the-art results and even achieving near-human-level performance on some tasks.
However, if you simply alternate convolutional layers with max-pooling layers, performance is limited by the rapid reduction in spatial size and the disjoint nature of the pooling regions.
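The "rapid reduction in spatial size" is easy to quantify. The sketch below uses an illustrative 96-pixel input (my assumption, not a figure from the text): each block applies a valid 3x3 convolution (losing 2 pixels) followed by a 2x2 max-pooling layer (halving the size), so only a handful of conv+pool blocks fit before the feature map collapses to a single pixel.

```python
# How fast does alternating conv + 2x2 max-pooling shrink a feature map?
size = 96                      # hypothetical input resolution
layer = 0
while size >= 3:
    size = (size - 2) // 2     # valid 3x3 conv (-2), then 2x2 pool (halve)
    layer += 1
    print(f"after conv+pool block {layer}: {size}x{size}")
```

Running this shows the map shrinking 96 -> 47 -> 22 -> 10 -> 4 -> 1: after only five blocks there is no spatial extent left, which is the limitation that motivates gentler pooling schemes.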
Ranked #16 on Image Classification on MNIST
Convolutional neural networks (CNNs) perform well on problems such as handwriting recognition and image classification.
Ranked #108 on Image Classification on CIFAR-100
We show that the path signature, used as a set of features for consumption by a convolutional neural network (CNN), improves the accuracy of online character recognition, that is, the task of reading characters represented as a collection of paths.