We present SyNoRiM, a novel way to jointly register multiple non-rigid shapes by synchronizing the maps relating learned functions defined on the point clouds.
We present CIRCLE, a framework for large-scale scene completion and geometric refinement based on local implicit signed distance functions.
Humans can naturally and effectively find salient regions in complex scenes.
Sampling equivariant networks can adjust sampling from input feature maps according to the transformation of the object, allowing a kernel to extract features of an object under different transformations.
Meshes with arbitrary connectivity can be remeshed to hold Loop subdivision sequence connectivity via self-parameterization, making SubdivNet a general approach.
In the first week of May 2021, researchers from four institutions (Google, Tsinghua University, Oxford University, and Facebook) shared their latest work [16, 7, 12, 17] on arXiv.org almost simultaneously, each proposing a new learning architecture consisting mainly of linear layers and claiming it to be comparable, or even superior, to convolution-based models.
Only query coordinates with high uncertainty are forwarded to the next level, where a larger neural network with greater representational capacity processes them.
Attention mechanisms, especially self-attention, have played an increasingly important role in deep feature representation for visual tasks.
Ranked #10 on Semantic Segmentation on Cityscapes val
We present MultiBodySync, a novel, end-to-end trainable multi-body motion segmentation and rigid registration framework for multiple input 3D point clouds.
It is inherently permutation invariant for processing a sequence of points, making it well-suited for point cloud learning.
Ranked #11 on 3D Part Segmentation on ShapeNet-Part (Instance Average IoU metric)
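The permutation property mentioned above for attention over a sequence of points can be checked with a small sketch. This is not the paper's architecture: it assumes a single attention head with identity query/key/value projections, and the helper names `self_attention` and `max_pool` are hypothetical. It illustrates that per-point attention outputs are reordered together with the input, so a symmetric pooling step gives a feature that does not depend on point order.

```python
import math
import random


def self_attention(points):
    """Single-head dot-product self-attention (assumption: identity Q/K/V projections)."""
    out = []
    for q in points:
        # Similarity of this point to every point in the set.
        scores = [sum(a * b for a, b in zip(q, k)) for k in points]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted average of all points, one output row per query point.
        out.append([
            sum(w * v[d] for w, v in zip(weights, points)) / total
            for d in range(len(q))
        ])
    return out


def max_pool(features):
    """Symmetric pooling: channel-wise maximum over all points."""
    return [max(column) for column in zip(*features)]


random.seed(0)
points = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(8)]
perm = list(range(len(points)))
random.shuffle(perm)
shuffled = [points[i] for i in perm]

feats = self_attention(points)
feats_shuffled = self_attention(shuffled)

# Per-point features follow the permutation of the input.
equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for i, j in enumerate(perm)
    for a, b in zip(feats_shuffled[i], feats[j])
)

# After symmetric pooling, the global feature ignores point order.
invariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(max_pool(feats), max_pool(feats_shuffled))
)
```

The comparisons use a small tolerance because reordering the points changes the floating-point summation order.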
Previous online 3D dense reconstruction methods struggle to balance memory storage and surface quality, largely because they use a stagnant underlying geometry representation, such as a TSDF (truncated signed distance function) or surfels, without any knowledge of scene priors.
Experimental results show that Alt-ConvLSTM efficiently models the material kinetic features and greatly outperforms vanilla ConvLSTM, which has only a single state update.
We present ClusterVO, a stereo visual odometry system that simultaneously clusters and estimates the motion of both ego and surrounding rigid clusters/objects.
Particularly, we design a shallow-to-deep architecture on the basis of convolutional networks for semantic scene understanding and modeling.
3D point cloud completion, the task of inferring the complete geometric shape from a partial point cloud, has been attracting attention in the community.
Ranked #4 on Point Cloud Completion on ShapeNet
Example-guided image synthesis aims to synthesize an image from a semantic label map and an exemplary image indicating style.
On the basis of the FaceShapeGene, a novel part-wise face image editing system is developed, which contains a shape-remix network and a conditional label-to-face transformer.
In this work, we propose a novel topic consisting of two dual tasks: 1) given a scene, recommend objects to insert; and 2) given an object category, retrieve suitable background scenes.
We show that with TZC, the braking distance is 16% shorter than with ROS.
Since existing video datasets with ground-truth foreground masks and optical flows are not sufficiently large, we propose a simple yet efficient method to build a synthetic dataset supporting supervised training of the proposed adversarial network.
We present a data-driven approach to reconstructing high-resolution and detailed volumetric representations of 3D shapes.
We also combine our method with Mask R-CNN for instance segmentation, and demonstrate for the first time weakly supervised instance segmentation using only keyword annotations.
We evaluate our method on public portrait image datasets, and show that it outperforms other state-of-the-art general image completion methods.
Combining LineNet and TTLane, we propose a pipeline to model HD maps with crowdsourced data for the first time.
We demonstrate that our pose-based framework achieves better accuracy than the state-of-the-art detection-based approach on the human instance segmentation problem, and handles occlusion better.
Ranked #1 on Human Instance Segmentation on OCHuman
Taking into account the category-independent property of each target, we design a single stage salient instance segmentation framework, with a novel segmentation branch.
In this paper, we investigate six popular blending algorithms: feather blending, multi-band blending, modified Poisson blending, mean value coordinate blending, multi-spline blending, and convolution pyramid blending.
In this work we propose a fully automatic shadow region harmonization approach that improves the appearance compatibility of the de-shadowed region as typically produced by previous methods.
The composed picture is generated by seamlessly stitching several photographs in agreement with the sketch and text labels; these are found by searching the Internet.