In particular, we propose RayMVSNet which learns sequential prediction of a 1D implicit field along each camera ray with the zero-crossing point indicating scene depth.
To resolve the sample efficiency issue in learning the high-dimensional and complex control of dexterous grasping, we propose an effective representation of grasping state characterizing the spatial interaction between the gripper and the target object.
However, adopting relations between all the object or patch proposals for detection is inefficient, and an imbalanced combination of local and global relations brings extra noise that could mislead the training.
In the field of 3D object detection, previous methods have been taking the advantage of context encoding, graph embedding, or explicit relation reasoning to extract relation context.
Context has proven to be one of the most important factors in object layout reasoning for 3D scene understanding.
Such sparse and loose matching requires contextual features capturing the geometric structure of the point clouds.
Each level of the tree corresponds to an assembly of shape parts, represented as implicit functions, to reconstruct the input shape.
In particular, we impose a consistency regularization which enforces the outputs from each of the multiple layers to be consistent for the input image and its perturbed counterpart.
In this paper, we introduce a neural architecture, termed Box2Seg, to learn point-level semantics of 3D point clouds with bounding box-level supervision.
Weakly supervised learning can help local feature methods to overcome the obstacle of acquiring a large-scale dataset with densely labeled correspondences.
Ranked #1 on Camera Localization on Aachen Day-Night benchmark
In this paper, we propose a Bayesian-symbolic framework (BSP) for physical reasoning and learning that is close to human-level sample-efficiency and accuracy.
These approaches, however, are limited in their ability to capture the underlying neural dynamics (e. g. linear) and in their ability to relate the learned dynamics back to the observed behaviour (e. g. no time lag).
PCT is a full-fledged description of the state and action space of bin packing which can support packing policy learning based on deep reinforcement learning (DRL).
Seeing as a systematic outlier is a combination of patterns of a clean instance and systematic error patterns, our main insight is that inliers can be modelled by a smaller representation (subspace) in a model than outliers.
As such, estimating density ratios accurately using only samples from $p$ and $q$ is of high significance and has led to a flurry of recent work in this direction.
We propose an efficient plug-and-play acceleration framework for semi-supervised video object segmentation by exploiting the temporal redundancies in videos presented by the compressed bitstream.
Traffic simulators act as an essential component in the operating and planning of transportation systems.
We propose to tackle the difficulties of fast-motion camera tracking in the absence of inertial measurements using random optimization, in particular, the Particle Filter Optimization (PFO).
In this work, rather than defining a continuous or discrete kernel, we directly embed convolutional kernels into the learnable potential fields, giving rise to potential convolution.
Learning-based 3D shape segmentation is usually formulated as a semantic labeling problem, assuming that all parts of training shapes are annotated with a given set of tags.
According to the theory of geometric stability analysis, a minimal set of three planar/cylindrical patches are geometrically stable and determine the full 6DoFs of the object pose.
In this paper, we propose a selective sensing framework that adopts the novel concept of data-driven nonuniform subsampling to reduce the dimensionality of acquired signals while retaining the information of interest in a computation-free fashion.
As such, learning the laws is then reduced to symbolic regression and Bayesian inference methods are used to obtain the distribution of unobserved properties.
The masses of tetraquark states of all $qc\bar q \bar c$ and $cc\bar c \bar c$ quark configurations are evaluated in a constituent quark model, where the Cornell-like potential and one-gluon exchange spin-spin coupling are employed.
High Energy Physics - Phenomenology
We show that HPC constitutes a powerful point feature learning with a rather compact set of only four types of geometric priors as kernels.
We present a novel attention-based mechanism for learning enhanced point features for tasks such as point cloud classification and segmentation.
The succeed of simulation strongly supports the ellipse packing hypothesis that was proposed to explain the dynamic behaviors of a trivalent 2D structure.
Biological Physics Adaptation and Self-Organizing Systems Cell Behavior
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
The rapidly growing amount of data emerging in many fields motivated the need for accelerated hash tables designed for modern parallel architectures.
Distributed, Parallel, and Cluster Computing
We propose an end-to-end deep neural network which is able to predict both reflectional and rotational symmetries of 3D objects present in the input RGB-D image.
We introduce an end-to-end learnable technique to robustly identify feature edges in 3D point cloud data.
Adding the attention module with a rectified linear unit (ReLU) results in an amplification of positive elements and a suppression of negative ones, both with learned, data-adaptive parameters.
Online semantic 3D segmentation in company with real-time RGB-D reconstruction poses special challenges such as how to perform 3D convolution directly over the progressively fused 3D geometric data, and how to smartly fuse information from frame to frame.
Experiment results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach and meanwhile further reduce the input data size.
Since DynamicPPL is a modular, stand-alone library, any probabilistic programming system written in Julia, such as Turing. jl, can use DynamicPPL to specify models and trace their model parameters.
To tackle intra-class shape variations, we learn canonical shape space (CASS), a unified representation for a large variety of instances of a certain object category.
Based on further studying the low-rank subspace clustering (LRSC) and L2-graph subspace clustering algorithms, we propose a F-graph subspace clustering algorithm with a symmetric constraint (FSSC), which constructs a new objective function with a symmetric constraint basing on F-norm, whose the most significant advantage is to obtain a closed-form solution of the coefficient matrix.
The regularization maximizes the mutual information between navigation actions and visual observation transforms of an agent, thus promoting more informed navigation decisions.
To address this issue, we approach camera relocalization with a decoupled solution where feature extraction, coordinate regression, and pose estimation are performed separately.
We introduce PQ-NET, a deep neural network which represents and generates 3D shapes via sequential part assembly.
Stan's Hamilton Monte Carlo (HMC) has demonstrated remarkable sampling robustness and efficiency in a wide range of Bayesian inference problems through carefully crafted adaption schemes to the celebrated No-U-Turn sampler (NUTS) algorithm.
Transforming one probability distribution to another is a powerful tool in Bayesian inference and machine learning.
In depth-sensing applications ranging from home robotics to AR/VR, it will be common to acquire 3D scans of interior spaces repeatedly at sparse time intervals (e. g., as part of regular daily use).
No-cloning theorem forbids perfect cloning of an unknown quantum state.
Exhaustive experiments indicate that the proposed method can detect building change types directly and outperform the current multi-index learning method.
While the current convolution neural network tends to extract global features and global semantic information in a scene, the geo-spatial objects can be located at anywhere in an aerial image scene and their spatial arrangement tends to be more complicated.
In our method, the exploratory robot scanning is both driven by and targeting at the recognition and segmentation of semantic objects from the scene.
First, the latent distribution is conditioned on current observations and the target view, leading to a model-based, target-driven navigation.
Enlightened by the fact that 3D shape structure is characterized as part composition and placement, we propose to model 3D shape variations with a part-aware deep generative network, coined as PAGENet.
Determining the positions of neurons in an extracellular recording is useful for investigating functional properties of the underlying neural circuitry.
In this paper, we propose to re-examine the RL approaches through the lens of classic transportation theory.
Increasingly available city data and advanced learning techniques have empowered people to improve the efficiency of our city functions.
To enable cooperation of traffic signals, in this paper, we propose a model, CoLight, which uses graph attentional networks to facilitate communication.
Specifically, the temporal coherence branch pretrained in an adversarial fashion from unlabeled video data, is designed to capture the dynamic appearance and motion cues of video sequences to guide object segmentation.
Ranked #2 on Semi-Supervised Video Object Segmentation on YouTube
While the part prior network can be trained with noisy and inconsistently segmented shapes, the final output of AdaCoSeg is a consistent part labeling for the input set, with each shape segmented into up to (a user-specified) K parts.
For the task of mobility analysis of 3D shapes, we propose joint analysis for simultaneous motion part segmentation and motion attribute estimation, taking a single 3D model as input.
Meanwhile, to increase the segmentation accuracy at each node, we enhance the recursive contextual feature with the shape feature extracted for the corresponding part.
Ranked #10 on 3D Part Segmentation on ShapeNet-Part
The network may significantly alter the geometry and structure of the input parts and synthesize a novel shape structure based on the inputs, while adding or removing parts to minimize a structure plausibility loss.
We propose to generate part hypotheses from the components based on a hierarchical grouping strategy, and perform labeling on those part groups instead of directly on the components.
Multi-view deep neural network is perhaps the most successful approach in 3D shape classification.
The reason is that it finds the similar instances according to their features directly, which is usually impacted by the imperfect data, and thus returns sub-optimal results.
In this network, a Score Generation Unit is devised to evaluate the quality of each projected image with score vectors.
We present a generative neural network which enables us to generate plausible 3D indoor scenes in large quantities and varieties, easily and highly efficiently.
We propose a scalable Laplacian pyramid reconstructive adversarial network (LAPRAN) that enables high-fidelity, flexible and fast CS images reconstruction.
In this work, we take their insight of using kernels as fixed adversaries further and present a novel method for training deep generative models that does not involve saddlepoint optimization.
We present a semi-supervised co-analysis method for learning 3D shape styles from projected feature lines, achieving style patch localization with only weak supervision.
We propose to recover 3D shape structures from single RGB images, where structure refers to shape parts represented by cuboids and part relations encompassing connectivity and symmetry.
We introduce a novel RGB-D patch descriptor designed for detecting coplanar surfaces in SLAM reconstruction.
In this paper, we propose LCANet, an end-to-end deep neural network based lipreading system.
Ranked #1 on Lipreading on GRID corpus (mixed-speech)
Interpreting black box classifiers, such as deep networks, allows an analyst to validate a classifier before it is deployed in a high-stakes setting.
We introduce a novel neural network architecture for encoding and synthesis of 3D shapes, particularly their structures.
For processing static data in large batch sizes, the proposed solution is on a par with a Titan X GPU in terms of throughput while delivering 9. 5x higher energy efficiency.
Compressive sensing (CS) is a promising technology for realizing energy-efficient wireless sensors for long-term health monitoring.
This paper addresses the real-time encoding-decoding problem for high-frame-rate video compressive sensing (CS).
Active vision is inherently attention-driven: The agent actively selects views to attend in order to fast achieve the vision task while improving its internal representation of the scene being observed.