Drawing inspiration from voxel-based representations with level of detail (LoD), we introduce a multi-scale tri-plane-based scene representation that is capable of capturing the LoD of the signed distance function (SDF) and the space radiance.
Existing trackers can be categorized into two association paradigms: single-feature paradigm (based on either motion or appearance feature) and serial paradigm (one feature serves as secondary while the other is primary).
Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization.
Ranked #1 on Text-to-Music Generation on MusicCaps
Unlike previous approaches that can only synthesize avatars based on simple text descriptions, our method enables the creation of personalized avatars from casually captured face or body images, while still supporting text-based model generation and editing.
Therefore, we propose Graph-of-Thought (GoT) reasoning, which models human thought processes not only as a chain but also as a graph.
We present a novel differentiable rendering framework for joint geometry, material, and lighting estimation from multi-view images.
In this work, we formulate the training problem of a fairness-aware machine learning model as an AUC optimization problem subject to a class of AUC-based fairness constraints.
Heterogeneous graph neural networks have shown great potential in graph representation learning and superior performance on downstream tasks such as node classification and clustering.
The first one is the Hessian regularization that smoothly diffuses the signed distance values to the entire distance field given noisy and incomplete input.
Typically, feature selection and embedding dimension search are optimized sequentially, i.e., feature selection is performed first, followed by embedding dimension search to determine the optimal dimension size for each selected feature.
We present a differentiable rendering framework for material and lighting estimation from multi-view images and a reconstructed geometry.
The partial AUC, as a generalization of the AUC, summarizes only the TPRs over a specific range of the FPRs and is thus a more suitable performance measure in many real-world situations.
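To make the definition concrete, the partial AUC can be sketched as the area under the empirical ROC curve restricted to an FPR interval. The function names `roc_points` and `partial_auc`, and the choice to leave the result unnormalized, are assumptions made for illustration; the sketch also assumes distinct scores and at least one example of each class.

```python
import numpy as np

def roc_points(scores, labels):
    """Empirical ROC curve as (fpr, tpr) arrays starting at (0, 0).

    Assumes binary labels in {0, 1}, distinct scores, and at least one
    positive and one negative example (illustrative simplifications).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")  # highest score first
    labels = labels[order]
    tps = np.cumsum(labels)        # true positives at each threshold
    fps = np.cumsum(1 - labels)    # false positives at each threshold
    tpr = np.concatenate(([0.0], tps / tps[-1]))
    fpr = np.concatenate(([0.0], fps / fps[-1]))
    return fpr, tpr

def partial_auc(scores, labels, fpr_low=0.0, fpr_high=1.0):
    """Unnormalized area under the ROC curve for fpr_low <= FPR <= fpr_high."""
    fpr, tpr = roc_points(scores, labels)
    # Clip each ROC segment to the FPR window; vertical steps contribute 0.
    left = np.clip(fpr[:-1], fpr_low, fpr_high)
    right = np.clip(fpr[1:], fpr_low, fpr_high)
    return float(np.sum((right - left) * tpr[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 1, 0, 0]
full_auc = partial_auc(scores, labels)             # whole FPR range
low_fpr_auc = partial_auc(scores, labels, 0.0, 0.5)  # only FPR in [0, 0.5]
```

With `fpr_low=0` and `fpr_high=1` the function reduces to the ordinary AUC, which is the sense in which the partial AUC generalizes it.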
Based on FixMatch, where a pseudo label is generated from a weakly-augmented sample to teach the prediction on a strong augmentation of the same input sample, CLS allows the creation of both pseudo and complementary labels to support both positive and negative learning.
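A minimal sketch of how pseudo labels and complementary labels could both be derived from a weakly-augmented prediction is given below. The excerpt does not specify how CLS selects complementary labels, so the low-probability rule, the thresholds, and the names `select_labels` and `pos_neg_loss` are all assumptions for illustration rather than the authors' method.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def select_labels(weak_logits, pos_threshold=0.95, neg_threshold=0.05):
    """Derive labels from predictions on weakly-augmented samples.

    pseudo:      most likely class per sample
    pseudo_mask: True where that class is confident enough (FixMatch-style)
    comp_mask:   (N, C) True for classes treated as "not this class"
                 (assumed rule: probability at or below neg_threshold)
    """
    probs = softmax(np.asarray(weak_logits, dtype=float))
    pseudo = probs.argmax(axis=1)
    pseudo_mask = probs.max(axis=1) >= pos_threshold
    comp_mask = probs <= neg_threshold
    return pseudo, pseudo_mask, comp_mask

def pos_neg_loss(weak_logits, strong_logits):
    """Positive learning on pseudo labels plus negative learning on
    complementary labels, both applied to the strong-augmentation output."""
    pseudo, pseudo_mask, comp_mask = select_labels(weak_logits)
    p = softmax(np.asarray(strong_logits, dtype=float))
    eps = 1e-12
    rows = np.arange(len(p))
    pos = -np.log(p[rows, pseudo] + eps) * pseudo_mask        # -log p_y
    neg = -(np.log(1.0 - p + eps) * comp_mask).sum(axis=1)    # -log(1 - p_c)
    return float((pos + neg).mean())

weak = np.array([[10.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
pseudo, pseudo_mask, comp_mask = select_labels(weak)
agree = pos_neg_loss(weak, weak)
disagree = pos_neg_loss(weak, np.array([[0.0, 10.0, 0.0], [1.0, 1.0, 1.0]]))
```

In this toy batch the first sample is confident enough to yield both a pseudo label and two complementary labels, while the second, uniformly predicted sample contributes nothing to either term.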
A general and robust POI embedding framework, the POI-Transformers, is proposed in this study to address these problems of POI entity matching.
In this paper, we present a model-based robust RL framework for autonomous greenhouse control to meet the sample efficiency and safety challenges.
In this work, we introduce a novel neural surface reconstruction framework that leverages the knowledge of stereo matching and feature consistency to optimize the implicit surface representation.
MBDP consists of two kinds of dropout mechanisms, where the rollout-dropout aims to improve the robustness with a small cost of sample efficiency, while the model-dropout is designed to compensate for the lost efficiency at a slight expense of robustness.
However, the optimal control of autonomous greenhouses is challenging, requiring decision-making based on high-dimensional sensory data, and the scaling of production is limited by the scarcity of labor capable of handling this task.
Model-based deep reinforcement learning has achieved success in various domains that require high sample efficiency, such as Go and robotics.
After propagating through a random amplifying medium, a squeezed state commonly shows excess noise above the shot-noise level.
In this paper, we show that the only solution of the vortex sheet equation, either stationary or uniformly rotating with negative angular velocity $\Omega$, such that it has positive vorticity and is concentrated in a finite disjoint union of smooth curves with finite length is the trivial one: constant vorticity amplitude supported on a union of nested, concentric circles.
Analysis of PDEs
The change complexity lies in the detailed scale of high granularity data, and in the geometric units used to simulate the change.
Computers and Society
Here we propose two alternatives to Black 76 to value European option future contracts in which the underlying market prices can be negative or mean reverting.
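The excerpt does not name the two alternatives. Purely as an illustration of why a replacement for Black 76 is needed when prices can go negative, the sketch below prices a European option on a futures contract under a Bachelier (normal) model, which is well defined for negative futures prices and strikes. Choosing Bachelier here is an assumption and not necessarily one of the authors' proposals; the function names and parameter values are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bachelier_call(F, K, sigma_n, T, r):
    """European call on a futures price F under the normal model.

    sigma_n is an absolute (price-unit) volatility, not a percentage
    volatility as in Black 76; F and K may be negative.
    """
    std = sigma_n * math.sqrt(T)
    d = (F - K) / std
    return math.exp(-r * T) * ((F - K) * norm_cdf(d) + std * norm_pdf(d))

def bachelier_put(F, K, sigma_n, T, r):
    """European put on a futures price F under the normal model."""
    std = sigma_n * math.sqrt(T)
    d = (F - K) / std
    return math.exp(-r * T) * ((K - F) * norm_cdf(-d) + std * norm_pdf(d))

# Hypothetical inputs: a negative futures price and a negative strike.
call = bachelier_call(F=-5.0, K=-5.0, sigma_n=10.0, T=1.0, r=0.0)
put = bachelier_put(F=-5.0, K=-5.0, sigma_n=10.0, T=1.0, r=0.0)
```

Black 76 takes the logarithm of F/K and therefore breaks down when either is non-positive, whereas the normal model only uses the difference F - K. Mean reversion is not captured by this sketch.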
As such, the adverse influence of occluded pixels is suppressed in the cost fusion.
Ranked #1 on Point Clouds on DTU
Finally, a matchability-aware disparity refinement is introduced to improve the depth inference in weakly matchable regions.
Ranked #1 on Stereo Disparity Estimation on KITTI 2015
Temporal camera relocalization estimates the pose with respect to each video frame in sequence, as opposed to one-shot relocalization which focuses on a still image.
This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors.
Partial Label Learning (PLL) aims to train a classifier when each training instance is associated with a set of candidate labels, among which only one is correct but is not accessible during the training phase.
Compared with other computer vision tasks, it is rather difficult to collect a large-scale MVS dataset, as it requires expensive active scanners and a labor-intensive process to obtain ground-truth 3D structures.
The self-supervised learning of depth and pose from monocular sequences provides an attractive solution by using the photometric consistency of nearby frames as it depends much less on the ground-truth data.
Most existing studies on learning local features focus on the patch-based descriptions of individual keypoints, while neglecting the spatial relations established from their keypoint locations.
However, one major limitation of current learned MVS approaches is scalability: the memory-consuming cost volume regularization makes learned MVS hard to apply to high-resolution scenes.
Learned local descriptors based on Convolutional Neural Networks (CNNs) have achieved significant improvements on patch-based benchmarks, but have not demonstrated strong generalization ability on recent benchmarks of image-based 3D reconstruction.
We present an end-to-end deep learning architecture for depth map inference from multi-view images.
Ranked #14 on Point Clouds on Tanks and Temples (Mean F1 (Intermediate) metric)
To address these problems, we propose a framework that combines a deep convolutional generative adversarial network (DCGAN) with a support vector machine (SVM) to deal with the class imbalance problem and to improve pulsar identification accuracy.
To take advantage of the deep-learning method in detecting urban land-use patterns, we applied a transfer-learning-based remote-sensing image approach to extract and classify features.
In this study, we investigate several one-class classifiers, such as Presence and Background Learning (PBL), Positive Unlabeled Learning (PUL), OCSVM, BSVM and MAXENT, to extract urban impervious surface area from high spatial resolution imagery of GF-1, China's new generation of high spatial resolution remote sensing satellites, and evaluate the classification accuracy against manual interpretation results.