PVO models visual odometry (VO) and video panoptic segmentation (VPS) in a unified view, enabling the two tasks to facilitate each other.
Although plane parameters are added to the optimization, using planar structures effectively simplifies the back-end map.
Based on the Manhattan-world assumption, planar constraints are employed to regularize the geometry in floor and wall regions predicted by a 2D semantic segmentation network.
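Such a planar constraint can be sketched in a few lines: under the Manhattan-world assumption the floor normal is a fixed world axis, so regularizing floor points reduces to penalizing their deviation from a single plane whose offset has a closed-form least-squares solution. This is a minimal illustration of the idea, not the paper's actual residual; the function name and fixed normal are assumptions.

```python
import numpy as np

def floor_residuals(pts, n=np.array([0.0, 0.0, 1.0])):
    """Residuals pulling floor-labeled 3D points onto one plane n·x = d.

    Under the Manhattan-world assumption the normal n is a fixed world
    axis; only the offset d is free, and its least-squares optimum for a
    fixed normal is simply the mean of the point heights.
    """
    h = pts @ n          # signed distance of each point along the normal
    d = h.mean()         # closed-form optimal plane offset
    return h - d         # per-point distance to the fitted plane
```

In a full system these residuals would be added to the bundle-adjustment cost for points that the 2D segmentation network labels as floor (and analogously, with horizontal normals, for walls).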
We, as human beings, can understand and picture a familiar scene from arbitrary viewpoints given a single image, whereas this is still a grand challenge for computers.
Correspondence field estimation and pose refinement are conducted alternately in each iteration to recover the object poses.
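The alternating scheme above can be sketched with a classic ICP-style loop in 2D: each iteration first estimates correspondences (here, brute-force nearest neighbors standing in for a learned correspondence field) and then refines the rigid pose in closed form via SVD (Kabsch alignment). This is a generic sketch of the alternation, not the paper's method; `refine_pose` and its details are assumptions.

```python
import numpy as np

def refine_pose(src, tgt, iters=20):
    """Alternate correspondence estimation and rigid-pose refinement.

    src, tgt: (N, 2) point sets; returns R, t with tgt ≈ src @ R.T + t.
    """
    R, t = np.eye(2), np.zeros(2)
    for _ in range(iters):
        cur = src @ R.T + t
        # Step 1: correspondence estimation — nearest target point
        # for each transformed source point (stand-in for a learned field).
        d = np.linalg.norm(cur[:, None] - tgt[None], axis=2)
        m = tgt[d.argmin(axis=1)]
        # Step 2: pose refinement — closed-form rigid alignment (Kabsch).
        mu_c, mu_m = cur.mean(0), m.mean(0)
        U, _, Vt = np.linalg.svd((cur - mu_c).T @ (m - mu_m))
        dR = Vt.T @ U.T
        if np.linalg.det(dR) < 0:     # enforce a proper rotation
            Vt[-1] *= -1
            dR = Vt.T @ U.T
        # Compose the incremental update with the running pose.
        R = dR @ R
        t = dR @ t + (mu_m - dR @ mu_c)
    return R, t
```

With a reasonable initial pose the two steps reinforce each other: better correspondences yield a better pose, which in turn yields better correspondences.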
Existing methods mainly rely on trained instance embeddings to maintain consistent panoptic segmentation.
Accurate ego-motion estimation fundamentally relies on understanding the correspondences between adjacent LiDAR scans.
We describe an approach to large-scale indoor place recognition that aggregates low-level colour and geometric features with high-level semantic features.
Standard visual localization methods build an a priori 3D model of a scene, which is used to establish correspondences against the 2D keypoints in a query image.
Smart weeding systems to perform plant-specific operations can contribute to the sustainability of agriculture and the environment.
In this work, we propose a novel neural implicit representation for the human body, which is fully differentiable and optimizable with disentangled shape and pose latent spaces.
In this paper, we present a novel neural scene rendering system, which learns an object-compositional neural radiance field and produces realistic rendering with editing capability for a clustered and real-world scene.
To address this problem, we propose a novel visual localization framework that establishes 2D-to-3D correspondences between the query image and the 3D map with a series of learnable scene-specific landmarks.
However, local image content is inevitably ambiguous and error-prone during cross-image feature matching, which hinders downstream tasks.
In this work, we propose a novel system for integrated 3D object detection and tracking, which uses a dynamic object occupancy map and previous object states as spatial-temporal memory to assist object detection in future frames.
Different from traditional video cameras, event cameras capture an asynchronous event stream in which each event encodes pixel location, trigger time, and the polarity of the brightness change.
Here, we present two-photon photoluminescence (TPPL) measurements on individual Au nanobipyramids (AuNP) to reveal their ultrafast dynamics by two-pulse excitation on a global time scale ranging from sub-femtosecond to tens of picoseconds.
Subjects: Optics; Mesoscale and Nanoscale Physics; Quantum Physics
To suit our network to self-supervised learning, we design several novel loss functions that utilize the inherent properties of LiDAR point clouds.
The proposed mesh generation module incrementally fuses each estimated keyframe depth map to an online dense surface mesh, which is useful for achieving realistic AR effects such as occlusions and collisions.
Most existing methods directly train a network to learn a mapping from sparse depth inputs to dense depth maps, which makes it difficult to exploit 3D geometric constraints and to handle practical sensor noise.
In this paper, we present a quantum singular value decomposition algorithm for third-order tensors inspired by the classical tensor singular value decomposition (t-SVD) algorithm, and then extend it to order-$p$ tensors.
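The classical t-SVD that inspires this work can be sketched in a few lines of NumPy: FFT the tensor along its third (tube) mode, take an ordinary matrix SVD of each frontal slice in the Fourier domain, and invert the FFT to recover the factors of the t-product decomposition $\mathcal{A} = \mathcal{U} * \mathcal{S} * \mathcal{V}^\top$. This illustrates the classical algorithm only, not the quantum one.

```python
import numpy as np

def t_svd(A):
    """Classical t-SVD of a third-order tensor A (n1 x n2 x n3)."""
    n1, n2, n3 = A.shape
    Ah = np.fft.fft(A, axis=2)                  # to the Fourier domain
    Uh = np.zeros((n1, n1, n3), dtype=complex)
    Sh = np.zeros((n1, n2, n3), dtype=complex)
    Vh = np.zeros((n2, n2, n3), dtype=complex)
    for k in range(n3):
        # Ordinary matrix SVD of each frontal slice.
        U, s, Vt = np.linalg.svd(Ah[:, :, k])
        Uh[:, :, k] = U
        for i in range(len(s)):                 # f-diagonal core tensor
            Sh[i, i, k] = s[i]
        Vh[:, :, k] = Vt.conj().T
    return Uh, Sh, Vh

def t_reconstruct(Uh, Sh, Vh):
    """Invert the factorization: slice-wise products, then inverse FFT."""
    n3 = Uh.shape[2]
    Bh = np.stack([Uh[:, :, k] @ Sh[:, :, k] @ Vh[:, :, k].conj().T
                   for k in range(n3)], axis=2)
    return np.real(np.fft.ifft(Bh, axis=2))
```

Because the FFT block-diagonalizes the t-product, each frontal slice can be decomposed independently, which is exactly the structure the quantum variant exploits.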
However, jointly using visual and inertial measurements to optimize SLAM objective functions is a problem of high computational complexity.
In this paper, we present RKD-SLAM, a robust keyframe-based dense SLAM approach for an RGB-D camera that can robustly handle fast motion and dense loop closure, and run without time limitation in a moderate size scene.
Our framework consists of two steps: solving the feature `dropout' problem that arises when indistinctive structures, noise, or large image distortions are present, and rapidly recognizing and joining common features located in different subsequences.