We present 3DMiner -- a pipeline for mining 3D shapes from challenging large-scale unannotated image datasets.
Deep learning based localization and mapping approaches have recently emerged as a new research direction and received significant attention from both industry and academia.
A truly generalizable approach to rigid segmentation and motion estimation is fundamental to 3D understanding of articulated objects and moving scenes.
We have conducted extensive real-world experiments in a construction site showing significant accuracy improvement via cross-modality training and the use of social forces.
In this paper, we propose a novel positioning system, RAVEL (Radio And Vision Enhanced Localization), which fuses anonymous visual detections captured by widely available camera infrastructure with radio readings (e.g., WiFi radio data).
Supervised approaches typically require the annotation of large training sets; there has thus been great interest in leveraging weakly-, semi-, or self-supervised methods to avoid this, with much success.
In this paper, we show how to use a combination of three techniques to allow the existing photometric losses to work for both day and nighttime images.
We present RangeUDF, a new implicit representation based framework to recover the geometry and semantics of continuous 3D scene surfaces from point clouds.
In this work, we propose an almost-universal sampler, in our quest for a sampler that can learn to preserve the most useful points for a particular task, yet be inexpensive to adapt to different tasks, models, or datasets.
Scene flow is a powerful tool for capturing the motion field of 3D point clouds.
Monocular approaches to such tasks exist, and dense monocular mapping approaches have been successfully deployed for UAV applications.
Each point in the dataset has been labelled with fine-grained semantic annotations, resulting in a dataset that is three times the size of the previously largest photogrammetric point cloud dataset.
We present a pose adaptive few-shot learning procedure and a two-stage data interpolation regularization, termed Pose Adaptive Dual Mixup (PADMix), for single-image 3D reconstruction.
To demonstrate the utility of our approach, we have collected IQ (In-phase and Quadrature components) samples from a four-element Uniform Linear Array (ULA) in various Line-of-Sight (LOS) and Non-Line-of-Sight (NLOS) environments, and published the dataset.
To avoid the drawbacks of conventional DFT pre-processing, we propose a learnable pre-processing module, named CubeLearn, to directly extract features from raw radar signal and build an end-to-end deep neural network for mmWave FMCW radar motion recognition applications.
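As a rough illustration of the idea (not the authors' code), the sketch below shows a learnable pre-processing layer whose weights are initialised to the DFT basis and then trained end-to-end; the tensor shapes and module name are hypothetical.

```python
# Minimal sketch of a learnable DFT-like pre-processing layer: a linear map
# over fast-time samples whose weights start as the DFT basis and are
# fine-tuned end-to-end with the rest of the network.
import math
import torch
import torch.nn as nn


class LearnableDFT(nn.Module):
    def __init__(self, n_samples: int):
        super().__init__()
        n = torch.arange(n_samples).float()
        k = n.view(-1, 1)
        angle = -2 * math.pi * k * n / n_samples
        # Real and imaginary parts of the DFT matrix as trainable weights.
        self.w_real = nn.Parameter(torch.cos(angle))
        self.w_imag = nn.Parameter(torch.sin(angle))

    def forward(self, x):
        # x: (batch, chirps, n_samples) real-valued raw ADC samples.
        real = x @ self.w_real.t()
        imag = x @ self.w_imag.t()
        # Magnitude of the transformed signal, analogous to a range profile.
        return torch.sqrt(real ** 2 + imag ** 2 + 1e-8)


if __name__ == "__main__":
    layer = LearnableDFT(n_samples=64)
    x = torch.randn(8, 128, 64)   # hypothetical batch of radar frames
    print(layer(x).shape)         # torch.Size([8, 128, 64])
```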
We study the problem of efficient semantic segmentation of large-scale 3D point clouds.
Specifically, SoundDet consists of a backbone neural network and two parallel heads for temporal detection and spatial localization, respectively.
Simultaneous Localization and Mapping (SLAM) systems typically employ vision-based sensors to observe the surrounding environment.
Labelling point clouds fully is highly time-consuming and costly.
There is considerable work in the field of deep camera relocalization, which directly estimates poses from raw images.
Accurately describing and detecting 2D and 3D keypoints is crucial to establishing correspondences across images and point clouds.
In this demonstration, we present a real-time indoor positioning system which fuses millimetre-wave (mmWave) radar and IMU data via deep sensor fusion.
An essential prerequisite for unleashing the potential of supervised deep learning algorithms in the area of 3D scene understanding is the availability of large-scale and richly annotated datasets.
Deep learning based localization and mapping has recently attracted significant attention.
In this work, we propose a VAE-LSTM hybrid model as an unsupervised approach for anomaly detection in time series.
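A minimal sketch of this kind of hybrid, assuming fixed-length windows of a univariate series; the layer sizes and the use of embedding-prediction error as the anomaly score are illustrative choices, not the paper's exact design.

```python
# Illustrative VAE-LSTM anomaly detector: a VAE embeds fixed-length windows,
# an LSTM predicts the next window embedding, and large prediction errors
# are flagged as anomalies.
import torch
import torch.nn as nn


class WindowVAE(nn.Module):
    def __init__(self, window: int, latent: int = 8):
        super().__init__()
        self.enc = nn.Linear(window, 2 * latent)   # outputs [mu | logvar]
        self.dec = nn.Linear(latent, window)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar


class VAELSTM(nn.Module):
    def __init__(self, window: int, latent: int = 8, hidden: int = 32):
        super().__init__()
        self.vae = WindowVAE(window, latent)
        self.lstm = nn.LSTM(latent, hidden, batch_first=True)
        self.head = nn.Linear(hidden, latent)

    def anomaly_score(self, windows):
        # windows: (batch, n_windows, window_len)
        _, mu, _ = self.vae(windows)
        out, _ = self.lstm(mu[:, :-1])                 # predict next embedding
        pred = self.head(out)
        return ((pred - mu[:, 1:]) ** 2).mean(dim=-1)  # per-window score


if __name__ == "__main__":
    model = VAELSTM(window=30)
    x = torch.randn(4, 10, 30)              # hypothetical windowed series
    print(model.anomaly_score(x).shape)     # torch.Size([4, 9])
```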
We conjecture that this is because of naive approaches to feature-space fusion through summation or concatenation, which do not take into account the different strengths of each modality.
In this paper, we present a novel end-to-end learning-based LiDAR relocalization framework, termed PointLoc, which infers 6-DoF poses directly using only a single point cloud as input, without requiring a pre-built map.
Modern inertial measurement units (IMUs) are small, cheap, energy efficient, and widely employed in smart devices and mobile robots.
By integrating the observations from different sensors, these mobile agents are able to perceive the environment and estimate system states, e.g., locations and orientations.
Demand for smartwatches has taken off in recent years, with new models that can run independently of smartphones and provide more useful features, making them first-class mobile platforms.
We study the problem of efficient semantic segmentation for large-scale 3D point clouds.
In the last decade, numerous supervised deep learning approaches requiring large amounts of labeled data have been proposed for visual-inertial odometry (VIO) and depth map estimation.
This paper presents the design, implementation and evaluation of milliMap, a single-chip millimetre wave (mmWave) radar based indoor mapping system targeted towards low-visibility environments to assist in emergency response.
There is considerable work in the area of visual odometry (VO), and recent advances in deep learning have brought novel approaches to VO, which directly learn salient features from raw images.
The hallucination network is taught to predict fake visual features from thermal images using a Huber loss.
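A minimal sketch of this training signal, assuming 256-dimensional features from hypothetical thermal and visual encoders; only the Huber loss pairing comes from the description above.

```python
# Illustrative feature hallucination: a small network maps thermal-image
# features to "fake" visual features and is trained to match real visual
# features under a Huber loss.
import torch
import torch.nn as nn

hallucinate = nn.Sequential(            # hypothetical hallucination head
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 256),
)
criterion = nn.HuberLoss()              # PyTorch >= 1.9; SmoothL1Loss is equivalent at delta=1

thermal_feat = torch.randn(16, 256)     # features from a thermal encoder (placeholder)
visual_feat = torch.randn(16, 256)      # target features from a visual encoder (placeholder)

loss = criterion(hallucinate(thermal_feat), visual_feat)
loss.backward()
print(loss.item())
```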
With the fast-growing demand for location-based services in various indoor environments, robust indoor ego-motion estimation has attracted significant interest in recent decades.
A trade-off exists between reconstruction quality and the prior regularisation in the Evidence Lower Bound (ELBO) loss that Variational Autoencoder (VAE) models use for learning.
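The trade-off can be made explicit with a beta-weighted ELBO, as in the generic sketch below (a standard formulation, not necessarily the exact loss used here).

```python
# Beta-weighted ELBO loss for a VAE with Gaussian posterior q(z|x) = N(mu, exp(logvar))
# and a unit Gaussian prior: beta scales the KL (prior regularisation) term
# against the reconstruction term.
import torch
import torch.nn.functional as F


def beta_elbo_loss(x_recon, x, mu, logvar, beta=1.0):
    recon = F.mse_loss(x_recon, x, reduction="sum")               # -E[log p(x|z)] up to a constant
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    return recon + beta * kl   # beta > 1 favours the prior, beta < 1 favours reconstruction
```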
Deep learning has achieved impressive results in camera localization, but current single-image techniques typically suffer from a lack of robustness, leading to large outliers.
Inspired by the fact that most people carry smart wireless devices with them, e.g., smartphones, we propose to use this wireless identifier as a supervisory label.
In addition, we show how DynaNet can indicate failures by investigating properties such as the rate of innovation (Kalman gain).
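As a generic illustration (not DynaNet itself), the sketch below shows a standard Kalman measurement update that exposes the innovation and gain, which can be monitored to flag unreliable observations.

```python
# One Kalman measurement update, returning the innovation and gain so that
# they can be monitored: a persistently large innovation suggests a failing sensor.
import numpy as np


def kf_update(x, P, z, H, R):
    innovation = z - H @ x
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new, innovation, K


x = np.zeros(2)                          # toy 2D state
P = np.eye(2)
H = np.array([[1.0, 0.0]])               # observe the first state only
R = np.array([[0.1]])
x, P, nu, K = kf_update(x, P, z=np.array([0.5]), H=H, R=R)
print(nu, K.ravel())
```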
To the best of our knowledge, this is the first work which successfully distills the knowledge from a deep pose regression network.
The framework directly regresses 3D bounding boxes for all instances in a point cloud, while simultaneously predicting a point-level mask for each instance.
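A minimal sketch of this two-output design, with hypothetical feature dimensions and a naive global pooling; it is not the paper's actual architecture.

```python
# Illustrative heads that jointly regress per-instance 3D boxes and
# per-point masks from point features.
import torch
import torch.nn as nn


class BoxAndMaskHeads(nn.Module):
    def __init__(self, feat_dim=64, max_instances=20):
        super().__init__()
        self.box_head = nn.Linear(feat_dim, max_instances * 7)  # (cx, cy, cz, w, l, h, score) per instance
        self.mask_head = nn.Linear(feat_dim, max_instances)     # per-point logits, one column per instance

    def forward(self, point_feats):
        # point_feats: (batch, n_points, feat_dim)
        global_feat = point_feats.max(dim=1).values              # simple global pooling
        boxes = self.box_head(global_feat).view(point_feats.size(0), -1, 7)
        masks = self.mask_head(point_feats)                      # (batch, n_points, max_instances)
        return boxes, masks


heads = BoxAndMaskHeads()
boxes, masks = heads(torch.randn(2, 1024, 64))
print(boxes.shape, masks.shape)   # torch.Size([2, 20, 7]) torch.Size([2, 1024, 20])
```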
The key to offering personalised services in smart spaces is knowing where a particular person is with a high degree of accuracy.
Inspired by the cognitive process of humans and animals, Curriculum Learning (CL) trains a model by gradually increasing the difficulty of the training data.
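A minimal sketch of such a schedule, assuming precomputed per-sample difficulty scores (hypothetical here): the visible training subset grows from easy to hard over the epochs.

```python
# Curriculum schedule: train on an easy subset first, then gradually
# expose the model to harder samples each epoch.
import numpy as np


def curriculum_indices(difficulty, epoch, total_epochs, start_frac=0.25):
    """Return indices of the samples visible at this epoch, easiest first."""
    order = np.argsort(difficulty)     # easy -> hard
    frac = min(1.0, start_frac + (1 - start_frac) * epoch / max(1, total_epochs - 1))
    return order[: max(1, int(frac * len(order)))]


difficulty = np.random.rand(1000)      # placeholder difficulty scores
for epoch in range(5):
    idx = curriculum_indices(difficulty, epoch, total_epochs=5)
    print(f"epoch {epoch}: training on {len(idx)} samples")
```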
Deep learning approaches for Visual-Inertial Odometry (VIO) have proven successful, but they rarely focus on incorporating robust fusion strategies for dealing with imperfect input sensory data.
Variational Auto-encoders (VAEs) have been very successful as methods for forming compressed latent representations of complex, often high-dimensional, data.
Due to the sparse rewards and high degree of environment variation, reinforcement learning approaches such as Deep Deterministic Policy Gradient (DDPG) are plagued by issues of high variance when applied in complex real world environments.
Inertial information processing plays a pivotal role in ego-motion awareness for mobile agents, as inertial measurements are entirely egocentric and not environment dependent.
Advances in micro-electro-mechanical systems (MEMS) have made inertial measurement units (IMUs) small, cheap, and energy efficient, and they are now widely used in smartphones, robots, and drones.
In the last decade, supervised deep learning approaches have been extensively employed in visual odometry (VO) applications, but such approaches are not feasible in environments where labelled data is not abundant.
In this framework, real images are first converted to a synthetic domain representation that reduces complexity arising from lighting and texture.
However, GRU-based approaches are unable to consistently estimate 3D shapes given different permutations of the same set of input images, as the recurrent unit is permutation variant.
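The sketch below illustrates the point with hypothetical per-view features: a GRU's final state changes when the views are reordered, while a symmetric pooling such as max does not.

```python
# Demonstration: a GRU over a set of per-view features is order-sensitive,
# whereas a symmetric pooling (e.g. max) is permutation invariant.
import torch
import torch.nn as nn

feats = torch.randn(1, 5, 32)            # 5 per-view features (hypothetical)
perm = feats[:, [4, 3, 2, 1, 0]]         # same views, reversed order

gru = nn.GRU(32, 32, batch_first=True)
_, h1 = gru(feats)
_, h2 = gru(perm)
print(torch.allclose(h1, h2))            # almost always False: GRU is permutation variant

print(torch.allclose(feats.max(dim=1).values,
                     perm.max(dim=1).values))   # True: max pooling is invariant
```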
This is further confounded by the fact that shape information about encountered objects in the real world is often impaired by occlusions, noise and missing regions, e.g., a robot manipulating an object will only be able to observe a partial view of the entire solid.
Modelling the physical properties of everyday objects is a fundamental prerequisite for autonomous robots.
Unlike existing work which typically requires multiple views of the same object or class labels to recover the full 3D geometry, the proposed 3D-RecGAN++ only takes the voxel grid representation of a depth view of the object as input, and is able to generate the complete 3D occupancy grid with a high resolution of 256^3 by recovering the occluded/missing regions.
Inertial sensors play a pivotal role in indoor localization, which in turn lays the foundation for pervasive personal applications.
This paper presents a novel end-to-end framework for monocular VO by using deep Recurrent Convolutional Neural Networks (RCNNs).
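A minimal sketch in the same spirit (a CNN encoder followed by a recurrent pose regressor); the layer sizes, input resolution, and pose parameterisation are illustrative assumptions, not the paper's exact design.

```python
# Illustrative recurrent-convolutional VO: a CNN encodes stacked consecutive
# frames and an LSTM regresses the 6-DoF relative pose at each time step.
import torch
import torch.nn as nn


class RecurrentVO(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(6, 16, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.rnn = nn.LSTM(32, hidden, batch_first=True)
        self.pose = nn.Linear(hidden, 6)    # translation (3) + rotation (3)

    def forward(self, frames):
        # frames: (batch, time, 6, H, W) -- two RGB frames stacked per step
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).flatten(1)   # (b*t, 32)
        out, _ = self.rnn(feats.view(b, t, -1))
        return self.pose(out)               # (batch, time, 6)


model = RecurrentVO()
poses = model(torch.randn(2, 4, 6, 64, 64))
print(poses.shape)   # torch.Size([2, 4, 6])
```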
In this paper, we propose a novel 3D-RecGAN approach, which reconstructs the complete 3D structure of a given object from a single arbitrary depth view using generative adversarial networks.
Machine learning techniques, namely convolutional neural networks (CNN) and regression forests, have recently shown great promise in performing 6-DoF localization of monocular images.
In this paper we present an on-manifold sequence-to-sequence learning approach to motion estimation using visual and inertial sensors.