We evaluate our method on real-life data using a variety of metrics to quantify the amount of information an attacker is able to recover.
This indicates that these models are able to learn rich, meaningful representations from our synthetic data and that training on the synthetic data can help overcome the issue of having small, real-life datasets for vision-based keystroke inference attacks.
Low-frequency, long-range errors (drift) are an endemic problem in 3D structure from motion and can hamper otherwise reasonable reconstructions of the scene.
In this work, we propose "tangent images," a spherical image representation that facilitates transferable and scalable $360^\circ$ computer vision.
By learning view synthesis, we explicitly encourage the feature extractor to encode information about not only the visible, but also the occluded parts of the scene.
We present a new deep learning approach to blending for image-based rendering (IBR), in which we use held-out real image data to learn blending weights that combine input photo contributions.
In this paper, we introduce a novel multi-camera tracking approach that, for the first time, jointly leverages rolling shutter and radial distortion as features to achieve superior high-frequency camera pose estimation.
Image-based 3D reconstruction for Internet photo collections has become a robust technology to produce impressive virtual representations of real-world scenes.
We address the problem of large scale image geo-localization where the location of an image is estimated by identifying geo-tagged reference images depicting the same place.
Given the smooth motion of dynamic objects, we observe that any element in the dictionary can be well approximated by a sparse linear combination of other elements in the same dictionary (i.e., self-expression).
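The self-expression property above can be illustrated with a minimal numerical sketch (not the paper's algorithm): when dictionary elements lie near a low-dimensional subspace, as smooth trajectories do, each element is well approximated by a combination of just a few others. The dimensions and sparse support below are illustrative assumptions.

```python
import numpy as np

# Columns of D live in a 3-dimensional subspace of R^50, mimicking the
# low-rank structure induced by smoothly moving objects.
rng = np.random.default_rng(0)
basis = rng.standard_normal((50, 3))
D = basis @ rng.standard_normal((3, 20))   # 20 dictionary elements

# Self-expression: reconstruct element 0 from a sparse support of
# 3 other elements via least squares.
target = D[:, 0]
support = D[:, 1:4]
coef, *_ = np.linalg.lstsq(support, target, rcond=None)
residual = np.linalg.norm(support @ coef - target) / np.linalg.norm(target)
print(residual < 1e-8)  # near-exact reconstruction from 3 other elements
```

Because three generic columns already span the subspace, the relative residual is essentially zero; in practice the sparse support would be selected by an l1-regularized solver rather than fixed by hand.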
We propose a framework for the automatic creation of time-lapse mosaics of a given scene.
We target the sparse 3D reconstruction of dynamic objects observed by multiple unsynchronized video cameras with unknown temporal overlap.
We propose two novel minimal solvers which advance the state of the art in satellite imagery processing.
We address the problem of recognizing a place depicted in a query image by using a large database of geo-tagged images at a city-scale.
Our calibration scheme allows a head-worn device to calculate a locally optimal eye-device transformation on demand by computing an optimal model from a local window of previous frames.
Based on the insights of this evaluation, we propose a learning-based approach, the PAirwise Image Geometry Encoding (PAIGE), to efficiently identify image pairs with scene overlap without the need to perform exhaustive putative matching and geometric verification.
We propose a novel, large-scale, structure-from-motion framework that advances the state of the art in data scalability from city-scale modeling (millions of images) to world-scale modeling (several tens of millions of images) using just a single computer.
Structure-from-Motion for unordered image collections has significantly advanced in scale over the last decade.
We propose a multi-view depthmap estimation approach aimed at adaptively ascertaining the pixel-level data associations between a reference image and all the elements of a source image set.
We develop a sequential optimal sampling framework for stereo disparity estimation by adapting the Sequential Probability Ratio Test (SPRT) model.
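To make the SPRT component concrete, here is a generic Sequential Probability Ratio Test sketch, not the paper's stereo sampler: it decides between two Bernoulli hypotheses by accumulating a log-likelihood ratio and stopping as soon as either threshold is crossed. The hypothesis rates `p0`, `p1` and error rates `alpha`, `beta` are illustrative assumptions.

```python
import math

def sprt(samples, p0=0.3, p1=0.7, alpha=0.05, beta=0.05):
    """Decide H0: p = p0 vs. H1: p = p1 from a stream of 0/1 samples."""
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    for n, x in enumerate(samples, start=1):
        # Add this sample's log-likelihood-ratio contribution.
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(samples)

decision, n_used = sprt([1, 1, 1, 1, 1])
print(decision, n_used)  # H1 4 -- decided before exhausting the samples
```

The appeal of SPRT for sampling problems is exactly this early stopping: evidence is gathered only until a confident decision is possible, rather than for a fixed budget.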