However, image formation is typically under-constrained because of the limited number of measurements and bandlimited hardware, which restricts what existing reconstruction methods can recover.
Efficient and accurate detection of subtle motion from small objects in noisy environments, as required for vital sign monitoring, is challenging, but motion magnification can substantially improve it.
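As an illustration of the magnification step, the sketch below implements linear Eulerian video magnification in its simplest form: each pixel's intensity over time is bandpass filtered, and the filtered variation is amplified and added back. This assumes "magnification" refers to this family of techniques; it is not presented as the method used in this work. The function name `magnify_motion`, the ideal FFT bandpass, and the omission of the spatial pyramid used in practical implementations are simplifications chosen for brevity.

```python
import numpy as np

def magnify_motion(frames, fps, f_lo, f_hi, alpha):
    """Linear Eulerian-style magnification (illustrative sketch).

    frames: array of shape (T, H, W), one intensity image per time step.
    Temporal variations whose frequency lies in [f_lo, f_hi] Hz are
    amplified by a factor (1 + alpha); everything else is left unchanged.
    """
    frames = np.asarray(frames, dtype=float)
    T = frames.shape[0]
    freqs = np.fft.rfftfreq(T, d=1.0 / fps)          # temporal frequency of each bin (Hz)
    spectrum = np.fft.rfft(frames, axis=0)           # per-pixel temporal spectrum
    band = (freqs >= f_lo) & (freqs <= f_hi)         # ideal (brick-wall) bandpass mask
    spectrum[~band] = 0.0                            # keep only the band of interest
    variation = np.fft.irfft(spectrum, n=T, axis=0)  # band-limited temporal variation
    return frames + alpha * variation                # add the amplified variation back
```

For vital sign monitoring, the passband would be placed around the expected physiological frequency, for example about 1 to 1.7 Hz for a resting heart rate of 60 to 100 beats per minute; these values are illustrative only.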
3D reconstruction algorithms should exploit the low cost and pervasiveness of video cameras, using footage captured from both overhead and soldier-level perspectives.
In this paper, we propose a deep learning framework for generating acoustic feature embeddings sensitive to vocal quality and robust across different corpora.
In theory, deconvolution overcomes bandwidth limitations by reversing the blur induced by the point spread function (PSF) and recovering the scene's scattering distribution.
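Wiener deconvolution is one classical way to carry out this inversion, shown here as a minimal sketch rather than the method used in this work. It assumes circular convolution, a PSF stored with its center at index [0, 0], and a scalar noise-to-signal ratio `nsr`; the function name and these modeling choices are illustrative assumptions. The `nsr` term is where the bandwidth limitation shows up in practice: where the PSF's transfer function is near zero, plain inversion (`nsr = 0`) divides by almost nothing and amplifies noise.

```python
import numpy as np

def wiener_deconvolve(blurred, psf, nsr):
    """Wiener deconvolution under a circular-convolution model (sketch).

    blurred: observed 2D image, modeled as scene (*) psf + noise.
    psf:     2D point spread function with its center at index [0, 0].
    nsr:     assumed noise-to-signal power ratio; nsr = 0 gives the plain
             inverse filter, which is unstable wherever the PSF's
             transfer function is close to zero.
    """
    H = np.fft.fft2(psf, s=blurred.shape)      # PSF transfer function, zero-padded
    G = np.fft.fft2(blurred)                   # spectrum of the observation
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)    # Wiener filter
    return np.real(np.fft.ifft2(W * G))        # estimated scene
```

Raising `nsr` trades resolution for noise robustness: frequencies the PSF barely passes are attenuated instead of being boosted.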
Our adaptive subsampling algorithms pair an object detector with a region-of-interest (ROI) predictor, implemented as a Kalman filter; together they optimize the energy efficiency of a vision pipeline whose end task is object tracking.
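A constant-velocity Kalman filter over the ROI center is one plausible form of such an ROI predictor. The sketch assumes a state of position and velocity, a detector that reports only the ROI center, and hand-picked noise levels `q` and `r`; the class name `ROIKalman` and all parameters are hypothetical, not taken from this work.

```python
import numpy as np

class ROIKalman:
    """Constant-velocity Kalman filter over an ROI center (x, y) (sketch)."""

    def __init__(self, x, y, dt=1.0, q=1e-2, r=1.0):
        self.s = np.array([x, y, 0.0, 0.0], dtype=float)  # state: x, y, vx, vy
        self.P = np.eye(4) * 10.0                          # initial state uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)     # constant-velocity motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)     # detector observes x, y only
        self.Q = np.eye(4) * q                             # process noise covariance
        self.R = np.eye(2) * r                             # detector noise covariance

    def predict(self):
        """Propagate the state one frame ahead; return the predicted ROI center."""
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2].copy()

    def update(self, z):
        """Correct the state with a detected ROI center z = (x, y)."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.s = self.s + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Under this assumed design, `predict()` would run every frame to choose which sensor window to read out, while `update()` would run only on frames where the more expensive detector is invoked; skipping detector calls and full-frame readout is where energy would be saved.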
The lensless pinhole camera, which uses only a pinhole-sized aperture in place of a lens, is perhaps the earliest and simplest form of imaging system.
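The geometry of an ideal pinhole camera reduces to perspective projection. The sketch assumes an infinitesimal pinhole at the origin and an image plane a distance `f` behind it; the function name `pinhole_project` is hypothetical. A real pinhole has finite size and diffracts light, so its images are blurred relative to this ideal model.

```python
def pinhole_project(X, Y, Z, f):
    """Project a 3D point through an ideal pinhole at the origin (sketch).

    (X, Y, Z): scene point in camera coordinates, with Z > 0 in front of the pinhole.
    f:         distance from the pinhole to the image plane behind it.
    Returns the (x, y) image-plane position; the minus signs encode the
    inverted image that a pinhole camera forms.
    """
    if Z <= 0:
        raise ValueError("the point must lie in front of the pinhole (Z > 0)")
    return (-f * X / Z, -f * Y / Z)
```

Doubling the depth `Z` halves the projected offset from the image center.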
However, if the scene moves too fast, the samples cover only a limited view, and reconstruction becomes difficult due to spatiotemporal ambiguities.
We replace the mel-spectrogram upsampler in DiffWave with a deep CNN upsampler trained to map the mel-spectrogram of degraded speech to that of the original speech.
We introduce a dispersion model to simulate realistic spectral variation and present an efficient method for fitting its parameters.
Acquiring Synthetic Aperture Sonar (SAS) datasets is bottlenecked by the cost of deploying SAS imaging systems, and even when acquisition is possible, the data is often skewed toward barren seafloor rather than objects of interest.
On real data captured with a hardware prototype, we achieve an average identification accuracy of 87.1% across four object classes and localize the centroid of the non-line-of-sight (NLOS) object in the occluded region with a mean-squared error (MSE) of 1.97 cm.
The advent of generative adversarial networks (GANs) has enabled new capabilities in synthesis, interpolation, and data augmentation that were previously considered very challenging.
Visual Question Answering (VQA) is a complex semantic task requiring both natural language processing and visual recognition.
Hardware support for deep convolutional neural networks (CNNs) is critical to advanced computer vision in mobile and embedded devices.
We test our network reconstructions on synthetic light fields, on simulated coded measurements of real light fields captured with a Lytro Illum camera, and on real coded images from a custom CMOS diffractive light field camera.
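A coded light field measurement can be modeled as each sensor pixel recording a weighted sum of the light field's angular samples at that pixel location. The sketch below simulates such a measurement from a 4D light field stored as (U, V, H, W). The per-pixel `code` stands in for whatever angular response the real camera has; the function name, array layout, and additive Gaussian noise are assumptions for illustration, not the forward model used in this work.

```python
import numpy as np

def simulate_coded_measurement(light_field, code, noise_std=0.0, rng=None):
    """Collapse a 4D light field (U, V, H, W) into one 2D coded image (sketch).

    Each pixel (h, w) records sum over (u, v) of code[u, v, h, w] * light_field[u, v, h, w].
    """
    light_field = np.asarray(light_field, dtype=float)
    code = np.asarray(code, dtype=float)
    if light_field.shape != code.shape:
        raise ValueError("code must match the light field shape (U, V, H, W)")
    y = np.sum(code * light_field, axis=(0, 1))   # weighted sum over angular samples
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        y = y + rng.normal(0.0, noise_std, size=y.shape)  # optional sensor noise
    return y
```

Collapsing the U by V angular views into a single H by W image leaves the inverse problem under-determined, which is the gap a learned reconstruction is meant to fill.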
Deep learning using convolutional neural networks (CNNs) is quickly becoming the state of the art for challenging computer vision applications.
In this paper, we explore the strengths and weaknesses of combining light field and time-of-flight imaging, particularly the feasibility of an on-chip implementation as a single hybrid depth sensor.