We observe that sampling uniformly from diffusion models predominantly yields samples from high-density regions of the data manifold.
Systematic diagnosis of fairness, harms, and biases of computer vision systems is an important step towards building socially responsible systems.
This paper presents initial speech recognition results on "Casual Conversations", a publicly released 846-hour corpus designed to help researchers evaluate the accuracy of their computer vision and audio models across a diverse set of metadata, including age, gender, and skin tone.
Under this threat model, we create adversarial examples by perturbing only those regions of the input where the classifier is uncertain.
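To make the idea of uncertainty-restricted perturbations concrete, the following is a minimal sketch, not the authors' method. It assumes a per-element uncertainty map is already available (for example, a predictive-entropy map; how it is computed is left open), and it uses a single fast gradient sign method (FGSM) step, swapped in here purely for illustration. The function name `masked_fgsm` and the parameters `eps` (step size) and `tau` (uncertainty threshold) are hypothetical.

```python
from typing import List


def sign(v: float) -> float:
    """Return the sign of v as -1.0, 0.0, or 1.0."""
    return float((v > 0) - (v < 0))


def masked_fgsm(x: List[float], grad: List[float], uncertainty: List[float],
                eps: float = 0.03, tau: float = 0.5) -> List[float]:
    """Hypothetical sketch: one FGSM step applied only where uncertainty > tau.

    x           -- input values, assumed to lie in [0, 1]
    grad        -- gradient of the classifier's loss with respect to x
    uncertainty -- assumed per-element uncertainty map (e.g. predictive entropy)
    """
    if not (len(x) == len(grad) == len(uncertainty)):
        raise ValueError("x, grad and uncertainty must have equal length")
    x_adv = []
    for xi, gi, ui in zip(x, grad, uncertainty):
        if ui > tau:
            # Uncertain element: step in the loss-increasing direction,
            # then clip back to the valid input range [0, 1].
            xi = min(1.0, max(0.0, xi + eps * sign(gi)))
        # Confident elements are left unchanged.
        x_adv.append(xi)
    return x_adv


if __name__ == "__main__":
    # Only the first and third elements exceed the threshold and get perturbed.
    print(masked_fgsm([0.2, 0.5, 0.9], [1.0, -2.0, 0.5], [0.9, 0.1, 0.8],
                      eps=0.1, tau=0.5))
```

Restricting the update with a hard threshold keeps the perturbation confined to the uncertain regions; a soft alternative would scale each step by the uncertainty value, which is equally hypothetical here.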
The videos were recorded in multiple U.S. states with a diverse set of adults spanning various age, gender, and apparent skin tone groups.
The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations.
While variational methods have been among the most powerful tools for solving linear inverse problems in imaging, deep (convolutional) neural networks have recently taken the lead in many challenging benchmarks.
In this work, we propose a new CNN+LSTM architecture for camera pose regression in indoor and outdoor scenes.