Learning image representations using synthetic data allows training neural networks without some of the concerns associated with real images, such as privacy and bias.
Asymmetrical distance structures (quasimetrics) are ubiquitous in our lives and are gaining more attention in machine learning applications.
Within Powderworld, two motivating challenge distributions are presented, one for world-modelling and one for reinforcement learning.
We introduce a new approach to image forensics: placing physical refractive objects, which we call totems, into a scene so as to protect any photograph taken of that scene.
Meaningful uncertainty quantification in computer vision requires reasoning about semantic information -- say, the hair color of the person in a photo or the location of a car on the street.
1 code implementation • 14 Jul 2022 • Vijay Gadepally, Gregory Angelides, Andrei Barbu, Andrew Bowne, Laura J. Brattain, Tamara Broderick, Armando Cabrera, Glenn Carl, Ronisha Carter, Miriam Cha, Emilie Cowen, Jesse Cummings, Bill Freeman, James Glass, Sam Goldberg, Mark Hamilton, Thomas Heldt, Kuan Wei Huang, Phillip Isola, Boris Katz, Jamie Koerner, Yen-Chen Lin, David Mayo, Kyle McAlpin, Taylor Perron, Jean Piou, Hrishikesh M. Rao, Hayley Reynolds, Kaira Samuel, Siddharth Samsi, Morgan Schmidt, Leslie Shing, Olga Simek, Brandon Swenson, Vivienne Sze, Jonathan Taylor, Paul Tylkin, Mark Veillette, Matthew L Weiss, Allan Wollaber, Sophia Yuditskaya, Jeremy Kepner
Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI.
In contrast, our proposed Poisson Quasimetric Embedding (PQE) is the first quasimetric learning formulation that both is learnable with gradient-based optimization and enjoys strong performance guarantees.
The ability to separate signal from noise, and reason with clean abstractions, is critical to intelligence.
no code implementations • 15 Apr 2022 • Miriam Cha, Kuan Wei Huang, Morgan Schmidt, Gregory Angelides, Mark Hamilton, Sam Goldberg, Armando Cabrera, Phillip Isola, Taylor Perron, Bill Freeman, Yen-Chen Lin, Brandon Swenson, Jean Piou
The Multimodal Learning for Earth and Environment Challenge (MultiEarth 2022) will be the first competition aimed at the monitoring and analysis of deforestation in the Amazon rainforest at any time and in any weather conditions.
To take advantage of varied-size data, we introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions.
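A minimal numpy sketch of the random-scale patch sampling described above: crop a square patch whose side is a random fraction of the image, then resample it to a fixed training resolution. Function and parameter names here are our own illustration, not the paper's API.

```python
import numpy as np

def sample_scaled_patch(image, patch_size, min_scale=0.5, max_scale=1.0, rng=None):
    """Sample a square patch at a random scale and resize it to a fixed
    patch_size via nearest-neighbour sampling (illustrative sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    # Pick a random patch side length as a fraction of the short image side.
    scale = rng.uniform(min_scale, max_scale)
    side = max(1, int(round(scale * min(h, w))))
    # Random top-left corner so the patch fits inside the image.
    y = rng.integers(0, h - side + 1)
    x = rng.integers(0, w - side + 1)
    patch = image[y:y + side, x:x + side]
    # Nearest-neighbour resize to the fixed training resolution.
    idx = np.arange(patch_size) * side // patch_size
    return patch[np.ix_(idx, idx)]
```

Because the scale is drawn continuously rather than from a fixed pyramid, every training batch exposes the generator to a slightly different effective resolution.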
We introduce a geometry loss which predicts depth information from the image features of a line drawing, and a semantic loss which matches the CLIP features of a line drawing with its corresponding photograph.
In particular, we demonstrate that a NeRF representation of a scene can be used to train dense object descriptors.
Training supervised image synthesis models requires a critic to compare two images: the ground truth and the result.
Neural MMO is a computationally accessible research platform that combines large agent populations, long time horizons, open-ended tasks, and modular game systems.
We test several existing RL-based exploration methods on this benchmark and find that an agent using unsupervised contrastive learning for representation learning, and impact-driven learning for exploration, achieved the best results.
With just a small amount of robotic experience, we can further fine-tune the affordance model to achieve better results.
We investigate this question in the setting of learning general-purpose visual representations from a black-box generative model rather than directly from data.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
A natural source for such attributes is the StyleSpace of StyleGAN, which is known to generate semantically meaningful dimensions in the image.
In this work, we investigate regression into the latent space as a probe to understand the compositional properties of GANs.
We show empirically that our claim holds for finite-width linear and non-linear models across practical learning paradigms, and that on natural data these are often the solutions that generalize well.
We then show that for complex real-world scenes from the LLFF dataset, iNeRF can improve NeRF by estimating the camera poses of novel images and using these images as additional training data for NeRF.
The quality of image generation and manipulation is reaching impressive levels, making it increasingly difficult for a human to distinguish between what is real and what is fake.
Humans integrate multiple sensory modalities (e.g. visual and audio) to build a causal understanding of the physical world.
Contrastive learning between multiple views of the data has recently achieved state of the art performance in the field of self-supervised representation learning.
Ranked #2 on Contrastive Learning on imagenet-1k
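The multi-view contrastive objective can be illustrated with a small numpy sketch of an InfoNCE-style loss: each anchor embedding should match its own positive view against all other positives in the batch. This is a generic formulation, not the exact loss or hyperparameters from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss between two views of a batch.
    anchors, positives: (N, D) embeddings; row i of each is a positive pair."""
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct pairing sits on the diagonal; minimize its negative log-prob.
    return -np.mean(np.diag(log_probs))
```

Aligned view pairs drive the loss toward zero, while mismatched pairings leave it near log N, which is what makes the objective a useful training signal.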
Contrastive representation learning has been outstandingly successful in practice.
Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state of the art performance in the unsupervised training of deep image models.
Ranked #3 on class-incremental learning on cifar100
The focus of recent meta-learning research has been on the development of learning algorithms that can quickly adapt to test time tasks with limited data and low computational cost.
We present Neural MMO, a massively multiagent game environment inspired by MMOs and discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.
In this paper, we tackle the generalization problem via fast adaptation, where we train a prediction model to quickly adapt to the observed visual dynamics of a novel object.
We demonstrate that this objective ignores important structural knowledge of the teacher network.
Ranked #5 on Knowledge Distillation on CIFAR-100
Such models, however, are approximate, which limits their applicability.
We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability, aesthetics, and emotional valence.
We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics.
Ranked #43 on Self-Supervised Action Recognition on UCF101
The emergence of complex life on Earth is often attributed to the arms race that ensued from a huge number of organisms all competing for finite resources.
In this paper we propose an "Internal GAN" (InGAN) - an image-specific GAN - which trains on a single input image and learns its internal distribution of patches.
We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics.
Ranked #19 on Video Quality Assessment on MSU FR VQA Benchmark
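A hedged numpy sketch of a deep-feature perceptual distance in the spirit of the comparison above: unit-normalize each layer's activations along the channel axis, take squared differences, and average over space and layers. The feature extractor itself is assumed given; layer weights here are a simplification.

```python
import numpy as np

def feature_distance(feats_a, feats_b, weights=None):
    """Perceptual distance between two images from pre-extracted features.
    feats_a, feats_b: lists of (C, H, W) activation maps, one per layer."""
    if weights is None:
        weights = [1.0] * len(feats_a)
    total = 0.0
    for fa, fb, w in zip(feats_a, feats_b, weights):
        # Unit-normalize along the channel dimension at every spatial location.
        na = fa / (np.linalg.norm(fa, axis=0, keepdims=True) + 1e-10)
        nb = fb / (np.linalg.norm(fb, axis=0, keepdims=True) + 1e-10)
        # Squared channel differences, averaged over space, summed over layers.
        total += w * np.mean(np.sum((na - nb) ** 2, axis=0))
    return total
```

The same skeleton works with any fixed backbone; the finding is that such learned features track human similarity judgments far better than pixel-space metrics.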
Domain adaptation is critical for success in new, unseen environments.
The main strengths of our approach are its robustness to freehand bitmap drawings, its ability to adapt to different object categories, and the continuum it offers between single-view and multi-view sketch-based modeling.
The system directly maps a grayscale image, along with sparse, local user "hints" to an output colorization with a Convolutional Neural Network (CNN).
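The input assembly for such a hint-based system can be sketched in a few lines of numpy: stack the grayscale channel, the sparse color hints, and a binary mask marking where hints were given. The array layout (channels-first, ab chrominance hints) is our assumption for illustration.

```python
import numpy as np

def make_hint_input(gray, hint_ab, hint_mask):
    """Assemble a (4, H, W) network input for hint-based colorization:
    gray      -- (H, W) grayscale (lightness) channel
    hint_ab   -- (2, H, W) user-provided chrominance values
    hint_mask -- (H, W) binary mask, 1 where a hint exists."""
    # Zero out chrominance wherever no hint was given, then stack channels.
    return np.concatenate(
        [gray[None], hint_ab * hint_mask[None], hint_mask[None]], axis=0
    )
```

Keeping the mask as an explicit channel lets the network distinguish "no hint here" from "the hint is zero", which matters for sparse, local edits.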
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs.
Ranked #1 on Image-to-Image Translation on zebra2horse (Frechet Inception Distance metric)
Manipulation of deformable objects, such as ropes and cloth, is an important but challenging problem in robotics.
We propose split-brain autoencoders, a straightforward modification of the traditional autoencoder architecture, for unsupervised representation learning.
Ranked #103 on Self-Supervised Image Classification on ImageNet
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems.
We embrace the underlying uncertainty of the problem by posing it as a classification task and use class-rebalancing at training time to increase the diversity of colors in the result.
Ranked #104 on Self-Supervised Image Classification on ImageNet
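Class rebalancing of the kind described above can be sketched in numpy: smooth the empirical color-bin distribution toward uniform, invert it, and normalize so the expected weight under the empirical distribution is one. The mixing parameter name is ours.

```python
import numpy as np

def rebalance_weights(class_freq, lam=0.5):
    """Per-class loss weights that upweight rare color bins.
    class_freq: (Q,) empirical class probabilities, summing to 1."""
    q = len(class_freq)
    # Blend the empirical distribution with uniform to avoid huge weights.
    p = (1.0 - lam) * class_freq + lam / q
    w = 1.0 / p
    # Normalize so the expected weight under class_freq equals 1.
    return w / np.sum(class_freq * w)
```

Weighting the cross-entropy this way keeps common, desaturated bins from dominating training, which is what pushes the model toward diverse, vivid colorizations.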
We demonstrate that this framework works well on two important mid-level vision tasks: intrinsic image decomposition and depth from an RGB image.
We propose a self-supervised framework that learns to group visual entities based on their rate of co-occurrence in space and time.
Our system works by generalizing across object classes: states and transformations learned on one set of objects are used to interpret the image collection for an entirely new object class.
In this paper, we study the problem of reproducing the world lighting from a single image of an object covered with random specular microfacets on the surface.