Neural MMO is a computationally accessible research platform that combines large agent populations, long time horizons, open-ended tasks, and modular game systems.
We test several existing RL-based exploration methods on this benchmark and find that an agent combining unsupervised contrastive representation learning with impact-driven exploration achieves the best results.
With just a small amount of robotic experience, we can further fine-tune the affordance model to achieve better results.
We investigate this question in the setting of learning general-purpose visual representations from a black-box generative model rather than directly from data.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
A natural source for such attributes is the StyleSpace of StyleGAN, whose dimensions are known to control semantically meaningful attributes of the generated image.
In this work, we investigate regression into the latent space as a probe to understand the compositional properties of GANs.
Modern deep neural networks are highly over-parameterized compared to the data on which they are trained, yet they often generalize remarkably well.
We then show that for complex real-world scenes from the LLFF dataset, iNeRF can improve NeRF by estimating the camera poses of novel images and using these images as additional training data for NeRF.
The quality of image generation and manipulation is reaching impressive levels, making it increasingly difficult for a human to distinguish between what is real and what is fake.
Humans integrate multiple sensory modalities (e.g., visual and audio) to build a causal understanding of the physical world.
Contrastive representation learning has been outstandingly successful in practice.
Contrastive learning between multiple views of the data has recently achieved state-of-the-art performance in the field of self-supervised representation learning.
Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state-of-the-art performance in the unsupervised training of deep image models.
The focus of recent meta-learning research has been on the development of learning algorithms that can quickly adapt to test time tasks with limited data and low computational cost.
We present Neural MMO, a massively multiagent game environment inspired by MMOs and discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.
In this paper, we tackle the generalization problem via fast adaptation, where we train a prediction model to quickly adapt to the observed visual dynamics of a novel object.
Such models, however, are approximate, which limits their applicability.
We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability, aesthetics, and emotional valence.
We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics.
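The contrastive loss referred to here is typically an InfoNCE-style objective: embeddings of two views of the same scene are pulled together, while views of other scenes in the batch act as negatives. A minimal numpy sketch of that idea (not the authors' implementation; the function name and temperature value are illustrative):

```python
import numpy as np

def info_nce(anchor, positives, temperature=0.1):
    """Toy InfoNCE contrastive loss: row i of `anchor` should match
    row i of `positives`; all other rows serve as negatives."""
    # L2-normalize embeddings so dot products are cosine similarities.
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Cross-entropy with the diagonal (matching pairs) as targets.
    return -np.log(np.diag(probs)).mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
loss_random = info_nce(z, rng.normal(size=(8, 16)))
assert loss_aligned < loss_random  # matched views incur lower loss
```

Adding more views, as the snippet above describes, amounts to summing this loss over every pair of views of the same scene.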
The emergence of complex life on Earth is often attributed to the arms race that ensued from a huge number of organisms all competing for finite resources.
In this paper we propose an "Internal GAN" (InGAN) - an image-specific GAN - which trains on a single input image and learns its internal distribution of patches.
We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics.
Domain adaptation is critical for success in new, unseen environments.
The main strengths of our approach are its robustness to freehand bitmap drawings, its ability to adapt to different object categories, and the continuum it offers between single-view and multi-view sketch-based modeling.
The system directly maps a grayscale image, along with sparse, local user "hints" to an output colorization with a Convolutional Neural Network (CNN).
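Concretely, such a system stacks the grayscale channel with sparse color-hint planes and a mask marking where hints were given, and feeds the result to the CNN. A hypothetical sketch of assembling that input (function name and channel layout are assumptions, not the paper's exact format):

```python
import numpy as np

def make_colorization_input(gray, hints):
    """Build the CNN input for hint-based colorization.
    `gray`: (H, W) lightness channel; `hints`: list of (y, x, a, b)
    tuples giving sparse user-chosen ab color values."""
    h, w = gray.shape
    ab = np.zeros((2, h, w), dtype=np.float32)    # sparse ab hint planes
    mask = np.zeros((1, h, w), dtype=np.float32)  # 1 where a hint exists
    for y, x, a, b in hints:
        ab[0, y, x], ab[1, y, x] = a, b
        mask[0, y, x] = 1.0
    # Stack into a 4-channel tensor: [L, a_hint, b_hint, hint_mask].
    return np.concatenate([gray[None].astype(np.float32), ab, mask], axis=0)

x = make_colorization_input(np.zeros((4, 4)), [(1, 2, 0.3, -0.5)])
assert x.shape == (4, 4, 4) and x[3, 1, 2] == 1.0
```

The mask channel lets the network distinguish "no hint here" from "hint with value zero".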
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs.
Manipulation of deformable objects, such as ropes and cloth, is an important but challenging problem in robotics.
We propose split-brain autoencoders, a straightforward modification of the traditional autoencoder architecture, for unsupervised representation learning.
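The core mechanism is cross-channel prediction: the input channels are split into two disjoint halves, and each sub-network predicts the opposite half. A toy linear sketch of that training signal (purely illustrative; the actual model uses deep convolutional branches):

```python
import numpy as np

def split_brain_step(x, w1, w2, lr=0.1):
    """One gradient step of a toy split-brain setup: each linear
    branch predicts the other half of the input channels."""
    d = x.shape[1] // 2
    a, b = x[:, :d], x[:, d:]
    err1 = a @ w1 - b          # branch 1: predict b from a
    err2 = b @ w2 - a          # branch 2: predict a from b
    w1 -= lr * a.T @ err1 / len(x)
    w2 -= lr * b.T @ err2 / len(x)
    return (err1 ** 2).mean() + (err2 ** 2).mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))
x[:, 2:] = x[:, :2] * 0.5      # make the two halves correlated
w1, w2 = np.zeros((2, 2)), np.zeros((2, 2))
losses = [split_brain_step(x, w1, w2) for _ in range(50)]
assert losses[-1] < losses[0]  # both branches learn the cross-channel map
```

Because each branch must explain the other's channels, the concatenated features capture structure shared across channels, which is what makes them useful as a representation.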
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems.
We embrace the underlying uncertainty of the problem by posing it as a classification task and use class-rebalancing at training time to increase the diversity of colors in the result.
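Class rebalancing here means reweighting the per-pixel classification loss so that rare, saturated color bins are not drowned out by common desaturated ones. A sketch of one way to compute such weights (smoothing constant and normalization are assumptions in the spirit of the approach, not the paper's exact recipe):

```python
import numpy as np

def rebalancing_weights(class_freq, lam=0.5):
    """Inverse-frequency loss weights for colorization-as-classification:
    rarer color bins get larger weights. `lam` blends the empirical bin
    distribution with a uniform one before inverting, and the result is
    normalized so the expected weight under the data distribution is 1."""
    q = len(class_freq)
    p = class_freq / class_freq.sum()       # empirical bin distribution
    w = 1.0 / ((1 - lam) * p + lam / q)     # smoothed inverse frequency
    w /= (p * w).sum()                      # enforce E_p[w] = 1
    return w

freq = np.array([100.0, 10.0, 1.0])         # common -> rare color bins
w = rebalancing_weights(freq)
assert w[2] > w[1] > w[0]                   # rarer bins weighted more
```

These weights then multiply the cross-entropy term for each pixel's ground-truth bin, pushing the model toward more diverse, vivid colorizations.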
We demonstrate that this framework works well on two important mid-level vision tasks: intrinsic image decomposition and depth from an RGB image.
We propose a self-supervised framework that learns to group visual entities based on their rate of co-occurrence in space and time.
Our system works by generalizing across object classes: states and transformations learned on one set of objects are used to interpret the image collection for an entirely new object class.
In this paper, we study the problem of reproducing the world lighting from a single image of an object covered with random specular microfacets on the surface.