Scaling up deep Reinforcement Learning (RL) methods presents a significant challenge.

Thermodynamic integration (TI) offers a rigorous method for estimating free-energy differences by integrating over a sequence of interpolating conformational ensembles.

Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model.

For this reason, we propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches.

Autoregressive models, such as the GPT family, use a fixed order, usually left-to-right, to generate sequences.

In particular, we propose to approximate a time-dependent operator $\mathcal V_t$ whose time integral provides a mapping between the functional distributions of the free theory $[\mathcal D\phi(x)] \mathcal Z_0^{-1} e^{-\mathcal S_{0}[\phi(x)]}$ and of the target theory $[\mathcal D\phi(x)]\mathcal Z^{-1}e^{-\mathcal S[\phi(x)]}$.

We study the problem of improving the efficiency of segmentation transformers by using disparate amounts of computation for different parts of the image.

While many works have proposed schemes to sparsify the attention patterns and reduce the computational overhead of self-attention, those are often limited by implementation concerns and end up imposing a simple and static structure over the attention matrix.

We present the Graph Forward-Forward (GFF) algorithm, an extension of the Forward-Forward procedure to graphs, able to handle features distributed over a graph's nodes.

We introduce a training objective for continuous normalizing flows that can be used in the absence of samples but in the presence of an energy function.

We present ESLAM, an efficient implicit neural representation method for Simultaneous Localization and Mapping (SLAM).

Consider a one-parameter family of Boltzmann distributions $p_t(x) = \tfrac{1}{Z_t}e^{-S_t(x)}$.
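
As a short worked identity connecting this family to thermodynamic integration, the derivation below assumes that $S_t$ is differentiable in $t$ and that differentiation and integration can be exchanged; the interval $[0, 1]$ is an illustrative choice, not taken from the text above.

$$
\begin{aligned}
Z_t &= \int e^{-S_t(x)}\,dx, \\
\frac{d}{dt}\log Z_t &= \frac{1}{Z_t}\int \bigl(-\partial_t S_t(x)\bigr)\, e^{-S_t(x)}\,dx = -\,\mathbb{E}_{x\sim p_t}\bigl[\partial_t S_t(x)\bigr], \\
\log\frac{Z_1}{Z_0} &= -\int_0^1 \mathbb{E}_{x\sim p_t}\bigl[\partial_t S_t(x)\bigr]\,dt .
\end{aligned}
$$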

Notably, our model reduces the root mean square error (RMSE) for wind nowcasting from 9.24 to 7.98 and for heat diffusion tasks from 0.126 to 0.084.

In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution superior to all its single-task trained counterparts.

Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems.

On the other hand, neural networks only perform a forward pass on the input; there is neither a notion of an inverse of a neural network nor one of its likelihood contribution.

Over the decade since deep neural networks became state-of-the-art image classifiers, there has been a tendency towards less use of max pooling: the function that takes the largest of nearby pixels in an image.
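
To make the operation concrete, here is a minimal NumPy sketch of max pooling on a single-channel image. The function name `max_pool_2d`, the non-overlapping windows, and the window size `k` are illustrative assumptions; practical layers usually also handle channels, strides, and padding.

```python
import numpy as np

def max_pool_2d(image: np.ndarray, k: int = 2) -> np.ndarray:
    """Non-overlapping k x k max pooling on a 2D array (stride equals k)."""
    h, w = image.shape
    h_out, w_out = h // k, w // k
    # Drop trailing rows/columns that do not fill a complete window.
    trimmed = image[: h_out * k, : w_out * k]
    # Give each k x k window its own pair of axes, then reduce over them.
    windows = trimmed.reshape(h_out, k, w_out, k)
    return windows.max(axis=(1, 3))

image = np.array([[1, 3, 2, 0],
                  [4, 2, 1, 5],
                  [0, 1, 9, 8],
                  [7, 6, 3, 2]])
pooled = max_pool_2d(image)
print(pooled)  # [[4 5]
               #  [7 9]]
```

Each output entry is the largest value in the corresponding 2 x 2 block of the input.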

With this in mind, we hosted the third edition of the MineRL ObtainDiamond competition, MineRL Diamond 2021, with a separate track in which we permitted any solution to promote the participation of newcomers.

Semantic segmentation is a well-addressed topic in the computer vision literature, but the design of fast and accurate video processing networks remains challenging.

This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features -- present in the training data but absent from the test data -- and (ii) only leveraging a small subset of predictive features.

This paper proposes a simple yet efficient high-altitude wind nowcasting pipeline.

To render a novel view, the geometry reasoner first constructs cascaded cost volumes for each nearby source view.

Data samples generated by several real-world processes are dynamic in nature, \textit{i.e.}, their characteristics vary with time.

This algorithm first runs any approximate-PCA method to get an initial estimate of the principal components (priming), and then applies an exact PCA in the subspace they span.
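
The two stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the priming step is a Gaussian random projection of the centered data, chosen here only as a stand-in for "any approximate-PCA method", and the names `primed_pca` and `n_primers` are hypothetical.

```python
import numpy as np

def primed_pca(X, n_components, n_primers, seed=0):
    """Approximate priming followed by exact PCA in the primed subspace."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)  # center the data
    n = Xc.shape[0]

    # Priming (assumption): random combinations of the centered samples
    # serve as rough estimates of the principal directions.
    primers = rng.standard_normal((n_primers, n)) @ Xc  # (n_primers, d)

    # Orthonormal basis of the subspace spanned by the primers.
    Q, _ = np.linalg.qr(primers.T)  # (d, n_primers)

    # Exact PCA restricted to that subspace.
    Z = Xc @ Q  # data coordinates in the subspace
    cov = Z.T @ Z / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]

    # Map the subspace eigenvectors back to the original space.
    components = (Q @ eigvecs[:, order]).T  # (n_components, d)
    return components, eigvals[order]

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 10))  # rank-3 data
components, variances = primed_pca(X, n_components=3, n_primers=5)
```

The exact step only has to diagonalize an `n_primers` x `n_primers` matrix, so its cost depends on the size of the primed subspace rather than on the full dimension.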

Pretrained language models demonstrate strong performance in most NLP tasks when fine-tuned on small task-specific datasets.

Attention is a key component of the now ubiquitous pre-trained language models.

Following the work of arXiv:2101.09512, we are interested in clustering a given multi-variate series in an unsupervised manner.

We are interested in clustering parts of a given single multi-variate series in an unsupervised manner.

To apply an algorithm in a sensitive domain it is important to understand the set of input values that result in specific decisions.

Recent advances in language modeling have led to computationally intensive and resource-demanding state-of-the-art models.

This results in a model with linear complexity with respect to the sequence length for a fixed number of clusters.

Transformers achieve remarkable performance in several tasks, but due to their quadratic complexity with respect to the input's length, they are prohibitively slow for very long sequences.

We introduce a new reinforcement learning approach combining a planning quasi-metric (PQM) that estimates the number of steps required to go from any state to another, with task-specific "aimers" that compute a target state to reach a given goal.

The performance of optimizers, particularly in deep learning, depends considerably on their chosen hyperparameter configuration.

We show that sampling from the attention distribution results in an unbiased estimator of the full model with minimal variance, and we derive an unbiased estimator of the gradient that we use to train our model end-to-end with a normal SGD procedure.
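
The unbiasedness claim can be illustrated with a toy single-query attention, assuming a softmax attention distribution over a handful of items. The functions `full_attention` and `sampled_attention` are hypothetical names for this sketch; it does not reproduce the model or the gradient estimator described above, and it does not demonstrate the minimal-variance property.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())  # subtract the max for numerical stability
    return z / z.sum()

def full_attention(query, keys, values):
    """Exact attention output: the expectation of values under the weights."""
    weights = softmax(keys @ query)
    return weights @ values

def sampled_attention(query, keys, values, n_samples, rng):
    """Monte Carlo estimate: average the values at indices drawn from the weights."""
    weights = softmax(keys @ query)
    idx = rng.choice(len(weights), size=n_samples, p=weights)
    return values[idx].mean(axis=0)

rng = np.random.default_rng(0)
keys = rng.standard_normal((6, 4))
values = rng.standard_normal((6, 3))
query = rng.standard_normal(4)

exact = full_attention(query, keys, values)
estimate = sampled_attention(query, keys, values, n_samples=200_000, rng=rng)
```

Since each index `i` is drawn with probability `weights[i]`, the expected value of a sampled `values[i]` equals `weights @ values`, so the sample mean approaches the exact output as `n_samples` grows.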

We study the effect of the stochastic gradient noise on the training of generative adversarial networks (GANs) and show that it can prevent the convergence of standard game optimization methods, while the batch version converges.

Deep neural network training spends most of the computation on examples that are already properly handled and could be ignored.

Generative Adversarial Networks (GANs) have demonstrated impressive performance for data synthesis, and are now used in a wide range of computer vision tasks.

People detection methods are highly sensitive to the perpetual occlusions among the targets.

Importance sampling has been successfully used to accelerate stochastic optimization in many convex problems.

People detection in single 2D images has improved greatly in recent years.

The former does not exploit joint information, whereas the latter deals with ambiguous input due to the foreground blobs becoming more and more interconnected as the number of targets increases.

We present a unified framework for understanding human social behaviors in raw image sequences.

Mean Field inference is central to statistical physics.

We investigate how a residual network can learn to predict the dynamics of interacting shapes purely as an image-to-image regression task.

We run experiments showing that the clarans algorithm (Ng et al., 2005) finds better K-medoids solutions than the Voronoi iteration algorithm.

We present a new algorithm, trimed, for obtaining the medoid of a set, that is, the element of the set which minimises the mean distance to all other elements.

A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003).

We propose a novel accelerated exact k-means algorithm, which performs better than the current state-of-the-art low-dimensional algorithm in 18 of 22 experiments, running up to 3 times faster.

Secondly, we use the proximal framework to derive efficient variational algorithms for non-conjugate models.

Mean-field variational inference is one of the most popular approaches to inference in discrete random fields.

Mean-Field is an efficient way to approximate a posterior distribution in complex graphical models and constitutes the most popular class of Bayesian variational approximation methods.

We propose to train an ensemble with the help of a reservoir in which the learning algorithm can store a limited number of samples.

