Our algorithm, Search on the Replay Buffer (SoRB), enables agents to solve sparse-reward tasks spanning hundreds of steps, and generalizes substantially better than standard RL algorithms.
We present a general framework for solving a large class of learning problems with non-linear functions of classification rates.
We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks.
From this observation, we propose extended rank and sort operators by considering optimal transport (OT) problems (the natural relaxation for assignments) where the auxiliary measure can be any weighted measure supported on $m$ increasing values, where $m \ne n$.
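As a rough illustration of this relaxation, the sketch below solves an entropic OT problem between the $n$ input values and $m$ increasing targets via Sinkhorn iterations, then reads off soft ranks and soft-sorted values from the transport plan. The uniform target measure on `linspace(0, 1, m)`, the regularization `eps`, and the helper names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sinkhorn_plan(x, y, a, b, eps=0.1, iters=200):
    # Entropic OT between values x (weights a) and targets y (weights b).
    C = (x[:, None] - y[None, :]) ** 2        # squared-distance cost matrix
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):                    # Sinkhorn fixed-point updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]        # transport plan T (n x m)

def soft_ops(x, m=None, eps=0.1):
    # Relaxed rank and sort via OT; m may differ from n, as in the text.
    n = len(x)
    m = m or n
    y = np.linspace(0.0, 1.0, m)              # m increasing target values (assumed)
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    T = sinkhorn_plan(x, y, a, b, eps)
    soft_rank = (T @ np.cumsum(b)) / a        # barycentric rank estimate in (0, 1]
    soft_sort = (T.T @ x) / b                 # m soft-sorted values
    return soft_rank, soft_sort
```

Because every step is a smooth function of `x`, both outputs are differentiable, which is the point of the relaxation; setting `m < n` yields a quantile-like summary rather than a full sort.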
While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting.
The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning.
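To make the estimation problem concrete, here is a deliberately naive plug-in estimator: discretize both samples on a shared grid, estimate the densities by histograms, and average the f-divergence generator under one of them. The bin count, the smoothing constant, and the function names are illustrative assumptions; practical estimators are considerably more sophisticated.

```python
import numpy as np

def f_kl(t):
    # Generator of the KL divergence: f(t) = t * log(t).
    return t * np.log(t)

def plugin_f_divergence(p_samples, q_samples, f, bins=20):
    # Naive plug-in estimate of D_f(P || Q) = E_Q[ f(p/q) ] from samples.
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    edges = np.linspace(lo, hi, bins + 1)     # shared histogram grid
    p, _ = np.histogram(p_samples, bins=edges)
    q, _ = np.histogram(q_samples, bins=edges)
    p = (p + 1e-9) / (p + 1e-9).sum()         # smooth to avoid empty bins
    q = (q + 1e-9) / (q + 1e-9).sum()
    return np.sum(q * f(p / q))
```

With `f_kl` this reduces to the familiar $\sum_i p_i \log(p_i / q_i)$; swapping in another convex generator with $f(1) = 0$ (e.g. $f(t) = (\sqrt{t} - 1)^2$ for squared Hellinger) estimates a different member of the family.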
We introduce a similarity index that measures the relationship between representational similarity matrices and does not suffer from this limitation.
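One widely used index of this kind is linear centered kernel alignment (CKA), which compares the $n \times n$ Gram matrices of two representations; treating it as the index the sentence refers to is an assumption. A minimal sketch:

```python
import numpy as np

def linear_cka(X, Y):
    # Linear CKA between representations X, Y (n examples x features).
    X = X - X.mean(axis=0)                    # center each feature
    Y = Y - Y.mean(axis=0)
    # Equivalent to comparing the centered Gram matrices XX^T and YY^T.
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

The index lies in [0, 1], equals 1 when the two representations agree up to an orthogonal transform and isotropic scaling, and is well defined even when the two feature dimensions differ.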