This paper proposes a class of neural ordinary differential equations parametrized by provably input-to-state stable continuous-time recurrent neural networks.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm.
We analyze the best achievable performance of Bayesian learning under generative models by defining and upper-bounding the minimum excess risk (MER): the gap between the minimum expected loss attainable by learning from data and the minimum expected loss that could be achieved if the model realization were known.
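The definition of the MER can be written out as a difference of two optimal risks. The notation below is an assumption made for illustration and does not appear in the sentence above: $W$ denotes the model realization, $Z^n$ the training data, $(X, Y)$ a fresh input-output pair, $\ell$ the loss, and $\psi$, $\phi$ range over measurable decision rules.

```latex
\mathrm{MER}_n
  \;=\;
  \underbrace{\inf_{\psi}\, \mathbb{E}\big[\ell\big(Y, \psi(X, Z^n)\big)\big]}_{\text{best risk when learning from data}}
  \;-\;
  \underbrace{\inf_{\phi}\, \mathbb{E}\big[\ell\big(Y, \phi(X, W)\big)\big]}_{\text{best risk when } W \text{ is known}}
```

Under this reading the first term is the minimum expected loss attainable from the data alone and the second is the minimum expected loss when the model realization is revealed to the learner.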
We consider the following learning problem: given sample pairs of input and output signals generated by an unknown nonlinear system (which is not assumed to be causal or time-invariant), we wish to find a continuous-time recurrent neural net with hyperbolic tangent activation function that approximately reproduces the underlying input-output behavior with high confidence.
We first consider the mean-field limit, where the finite population of neurons in the hidden layer is replaced by a continual ensemble, and show that our problem can be phrased as global minimization of a free-energy functional on the space of probability measures over the weights.
In the second step, CI testing is performed by applying the $k$-NN conditional mutual information estimator to the learned feature maps.
There has been a recent shift in sequence-to-sequence modeling from recurrent network architectures to convolutional network architectures due to computational advantages in training and operation while still achieving competitive performance.
In deep latent Gaussian models, the latent variable is generated by a time-inhomogeneous Markov chain, where at each time step we pass the current state through a parametric nonlinear map, such as a feedforward neural net, and add a small independent Gaussian perturbation.
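The generative recursion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction used in the paper: the per-step map is taken to be a single tanh layer, and the names `sample_latent_chain`, `layers`, and `noise_std` are hypothetical.

```python
import numpy as np

def sample_latent_chain(x0, layers, noise_std, rng):
    """Simulate x_{k+1} = f_k(x_k) + noise_std * z_k with z_k ~ N(0, I).

    `layers` is a list of (W, b) pairs; each pair defines the map
    f_k(x) = tanh(W x + b) used at step k (an assumed, illustrative choice).
    Using a different pair at every step makes the chain time-inhomogeneous.
    """
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for W, b in layers:
        # Pass the current state through the parametric nonlinear map ...
        mean = np.tanh(W @ x + b)
        # ... and add a small independent Gaussian perturbation.
        x = mean + noise_std * rng.standard_normal(x.shape)
        path.append(x.copy())
    return np.stack(path)

rng = np.random.default_rng(0)
dim, depth = 3, 5
layers = [(rng.standard_normal((dim, dim)), rng.standard_normal(dim))
          for _ in range(depth)]
path = sample_latent_chain(np.zeros(dim), layers, noise_std=0.05, rng=rng)
print(path.shape)
```

The last row of `path` plays the role of the latent variable; the earlier rows are the intermediate states of the chain.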
We introduce and study a class of probabilistic generative models, where the latent object is a finite-dimensional diffusion process on a finite time interval and the observed variable is drawn conditionally on the terminal point of the diffusion.
This paper generalizes the Maurer--Pontil framework of finite-dimensional lossy coding schemes to the setting where a high-dimensional random vector is mapped to an element of a compact set of latent representations in a lower-dimensional Euclidean space, and the reconstruction map belongs to a given class of nonlinear maps.
For a particular local optimum of the empirical risk, with an arbitrary initialization, we show that, with high probability, at least one of the following two events will occur: (1) the Langevin trajectory ends up somewhere outside the $\varepsilon$-neighborhood of this particular optimum within a short recurrence time; (2) it enters this $\varepsilon$-neighborhood by the recurrence time and stays there until a potentially exponentially long escape time.
As opposed to standard empirical risk minimization (ERM), distributionally robust optimization aims to minimize the worst-case risk over a larger ambiguity set containing the original empirical distribution of the training data.
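The contrast between the two objectives can be written side by side. The symbols are assumptions introduced for illustration: $\widehat{P}_n$ is the empirical distribution of the training data, $\ell(\theta; Z)$ the loss of hypothesis $\theta$ on a sample $Z$, and $\mathcal{A}(\widehat{P}_n)$ a generic ambiguity set containing $\widehat{P}_n$ (its specific form is not stated in the sentence above).

```latex
\text{ERM:}\quad
  \min_{\theta}\; \mathbb{E}_{Z \sim \widehat{P}_n}\big[\ell(\theta; Z)\big]
\qquad\qquad
\text{DRO:}\quad
  \min_{\theta}\; \sup_{Q \in \mathcal{A}(\widehat{P}_n)} \mathbb{E}_{Z \sim Q}\big[\ell(\theta; Z)\big],
  \quad \widehat{P}_n \in \mathcal{A}(\widehat{P}_n)
```

Because the ambiguity set contains $\widehat{P}_n$, the worst-case risk is never smaller than the empirical risk for any fixed $\theta$.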
We present a generic framework for trading off fidelity and cost in computing stochastic gradients when the costs of acquiring stochastic gradients of different quality are not known a priori.
Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of Stochastic Gradient Descent, where properly scaled isotropic Gaussian noise is added to an unbiased estimate of the gradient at each iteration.
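The SGLD update can be sketched as follows. This is a minimal illustration, assuming the common scaling $\sqrt{2\eta/\beta}$ for the injected noise, where $\eta$ is the step size and $\beta$ an inverse-temperature parameter; the function names and the quadratic test objective are hypothetical.

```python
import numpy as np

def sgld(grad_estimate, x0, step_size, inv_temp, n_steps, rng):
    """Run SGLD: x <- x - eta * g(x) + sqrt(2 * eta / beta) * z, z ~ N(0, I).

    `grad_estimate(x, rng)` must return an unbiased estimate of the gradient.
    """
    x = np.array(x0, dtype=float)
    noise_scale = np.sqrt(2.0 * step_size / inv_temp)
    for _ in range(n_steps):
        g = grad_estimate(x, rng)              # unbiased stochastic gradient
        z = rng.standard_normal(x.shape)       # isotropic Gaussian noise
        x = x - step_size * g + noise_scale * z
    return x

def noisy_quadratic_grad(x, rng):
    # Gradient of f(x) = 0.5 * ||x||^2 is x; zero-mean noise keeps the
    # estimate unbiased (an illustrative stand-in for minibatch noise).
    return x + 0.1 * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
x_final = sgld(noisy_quadratic_grad, x0=[3.0, -2.0], step_size=0.1,
               inv_temp=1e4, n_steps=500, rng=rng)
print(np.linalg.norm(x_final))
```

With a large inverse temperature the injected noise is small and the iterates concentrate near the minimizer of the toy objective; lowering `inv_temp` makes the trajectory explore more widely.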
In ODA-C, to mitigate the disagreements on the primal-vector updates, the agents implement a generalization of the local information-exchange dynamics recently proposed by Li and Marden over a static undirected graph.
This paper considers an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space.
Online learning algorithms are designed to perform in non-stationary environments, but generally there is no notion of a dynamic state to model constraints on current and future actions as a function of past actions.
We study a model of collective real-time decision-making (or learning) in a social network operating in an uncertain environment, for which no a priori probabilistic model is available.
This monograph focuses on some of the key modern mathematical tools that are used for the derivation of concentration inequalities, on their links to information theory, and on their various applications to communications and coding.
This paper addresses the problem of designing binary codes for high-dimensional data such that vectors that are similar in the original space map to similar binary strings.
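To make the goal concrete, here is a minimal sketch of one well-known similarity-preserving scheme, sign random projections (random-hyperplane hashing). It is a swapped-in illustration and is not claimed to be the construction proposed in the paper; `make_hasher` and `hamming` are hypothetical helper names.

```python
import numpy as np

def make_hasher(dim, n_bits, rng):
    """Return an encoder mapping a vector in R^dim to an n_bits binary code."""
    # Each row of W is the normal vector of a random hyperplane through 0.
    W = rng.standard_normal((n_bits, dim))

    def encode(x):
        # Bit i records on which side of hyperplane i the vector falls.
        return (W @ np.asarray(x, dtype=float) > 0).astype(np.uint8)

    return encode

def hamming(a, b):
    """Number of positions in which two binary codes differ."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
encode = make_hasher(dim=16, n_bits=64, rng=rng)
x = rng.standard_normal(16)
near = x + 0.01 * rng.standard_normal(16)   # small perturbation of x
far = -x                                     # opposite direction

print(hamming(encode(x), encode(near)), hamming(encode(x), encode(far)))
```

Vectors separated by a small angle disagree on few bits, while vectors pointing in opposite directions disagree on essentially all of them, so Hamming distance between codes tracks angular similarity in the original space.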
This paper describes a recursive estimation procedure for multivariate binary densities using orthogonal expansions.