Due to the convexity of the information measures, the proposed bounds in terms of Wasserstein distance and total variation distance are shown to be tighter than their counterparts based on individual samples in the literature.
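The convexity argument can be made concrete with a schematic inequality (the generic distributions $P_i$, $Q_i$ are assumed here for illustration; this is not the exact bound from the analysis): for a jointly convex divergence $D(\cdot,\cdot)$, such as the total variation distance or the Wasserstein distance, Jensen's inequality gives
$$ D\Big(\frac{1}{n}\sum_{i=1}^{n} P_i,\; \frac{1}{n}\sum_{i=1}^{n} Q_i\Big) \;\le\; \frac{1}{n}\sum_{i=1}^{n} D(P_i, Q_i), $$
so a bound stated through the mixture distributions on the left-hand side can only be smaller than its individual-sample counterpart on the right-hand side.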
Due to privacy, storage, and other constraints, there is a growing need for unsupervised domain adaptation techniques in machine learning that do not require access to the data used to train a collection of source models.
Various approaches have been developed to upper bound the generalization error of a supervised learning algorithm.
We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms by focusing on two popular transfer learning approaches, $\alpha$-weighted-ERM and two-stage-ERM.
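For concreteness, a standard formulation of these two approaches is as follows (the notation $S$, $S'$, $\ell$, $\pi$, $\gamma$ is assumed here for illustration). Given target samples $S = \{z_i\}_{i=1}^{n}$, source samples $S' = \{z'_j\}_{j=1}^{m}$, and a weight $\alpha \in [0,1]$, the $\alpha$-weighted empirical risk is
$$ \widehat{R}_\alpha(w) \;=\; \frac{\alpha}{n}\sum_{i=1}^{n} \ell(w, z_i) \;+\; \frac{1-\alpha}{m}\sum_{j=1}^{m} \ell(w, z'_j), $$
and the associated Gibbs algorithm draws the output hypothesis $W$ from the posterior $P_{W \mid S, S'}(w) \propto \pi(w)\, e^{-\gamma \widehat{R}_\alpha(w)}$, where $\pi$ is a prior and $\gamma > 0$ is the inverse temperature; two-stage-ERM instead first trains on the source samples and then refines the resulting model on the target samples.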
Selective regression allows abstention from prediction when the confidence in making an accurate prediction is insufficient.
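In the standard selective-prediction formulation (the selection function $g$ and threshold $\tau$ are assumed here for illustration), a selective regressor pairs a predictor $f$ with a selection rule:
$$ (f, g)(x) \;=\; \begin{cases} f(x), & g(x) \ge \tau, \\ \text{abstain}, & \text{otherwise}, \end{cases} $$
so that predictions are issued only on inputs where the confidence score $g(x)$ clears the threshold.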
As a result, they may fail to characterize the exact generalization ability of a learning algorithm.
As machine learning algorithms grow in popularity and diversify to many industries, ethical and legal concerns regarding their fairness have become increasingly relevant.
A previously introduced framework for solving a sequence of stochastic optimization problems with bounded changes in the minimizers is extended and applied to machine learning problems such as regression and classification.
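Schematically, and with notation assumed here for illustration, such a sequence of problems can be written as $\min_{w} F_t(w) = \mathbb{E}_{Z \sim \mu_t}[\ell(w, Z)]$ for $t = 1, 2, \dots$, with minimizers $w_t^\ast$ satisfying a drift condition of the form $\|w_t^\ast - w_{t-1}^\ast\| \le \Delta$ for some bound $\Delta$; regression and classification are recovered by taking $\ell$ to be, e.g., the squared loss or a classification surrogate loss.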
We show that model compression can improve the population risk of a pre-trained model by studying the tradeoff between the decrease in the generalization error and the increase in the empirical risk induced by compression.
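The underlying decomposition is schematic but standard (notation assumed here for illustration): writing the population risk of a hypothesis $w$ as $L_\mu(w) = L_S(w) + \mathrm{gen}(w)$, where $L_S$ is the empirical risk on the training set $S$ and $\mathrm{gen}(w) := L_\mu(w) - L_S(w)$ is the generalization error, compressing a pre-trained model $w$ into $\hat{w}$ improves the population risk whenever the reduction in $\mathrm{gen}$ outweighs the increase in $L_S$.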
The bound is derived under more general conditions on the loss function than in existing studies; nevertheless, it provides a tighter characterization of the generalization error.
While recommendation systems generally observe user behavior passively, there has been an increased interest in directly querying users to learn their specific preferences.
The goal is to detect whether the change in the model is significant, i.e., whether the difference between the pre-change parameter $\theta$ and the post-change parameter $\theta'$, measured by $\|\theta-\theta'\|_2$, is larger than a pre-determined threshold $\rho$.
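Stated schematically as a hypothesis testing problem, the detection task is to decide between
$$ H_0: \|\theta - \theta'\|_2 \le \rho \qquad \text{versus} \qquad H_1: \|\theta - \theta'\|_2 > \rho. $$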
Furthermore, an estimator of the change in the learning problems is constructed from the active learning samples; it yields an adaptive sample-size selection rule that guarantees the excess risk remains bounded for a sufficiently large number of time steps.
A sequence is considered outlying if the observations therein are generated by a distribution different from the one generating the observations in the majority of the sequences.
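Schematically, and with notation assumed here for illustration: given $M$ observed sequences in which the typical observations are i.i.d. from a distribution $\pi$, a sequence is an outlier if its observations are instead generated i.i.d. from some $\mu \neq \pi$, and the task is to identify the (small) subset of outlying sequences.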