Performance of Bayesian linear regression in a model with mismatch

14 Jul 2021  ·  Jean Barbier, Wei-Kuo Chen, Dmitry Panchenko, Manuel Sáenz ·

In this paper we analyze, for a model of linear regression with gaussian covariates, the performance of a Bayesian estimator given by the mean of a log-concave posterior distribution with gaussian prior, in the high-dimensional limit where the number of samples and the covariates' dimension are large and proportional. Although the high-dimensional analysis of Bayesian estimators has been previously studied for Bayesian-optimal linear regression where the correct posterior is used for inference, much less is known when there is a mismatch. Here we consider a model in which the responses are corrupted by gaussian noise and are known to be generated as linear combinations of the covariates, but the distributions of the ground-truth regression coefficients and of the noise are unknown. This regression task can be rephrased as a statistical mechanics model known as the Gardner spin glass, an analogy which we exploit. Using a leave-one-out approach we characterize the mean-square error for the regression coefficients. We also derive the log-normalizing constant of the posterior. Similar models have been studied by Shcherbina and Tirozzi and by Talagrand, but our arguments are much more straightforward. An interesting consequence of our analysis is that in the quadratic loss case, the performance of the Bayesian estimator is independent of a global "temperature" hyperparameter and matches the ridge estimator: sampling and optimizing are equally good.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods