Stochastic Approximation EM for Logistic Regression with Missing Values

11 May 2018  ·  Wei Jiang, Julie Josse, Marc Lavielle, Traumabase group

Logistic regression is a common classification method in supervised learning. Surprisingly, there are very few solutions for fitting it and selecting variables in the presence of missing values. We propose a stochastic approximation version of the EM algorithm, based on Metropolis-Hastings sampling, to perform statistical inference for logistic regression with incomplete data. The approach is complete: it covers estimation of the parameters and their variance, derivation of confidence intervals, a model selection procedure, and a method for prediction on test sets with missing values. The method is computationally efficient, and its good coverage and variable selection properties are demonstrated in a simulation study. We then illustrate the method on a dataset of polytraumatized patients from Paris hospitals to predict the occurrence of hemorrhagic shock, a leading cause of early preventable death in severe trauma cases. The aim is to consolidate the current red flag procedure, a binary alert identifying patients with a high risk of severe hemorrhage. The methodology is implemented in the R package misaem.
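
To make the sampling step more concrete, here is a minimal Python sketch of a Metropolis-Hastings draw of the missing covariates of one observation. It is not the misaem implementation. It assumes, beyond what the abstract states, that the covariates are jointly Gaussian with known mean `mu` and covariance `sigma`, and it uses the conditional law of the missing entries given the observed ones as an independence proposal, so that the acceptance ratio reduces to a ratio of logistic likelihoods. All names (`mh_impute_row`, `beta0`, `beta`, `n_steps`) and the numbers in the usage lines are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def mh_impute_row(x, y, miss, beta0, beta, mu, sigma, n_steps, rng):
    """Draw the missing entries of one covariate row x given its observed
    entries and the binary response y (illustrative sketch only)."""
    x = x.copy()
    if not miss.any():
        return x
    obs = ~miss

    # Assumption: x ~ N(mu, sigma), so x_mis | x_obs is Gaussian.
    s_mo = sigma[np.ix_(miss, obs)]
    s_oo = sigma[np.ix_(obs, obs)]
    s_mm = sigma[np.ix_(miss, miss)]
    w = np.linalg.solve(s_oo, s_mo.T).T          # s_mo @ inv(s_oo)
    cond_mean = mu[miss] + w @ (x[obs] - mu[obs])
    cond_cov = s_mm - w @ s_mo.T

    def lik(row):
        # Logistic likelihood of the observed response y for a completed row.
        p = sigmoid(beta0 + row @ beta)
        return p if y == 1 else 1.0 - p

    # Initialise the chain with one draw from the proposal.
    x[miss] = rng.multivariate_normal(cond_mean, cond_cov)
    for _ in range(n_steps):
        cand = x.copy()
        cand[miss] = rng.multivariate_normal(cond_mean, cond_cov)
        # Independence proposal equal to p(x_mis | x_obs): the proposal
        # densities cancel and only the ratio of p(y | x) remains.
        if rng.uniform() < lik(cand) / lik(x):
            x = cand
    return x


# Hypothetical usage: three covariates, the second one missing.
rng = np.random.default_rng(0)
x_row = np.array([0.5, np.nan, -1.0])
miss_mask = np.isnan(x_row)
mu = np.zeros(3)
sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
beta = np.array([1.0, -2.0, 0.5])
completed = mh_impute_row(x_row, 1, miss_mask, 0.0, beta, mu, sigma, 20, rng)
```

In a stochastic approximation EM loop, draws like `completed` would feed a running average of the complete-data statistics before the parameters are updated; that outer loop is omitted here.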
