Sparse Quadratic Logistic Regression in Sub-quadratic Time

We consider support recovery in the quadratic logistic regression setting - where the target depends on both p linear terms $x_i$ and up to $p^2$ quadratic terms $x_i x_j$. Quadratic terms enable prediction/modeling of higher-order effects between features and the target, but when incorporated naively may involve solving a very large regression problem. We consider the sparse case, where at most $s$ terms (linear or quadratic) are non-zero, and provide a new faster algorithm. It involves (a) identifying the weak support (i.e. all relevant variables) and (b) standard logistic regression optimization only on these chosen variables. The first step relies on a novel insight about correlation tests in the presence of non-linearity, and takes $O(pn)$ time for $n$ samples - giving potentially huge computational gains over the naive approach. Motivated by insights from the boolean case, we propose a non-linear correlation test for non-binary finite support case that involves hashing a variable and then correlating with the output variable. We also provide experimental results to demonstrate the effectiveness of our methods.

Results in Papers With Code
(↓ scroll down to see all results)