Covariate powered cross-weighted multiple testing

18 Jan 2017  ·  Nikolaos Ignatiadis, Wolfgang Huber ·

A fundamental task in the analysis of datasets with many variables is screening for associations. This can be cast as a multiple testing task, where the objective is achieving high detection power while controlling type I error. We consider $m$ hypothesis tests represented by pairs $((P_i, X_i))_{1\leq i \leq m}$ of p-values $P_i$ and covariates $X_i$, such that $P_i \perp X_i$ if $H_i$ is null. Here, we show how to use information potentially available in the covariates about heterogeneities among hypotheses to increase power compared to conventional procedures that only use the $P_i$. To this end, we upgrade existing weighted multiple testing procedures through the Independent Hypothesis Weighting (IHW) framework to use data-driven weights that are calculated as a function of the covariates. Finite sample guarantees, e.g., false discovery rate (FDR) control, are derived from cross-weighting, a data-splitting approach that enables learning the weight-covariate function without overfitting as long as the hypotheses can be partitioned into independent folds, with arbitrary within-fold dependence. IHW has increased power compared to methods that do not use covariate information. A key implication of IHW is that hypothesis rejection in common multiple testing setups should not proceed according to the ranking of the p-values, but by an alternative ranking implied by the covariate-weighted p-values.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper