no code implementations • 9 Jun 2023 • Reagan Mozer, Aaron R. Kaufman, Leo A. Celi, Luke Miratrix
We augment a classic matching analysis by incorporating text in three ways: by using text to supplement a multiple imputation procedure, we improve the fidelity of imputed values to handle missing data; by incorporating text in the matching stage, we strengthen the plausibility of the matching procedure; and by conditioning on text, we can estimate easily interpretable text-based heterogeneous treatment effects that may be stronger than those found across categories of structured covariates.
no code implementations • 22 Jan 2021 • Devin Caughey, Allan Dafoe, Xinran Li, Luke Miratrix
We show that exact confidence intervals for the maximum (or minimum) unit-level effect can be obtained by inverting tests for a sequence of bounded nulls.
Methodology Statistics Theory
2 code implementations • 13 Feb 2020 • Luke Miratrix
We are sometimes forced to use the Interrupted Time Series (ITS) design as an identification strategy for potential policy change, such as when we only have a single treated unit and no comparable controls.
Methodology Applications
1 code implementation • 2 Jan 2018 • Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman, L. Jason Anastasopoulos
We enhance the precision of these results by developing a predictive model to estimate the match quality of pairs of text documents as a function of our various distance scores.
no code implementations • 12 Jan 2017 • Angela Fan, Finale Doshi-Velez, Luke Miratrix
In this work, we first show how the standard topic quality measures of coherence and pointwise mutual information act counter-intuitively in the presence of common but irrelevant words, making it difficult to even quantitatively identify situations in which topics may be dominated by stopwords.
no code implementations • 20 Nov 2015 • Luke Miratrix, Robin Ackerman
We propose a general framework for topic-specific summarization of large text corpora, and illustrate how it can be used for analysis in two quite different contexts: an OSHA database of fatality and catastrophe reports (to facilitate surveillance for patterns in circumstances leading to injury or death) and legal decisions on workers' compensation claims (to explore relevant case law).
no code implementations • 29 Apr 2014 • Jinzhu Jia, Luke Miratrix, Bin Yu, Brian Gawalt, Laurent El Ghaoui, Luke Barnesmoore, Sophie Clavier
In this paper we propose a general framework for topic-specific summarization of large text corpora and illustrate how it can be used for the analysis of news databases.