no code implementations • 12 Nov 2024 • Davin Choo, Chandler Squires, Arnab Bhattacharyya, David Sontag
Motivated by this result, we introduce the notion of an $\varepsilon$-Markov blanket, give bounds on the misspecification error of using such a set for covariate adjustment, and provide an algorithm for $\varepsilon$-Markov blanket discovery; our second main result upper bounds the sample complexity of this algorithm.
1 code implementation • 12 Jul 2024 • Christina X Ji, Ahmed M Alaa, David Sontag
In this work, we construct a benchmark with different sequences of synthetic shifts to evaluate the effectiveness of 3 classes of methods that 1) learn from all data without adapting to the final period, 2) learn from historical data with no regard to the sequential nature and then adapt to the final period, and 3) leverage the sequential nature of historical data when tailoring a model to the final period.
1 code implementation • 5 Jun 2024 • Ilker Demirel, Ahmed Alaa, Anthony Philippakis, David Sontag
Causal inferences from a randomized controlled trial (RCT) may not pertain to a target population where some effect modifiers have a different distribution.
no code implementations • 25 May 2024 • Hunter Lang, David Sontag, Aravindan Vijayaraghavan
We give a new bound based on expansion properties of the data distribution and student hypothesis class that directly accounts for pseudolabel correction and coverage expansion.
1 code implementation • 3 Apr 2024 • Hussein Mozannar, Valerie Chen, Mohammed Alsobay, Subhro Das, Sebastian Zhao, Dennis Wei, Manish Nagireddy, Prasanna Sattigeri, Ameet Talwalkar, David Sontag
Evaluation of large language models for code has primarily relied on static benchmarks, including HumanEval (Chen et al., 2021), or more recently using human preferences of LLM responses.
1 code implementation • 6 Mar 2024 • Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, David Sontag
We propose a method to teach multiple large language models (LLMs) to collaborate by interleaving their generations at the token level.
1 code implementation • 29 Feb 2024 • Keying Kuang, Frances Dean, Jack B. Jedlicki, David Ouyang, Anthony Philippakis, David Sontag, Ahmed M. Alaa
We approach digital twin modeling as a composite inverse problem and observe that its structure resembles pretraining and finetuning in self-supervised learning (SSL).
no code implementations • 23 Feb 2024 • Ilker Demirel, Edward De Brouwer, Zeshan Hussain, Michael Oberst, Anthony Philippakis, David Sontag
Drawing causal inferences from observational studies (OS) requires unverifiable validity assumptions; however, one can falsify those assumptions by benchmarking the OS with experimental data from a randomized controlled trial (RCT).
1 code implementation • 23 Feb 2024 • Stefan Hegselmann, Shannon Zejiang Shen, Florian Gierse, Monica Agrawal, David Sontag, Xiaoyi Jiang
In this work, we investigate the potential of large language models to generate patient summaries based on doctors' notes and study the effect of training data on the faithfulness and quality of the generated summaries.
no code implementations • 17 Jan 2024 • Niklas Mannhardt, Elizabeth Bondi-Kelly, Barbara Lam, Hussein Mozannar, Chloe O'Connell, Mercy Asiedu, Alejandro Buendia, Tatiana Urman, Irbaz B. Riaz, Catherine E. Ricciardi, Monica Agrawal, Marzyeh Ghassemi, David Sontag
Participants (N=200, healthy, female-identifying patients) were randomly assigned three clinical notes in our tool with varying levels of augmentations and answered quantitative and qualitative questions evaluating their understanding of follow-up actions.
no code implementations • 15 Nov 2023 • Lucas Torroba Hennigen, Shannon Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim
LLMs are vulnerable to hallucinations, and thus their outputs generally require laborious human verification for high-stakes applications.
1 code implementation • NeurIPS 2023 • Hussein Mozannar, Jimin J Lee, Dennis Wei, Prasanna Sattigeri, Subhro Das, David Sontag
In this work, we propose to learn rules, grounded in data regions and described in natural language, that illustrate how the human should collaborate with the AI.
no code implementations • 9 Aug 2023 • Sharon Jiang, Shannon Shen, Monica Agrawal, Barbara Lam, Nicholas Kurtzman, Steven Horng, David Karger, David Sontag
The large amount of time clinicians spend sifting through patient notes and documenting in electronic health records (EHRs) is a leading cause of clinician burnout.
no code implementations • 26 May 2023 • Hussein Mozannar, Yuria Utsumi, Irene Y. Chen, Stephanie S. Gervasi, Michele Ewing, Aaron Smith-McLallen, David Sontag
This work presents the implementation of a real-world ML-based system to assist care managers in identifying pregnant patients at risk of complications.
1 code implementation • 8 May 2023 • Christina X Ji, Ahmed M Alaa, David Sontag
Then, we construct a meta-algorithm to perform a retrospective scan for temporal shift on a large collection of tasks.
no code implementations • 5 Apr 2023 • Zejiang Shen, Tal August, Pao Siangliulue, Kyle Lo, Jonathan Bragg, Jeff Hammerbacher, Doug Downey, Joseph Chee Chang, David Sontag
In this position paper, we argue that developing AI supports for expository writing has unique and exciting research challenges and can lead to high real-world impacts.
no code implementations • 4 Apr 2023 • Ahmed M. Alaa, Zeshan Hussain, David Sontag
We develop a predictive inference procedure that combines conformal prediction (CP) with unconditional quantile regression (QR), a commonly used tool in econometrics that involves regressing the recentered influence function (RIF) of the quantile functional over input covariates.
no code implementations • 30 Jan 2023 • Zeshan Hussain, Ming-Chieh Shih, Michael Oberst, Ilker Demirel, David Sontag
Our approach is interpretable, allowing a practitioner to visualize which subgroups in the population lead to falsification of an observational study.
1 code implementation • 15 Jan 2023 • Hussein Mozannar, Hunter Lang, Dennis Wei, Prasanna Sattigeri, Subhro Das, David Sontag
We show that prior approaches can fail to find a human-AI system with low misclassification error even when there exists a linear classifier and rejector that have zero error (the realizable setting).
1 code implementation • 19 Oct 2022 • Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, David Sontag
We study the application of large language models to zero-shot and few-shot classification of tabular data.
1 code implementation • 27 Sep 2022 • Zeshan Hussain, Michael Oberst, Ming-Chieh Shih, David Sontag
Under the assumption that at least one observational estimator is asymptotically normal and consistent for both the validation and extrapolated effects, we provide guarantees on the coverage probability of the intervals output by our algorithm.
1 code implementation • 19 Jul 2022 • Mohammad-Amin Charusaie, Hussein Mozannar, David Sontag, Samira Samadi
One of the goals of learning algorithms is to complement and reduce the burden on human decision makers.
1 code implementation • 6 Jun 2022 • Hunter Lang, Aravindan Vijayaraghavan, David Sontag
Subset selection applies to any label model and classifier and is simple to plug into existing weak supervision pipelines, requiring just a few lines of code.
1 code implementation • 31 May 2022 • Nikolaj Thams, Michael Oberst, David Sontag
We give a method for proactively identifying small, plausible shifts in distribution which lead to large differences in model performance.
no code implementations • 25 May 2022 • Monica Agrawal, Stefan Hegselmann, Hunter Lang, Yoon Kim, David Sontag
A long-running goal of the clinical NLP community is the extraction of important variables trapped in clinical notes.
1 code implementation • 2 Feb 2022 • Hunter Lang, Monica Agrawal, Yoon Kim, David Sontag
We demonstrate that co-training (Blum & Mitchell, 1998) can improve the performance of prompt-based learning by using unlabeled data.
1 code implementation • 22 Nov 2021 • Hussein Mozannar, Arvind Satyanarayan, David Sontag
For this collaboration to perform properly, the human decision maker must have a mental model of when and when not to rely on the agent.
no code implementations • 4 Nov 2021 • Monica Agrawal, Hunter Lang, Michael Offin, Lior Gazit, David Sontag
Label-scarce, high-dimensional domains such as healthcare present a challenge for modern machine learning techniques.
1 code implementation • 28 Oct 2021 • Rickard K. A. Karlsson, Martin Willbo, Zeshan Hussain, Rahul G. Krishnan, David Sontag, Fredrik D. Johansson
Our question is when using this privileged data leads to more sample-efficient learning of models that use only baseline data for predictions at test time.
1 code implementation • NeurIPS 2021 • Justin Lim, Christina X Ji, Michael Oberst, Saul Blecker, Leora Horwitz, David Sontag
Individuals often make different decisions when faced with the same context, due to personal preferences and background.
1 code implementation • ACL 2021 • James Mullenbach, Yada Pruksachatkun, Sean Adler, Jennifer Seale, Jordan Swartz, T. Greg McKelvey, Hui Dai, Yi Yang, David Sontag
In this work, we describe our creation of a dataset of clinical action items annotated over MIMIC-III, the largest publicly available dataset of real clinical notes.
no code implementations • 8 Mar 2021 • Ariel Levy, Monica Agrawal, Arvind Satyanarayan, David Sontag
Automated decision support can accelerate tedious tasks as users can focus their attention where it is needed most.
Decision Making, Human-Computer Interaction
1 code implementation • 3 Mar 2021 • Michael Oberst, Nikolaj Thams, Jonas Peters, David Sontag
In the case of two proxy variables, we propose a modified estimator that is prediction optimal under interventions up to a known strength.
no code implementations • 26 Feb 2021 • Hunter Lang, Aravind Reddy, David Sontag, Aravindan Vijayaraghavan
Several works have shown that perturbation stable instances of the MAP inference problem in Potts models can be solved exactly using a natural linear programming (LP) relaxation.
2 code implementations • 22 Feb 2021 • Zeshan Hussain, Rahul G. Krishnan, David Sontag
Modeling the time-series of high-dimensional, longitudinal data is important for predicting patient disease progression.
no code implementations • 13 Feb 2021 • Irene Y. Chen, Rahul G. Krishnan, David Sontag
In this work, we focus on mitigating the interference of interval censoring in the task of clustering for disease phenotyping.
no code implementations • 7 Nov 2020 • Hunter Lang, David Sontag, Aravindan Vijayaraghavan
On "real-world" instances, MAP assignments of small perturbations of the problem should be very similar to the MAP assignment(s) of the original problem instance.
1 code implementation • 8 Oct 2020 • Christina X. Ji, Michael Oberst, Sanjat Kanjilal, David Sontag
Reinforcement learning (RL) has the potential to significantly improve clinical decision making.
1 code implementation • 31 Jul 2020 • Monica Agrawal, Chloe O'Connell, Yasmin Fatemi, Ariel Levy, David Sontag
We reformulate the annotation framework for clinical entity extraction to factor in these issues to allow for robust end-to-end system benchmarking.
1 code implementation • 29 Jul 2020 • Divya Gopinath, Monica Agrawal, Luke Murray, Steven Horng, David Karger, David Sontag
We present a system that uses a learned autocompletion mechanism to facilitate rapid creation of semi-structured clinical documentation.
1 code implementation • 23 Jul 2020 • Alexander K. Lew, Monica Agrawal, David Sontag, Vikash K. Mansinghka
Data cleaning is naturally framed as probabilistic inference in a generative model of ground-truth data and likely errors, but the diversity of real-world error patterns and the hardness of inference make Bayesian approaches difficult to automate.
1 code implementation • 10 Jul 2020 • Rohan S. Kodialam, Rebecca Boiarsky, Justin Lim, Neil Dixit, Aditya Sai, David Sontag
Healthcare providers are increasingly using machine learning to predict patient outcomes to make meaningful interventions.
1 code implementation • ICML 2020 • Hussein Mozannar, David Sontag
Learning algorithms are often used in conjunction with expert decision makers in practical scenarios; however, this fact is largely ignored when designing these algorithms.
1 code implementation • 1 Jun 2020 • Soorajnath Boominathan, Michael Oberst, Helen Zhou, Sanjat Kanjilal, David Sontag
In several medical decision-making problems, such as antibiotic prescription, laboratory testing can provide precise indications for how a patient will respond to different treatment options.
1 code implementation • 27 Apr 2020 • James Mullenbach, Jordan Swartz, T. Greg McKelvey, Hui Dai, David Sontag
Both electronic health records and personal health records are typically organized by data type, with medical problems, medications, procedures, and laboratory results chronologically sorted in separate areas of the chart.
no code implementations • 21 Jan 2020 • Fredrik D. Johansson, Uri Shalit, Nathan Kallus, David Sontag
Practitioners in diverse fields such as healthcare, economics and education are eager to apply machine learning to improve decision making.
no code implementations • ICML 2020 • Maggie Makar, Fredrik D. Johansson, John Guttag, David Sontag
Estimation of individual treatment effects is commonly used as the basis for contextual decision making in fields such as healthcare, education, and economics.
no code implementations • 7 Oct 2019 • Viraj Prabhu, Anitha Kannan, Geoffrey J. Tso, Namit Katariya, Manish Chablani, David Sontag, Xavier Amatriain
Machine-learned diagnosis models have shown promise as medical aides but are trained under a closed-set assumption, i.e., that models will only encounter conditions on which they have been trained.
no code implementations • 2 Oct 2019 • Irene Y. Chen, Monica Agrawal, Steven Horng, David Sontag
Increasingly large electronic health records (EHRs) provide an opportunity to algorithmically learn medical knowledge.
no code implementations • 25 Sep 2019 • Rares-Darius Buhai, Andrej Risteski, Yoni Halpern, David Sontag
One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) to improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization).
1 code implementation • 9 Jul 2019 • Michael Oberst, Fredrik D. Johansson, Dennis Wei, Tian Gao, Gabriel Brat, David Sontag, Kush R. Varshney
Overlap between treatment groups is required for non-parametric estimation of causal effects.
1 code implementation • ICML 2020 • Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, David Sontag
One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) to improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization).
1 code implementation • 14 May 2019 • Michael Oberst, David Sontag
We introduce an off-policy evaluation procedure for highlighting episodes where applying a reinforcement learned (RL) policy is likely to have produced a substantially different outcome than the observed policy.
no code implementations • 8 Mar 2019 • Fredrik D. Johansson, David Sontag, Rajesh Ranganath
In this work, we give generalization bounds for unsupervised domain adaptation that hold for any representation function by acknowledging the cost of non-invertibility.
no code implementations • 24 Jan 2019 • Anastasia Podosinnikova, Amelia Perry, Alexander Wein, Francis Bach, Alexandre d'Aspremont, David Sontag
Moreover, we conjecture that the proposed program recovers a mixing component at the rate $k < p^2/4$, and we prove that a mixing component can be recovered with high probability when $k < (2 - \epsilon) p \log p$ and the original components are sampled uniformly at random on the hypersphere.
no code implementations • 7 Nov 2018 • Viraj Prabhu, Anitha Kannan, Murali Ravuri, Manish Chablani, David Sontag, Xavier Amatriain
We consider the problem of image classification for the purpose of aiding doctors in dermatological diagnosis.
no code implementations • 12 Oct 2018 • Hunter Lang, David Sontag, Aravindan Vijayaraghavan
The simplest stability condition assumes that the MAP solution does not change at all when some of the pairwise potentials are (adversarially) perturbed.
no code implementations • 31 May 2018 • Omer Gottesman, Fredrik Johansson, Joshua Meier, Jack Dent, Dong-hun Lee, Srivatsan Srinivasan, Linying Zhang, Yi Ding, David Wihl, Xuefeng Peng, Jiayu Yao, Isaac Lage, Christopher Mosch, Li-wei H. Lehman, Matthieu Komorowski, Aldo Faisal, Leo Anthony Celi, David Sontag, Finale Doshi-Velez
Much attention has been devoted recently to the development of machine learning algorithms with the goal of improving treatment policies in healthcare.
no code implementations • NeurIPS 2018 • Irene Chen, Fredrik D. Johansson, David Sontag
Recent attempts to achieve fairness in predictive models focus on the balance between fairness and accuracy.
no code implementations • ICLR 2018 • Fredrik D. Johansson, Nathan Kallus, Uri Shalit, David Sontag
We pose both of these problems as prediction under a shift in design.
1 code implementation • ICML 2018 • Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush
Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network.
Ranked #2 on Text Generation on Yahoo Questions
no code implementations • 6 Nov 2017 • Hunter Lang, David Sontag, Aravindan Vijayaraghavan
Approximate algorithms for structured prediction problems, such as LP relaxations and the popular alpha-expansion algorithm (Boykov et al. 2001), typically far exceed their theoretical performance guarantees on real-world instances.
6 code implementations • NeurIPS 2017 • Christos Louizos, Uri Shalit, Joris Mooij, David Sontag, Richard Zemel, Max Welling
Learning individual-level causal effects from observational data, such as inferring the most effective medication for a specific patient, is a problem of growing importance for policy makers.
Ranked #9 on Causal Inference on IHDP
no code implementations • 23 May 2017 • Ankit Vani, Yacine Jernite, David Sontag
In this work, we present the Grounded Recurrent Neural Network (GRNN), a recurrent neural network architecture for multi-label prediction which explicitly ties labels to specific dimensions of the recurrent hidden state (we call this process "grounding").
no code implementations • 23 Apr 2017 • Yacine Jernite, Samuel R. Bowman, David Sontag
This work presents a novel objective function for the unsupervised training of neural network sentence encoders.
no code implementations • ICML 2017 • Yacine Jernite, Anna Choromanska, David Sontag
We consider multi-class classification where the predictor has a hierarchical structure that allows for a very large number of labels both at train and test time.
3 code implementations • 30 Sep 2016 • Rahul G. Krishnan, Uri Shalit, David Sontag
We introduce a unified algorithm to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks.
Ranked #6 on Multivariate Time Series Forecasting on USHCN-Daily
1 code implementation • 2 Aug 2016 • Narges Razavian, Jake Marcus, David Sontag
Disparate areas of machine learning have benefited from models that can take raw data with little preprocessing as input and learn rich representations of that raw data in order to perform well on a given prediction task.
no code implementations • 2 Aug 2016 • Shalmali Joshi, Suriya Gunasekar, David Sontag, Joydeep Ghosh
This work proposes a new algorithm for automated and simultaneous phenotyping of multiple co-occurring medical conditions, also referred to as comorbidities, using clinical notes from electronic health records (EHRs).
no code implementations • 2 Aug 2016 • Yoni Halpern, Steven Horng, David Sontag
We describe a method for parameter estimation in bipartite probabilistic graphical models for joint prediction of clinical conditions from the electronic medical record.
4 code implementations • ICML 2017 • Uri Shalit, Fredrik D. Johansson, David Sontag
We give a novel, simple and intuitive generalization-error bound showing that the expected ITE estimation error of a representation is bounded by a sum of the standard generalization-error of that representation and the distance between the treated and control distributions induced by the representation.
Ranked #3 on Causal Inference on Jobs
7 code implementations • 6 Jun 2016 • Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, Yan Liu
Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values.
Ranked #4 on Multivariate Time Series Imputation on MuJoCo
1 code implementation • 12 May 2016 • Fredrik D. Johansson, Uri Shalit, David Sontag
Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology.
1 code implementation • 25 Nov 2015 • Narges Razavian, David Sontag
Early diagnosis of treatable diseases is essential for improving healthcare, and many diseases' onsets are predictable from annual lab tests and their temporal trends.
3 code implementations • 16 Nov 2015 • Rahul G. Krishnan, Uri Shalit, David Sontag
Motivated by recent variational methods for learning deep generative models, we introduce a unified algorithm to efficiently learn a broad spectrum of Kalman filters.
no code implementations • 10 Nov 2015 • Yoni Halpern, Steven Horng, David Sontag
We present a semi-supervised learning algorithm for learning discrete factor analysis models with arbitrary structure on the latent variables.
1 code implementation • NeurIPS 2015 • Rahul G. Krishnan, Simon Lacoste-Julien, David Sontag
We introduce a globally-convergent algorithm for optimizing the tree-reweighted (TRW) variational objective over the marginal polytope.
no code implementations • 4 Nov 2015 • Ofer Meshi, Mehrdad Mahdavi, Adrian Weller, David Sontag
Structured prediction is used in areas such as computer vision and natural language processing to predict structured outputs such as segmentations or parse trees.
14 code implementations • 26 Aug 2015 • Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush
We describe a simple neural language model that relies only on character-level inputs.
no code implementations • 12 May 2015 • Eliot Brenner, David Sontag
We give a new consistent scoring function for structure learning of Bayesian networks.
no code implementations • 19 Sep 2014 • Amir Globerson, Tim Roughgarden, David Sontag, Cafer Yildirim
We show that the prospects for achieving low expected Hamming error depend on the structure of the graph $G$ in interesting ways.
no code implementations • 17 Jun 2014 • Hung Hai Bui, Tuyen N. Huynh, David Sontag
We first show that for these graphical models, the tree-reweighted variational objective lends itself to a compact lifted formulation which can be solved much more efficiently than the standard TRW formulation for the ground graphical model.
no code implementations • NeurIPS 2013 • Yacine Jernite, Yonatan Halpern, David Sontag
We show that the existence of such a quartet allows us to uniquely identify each latent variable and to learn all parameters involving that latent variable.
no code implementations • 26 Sep 2013 • Yonatan Halpern, David Sontag
This paper considers the problem of learning the parameters in Bayesian networks of discrete variables with known structure and hidden variables.
no code implementations • 26 Sep 2013 • Eliot Brenner, David Sontag
We give a new consistent scoring function for structure learning of Bayesian networks.
2 code implementations • 19 Dec 2012 • Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu
Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora.
no code implementations • NeurIPS 2011 • David Sontag, Dan Roy
In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard.
no code implementations • NeurIPS 2010 • David Sontag, Ofer Meshi, Amir Globerson, Tommi S. Jaakkola
The problem of learning to predict structured labels is of key importance in many applications.
no code implementations • NeurIPS 2008 • David Sontag, Amir Globerson, Tommi S. Jaakkola
We propose a new class of consistency constraints for Linear Programming (LP) relaxations for finding the most probable (MAP) configuration in graphical models.