no code implementations • ICML 2020 • Nathan Kallus, Masatoshi Uehara
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible.
no code implementations • 29 Mar 2024 • Andrew Bennett, Nathan Kallus, Miruna Oprescu, Wen Sun, Kaiwen Wang
We characterize the sharp bounds on policy value under this model, that is, the tightest possible bounds given by the transition observations from the original MDP, and we study the estimation of these bounds from such transition observations.
no code implementations • 15 Mar 2024 • James McInerney, Nathan Kallus
The Laplace approximation (LA) of the Bayesian posterior is a Gaussian distribution centered at the maximum a posteriori estimate.
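Since the entry compresses the method into one sentence, a minimal sketch may help; it builds the LA for a toy one-dimensional Gaussian model with a Gaussian prior (an illustrative assumption, not the paper's setup) from the MAP estimate and the curvature of the negative log posterior:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, size=50)  # toy data: x_i ~ N(theta, 1)

def neg_log_posterior(theta):
    theta = float(np.atleast_1d(theta)[0])
    # N(0, 10^2) prior on theta plus Gaussian likelihood, up to constants.
    return 0.5 * theta**2 / 100.0 + 0.5 * np.sum((x - theta) ** 2)

theta_map = float(minimize(neg_log_posterior, x0=[0.0]).x[0])  # LA mean

# Curvature of the negative log posterior at the MAP (central finite
# difference); its inverse gives the LA variance.
h = 1e-4
hess = (neg_log_posterior(theta_map + h) - 2 * neg_log_posterior(theta_map)
        + neg_log_posterior(theta_map - h)) / h**2
print(f"Laplace approximation: N({theta_map:.3f}, {1 / hess:.4f})")
```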
no code implementations • 10 Mar 2024 • Kaiwen Wang, Dawen Liang, Nathan Kallus, Wen Sun
We study Risk-Sensitive Reinforcement Learning (RSRL) with the Optimized Certainty Equivalent (OCE) risk, which generalizes Conditional Value-at-Risk (CVaR), entropic risk, and Markowitz's mean-variance.
no code implementations • 8 Mar 2024 • Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvári
We propose training fitted Q-iteration with log-loss (FQI-LOG) for batch reinforcement learning (RL).
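For concreteness, here is a hedged sketch of the regression step that distinguishes FQI-LOG from standard FQI; the [0, 1]-normalized Bellman targets are an assumption made so the log-loss is well defined, and none of this mirrors the paper's experimental setup:

```python
import numpy as np

def bellman_targets(rewards, q_next_max, gamma=0.99):
    # One FQI backup; assumes rewards are scaled so targets stay in [0, 1].
    return rewards + gamma * q_next_max

def squared_loss(pred, target):
    # The loss minimized by standard FQI.
    return np.mean((pred - target) ** 2)

def log_loss(pred, target):
    # Binary cross-entropy with the bounded target as a soft label;
    # FQI-LOG fits Q by minimizing this instead of the squared loss.
    eps = 1e-7
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
```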
no code implementations • 8 Mar 2024 • Harald Steck, Chaitanya Ekanadham, Nathan Kallus
Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations.
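The definition translates directly into code; a two-line sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(angle) = dot product of the L2-normalized vectors.
    return float((a / np.linalg.norm(a)) @ (b / np.linalg.norm(b)))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.707
```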
1 code implementation • 4 Mar 2024 • Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, Vasilis Syrgkanis
An introduction to the emerging fusion of machine learning and causal inference.
no code implementations • 11 Feb 2024 • Kaiwen Wang, Owen Oertell, Alekh Agarwal, Nathan Kallus, Wen Sun
Second-order bounds are instance-dependent bounds that scale with the variance of the return, which we prove are tighter than the previously known small-loss bounds of distributional RL.
no code implementations • 9 Feb 2024 • Brian Cho, Kyra Gan, Nathan Kallus
We propose a novel nonparametric sequential test for composite hypotheses for means of multiple data streams.
no code implementations • 2 Feb 2024 • Su Jia, Peter Frazier, Nathan Kallus
Prior research on experimentation with interference has concentrated on the final output of a policy.
no code implementations • 25 Dec 2023 • Su Jia, Nathan Kallus, Christina Lee Yu
We consider experimentation in the presence of non-stationarity, inter-unit (spatial) interference, and carry-over effects (temporal interference), where we wish to estimate the global average treatment effect (GATE), the difference between average outcomes having exposed all units at all times to treatment or to control.
no code implementations • 6 Nov 2023 • Andrew Bennett, Nathan Kallus, Miruna Oprescu
Low-Rank Markov Decision Processes (MDPs) have recently emerged as a promising framework within the domain of reinforcement learning (RL), as they allow for provably approximately correct (PAC) learning guarantees while also incorporating ML algorithms for representation learning.
no code implementations • 24 Oct 2023 • Noveen Sachdeva, Lequn Wang, Dawen Liang, Nathan Kallus, Julian McAuley
To address these challenges, we introduce the Policy Convolution (PC) family of estimators.
1 code implementation • 19 Aug 2023 • Zhankui He, Zhouhang Xie, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Prasad Majumder, Nathan Kallus, Julian McAuley
In this paper, we present empirical studies on conversational recommendation tasks using representative large language models in a zero-shot setting with three primary contributions.
no code implementations • 25 Jul 2023 • Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara
We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems.
1 code implementation • 21 Jul 2023 • Kaiwen Wang, Junxiong Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun
Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost, and it is the core NP-hard combinatorial optimization problem of query optimization.
no code implementations • NeurIPS 2023 • Kaiwen Wang, Kevin Zhou, Runzhe Wu, Nathan Kallus, Wen Sun
In online RL, we propose a DistRL algorithm that constructs confidence sets using maximum likelihood estimation.
no code implementations • 24 May 2023 • Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offline data and (2) solve a distributionally robust planning problem over a confidence set around the MLE.
2 code implementations • 20 Apr 2023 • Miruna Oprescu, Jacob Dorn, Marah Ghoummaid, Andrew Jesson, Nathan Kallus, Uri Shalit
There has been recent progress on robust and efficient methods for estimating the conditional average treatment effect (CATE) function, but these methods often do not take into account the risk of hidden confounding, which could arbitrarily and unknowingly bias any causal estimate based on observational data.
no code implementations • 10 Feb 2023 • Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara
In this paper, we study nonparametric estimation of instrumental variable (IV) regressions.
no code implementations • 7 Feb 2023 • Kaiwen Wang, Nathan Kallus, Wen Sun
In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.
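As a reference for the objective, here is a minimal empirical CVaR computation; this plain tail-mean estimator only pins down the quantity being optimized and is not the paper's algorithm:

```python
import numpy as np

def empirical_cvar(returns, tau):
    # CVaR_tau = mean of the worst tau-fraction of returns (lower tail).
    r = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(tau * len(r))))
    return float(r[:k].mean())

samples = np.random.default_rng(0).normal(size=100_000)
print(empirical_cvar(samples, tau=0.05))  # about -2.06 for a standard normal
```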
no code implementations • 29 Jan 2023 • Su Jia, Qian Xie, Nathan Kallus, Peter I. Frazier
In many applications of online decision making, the environment is non-stationary, and it is therefore crucial to use bandit algorithms that handle changes.
1 code implementation • 29 Dec 2022 • Aurelien Bibaut, Nathan Kallus, Michael Lindon
The type-I-error results primarily leverage a martingale strong invariance principle and establish that these tests (and their implied confidence sequences) have type-I error rates asymptotically equivalent to the desired (possibly varying) $\alpha$-level.
no code implementations • 13 Dec 2022 • Masatoshi Uehara, Chengchun Shi, Nathan Kallus
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has been recently applied to solve a number of challenging problems.
1 code implementation • 11 Nov 2022 • Nathan Kallus, James McInerney
When the predictive model is simple and its evaluation differentiable, this task is solved by the delta method, where we propagate the asymptotically-normal uncertainty in the predictive model through the evaluation to compute standard errors and Wald confidence intervals.
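A minimal sketch of the classical delta method described here, using a numerical gradient in place of an analytic one; the toy parameter, covariance, and evaluation function are assumptions for illustration:

```python
import numpy as np

def delta_method(theta_hat, cov, f, h=1e-6):
    # Gradient of the evaluation f at theta_hat via central differences.
    grad = np.array([(f(theta_hat + h * e) - f(theta_hat - h * e)) / (2 * h)
                     for e in np.eye(len(theta_hat))])
    se = float(np.sqrt(grad @ cov @ grad))  # standard error of f(theta_hat)
    est = f(theta_hat)
    z = 1.96                                # ~95% Wald interval
    return est, (est - z * se, est + z * se)

theta_hat = np.array([0.3, 1.2])
cov = np.array([[0.010, 0.002],             # estimated covariance of theta_hat
                [0.002, 0.040]])
est, ci = delta_method(theta_hat, cov, lambda t: t[0] * np.exp(t[1]))
print(est, ci)
```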
1 code implementation • 26 Oct 2022 • Andrew Bennett, Dipendra Misra, Nathan Kallus
Many existing approaches to safe RL rely on receiving numeric safety feedback, but in many cases this feedback can only take binary values; that is, whether an action in a given state is safe or unsafe.
no code implementations • 17 Aug 2022 • Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara
In a variety of applications, including nonparametric instrumental variable (NPIV) analysis, proximal causal inference under unmeasured confounding, and missing-not-at-random data with shadow variables, we are interested in inference on a continuous linear functional (e.g., average causal effects) of a nuisance function (e.g., an NPIV regression) defined by conditional moment restrictions.
1 code implementation • NeurIPS 2023 • Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun
Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.
1 code implementation • 12 Jul 2022 • Jonathan D. Chang, Kaiwen Wang, Nathan Kallus, Wen Sun
We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE).
no code implementations • 24 Jun 2022 • Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun
We show our algorithm's computational and statistical complexities scale polynomially with respect to the horizon and the intrinsic dimension of the feature on the observation space.
no code implementations • 24 Jun 2022 • Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun
We study Reinforcement Learning for partially observable dynamical systems using function approximation.
1 code implementation • 23 May 2022 • Nathan Kallus, Miruna Oprescu
Our method is model-agnostic in that it can provide the best projection of CDTE onto the regression model class.
1 code implementation • 20 May 2022 • Nathan Kallus
If, in an A/B test, half of users click (or buy, or watch, or renew, etc.), whether exposed to the standard experience A or a new one B, hypothetically it could be because the change affects no one, because the change positively affects half the user population to go from no-click to click while negatively affecting the other half, or something in between.
no code implementations • CVPR 2022 • Shervin Ardeshir, Cristina Segalin, Nathan Kallus
Performance of the model for each group is calculated by comparing $\hat{y}$ and $y$ for the datapoints within a specific group, and, as a result, the disparity in performance across the different groups can be computed.
1 code implementation • 19 Feb 2022 • Nathan Kallus, Xiaojie Mao, Kaiwen Wang, Zhengyuan Zhou
Thanks to a localization technique, LDR$^2$OPE only requires fitting a small number of regressions, just like DR methods for standard OPE.
no code implementations • 15 Feb 2022 • Guido Imbens, Nathan Kallus, Xiaojie Mao, Yuhao Wang
In this paper, we uniquely tackle the challenge of persistent unmeasured confounders, i.e., some unmeasured confounders that can simultaneously affect the treatment, short-term outcomes, and the long-term outcome, noting that they invalidate identification strategies in previous literature.
1 code implementation • 15 Jan 2022 • Nathan Kallus
Even if the average treatment effect (ATE), which measures the change in social welfare, is positive, there is a risk of a negative effect on, say, some 10% of the population.
1 code implementation • 21 Dec 2021 • Jacob Dorn, Kevin Guo, Nathan Kallus
We consider the problem of constructing bounds on the average treatment effect (ATE) when unmeasured confounders exist but have bounded influence.
1 code implementation • 28 Oct 2021 • Andrew Bennett, Nathan Kallus
To answer these, we extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible by the existence of so-called bridge functions.
no code implementations • 19 Oct 2021 • Nathan Kallus, Angela Zhou
We study off-policy evaluation and learning from sequential data in a structured class of Markov decision processes that arise from repeated interactions with an exogenous sequence of arrivals with contexts, which generate unknown individual-level responses to agent actions.
no code implementations • 6 Oct 2021 • James McInerney, Nathan Kallus
The approach, which we term the residual overfit method of exploration (ROME), drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
1 code implementation • NeurIPS 2021 • Nikos Vlassis, Ashok Chandrashekar, Fernando Amat Gil, Nathan Kallus
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions, often termed slates.
no code implementations • NeurIPS 2021 • Aurélien Bibaut, Antoine Chambaz, Maria Dimakopoulou, Nathan Kallus, Mark van der Laan
Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning, but its model-agnostic guarantees can fail when we use adaptively collected data, such as the result of running a contextual bandit algorithm.
no code implementations • NeurIPS 2021 • Aurélien Bibaut, Antoine Chambaz, Maria Dimakopoulou, Nathan Kallus, Mark van der Laan
The adaptive nature of the data collected by contextual bandit algorithms, however, makes this difficult: standard estimators are no longer asymptotically normally distributed and classic confidence intervals fail to provide correct coverage.
no code implementations • 25 Mar 2021 • Nathan Kallus, Xiaojie Mao, Masatoshi Uehara
Previous work has relied on completeness conditions on these functions to identify the causal parameters and required uniqueness assumptions in estimation, and they also focused on parametric estimation of bridge functions.
no code implementations • 5 Feb 2021 • Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and $q$-functions when these are estimated using recent minimax methods.
no code implementations • 31 Jan 2021 • Yichun Hu, Nathan Kallus, Masatoshi Uehara
Second, we provide new analyses of FQI and Bellman residual minimization to establish the correct pointwise convergence guarantees.
no code implementations • 21 Dec 2020 • Nathan Kallus, Angela Zhou
These different application areas may lead to different concerns around fairness, welfare, and equity on different objectives: price burdens on consumers, price envy, firm revenue, access to a good, equal access, and distributional consequences when the good in question further impacts downstream outcomes of interest.
2 code implementations • 17 Dec 2020 • Andrew Bennett, Nathan Kallus
The conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables, a prominent example being instrumental variable regression.
no code implementations • 5 Dec 2020 • Nathan Kallus
I provide a rejoinder for discussion of "More Efficient Policy Learning via Optimal Retargeting" to appear in the Journal of the American Statistical Association with discussion by Oliver Dukes and Stijn Vansteelandt; Sijia Li, Xiudi Li, and Alex Luedtke; and Muxuan Liang and Yingqi Zhao.
no code implementations • 5 Nov 2020 • Yichun Hu, Nathan Kallus, Xiaojie Mao
While one may use off-the-shelf machine learning methods to separately learn a predictive model and plug it in, a variety of recent methods instead integrate estimation and optimization by fitting the model to directly optimize downstream decision performance.
1 code implementation • 21 Oct 2020 • Nathan Kallus, Yuta Saito, Masatoshi Uehara
We study off-policy evaluation (OPE) from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
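As a hedged point of reference, a naive baseline for this setting applies IPW within each logger's stratum and averages strata by sample size; the data layout and the `pi_e` callable below are assumptions for illustration, and the paper studies estimators that improve on this:

```python
import numpy as np

def stratified_ipw(strata, pi_e):
    # strata: list of (actions, rewards, pi_b) triples, one per logging
    # policy, where pi_b[i] is that logger's propensity for the logged
    # action; pi_e(actions) returns the target policy's propensities.
    n_total = sum(len(r) for _, r, _ in strata)
    value = 0.0
    for a, r, pi_b in strata:
        w = pi_e(a) / pi_b                      # per-sample importance weights
        value += (len(r) / n_total) * np.mean(w * r)
    return value
```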
1 code implementation • 17 Aug 2020 • Nathan Kallus, Xiaojie Mao
We study contextual stochastic optimization problems, where we leverage rich auxiliary observations (e.g., product characteristics) to improve decision making with uncertain variables (e.g., demand).
no code implementations • 27 Jul 2020 • Andrew Bennett, Nathan Kallus, Lihong Li, Ali Mousavi
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders, where states and actions can act as proxies for the unobserved confounders.
1 code implementation • NeurIPS 2020 • Nathan Kallus, Masatoshi Uehara
Targeting deterministic policies, for which action is a deterministic function of state, is crucial since optimal policies are always deterministic (up to ties).
no code implementations • 6 Jun 2020 • Nathan Kallus, Masatoshi Uehara
Compared with the classic case of a pre-specified evaluation policy, when evaluating natural stochastic policies, the efficiency bound, which measures the best-achievable estimation error, is inflated since the evaluation policy itself is unknown.
no code implementations • 6 May 2020 • Nathan Kallus
When this set is permutation symmetric, the optimal design is complete randomization, and using a single partition (i.e., the design that only randomizes the treatment labels for each side of the partition) has minimax risk larger by a factor of $n-1$.
1 code implementation • 6 May 2020 • Yichun Hu, Nathan Kallus
While existing literature mostly focuses on estimating the optimal DTR from offline data such as from sequentially randomized trials, we study the problem of developing the optimal DTR in an online manner, where the interaction with each individual affects both our cumulative reward and our data collection for future learning.
no code implementations • 6 Apr 2020 • Nathan Kallus
I congratulate Profs.
no code implementations • 27 Mar 2020 • Nathan Kallus, Xiaojie Mao
However, there is often an abundance of observations on surrogate outcomes not of primary interest, such as short-term health effects or online-ad click-through.
1 code implementation • ICML 2020 • Andrew Bennett, Nathan Kallus
We show that, under a correct specification assumption, the weighted classification formulation need not be efficient for policy parameters.
no code implementations • NeurIPS 2020 • Nathan Kallus, Angela Zhou
We develop a robust approach that estimates sharp bounds on the (unidentifiable) value of a given policy in an infinite-horizon problem given data from another policy with unobserved confounding, subject to a sensitivity model.
no code implementations • ICML 2020 • Nathan Kallus, Masatoshi Uehara
Policy gradient methods in reinforcement learning update policy parameters by taking steps in the direction of an estimated gradient of policy value.
no code implementations • 21 Jan 2020 • Fredrik D. Johansson, Uri Shalit, Nathan Kallus, David Sontag
Practitioners in diverse fields such as healthcare, economics and education are eager to apply machine learning to improve decision making.
1 code implementation • 30 Dec 2019 • Nathan Kallus, Xiaojie Mao, Masatoshi Uehara
A central example is the efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference, which involves as a nuisance the covariate-conditional cumulative distribution function evaluated at the quantile to be estimated.
1 code implementation • NeurIPS 2019 • Nathan Kallus, Angela Zhou
Personalized interventions in social services, education, and healthcare leverage individual-level causal effect predictions in order to give the best treatment to each individual or to prioritize program interventions for the individuals most likely to benefit.
no code implementations • 26 Oct 2019 • Nathan Kallus, Michele Santacatterina
In this paper, we propose Kernel Optimal Orthogonality Weighting (KOOW), a convex optimization-based method, for estimating the effects of continuous treatments.
no code implementations • 12 Sep 2019 • Nathan Kallus, Masatoshi Uehara
This precisely characterizes the curse of horizon: in time-variant processes, OPE is only feasible in the near-on-policy setting, where behavior and target policies are sufficiently similar.
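The curse is easiest to see in the classical cumulative-ratio importance-sampling estimator, whose weight is a product of one likelihood ratio per time step, so its variance can grow exponentially in the horizon; a sketch (the trajectory format and policy callables are assumptions, not the paper's estimator):

```python
import numpy as np

def trajectory_is(trajs, pi_e, pi_b, gamma=1.0):
    # trajs: list of trajectories [(s, a, r), ...]; pi_e/pi_b: callables
    # giving action probabilities under the target and behavior policies.
    vals = []
    for traj in trajs:
        w, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            w *= pi_e(s, a) / pi_b(s, a)   # product of H likelihood ratios
            ret += (gamma ** t) * r
        vals.append(w * ret)
    return float(np.mean(vals))
```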
1 code implementation • 5 Sep 2019 • Yichun Hu, Nathan Kallus, Xiaojie Mao
We study a nonparametric contextual bandit problem where the expected reward functions belong to a H\"older class with smoothness parameter $\beta$.
1 code implementation • 22 Aug 2019 • Nathan Kallus, Masatoshi Uehara
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible.
1 code implementation • 13 Aug 2019 • Nathan Kallus, Michele Santacatterina
In causal inference, a variety of causal effect estimands have been studied, including the sample, uncensored, target, conditional, optimal subpopulation, and optimal weighted average treatment effects.
1 code implementation • NeurIPS 2019 • Andrew Bennett, Nathan Kallus
We study the question of policy evaluation when we instead have proxies for the latent confounders and develop an importance weighting method that avoids fitting a latent outcome regression model.
1 code implementation • 20 Jun 2019 • Nathan Kallus
Policy learning can be used to extract individualized treatment regimes from observational data in healthcare, civics, e-commerce, and beyond.
1 code implementation • NeurIPS 2019 • Nathan Kallus, Masatoshi Uehara
We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR and satisfy the same stability and boundedness properties as SNIS.
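For reference, the IS and SNIS baselines named here, written in single-step (bandit) form for brevity with known propensities; this is only a sketch of the baselines, as the paper's empirical-likelihood estimator is more involved:

```python
import numpy as np

def is_value(rewards, pi_e, pi_b):
    # Plain importance sampling: unbiased but can be unstable and unbounded.
    w = pi_e / pi_b
    return float(np.mean(w * rewards))

def snis_value(rewards, pi_e, pi_b):
    # Self-normalized IS: biased but bounded by max |reward| and more stable.
    w = pi_e / pi_b
    return float(np.sum(w * rewards) / np.sum(w))
```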
1 code implementation • 4 Jun 2019 • Nathan Kallus, Angela Zhou
Personalized interventions in social services, education, and healthcare leverage individual-level causal effect predictions in order to give the best treatment to each individual or to prioritize program interventions for the individuals most likely to benefit.
1 code implementation • 1 Jun 2019 • Nathan Kallus, Xiaojie Mao, Angela Zhou
In this paper we study a fundamental challenge to assessing disparate impacts in practice: protected class membership is often not observed in the data.
1 code implementation • 1 Jun 2019 • Vishal Gupta, Nathan Kallus
This intuition further suggests that data-pooling offers the most benefits when there are many problems, each of which has a small amount of relevant data.
2 code implementations • NeurIPS 2019 • Andrew Bennett, Nathan Kallus, Tobias Schnabel
Instrumental variable analysis is a powerful tool for estimating causal effects when randomization or full control of confounders is not possible.
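For contrast with the paper's approach, here is the classical linear baseline: a minimal two-stage least squares (2SLS) sketch; it is only the textbook point of comparison, not the moment-based method this entry refers to:

```python
import numpy as np

def two_stage_least_squares(y, x, z):
    # Stage 1: project the endogenous regressors x onto the instruments z.
    x_hat = z @ np.linalg.lstsq(z, x, rcond=None)[0]
    # Stage 2: regress the outcome on the projected regressors.
    beta, *_ = np.linalg.lstsq(x_hat, y, rcond=None)
    return beta
```

The projection keeps only the variation in x explained by the instruments, which is exogenous under the IV assumptions; that is what makes the second-stage coefficients causal rather than merely predictive.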
1 code implementation • NeurIPS 2019 • Nathan Kallus, Angela Zhou
To better account for this, in this paper, we investigate the fairness of predictive risk scores from the point of view of a bipartite ranking task, where one seeks to rank positive examples higher than negative ones.
no code implementations • 14 Feb 2019 • Nathan Kallus
In the context of individual-level causal inference, we study the problem of predicting whether someone will respond or not to a treatment based on their features and past examples of features, treatment indicator (e.g., drug/no drug), and a binary outcome (e.g., recovery from disease).
1 code implementation • 27 Nov 2018 • Jiahao Chen, Nathan Kallus, Xiaojie Mao, Geoffry Svacha, Madeleine Udell
We also propose an alternative weighted estimator that uses soft classification, and show that its bias arises simply from the conditional covariance of the outcome with the true class membership.
1 code implementation • 10 Nov 2018 • Nathan Kallus, Brenton Pennicooke, Michele Santacatterina
Inverse probability of treatment weighting (IPTW), which has been used to estimate sample average treatment effects (SATE) using observational data, tenuously relies on the positivity assumption and the correct specification of the treatment assignment model, both of which are problematic assumptions in many observational studies.
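A minimal IPTW sketch for the average treatment effect with logistic-regression propensities; it illustrates the baseline this entry critiques, and its instability when propensities approach 0 or 1 is exactly the positivity concern raised above (illustrative only, not the paper's proposed method):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iptw_ate(X, t, y):
    # Propensity scores e(x) = P(T=1 | X=x) from a working logistic model.
    e = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    # Horvitz-Thompson weighting; degrades as e approaches 0 or 1.
    return float(np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e)))
```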
no code implementations • NeurIPS 2018 • Nathan Kallus, Aahlad Manas Puli, Uri Shalit
We introduce a novel method of using limited experimental data to correct the hidden confounding in causal effect models trained on larger observational data, even if the observational data does not fully overlap with the experimental data.
no code implementations • 5 Oct 2018 • Nathan Kallus, Xiaojie Mao, Angela Zhou
We study the problem of learning conditional average treatment effects (CATE) from observational data with unobserved confounders.
no code implementations • ICML 2018 • Nathan Kallus, Angela Zhou
We connect these lines of work and study the residual unfairness that arises when a fairness-adjusted predictor is not actually fair on the target population due to systematic censoring of training data by existing biased policies.
1 code implementation • 4 Jun 2018 • Nathan Kallus, Michele Santacatterina
Marginal structural models (MSMs) estimate the causal effect of a time-varying treatment in the presence of time-dependent confounding via weighted regression.
1 code implementation • NeurIPS 2018 • Nathan Kallus, Xiaojie Mao, Madeleine Udell
Valid causal inference in observational studies often requires controlling for confounders.
no code implementations • NeurIPS 2018 • Nathan Kallus, Angela Zhou
We study the problem of learning personalized decision policies from observational data while accounting for possible unobserved confounding.
no code implementations • ICLR 2018 • Fredrik D. Johansson, Nathan Kallus, Uri Shalit, David Sontag
We pose both of these problems as prediction under a shift in design.
no code implementations • 16 Feb 2018 • Nathan Kallus, Angela Zhou
We study the problem of policy evaluation and learning from batched contextual bandit data when treatments are continuous, going beyond previous work on discrete treatments.
no code implementations • ICML 2020 • Nathan Kallus
We study optimal covariate balance for causal inferences from observational data when rich covariates and complex relationships necessitate flexible modeling with neural networks.
no code implementations • 21 May 2017 • Nathan Kallus
We argue for a particular kind of regret that captures the causal effect of treatments but show that standard MAB algorithms cannot achieve sublinear control on this regret.
no code implementations • NeurIPS 2018 • Nathan Kallus
We propose a new, balance-based approach that also makes the data look like the new policy, but does so directly by finding weights that optimize for balance between the weighted data and the target policy in the given, finite sample, which is equivalent to minimizing worst-case or posterior conditional mean square error.
no code implementations • 26 Dec 2016 • Nathan Kallus
We develop an encompassing framework for matching, covariate balancing, and doubly-robust methods for causal inference from observational data called generalized optimal matching (GOM).
no code implementations • 18 Oct 2016 • Nathan Kallus, Madeleine Udell
In the dynamic setting, we show that structure-aware dynamic assortment personalization can have regret that is an order of magnitude smaller than structure-ignorant approaches.
no code implementations • ICML 2017 • Nathan Kallus
We study the problem of learning to choose from m discrete treatment options (e.g., news item or medical drug) the one with the best causal effect for a particular instance (e.g., user or patient), where the training data consists of passive observations of covariates, treatment, and the outcome of the treatment.
no code implementations • 17 Sep 2015 • Nathan Kallus, Madeleine Udell
In our model, the preferences of each customer or segment follow a separate parametric choice model, but the underlying structure of these parameters over all the models has low dimension.
1 code implementation • 22 Feb 2014 • Dimitris Bertsimas, Nathan Kallus
To demonstrate the power of our approach in a real-world setting, we study an inventory management problem faced by the distribution arm of an international media conglomerate, which ships an average of 1 billion units per year.