no code implementations • 28 Feb 2023 • Paul Mineiro, Steven R. Howard
Estimation of the complete distribution of a random variable is a useful primitive for both manual and automated decision making.
no code implementations • 17 Feb 2023 • Paul Mineiro
When feedback is partial, leveraging all available information is critical to minimizing data requirements.
no code implementations • 16 Feb 2023 • Mark Rucker, Yinglun Zhu, Paul Mineiro
For infinite action contextual bandits, smoothed regret and reduction to regression results in state-of-the-art online statistical performance with computational cost independent of the action set: unfortunately, the resulting data exhaust does not have well-defined importance-weights.
1 code implementation • 28 Nov 2022 • Jessica Maghakian, Paul Mineiro, Kishan Panaganti, Mark Rucker, Akanksha Saran, Cheng Tan
In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions.
1 code implementation • 14 Nov 2022 • Shengpu Tang, Felipe Vieira Frujeri, Dipendra Misra, Alex Lamb, John Langford, Paul Mineiro, Sebastian Kochman
Modern decision-making systems, from robots to web recommendation engines, are expected to adapt: to user preferences, changing circumstances or even new tasks.
1 code implementation • 25 Oct 2022 • Mark Rucker, Jordan T. Ash, John Langford, Paul Mineiro, Ida Momennejad
This work introduces the Eigen Memory Tree (EMT), a novel online memory model for sequential learning scenarios.
no code implementations • 24 Oct 2022 • Mónika Farsang, Paul Mineiro, Wangda Zhang
We desire to apply contextual bandits to scenarios where average-case statistical guarantees are inadequate.
no code implementations • 24 Oct 2022 • Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc Friedman, Rafah Hosn, Hiren Patel, Alekh Jindal
Modern analytical workloads are highly heterogeneous and massively complex, making generic query optimizers untenable for many customers and scenarios.
1 code implementation • 20 Oct 2022 • Paul Mineiro
In certain cases this lower confidence sequence (CS) can be converted into a closed-interval CS whose width converges to zero, e.g., any bounded realization, or post contextual-bandit inference with bounded rewards and unbounded importance weights.
1 code implementation • 19 Oct 2022 • Ian Waudby-Smith, Lili Wu, Aaditya Ramdas, Nikos Karampatziakis, Paul Mineiro
Importantly, our methods can be employed while the original experiment is still running (that is, not necessarily post-hoc), when the logging policy may be itself changing (due to learning), and even if the context distributions are a highly dependent time-series (such as if they are drifting over time).
1 code implementation • 12 Jul 2022 • Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro
Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leading to a significant gap between theory and practice.
1 code implementation • 12 Jul 2022 • Yinglun Zhu, Paul Mineiro
Designing efficient general-purpose contextual bandit algorithms that work with large -- or even continuous -- action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and continuous control.
no code implementations • 16 Jun 2022 • Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford
Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies.
no code implementations • NeurIPS 2021 • Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal
The use of pessimism when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning.
1 code implementation • 9 Jun 2021 • Qingyun Wu, Chi Wang, John Langford, Paul Mineiro, Marco Rossi
We propose the ChaCha (Champion-Challengers) algorithm for making an online choice of hyperparameters in online learning settings.
no code implementations • 9 Jun 2021 • Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad
We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies.
no code implementations • 1 Jun 2021 • Bogdan Mazoure, Paul Mineiro, Pavithra Srinath, Reza Sharifi Sedeh, Doina Precup, Adith Swaminathan
Targeting immediately measurable proxies such as clicks can lead to suboptimal recommendations due to misalignment with the long-term metric.
no code implementations • 18 Feb 2021 • Nikos Karampatziakis, Paul Mineiro, Aaditya Ramdas
We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting.
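Underlying this entry is the off-policy value estimation problem: given logged (context, action, propensity, reward) tuples, estimate the value of a different target policy. A minimal sketch of the classical inverse propensity scoring (IPS) building block with a naive fixed-sample interval (all names here are illustrative; the paper's bounds are anytime-valid and considerably tighter than this normal approximation):

```python
# Minimal sketch of inverse propensity scoring (IPS) for off-policy
# evaluation in a contextual bandit. Logged data records, per round,
# the action taken, the probability the logging policy assigned to it,
# and the observed reward; IPS reweights logged rewards to estimate
# the value of a different target policy.
import math

def ips_estimate(logs, target_prob):
    """logs: list of (context, action, logging_prob, reward) tuples.
    target_prob(context, action): probability the target policy plays action."""
    terms = [target_prob(x, a) / p * r for (x, a, p, r) in logs]
    n = len(terms)
    mean = sum(terms) / n
    var = sum((t - mean) ** 2 for t in terms) / (n - 1)
    half = 1.96 * math.sqrt(var / n)  # naive fixed-n normal interval
    return mean, (mean - half, mean + half)

# toy check: when target == logging policy, IPS reduces to the average reward
logs = [(None, a, 0.5, float(a)) for a in (0, 1) * 50]
mean, ci = ips_estimate(logs, lambda x, a: 0.5)
```

With the target policy equal to the logging policy, every importance weight is 1 and the estimate is the plain empirical mean of the logged rewards (0.5 here).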
1 code implementation • NeurIPS 2020 • Nikos Karampatziakis, John Langford, Paul Mineiro
We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting.
no code implementations • 6 May 2019 • Nikos Karampatziakis, Sebastian Kochman, Jade Huang, Paul Mineiro, Kathy Osborne, Weizhu Chen
In this work, we describe practical lessons we have learned from successfully using contextual bandits (CBs) to improve key business metrics of the Microsoft Virtual Agent for customer support.
no code implementations • 17 Jul 2018 • Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro
We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size.
no code implementations • ICML 2017 • Hal Daume III, Nikos Karampatziakis, John Langford, Paul Mineiro
Compared to previous approaches, we obtain substantially better statistical performance for two reasons: first, we prove a tighter and more complete boosting theorem, and second, we translate the results more directly into an algorithm.
no code implementations • 5 Feb 2016 • He He, Paul Mineiro, Nikos Karampatziakis
We propose a general framework for sequential and dynamic acquisition of useful information in order to solve a particular task.
no code implementations • 10 Nov 2015 • Paul Mineiro, Nikos Karampatziakis
Extreme classification problems are multiclass and multilabel classification problems where the number of outputs is so large that straightforward strategies are neither statistically nor computationally viable.
no code implementations • 30 Mar 2015 • Paul Mineiro, Nikos Karampatziakis
Many modern multiclass and multilabel problems are characterized by increasingly large output spaces.
no code implementations • 9 Feb 2015 • Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro
We provide a summary of the mathematical and computational techniques that have enabled learning reductions to effectively address a wide class of problems, and show that this approach to solving machine learning problems can be broadly useful.
1 code implementation • 9 Feb 2015 • Nikos Karampatziakis, Paul Mineiro
In this work we show that a generic regularized nonlinearity mapping independent predictions to joint predictions is sufficient to achieve state-of-the-art performance on a variety of benchmark problems.
no code implementations • 19 Dec 2014 • Paul Mineiro, Nikos Karampatziakis
Many modern multiclass and multilabel problems are characterized by increasingly large output spaces.
no code implementations • 13 Nov 2014 • Paul Mineiro, Nikos Karampatziakis
We present RandomizedCCA, a randomized algorithm for computing canonical correlation analysis, suitable for large datasets stored either out of core or on a distributed file system.
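For context, canonical correlation analysis reduces to an SVD of the whitened cross-covariance matrix; randomized methods like RandomizedCCA replace the expensive factorizations with randomized approximations at scale. A small dense sketch of the exact computation being approximated (not the paper's algorithm):

```python
# CCA finds directions u, v maximizing corr(Xu, Yv); the canonical
# correlations are the singular values of the whitened cross-covariance
# Lx^{-1} Cxy Ly^{-T}, where Cxx = Lx Lx^T and Cyy = Ly Ly^T.
import numpy as np

def top_canonical_correlation(X, Y, reg=1e-8):
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])  # regularized covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Lx = np.linalg.cholesky(Cxx)                  # whitening factors
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    return np.linalg.svd(M, compute_uv=False)[0]  # top canonical correlation

# toy check: two linear views of the same latent signal correlate perfectly
rng = np.random.default_rng(0)
z = rng.standard_normal((1000, 1))
rho = top_canonical_correlation(z @ rng.standard_normal((1, 3)),
                                z @ rng.standard_normal((1, 2)))
```

The dense Cholesky and SVD steps are exactly what become bottlenecks for large or distributed data, which is where randomized approximations enter.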
no code implementations • 9 Aug 2014 • Stephane Ross, Paul Mineiro, John Langford
We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale.
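The scale-invariance property can be illustrated with a simplified per-feature normalized update in the spirit of the paper's approach (details simplified; this is not the exact published algorithm):

```python
# Sketch of scale-invariant online gradient descent: each feature's
# update is normalized by the largest magnitude seen for that feature,
# so rescaling a feature (e.g. meters -> millimeters) leaves the
# learner's predictions unchanged.
class ScaleFreeSGD:
    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * dim
        self.s = [0.0] * dim          # largest |x_i| observed so far
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        for i, xi in enumerate(x):
            if abs(xi) > self.s[i]:
                if self.s[i] > 0:
                    # shrink the weight so past updates stay consistent
                    # with the newly observed larger scale
                    self.w[i] *= (self.s[i] / abs(xi)) ** 2
                self.s[i] = abs(xi)
        g = self.predict(x) - y       # squared-loss gradient factor
        for i, xi in enumerate(x):
            if self.s[i] > 0:
                self.w[i] -= self.lr * g * xi / self.s[i] ** 2

# two learners: identical data, except feature 0 rescaled by 1000 for b
data = [([1.0, 2.0], 3.0), ([2.0, 1.0], 4.0), ([0.5, 3.0], 2.0)]
a, b = ScaleFreeSGD(2), ScaleFreeSGD(2)
for x, y in data:
    a.update(x, y)
    b.update([1000.0 * x[0], x[1]], y)
p1 = a.predict([1.0, 1.0])
p2 = b.predict([1000.0, 1.0])     # identical prediction despite rescaling
```

Because the per-feature scale and weight transform inversely under rescaling, the two learners make the same predictions on correspondingly rescaled inputs.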
no code implementations • 23 Oct 2013 • Nikos Karampatziakis, Paul Mineiro
Principal Component Analysis (PCA) is a ubiquitous tool with many applications in machine learning including feature construction, subspace embedding, and outlier detection.
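A common randomized pattern that large-scale PCA work builds on is the range-finder sketch in the spirit of Halko, Martinsson, and Tropp; a minimal single-machine version (illustrative only, not the paper's algorithm, which mixes structured and unstructured randomness):

```python
# Randomized range-finder PCA: project the data through a random test
# matrix to capture its dominant subspace, then do an exact SVD on the
# resulting small matrix.
import numpy as np

def randomized_pca(A, k, oversample=5, seed=0):
    """Approximate top-k principal directions of data matrix A (n x d)."""
    rng = np.random.default_rng(seed)
    A = A - A.mean(axis=0)                     # center the data
    omega = rng.standard_normal((A.shape[1], k + oversample))
    Y = A @ omega                              # sample the range of A
    Q, _ = np.linalg.qr(Y)                     # orthonormal basis for it
    B = Q.T @ A                                # small (k+p) x d matrix
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:k]                              # approximate principal directions

# toy check: data lying along one direction should be recovered
rng = np.random.default_rng(1)
direction = np.array([3.0, 4.0]) / 5.0
X = rng.standard_normal((500, 1)) @ direction[None, :]
V = randomized_pca(X, 1)
```

The expensive multiplication `A @ omega` needs only streaming passes over the data, which is what makes this family of methods attractive out of core.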
no code implementations • 7 Oct 2013 • Nikos Karampatziakis, Paul Mineiro
Representing examples in a way that is compatible with the underlying classifier can greatly enhance the performance of a learning system.
no code implementations • 7 Jun 2013 • Paul Mineiro, Nikos Karampatziakis
We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk.
1 code implementation • 28 May 2013 • Stephane Ross, Paul Mineiro, John Langford
We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale.