Search Results for author: Paul Mineiro

Found 34 papers, 13 papers with code

Aligning LLM Agents by Learning Latent Preference from User Edits

no code implementations • 23 Apr 2024 • Ge Gao, Alexey Taymanov, Eduardo Salinas, Paul Mineiro, Dipendra Misra

In a typical setting such as writing assistants, the user interacts with a language agent to generate a response given a context, and may optionally edit the agent response to personalize it based on their latent preference, in addition to improving the correctness.

Efficient Contextual Bandits with Uninformed Feedback Graphs

no code implementations • 12 Feb 2024 • Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications.

Multi-Armed Bandits regression

Infinite Action Contextual Bandits with Reusable Data Exhaust

1 code implementation • 16 Feb 2023 • Mark Rucker, Yinglun Zhu, Paul Mineiro

For infinite action contextual bandits, smoothed regret and reduction to regression results in state-of-the-art online performance with computational cost independent of the action set: unfortunately, the resulting data exhaust does not have well-defined importance-weights.

Model Selection Multi-Armed Bandits +1

Personalized Reward Learning with Interaction-Grounded Learning (IGL)

1 code implementation • 28 Nov 2022 • Jessica Maghakian, Paul Mineiro, Kishan Panaganti, Mark Rucker, Akanksha Saran, Cheng Tan

In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions.

Recommendation Systems

Towards Data-Driven Offline Simulations for Online Reinforcement Learning

1 code implementation • 14 Nov 2022 • Shengpu Tang, Felipe Vieira Frujeri, Dipendra Misra, Alex Lamb, John Langford, Paul Mineiro, Sebastian Kochman

Modern decision-making systems, from robots to web recommendation engines, are expected to adapt: to user preferences, changing circumstances or even new tasks.

Decision Making reinforcement-learning +1

Eigen Memory Trees

1 code implementation • 25 Oct 2022 • Mark Rucker, Jordan T. Ash, John Langford, Paul Mineiro, Ida Momennejad

This work introduces the Eigen Memory Tree (EMT), a novel online memory model for sequential learning scenarios.

Deploying a Steered Query Optimizer in Production at Microsoft

no code implementations • 24 Oct 2022 • Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc Friedman, Rafah Hosn, Hiren Patel, Alekh Jindal

Modern analytical workloads are highly heterogeneous and massively complex, making generic query optimizers untenable for many customers and scenarios.

Conditionally Risk-Averse Contextual Bandits

1 code implementation • 24 Oct 2022 • Mónika Farsang, Paul Mineiro, Wangda Zhang

Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance.

Management Multi-Armed Bandits +1

A lower confidence sequence for the changing mean of non-negative right heavy-tailed observations with bounded mean

1 code implementation • 20 Oct 2022 • Paul Mineiro

In certain cases this lower CS can be converted into a closed-interval CS whose width converges to zero, e.g., any bounded realization, or post contextual-bandit inference with bounded rewards and unbounded importance weights.

valid

Anytime-valid off-policy inference for contextual bandits

1 code implementation • 19 Oct 2022 • Ian Waudby-Smith, Lili Wu, Aaditya Ramdas, Nikos Karampatziakis, Paul Mineiro

Importantly, our methods can be employed while the original experiment is still running (that is, not necessarily post-hoc), when the logging policy may be itself changing (due to learning), and even if the context distributions are a highly dependent time-series (such as if they are drifting over time).

counterfactual Multi-Armed Bandits +3

Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces

1 code implementation • 12 Jul 2022 • Yinglun Zhu, Paul Mineiro

Designing efficient general-purpose contextual bandit algorithms that work with large -- or even continuous -- action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and continuous control.

Continuous Control Information Retrieval +3

Contextual Bandits with Large Action Spaces: Made Practical

1 code implementation • 12 Jul 2022 • Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro

Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leading to a significant gap between theory and practice.

Decision Making Multi-Armed Bandits

Interaction-Grounded Learning with Action-inclusive Feedback

no code implementations • 16 Jun 2022 • Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford

Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies.

Brain Computer Interface

Bellman-consistent Pessimism for Offline Reinforcement Learning

no code implementations • NeurIPS 2021 • Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

The use of pessimism when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Interaction-Grounded Learning

no code implementations • 9 Jun 2021 • Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad

We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies.

ChaCha for Online AutoML

1 code implementation • 9 Jun 2021 • Qingyun Wu, Chi Wang, John Langford, Paul Mineiro, Marco Rossi

We propose the ChaCha (Champion-Challengers) algorithm for making an online choice of hyperparameters in online learning settings.

AutoML Scheduling

Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Reinforcement Learning

no code implementations • 1 Jun 2021 • Bogdan Mazoure, Paul Mineiro, Pavithra Srinath, Reza Sharifi Sedeh, Doina Precup, Adith Swaminathan

Targeting immediately measurable proxies such as clicks can lead to suboptimal recommendations due to misalignment with the long-term metric.

Offline RL reinforcement-learning +2

Off-policy Confidence Sequences

no code implementations • 18 Feb 2021 • Nikos Karampatziakis, Paul Mineiro, Aaditya Ramdas

We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting.

Off-policy evaluation valid

Empirical Likelihood for Contextual Bandits

1 code implementation • NeurIPS 2020 • Nikos Karampatziakis, John Langford, Paul Mineiro

We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting.

Multi-Armed Bandits

Lessons from Contextual Bandit Learning in a Customer Support Bot

no code implementations • 6 May 2019 • Nikos Karampatziakis, Sebastian Kochman, Jade Huang, Paul Mineiro, Kathy Osborne, Weizhu Chen

In this work, we describe practical lessons we have learned from successfully using contextual bandits (CBs) to improve key business metrics of the Microsoft Virtual Agent for customer support.

Information Retrieval Multi-Armed Bandits +2

Contextual Memory Trees

no code implementations • 17 Jul 2018 • Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size.

General Classification Image Captioning +2

Logarithmic Time One-Against-Some

no code implementations • ICML 2017 • Hal Daumé III, Nikos Karampatziakis, John Langford, Paul Mineiro

Compared to previous approaches, we obtain substantially better statistical performance for two reasons: First, we prove a tighter and more complete boosting theorem, and second we translate the results more directly into an algorithm.

Binary Classification Classification +1

Active Information Acquisition

no code implementations • 5 Feb 2016 • He He, Paul Mineiro, Nikos Karampatziakis

We propose a general framework for sequential and dynamic acquisition of useful information in order to solve a particular task.

General Reinforcement Learning Reinforcement Learning (RL) +1

A Hierarchical Spectral Method for Extreme Classification

no code implementations • 10 Nov 2015 • Paul Mineiro, Nikos Karampatziakis

Extreme classification problems are multiclass and multilabel classification problems where the number of outputs is so large that straightforward strategies are neither statistically nor computationally viable.

Classification General Classification +1

Fast Label Embeddings for Extremely Large Output Spaces

no code implementations • 30 Mar 2015 • Paul Mineiro, Nikos Karampatziakis

Many modern multiclass and multilabel problems are characterized by increasingly large output spaces.

Learning Reductions that Really Work

no code implementations • 9 Feb 2015 • Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

We provide a summary of the mathematical and computational techniques that have enabled learning reductions to effectively address a wide class of problems, and show that this approach to solving machine learning problems can be broadly useful.

BIG-bench Machine Learning

Scalable Multilabel Prediction via Randomized Methods

1 code implementation • 9 Feb 2015 • Nikos Karampatziakis, Paul Mineiro

In this work we show that a generic regularized nonlinearity mapping independent predictions to joint predictions is sufficient to achieve state-of-the-art performance on a variety of benchmark problems.

General Classification

Fast Label Embeddings via Randomized Linear Algebra

no code implementations • 19 Dec 2014 • Paul Mineiro, Nikos Karampatziakis

Many modern multiclass and multilabel problems are characterized by increasingly large output spaces.

A Randomized Algorithm for CCA

no code implementations • 13 Nov 2014 • Paul Mineiro, Nikos Karampatziakis

We present RandomizedCCA, a randomized algorithm for computing canonical correlation analysis, suitable for large datasets stored either out of core or on a distributed file system.

Normalized Online Learning

no code implementations • 9 Aug 2014 • Stephane Ross, Paul Mineiro, John Langford

We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale.

Combining Structured and Unstructured Randomness in Large Scale PCA

no code implementations • 23 Oct 2013 • Nikos Karampatziakis, Paul Mineiro

Principal Component Analysis (PCA) is a ubiquitous tool with many applications in machine learning including feature construction, subspace embedding, and outlier detection.

BIG-bench Machine Learning Outlier Detection

Discriminative Features via Generalized Eigenvectors

no code implementations • 7 Oct 2013 • Nikos Karampatziakis, Paul Mineiro

Representing examples in a way that is compatible with the underlying classifier can greatly enhance the performance of a learning system.

General Classification

Loss-Proportional Subsampling for Subsequent ERM

no code implementations • 7 Jun 2013 • Paul Mineiro, Nikos Karampatziakis

We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk.

Normalized Online Learning

1 code implementation • 28 May 2013 • Stephane Ross, Paul Mineiro, John Langford

We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale.
