Search Results for author: Paul Mineiro

Found 34 papers, 14 papers with code

Aligning LLM Agents by Learning Latent Preference from User Edits

1 code implementation • 23 Apr 2024 • Ge Gao, Alexey Taymanov, Eduardo Salinas, Paul Mineiro, Dipendra Misra

In a typical setting such as writing assistants, the user interacts with a language agent to generate a response given a context, and may optionally edit the agent response to personalize it based on their latent preference, in addition to improving the correctness.

Descriptive, Language Modelling, +2

Efficient Contextual Bandits with Uninformed Feedback Graphs

no code implementations • 12 Feb 2024 • Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications.

Multi-Armed Bandits, regression

Infinite Action Contextual Bandits with Reusable Data Exhaust

1 code implementation • 16 Feb 2023 • Mark Rucker, Yinglun Zhu, Paul Mineiro

For infinite action contextual bandits, smoothed regret and reduction to regression results in state-of-the-art online performance with computational cost independent of the action set: unfortunately, the resulting data exhaust does not have well-defined importance-weights.

Model Selection, Multi-Armed Bandits, +1

Personalized Reward Learning with Interaction-Grounded Learning (IGL)

1 code implementation • 28 Nov 2022 • Jessica Maghakian, Paul Mineiro, Kishan Panaganti, Mark Rucker, Akanksha Saran, Cheng Tan

In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions.

Recommendation Systems

Towards Data-Driven Offline Simulations for Online Reinforcement Learning

1 code implementation • 14 Nov 2022 • Shengpu Tang, Felipe Vieira Frujeri, Dipendra Misra, Alex Lamb, John Langford, Paul Mineiro, Sebastian Kochman

Modern decision-making systems, from robots to web recommendation engines, are expected to adapt: to user preferences, changing circumstances or even new tasks.

Decision Making, reinforcement-learning, +1

Eigen Memory Trees

1 code implementation • 25 Oct 2022 • Mark Rucker, Jordan T. Ash, John Langford, Paul Mineiro, Ida Momennejad

This work introduces the Eigen Memory Tree (EMT), a novel online memory model for sequential learning scenarios.

Deploying a Steered Query Optimizer in Production at Microsoft

no code implementations • 24 Oct 2022 • Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc Friedman, Rafah Hosn, Hiren Patel, Alekh Jindal

Modern analytical workloads are highly heterogeneous and massively complex, making generic query optimizers untenable for many customers and scenarios.

Conditionally Risk-Averse Contextual Bandits

1 code implementation • 24 Oct 2022 • Mónika Farsang, Paul Mineiro, Wangda Zhang

Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance.

Management, Multi-Armed Bandits, +1

A lower confidence sequence for the changing mean of non-negative right heavy-tailed observations with bounded mean

1 code implementation • 20 Oct 2022 • Paul Mineiro

In certain cases this lower CS can be converted into a closed-interval CS whose width converges to zero, e.g., any bounded realization, or post contextual-bandit inference with bounded rewards and unbounded importance weights.


Anytime-valid off-policy inference for contextual bandits

1 code implementation • 19 Oct 2022 • Ian Waudby-Smith, Lili Wu, Aaditya Ramdas, Nikos Karampatziakis, Paul Mineiro

Importantly, our methods can be employed while the original experiment is still running (that is, not necessarily post-hoc), when the logging policy may be itself changing (due to learning), and even if the context distributions are a highly dependent time-series (such as if they are drifting over time).

counterfactual, Multi-Armed Bandits, +3

Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces

1 code implementation • 12 Jul 2022 • Yinglun Zhu, Paul Mineiro

Designing efficient general-purpose contextual bandit algorithms that work with large -- or even continuous -- action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and continuous control.

Continuous Control, Information Retrieval, +3

Contextual Bandits with Large Action Spaces: Made Practical

1 code implementation • 12 Jul 2022 • Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro

Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leading to a significant gap between theory and practice.

Decision Making, Multi-Armed Bandits

Interaction-Grounded Learning with Action-inclusive Feedback

no code implementations • 16 Jun 2022 • Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford

Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies.

Brain Computer Interface

Bellman-consistent Pessimism for Offline Reinforcement Learning

no code implementations • NeurIPS 2021 • Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

The use of pessimism when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning.

reinforcement-learning, Reinforcement Learning (RL)

ChaCha for Online AutoML

1 code implementation • 9 Jun 2021 • Qingyun Wu, Chi Wang, John Langford, Paul Mineiro, Marco Rossi

We propose the ChaCha (Champion-Challengers) algorithm for making an online choice of hyperparameters in online learning settings.

AutoML, Scheduling

Interaction-Grounded Learning

no code implementations • 9 Jun 2021 • Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad

We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies.

Off-policy Confidence Sequences

no code implementations • 18 Feb 2021 • Nikos Karampatziakis, Paul Mineiro, Aaditya Ramdas

We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting.

Off-policy evaluation, valid

Empirical Likelihood for Contextual Bandits

1 code implementation • NeurIPS 2020 • Nikos Karampatziakis, John Langford, Paul Mineiro

We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting.

Multi-Armed Bandits

Lessons from Contextual Bandit Learning in a Customer Support Bot

no code implementations • 6 May 2019 • Nikos Karampatziakis, Sebastian Kochman, Jade Huang, Paul Mineiro, Kathy Osborne, Weizhu Chen

In this work, we describe practical lessons we have learned from successfully using contextual bandits (CBs) to improve key business metrics of the Microsoft Virtual Agent for customer support.

Information Retrieval, Multi-Armed Bandits, +2

Contextual Memory Trees

no code implementations • 17 Jul 2018 • Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size.

General Classification, Image Captioning, +2

Logarithmic Time One-Against-Some

no code implementations • ICML 2017 • Hal Daumé III, Nikos Karampatziakis, John Langford, Paul Mineiro

Compared to previous approaches, we obtain substantially better statistical performance for two reasons: First, we prove a tighter and more complete boosting theorem, and second we translate the results more directly into an algorithm.

Binary Classification, Classification, +1

Active Information Acquisition

no code implementations • 5 Feb 2016 • He He, Paul Mineiro, Nikos Karampatziakis

We propose a general framework for sequential and dynamic acquisition of useful information in order to solve a particular task.

General Reinforcement Learning, Reinforcement Learning (RL), +1

A Hierarchical Spectral Method for Extreme Classification

no code implementations • 10 Nov 2015 • Paul Mineiro, Nikos Karampatziakis

Extreme classification problems are multiclass and multilabel classification problems where the number of outputs is so large that straightforward strategies are neither statistically nor computationally viable.

Classification, General Classification, +1

Fast Label Embeddings for Extremely Large Output Spaces

no code implementations • 30 Mar 2015 • Paul Mineiro, Nikos Karampatziakis

Many modern multiclass and multilabel problems are characterized by increasingly large output spaces.

Scalable Multilabel Prediction via Randomized Methods

1 code implementation • 9 Feb 2015 • Nikos Karampatziakis, Paul Mineiro

In this work we show that a generic regularized nonlinearity mapping independent predictions to joint predictions is sufficient to achieve state-of-the-art performance on a variety of benchmark problems.

General Classification

Learning Reductions that Really Work

no code implementations • 9 Feb 2015 • Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

We provide a summary of the mathematical and computational techniques that have enabled learning reductions to effectively address a wide class of problems, and show that this approach to solving machine learning problems can be broadly useful.

BIG-bench Machine Learning

Fast Label Embeddings via Randomized Linear Algebra

no code implementations • 19 Dec 2014 • Paul Mineiro, Nikos Karampatziakis

Many modern multiclass and multilabel problems are characterized by increasingly large output spaces.

A Randomized Algorithm for CCA

no code implementations • 13 Nov 2014 • Paul Mineiro, Nikos Karampatziakis

We present RandomizedCCA, a randomized algorithm for computing canonical analysis, suitable for large datasets stored either out of core or on a distributed file system.

Normalized Online Learning

no code implementations • 9 Aug 2014 • Stephane Ross, Paul Mineiro, John Langford

We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale.

Combining Structured and Unstructured Randomness in Large Scale PCA

no code implementations • 23 Oct 2013 • Nikos Karampatziakis, Paul Mineiro

Principal Component Analysis (PCA) is a ubiquitous tool with many applications in machine learning including feature construction, subspace embedding, and outlier detection.

BIG-bench Machine Learning, Outlier Detection

Discriminative Features via Generalized Eigenvectors

no code implementations • 7 Oct 2013 • Nikos Karampatziakis, Paul Mineiro

Representing examples in a way that is compatible with the underlying classifier can greatly enhance the performance of a learning system.

General Classification

Loss-Proportional Subsampling for Subsequent ERM

no code implementations • 7 Jun 2013 • Paul Mineiro, Nikos Karampatziakis

We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk.

Normalized Online Learning

1 code implementation • 28 May 2013 • Stephane Ross, Paul Mineiro, John Langford

We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale.
