Search Results for author: Teodor V. Marinov

Found 12 papers, 1 paper with code

Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization

no code implementations28 Mar 2024 Teodor V. Marinov, Alekh Agarwal, Mircea Trofin

This work studies a Reinforcement Learning (RL) problem in which we are given a set of trajectories collected with K baseline policies.

Compiler Optimization · Imitation Learning +1
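The setting above, where logged trajectories come from K baseline policies of varying quality, can be illustrated with a minimal tabular sketch. Everything here is hypothetical (synthetic policies and value estimates), not the paper's algorithm: one natural strategy is to imitate, in each state, whichever baseline has the highest estimated value from that state, rather than the single best baseline overall.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, K = 6, 3, 2

# Hypothetical logged data: each baseline k has a (possibly suboptimal) policy
# and a per-state value estimate, e.g. from Monte-Carlo returns of its trajectories.
baseline_policies = rng.integers(n_actions, size=(K, n_states))
baseline_values = rng.normal(size=(K, n_states))

# Per-state combination: in each state, copy the action of whichever
# baseline looks best *from that state*.
best_k = baseline_values.argmax(axis=0)
combined_policy = baseline_policies[best_k, np.arange(n_states)]
```

By construction the combined policy's estimated value in every state matches the best baseline's estimate there, which is why a per-state combination can beat each individual baseline.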

A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks

no code implementations26 May 2023 Jacob Abernethy, Alekh Agarwal, Teodor V. Marinov, Manfred K. Warmuth

We study the phenomenon of in-context learning (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization.

In-Context Learning · Retrieval

Leveraging User-Triggered Supervision in Contextual Bandits

no code implementations7 Feb 2023 Alekh Agarwal, Claudio Gentile, Teodor V. Marinov

We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context.

Multi-Armed Bandits
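The setting can be sketched as follows (a toy illustration under an assumed linear-reward model, not the authors' method; all parameters here are made up): run an ordinary epsilon-greedy linear contextual bandit, and whenever the user volunteers the best action for the current context, fold that in as an extra supervised sample for that action.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, dim, rounds = 4, 5, 3000
theta = rng.normal(size=(n_actions, dim))   # hidden per-action reward weights (synthetic)

# Per-action ridge least-squares statistics for a basic epsilon-greedy linear CB.
A = np.stack([np.eye(dim) for _ in range(n_actions)])
b = np.zeros((n_actions, dim))

def update(a, x, r):
    A[a] += np.outer(x, x)
    b[a] += r * x

regret = 0.0
for t in range(rounds):
    x = rng.normal(size=dim)
    est = np.array([np.linalg.solve(A[a], b[a]) @ x for a in range(n_actions)])
    a = int(rng.integers(n_actions)) if rng.random() < 0.05 else int(np.argmax(est))
    means = theta @ x
    r = means[a] + rng.normal(scale=0.1)
    update(a, x, r)
    # Occasionally the user volunteers the best action for this context;
    # treat it as an extra supervised sample for that action.
    if rng.random() < 0.1:
        best = int(np.argmax(means))
        update(best, x, means[best] + rng.normal(scale=0.1))
    regret += means.max() - means[a]
```

The extra supervised samples concentrate data on the actions that are actually optimal in their contexts, which is what lets this kind of feedback speed up learning relative to bandit feedback alone.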

Stochastic Online Learning with Feedback Graphs: Finite-Time and Asymptotic Optimality

no code implementations20 Jun 2022 Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

We revisit the problem of stochastic online learning with feedback graphs, with the goal of devising algorithms that are optimal, up to constants, both asymptotically and in finite time.
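The feedback-graph model can be sketched with a toy example (illustrative numbers and a plain UCB update, not the finite-time-optimal algorithm the paper designs): playing an arm also reveals the rewards of its neighbours in the graph, so every pull updates several arms' statistics at once.

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.3, 0.5, 0.7, 0.6])     # hidden Bernoulli means (synthetic)
# Feedback graph: playing arm i also reveals the rewards of its neighbours.
neighbours = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}

counts, sums = np.zeros(4), np.zeros(4)
rounds, total = 2000, 0.0
for t in range(1, rounds + 1):
    n = np.maximum(counts, 1)
    ucb = sums / n + np.sqrt(2 * np.log(t) / n)
    arm = int(np.argmax(ucb))
    total += means[arm]
    for j in neighbours[arm]:              # side observations from the graph
        counts[j] += 1
        sums[j] += float(rng.random() < means[j])
```

Because each pull yields two or three observations instead of one, estimates sharpen faster than in the standard bandit setting; the graph structure is what drives the improved rates studied in the paper.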

Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

no code implementations NeurIPS 2021 Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

Our results show that optimistic algorithms cannot achieve the information-theoretic lower bounds, even in deterministic MDPs, unless there is a unique optimal policy.

Reinforcement Learning (RL)

Corralling Stochastic Bandit Algorithms

no code implementations16 Jun 2020 Raman Arora, Teodor V. Marinov, Mehryar Mohri

We study the problem of corralling stochastic bandit algorithms, that is, combining multiple bandit algorithms designed for a stochastic environment, with the goal of devising a corralling algorithm that performs almost as well as the best base algorithm.
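The idea can be sketched with a toy master that keeps exponential weights over the base algorithms and routes each round to one of them. This is a hypothetical illustration with made-up parameters, not the algorithm from the paper, which needs considerably more care to preserve the base algorithms' guarantees:

```python
import numpy as np

rng = np.random.default_rng(1)

class EpsGreedy:
    """A simple stochastic bandit used as a base algorithm (illustrative only)."""
    def __init__(self, n_arms, eps):
        self.eps = eps
        self.counts = np.zeros(n_arms)
        self.sums = np.zeros(n_arms)
    def act(self):
        if rng.random() < self.eps:
            return int(rng.integers(len(self.counts)))
        return int(np.argmax(self.sums / np.maximum(self.counts, 1)))
    def update(self, arm, r):
        self.counts[arm] += 1
        self.sums[arm] += r

arm_means = np.array([0.2, 0.5, 0.8])          # hidden Bernoulli means (synthetic)
bases = [EpsGreedy(3, 0.2), EpsGreedy(3, 0.05)]  # two base algorithms of different quality
K, eta, rounds = len(bases), 0.05, 5000
log_w, total = np.zeros(K), 0.0
for t in range(rounds):
    p = np.exp(log_w - log_w.max()); p /= p.sum()
    i = rng.choice(K, p=p)                     # master routes the round to one base
    arm = bases[i].act()
    r = float(rng.random() < arm_means[arm])   # Bernoulli reward
    bases[i].update(arm, r)
    log_w[i] += eta * r / p[i]                 # importance-weighted exponential weights
    total += r
```

Only the chosen base observes the round, hence the importance weighting; the difficulty the paper addresses is that starving a base of rounds can void its stochastic guarantees, which a naive master like this one ignores.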

Bandits with Feedback Graphs and Switching Costs

no code implementations NeurIPS 2019 Raman Arora, Teodor V. Marinov, Mehryar Mohri

We give a new algorithm whose regret guarantee depends only on the domination number of the graph.

Counterfactual

Policy Regret in Repeated Games

no code implementations NeurIPS 2018 Raman Arora, Michael Dinitz, Teodor V. Marinov, Mehryar Mohri

We revisit the notion of policy regret and first show that there are online learning settings in which policy regret and external regret are incompatible: any sequence of play that achieves a favorable regret with respect to one definition must do poorly with respect to the other.

Streaming Kernel PCA with $\tilde{O}(\sqrt{n})$ Random Features

1 code implementation2 Aug 2018 Enayat Ullah, Poorya Mianjy, Teodor V. Marinov, Raman Arora

We study the statistical and computational aspects of kernel principal component analysis using random Fourier features and show that, under mild assumptions, $O(\sqrt{n} \log n)$ features suffice to achieve $O(1/\epsilon^2)$ sample complexity.
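A minimal sketch of the underlying primitive (my own illustration with arbitrary parameters, not the paper's streaming algorithm): approximate an RBF kernel with random Fourier features, after which kernel PCA reduces to ordinary PCA in the random-feature space.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 500, 3, 200          # samples, input dim, number of random features
X = rng.normal(size=(n, d))
gamma = 0.5                    # RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)

# Random Fourier features (Rahimi & Recht):
# z(x) = sqrt(2/m) * cos(W x + b) satisfies z(x) . z(y) ~= k(x, y).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(m, d))
bvec = rng.uniform(0, 2 * np.pi, size=m)
Z = np.sqrt(2.0 / m) * np.cos(X @ W.T + bvec)

# Kernel PCA is now ordinary PCA on the m-dimensional feature covariance,
# avoiding the n x n kernel matrix entirely.
Zc = Z - Z.mean(axis=0)
cov = Zc.T @ Zc / n
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, ::-1][:, :5]          # top-5 principal directions
projected = Zc @ top
```

The point of the paper's bound is that m can be as small as roughly sqrt(n) (up to logs) while still matching the statistical guarantees of exact kernel PCA.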

Stochastic Approximation for Canonical Correlation Analysis

no code implementations NeurIPS 2017 Raman Arora, Teodor V. Marinov, Poorya Mianjy, Nathan Srebro

We propose novel first-order stochastic approximation algorithms for canonical correlation analysis (CCA).
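For reference, the batch CCA solution that such stochastic approximation algorithms target can be computed in closed form by whitening each view and taking an SVD of the cross-covariance. This is a textbook sketch on synthetic two-view data, not the authors' streaming method:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Two correlated views: a shared latent signal plus view-specific noise.
s = rng.normal(size=(n, 1))
X = np.hstack([s, rng.normal(size=(n, 2))]) + 0.1 * rng.normal(size=(n, 3))
Y = np.hstack([s, rng.normal(size=(n, 2))]) + 0.1 * rng.normal(size=(n, 3))

Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
Cxx, Cyy, Cxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

# Classical CCA: whiten each view, then SVD of the whitened cross-covariance;
# the singular values are the canonical correlations.
M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
U, corrs, Vt = np.linalg.svd(M)
```

A first-order stochastic approximation method replaces these full-data covariances with per-sample updates, which is what makes the problem hard: the whitening constraints couple the two views.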
