no code implementations • 3 Dec 2024 • Andrew Wagenmaker, Lu Mi, Marton Rozsa, Matthew S. Bull, Karel Svoboda, Kayvon Daie, Matthew D. Golub, Kevin Jamieson
Using neural population responses to photostimulation in mouse motor cortex, we demonstrate the efficacy of a low-rank linear dynamical systems model, and develop an active learning procedure which takes advantage of low-rank structure to determine informative photostimulation patterns.
no code implementations • 27 Oct 2024 • Yifang Chen, David Zhu, Simon Du, Kevin Jamieson, Yang Liu
Recent advances in large language model (LLM) training have highlighted the need for diverse, high-quality instruction data.
no code implementations • 26 Oct 2024 • Andrew Wagenmaker, Kevin Huang, Liyiming Ke, Byron Boots, Kevin Jamieson, Abhishek Gupta
To the best of our knowledge, this is the first evidence that simulation transfer yields a provable gain in reinforcement learning in settings where direct sim2real transfer fails.
no code implementations • 2 Jul 2024 • Yifang Chen, Shuohang Wang, ZiYi Yang, Hiteshi Sharma, Nikos Karampatziakis, Donghan Yu, Kevin Jamieson, Simon Shaolei Du, Yelong Shen
Reinforcement learning with human feedback (RLHF), as a widely adopted approach in current large language model pipelines, is \textit{bottlenecked by the size of human preference data}.
1 code implementation • 15 Jun 2024 • Jifan Zhang, Lalit Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, Siddharth Suresh, Andrew Wagenmaker, Scott Sievert, Timothy Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak
We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected by crowdsourcing ratings for The New Yorker's weekly cartoon caption contest over the past eight years.
no code implementations • 11 Jun 2024 • Adhyyan Narang, Andrew Wagenmaker, Lillian Ratliff, Kevin Jamieson
In this paper, we study the non-asymptotic sample complexity for the pure exploration problem in contextual bandits and tabular reinforcement learning (RL): identifying an $\epsilon$-optimal policy from a set of policies with high probability.
2 code implementations • 29 May 2024 • Yiping Wang, Yifang Chen, Wendan Yan, Alex Fang, Wenjing Zhou, Kevin Jamieson, Simon Shaolei Du
Three main data selection approaches are: (1) leveraging external non-CLIP models to aid data selection, (2) training new CLIP-style embedding models that are more effective at selecting high-quality data than the original OpenAI CLIP model, and (3) designing better metrics or strategies universally applicable to any CLIP embedding without requiring specific model properties (e.g., CLIPScore is one popular metric).
2 code implementations • 3 Feb 2024 • Yiping Wang, Yifang Chen, Wendan Yan, Kevin Jamieson, Simon Shaolei Du
In recent years, data selection has emerged as a core issue for large-scale visual-language model pretraining, especially on noisy web-curated datasets.
no code implementations • 12 Jan 2024 • Gantavya Bhatt, Yifang Chen, Arnav M. Das, Jifan Zhang, Sang T. Truong, Stephen Mussmann, Yinglun Zhu, Jeffrey Bilmes, Simon S. Du, Kevin Jamieson, Jordan T. Ash, Robert D. Nowak
To mitigate the annotation cost of SFT and circumvent the computational bottlenecks of active learning, we propose using experimental design.
no code implementations • 13 Dec 2023 • Romain Camilleri, Andrew Wagenmaker, Jamie Morgenstern, Lalit Jain, Kevin Jamieson
In this work, we address the challenges of reducing bias and improving accuracy in data-scarce environments, where the cost of collecting labeled data prohibits the use of large, labeled datasets.
no code implementations • 27 Oct 2023 • Artin Tajdini, Lalit Jain, Kevin Jamieson
In this work, we establish the first minimax lower bound for this setting that scales like $\tilde{\Omega}(\min_{L \le k}(L^{1/3}n^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$.
1 code implementation • 25 Oct 2023 • Arnab Maiti, Ross Boczar, Kevin Jamieson, Lillian J. Ratliff
We design a near-optimal algorithm whose sample complexity matches the lower bound, up to log factors.
no code implementations • 9 Oct 2023 • Zhaoqi Li, Kevin Jamieson, Lalit Jain
In this work, we pose a natural question: is there an algorithm that can explore optimally and only needs the same computational primitives as Thompson Sampling?
no code implementations • 23 Sep 2023 • Shuai Li, Azarakhsh Keipour, Kevin Jamieson, Nicolas Hudson, Sicong Zhao, Charles Swan, Kostas Bekris
Automating warehouse operations can reduce logistics overhead costs, ultimately driving down the final price for consumers, increasing the speed of delivery, and enhancing the resiliency to market fluctuations.
1 code implementation • 27 Jul 2023 • Zhihan Xiong, Romain Camilleri, Maryam Fazel, Lalit Jain, Kevin Jamieson
For robust identification, it is well-known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time then the error probability decreases as $\exp(-T\Delta^2_{(1)}/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$.
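As a point of reference for the G-optimal designs mentioned above, here is a minimal numpy sketch (not code from the paper) that computes an approximate design by running Frank-Wolfe on the D-optimal objective, which is equivalent to G-optimality by the Kiefer-Wolfowitz theorem; the arm set, iteration count, and step size are illustrative assumptions.

```python
import numpy as np

def g_optimal_design(X, iters=1000):
    """Frank-Wolfe on the D-optimal objective log det(sum_i w_i x_i x_i^T).

    By the Kiefer-Wolfowitz equivalence theorem, the maximizer also minimizes
    the worst-case prediction variance max_x x^T A(w)^{-1} x (G-optimality).
    """
    n, d = X.shape
    w = np.ones(n) / n                                # start from the uniform design
    for t in range(iters):
        A = X.T @ (w[:, None] * X)                    # information matrix A(w)
        Ainv = np.linalg.inv(A)
        var = np.einsum('ij,jk,ik->i', X, Ainv, X)    # x_i^T A(w)^{-1} x_i
        i = np.argmax(var)                            # steepest-ascent vertex
        gamma = 2.0 / (t + 3)                         # standard Frank-Wolfe step size
        w = (1 - gamma) * w
        w[i] += gamma
    return w

# Illustrative arm set: 50 random arms in R^5.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w = g_optimal_design(X)
A = X.T @ (w[:, None] * X)
print(max(x @ np.linalg.solve(A, x) for x in X))      # approaches d = 5 at the optimum
```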
1 code implementation • 22 Jun 2023 • Arnab Maiti, Kevin Jamieson, Lillian J. Ratliff
If the row player uses the EXP3 strategy, an algorithm known for obtaining $\sqrt{T}$ regret against an arbitrary sequence of rewards, it is immediate that the row player also achieves $\sqrt{T}$ regret relative to the Nash equilibrium in this game setting.
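For readers unfamiliar with EXP3, the following is a minimal sketch of the exponential-weights update with importance-weighted loss estimates; the learning rate is a textbook tuning and the 2x2 game used for the demonstration is an illustrative assumption, not the paper's setup.

```python
import numpy as np

def exp3(reward_fn, n_arms, T, rng=None):
    """EXP3: exponential weights with importance-weighted loss estimates.

    Rewards are assumed to lie in [0, 1]."""
    rng = rng or np.random.default_rng(0)
    eta = np.sqrt(2.0 * np.log(n_arms) / (T * n_arms))   # standard tuning, regret ~ sqrt(T K log K)
    log_w = np.zeros(n_arms)
    total = 0.0
    for _ in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(n_arms, p=p)
        r = reward_fn(arm)                       # bandit feedback: only the chosen arm is observed
        total += r
        log_w[arm] -= eta * (1.0 - r) / p[arm]   # importance-weighted loss estimate
    return total

# Illustrative use: the row player of a 2x2 zero-sum game against a fixed column mixture.
rng = np.random.default_rng(1)
payoff = np.array([[1.0, 0.0], [0.0, 1.0]])
col_mix = np.array([0.3, 0.7])
avg = exp3(lambda i: payoff[i, rng.choice(2, p=col_mix)], n_arms=2, T=5000, rng=rng) / 5000
print(avg)   # approaches the best-response value 0.7 as T grows
```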
1 code implementation • 16 Jun 2023 • Jifan Zhang, Yifang Chen, Gregory Canal, Stephen Mussmann, Arnav M. Das, Gantavya Bhatt, Yinglun Zhu, Jeffrey Bilmes, Simon Shaolei Du, Kevin Jamieson, Robert D Nowak
Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive.
no code implementations • 5 Jun 2023 • Yiping Wang, Yifang Chen, Kevin Jamieson, Simon S. Du
In addition to our sample complexity results, we also characterize the potential of our $\nu^1$-based strategy in sample-cost-sensitive settings.
no code implementations • 17 May 2023 • Shuai Li, Azarakhsh Keipour, Kevin Jamieson, Nicolas Hudson, Charles Swan, Kostas Bekris
This paper demonstrates large-scale package manipulation from unstructured piles in Amazon Robotics' Robot Induction (Robin) fleet, which utilizes a pick success predictor trained on real production data.
no code implementations • 19 Mar 2023 • Arnab Maiti, Kevin Jamieson, Lillian J. Ratliff
We study the sample complexity of identifying an approximate equilibrium for two-player zero-sum $n\times 2$ matrix games.
no code implementations • 6 Jul 2022 • Andrew Wagenmaker, Kevin Jamieson
While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true difficulty of learning.
no code implementations • 5 Jul 2022 • Zhaoqi Li, Lillian Ratliff, Houssam Nassif, Kevin Jamieson, Lalit Jain
In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied.
no code implementations • 22 Jun 2022 • Romain Camilleri, Andrew Wagenmaker, Jamie Morgenstern, Lalit Jain, Kevin Jamieson
To our knowledge, our results are the first on best-arm identification in linear bandits with safety constraints.
no code implementations • 2 Feb 2022 • Yifang Chen, Simon S. Du, Kevin Jamieson
To leverage the power of big data from source tasks and overcome the scarcity of the target task samples, representation learning based on multi-task pretraining has become a standard approach in many applications.
no code implementations • 26 Jan 2022 • Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson
We first develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP with sample complexity scaling as $\widetilde{\mathcal{O}}(d^2 H^5/\epsilon^2)$.
no code implementations • 7 Dec 2021 • Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson
Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance -- is a core question in sequential decision-making.
1 code implementation • 23 Nov 2021 • Zhenlin Wang, Andrew Wagenmaker, Kevin Jamieson
The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that in the real world, safety constraints often must be met while learning.
no code implementations • NeurIPS 2021 • Julian Katz-Samuels, Blake Mason, Kevin Jamieson, Rob Nowak
We begin our investigation with the observation that agnostic algorithms \emph{cannot} be minimax-optimal in the realizable setting.
no code implementations • 2 Nov 2021 • Blake Mason, Romain Camilleri, Subhojyoti Mukherjee, Kevin Jamieson, Robert Nowak, Lalit Jain
The threshold value $\alpha$ can either be \emph{explicit} and provided a priori, or \emph{implicit} and defined relative to the optimal function value, i.e., $\alpha = (1-\epsilon)f(x_\ast)$ for a given $\epsilon > 0$ where $f(x_\ast)$ is the maximal function value and is unknown.
no code implementations • NeurIPS 2021 • Romain Camilleri, Zhihan Xiong, Maryam Fazel, Lalit Jain, Kevin Jamieson
The main results of this work precisely characterize this trade-off between labeled samples and stopping time and provide an algorithm that nearly-optimally achieves the minimal label complexity given a desired stopping time.
no code implementations • 5 Aug 2021 • Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson
We show this is not possible -- there exists a fundamental tradeoff between achieving low regret and identifying an $\epsilon$-optimal policy at the instance-optimal rate.
no code implementations • NeurIPS 2021 • Yifang Chen, Simon S. Du, Kevin Jamieson
We conduct theoretical studies on streaming-based active learning for binary classification under unknown adversarial label corruptions.
no code implementations • 13 May 2021 • Julian Katz-Samuels, Jifan Zhang, Lalit Jain, Kevin Jamieson
We consider active learning for binary classification in the agnostic pool-based setting.
no code implementations • 12 May 2021 • Romain Camilleri, Julian Katz-Samuels, Kevin Jamieson
We also leverage our new approach in a new algorithm for kernelized bandits to obtain state-of-the-art results for regret minimization and pure exploration.
no code implementations • 13 Feb 2021 • Yifang Chen, Simon S. Du, Kevin Jamieson
We study episodic reinforcement learning under unknown adversarial corruptions in both the rewards and the transition probabilities of the underlying system.
no code implementations • 10 Feb 2021 • Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson
Along the way, we establish that certainty equivalence decision making is instance- and task-optimal, and obtain the first algorithm for the linear quadratic regulator problem which is instance-optimal.
no code implementations • 5 Nov 2020 • Ethan K. Gordon, Sumegh Roychowdhury, Tapomayukh Bhattacharjee, Kevin Jamieson, Siddhartha S. Srinivasa
Our key insight is that we can leverage the haptic context we collect during and after manipulation (i.e., "post hoc") to learn some of these properties and more quickly adapt our visual model to previously unseen food.
no code implementations • 1 Nov 2020 • Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson
In this paper we propose a novel experimental design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits.
no code implementations • 29 Oct 2020 • Jifan Zhang, Lalit Jain, Kevin Jamieson
Unlike the design of traditional adaptive algorithms that rely on concentration of measure and careful analysis to justify the correctness and sample complexity of the procedure, our adaptive algorithm is learned via adversarial training over equivalence classes of problems derived from information theoretic lower bounds.
no code implementations • NeurIPS 2019 • Lalit Jain, Kevin Jamieson
In many scientific settings there is a need for adaptive experimental design to guide the process of identifying regions of the search space that contain as many true positives as possible subject to a low rate of false discoveries (i.e., false alarms).
no code implementations • NeurIPS 2020 • Julian Katz-Samuels, Lalit Jain, Zohar Karnin, Kevin Jamieson
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings.
1 code implementation • ICML 2020 • Jennifer Brennan, Ramya Korlakai Vinayak, Kevin Jamieson
We study the problem of estimating the distribution of effect sizes (the mean of the test statistic under the alternative hypothesis) in a multiple testing setting.
no code implementations • 2 Feb 2020 • Andrew Wagenmaker, Kevin Jamieson
We propose an algorithm to actively estimate the parameters of a linear dynamical system.
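For context, the passive baseline that an active procedure like the one above improves upon is ordinary least-squares system identification from a single trajectory; the sketch below is illustrative only (the system, input design, and noise level are assumptions, and the inputs here are non-active white noise).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.1],
                   [0.0, 0.0, 0.7]])
B_true = np.eye(d)

# Roll out x_{t+1} = A x_t + B u_t + w_t with white-noise (non-active) inputs.
T = 2000
X = np.zeros((T + 1, d))
U = rng.normal(size=(T, d))
for t in range(T):
    X[t + 1] = A_true @ X[t] + B_true @ U[t] + 0.1 * rng.normal(size=d)

# Least squares: regress x_{t+1} on [x_t, u_t] to recover [A, B].
Z = np.hstack([X[:-1], U])
theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
A_hat, B_hat = theta[:d].T, theta[d:].T
print(np.linalg.norm(A_hat - A_true))   # estimation error shrinks as T grows
```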
no code implementations • 17 Dec 2019 • Laurel Orr, Samuel Ainsworth, Walter Cai, Kevin Jamieson, Magda Balazinska, Dan Suciu
Recently, with the increase in the number of public data repositories, sample data has become easier to access.
1 code implementation • NeurIPS 2019 • Tanner Fiez, Lalit Jain, Kevin Jamieson, Lillian Ratliff
Such a transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost.
no code implementations • 15 Jun 2019 • Julian Katz-Samuels, Kevin Jamieson
We consider two multi-armed bandit problems with $n$ arms: (i) given an $\epsilon > 0$, identify an arm with mean that is within $\epsilon$ of the largest mean and (ii) given a threshold $\mu_0$ and integer $k$, identify $k$ arms with means larger than $\mu_0$.
no code implementations • NeurIPS 2019 • Max Simchowitz, Kevin Jamieson
This paper establishes that optimistic algorithms attain gap-dependent and non-asymptotic logarithmic regret for episodic MDPs.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
no code implementations • 12 Mar 2019 • Liam Li, Evan Sparks, Kevin Jamieson, Ameet Talwalkar
Hyperparameter tuning of multi-stage pipelines introduces a significant computational burden.
no code implementations • 15 Nov 2018 • Maryam Aziz, Kevin Jamieson, Javed Aslam
This paper considers a multi-armed bandit game where the number of arms is much larger than the maximum budget and is effectively infinite.
5 code implementations • ICLR 2018 • Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, Ameet Talwalkar
Modern learning models are characterized by large hyperparameter spaces and long training times.
no code implementations • 6 Sep 2018 • Kevin Jamieson, Lalit Jain
We propose an adaptive sampling approach for multiple testing which aims to maximize statistical power while ensuring anytime false discovery control.
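For contrast with the adaptive, anytime procedure described above, here is the classical offline Benjamini-Hochberg baseline for false discovery rate control (standard textbook material, not the paper's method); the p-values are an illustrative assumption.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Offline Benjamini-Hochberg: reject the k smallest p-values, where k is
    the largest index with p_(k) <= alpha * k / n."""
    pvals = np.asarray(pvals)
    n = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha * np.arange(1, n + 1) / n
    reject = np.zeros(n, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])    # largest sorted index passing the threshold
        reject[order[:k + 1]] = True
    return reject

print(benjamini_hochberg([0.001, 0.008, 0.04, 0.2, 0.9]))   # rejects the first two
```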
no code implementations • 14 Aug 2018 • Max Simchowitz, Kevin Jamieson, Jordan W. Suchow, Thomas L. Griffiths
In this paper, we introduce the first principled adaptive-sampling procedure for learning a convex function in the $L_\infty$ norm, a problem that arises often in the behavioral and social sciences.
no code implementations • ICML 2018 • Lalit Jain, Kevin Jamieson
In this paper, we model the problem of optimizing crowdfunding platforms, such as the non-profit Kiva or for-profit Kickstarter, as a variant of the multi-armed bandit problem.
no code implementations • ICLR 2018 • Lisha Li, Kevin Jamieson, Afshin Rostamizadeh, Katya Gonina, Moritz Hardt, Benjamin Recht, Ameet Talwalkar
Modern machine learning models are characterized by large hyperparameter search spaces and prohibitively expensive training costs.
1 code implementation • NeurIPS 2017 • Fanny Yang, Aaditya Ramdas, Kevin Jamieson, Martin J. Wainwright
We propose an alternative framework to existing setups for controlling false alarms when multiple A/B tests are run over time.
no code implementations • ICLR 2018 • Jesse Dodge, Kevin Jamieson, Noah A. Smith
Driven by the need for parallelizable hyperparameter optimization methods, this paper studies \emph{open loop} search methods: sequences that are predetermined and can be generated before a single configuration is evaluated.
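A minimal sketch of what such an open-loop search looks like in practice: the full set of configurations is drawn from a low-discrepancy (Sobol) sequence before any evaluation, so they can be dispatched in parallel. The two-dimensional search space below is an illustrative assumption.

```python
import numpy as np
from scipy.stats import qmc

# Predetermined (open-loop) configurations: a scrambled Sobol sequence over an
# illustrative 2-D search space (log10 learning rate, dropout probability).
sampler = qmc.Sobol(d=2, scramble=True, seed=0)
unit = sampler.random(n=16)                            # all points generated up front
configs = qmc.scale(unit, [-5.0, 0.0], [-1.0, 0.5])
for lr_exp, dropout in configs:
    print(f"lr={10 ** lr_exp:.2e}, dropout={dropout:.2f}")   # evaluate later, in parallel
```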
no code implementations • 16 Feb 2017 • Max Simchowitz, Kevin Jamieson, Benjamin Recht
Moreover, our lower bounds zero in on the number of times each \emph{individual} arm needs to be pulled, uncovering new phenomena which are drowned out in the aggregate sample complexity.
no code implementations • 4 Oct 2016 • Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg
Although policies learned with RC sampling can be superior to those learned with HC sampling for standard learning models such as linear SVMs, for highly expressive learning models such as deep learning and hyper-parametric decision trees, which have little model error, policies learned with HC sampling may be comparable.
no code implementations • NeurIPS 2016 • Lalit Jain, Kevin Jamieson, Robert Nowak
First, we derive prediction error bounds for ordinal embedding with noise by exploiting the fact that the rank of a distance matrix of points in $\mathbb{R}^d$ is at most $d+2$.
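The rank fact cited above is easy to verify numerically: for points in $\mathbb{R}^d$, the matrix of squared Euclidean distances has rank at most $d+2$. The point set below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 3
P = rng.normal(size=(n, d))                                # n points in R^d
sq_norms = (P ** 2).sum(axis=1)
D = sq_norms[:, None] + sq_norms[None, :] - 2 * P @ P.T    # squared-distance matrix
print(np.linalg.matrix_rank(D))                            # at most d + 2 = 5
```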
no code implementations • 25 Mar 2016 • Kevin Jamieson, Daniel Haas, Ben Recht
As a result, these bounds have surprising implications both for solutions to the most biased coin problem and for anomaly detection when only partial information about the parameters is known.
18 code implementations • 21 Mar 2016 • Lisha Li, Kevin Jamieson, Giulia Desalvo, Afshin Rostamizadeh, Ameet Talwalkar
Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters.
no code implementations • 9 Mar 2016 • Max Simchowitz, Kevin Jamieson, Benjamin Recht
This paper studies the Best-of-K Bandit game: at each time the player chooses a subset $S$ among all $\binom{N}{K}$ possible options and observes reward $\max\{X_i : i \in S\}$, where $X$ is a random vector drawn from a joint distribution.
1 code implementation • 27 Feb 2015 • Kevin Jamieson, Ameet Talwalkar
Motivated by the task of hyperparameter optimization, we introduce the non-stochastic best-arm identification problem.
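The resource-allocation idea behind this line of work can be illustrated with a short successive-halving sketch: losses are revealed incrementally, and each round the budget per survivor grows while the worse half of the configurations is discarded. The toy loss function and budgets below are illustrative assumptions, not the paper's benchmark.

```python
import numpy as np

def successive_halving(configs, loss_fn, min_budget=1, eta=2):
    """Keep the best 1/eta fraction of configs each round while multiplying
    each survivor's budget by eta. loss_fn(config, budget) returns the
    (non-stochastic) loss of a config after `budget` units of training."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        losses = [loss_fn(c, budget) for c in survivors]
        keep = np.argsort(losses)[:max(1, len(survivors) // eta)]
        survivors = [survivors[i] for i in keep]
        budget *= eta
    return survivors[0]

# Toy problem: each "config" is a learning rate; loss decays toward a
# config-dependent floor as the budget grows (best near lr = 1e-2).
rng = np.random.default_rng(0)
configs = rng.uniform(1e-4, 1e-1, size=16)
print(successive_halving(configs, lambda lr, b: abs(np.log10(lr) + 2) + 1.0 / b))
```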
no code implementations • 31 Jan 2015 • Kevin Jamieson, Sumeet Katariya, Atul Deshpande, Robert Nowak
We prove that in the absence of structural assumptions, the sample complexity of this problem is proportional to the sum of the inverse squared gaps between the Borda scores of each suboptimal arm and the best arm.
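For context, the Borda score of an arm in this dueling setting is the probability that it beats an opponent drawn uniformly at random; a small sketch with an illustrative preference matrix is below.

```python
import numpy as np

# P[i, j] = probability that arm i beats arm j in a duel (P[j, i] = 1 - P[i, j]).
P = np.array([[0.5, 0.6, 0.7],
              [0.4, 0.5, 0.8],
              [0.3, 0.2, 0.5]])

n = P.shape[0]
borda = (P.sum(axis=1) - 0.5) / (n - 1)   # drop the self-comparison P[i, i] = 0.5
print(borda, borda.argmax())              # arm 0 is the Borda winner here
```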
no code implementations • 27 Dec 2013 • Kevin Jamieson, Matthew Malloy, Robert Nowak, Sébastien Bubeck
The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples.
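As a simple baseline for the fixed-confidence setting, the sketch below uses successive elimination rather than the UCB procedure proposed in the paper; the confidence radius and Bernoulli arms are illustrative assumptions.

```python
import numpy as np

def successive_elimination(pull, n_arms, delta=0.05):
    """Fixed-confidence best-arm identification: pull every surviving arm once
    per round and drop an arm as soon as its upper confidence bound falls
    below another arm's lower confidence bound."""
    active = list(range(n_arms))
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    t = 0
    while len(active) > 1:
        t += 1
        for i in active:
            counts[i] += 1
            means[i] += (pull(i) - means[i]) / counts[i]
        # Union-bounded Hoeffding-style confidence radius for rewards in [0, 1].
        rad = np.sqrt(2.0 * np.log(4.0 * n_arms * t * t / delta) / t)
        best_lcb = max(means[i] for i in active) - rad
        active = [i for i in active if means[i] + rad >= best_lcb]
    return active[0]

# Illustrative Bernoulli arms with means 0.5, 0.6, 0.8.
rng = np.random.default_rng(0)
mu = [0.5, 0.6, 0.8]
print(successive_elimination(lambda i: float(rng.random() < mu[i]), n_arms=3))
```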
no code implementations • 17 Jun 2013 • Kevin Jamieson, Matthew Malloy, Robert Nowak, Sebastien Bubeck
Motivated by large-scale applications, we are especially interested in identifying situations where the total number of samples that are necessary and sufficient to find the best arm scale linearly with the number of arms.