no code implementations • 11 Sep 2023 • Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee

We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art.

Ranked #1 on Question Answering on SIQA

no code implementations • 20 Jun 2023 • Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li

Despite this small scale, phi-1 attains pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP.

Ranked #10 on Code Generation on HumanEval

1 code implementation • 22 Mar 2023 • Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models.

Ranked #10 on Math Word Problem Solving on MATH

no code implementations • 14 Dec 2022 • Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Yin Tat Lee, Felipe Suarez, Yi Zhang

For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a non-zero first-layer bias).

no code implementations • 17 Nov 2022 • Ananya Kumar, Ruoqi Shen, Sébastien Bubeck, Suriya Gunasekar

SGD (with momentum) and AdamW are the two most used optimizers for fine-tuning large neural networks in computer vision.

1 code implementation • 9 Jun 2022 • Yi Zhang, Arturs Backurs, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Tal Wagner

We study how the trained models eventually succeed at the task, and in particular, we manage to understand some of the attention heads as well as how the information flows in the network.

no code implementations • 3 Mar 2022 • Ruoqi Shen, Sébastien Bubeck, Suriya Gunasekar

In this work we consider another angle, and we study the effect of data augmentation on the dynamic of the learning process.

no code implementations • NeurIPS 2021 • Peter L. Bartlett, Sébastien Bubeck, Yeshwanth Cherapanamjeri

We consider the phenomenon of adversarial examples in ReLU networks with independent Gaussian parameters.

no code implementations • NeurIPS 2021 • Sébastien Bubeck, Mark Sellke

Classically, data interpolation with a parametrized model class is possible as long as the number of parameters is larger than the number of equations to be satisfied.

no code implementations • NeurIPS 2021 • Sébastien Bubeck, Yeshwanth Cherapanamjeri, Gauthier Gidel, Rémi Tachet des Combes

Daniely and Schacham recently showed that gradient descent finds adversarial examples on random undercomplete two-layers ReLU neural networks.

no code implementations • 8 Nov 2020 • Sébastien Bubeck, Thomas Budzinski, Mark Sellke

We consider the cooperative multi-player version of the stochastic multi-armed bandit problem.

no code implementations • 30 Sep 2020 • Sébastien Bubeck, Yuanzhi Li, Dheeraj Nagaraj

We make a precise conjecture that, for any Lipschitz activation function and for most datasets, any two-layers neural network with $k$ neurons that perfectly fit the data must have its Lipschitz constant larger (up to a constant) than $\sqrt{n/k}$ where $n$ is the number of datapoints.

no code implementations • 4 Jun 2020 • Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Dan Mikulincer

In contrast we propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.

no code implementations • 15 Apr 2020 • Sébastien Bubeck, Yuval Rabani, Mark Sellke

We introduce the problem of $k$-chasing of convex functions, a simultaneous generalization of both the famous $k$-server problem in $\mathbb{R}^d$, and of the problem of chasing convex bodies and functions.

1 code implementation • ICML 2020 • Andrey Kolobov, Sébastien Bubeck, Julian Zimmert

Existing multi-armed bandit (MAB) models make two implicit assumptions: an arm generates a payoff only when it is played, and the agent observes every payoff that is generated.

no code implementations • 14 Feb 2020 • Sébastien Bubeck, Thomas Budzinski

We consider two agents playing simultaneously the same stochastic three-armed bandit problem.

no code implementations • 9 Jan 2020 • Sébastien Bubeck, Dan Mikulincer

This viewpoint was explored in 1993 by Vavasis, who proposed an algorithm which, for any fixed finite dimension $d$, improves upon the $O(1/\varepsilon^2)$ oracle complexity of gradient descent.

no code implementations • NeurIPS 2019 • Sébastien Bubeck, Qijia Jiang, Yin Tat Lee, Yuanzhi Li, Aaron Sidford

Namely we consider optimization algorithms interacting with a highly parallel gradient oracle, that is one that can answer $\mathrm{poly}(d)$ gradient queries in parallel.

no code implementations • 28 Apr 2019 • Sébastien Bubeck, Yuanzhi Li, Yuval Peres, Mark Sellke

We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem.

no code implementations • 2 Feb 2019 • Sébastien Bubeck, Mark Sellke

Second we replace the entropy over combinatorial actions by a coordinate entropy, which allows us to obtain the first optimal worst-case bound for Thompson Sampling in the combinatorial setting.

no code implementations • 29 Jan 2019 • Sébastien Bubeck, Yuanzhi Li, Haipeng Luo, Chen-Yu Wei

We study adaptive regret bounds in terms of the variation of the losses (the so-called path-length bounds) for both multi-armed bandit and more generally linear bandit.

no code implementations • 15 Nov 2018 • Sébastien Bubeck, Yin Tat Lee, Eric Price, Ilya Razenshteyn

In our recent work (Bubeck, Price, Razenshteyn, arXiv:1805.10204) we argued that adversarial examples in machine learning might be due to an inherent computational hardness of the problem.

no code implementations • NeurIPS 2018 • Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié

Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.

Optimization and Control

no code implementations • 25 May 2018 • Sébastien Bubeck, Eric Price, Ilya Razenshteyn

First we prove that, for a broad set of classification tasks, the mere existence of a robust classifier implies that it can be found by a possibly exponential-time algorithm with relatively few training examples.

no code implementations • ICML 2018 • Zeyuan Allen-Zhu, Sébastien Bubeck, Yuanzhi Li

Regret bounds in online learning compare the player's performance to $L^*$, the optimal performance in hindsight with a fixed strategy.

no code implementations • 3 Nov 2017 • Sébastien Bubeck, Michael B. Cohen, Yuanzhi Li

In (online) learning theory the concepts of sparsity, variance and curvature are well-understood and are routinely used to obtain refined regret and generalization bounds.

no code implementations • 26 May 2017 • Sébastien Bubeck, Nikhil R. Devanur, Zhiyi Huang, Rad Niazadeh

For the online posted pricing problem, we show regret bounds that scale with the best fixed price, rather than the range of the values.

1 code implementation • ICML 2017 • Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié

For centralized (i.e. master/slave) algorithms, we show that distributing Nesterov's accelerated gradient descent is optimal and achieves a precision $\varepsilon > 0$ in time $O(\sqrt{\kappa_g}(1+\Delta\tau)\ln(1/\varepsilon))$, where $\kappa_g$ is the condition number of the (global) function to optimize, $\Delta$ is the diameter of the network, and $\tau$ (resp.

no code implementations • 11 Jul 2016 • Sébastien Bubeck, Ronen Eldan, Yin Tat Lee

We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n) \sqrt{T}$-regret for this problem.

no code implementations • 15 Feb 2016 • Sébastien Bubeck, Yin-Tat Lee

We propose a new framework for black-box convex optimization which is well-suited for situations where gradient computations are expensive.

no code implementations • 23 Jul 2015 • Sébastien Bubeck, Ronen Eldan

We construct a new map from a convex function to a distribution on its domain, with the property that this distribution is a multi-scale exploration of the function.

no code implementations • 9 Jul 2015 • Sébastien Bubeck, Ronen Eldan, Joseph Lehec

We extend the Langevin Monte Carlo (LMC) algorithm to compactly supported measures via a projection step, akin to projected Stochastic Gradient Descent (SGD).

no code implementations • 26 Jun 2015 • Sébastien Bubeck, Yin Tat Lee, Mohit Singh

The new algorithm has a simple geometric interpretation, loosely inspired by the ellipsoid method.

no code implementations • 23 Feb 2015 • Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres

We analyze the minimax regret of the adversarial bandit convex optimization problem.

no code implementations • 4 Dec 2014 • Sébastien Bubeck, Ronen Eldan

We prove that the Cramér transform of the uniform measure on a convex body in $\mathbb{R}^n$ is a $(1+o(1)) n$-self-concordant barrier, improving a seminal result of Nesterov and Nemirovski.

3 code implementations • 20 May 2014 • Sébastien Bubeck

In stochastic optimization we discuss stochastic gradient descent, mini-batches, random coordinate descent, and sublinear algorithms.

no code implementations • 23 Apr 2014 • Che-Yu Liu, Sébastien Bubeck

We study the problem of finding the most mutually correlated arms among many arms.

no code implementations • 27 Dec 2013 • Kevin Jamieson, Matthew Malloy, Robert Nowak, Sébastien Bubeck

The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples.

no code implementations • NeurIPS 2013 • Sébastien Bubeck, Che-Yu Liu

Building on the techniques of Audibert and Bubeck [2009] and Russo and Roy [2013] we first show that Thompson Sampling attains an optimal prior-free bound in the sense that for any prior distribution its Bayesian regret is bounded from above by $14 \sqrt{n K}$.

no code implementations • 25 Apr 2012 • Sébastien Bubeck, Nicolò Cesa-Bianchi

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off.

1 code implementation • 20 Apr 2012 • Jean-Yves Audibert, Sébastien Bubeck, Gábor Lugosi

We also recover the optimal bounds for the full information setting.

no code implementations • NeurIPS 2011 • Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck

We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap).

no code implementations • NeurIPS 2008 • Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, Rémi Munos

We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space.
