no code implementations • 14 Dec 2023 • Bingbin Liu, Sebastien Bubeck, Ronen Eldan, Janardhan Kulkarni, Yuanzhi Li, Anh Nguyen, Rachel Ward, Yi Zhang
Specifically for solving grade school math, the smallest model size so far required to break the 80% barrier on the GSM8K benchmark remains 34B.
Ranked #63 on Arithmetic Reasoning on GSM8K
no code implementations • 17 Nov 2022 • Ananya Kumar, Ruoqi Shen, Sebastien Bubeck, Suriya Gunasekar
SGD and AdamW are the two most commonly used optimizers for fine-tuning large neural networks in computer vision.
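A minimal sketch of the two fine-tuning setups being compared (assuming PyTorch and torchvision; the model choice and hyperparameters are illustrative, not the paper's):

```python
# Sketch of the two fine-tuning setups compared in the paper.
# Assumes PyTorch + torchvision; hyperparameters are illustrative only.
import torch
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # new head for the downstream task

# Option 1: SGD with momentum
opt_sgd = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# Option 2: AdamW (Adam with decoupled weight decay)
opt_adamw = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

def finetune_step(optimizer, x, y):
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```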
1 code implementation • 14 Oct 2022 • Ganesh Jawahar, Subhabrata Mukherjee, Xiaodong Liu, Young Jin Kim, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah, Sebastien Bubeck, Jianfeng Gao
Furthermore, existing MoE works do not consider computational constraints (e.g., FLOPs, latency) to guide their design.
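A hypothetical sketch of what constraint-guided MoE design could look like: filter candidate configurations by a rough FLOPs estimate before ranking by quality (`estimate_flops` and the cost formula are illustrative stand-ins, not the paper's method):

```python
# Hypothetical sketch: filter MoE candidates by a FLOPs budget, then pick
# the best remaining one. All names and formulas here are illustrative.
from dataclasses import dataclass

@dataclass
class MoEConfig:
    num_experts: int
    expert_dim: int
    top_k: int  # experts activated per token

def estimate_flops(cfg: MoEConfig, seq_len: int = 512) -> float:
    # Rough per-sequence cost: only the top-k activated experts contribute.
    return 2.0 * seq_len * cfg.top_k * cfg.expert_dim ** 2

def constrained_search(candidates, flops_budget, score):
    feasible = [c for c in candidates if estimate_flops(c) <= flops_budget]
    return max(feasible, key=score)  # `score` is a user-supplied quality proxy
```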
1 code implementation • 4 Mar 2022 • Mojan Javaheripi, Gustavo H. de Rosa, Subhabrata Mukherjee, Shital Shah, Tomasz L. Religa, Caio C. T. Mendes, Sebastien Bubeck, Farinaz Koushanfar, Debadeepta Dey
Results show that the perplexity of 16-layer GPT-2 and Transformer-XL can be achieved with up to 1.5x and 2.5x faster runtime and 1.2x and 2.0x lower peak memory utilization, respectively.
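A sketch of how the runtime and peak-memory metrics above might be measured for a candidate model (assumes PyTorch on a CUDA device; not the paper's harness):

```python
# Sketch: measure average latency and peak GPU memory for one model.
import time
import torch

def profile(model, x, n_iters=50):
    model.eval().cuda()
    x = x.cuda()
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(n_iters):
            model(x)
    torch.cuda.synchronize()
    latency = (time.perf_counter() - start) / n_iters
    peak_mem = torch.cuda.max_memory_allocated()
    return latency, peak_mem
```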
no code implementations • 29 Sep 2021 • Debadeepta Dey, Shital Shah, Sebastien Bubeck
We propose a simple but powerful method, which we call FEAR, for ranking architectures in any search space.
1 code implementation • 7 Jun 2021 • Debadeepta Dey, Shital Shah, Sebastien Bubeck
We propose a simple but powerful method, which we call FEAR, for ranking architectures in any search space.
no code implementations • ICML Workshop AutoML 2021 • Debadeepta Dey, Shital Shah, Sebastien Bubeck
By training different architectures in the search space to the same training or validation error, then freezing most of the architecture and comparing the usefulness of the extracted features on the task dataset of interest, we obtain quick estimates of their relative performance.
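A sketch of this ranking idea as I read it (assuming PyTorch; the `head` attribute and accuracy thresholds are illustrative assumptions, not the paper's code):

```python
# Sketch: train each candidate to a common accuracy level, then freeze the
# feature extractor and briefly retrain only the head to score the features.
import torch
import torch.nn.functional as F

def accuracy(model, loader):
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

def fear_score(model, train_loader, val_loader, target_acc=0.7, max_epochs=10):
    # Phase 1: train the whole network until it reaches a common accuracy level.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(max_epochs):
        for x, y in train_loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
        if accuracy(model, train_loader) >= target_acc:
            break

    # Phase 2: freeze everything except the head (assumes a `head` submodule),
    # fine-tune it briefly, and use validation accuracy as a fast quality proxy.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.head.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)
    for x, y in train_loader:
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return accuracy(model, val_loader)  # higher score = better architecture
```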
no code implementations • NeurIPS 2020 • Sebastien Bubeck, Ronen Eldan, Yin Tat Lee, Dan Mikulincer
In contrast, we propose a new training procedure for ReLU networks, based on {\em complex} (as opposed to {\em real}) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.
3 code implementations • NeurIPS 2019 • Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, Sebastien Bubeck
In this paper, we employ adversarial training to improve the performance of randomized smoothing.
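A sketch of the core idea (SmoothAdv-style): attack a Monte Carlo approximation of the *smoothed* classifier with PGD and train on the resulting examples. This sketch uses an L-infinity projection for simplicity, whereas the paper works with L2 perturbations; all hyperparameters are illustrative:

```python
# Sketch: PGD against a Monte Carlo estimate of the smoothed classifier.
import torch
import torch.nn.functional as F

def smoothed_log_probs(model, x, sigma=0.25, n_samples=8):
    # Average softmax outputs over Gaussian noise draws (soft smoothed classifier).
    probs = torch.stack([
        F.softmax(model(x + sigma * torch.randn_like(x)), dim=1)
        for _ in range(n_samples)
    ]).mean(dim=0)
    return probs.clamp_min(1e-12).log()

def pgd_on_smoothed(model, x, y, eps=0.5, alpha=0.1, steps=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.nll_loss(smoothed_log_probs(model, x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascent step on the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back to the ball
    return x_adv.detach()
```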
1 code implementation • NeurIPS 2018 • Chi Jin, Zeyuan Allen-Zhu, Sebastien Bubeck, Michael I. Jordan
We prove that, in an episodic MDP setting, Q-learning with UCB exploration achieves regret $\tilde{O}(\sqrt{H^3 SAT})$, where $S$ and $A$ are the numbers of states and actions, $H$ is the number of steps per episode, and $T$ is the total number of steps.
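A sketch of tabular Q-learning with a Hoeffding-style UCB bonus, in the spirit of the algorithm analyzed here (the environment interface and the constant `c` are illustrative assumptions):

```python
# Sketch: episodic tabular Q-learning with a UCB exploration bonus.
# Assumes an env with reset() -> state and step(a) -> (state, reward, done).
import math
import numpy as np

def q_learning_ucb(env, S, A, H, K, c=1.0):
    Q = np.full((H, S, A), H, dtype=float)    # optimistic initialization at H
    N = np.zeros((H, S, A), dtype=int)
    for _ in range(K):                        # K episodes, T = K * H total steps
        s = env.reset()
        for h in range(H):
            a = int(np.argmax(Q[h, s]))
            s2, r, _ = env.step(a)
            N[h, s, a] += 1
            t = N[h, s, a]
            lr = (H + 1) / (H + t)            # learning rate from the analysis
            bonus = c * math.sqrt(H ** 3 * math.log(K * H) / t)
            v_next = Q[h + 1, s2].max() if h + 1 < H else 0.0
            Q[h, s, a] += lr * (r + v_next + bonus - Q[h, s, a])
            Q[h, s, a] = min(Q[h, s, a], H)   # value estimates are capped at H
            s = s2
    return Q
```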
no code implementations • NeurIPS 2015 • Sebastien Bubeck, Ronen Eldan, Joseph Lehec
We analyze the projected Langevin Monte Carlo (LMC) algorithm, a close cousin of projected Stochastic Gradient Descent (SGD).
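The iteration itself is a projected SGD step plus Gaussian noise of matched scale, $x_{k+1} = P\left(x_k - \eta \nabla f(x_k) + \sqrt{2\eta}\, \xi_k\right)$. A minimal sketch (NumPy; step size and step count are illustrative):

```python
# Sketch of projected Langevin Monte Carlo. `project` maps back onto the
# convex body; `grad_f` is the gradient of the potential f.
import numpy as np

def projected_lmc(grad_f, project, x0, eta=1e-3, n_steps=10_000, rng=None):
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        noise = np.sqrt(2 * eta) * rng.standard_normal(x.shape)
        x = project(x - eta * grad_f(x) + noise)
        samples.append(x.copy())
    return samples

# Example: sampling from e^{-f} restricted to the unit ball.
project_ball = lambda x: x / max(1.0, np.linalg.norm(x))
```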
no code implementations • 17 Jun 2013 • Kevin Jamieson, Matthew Malloy, Robert Nowak, Sebastien Bubeck
Motivated by large-scale applications, we are especially interested in identifying situations where the total number of samples that is necessary and sufficient to find the best arm scales linearly with the number of arms.
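A simplified UCB-style best-arm identification loop in the spirit of the paper's lil'UCB algorithm, with a law-of-the-iterated-logarithm confidence radius (the constants and stopping threshold here are illustrative, not the paper's tuned values):

```python
# Sketch: lil'UCB-style best-arm identification. `pull(i)` returns one
# stochastic reward sample from arm i.
import math
import numpy as np

def lil_ucb(pull, n_arms, delta=0.05, max_pulls=100_000):
    counts = np.ones(n_arms)
    means = np.array([pull(i) for i in range(n_arms)], dtype=float)  # one pull each

    def radius(t):
        # Iterated-logarithm confidence radius (simplified form).
        return math.sqrt(2 * math.log(math.log(math.e * t) / delta) / t)

    while counts.sum() < max_pulls:
        i = int(np.argmax(means + np.array([radius(t) for t in counts])))
        means[i] = (means[i] * counts[i] + pull(i)) / (counts[i] + 1)
        counts[i] += 1
        # Stop when one arm has received the bulk of all samples.
        if counts[i] > 1 + 9 * (counts.sum() - counts[i]):
            return i
    return int(np.argmax(means))
```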
no code implementations • 22 Jul 2012 • Sebastien Bubeck, Damien Ernst, Aurelien Garivier
We consider an original problem, arising from the security analysis of a power system, which we name optimal discovery with probabilistic expert advice.
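A sketch of a Good-UCB-style strategy for this kind of discovery problem: each expert's index combines a Good-Turing estimate of its missing mass (the probability its next sample is a not-yet-seen item) with a UCB exploration bonus. The constants and the expert interface below are illustrative assumptions:

```python
# Sketch: discovery with probabilistic experts via a Good-Turing + UCB index.
import math
from collections import Counter

def good_ucb(experts, n_rounds, c=1.0):
    """experts: list of callables, each returning one sampled item."""
    discovered = set()
    pulls = [0] * len(experts)
    seen = [Counter() for _ in experts]

    def index(i, t):
        if pulls[i] == 0:
            return float("inf")  # query each expert at least once
        singletons = sum(1 for v in seen[i].values() if v == 1)
        missing_mass = singletons / pulls[i]          # Good-Turing estimate
        return missing_mass + c * math.sqrt(math.log(t) / pulls[i])

    for t in range(1, n_rounds + 1):
        i = max(range(len(experts)), key=lambda j: index(j, t))
        item = experts[i]()
        seen[i][item] += 1
        pulls[i] += 1
        discovered.add(item)
    return discovered
```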