Search Results for author: Sebastien Bubeck

Found 13 papers, 5 papers with code

TinyGSM: achieving >80% on GSM8k with small language models

no code implementations · 14 Dec 2023 · Bingbin Liu, Sebastien Bubeck, Ronen Eldan, Janardhan Kulkarni, Yuanzhi Li, Anh Nguyen, Rachel Ward, Yi Zhang

Specifically for solving grade-school math, the smallest model size so far required to break the 80% barrier on the GSM8K benchmark remains 34B.

Arithmetic Reasoning · GSM8K · +2

How to Fine-Tune Vision Models with SGD

no code implementations · 17 Nov 2022 · Ananya Kumar, Ruoqi Shen, Sebastien Bubeck, Suriya Gunasekar

SGD and AdamW are the two most used optimizers for fine-tuning large neural networks in computer vision.

Ranking Convolutional Architectures by their Feature Extraction Capabilities

no code implementations · 29 Sep 2021 · Debadeepta Dey, Shital Shah, Sebastien Bubeck

We propose a simple but powerful method which we call FEAR, for ranking architectures in any search space.

Neural Architecture Search

FEAR: A Simple Lightweight Method to Rank Architectures

1 code implementation · 7 Jun 2021 · Debadeepta Dey, Shital Shah, Sebastien Bubeck

We propose a simple but powerful method which we call FEAR, for ranking architectures in any search space.

Neural Architecture Search

Ranking Architectures by Feature Extraction Capabilities

no code implementations · ICML Workshop AutoML 2021 · Debadeepta Dey, Shital Shah, Sebastien Bubeck

By training different architectures in the search space to the same training or validation error, and then comparing the usefulness of the features they extract on the task dataset of interest while freezing most of the architecture, we obtain quick estimates of their relative performance.

Neural Architecture Search
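The ranking recipe above can be sketched in a toy setting: freeze each candidate's feature extractor, train only a cheap probe head on top, and rank candidates by the probe's validation error. This is an illustrative sketch with synthetic data and random-feature "architectures", not the paper's implementation; all names and constants here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression dataset: the target depends on the first 5 input dimensions.
X = rng.normal(size=(200, 20))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=200)
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

def probe_score(features_tr, features_val):
    """Fit a ridge-regression probe on frozen features; lower val MSE = better."""
    lam = 1e-3
    d = features_tr.shape[1]
    w = np.linalg.solve(features_tr.T @ features_tr + lam * np.eye(d),
                        features_tr.T @ y_tr)
    return float(np.mean((features_val @ w - y_val) ** 2))

# Stand-ins for candidate architectures: frozen random feature maps of
# different widths (wider maps should extract more useful features here).
def make_extractor(width, seed):
    W = np.random.default_rng(seed).normal(size=(20, width)) / np.sqrt(20)
    return lambda Z: np.tanh(Z @ W)

candidates = {f"arch_w{w}": make_extractor(w, s)
              for s, w in enumerate([2, 8, 32])}

# FEAR-style ranking: freeze each extractor, train only the cheap probe head,
# and rank architectures by the probe's validation error (best first).
scores = {name: probe_score(f(X_tr), f(X_val)) for name, f in candidates.items()}
ranking = sorted(scores, key=scores.get)
print(ranking)
```

The point of the sketch is the cost profile: only the small probe head is trained per candidate, so ranking many architectures stays cheap.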

Network size and size of the weights in memorization with two-layers neural networks

no code implementations · NeurIPS 2020 · Sebastien Bubeck, Ronen Eldan, Yin Tat Lee, Dan Mikulincer

In contrast, we propose a new training procedure for ReLU networks, based on {\em complex} (as opposed to {\em real}) recombination of the neurons, for which we show approximate memorization with $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.

Memorization

Is Q-learning Provably Efficient?

1 code implementation · NeurIPS 2018 · Chi Jin, Zeyuan Allen-Zhu, Sebastien Bubeck, Michael I. Jordan

We prove that, in an episodic MDP setting, Q-learning with UCB exploration achieves regret $\tilde{O}(\sqrt{H^3 SAT})$, where $S$ and $A$ are the numbers of states and actions, $H$ is the number of steps per episode, and $T$ is the total number of steps.

Q-Learning · Reinforcement Learning (RL)
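The algorithm analyzed above can be sketched in a tabular toy setting: episodic Q-learning with an optimistic initialization, learning rate $\alpha_t = (H+1)/(H+t)$, and a Hoeffding-style UCB bonus on the order of $\sqrt{H^3 \log T / t}$. The 3-state chain environment and the constant `c` below are illustrative choices, not the paper's experiments.

```python
import math

# Toy episodic MDP: a 3-state chain with 2 actions and horizon H = 3.
# Action 1 moves right; reward 1 only for taking action 1 in the last state.
S, A, H = 3, 2, 3

def step(s, a):
    s2 = min(s + 1, S - 1) if a == 1 else s
    r = 1.0 if (s == S - 1 and a == 1) else 0.0
    return s2, r

# Q-learning with a UCB exploration bonus (Hoeffding-style, as in the paper):
# bonus ~ c * sqrt(H^3 * log(T) / t), learning rate alpha_t = (H+1)/(H+t).
episodes, c = 500, 0.1
T = episodes * H
Q = [[[float(H)] * A for _ in range(S)] for _ in range(H)]  # optimistic init
N = [[[0] * A for _ in range(S)] for _ in range(H)]

for _ in range(episodes):
    s = 0
    for h in range(H):
        a = max(range(A), key=lambda j: Q[h][s][j])  # greedy w.r.t. optimistic Q
        s2, r = step(s, a)
        N[h][s][a] += 1
        t = N[h][s][a]
        alpha = (H + 1) / (H + t)
        bonus = c * math.sqrt(H**3 * math.log(T) / t)
        v_next = max(Q[h + 1][s2]) if h + 1 < H else 0.0
        Q[h][s][a] = (1 - alpha) * Q[h][s][a] + alpha * (r + v_next + bonus)
        s = s2

# The greedy policy should have learned to walk right toward the reward.
greedy_first_action = max(range(A), key=lambda j: Q[0][0][j])
print(greedy_first_action)
```

The bonus shrinks as a state-action pair is visited more often, so exploration fades exactly where the estimates are already reliable.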

Finite-Time Analysis of Projected Langevin Monte Carlo

no code implementations · NeurIPS 2015 · Sebastien Bubeck, Ronen Eldan, Joseph Lehec

We analyze the projected Langevin Monte Carlo (LMC) algorithm, a close cousin of projected Stochastic Gradient Descent (SGD).
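The projected LMC iteration takes a Langevin step and then projects back onto the constraint set, $x_{k+1} = \Pi_K\left(x_k - \eta \nabla f(x_k) + \sqrt{2\eta}\, \xi_k\right)$. A minimal sketch, assuming a standard Gaussian target restricted to the unit ball; the step size and iteration counts are illustrative, not from the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def proj_ball(x):
    """Euclidean projection onto the unit ball K = {x : |x| <= 1}."""
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

# Target density proportional to exp(-f(x)) on K, with f(x) = |x|^2 / 2,
# so grad f(x) = x.  Projected LMC: gradient step + Gaussian noise + projection.
d, eta, steps = 2, 0.01, 20000
x = np.zeros(d)
samples = []
for k in range(steps):
    x = proj_ball(x - eta * x + np.sqrt(2 * eta) * rng.normal(size=d))
    if k >= steps // 2:          # discard the first half as burn-in
        samples.append(x.copy())

samples = np.asarray(samples)
print(samples.mean(axis=0))      # should be near the origin by symmetry
```

The projection is what distinguishes this from plain LMC: every iterate stays feasible, mirroring how projected SGD handles constraints.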

On Finding the Largest Mean Among Many

no code implementations · 17 Jun 2013 · Kevin Jamieson, Matthew Malloy, Robert Nowak, Sebastien Bubeck

Motivated by large-scale applications, we are especially interested in identifying situations where the total number of samples necessary and sufficient to find the best arm scales linearly with the number of arms.

Multi-Armed Bandits
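The best-arm problem above can be sketched with a simplified UCB-style sampling rule: always pull the arm with the highest upper confidence bound, so suboptimal arms receive only logarithmically many pulls. This uses textbook Hoeffding bounds rather than the paper's sharper law-of-the-iterated-logarithm confidence terms, and the arm means and budget below are made up for illustration.

```python
import math
import random

random.seed(1)

# Bernoulli arms; arm 2 has the largest mean.
means = [0.3, 0.4, 0.7]
K = len(means)

def pull(i):
    return 1.0 if random.random() < means[i] else 0.0

# UCB-style sampling: pull each arm once, then always pull the arm whose
# empirical mean plus Hoeffding confidence radius is largest.
counts = [1] * K
sums = [pull(i) for i in range(K)]
budget = 3000
for t in range(K, budget):
    ucb = [sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
           for i in range(K)]
    i = max(range(K), key=lambda j: ucb[j])
    sums[i] += pull(i)
    counts[i] += 1

best = max(range(K), key=lambda i: sums[i] / counts[i])
print(best, counts)
```

Because each suboptimal arm is pulled only O(log T / gap^2) times, nearly the whole budget concentrates on the best arm, which is the linear-in-arms sample-complexity regime the abstract refers to.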

Optimal discovery with probabilistic expert advice: finite time analysis and macroscopic optimality

no code implementations · 22 Jul 2012 · Sebastien Bubeck, Damien Ernst, Aurelien Garivier

We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice.
