Search Results for author: Ronen Eldan

Found 22 papers, 3 papers with code

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

no code implementations • 22 Apr 2024 • Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, Xiyang Dai, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Victor Fragoso, Dan Iter, Mei Gao, Min Gao, Jianfeng Gao, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Ce Liu, Mengchen Liu, Weishung Liu, Eric Lin, Zeqi Lin, Chong Luo, Piyush Madan, Matt Mazzola, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Xin Wang, Lijuan Wang, Chunyu Wang, Yu Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Haiping Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Sonali Yadav, Fan Yang, Jianwei Yang, ZiYi Yang, Yifan Yang, Donghan Yu, Lu Yuan, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3. 8 billion parameter language model trained on 3. 3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3. 5 (e. g., phi-3-mini achieves 69% on MMLU and 8. 38 on MT-bench), despite being small enough to be deployed on a phone.

Language Modelling

Paper
Add Code

Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack

no code implementations • 2 Apr 2024 • Mark Russinovich, Ahmed Salem, Ronen Eldan

Unlike existing jailbreak methods, Crescendo is a multi-turn jailbreak that interacts with the model in a seemingly benign manner.

Paper
Add Code

TinyGSM: achieving >80% on GSM8k with small language models

no code implementations • 14 Dec 2023 • Bingbin Liu, Sebastien Bubeck, Ronen Eldan, Janardhan Kulkarni, Yuanzhi Li, Anh Nguyen, Rachel Ward, Yi Zhang

Specifically for solving grade school math, the smallest model size so far required to break the 80\% barrier on the GSM8K benchmark remains to be 34B.

Ranked #59 on Arithmetic Reasoning on GSM8K

Arithmetic Reasoning GSM8K +2

Paper
Add Code

Positional Description Matters for Transformers Arithmetic

no code implementations • 22 Nov 2023 • Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang

For (i) we train a small model on a small dataset (100M parameters and 300k samples) with remarkable aptitude in (direct, no scratchpad) 15 digits multiplication and essentially perfect up to 12 digits, while usual training in this context would give a model failing at 4 digits multiplication.

Memorization

Paper
Add Code

Who's Harry Potter? Approximate Unlearning in LLMs

no code implementations • 3 Oct 2023 • Ronen Eldan, Mark Russinovich

Our technique consists of three main components: First, we use a reinforced model that is further trained on the target data to identify the tokens that are most related to the unlearning target, by comparing its logits with those of a baseline model.

Language Modelling

Paper
Add Code

Textbooks Are All You Need II: phi-1.5 technical report

1 code implementation • 11 Sep 2023 • Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee

We continue the investigation into the power of smaller Transformer-based language models as initiated by \textbf{TinyStories} -- a 10 million parameter model that can produce coherent English -- and the follow-up work on \textbf{phi-1}, a 1. 3 billion parameter model with Python coding performance close to the state-of-the-art.

Ranked #15 on Question Answering on SIQA

Code Generation Common Sense Reasoning +3

Paper
Code

Textbooks Are All You Need

no code implementations • 20 Jun 2023 • Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li

Despite this small scale, phi-1 attains pass@1 accuracy 50. 6% on HumanEval and 55. 5% on MBPP.

Ranked #42 on Code Generation on HumanEval

Code Generation Language Modelling +1

Paper
Add Code

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

no code implementations • 12 May 2023 • Ronen Eldan, Yuanzhi Li

In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that a typical 3 to 4-year-olds usually understand, generated by GPT-3. 5 and GPT-4.

Paper
Add Code

Sparks of Artificial General Intelligence: Early experiments with GPT-4

2 code implementations • 22 Mar 2023 • Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models.

Ranked #33 on Arithmetic Reasoning on GSM8K

Arithmetic Reasoning Math Word Problem Solving

17,794

Paper
Code

Unveiling Transformers with LEGO: a synthetic reasoning task

1 code implementation • 9 Jun 2022 • Yi Zhang, Arturs Backurs, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Tal Wagner

We study how the trained models eventually succeed at the task, and in particular, we manage to understand some of the attention heads as well as how the information flows in the network.

Learning to Execute

Paper
Code

Non-asymptotic approximations of neural networks by Gaussian processes

no code implementations • 17 Feb 2021 • Ronen Eldan, Dan Mikulincer, Tselil Schramm

We study the extent to which wide neural networks may be approximated by Gaussian processes when initialized with random weights.

Gaussian Processes

Paper
Add Code

Network size and size of the weights in memorization with two-layers neural networks

no code implementations • NeurIPS 2020 • Sebastien Bubeck, Ronen Eldan, Yin Tat Lee, Dan Mikulincer

In contrast we propose a new training procedure for ReLU networks, based on {\em complex} (as opposed to {\em real}) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.

Memorization

Paper
Add Code

Community detection and percolation of information in a geometric setting

no code implementations • 28 Jun 2020 • Ronen Eldan, Dan Mikulincer, Hester Pieters

We make the first steps towards generalizing the theory of stochastic block models, in the sparse regime, towards a model where the discrete community structure is replaced by an underlying geometry.

Community Detection

Paper
Add Code

Network size and weights size for memorization with two-layers neural networks

no code implementations • 4 Jun 2020 • Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Dan Mikulincer

In contrast we propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.

Memorization

Paper
Add Code

Depth Separations in Neural Networks: What is Actually Being Separated?

no code implementations • 15 Apr 2019 • Itay Safran, Ronen Eldan, Ohad Shamir

Existing depth separation results for constant-depth networks essentially show that certain radial functions in $\mathbb{R}^d$, which can be easily approximated with depth $3$ networks, cannot be approximated by depth $2$ networks, even up to constant accuracy, unless their size is exponential in $d$.

Paper
Add Code

Kernel-based methods for bandit convex optimization

no code implementations • 11 Jul 2016 • Sébastien Bubeck, Ronen Eldan, Yin Tat Lee

We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n) \sqrt{T}$-regret for this problem.

Paper
Add Code

The Power of Depth for Feedforward Neural Networks

no code implementations • 12 Dec 2015 • Ronen Eldan, Ohad Shamir

We show that there is a simple (approximately radial) function on $\reals^d$, expressible by a small 3-layer feedforward neural networks, which cannot be approximated by any 2-layer network, to more than a certain constant accuracy, unless its width is exponential in the dimension.

Paper
Add Code

Finite-Time Analysis of Projected Langevin Monte Carlo

no code implementations • NeurIPS 2015 • Sebastien Bubeck, Ronen Eldan, Joseph Lehec

We analyze the projected Langevin Monte Carlo (LMC) algorithm, a close cousin of projected Stochastic Gradient Descent (SGD).

Paper
Add Code

Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff

no code implementations • NeurIPS 2015 • Ofer Dekel, Ronen Eldan, Tomer Koren

The best algorithm for the general bandit convex optimization problem guarantees a regret of $\widetilde{O}(T^{5/6})$, while the best known lower bound is $\Omega(T^{1/2})$.

Paper
Add Code

Multi-scale exploration of convex functions and bandit convex optimization

no code implementations • 23 Jul 2015 • Sébastien Bubeck, Ronen Eldan

We construct a new map from a convex function to a distribution on its domain, with the property that this distribution is a multi-scale exploration of the function.

Paper
Add Code

Sampling from a log-concave distribution with Projected Langevin Monte Carlo

no code implementations • 9 Jul 2015 • Sébastien Bubeck, Ronen Eldan, Joseph Lehec

We extend the Langevin Monte Carlo (LMC) algorithm to compactly supported measures via a projection step, akin to projected Stochastic Gradient Descent (SGD).

Paper
Add Code

The entropic barrier: a simple and optimal universal self-concordant barrier

no code implementations • 4 Dec 2014 • Sébastien Bubeck, Ronen Eldan

We prove that the Cram\'er transform of the uniform measure on a convex body in $\mathbb{R}^n$ is a $(1+o(1)) n$-self-concordant barrier, improving a seminal result of Nesterov and Nemirovski.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.