Search Results for author: Ronen Eldan

Found 22 papers, 3 papers with code

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

no code implementations22 Apr 2024 Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Masahiro Tanaka, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, ZiYi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.

Language Modelling

Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack

no code implementations2 Apr 2024 Mark Russinovich, Ahmed Salem, Ronen Eldan

Unlike existing jailbreak methods, Crescendo is a multi-turn jailbreak that interacts with the model in a seemingly benign manner.

TinyGSM: achieving >80% on GSM8k with small language models

no code implementations14 Dec 2023 Bingbin Liu, Sebastien Bubeck, Ronen Eldan, Janardhan Kulkarni, Yuanzhi Li, Anh Nguyen, Rachel Ward, Yi Zhang

Specifically for solving grade school math, the smallest model size so far required to break the 80% barrier on the GSM8K benchmark remains 34B.

Arithmetic Reasoning GSM8K +2

Positional Description Matters for Transformers Arithmetic

no code implementations22 Nov 2023 Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang

For (i), we train a small model on a small dataset (100M parameters and 300k samples) that attains remarkable aptitude at (direct, no scratchpad) 15-digit multiplication and is essentially perfect up to 12 digits, whereas standard training in this setting yields a model that fails at 4-digit multiplication.

Memorization
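
One way to make the positional information explicit (an illustrative guess at the idea; the tagging scheme, function names, and serialization below are assumptions, not the paper's encoding) is to tag every digit with its place value when serializing training samples:

```python
# Illustrative data generation for direct (no scratchpad) multiplication,
# with each digit tagged by its positional index; the tagging scheme is an
# assumption for illustration, not the paper's exact encoding.
import random

def with_positions(n: int) -> str:
    digits = str(n)
    # Tag digits with their place value, least significant = position 0.
    return " ".join(f"{d}_{len(digits) - 1 - i}" for i, d in enumerate(digits))

def multiplication_sample(max_digits: int = 15) -> str:
    a = random.randrange(10 ** random.randint(1, max_digits))
    b = random.randrange(10 ** random.randint(1, max_digits))
    return f"{with_positions(a)} * {with_positions(b)} = {with_positions(a * b)}"
```

For example, with a=37 and b=2 this serializes to "3_1 7_0 * 2_0 = 7_1 4_0".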

Who's Harry Potter? Approximate Unlearning in LLMs

no code implementations3 Oct 2023 Ronen Eldan, Mark Russinovich

Our technique consists of three main components: First, we use a reinforced model that is further trained on the target data to identify the tokens that are most related to the unlearning target, by comparing its logits with those of a baseline model.

Language Modelling
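
A minimal sketch of the logit-comparison step from the abstract above, assuming two HuggingFace-style causal language models with a shared tokenizer; the model and function names are hypothetical, not the paper's code:

```python
# Illustrative sketch only: identifies tokens whose predicted probability is
# boosted by fine-tuning on the unlearning target, by comparing the logits of
# a reinforced model against a baseline model. Names are assumptions.
import torch

@torch.no_grad()
def most_target_related_tokens(input_ids, reinforced_model, baseline_model, k=5):
    # Next-token logits from both models on the same context.
    logits_reinforced = reinforced_model(input_ids).logits[:, -1, :]
    logits_baseline = baseline_model(input_ids).logits[:, -1, :]
    # Tokens the reinforced model boosts most are flagged as target-related.
    gap = logits_reinforced - logits_baseline
    return torch.topk(gap, k, dim=-1).indices
```

The gap here is computed at a single position; presumably one would aggregate such comparisons over the whole target corpus.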

Textbooks Are All You Need II: phi-1.5 technical report

1 code implementation11 Sep 2023 Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee

We continue the investigation into the power of smaller Transformer-based language models initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state of the art.

Code Generation Common Sense Reasoning +3

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

no code implementations12 May 2023 Ronen Eldan, Yuanzhi Li

In this work, we introduce TinyStories, a synthetic dataset of short stories that contain only words a typical 3- to 4-year-old usually understands, generated by GPT-3.5 and GPT-4.
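
A sketch of how such a dataset could be generated (the prompt wording, word pool, and sampling of required words are illustrative assumptions, not the authors' pipeline):

```python
# Illustrative generation loop in the spirit of TinyStories; the prompt text,
# word pool, and model choice are assumptions, not the paper's exact pipeline.
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SIMPLE_WORDS = ["dog", "run", "happy", "ball", "tree", "jump", "red", "smile"]

def generate_tiny_story() -> str:
    required = random.sample(SIMPLE_WORDS, 3)  # random words for diversity
    prompt = (
        "Write a short story using only words a 3- to 4-year-old child would "
        f"understand. The story must use the words: {', '.join(required)}."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Requiring a few randomly chosen words per story is one simple way to keep a large synthetic corpus diverse rather than repetitive.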

Sparks of Artificial General Intelligence: Early experiments with GPT-4

2 code implementations22 Mar 2023 Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models.

Arithmetic Reasoning Math Word Problem Solving

Unveiling Transformers with LEGO: a synthetic reasoning task

1 code implementation9 Jun 2022 Yi Zhang, Arturs Backurs, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Tal Wagner

We study how the trained models eventually succeed at the task, and in particular, we manage to understand some of the attention heads as well as how the information flows in the network.

Learning to Execute
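
For context, a LEGO-style instance is a chain of signed variable assignments that the model must resolve; the generator below is my paraphrase of the task format, not the authors' code:

```python
# Sketch of a LEGO-style synthetic reasoning instance: a chain of signed
# variable assignments rooted at the constant 1, presented in shuffled order,
# whose ground-truth values the model must resolve.
import random
import string

def lego_instance(length=6):
    names = random.sample(string.ascii_lowercase, length)
    clauses, truth, value = [], {}, 1
    for i, name in enumerate(names):
        sign = random.choice([+1, -1])
        value *= sign
        truth[name] = value                    # ground-truth assignment
        prev = names[i - 1] if i > 0 else "1"  # chain is rooted at the constant 1
        clauses.append(f"{name} = {'+' if sign > 0 else '-'}{prev}")
    random.shuffle(clauses)                    # clauses are presented out of order
    return "; ".join(clauses), truth
```

For example, lego_instance(4) might return ("b = -a; a = +1; d = -c; c = +b", {"a": 1, "b": -1, "c": -1, "d": 1}).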

Non-asymptotic approximations of neural networks by Gaussian processes

no code implementations17 Feb 2021 Ronen Eldan, Dan Mikulincer, Tselil Schramm

We study the extent to which wide neural networks may be approximated by Gaussian processes when initialized with random weights.

Gaussian Processes
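
A quick numeric illustration of the phenomenon being quantified (the width, scalings, and ReLU choice are illustrative):

```python
# The output of a wide, randomly initialized two-layer ReLU network at a
# fixed input is approximately Gaussian across draws of the weights.
import numpy as np

rng = np.random.default_rng(0)
d, width, trials = 10, 2048, 5000
x = rng.standard_normal(d)

outputs = np.empty(trials)
for t in range(trials):
    W = rng.standard_normal((width, d)) / np.sqrt(d)   # input layer
    a = rng.standard_normal(width) / np.sqrt(width)    # output layer
    outputs[t] = a @ np.maximum(W @ x, 0.0)            # ReLU network output

# Skewness and excess kurtosis near zero indicate approximate Gaussianity.
z = (outputs - outputs.mean()) / outputs.std()
print("skewness:", (z**3).mean(), "excess kurtosis:", (z**4).mean() - 3)
```

Both statistics should come out near zero at this width; the paper's contribution is to quantify the rate of this convergence non-asymptotically.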

Network size and size of the weights in memorization with two-layers neural networks

no code implementations NeurIPS 2020 Sebastien Bubeck, Ronen Eldan, Yin Tat Lee, Dan Mikulincer

In contrast we propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.

Memorization

Community detection and percolation of information in a geometric setting

no code implementations28 Jun 2020 Ronen Eldan, Dan Mikulincer, Hester Pieters

We make the first steps towards generalizing the theory of stochastic block models, in the sparse regime, towards a model where the discrete community structure is replaced by an underlying geometry.

Community Detection
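
An illustrative sampler for this kind of model (the connectivity rule and sparse scaling are assumptions chosen for illustration, not the paper's exact definition): latent positions live on a sphere, and edge probabilities depend on geometric alignment rather than discrete community labels.

```python
# Sparse random graph whose "community" structure comes from latent geometry:
# edge probability grows with the alignment of latent positions on the sphere.
import numpy as np

def geometric_graph(n=1000, d=3, avg_degree=10, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # latent positions on the sphere
    affinity = (1.0 + x @ x.T) / 2.0               # alignment, rescaled to [0, 1]
    p = avg_degree * affinity / n                  # sparse-regime scaling
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return x, upper | upper.T                      # symmetric adjacency matrix
```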

Network size and weights size for memorization with two-layers neural networks

no code implementations4 Jun 2020 Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Dan Mikulincer

In contrast we propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.

Memorization

Depth Separations in Neural Networks: What is Actually Being Separated?

no code implementations15 Apr 2019 Itay Safran, Ronen Eldan, Ohad Shamir

Existing depth separation results for constant-depth networks essentially show that certain radial functions in $\mathbb{R}^d$, which can be easily approximated with depth $3$ networks, cannot be approximated by depth $2$ networks, even up to constant accuracy, unless their size is exponential in $d$.

Kernel-based methods for bandit convex optimization

no code implementations11 Jul 2016 Sébastien Bubeck, Ronen Eldan, Yin Tat Lee

We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n) \sqrt{T}$-regret for this problem.

The Power of Depth for Feedforward Neural Networks

no code implementations12 Dec 2015 Ronen Eldan, Ohad Shamir

We show that there is a simple (approximately radial) function on $\mathbb{R}^d$, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy, unless its width is exponential in the dimension.

Finite-Time Analysis of Projected Langevin Monte Carlo

no code implementations NeurIPS 2015 Sebastien Bubeck, Ronen Eldan, Joseph Lehec

We analyze the projected Langevin Monte Carlo (LMC) algorithm, a close cousin of projected Stochastic Gradient Descent (SGD).

Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff

no code implementations NeurIPS 2015 Ofer Dekel, Ronen Eldan, Tomer Koren

The best algorithm for the general bandit convex optimization problem guarantees a regret of $\widetilde{O}(T^{5/6})$, while the best known lower bound is $\Omega(T^{1/2})$.

Multi-scale exploration of convex functions and bandit convex optimization

no code implementations23 Jul 2015 Sébastien Bubeck, Ronen Eldan

We construct a new map from a convex function to a distribution on its domain, with the property that this distribution is a multi-scale exploration of the function.

Sampling from a log-concave distribution with Projected Langevin Monte Carlo

no code implementations9 Jul 2015 Sébastien Bubeck, Ronen Eldan, Joseph Lehec

We extend the Langevin Monte Carlo (LMC) algorithm to compactly supported measures via a projection step, akin to projected Stochastic Gradient Descent (SGD).
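
The algorithm itself is short; a minimal sketch on the Euclidean unit ball (the step size, horizon, and Gaussian potential are illustrative choices):

```python
# Projected Langevin Monte Carlo: a gradient step on the potential, Gaussian
# noise, then projection back onto the convex body K (here, a ball).
import numpy as np

def projected_lmc(grad_f, n_steps=10_000, d=2, eta=1e-3, radius=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    samples = []
    for _ in range(n_steps):
        x = x - eta * grad_f(x) + np.sqrt(2 * eta) * rng.standard_normal(d)
        norm = np.linalg.norm(x)
        if norm > radius:                 # projection step onto the ball K
            x = x * (radius / norm)
        samples.append(x.copy())
    return np.array(samples)

# Example: sample from exp(-|x|^2 / 2) restricted to the unit ball.
samples = projected_lmc(grad_f=lambda x: x)
```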

The entropic barrier: a simple and optimal universal self-concordant barrier

no code implementations4 Dec 2014 Sébastien Bubeck, Ronen Eldan

We prove that the Cramér transform of the uniform measure on a convex body in $\mathbb{R}^n$ is a $(1+o(1)) n$-self-concordant barrier, improving a seminal result of Nesterov and Nemirovski.
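
In symbols, with notation of my choosing: for a convex body $K \subset \mathbb{R}^n$, the construction takes the log-partition function and its Legendre (Cramér) transform

$$\nu(\theta) = \log \int_{K} e^{\langle \theta, x \rangle}\, dx, \qquad f(x) = \nu^{*}(x) = \sup_{\theta \in \mathbb{R}^{n}} \bigl( \langle \theta, x \rangle - \nu(\theta) \bigr),$$

and the theorem is that $f$ is a $(1+o(1))\, n$-self-concordant barrier on the interior of $K$.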
