Search Results for author: Adil Salim

Found 22 papers, 2 papers with code

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

no code implementations22 Apr 2024 Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Olatunji Ruwase, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, ZiYi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.

Gaussian random field approximation via Stein's method with applications to wide random neural networks

no code implementations28 Jun 2023 Krishnakumar Balasubramanian, Larry Goldstein, Nathan Ross, Adil Salim

Specializing our general result, we obtain the first bounds on the Gaussian random field approximation of wide random neural networks of any depth with Lipschitz activation functions, at the random field level.

Forward-backward Gaussian variational inference via JKO in the Bures-Wasserstein Space

no code implementations10 Apr 2023 Michael Diao, Krishnakumar Balasubramanian, Sinho Chewi, Adil Salim

Of key interest in statistics and machine learning is Gaussian VI, which approximates $\pi$ by minimizing the Kullback-Leibler (KL) divergence to $\pi$ over the space of Gaussians.

Variational Inference
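
Below is a minimal sketch of the Gaussian VI objective described in this entry, assuming a Gaussian target so that the KL divergence and its gradients are available in closed form. It runs plain Euclidean gradient descent on a factor of the covariance; it is not the paper's forward-backward JKO scheme in the Bures-Wasserstein space, and all step sizes and iteration counts are illustrative choices.

```python
# Toy Gaussian VI: minimize KL(N(m, Sigma) || pi) for a Gaussian target pi = N(mu, S).
# Plain gradient descent on (m, L) with Sigma = L L^T, for illustration only.
import numpy as np

d = 2
mu = np.array([1.0, -2.0])                      # target mean
S = np.array([[2.0, 0.5], [0.5, 1.0]])          # target covariance
S_inv = np.linalg.inv(S)

m = np.zeros(d)                                 # variational mean
L = np.eye(d)                                   # factor of variational covariance

step = 0.05
for _ in range(2000):
    Sigma = L @ L.T
    Sigma_inv = np.linalg.inv(Sigma)
    grad_m = S_inv @ (m - mu)                   # d KL / d m
    grad_Sigma = 0.5 * (S_inv - Sigma_inv)      # d KL / d Sigma
    grad_L = 2.0 * grad_Sigma @ L               # chain rule through Sigma = L L^T
    m -= step * grad_m
    L -= step * grad_L

print("mean error:", np.linalg.norm(m - mu))
print("cov error :", np.linalg.norm(L @ L.T - S))
```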

Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions

no code implementations22 Sep 2022 Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, Anru R. Zhang

We provide theoretical convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL$\cdot$E 2.

Denoising
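
To make the setup concrete, here is a hedged toy sketch of score-based sampling: the data distribution is a 1D Gaussian, so the score of the forward Ornstein-Uhlenbeck marginals is known in closed form and the reverse-time SDE can be discretized directly. It illustrates the mechanism the theory addresses, with a hand-picked horizon and step size; it is not an implementation of a DDPM or of the paper's analysis.

```python
# Toy score-based generative sampling with a closed-form score.
# Forward process: dX = -X dt + sqrt(2) dW, data X_0 ~ N(mu0, sig0^2).
# The marginal at time t stays Gaussian, so the true score is available and we
# can run an Euler-Maruyama discretization of the reverse-time SDE.
import numpy as np

rng = np.random.default_rng(0)
mu0, sig0 = 3.0, 0.5                 # data distribution N(mu0, sig0^2)
T, n_steps, n_samples = 5.0, 500, 10000
h = T / n_steps

def score(x, t):
    """Score of the forward marginal N(mu0*e^{-t}, sig0^2*e^{-2t} + 1 - e^{-2t})."""
    m_t = mu0 * np.exp(-t)
    v_t = sig0**2 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
    return -(x - m_t) / v_t

# Start from (approximately) the stationary distribution N(0, 1) and integrate
# the reverse SDE dX = [-X - 2*score(X, t)] dt + sqrt(2) dW backwards in time.
x = rng.standard_normal(n_samples)
for k in range(n_steps):
    t = T - k * h
    z = rng.standard_normal(n_samples)
    x = x - h * (-x - 2.0 * score(x, t)) + np.sqrt(2.0 * h) * z

print("sample mean/std:", x.mean(), x.std())   # should be close to (3.0, 0.5)
```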

Federated Learning with a Sampling Algorithm under Isoperimetry

no code implementations2 Jun 2022 Lukang Sun, Adil Salim, Peter Richtárik

Federated learning uses a set of techniques to efficiently distribute the training of a machine learning algorithm across several devices that own the training data.

Federated Learning

Improved analysis for a proximal algorithm for sampling

no code implementations13 Feb 2022 Yongxin Chen, Sinho Chewi, Adil Salim, Andre Wibisono

We study the proximal sampler of Lee, Shen, and Tian (2021) and obtain new convergence guarantees under weaker assumptions than strong log-concavity: namely, our results hold for (1) weakly log-concave targets, and (2) targets satisfying isoperimetric assumptions which allow for non-log-concavity.
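
For intuition, here is a hedged sketch of the proximal sampler studied in this paper, on a toy 1D Gaussian target where the restricted Gaussian oracle (the x-update) is available in closed form; the target, step size, and iteration count are arbitrary, and the interesting regimes in the paper are precisely those where this oracle must be implemented approximately.

```python
# Toy proximal sampler (Gibbs sampling on the pair (x, y)) for pi(x) ∝ exp(-x^2 / (2*sig2)).
# Augmented density: p(x, y) ∝ exp(-V(x) - (x - y)^2 / (2*eta)).
# For a Gaussian target both conditionals are Gaussian, so the sampler is exact.
import numpy as np

rng = np.random.default_rng(0)
sig2, eta = 2.0, 0.5                 # target variance and proximal step size
n_iter = 5000

x = 0.0
samples = []
for _ in range(n_iter):
    # y-step: y | x ~ N(x, eta)
    y = x + np.sqrt(eta) * rng.standard_normal()
    # x-step (restricted Gaussian oracle): x | y ~ N(y*sig2/(sig2+eta), 1/(1/sig2 + 1/eta))
    post_var = 1.0 / (1.0 / sig2 + 1.0 / eta)
    post_mean = post_var * (y / eta)
    x = post_mean + np.sqrt(post_var) * rng.standard_normal()
    samples.append(x)

print("empirical variance:", np.var(samples))   # should be close to sig2 = 2.0
```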

Towards a Theory of Non-Log-Concave Sampling: First-Order Stationarity Guarantees for Langevin Monte Carlo

no code implementations10 Feb 2022 Krishnakumar Balasubramanian, Sinho Chewi, Murat A. Erdogdu, Adil Salim, Matthew Zhang

For the task of sampling from a density $\pi \propto \exp(-V)$ on $\mathbb{R}^d$, where $V$ is possibly non-convex but $L$-gradient Lipschitz, we prove that averaged Langevin Monte Carlo outputs a sample with $\varepsilon$-relative Fisher information after $O( L^2 d^2/\varepsilon^2)$ iterations.
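
A minimal sketch of the averaged Langevin Monte Carlo procedure described above, on a toy non-convex double-well potential; "averaged" is implemented here by returning a uniformly random iterate, which matches the spirit of the stationarity guarantee, but the potential, step size, and iteration count are illustrative choices rather than the paper's.

```python
# Langevin Monte Carlo on a non-convex potential V(x) = (x^2 - 1)^2 / 4,
# returning a uniformly random iterate ("averaged" LMC in the sense of
# guarantees that hold for a randomly selected iterate).
import numpy as np

rng = np.random.default_rng(0)

def grad_V(x):
    return x * (x**2 - 1.0)            # gradient of the double-well potential

h, n_iter = 0.01, 20000
x = rng.standard_normal()
iterates = np.empty(n_iter)
for k in range(n_iter):
    x = x - h * grad_V(x) + np.sqrt(2.0 * h) * rng.standard_normal()
    iterates[k] = x

sample = iterates[rng.integers(n_iter)]  # output a uniformly random iterate
print("returned sample:", sample)
```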

An Optimal Algorithm for Strongly Convex Minimization under Affine Constraints

no code implementations22 Feb 2021 Adil Salim, Laurent Condat, Dmitry Kovalev, Peter Richtárik

Optimization problems under affine constraints appear in various areas of machine learning.

Optimization and Control

Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization

no code implementations NeurIPS 2020 Dmitry Kovalev, Adil Salim, Peter Richtárik

We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.

A Non-Asymptotic Analysis for Stein Variational Gradient Descent

no code implementations NeurIPS 2020 Anna Korba, Adil Salim, Michael Arbel, Giulia Luise, Arthur Gretton

We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $\pi\propto e^{-V}$ on $\mathbb{R}^d$.

LEMMA
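
Below is a hedged sketch of the standard SVGD update of Liu and Wang with an RBF kernel, which is the algorithm analyzed in this paper, on a toy 1D standard Gaussian target; the bandwidth, step size, and particle count are arbitrary choices.

```python
# Stein Variational Gradient Descent with an RBF kernel on pi ∝ exp(-V),
# with V(x) = x^2 / 2 (standard Gaussian target), particles in 1D.
import numpy as np

rng = np.random.default_rng(0)
n, h, bw, n_iter = 100, 0.1, 1.0, 500
x = rng.uniform(-5, 5, size=n)          # initial particles

def grad_log_pi(x):
    return -x                            # score of N(0, 1)

for _ in range(n_iter):
    diff = x[:, None] - x[None, :]                 # diff[i, j] = x_i - x_j
    K = np.exp(-diff**2 / (2 * bw**2))             # RBF kernel matrix (symmetric)
    attract = K @ grad_log_pi(x)                   # sum_j k(x_i, x_j) * score(x_j)
    repulse = (diff / bw**2 * K).sum(axis=1)       # sum_j d/dx_j k(x_j, x_i)
    x = x + h * (attract + repulse) / n            # SVGD direction, averaged over particles

print("particle mean/std:", x.mean(), x.std())     # should approach roughly (0, 1)
```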

Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm

no code implementations NeurIPS 2020 Adil Salim, Peter Richtárik

In the second part of this paper, we use the duality gap arising from the first part to study the complexity of the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), which can be seen as a generalization of the Projected Langevin Algorithm.
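
For concreteness, a hedged sketch of a proximal Langevin update of the kind referred to above, in the special case where the nonsmooth term is the indicator of the nonnegative orthant, so that its proximity operator is a projection and the scheme reduces to a projected Langevin algorithm. This is a toy full-gradient version with arbitrary step size, not the stochastic-gradient setting analyzed in the paper.

```python
# Proximal (projected) Langevin: sample from pi(x) ∝ exp(-x^2/2) * 1{x >= 0}
# by alternating a Langevin step on the smooth part with the prox of the
# indicator of [0, ∞), i.e. a projection. Toy 1D example, full gradients.
import numpy as np

rng = np.random.default_rng(0)
gamma, n_iter = 0.01, 50000

def grad_f(x):
    return x                          # gradient of the smooth potential x^2/2

x = 1.0
samples = np.empty(n_iter)
for k in range(n_iter):
    x = x - gamma * grad_f(x) + np.sqrt(2.0 * gamma) * rng.standard_normal()
    x = max(x, 0.0)                   # prox of the indicator of [0, ∞) = projection
    samples[k] = x

# The half-normal target has mean sqrt(2/pi) ≈ 0.80; the empirical mean should
# be roughly that, up to discretization bias.
print("empirical mean:", samples.mean())
```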

Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms

no code implementations3 Apr 2020 Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik

We consider minimizing the sum of three convex functions, where the first one, F, is smooth, the second one is nonsmooth and proximable, and the third one is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance in image processing and machine learning.
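
To make the template concrete, here is a hedged sketch of a classical primal-dual (Condat-Vu-type) iteration for min_x F(x) + g(x) + h(Lx), applied to a small fused-lasso instance. This is a standard baseline for this template, not the new algorithms proposed in the paper, and the problem data and step sizes are illustrative.

```python
# Fused-lasso instance of min_x F(x) + g(x) + h(Dx):
#   F(x) = 0.5 * ||A x - b||^2   (smooth)
#   g(x) = lam1 * ||x||_1        (nonsmooth, proximable)
#   h(u) = lam2 * ||u||_1, D = first-difference operator (nonsmooth, composed with D)
# solved with a classical Condat-Vu primal-dual iteration.
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 20
A = rng.standard_normal((n, d))
x_true = np.repeat([0.0, 2.0, -1.0, 0.0], 5)       # piecewise-constant signal
b = A @ x_true + 0.1 * rng.standard_normal(n)
lam1, lam2 = 0.1, 1.0

D = np.eye(d - 1, d, k=1) - np.eye(d - 1, d)       # first differences

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)   # prox of t*||.||_1

LF = np.linalg.norm(A, 2) ** 2                     # Lipschitz constant of grad F
normD2 = np.linalg.norm(D, 2) ** 2
sigma = 0.5
tau = 0.9 / (LF / 2 + sigma * normD2)              # Condat-Vu step-size condition

x, y = np.zeros(d), np.zeros(d - 1)
for _ in range(2000):
    x_new = soft(x - tau * (A.T @ (A @ x - b) + D.T @ y), tau * lam1)
    y = np.clip(y + sigma * D @ (2 * x_new - x), -lam2, lam2)  # prox of sigma*h^*
    x = x_new

print("recovered x (rounded):", np.round(x, 2))
```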

The Wasserstein Proximal Gradient Algorithm

no code implementations NeurIPS 2020 Adil Salim, Anna Korba, Giulia Luise

Using techniques from convex optimization and optimal transport, we analyze the FB scheme as a minimization algorithm on the Wasserstein space.
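
Schematically, and in our own notation rather than the paper's, the forward-backward (FB) scheme for the free energy $\mathcal{F}(\mu) = \int V \, d\mu + \int \mu \log \mu$, whose minimizer is $\pi \propto e^{-V}$, takes an explicit step on the potential term followed by a JKO (proximal) step on the entropy: $\mu_{k+1/2} = (\mathrm{id} - \gamma \nabla V)_{\#}\, \mu_k$, then $\mu_{k+1} \in \arg\min_{\mu} \{ \int \mu \log \mu + \tfrac{1}{2\gamma} W_2^2(\mu, \mu_{k+1/2}) \}$.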

Distributed Fixed Point Methods with Compressed Iterates

no code implementations20 Dec 2019 Sélim Chraibi, Ahmed Khaled, Dmitry Kovalev, Peter Richtárik, Adil Salim, Martin Takáč

We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed.

Federated Learning
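
As a concrete instance of an iterative method with compressed iterates, here is a hedged toy sketch: gradient descent on a quadratic where each new iterate is passed through an unbiased random-sparsification compressor before being stored or communicated. The compressor, problem, and step size are illustrative choices, not taken from the paper; as the analysis anticipates, the iterates only converge to a neighborhood of the solution whose size depends on the compression variance.

```python
# Gradient descent with compressed iterates: x_{k+1} = C(x_k - gamma * grad f(x_k)),
# where C is an unbiased random-sparsification compressor (keep each coordinate
# with probability p and rescale by 1/p).
import numpy as np

rng = np.random.default_rng(0)
d = 50
A = rng.standard_normal((d, d))
M = A.T @ A / d + np.eye(d)          # quadratic f(x) = 0.5 x^T M x - b^T x
b = rng.standard_normal(d)
x_star = np.linalg.solve(M, b)

def compress(v, p=0.5):
    mask = rng.random(v.shape) < p
    return mask * v / p               # unbiased: E[C(v)] = v

gamma = 0.5 / np.linalg.norm(M, 2)
x = np.zeros(d)
for _ in range(500):
    x = compress(x - gamma * (M @ x - b))

# Converges only to a neighborhood of x_star; its size reflects the compression variance.
print("relative error:", np.linalg.norm(x - x_star) / np.linalg.norm(x_star))
```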

Learning to Optimize via Dual Space Preconditioning

no code implementations25 Sep 2019 Sélim Chraibi, Adil Salim, Samuel Horváth, Filip Hanzely, Peter Richtárik

Preconditioning a minimization algorithm improves its convergence and can lead to a minimizer in one iteration in some extreme cases.
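
The "one iteration in some extreme cases" remark can be illustrated with the textbook example of Newton-style preconditioning on a quadratic; this is not the paper's dual space preconditioning, just the classical phenomenon it alludes to.

```python
# On a quadratic f(x) = 0.5 x^T Q x - b^T x, gradient descent preconditioned by
# Q^{-1} (i.e. a Newton step) reaches the minimizer in a single iteration.
import numpy as np

rng = np.random.default_rng(0)
d = 10
A = rng.standard_normal((d, d))
Q = A.T @ A + np.eye(d)               # positive definite
b = rng.standard_normal(d)

x0 = rng.standard_normal(d)
grad = Q @ x0 - b
x1 = x0 - np.linalg.solve(Q, grad)    # preconditioned step with P = Q^{-1}

print("distance to minimizer:", np.linalg.norm(x1 - np.linalg.solve(Q, b)))  # ~0
```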

Maximum Mean Discrepancy Gradient Flow

1 code implementation NeurIPS 2019 Michael Arbel, Anna Korba, Adil Salim, Arthur Gretton

We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties.
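
Here is a hedged particle sketch in the spirit of this paper: particles descend the squared MMD between their empirical distribution and a fixed set of target samples, using an RBF kernel. The bandwidth, step size, target, and initialization are illustrative, and the noise injection studied in the paper is omitted.

```python
# Particle discretization of an MMD gradient flow with an RBF kernel:
# each particle moves along the negative gradient of the witness function
# f(z) = (1/n) sum_j k(z, x_j) - (1/m) sum_j k(z, y_j).
import numpy as np

rng = np.random.default_rng(0)
n, m, bw, step, n_iter = 200, 200, 2.0, 0.2, 2000

y = 2.0 + 0.5 * rng.standard_normal(m)       # target samples ~ N(2, 0.25)
x = rng.standard_normal(n)                   # initial particles ~ N(0, 1)

def grad1_k(a, b):
    """Pairwise gradient of the RBF kernel k(a, b) with respect to its first argument."""
    diff = a[:, None] - b[None, :]
    return -diff / bw**2 * np.exp(-diff**2 / (2 * bw**2))

for _ in range(n_iter):
    grad_witness = grad1_k(x, x).mean(axis=1) - grad1_k(x, y).mean(axis=1)
    x = x - step * grad_witness              # attraction to targets, repulsion between particles

print("particle mean/std:", x.mean(), x.std())   # mean should drift toward 2.0
```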

Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates

1 code implementation NeurIPS 2019 Adil Salim, Dmitry Kovalev, Peter Richtárik

We propose a new algorithm, the Stochastic Proximal Langevin Algorithm (SPLA), for sampling from a log-concave distribution.

A Fully Stochastic Primal-Dual Algorithm

no code implementations23 Jan 2019 Pascal Bianchi, Walid Hachem, Adil Salim

The proposed algorithm is proven to converge to a saddle point of the Lagrangian function.
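
For context (a standard fact stated in our notation, not quoted from the paper): a saddle point $(x^\star, y^\star)$ of a Lagrangian $\mathcal{L}$ satisfies $\mathcal{L}(x^\star, y) \le \mathcal{L}(x^\star, y^\star) \le \mathcal{L}(x, y^\star)$ for all $x, y$, and its primal component $x^\star$ solves the associated constrained problem.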

A Constant Step Stochastic Douglas-Rachford Algorithm with Application to Non Separable Regularizations

no code implementations3 Apr 2018 Adil Salim, Pascal Bianchi, Walid Hachem

The Douglas-Rachford algorithm converges to a minimizer of a sum of two convex functions.
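
For reference, a hedged sketch of the deterministic Douglas-Rachford iteration for min_x f(x) + g(x), applied to a toy problem with closed-form proximity operators; the paper's contribution is a constant-step stochastic variant, which is not what is shown here.

```python
# Douglas-Rachford splitting for min_x f(x) + g(x) with
#   f(x) = 0.5 * ||x - a||^2   and   g(x) = lam * ||x||_1,
# whose minimizer is the soft-thresholding of a at level lam.
import numpy as np

a = np.array([3.0, -0.2, 0.5, -4.0])
lam, gamma = 1.0, 1.0

prox_f = lambda z: (z + gamma * a) / (1.0 + gamma)                        # prox of gamma*f
prox_g = lambda z: np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)  # prox of gamma*g

z = np.zeros_like(a)
for _ in range(200):
    x = prox_g(z)
    z = z + prox_f(2 * x - z) - x

print("DR solution:", x)
print("closed form:", np.sign(a) * np.maximum(np.abs(a) - lam, 0.0))
```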

Snake: a Stochastic Proximal Gradient Algorithm for Regularized Problems over Large Graphs

no code implementations19 Dec 2017 Adil Salim, Pascal Bianchi, Walid Hachem

When applying the proximal gradient algorithm to solve this problem, the proximity operator (backward step) can be implemented quite cheaply in the special case where the graph is a simple path without loops.
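
Written out in our notation (not quoted from the paper), the iteration in question is the proximal gradient step $x_{k+1} = \mathrm{prox}_{\gamma \lambda \mathrm{TV}_G}\big(x_k - \gamma \nabla f(x_k)\big)$ for a graph-regularized problem $\min_x f(x) + \lambda \sum_{(i,j) \in E} |x_i - x_j|$; when the graph $G$ is a simple path, the backward step is a 1D total-variation proximity operator, which can be computed in linear time (for example by the taut-string method).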
