Search Results for author: Adil Salim

Found 23 papers, 2 papers with code

Long-time asymptotics of noisy SVGD outside the population limit

no code implementations17 Jun 2024 Victor Priser, Pascal Bianchi, Adil Salim

First, we establish that the limit set of noisy SVGD for large $t$ is well-defined.

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

no code implementations22 Apr 2024 Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matthew Dixon, Ronen Eldan, Victor Fragoso, Jianfeng Gao, Mei Gao, Min Gao, Amit Garg, Allie Del Giorno, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Wenxiang Hu, Jamie Huynh, Dan Iter, Sam Ade Jacobs, Mojan Javaheripi, Xin Jin, Nikos Karampatziakis, Piero Kauffmann, Mahoud Khademi, Dongwoo Kim, Young Jin Kim, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Xihui Lin, Zeqi Lin, Ce Liu, Liyuan Liu, Mengchen Liu, Weishung Liu, Xiaodong Liu, Chong Luo, Piyush Madan, Ali Mahmoudzadeh, David Majercak, Matt Mazzola, Caio César Teodoro Mendes, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Liliang Ren, Gustavo de Rosa, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Yelong Shen, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Praneetha Vaddamanu, Chunyu Wang, Guanhua Wang, Lijuan Wang, Shuohang Wang, Xin Wang, Yu Wang, Rachel Ward, Wen Wen, Philipp Witte, Haiping Wu, Xiaoxia Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Jilong Xue, Sonali Yadav, Fan Yang, Jianwei Yang, Yifan Yang, ZiYi Yang, Donghan Yu, Lu Yuan, Chenruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.

Ranked #5 on MMR total on MRR-Benchmark (using extra training data)

Language Modelling · Math · +2

Gaussian random field approximation via Stein's method with applications to wide random neural networks

no code implementations28 Jun 2023 Krishnakumar Balasubramanian, Larry Goldstein, Nathan Ross, Adil Salim

Specializing our general result, we obtain the first bounds on the Gaussian random field approximation of wide random neural networks of any depth with Lipschitz activation functions at the random field level.
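
Purely as an illustrative numerical check (not from the paper), one can sample many wide two-layer ReLU networks with random Gaussian weights and observe that their joint outputs at a few fixed inputs behave approximately like a Gaussian random field; the width, inputs, and 1/sqrt(width) scaling below are choices made for the example.

```python
# Illustrative check (not from the paper): outputs of wide random two-layer
# ReLU networks at fixed inputs are approximately jointly Gaussian.
import numpy as np

rng = np.random.default_rng(0)
d, width, n_nets = 3, 4096, 2000
X = rng.standard_normal((5, d))           # 5 fixed test inputs in R^d

def random_net_outputs(X, width, rng):
    """Outputs of one random two-layer ReLU network, 1/sqrt(width) scaling."""
    W = rng.standard_normal((d, width))
    a = rng.standard_normal(width)
    hidden = np.maximum(X @ W, 0.0)       # ReLU features, shape (5, width)
    return hidden @ a / np.sqrt(width)    # one scalar output per input

samples = np.stack([random_net_outputs(X, width, rng) for _ in range(n_nets)])
print("empirical covariance of outputs across random networks:")
print(np.cov(samples.T).round(3))
# For large width this matrix approaches the corresponding arc-cosine kernel
# evaluated at X, and the joint law of the outputs is approximately Gaussian.
```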

Forward-backward Gaussian variational inference via JKO in the Bures-Wasserstein Space

no code implementations10 Apr 2023 Michael Diao, Krishnakumar Balasubramanian, Sinho Chewi, Adil Salim

Of key interest in statistics and machine learning is Gaussian VI, which approximates $\pi$ by minimizing the Kullback-Leibler (KL) divergence to $\pi$ over the space of Gaussians.

Variational Inference
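
For intuition only, here is a minimal mean-field Gaussian VI sketch that minimizes the KL divergence to $\pi$ by reparameterized stochastic gradient descent on a diagonal Gaussian; it is not the Bures-Wasserstein JKO scheme analyzed in the paper, and the quadratic target is an assumption made for the example.

```python
# Minimal mean-field Gaussian VI sketch (illustrative; not the paper's
# forward-backward JKO scheme). Target: pi ∝ exp(-V) with a known grad V.
import numpy as np

rng = np.random.default_rng(0)
d = 2
A = np.array([[2.0, 0.5], [0.5, 1.0]])    # example target: N(0, A^{-1}), V(x) = x^T A x / 2

def grad_V(x):
    return x @ A.T                        # ∇V; A is symmetric

m = np.zeros(d)                           # variational mean
s = np.ones(d)                            # variational per-coordinate std
step, n_mc = 0.05, 64

for _ in range(2000):
    xi = rng.standard_normal((n_mc, d))   # reparameterization noise
    z = m + s * xi                        # samples from N(m, diag(s^2))
    g = grad_V(z)                         # (n_mc, d)
    grad_m = g.mean(axis=0)               # ∂/∂m of E[V(m + s·xi)]
    grad_s = (g * xi).mean(axis=0) - 1.0 / s   # plus the entropy term -d/ds log s
    m -= step * grad_m
    s = np.maximum(s - step * grad_s, 1e-3)

print("VI mean:", m.round(3), " VI stds:", s.round(3))
print("true marginal stds:", np.sqrt(np.diag(np.linalg.inv(A))).round(3))
```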

Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions

no code implementations22 Sep 2022 Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, Anru R. Zhang

We provide theoretical convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL$\cdot$E 2.

Denoising
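
As a reminder of the sampler being analyzed (a generic DDPM-style sketch, not a contribution of the paper), ancestral sampling runs the reverse chain below; `eps_model` is a hypothetical placeholder for a trained noise-prediction network, so the output here is meaningless without one.

```python
# Sketch of DDPM ancestral sampling given a trained noise predictor.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)            # standard linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x, t):
    """Hypothetical placeholder for a trained network predicting the added noise."""
    return np.zeros_like(x)                   # trivial stand-in

def ddpm_sample(shape):
    x = rng.standard_normal(shape)            # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = eps_model(x, t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise  # variance choice sigma_t^2 = beta_t
    return x

# With the trivial placeholder the output is meaningless; a trained model is
# needed to obtain actual samples.
print(ddpm_sample((4,)))
```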

Federated Learning with a Sampling Algorithm under Isoperimetry

no code implementations2 Jun 2022 Lukang Sun, Adil Salim, Peter Richtárik

Federated learning uses a set of techniques to efficiently distribute the training of a machine learning algorithm across several devices, which own the training data.

Federated Learning
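
A heavily hedged sketch of the general idea, under the assumption that each device holds its own potential $V_i$ and the global target is $\pi \propto \exp(-\sum_i V_i)$: devices take local Langevin steps and a server averages the iterates. This is only an illustration of federated Langevin-style sampling, not necessarily the algorithm proposed in the paper.

```python
# Illustrative federated Langevin sketch (idea only).
import numpy as np

rng = np.random.default_rng(0)
d, n_clients = 2, 5
means = rng.standard_normal((n_clients, d))   # client i's potential is centered at means[i]

def grad_V_client(i, x):
    return x - means[i]                       # ∇V_i for V_i(x) = ||x - means[i]||^2 / 2

def local_langevin(i, x, steps, h):
    for _ in range(steps):
        noise = rng.standard_normal(d)
        # each client scales its own gradient by n_clients so that the average
        # drift across clients matches the gradient of the full potential
        x = x - h * n_clients * grad_V_client(i, x) + np.sqrt(2.0 * h) * noise
    return x

x = np.zeros(d)
for _ in range(200):
    locals_ = [local_langevin(i, x.copy(), steps=5, h=1e-3) for i in range(n_clients)]
    x = np.mean(locals_, axis=0)              # server aggregation

print("final iterate:", x.round(3), " global mode:", means.mean(axis=0).round(3))
```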

Improved analysis for a proximal algorithm for sampling

no code implementations13 Feb 2022 Yongxin Chen, Sinho Chewi, Adil Salim, Andre Wibisono

We study the proximal sampler of Lee, Shen, and Tian (2021) and obtain new convergence guarantees under weaker assumptions than strong log-concavity: namely, our results hold for (1) weakly log-concave targets, and (2) targets satisfying isoperimetric assumptions which allow for non-log-concavity.
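
A minimal sketch of the proximal sampler's two-step structure, illustrated on a Gaussian target where the restricted Gaussian oracle (the x-update) has a closed form; for general targets that oracle is the nontrivial part and is typically implemented by rejection sampling.

```python
# Proximal sampler sketch on a Gaussian target N(0, Sigma), where the
# restricted Gaussian oracle (RGO) is available in closed form.
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.8], [0.8, 2.0]])
Sigma_inv = np.linalg.inv(Sigma)
d, eta, n_iter = 2, 0.5, 5000

# RGO posterior: x | y ~ N(M y / eta, M), with M = (Sigma^{-1} + I/eta)^{-1}
M = np.linalg.inv(Sigma_inv + np.eye(d) / eta)
M_chol = np.linalg.cholesky(M)

x = np.zeros(d)
samples = []
for _ in range(n_iter):
    y = x + np.sqrt(eta) * rng.standard_normal(d)        # forward step: y ~ N(x, eta I)
    x = M @ (y / eta) + M_chol @ rng.standard_normal(d)  # RGO: x ~ pi(x) N(x; y, eta I)
    samples.append(x)

print("empirical covariance:\n", np.cov(np.array(samples).T).round(2))
print("target covariance:\n", Sigma)
```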

Towards a Theory of Non-Log-Concave Sampling: First-Order Stationarity Guarantees for Langevin Monte Carlo

no code implementations10 Feb 2022 Krishnakumar Balasubramanian, Sinho Chewi, Murat A. Erdogdu, Adil Salim, Matthew Zhang

For the task of sampling from a density $\pi \propto \exp(-V)$ on $\mathbb{R}^d$, where $V$ is possibly non-convex but $L$-gradient Lipschitz, we prove that averaged Langevin Monte Carlo outputs a sample with $\varepsilon$-relative Fisher information after $O( L^2 d^2/\varepsilon^2)$ iterations.
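
For reference, the underlying update is plain (unadjusted) Langevin Monte Carlo; the sketch below uses an example double-well potential, and the "averaged" output is taken to be a uniformly random iterate along the trajectory, which is one standard reading of averaging in this context.

```python
# Basic (unadjusted) Langevin Monte Carlo sketch for pi ∝ exp(-V).
import numpy as np

rng = np.random.default_rng(0)

def grad_V(x):
    # example non-convex potential: V(x) = sum_i (x_i^2 - 1)^2, a double well
    return 4.0 * x * (x * x - 1.0)

d, h, n_iter = 2, 1e-2, 20000
x = rng.standard_normal(d)
iterates = []
for _ in range(n_iter):
    x = x - h * grad_V(x) + np.sqrt(2.0 * h) * rng.standard_normal(d)
    iterates.append(x.copy())

# "averaged" output: a uniformly random iterate from the trajectory
sample = iterates[rng.integers(n_iter)]
print("one LMC sample:", sample.round(3))
```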

An Optimal Algorithm for Strongly Convex Minimization under Affine Constraints

no code implementations22 Feb 2021 Adil Salim, Laurent Condat, Dmitry Kovalev, Peter Richtárik

Optimization problems under affine constraints appear in various areas of machine learning.

Optimization and Control
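
Not the paper's optimal algorithm, just a simple baseline for the same problem class: projected gradient descent for a strongly convex, smooth f under the affine constraint Kx = b, where the Euclidean projection onto the constraint set has the closed form used below.

```python
# Baseline sketch (not the paper's optimal method): projected gradient descent
# for min f(x) subject to K x = b, with f strongly convex and smooth.
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2
K = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)

def grad_f(x):                       # f(x) = ||x - c||^2 / 2
    return x - c

KKt_inv = np.linalg.inv(K @ K.T)

def project_affine(x):               # Euclidean projection onto {x : Kx = b}
    return x - K.T @ (KKt_inv @ (K @ x - b))

x = project_affine(np.zeros(n))
for _ in range(500):
    x = project_affine(x - 0.5 * grad_f(x))

print("constraint residual:", np.linalg.norm(K @ x - b))
print("solution:", x.round(3))
```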

Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization

no code implementations NeurIPS 2020 Dmitry Kovalev, Adil Salim, Peter Richtárik

We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.

A Non-Asymptotic Analysis for Stein Variational Gradient Descent

no code implementations NeurIPS 2020 Anna Korba, Adil Salim, Michael Arbel, Giulia Luise, Arthur Gretton

We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $\pi\propto e^{-V}$ on $\mathbb{R}^d$.

LEMMA
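
For readers unfamiliar with the algorithm being analyzed, a minimal SVGD sketch with an RBF kernel and a standard Gaussian target (illustrative only; the kernel bandwidth and step size are arbitrary choices).

```python
# Minimal SVGD sketch with an RBF kernel for pi ∝ exp(-V) on R^d.
import numpy as np

rng = np.random.default_rng(0)

def grad_log_pi(x):                       # score of a standard Gaussian target
    return -x

n, d, h, step = 100, 2, 0.5, 0.1
X = rng.standard_normal((n, d)) * 3.0     # initial particles

for _ in range(500):
    diffs = X[:, None, :] - X[None, :, :]             # (n, n, d), x_i - x_j
    sq = np.sum(diffs ** 2, axis=-1)                  # squared distances
    K = np.exp(-sq / (2.0 * h))                       # RBF kernel matrix
    grad_K = -diffs / h * K[:, :, None]               # ∇_{x_i} k(x_i, x_j)
    # SVGD direction at x_j: (1/n) sum_i [ k(x_i, x_j) score(x_i) + ∇_{x_i} k(x_i, x_j) ]
    phi = (K.T @ grad_log_pi(X) + grad_K.sum(axis=0)) / n
    X = X + step * phi

print("particle mean:", X.mean(axis=0).round(2))
print("particle covariance:\n", np.cov(X.T).round(2))
```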

Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm

no code implementations NeurIPS 2020 Adil Salim, Peter Richtárik

In the second part of this paper, we use the duality gap arising from the first part to study the complexity of the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), which can be seen as a generalization of the Projected Langevin Algorithm.
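
A minimal sketch of the PSGLA-style update, assuming a target $\pi \propto \exp(-f - g)$ with $g = \lambda\|\cdot\|_1$ so that the proximal step is soft-thresholding; for simplicity the gradient below is exact rather than stochastic.

```python
# Sketch of a proximal gradient Langevin update: forward (gradient + noise)
# step on the smooth part f, then the proximal step of the nonsmooth part g.
import numpy as np

rng = np.random.default_rng(0)
d, gamma, lam, n_iter = 5, 1e-2, 1.0, 20000

def grad_f(x):                 # f(x) = ||x||^2 / 2
    return x

def prox_g(x, t):              # soft-thresholding = prox of t * lam * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)

x = rng.standard_normal(d)
samples = []
for _ in range(n_iter):
    noise = rng.standard_normal(d)
    x = prox_g(x - gamma * grad_f(x) + np.sqrt(2.0 * gamma) * noise, gamma)
    samples.append(x.copy())

print("mean |x_i| along the chain:", np.mean(np.abs(samples)).round(3))
```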

Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms

no code implementations3 Apr 2020 Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik

We consider minimizing the sum of three convex functions, where the first one, F, is smooth, the second one is nonsmooth and proximable, and the third one is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance, in image processing and machine learning.
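
For context, one classical method for exactly this three-term template is the Condat-Vu primal-dual iteration sketched below, on an illustrative lasso-type instance; the paper's new algorithms are not reproduced here.

```python
# Condat-Vu primal-dual sketch for min_x F(x) + g(x) + h(L x), with F smooth,
# g and h proximable. Classical baseline for the template, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 10
L = rng.standard_normal((m, n))
c = rng.standard_normal(n)
lam = 0.1

def grad_F(x):                       # F(x) = ||x - c||^2 / 2
    return x - c

def prox_g(x, t):                    # g(x) = lam * ||x||_1
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)

def prox_h(z, t):                    # h(z) = lam * ||z||_1
    return np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)

def prox_h_conj(y, s):               # Moreau identity: prox of s * h*
    return y - s * prox_h(y / s, 1.0 / s)

Lip = 1.0                            # Lipschitz constant of grad F
op_norm = np.linalg.norm(L, 2)
tau = 0.9 / (Lip / 2.0 + op_norm)    # step sizes satisfying 1/tau - sigma*||L||^2 >= Lip/2
sigma = 0.9 / op_norm

x, y = np.zeros(n), np.zeros(m)
for _ in range(2000):
    x_new = prox_g(x - tau * (grad_F(x) + L.T @ y), tau)
    y = prox_h_conj(y + sigma * L @ (2.0 * x_new - x), sigma)
    x = x_new

print("objective:", 0.5 * np.sum((x - c) ** 2) + lam * np.abs(x).sum() + lam * np.abs(L @ x).sum())
```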

The Wasserstein Proximal Gradient Algorithm

no code implementations NeurIPS 2020 Adil Salim, Anna Korba, Giulia Luise

Using techniques from convex optimization and optimal transport, we analyze the FB scheme as a minimization algorithm on the Wasserstein space.

Distributed Fixed Point Methods with Compressed Iterates

no code implementations20 Dec 2019 Sélim Chraibi, Ahmed Khaled, Dmitry Kovalev, Peter Richtárik, Adil Salim, Martin Takáč

We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed.

Federated Learning

Learning to Optimize via Dual space Preconditioning

no code implementations25 Sep 2019 Sélim Chraibi, Adil Salim, Samuel Horváth, Filip Hanzely, Peter Richtárik

Preconditioning a minimization algorithm improves its convergence and can lead to the minimizer in one iteration in some extreme cases.
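
The "one iteration" extreme case is easy to see concretely: on a quadratic objective, a gradient step preconditioned by the inverse Hessian (a Newton step) reaches the minimizer exactly, as in this small sketch.

```python
# On a quadratic f(x) = x^T A x / 2 - b^T x, a gradient step preconditioned by
# A^{-1} (a Newton step) lands exactly on the minimizer in one iteration.
import numpy as np

rng = np.random.default_rng(0)
n = 4
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)            # symmetric positive definite Hessian
b = rng.standard_normal(n)

x0 = rng.standard_normal(n)
grad = A @ x0 - b
x1 = x0 - np.linalg.solve(A, grad)     # preconditioned (Newton) step

print("distance to minimizer:", np.linalg.norm(x1 - np.linalg.solve(A, b)))  # ~0
```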

Maximum Mean Discrepancy Gradient Flow

1 code implementation NeurIPS 2019 Michael Arbel, Anna Korba, Adil Salim, Arthur Gretton

We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties.
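
A small particle sketch of a discretized MMD flow with an RBF kernel, descending the squared MMD between the particles and a fixed target sample; this is for intuition only and is not the authors' implementation (bandwidth, step size, and target are arbitrary choices).

```python
# Particle sketch of a discretized MMD gradient flow with an RBF kernel.
import numpy as np

rng = np.random.default_rng(0)
n, d, h, step = 200, 2, 2.0, 0.5

Y = rng.standard_normal((n, d)) + np.array([1.5, 0.0])   # fixed target sample
X = rng.standard_normal((n, d))                           # initial particles

def rbf_grad(A, B):
    """∇_a k(a, b) for all pairs (a in A, b in B), RBF kernel with bandwidth h."""
    diffs = A[:, None, :] - B[None, :, :]
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2.0 * h))
    return -diffs / h * K[:, :, None]

for _ in range(500):
    gXX = rbf_grad(X, X)
    gXY = rbf_grad(X, Y)
    # gradient of MMD^2 between the two empirical measures w.r.t. particle x_i
    # (up to the constant 2/n factor)
    grad = gXX.mean(axis=1) - gXY.mean(axis=1)
    X = X - step * grad

print("particle mean:", X.mean(axis=0).round(2), " target mean:", Y.mean(axis=0).round(2))
```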

Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates

1 code implementation NeurIPS 2019 Adil Salim, Dmitry Kovalev, Peter Richtárik

We propose a new algorithm, the Stochastic Proximal Langevin Algorithm (SPLA), for sampling from a log-concave distribution.

A Fully Stochastic Primal-Dual Algorithm

no code implementations23 Jan 2019 Pascal Bianchi, Walid Hachem, Adil Salim

The proposed algorithm is proven to converge to a saddle point of the Lagrangian function.

A Constant Step Stochastic Douglas-Rachford Algorithm with Application to Non Separable Regularizations

no code implementations3 Apr 2018 Adil Salim, Pascal Bianchi, Walid Hachem

The Douglas-Rachford algorithm is an iterative method whose iterates converge to a minimizer of the sum of two convex functions.
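
For reference, the deterministic Douglas-Rachford iteration for min f + g is sketched below on a small instance with closed-form proximal operators; the constant-step stochastic variant studied in the paper is not reproduced.

```python
# Deterministic Douglas-Rachford sketch for min_x f(x) + g(x), with
# f(x) = ||x - a||^2 / 2 and g(x) = lam * ||x||_1 (both prox-friendly).
import numpy as np

a = np.array([2.0, -0.3, 0.05, -1.2])
lam, gamma = 0.5, 1.0

def prox_f(x, t):              # prox of t * f: (x + t*a) / (1 + t)
    return (x + t * a) / (1.0 + t)

def prox_g(x, t):              # prox of t * g: soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)

z = np.zeros_like(a)
for _ in range(200):
    y = prox_f(z, gamma)                   # first backward step
    x = prox_g(2.0 * y - z, gamma)         # backward step at the reflected point
    z = z + x - y                          # update of the governing sequence
print("DR solution:", prox_f(z, gamma).round(3))
print("closed form:", (np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)).round(3))
```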

Snake: a Stochastic Proximal Gradient Algorithm for Regularized Problems over Large Graphs

no code implementations19 Dec 2017 Adil Salim, Pascal Bianchi, Walid Hachem

When applying the proximal gradient algorithm to solve this problem, there exist quite affordable methods to implement the proximity operator (backward step) in the special case where the graph is a simple path without loops.
