no code implementations • 17 Jun 2024 • Victor Priser, Pascal Bianchi, Adil Salim
First, we establish that the limit set of noisy SVGD in the long-time regime is well-defined.
no code implementations • 22 Apr 2024 • Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matthew Dixon, Ronen Eldan, Victor Fragoso, Jianfeng Gao, Mei Gao, Min Gao, Amit Garg, Allie Del Giorno, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Wenxiang Hu, Jamie Huynh, Dan Iter, Sam Ade Jacobs, Mojan Javaheripi, Xin Jin, Nikos Karampatziakis, Piero Kauffmann, Mahoud Khademi, Dongwoo Kim, Young Jin Kim, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Xihui Lin, Zeqi Lin, Ce Liu, Liyuan Liu, Mengchen Liu, Weishung Liu, Xiaodong Liu, Chong Luo, Piyush Madan, Ali Mahmoudzadeh, David Majercak, Matt Mazzola, Caio César Teodoro Mendes, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Liliang Ren, Gustavo de Rosa, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Yelong Shen, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Praneetha Vaddamanu, Chunyu Wang, Guanhua Wang, Lijuan Wang, Shuohang Wang, Xin Wang, Yu Wang, Rachel Ward, Wen Wen, Philipp Witte, Haiping Wu, Xiaoxia Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Jilong Xue, Sonali Yadav, Fan Yang, Jianwei Yang, Yifan Yang, ZiYi Yang, Donghan Yu, Lu Yuan, Chenruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.
Ranked #5 on MMR total on MRR-Benchmark (using extra training data)
no code implementations • 28 Jun 2023 • Krishnakumar Balasubramanian, Larry Goldstein, Nathan Ross, Adil Salim
Specializing our general result, we obtain the first bounds on the Gaussian random field approximation of wide random neural networks of any depth and Lipschitz activation functions at the random field level.
no code implementations • 20 Jun 2023 • Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li
Despite this small scale, phi-1 attains pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP.
no code implementations • 10 Apr 2023 • Michael Diao, Krishnakumar Balasubramanian, Sinho Chewi, Adil Salim
Of key interest in statistics and machine learning is Gaussian VI, which approximates $\pi$ by minimizing the Kullback-Leibler (KL) divergence to $\pi$ over the space of Gaussians.
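As a rough illustration of what Gaussian VI computes, here is a minimal sketch that fits a diagonal-covariance Gaussian to $\pi \propto e^{-V}$ by stochastic gradient descent on the KL divergence via the reparameterization trick; the toy target, the optimizer, and the diagonal parameterization are illustrative assumptions, not the scheme analyzed in the paper.

```python
import numpy as np

def gaussian_vi(grad_V, dim, n_iters=2000, lr=1e-2, n_mc=32, rng=None):
    """Gaussian VI sketch: fit N(m, diag(s^2)) to pi ∝ exp(-V) by stochastic
    gradient descent on KL(q || pi), using the reparameterization x = m + s * eps."""
    rng = np.random.default_rng() if rng is None else rng
    m, log_s = np.zeros(dim), np.zeros(dim)
    for _ in range(n_iters):
        eps = rng.standard_normal((n_mc, dim))
        x = m + np.exp(log_s) * eps                   # samples from the current Gaussian q
        g = grad_V(x)                                 # gradients of the potential at the samples
        grad_m = g.mean(axis=0)                       # gradient of E_q[V] w.r.t. m
        # Gradient of E_q[V] - entropy(q) w.r.t. log_s; the entropy term contributes -1 per coordinate.
        grad_log_s = (g * eps * np.exp(log_s)).mean(axis=0) - 1.0
        m, log_s = m - lr * grad_m, log_s - lr * grad_log_s
    return m, np.exp(log_s)

# Illustrative target: V(x) = ||x - 1||^2 / 2, i.e. pi = N(1, I); the fit recovers m ≈ 1, s ≈ 1.
m_hat, s_hat = gaussian_vi(lambda x: x - 1.0, dim=3)
```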
no code implementations • 22 Sep 2022 • Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, Anru R. Zhang
We provide theoretical convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL$\cdot$E 2.
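For orientation, a minimal sketch of DDPM-style ancestral sampling driven by a score function; the noise schedule, the update rule, and the hand-coded toy score are illustrative assumptions and do not reproduce the setting analyzed in the paper.

```python
import numpy as np

def ddpm_sample(score_fn, dim, n_steps=1000, beta_min=1e-4, beta_max=2e-2, rng=None):
    """Ancestral sampling sketch: run a reverse diffusion from pure noise,
    using a score function in place of a trained noise-prediction network."""
    rng = np.random.default_rng() if rng is None else rng
    betas = np.linspace(beta_min, beta_max, n_steps)
    x = rng.standard_normal(dim)                      # start from N(0, I)
    for t in reversed(range(n_steps)):
        beta = betas[t]
        # Reverse-time mean update driven by the score of the noised distribution.
        x = (x + beta * score_fn(x, t)) / np.sqrt(1.0 - beta)
        if t > 0:
            x = x + np.sqrt(beta) * rng.standard_normal(dim)
    return x

# Toy "score network": the exact score of a standard Gaussian target, for illustration only.
sample = ddpm_sample(lambda x, t: -x, dim=2)
```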
no code implementations • 2 Jun 2022 • Lukang Sun, Adil Salim, Peter Richtárik
Federated learning uses a set of techniques to efficiently distribute the training of a machine learning algorithm across several devices, which own the training data.
no code implementations • 13 Feb 2022 • Yongxin Chen, Sinho Chewi, Adil Salim, Andre Wibisono
We study the proximal sampler of Lee, Shen, and Tian (2021) and obtain new convergence guarantees under weaker assumptions than strong log-concavity: namely, our results hold for (1) weakly log-concave targets, and (2) targets satisfying isoperimetric assumptions which allow for non-log-concavity.
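A minimal sketch of the proximal sampler's two alternating steps, specialized to a standard Gaussian target for which the restricted Gaussian oracle has a closed form; the step size `eta` and the target are illustrative assumptions.

```python
import numpy as np

def proximal_sampler_gaussian(dim, eta=0.5, n_iters=500, rng=None):
    """Proximal sampler sketch for pi ∝ exp(-||x||^2 / 2): alternate a Gaussian
    forward step with the closed-form restricted Gaussian oracle."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(dim)
    for _ in range(n_iters):
        # Forward step: y ~ N(x, eta * I).
        y = x + np.sqrt(eta) * rng.standard_normal(dim)
        # Backward step (restricted Gaussian oracle): sample x ∝ pi(x) * N(x; y, eta * I),
        # which for this target is N(y / (1 + eta), eta / (1 + eta) * I).
        x = y / (1 + eta) + np.sqrt(eta / (1 + eta)) * rng.standard_normal(dim)
    return x

sample = proximal_sampler_gaussian(dim=3)
```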
no code implementations • 10 Feb 2022 • Krishnakumar Balasubramanian, Sinho Chewi, Murat A. Erdogdu, Adil Salim, Matthew Zhang
For the task of sampling from a density $\pi \propto \exp(-V)$ on $\mathbb{R}^d$, where $V$ is possibly non-convex but $L$-gradient Lipschitz, we prove that averaged Langevin Monte Carlo outputs a sample with $\varepsilon$-relative Fisher information after $O( L^2 d^2/\varepsilon^2)$ iterations.
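For context, a minimal sketch of the plain Langevin Monte Carlo update targeting $\pi \propto \exp(-V)$, with "averaging" represented by returning a uniformly chosen iterate; the potential, step size, and iteration count are illustrative assumptions rather than the paper's exact scheme.

```python
import numpy as np

def grad_V(x):
    # Illustrative potential: V(x) = ||x||^2 / 2, so grad V(x) = x.
    return x

def langevin_monte_carlo(x0, step=1e-2, n_iters=10_000, rng=None):
    """Unadjusted Langevin Monte Carlo targeting pi ∝ exp(-V)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    iterates = []
    for _ in range(n_iters):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_V(x) + np.sqrt(2.0 * step) * noise
        iterates.append(x.copy())
    # "Averaged" output: return a uniformly chosen iterate, one common averaging device.
    return iterates[rng.integers(len(iterates))]

sample = langevin_monte_carlo(np.zeros(2))
```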
no code implementations • 6 Jun 2021 • Adil Salim, Lukang Sun, Peter Richtárik
We first establish the convergence of the algorithm.
no code implementations • 22 Feb 2021 • Adil Salim, Laurent Condat, Dmitry Kovalev, Peter Richtárik
Optimization problems under affine constraints appear in various areas of machine learning.
Optimization and Control
no code implementations • NeurIPS 2020 • Dmitry Kovalev, Adil Salim, Peter Richtárik
We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.
no code implementations • NeurIPS 2020 • Anna Korba, Adil Salim, Michael Arbel, Giulia Luise, Arthur Gretton
We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $\pi\propto e^{-V}$ on $\mathbb{R}^d$.
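A minimal sketch of one SVGD update with an RBF kernel: each particle is pushed by a kernel-weighted average of the scores (attraction toward high-density regions) plus a kernel-gradient term (repulsion between particles); the bandwidth, step size, and Gaussian target are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    # Pairwise kernel K[i, j] = exp(-||x_i - x_j||^2 / (2h)) and the repulsive term
    # sum_j grad_{x_j} k(x_j, x_i) = sum_j (x_i - x_j) / h * K[i, j].
    diffs = X[:, None, :] - X[None, :, :]
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2.0 * h))
    repulsion = (diffs / h * K[..., None]).sum(axis=1)
    return K, repulsion

def svgd_step(X, grad_log_pi, step=1e-1, h=1.0):
    """One SVGD update: kernel-weighted scores (attraction) plus kernel gradients (repulsion)."""
    K, repulsion = rbf_kernel(X, h)
    scores = grad_log_pi(X)                    # shape (n, d): grad log pi at each particle
    phi = (K @ scores + repulsion) / X.shape[0]
    return X + step * phi

# Example target: standard Gaussian, for which grad log pi(x) = -x.
X = np.random.default_rng(0).standard_normal((100, 2))
for _ in range(200):
    X = svgd_step(X, lambda X: -X)
```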
no code implementations • NeurIPS 2020 • Adil Salim, Peter Richtárik
In the second part of this paper, we use the duality gap arising from the first part to study the complexity of the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), which can be seen as a generalization of the Projected Langevin Algorithm.
no code implementations • 3 Apr 2020 • Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik
We consider minimizing the sum of three convex functions, where the first one, F, is smooth, the second is nonsmooth and proximable, and the third is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance in image processing and machine learning.
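For illustration, a generic Condat-Vu-style primal-dual splitting for the template $\min_x F(x) + G(x) + H(Lx)$; this is a standard method for this problem class, not necessarily the algorithm proposed in the paper, and the lasso-like instance, step sizes, and difference operator are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximity operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def condat_vu(grad_F, prox_G, prox_H, L, x0, tau, sigma, n_iters=500):
    """Generic Condat-Vu primal-dual iteration for min_x F(x) + G(x) + H(Lx)."""
    x = x0.copy()
    y = np.zeros(L.shape[0])
    for _ in range(n_iters):
        x_new = prox_G(x - tau * (grad_F(x) + L.T @ y), tau)
        # Dual update via the Moreau identity: prox of sigma*H^* from prox of H.
        u = y + sigma * (L @ (2 * x_new - x))
        y = u - sigma * prox_H(u / sigma, 1.0 / sigma)
        x = x_new
    return x

# Illustrative instance: F(x) = 0.5*||Ax - b||^2, G = lam*||x||_1, H = mu*||.||_1,
# with L a first-order finite-difference operator.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 20)), rng.standard_normal(30)
L = np.eye(20, k=1)[:-1] - np.eye(20)[:-1]
lam, mu = 0.1, 0.1
x_hat = condat_vu(
    grad_F=lambda x: A.T @ (A @ x - b),
    prox_G=lambda v, t: soft_threshold(v, lam * t),
    prox_H=lambda v, t: soft_threshold(v, mu * t),
    L=L, x0=np.zeros(20), tau=1e-2, sigma=1e-2,
)
```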
no code implementations • NeurIPS 2020 • Adil Salim, Anna Korba, Giulia Luise
Using techniques from convex optimization and optimal transport, we analyze the FB scheme as a minimization algorithm on the Wasserstein space.
no code implementations • 20 Dec 2019 • Sélim Chraibi, Ahmed Khaled, Dmitry Kovalev, Peter Richtárik, Adil Salim, Martin Takáč
We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed.
no code implementations • 25 Sep 2019 • Sélim Chraibi, Adil Salim, Samuel Horváth, Filip Hanzely, Peter Richtárik
Preconditioning a minimization algorithm improves its convergence and can lead to a minimizer in one iteration in some extreme cases.
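A tiny worked example of the "one iteration" extreme case: for a strongly convex quadratic, a gradient step preconditioned by the inverse Hessian lands exactly on the minimizer (this is just Newton's method on a quadratic, shown as an illustration rather than the paper's setting).

```python
import numpy as np

# Minimize f(x) = 0.5 * x^T A x - b^T x. With the "ideal" preconditioner P = A^{-1},
# a single preconditioned gradient step x - P * grad f(x) is the exact minimizer.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + np.eye(5)                 # symmetric positive definite Hessian
b = rng.standard_normal(5)

x0 = np.zeros(5)
grad = A @ x0 - b
x1 = x0 - np.linalg.solve(A, grad)      # preconditioned step with P = A^{-1}
assert np.allclose(A @ x1, b)           # x1 solves the optimality condition A x = b
```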
1 code implementation • NeurIPS 2019 • Michael Arbel, Anna Korba, Adil Salim, Arthur Gretton
We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties.
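A minimal sketch of a discretized MMD gradient flow: particles move down the gradient of the witness function between the particle measure and target samples, here with an RBF kernel; the bandwidth, step size, and toy target are illustrative assumptions.

```python
import numpy as np

def rbf_grad(x, Z, h=1.0):
    # Gradient in x of (1/|Z|) * sum_z exp(-||x - z||^2 / (2h)).
    diffs = x[None, :] - Z
    w = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * h))
    return -(w[:, None] * diffs).sum(axis=0) / (h * len(Z))

def mmd_flow_step(X, Y, step=0.5, h=1.0):
    """Move each particle down the gradient of the MMD witness function
    f(x) = mean_j k(x, X_j) - mean_l k(x, Y_l) between particles X and target samples Y."""
    V = np.stack([rbf_grad(x, X, h) - rbf_grad(x, Y, h) for x in X])
    return X - step * V

rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 2)) + 3.0          # samples from the target
X = rng.standard_normal((200, 2))                # initial particles
for _ in range(300):
    X = mmd_flow_step(X, Y)
```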
1 code implementation • NeurIPS 2019 • Adil Salim, Dmitry Kovalev, Peter Richtárik
We propose a new algorithm, the Stochastic Proximal Langevin Algorithm (SPLA), for sampling from a log-concave distribution.
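A hedged sketch of a proximal Langevin-type update (a Langevin step on the smooth part of the potential followed by a proximal step on a nonsmooth part); the exact ordering of steps, the stochastic gradients, and the handling of several nonsmooth terms in SPLA may differ from this simplification.

```python
import numpy as np

def prox_l1(v, t):
    # Proximity operator of t * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_langevin(grad_f, prox_g, x0, step=1e-2, n_iters=10_000, rng=None):
    """Proximal Langevin-type sampler for pi ∝ exp(-(f + g)), f smooth, g proximable."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        noise = rng.standard_normal(x.shape)
        y = x - step * grad_f(x) + np.sqrt(2.0 * step) * noise   # Langevin step on f
        x = prox_g(y, step)                                      # proximal step on g
    return x

# Illustrative target: f(x) = ||x||^2 / 2 and g = ||.||_1.
sample = proximal_langevin(lambda x: x, prox_l1, np.zeros(5))
```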
no code implementations • 23 Jan 2019 • Pascal Bianchi, Walid Hachem, Adil Salim
The proposed algorithm is proven to converge to a saddle point of the Lagrangian function.
no code implementations • 3 Apr 2018 • Adil Salim, Pascal Bianchi, Walid Hachem
The Douglas-Rachford algorithm converges to a minimizer of a sum of two convex functions.
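A minimal sketch of the Douglas-Rachford iteration, which only queries the two proximity operators; the quadratic-plus-$\ell_1$ instance, the step size `gamma`, and the iteration count are illustrative assumptions.

```python
import numpy as np

def douglas_rachford(prox_f, prox_g, x0, gamma=1.0, n_iters=500):
    """Douglas-Rachford splitting for min_x f(x) + g(x), given the two proximity operators."""
    x = x0.copy()
    for _ in range(n_iters):
        y = prox_f(x, gamma)
        z = prox_g(2 * y - x, gamma)
        x = x + z - y                  # fixed-point update; prox_f(x) converges to a minimizer
    return prox_f(x, gamma)

# Illustrative instance: f(x) = 0.5*||x - a||^2 and g = lam*||x||_1.
a, lam = np.array([3.0, -0.2, 1.0]), 0.5
prox_f = lambda v, t: (v + t * a) / (1 + t)
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
x_star = douglas_rachford(prox_f, prox_g, np.zeros(3))
```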
no code implementations • 19 Dec 2017 • Adil Salim, Pascal Bianchi, Walid Hachem
When the proximal gradient algorithm is applied to this problem, computationally affordable methods exist to implement the proximity operator (the backward step) in the special case where the graph is a simple path without loops.
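A minimal sketch of the proximal gradient (forward-backward) iteration referred to here; the backward step is shown with soft-thresholding as a stand-in prox, whereas for total variation on a path graph it would be an exact 1D TV prox (for instance a taut-string-type solver, not implemented in this sketch), and the quadratic loss and step size are illustrative assumptions.

```python
import numpy as np

def proximal_gradient(grad_f, prox_g, x0, step=1e-2, n_iters=1000):
    """Forward-backward iteration: gradient (forward) step on f, proximity (backward) step on g."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = prox_g(x - step * grad_f(x), step)
    return x

# Stand-in backward step: g = lam*||.||_1, whose prox is soft-thresholding.
lam = 0.3
prox_soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
rng = np.random.default_rng(0)
A, b = rng.standard_normal((40, 25)), rng.standard_normal(40)
x_hat = proximal_gradient(lambda x: A.T @ (A @ x - b), prox_soft, np.zeros(25))
```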