no code implementations • 24 Apr 2025 • Wenqiang Zhou, Zhendong Yu, Xinyu Liu, Jiaming Yang, Rong Xiao, Tao Wang, Chenwei Tang, Jiancheng Lv
Quantization-Aware Training (QAT) is a neural network quantization technique that reduces model size and improves computational efficiency while effectively maintaining model performance.
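To make the idea concrete, here is a minimal sketch of the fake-quantization step that generic QAT pipelines insert into the forward pass. It is illustrative only, using a hypothetical `fake_quantize` helper and a straight-through estimator; it is not the method proposed in this paper.

```python
# Minimal QAT-style fake quantization with a straight-through estimator.
# Illustrative sketch only; not the scheme proposed in the paper above.
import torch

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate low-precision weights in the forward pass while letting
    full-precision gradients flow back via the straight-through estimator."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = qmin - torch.round(w.min() / scale)
    w_q = torch.clamp(torch.round(w / scale + zero_point), qmin, qmax)
    w_dq = (w_q - zero_point) * scale        # dequantize back to float
    return w + (w_dq - w).detach()           # straight-through estimator

w = torch.randn(64, 64, requires_grad=True)
loss = (fake_quantize(w) ** 2).sum()
loss.backward()                              # gradients reach the float weights
```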
1 code implementation • 20 Jan 2025 • Michał Dereziński, Deanna Needell, Elizaveta Rebrova, Jiaming Yang
In this paper, we introduce Kaczmarz++, an accelerated randomized block Kaczmarz algorithm that exploits outlying singular values in the input to attain a fast Krylov-style convergence.
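For context, the sketch below implements the basic randomized block Kaczmarz iteration that Kaczmarz++ builds on and accelerates; the block size, sampling scheme, and iteration count here are arbitrary choices, not the paper's.

```python
# Plain randomized block Kaczmarz for a consistent system A x = b.
# Baseline sketch only; Kaczmarz++ adds acceleration and Krylov-style convergence.
import numpy as np

def block_kaczmarz(A, b, block_size=20, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        rows = rng.choice(m, size=block_size, replace=False)
        A_S, r_S = A[rows], b[rows] - A[rows] @ x
        # Project x onto the solution set of the sampled block of equations.
        x += np.linalg.lstsq(A_S, r_S, rcond=None)[0]
    return x

A = np.random.randn(500, 100)
x_true = np.random.randn(100)
x_hat = block_kaczmarz(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true))        # small for this consistent system
```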
no code implementations • 14 Jul 2024 • Pratik Rathore, Zachary Frangella, Jiaming Yang, Michał Dereziński, Madeleine Udell
ASkotch outperforms state-of-the-art KRR solvers on a testbed of 23 large-scale KRR tasks, spanning regression and classification across a wide range of application domains, demonstrating the superiority of full KRR over inducing-points KRR.
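As a point of reference, a direct solve of full KRR looks like the sketch below; the RBF kernel and regularization scaling are arbitrary illustrative choices, and ASkotch itself is an iterative solver aimed precisely at the regime where this direct approach stops scaling.

```python
# Direct full kernel ridge regression (KRR) solve, for context only.
# O(n^2) memory and O(n^3) time; ASkotch targets problems where this is infeasible.
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def krr_fit(X, y, lam=1e-3, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Solve (K + lam * n * I) alpha = y  (one common regularization convention).
    return np.linalg.solve(K + lam * n * np.eye(n), y)

X = np.random.randn(200, 5)
y = np.sin(X[:, 0])
alpha = krr_fit(X, y)
y_pred = rbf_kernel(X, X) @ alpha
```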
no code implementations • 9 May 2024 • Michał Dereziński, Christopher Musco, Jiaming Yang
Our methods are based on constructing a low-rank Nyström approximation to $A$ using sparse random matrix sketching.
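The sketch below shows the generic sketch-based Nyström construction for a PSD matrix; it uses a dense Gaussian sketching matrix for simplicity, whereas the paper's contribution lies in using sparse random sketches.

```python
# Generic sketch-based Nystrom approximation of a PSD matrix A: A ~ C W^+ C^T.
# A dense Gaussian sketch is used here for simplicity; the paper relies on
# sparse random sketching matrices to reach its stated running times.
import numpy as np

def nystrom_approx(A, sketch_dim, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Omega = rng.standard_normal((n, sketch_dim))   # sketching matrix
    C = A @ Omega                                   # n x s
    W = Omega.T @ C                                 # s x s core matrix
    return C @ np.linalg.pinv(W) @ C.T

# PSD test matrix that is approximately low-rank.
L = np.random.randn(300, 20)
A = L @ L.T + 1e-2 * np.eye(300)
A_hat = nystrom_approx(A, sketch_dim=50)
print(np.linalg.norm(A - A_hat) / np.linalg.norm(A))
```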
no code implementations • 26 Mar 2024 • Yongyi Yang, Jiaming Yang, Wei Hu, Michał Dereziński
In this paper, we propose HERTA: a High-Efficiency and Rigorous Training Algorithm for Unfolded GNNs that accelerates the whole training process, achieving a nearly-linear time worst-case training guarantee.
no code implementations • 14 Dec 2023 • Michał Dereziński, Jiaming Yang
We give a stochastic optimization algorithm that solves a dense $n\times n$ real-valued linear system $Ax=b$, returning $\tilde x$ such that $\|A\tilde x-b\|\leq \epsilon\|b\|$ in time: $$\tilde O((n^2+nk^{\omega-1})\log(1/\epsilon)),$$ where $k$ is the number of singular values of $A$ larger than $O(1)$ times its smallest positive singular value, $\omega < 2.372$ is the matrix multiplication exponent, and $\tilde O$ hides a factor polylogarithmic in $n$.
1 code implementation • 28 Nov 2023 • Jinfeng Zhou, Zhuang Chen, Dazhen Wan, Bosi Wen, Yi Song, Jifan Yu, Yongkang Huang, Libiao Peng, Jiaming Yang, Xiyao Xiao, Sahand Sabour, Xiaohan Zhang, Wenjing Hou, Yijia Zhang, Yuxiao Dong, Jie Tang, Minlie Huang
In this paper, we present CharacterGLM, a series of models built upon ChatGLM, with model sizes ranging from 6B to 66B parameters.
no code implementations • 7 Aug 2022 • Xiaoxiao Li, Zhao Song, Jiaming Yang
Unlike the convergence analysis in classical centralized training, which relies on the gradient direction, analyzing convergence in FAL is significantly harder for three reasons: 1) the complexity of min-max optimization, 2) the model not updating along the gradient direction because of multiple local updates on the client side before aggregation, and 3) inter-client heterogeneity.
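A toy sketch of this structure is given below: each client runs several local steps of adversarial (min-max) training before the server averages the models, so the aggregated update is not a single global gradient step. The linear model, FGSM inner step, and FedAvg-style aggregation are illustrative assumptions, not the analysis setting of the paper.

```python
# Toy federated adversarial learning round: FGSM inner maximization,
# multiple local gradient steps per client, then FedAvg-style aggregation.
# Illustrative only; not the analysis setting of the paper above.
import torch
import torch.nn.functional as F

def fgsm_perturb(w, X, y, eps):
    """Inner maximization: one-step FGSM attack on the inputs."""
    X_adv = X.clone().requires_grad_(True)
    F.mse_loss(X_adv @ w, y).backward()
    return (X + eps * X_adv.grad.sign()).detach()

def client_update(w, X, y, local_steps=5, lr=0.05, eps=0.1):
    """Outer minimization: several local steps on adversarially perturbed data."""
    w = w.clone()
    for _ in range(local_steps):             # multi-local updates before aggregation
        X_adv = fgsm_perturb(w, X, y, eps)
        w = w.detach().requires_grad_(True)
        F.mse_loss(X_adv @ w, y).backward()
        w = (w - lr * w.grad).detach()
    return w

d, num_clients = 10, 4
w_global = torch.zeros(d)
data = [(torch.randn(50, d), torch.randn(50)) for _ in range(num_clients)]
w_global = torch.stack([client_update(w_global, X, y) for X, y in data]).mean(0)
```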
1 code implementation • ICLR 2022 • Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré
To address this, our main insight is to optimize over a continuous superset of sparse matrices with a fixed structure known as products of butterfly matrices.
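For intuition, a butterfly matrix of size $n = 2^m$ is a product of $\log_2 n$ block-diagonal factors, each made of 2x2 grids of diagonal blocks, so it has only $O(n \log n)$ free parameters. The sketch below builds such a product densely for illustration; it is not the paper's sparse training parameterization.

```python
# Dense construction of a product of butterfly factors for n = 2^m.
# Illustrative of the fixed butterfly sparsity structure; the paper's actual
# parameterization and training procedure differ.
import numpy as np

def butterfly_factor(k, rng):
    """One size-k butterfly block: a 2x2 grid of (k/2)-dimensional diagonal matrices."""
    h = k // 2
    d = [np.diag(rng.standard_normal(h)) for _ in range(4)]
    return np.block([[d[0], d[1]], [d[2], d[3]]])

def butterfly_matrix(n, seed=0):
    """Product of log2(n) block-diagonal butterfly factor matrices."""
    rng = np.random.default_rng(seed)
    B = np.eye(n)
    k = n
    while k >= 2:
        factor = np.zeros((n, n))
        for i in range(n // k):              # n/k butterfly blocks of size k
            factor[i * k:(i + 1) * k, i * k:(i + 1) * k] = butterfly_factor(k, rng)
        B = B @ factor
        k //= 2
    return B

M = butterfly_matrix(16)                     # 16 x 16, O(n log n) free parameters
print(M.shape)
```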
no code implementations • 29 Sep 2021 • Xiaoxiao Li, Zhao Song, Jiaming Yang
Unlike the convergence analysis in centralized training, which relies on the gradient direction, analyzing convergence in FAL is significantly harder for two reasons: 1) the complexity of min-max optimization, and 2) the model not updating along the gradient direction because of multiple local updates on the client side before aggregation.