no code implementations • 28 Feb 2022 • Zhaodong Chen, Yuying Quan, Zheng Qu, Liu Liu, Yufei Ding, Yuan Xie
We evaluate the 1:2 and 2:4 sparsity under different configurations and achieve 1.27x to 1.89x speedups over the full-attention mechanism.
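A minimal sketch of the 2:4 pattern, assuming it is applied along the last dimension of an attention-score tensor: keep the two largest-magnitude entries in every group of four and zero the rest. The function name and shapes are illustrative, not from the paper.

```python
import torch

def prune_2_to_4(scores: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude entries in every group of 4 (illustrative)."""
    *lead, n = scores.shape
    groups = scores.reshape(*lead, n // 4, 4)
    topk = groups.abs().topk(2, dim=-1).indices           # top-2 per group of 4
    mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(-1, topk, True)
    return (groups * mask).reshape(*lead, n)

scores = torch.randn(2, 8, 16)        # (heads, queries, keys); keys divisible by 4
sparse = prune_2_to_4(scores)         # exactly 2 nonzeros per group of 4
```

The mask itself is cheap to compute; the reported speedups come from hardware support for this pattern (e.g., Sparse Tensor Cores), not from the masking alone.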
no code implementations • 21 Oct 2021 • Liu Liu, Zheng Qu, Zhaodong Chen, Yufei Ding, Yuan Xie
We demonstrate that the sparse patterns are dynamic, depending on input sequences.
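A hedged illustration of that observation: if the mask is derived by thresholding the score matrix per query, the sparse pattern necessarily changes with the input. The keep ratio and shapes below are assumptions for the example, not the paper's recipe.

```python
import torch

def dynamic_attention_mask(q, k, keep_ratio=0.25):
    """Keep only the highest-scoring key positions per query (illustrative)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    k_keep = max(1, int(keep_ratio * scores.shape[-1]))
    cutoff = scores.topk(k_keep, dim=-1).values[..., -1:]  # per-query threshold
    return scores >= cutoff            # boolean pattern varies with the input

q, k = torch.randn(4, 64, 32), torch.randn(4, 64, 32)
mask = dynamic_attention_mask(q, k)    # different inputs give different masks
```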
no code implementations • 29 Sep 2021 • Zhaodong Chen, Liu Liu, Yuying Quan, Zheng Qu, Yufei Ding, Yuan Xie
Transformers are becoming mainstream solutions for various tasks in NLP and computer vision.
no code implementations • 25 Jul 2021 • Ling Liang, Zheng Qu, Zhaodong Chen, Fengbin Tu, Yujie Wu, Lei Deng, Guoqi Li, Peng Li, Yuan Xie
Although spiking neural networks (SNNs) benefit from bio-plausible neural modeling, their low accuracy under common local synaptic plasticity learning rules limits their application in many practical tasks.
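To make "local synaptic plasticity" concrete, here is a toy leaky integrate-and-fire neuron with a Hebbian/STDP-flavored update that uses only information local to each synapse (a presynaptic trace and the postsynaptic spike). This is a generic textbook-style sketch, not the paper's learning rule, and all constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_in = 100, 20
w = rng.uniform(0.0, 0.5, n_in)        # synaptic weights
v, v_th, tau = 0.0, 1.0, 20.0          # membrane potential, threshold, time constant
pre_trace = np.zeros(n_in)             # decaying per-synapse eligibility trace

for t in range(T):
    spikes_in = (rng.random(n_in) < 0.1).astype(float)  # random input spikes
    pre_trace += spikes_in - pre_trace / tau            # presynaptic trace update
    v += w @ spikes_in - v / tau                        # leaky integration
    if v >= v_th:                                       # output spike
        v = 0.0                                         # reset membrane potential
        w += 0.01 * pre_trace     # local update: strengthen recently active synapses
```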
no code implementations • 24 Jan 2019 • Xu Qian, Zheng Qu, Peter Richtárik
We study the problem of minimizing the average of a very large number of smooth functions, which is of key importance in training supervised learning models.
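Concretely, the problem is to minimize (1/n) * sum_i f_i(x) over x, with n very large. The sketch below runs plain SGD on a least-squares instance just to fix the setting; it is not the paper's method or sampling scheme, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

def grad_i(x, i):
    """Gradient of the i-th summand f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
for t in range(5000):                  # plain SGD on the average of n functions
    i = rng.integers(n)                # one random summand per step
    x -= 0.01 / (1 + t) ** 0.5 * grad_i(x, i)
```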
no code implementations • 30 Dec 2015 • Zeyuan Allen-Zhu, Zheng Qu, Peter Richtárik, Yang Yuan
Accelerated coordinate descent is widely used in optimization due to its cheap per-iteration cost and scalability to large-scale problems.
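A minimal sketch of one standard accelerated randomized coordinate descent recursion (an APPROX/ACDM-style scheme with uniform sampling, serial case) on a convex quadratic; the paper's contribution concerns non-uniform sampling and sharper rates, which this sketch does not implement.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
M = rng.standard_normal((d, d))
Q = M.T @ M + np.eye(d)                # positive-definite quadratic
b = rng.standard_normal(d)
L = np.diag(Q)                         # coordinate-wise Lipschitz constants

def grad_i(y, i):
    """i-th partial derivative of f(x) = 0.5 x'Qx - b'x."""
    return Q[i] @ y - b[i]

x, z = np.zeros(d), np.zeros(d)
theta = 1.0 / d
for _ in range(20000):
    y = (1 - theta) * x + theta * z
    i = rng.integers(d)
    g = grad_i(y, i)
    x = y.copy(); x[i] -= g / L[i]                # short (gradient) step
    z = z.copy(); z[i] -= g / (d * theta * L[i])  # long (momentum) step
    theta = (np.sqrt(theta**4 + 4 * theta**2) - theta**2) / 2
```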
no code implementations • NeurIPS 2015 • Zheng Qu, Peter Richtárik, Tong Zhang
We study the problem of minimizing the average of a large number of smooth convex functions penalized with a strongly convex regularizer.
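Written out (in notation of my choosing, not necessarily the paper's), the objective is

```latex
\min_{x \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} f_i(x) + \psi(x),
\qquad f_i \ \text{smooth convex}, \quad \psi \ \text{strongly convex}.
```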
no code implementations • 27 Feb 2015 • Dominik Csiba, Zheng Qu, Peter Richtárik
This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving the regularized empirical risk minimization problems.
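For reference, plain SDCA with uniform sampling on ridge regression looks as follows; the dual coordinate step has a closed form for the squared loss. AdaSDCA's contribution, adapting the sampling probabilities to the data, is only flagged in a comment and not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 500, 20, 0.1
X, y = rng.standard_normal((n, d)), rng.standard_normal(n)

alpha = np.zeros(n)                    # dual variables, one per example
w = np.zeros(d)                        # maintained as w = X.T @ alpha / (lam * n)
for _ in range(20 * n):
    i = rng.integers(n)                # uniform sampling; AdaSDCA adapts this choice
    # Closed-form maximization of the dual along coordinate i (squared loss).
    delta = (y[i] - X[i] @ w - alpha[i]) / (1 + X[i] @ X[i] / (lam * n))
    alpha[i] += delta
    w += delta * X[i] / (lam * n)      # keep the primal point in sync
```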
no code implementations • 8 Feb 2015 • Zheng Qu, Peter Richtárik, Martin Takáč, Olivier Fercoq
We propose a new algorithm for minimizing regularized empirical loss: Stochastic Dual Newton Ascent (SDNA).
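The essence of a Newton-type step over a sampled block can be sketched on a quadratic: pick a small random subset of coordinates and solve that block exactly with the corresponding sub-block of the curvature matrix, rather than taking a scaled gradient step. This toy version on a primal quadratic only conveys the block-Newton idea, not SDNA's actual dual formulation or theory.

```python
import numpy as np

rng = np.random.default_rng(0)
d, tau = 40, 5                         # dimension and sampled block size
M = rng.standard_normal((d, d))
Q = M.T @ M + np.eye(d)                # model problem: f(x) = 0.5 x'Qx - b'x
b = rng.standard_normal(d)

x = np.zeros(d)
for _ in range(2000):
    S = rng.choice(d, size=tau, replace=False)   # random block of coordinates
    g = Q[S] @ x - b[S]                          # gradient restricted to the block
    x[S] -= np.linalg.solve(Q[np.ix_(S, S)], g)  # exact Newton step on the block
```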
no code implementations • 27 Dec 2014 • Zheng Qu, Peter Richtárik
The design and complexity analysis of randomized coordinate descent methods, and in particular of variants which update a random subset (sampling) of coordinates in each iteration, depends on the notion of expected separable overapproximation (ESO).
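Roughly, and for a uniform sampling \hat{S} (notation adapted from this line of work, so treat the exact constants as an assumption): f admits an ESO with parameters v = (v_1, ..., v_n) if for all x and h

```latex
\mathbf{E}\!\left[ f\!\left(x + h_{[\hat{S}]}\right) \right]
\;\le\; f(x) + \frac{\mathbf{E}|\hat{S}|}{n}
\left( \langle \nabla f(x), h \rangle
+ \tfrac{1}{2} \textstyle\sum_{i=1}^{n} v_i h_i^2 \right),
```

where h_{[\hat{S}]} zeroes out the coordinates of h outside the sampled set. The right-hand side is separable across coordinates, which is what makes the resulting coordinate updates cheap and parallelizable.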
no code implementations • 27 Dec 2014 • Zheng Qu, Peter Richtárik
ALPHA is a remarkably flexible algorithm: in special cases, it reduces to deterministic and randomized methods such as gradient descent, coordinate descent, parallel coordinate descent and distributed coordinate descent, in both nonaccelerated and accelerated variants.
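The unifying role of the sampling can be seen in a toy loop that is not ALPHA itself: the same update recovers gradient descent when the sampling always returns every coordinate, randomized coordinate descent when it returns a singleton, and parallel coordinate descent for larger random subsets. The step size below is a deliberately conservative global choice.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 30
M = rng.standard_normal((d, d))
Q, b = M.T @ M + np.eye(d), rng.standard_normal(d)
step = 1.0 / np.linalg.eigvalsh(Q).max()   # conservative but valid for all samplings

def run(sampling, iters=3000):
    x = np.zeros(d)
    for _ in range(iters):
        S = sampling()                     # the sampling defines the method
        x[S] -= step * (Q[S] @ x - b[S])   # update only the sampled coordinates
    return x

x_gd  = run(lambda: np.arange(d))                          # all coords -> gradient descent
x_cd  = run(lambda: rng.integers(d, size=1))               # singleton -> coordinate descent
x_pcd = run(lambda: rng.choice(d, size=4, replace=False))  # subset -> parallel CD
```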
no code implementations • 21 Nov 2014 • Zheng Qu, Peter Richtárik, Tong Zhang
The distributed variant of Quartz is the first distributed SDCA-like method with an analysis for non-separable data.
no code implementations • 21 May 2014 • Olivier Fercoq, Zheng Qu, Peter Richtárik, Martin Takáč
We propose an efficient distributed randomized coordinate descent method for minimizing regularized non-strongly convex loss functions.
1 code implementation • 5 Oct 2013 • Zheng Qu, Daniel Wiese, Anuradha M. Annaswamy, Eugene Lavretsky
This paper presents a method to square up a generic MIMO system that already possesses transmission zeros.
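A numerical sketch of the setting, not the paper's construction: a tall system (more outputs than inputs) is squared up by appending an extra input column B2, and the invariant zeros of the resulting square system are read off as the finite generalized eigenvalues of the Rosenbrock pencil. Here B2 is a random placeholder; the paper's method chooses it so that the squared-up system's zeros are well placed even when the original system already has transmission zeros.

```python
import numpy as np
from scipy.linalg import eig

def invariant_zeros(A, B, C, D=None):
    """Invariant zeros of a square system via the Rosenbrock pencil (illustrative)."""
    n, m = A.shape[0], B.shape[1]
    D = np.zeros((m, m)) if D is None else D
    P = np.block([[A, B], [C, D]])                 # Rosenbrock system matrix
    E = np.block([[np.eye(n), np.zeros((n, m))],
                  [np.zeros((m, n + m))]])
    z = eig(P, E, right=False)                     # generalized eigenvalues
    return z[np.isfinite(z)]                       # finite ones are the zeros

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 1))                    # 1 input
C = rng.standard_normal((2, 4))                    # 2 outputs: tall, not square
B2 = rng.standard_normal((4, 1))                   # placeholder squaring-up column
print(invariant_zeros(A, np.hstack([B, B2]), C))   # zeros of the squared-up system
```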
Optimization and Control