no code implementations • 17 Apr 2025 • Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, Vahab Mirrokni
Going beyond these objectives, we present a set of alternative attentional bias configurations, along with effective approximations that stabilize their training.
1 code implementation • 31 Dec 2024 • Ali Behrouz, Peilin Zhong, Vahab Mirrokni
For more than a decade, there has been extensive research on how to effectively utilize recurrent models and attention.
no code implementations • 9 Oct 2024 • Zeman Li, Xinwei Zhang, Peilin Zhong, Yuan Deng, Meisam Razaviyayn, Vahab Mirrokni
In our experiments on the larger OPT-30B model, Addax outperforms MeZO in accuracy/F1 score by more than 16 points on average and runs 30x faster on a single H100 GPU.
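For context, MeZO estimates gradients from forward passes only, perturbing all parameters along a shared random direction (the classic two-point SPSA estimator); Addax mixes such zeroth-order estimates with first-order gradients. A minimal sketch of the two-point estimator, where `loss_fn` and `eps` are illustrative placeholders rather than the paper's exact implementation:

```python
import torch

def spsa_grad(loss_fn, params, eps=1e-3):
    """Two-point SPSA estimate of the gradient of loss_fn at params.

    A minimal sketch of MeZO-style zeroth-order estimation, not the
    paper's implementation; loss_fn() returns a scalar loss."""
    z = [torch.randn_like(p) for p in params]     # shared random direction
    for p, zi in zip(params, z):                  # perturb in the +direction
        p.data.add_(eps * zi)
    loss_plus = loss_fn()
    for p, zi in zip(params, z):                  # perturb in the -direction
        p.data.add_(-2 * eps * zi)
    loss_minus = loss_fn()
    for p, zi in zip(params, z):                  # restore the parameters
        p.data.add_(eps * zi)
    scale = (loss_plus - loss_minus) / (2 * eps)  # directional derivative along z
    return [scale * zi for zi in z]
```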
no code implementations • 17 Jun 2024 • Rudrajit Das, Inderjit S. Dhillon, Alessandro Epasto, Adel Javanmard, Jieming Mao, Vahab Mirrokni, Sujay Sanghavi, Peilin Zhong
In this paper, we theoretically analyze retraining in a linearly separable setting where the given labels are randomly corrupted, and we prove that retraining can improve upon the population accuracy obtained by initially training with the given (noisy) labels.
no code implementations • 7 Jun 2024 • Vincent Cohen-Addad, Tommaso d'Orsi, Alessandro Epasto, Vahab Mirrokni, Peilin Zhong
We revisit the input perturbations framework for differential privacy where noise is added to the input $A\in \mathcal{S}$ and the result is then projected back to the space of admissible datasets $\mathcal{S}$.
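A minimal sketch of the two-step recipe (privatize the input, then project back to $\mathcal{S}$) for symmetric matrix data; the Gaussian noise and the clipping-style projection are illustrative assumptions, not the paper's exact mechanism:

```python
import numpy as np

def input_perturbation(A, sigma, project):
    """Add Gaussian noise to the input A, then project back to the
    admissible set S.

    sigma must be calibrated to the desired (eps, delta)-DP guarantee;
    `project` is problem-specific (here: a simple clipping projection)."""
    noise = np.random.normal(0.0, sigma, size=A.shape)
    noise = np.triu(noise) + np.triu(noise, 1).T  # keep the perturbation symmetric
    return project(A + noise)

# Example: S = symmetric matrices with entries in [0, 1] (e.g., edge weights)
project_box = lambda M: np.clip(M, 0.0, 1.0)
```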
no code implementations • 2 Oct 2023 • Praneeth Kacham, Vahab Mirrokni, Peilin Zhong
For context lengths of 32k and GPT-2-style models, our model achieves a 2.5-4x speedup in training compared to FlashAttention, with no observed degradation in quality across our experiments.
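The enabling observation is that polynomial attention can be computed in time linear in the context length via explicit feature maps (the paper additionally uses sketching to keep the feature dimension small). A sketch of exact degree-2 polynomial attention, non-causal for brevity and not the paper's sketched higher-degree variant:

```python
import numpy as np

def poly2_attention(Q, K, V):
    """Degree-2 polynomial attention in O(n * d^2) time instead of O(n^2 * d).

    Uses (q.k)^2 = phi(q).phi(k) with phi(x) = vec(x x^T), so the n x n
    score matrix is never materialized."""
    phi = lambda X: np.einsum('ni,nj->nij', X, X).reshape(X.shape[0], -1)
    Qf, Kf = phi(Q), phi(K)            # (n, d^2) feature maps
    num = Qf @ (Kf.T @ V)              # numerator, computed right-to-left
    den = Qf @ Kf.sum(axis=0)          # row-wise normalizer
    return num / den[:, None]
```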
no code implementations • 14 Jul 2023 • Alessandro Epasto, Tamalika Mukherjee, Peilin Zhong
In this work, we provide the first differentially private streaming algorithms for $k$-means and $k$-median clustering of $d$-dimensional Euclidean data points over a stream of length at most $T$, using $\mathrm{poly}(k, d, \log T)$ space to achieve a constant multiplicative error and a $\mathrm{poly}(k, d, \log T)$ additive error.
3 code implementations • 12 Apr 2023 • CJ Carey, Travis Dick, Alessandro Epasto, Adel Javanmard, Josh Karlin, Shankar Kumar, Andres Munoz Medina, Vahab Mirrokni, Gabriel Henrique Nunes, Sergei Vassilvitskii, Peilin Zhong
In this work, we present a new theoretical framework to measure re-identification risk in such user representations.
no code implementations • 5 Dec 2022 • CJ Carey, Jonathan Halcrow, Rajesh Jayaram, Vahab Mirrokni, Warren Schudy, Peilin Zhong
We evaluate the performance of Stars for clustering and graph learning, demonstrating 10-1000x improvements in the number of pairwise similarity comparisons over various baselines, and 2-10x improvements in running time without loss of quality.
1 code implementation • 14 Jul 2022 • Alessandro Epasto, Vahab Mirrokni, Bryan Perozzi, Anton Tsitsulin, Peilin Zhong
Personalized PageRank (PPR) is a fundamental tool in unsupervised learning of graph representations such as node ranking, labeling, and graph embedding.
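As a refresher, the PPR vector for a seed distribution $s$ with teleport probability $\alpha$ satisfies $\pi = \alpha s + (1-\alpha)\, \pi P$ (row-vector convention). A textbook power-iteration sketch, not this paper's algorithm, with $\alpha$ and the tolerance as illustrative defaults:

```python
import numpy as np

def personalized_pagerank(P, s, alpha=0.15, tol=1e-8, max_iter=1000):
    """Power iteration for the PPR vector pi = alpha*s + (1-alpha)*pi@P.

    P is a row-stochastic transition matrix and s a seed distribution;
    a textbook sketch, not the paper's algorithm."""
    pi = s.copy()
    for _ in range(max_iter):
        nxt = alpha * s + (1 - alpha) * (pi @ P)  # one teleport-or-walk step
        if np.abs(nxt - pi).sum() < tol:          # l1 convergence check
            return nxt
        pi = nxt
    return pi
```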
no code implementations • NeurIPS 2020 • Ruosong Wang, Peilin Zhong, Simon S. Du, Russ R. Salakhutdinov, Lin Yang
Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with the unknown environment, i.e., maximize $\sum_{h=1}^{H} r_h$, where $H$ is the planning horizon.
1 code implementation • NeurIPS 2019 • Zhao Song, David Woodruff, Peilin Zhong
We show that if a matrix has entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, then it is possible to obtain a $(1+\epsilon)$-approximate column subset selection under the entrywise $\ell_1$-norm in nearly linear time.
no code implementations • NeurIPS 2019 • Zhao Song, Ruosong Wang, Lin F. Yang, Hongyang Zhang, Peilin Zhong
When the loss function is a general symmetric norm, our algorithm produces a $\sqrt{d} \cdot \mathrm{polylog} n \cdot \mathrm{mmc}(\ell)$-approximate solution in input-sparsity time, where $\mathrm{mmc}(\ell)$ is a quantity related to the symmetric norm under consideration.
1 code implementation • ICLR 2020 • Chang Xiao, Peilin Zhong, Changxi Zheng
In all cases, the robustness of k-WTA networks outperforms that of traditional networks under white-box attacks.
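For reference, a k-WTA activation keeps only the k largest activations in each layer and zeroes out the rest; a minimal sketch, with the sparsity ratio as an illustrative default rather than the paper's full architecture:

```python
import torch

def kwta(x, sparsity=0.1):
    """k-Winners-Take-All: keep the top-k entries per row, zero the rest.

    sparsity is the fraction of units kept; ties at the threshold may
    keep slightly more than k entries in this sketch."""
    k = max(1, int(sparsity * x.shape[-1]))
    thresh = x.topk(k, dim=-1).values[..., -1:]   # k-th largest value per row
    return torch.where(x >= thresh, x, torch.zeros_like(x))
```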
no code implementations • NeurIPS 2019 • Peilin Zhong, Yuchen Mo, Chang Xiao, Peng-Yu Chen, Changxi Zheng
The conventional wisdom to this end is to reduce, through training, a statistical distance (such as an $f$-divergence) between the generated distribution and the provided data distribution.
1 code implementation • NeurIPS 2019 • Zhao Song, David P. Woodruff, Peilin Zhong
Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function, which we show has very different structural properties than $\ell_p$-norms. For example, one can show that the lack of scale-invariance forces any column subset selection algorithm to provably require a factor $\sqrt{\log n}$ more columns than for $\ell_p$-norms; nevertheless, we design the first efficient column subset selection algorithms for such error measures.
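The lack of scale-invariance is easy to check numerically: the Huber loss is quadratic near zero and linear in the tails, so no single exponent $p$ satisfies $H(cx) = c^p H(x)$ for all $x$. A small demonstration with the standard threshold $\delta = 1$:

```python
import numpy as np

def huber(x, delta=1.0):
    """Huber loss: quadratic for |x| <= delta, linear beyond."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

# Scaling by c = 10 multiplies the loss by different factors
# depending on where x sits relative to the threshold:
print(huber(10 * 0.5) / huber(0.5))     # 36.0  (crossed into the linear regime)
print(huber(10 * 0.05) / huber(0.05))   # 100.0 (still in the quadratic regime)
```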
no code implementations • ICML 2018 • Alexandr Andoni, Chengyu Lin, Ying Sheng, Peilin Zhong, Ruiqi Zhong
An Orlicz norm is parameterized by a non-negative convex function $G:\mathbb{R}_+\rightarrow\mathbb{R}_+$ with $G(0)=0$: the Orlicz norm of a vector $x\in\mathbb{R}^n$ is defined as $\|x\|_G=\inf\left\{\alpha>0 \,\middle|\, \sum_{i=1}^n G(|x_i|/\alpha)\leq 1\right\}$.
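Because $G$ is convex and non-negative with $G(0)=0$, it is non-decreasing on $\mathbb{R}_+$, so $\sum_i G(|x_i|/\alpha)$ is non-increasing in $\alpha$ and the infimum can be found by bisection. A minimal sketch:

```python
def orlicz_norm(x, G, tol=1e-10):
    """Compute ||x||_G = inf{alpha > 0 : sum_i G(|x_i|/alpha) <= 1} by bisection.

    Valid because sum_i G(|x_i|/alpha) is non-increasing in alpha when G is
    convex, non-negative, and G(0) = 0."""
    f = lambda alpha: sum(G(abs(xi) / alpha) for xi in x)
    lo, hi = tol, 1.0
    while f(hi) > 1:               # grow the bracket until feasible
        hi *= 2
    while hi - lo > tol:           # bisect down to the infimum
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 1 else (lo, mid)
    return hi

# G(t) = t^2 recovers the l2 norm: ||(3, 4)||_G = 5
print(orlicz_norm([3.0, 4.0], lambda t: t * t))   # ~5.0
```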
1 code implementation • NeurIPS 2018 • Chang Xiao, Peilin Zhong, Changxi Zheng
This paper addresses mode collapse in generative adversarial networks (GANs).
no code implementations • 1 Feb 2018 • Wei Hu, Zhao Song, Lin F. Yang, Peilin Zhong
We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space $\{1, 2, \ldots, \Delta\}^d$ can be dynamically inserted into or deleted from the dataset.
no code implementations • 26 Apr 2017 • Zhao Song, David P. Woodruff, Peilin Zhong
Despite the success in obtaining relative-error low-rank approximations for matrices, no such results were known for tensors.
no code implementations • 3 Nov 2016 • Zhao Song, David P. Woodruff, Peilin Zhong
We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d) \mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$.
no code implementations • 28 Jan 2016 • David P. Woodruff, Peilin Zhong
For example, each of $s$ servers may have an $n \times d$ matrix $A^t$, and we may be interested in computing a low rank approximation to $A = f(\sum_{t=1}^s A^t)$, where $f$ is a function which is applied entrywise to the matrix $\sum_{t=1}^s A^t$.
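For reference, the naive baseline ships every $A^t$ to one server, applies $f$ entrywise to the sum, and takes a truncated SVD; the paper's protocols aim to approximate this while communicating far less. A sketch of that centralized baseline, with $f$ and $k$ as illustrative choices:

```python
import numpy as np

def naive_low_rank(matrices, f, k):
    """Centralized baseline: form A = f(sum_t A^t) entrywise, then rank-k SVD.

    The distributed protocols in the paper approximate this while
    communicating far less than the full matrices."""
    A = f(sum(matrices))                  # f applied entrywise to the sum
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * S[:k] @ Vt[:k]      # best rank-k approximation of A

# e.g., f(x) = |x| applied to s = 3 servers' matrices
mats = [np.random.randn(6, 4) for _ in range(3)]
A_k = naive_low_rank(mats, np.abs, k=2)
```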