no code implementations • ICML 2020 • Yangsibo Huang, Zhao Song, Sanjeev Arora, Kai Li
The new ideas in the current paper are: (a) new variants of mixup with negative as well as positive coefficients, and (b) an extension of the sample-wise mixup to be pixel-wise.
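A minimal Python sketch of these two ideas, assuming an illustrative coefficient range and Beta-distributed mixing (the exact sampling scheme in the paper may differ):

```python
import numpy as np

def sample_mixup(x1, x2, y1, y2, allow_negative=True):
    """Sample-wise mixup, optionally allowing coefficients outside [0, 1]."""
    lam = np.random.beta(0.5, 0.5)
    if allow_negative:
        lam = 1.5 * lam - 0.25           # illustrative: lam can land in [-0.25, 1.25]
    x = lam * x1 + (1.0 - lam) * x2      # affine combination of the two inputs
    y = lam * y1 + (1.0 - lam) * y2      # same combination of the labels
    return x, y

def pixelwise_mixup(x1, x2, y1, y2):
    """Pixel-wise mixup: every pixel gets its own mixing coefficient."""
    lam = np.random.beta(0.5, 0.5, size=x1.shape)
    x = lam * x1 + (1.0 - lam) * x2
    m = lam.mean()                       # labels mixed by the average coefficient
    return x, m * y1 + (1.0 - m) * y2
```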
no code implementations • 26 Jun 2022 • Alexander Munteanu, Simon Omlor, Zhao Song, David P. Woodruff
A common method in training neural networks is to initialize all the weights to be independent Gaussian vectors.
no code implementations • 23 Apr 2022 • Kai Wang, Zhao Song, Georgios Theocharous, Sridhar Mahadevan
Smoothed online combinatorial optimization considers a learner who repeatedly chooses a combinatorial decision to minimize an unknown changing cost function with a penalty on switching decisions in consecutive rounds.
1 code implementation • 15 Apr 2022 • Mayee F. Chen, Daniel Y. Fu, Avanika Narayan, Michael Zhang, Zhao Song, Kayvon Fatahalian, Christopher Ré
We first prove that adding a weighted class-conditional InfoNCE loss to SupCon controls the degree of spread.
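A rough PyTorch sketch of one way such a weighted class-conditional InfoNCE term could be added to a SupCon loss; treating the anchor's own augmented view as the positive and restricting negatives to same-class samples is an assumption here, and the exact formulation and weighting in the paper may differ:

```python
import torch
import torch.nn.functional as F

def class_conditional_infonce(z, z_aug, labels, tau=0.1):
    """InfoNCE restricted to same-class pairs: each anchor's positive is its own
    augmented view, and its negatives are other samples of the same class."""
    z, z_aug = F.normalize(z, dim=1), F.normalize(z_aug, dim=1)
    sim = z @ z_aug.T / tau                                # pairwise similarities
    same_class = labels[:, None] == labels[None, :]
    logits = sim.masked_fill(~same_class, float('-inf'))   # drop cross-class pairs
    targets = torch.arange(z.size(0), device=z.device)
    return F.cross_entropy(logits, targets)

def combined_loss(supcon_loss, z, z_aug, labels, weight=0.5):
    """SupCon plus a weighted class-conditional InfoNCE term controlling spread."""
    return supcon_loss + weight * class_conditional_infonce(z, z_aug, labels)
```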
no code implementations • 14 Dec 2021 • Zhao Song, Lichen Zhang, Ruizhe Zhang
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function.
no code implementations • 9 Dec 2021 • Wei Deng, Qian Zhang, Yi-An Ma, Zhao Song, Guang Lin
We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d. data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence.
no code implementations • 4 Dec 2021 • Shunhua Jiang, Yunze Man, Zhao Song, Zheng Yu, Danyang Zhuo
Given a kernel matrix of $n$ graphs, using sketching in solving kernel regression can reduce the running time to $o(n^3)$.
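A hedged sketch of sketch-and-solve for kernel regression; the Gaussian sketch and the subspace restriction alpha = S^T beta are illustrative choices, not necessarily the paper's construction. The dominant cost becomes the $O(n^2 m)$ product $K S^\top$ with $m = o(n)$, rather than an $O(n^3)$ exact solve.

```python
import numpy as np

def sketched_kernel_regression(K, y, m, lam=1e-3, seed=0):
    """Restrict kernel ridge regression to a sketched subspace alpha = S^T beta,
    so only an m x m linear system is solved instead of an n x n one."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # random m x n sketch
    KS = K @ S.T                                   # n x m, the dominant cost
    A = KS.T @ KS + lam * (S @ KS)                 # m x m normal equations
    b = KS.T @ y
    beta = np.linalg.solve(A, b)
    return S.T @ beta                              # lift back to an n-vector alpha
```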
no code implementations • NeurIPS 2021 • Anshumali Shrivastava, Zhao Song, Zhaozhuo Xu
In this work, we focus on improving the per iteration cost of CGM.
1 code implementation • ICLR 2022 • Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré
To address this, our main insight is to optimize over a continuous superset of sparse matrices with a fixed structure known as products of butterfly matrices.
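A minimal sketch of the generic butterfly parametrization (FFT-style sparsity pattern; the paper's exact factorization and training procedure are more elaborate): a product of $\log_2 n$ sparse factors, each with $2n$ nonzeros, so a dense-looking matrix is described by $O(n \log n)$ parameters.

```python
import numpy as np

def butterfly_factor(n, stride, rng):
    """One butterfly factor: index i is coupled only with index i XOR stride,
    so the factor has exactly 2n nonzero entries."""
    B = np.zeros((n, n))
    for i in range(n):
        B[i, i] = rng.standard_normal()
        B[i, i ^ stride] = rng.standard_normal()
    return B

def random_butterfly(n, seed=0):
    """Product of log2(n) butterfly factors with strides 1, 2, 4, ..., n/2."""
    assert n & (n - 1) == 0, "n must be a power of two"
    rng = np.random.default_rng(seed)
    M, stride = np.eye(n), 1
    while stride < n:
        M = butterfly_factor(n, stride, rng) @ M
        stride *= 2
    return M
```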
1 code implementation • NeurIPS 2021 • Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, Sanjeev Arora
Gradient inversion attacks (or input recovery from gradients) are an emerging threat to the security and privacy of federated learning, whereby malicious eavesdroppers or participants in the protocol can (partially) recover the clients' private data.
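A minimal PyTorch sketch of the generic gradient-matching attack underlying this threat model (not the paper's specific attack or defense): the attacker optimizes a dummy input so that its gradient matches the gradient observed on the wire. Here `input_shape` includes the batch dimension and `label` is the attacker's known or guessed class-index tensor, both assumptions for illustration.

```python
import torch

def gradient_inversion(model, target_grads, input_shape, label, steps=200, lr=0.1):
    """Optimize a dummy input whose gradient matches the observed gradient."""
    dummy = torch.randn(input_shape, requires_grad=True)
    opt = torch.optim.Adam([dummy], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    params = tuple(model.parameters())
    for _ in range(steps):
        opt.zero_grad()
        grads = torch.autograd.grad(loss_fn(model(dummy), label), params,
                                    create_graph=True)
        match = sum(((g - t) ** 2).sum() for g, t in zip(grads, target_grads))
        match.backward()      # gradient of the matching loss w.r.t. the dummy input
        opt.step()
    return dummy.detach()
```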
no code implementations • 29 Nov 2021 • Aravind Reddy, Ryan A. Rossi, Zhao Song, Anup Rao, Tung Mai, Nedim Lipka, Gang Wu, Eunyee Koh, Nesreen Ahmed
In this paper, we introduce the online and streaming MAP inference and learning problems for Non-symmetric Determinantal Point Processes (NDPPs), where data points arrive in an arbitrary order and the algorithms are constrained to use a single pass over the data as well as sub-linear memory.
no code implementations • 4 Nov 2021 • Sudhanshu Chanpuriya, Ryan A. Rossi, Anup Rao, Tung Mai, Nedim Lipka, Zhao Song, Cameron Musco
These models output the probabilities of edges existing between all pairs of nodes, and the probability of a link between two nodes increases with the dot product of vectors associated with the nodes.
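A one-function sketch of such a dot-product link model; the sigmoid is one common monotone choice, and the models surveyed in the paper may use other link functions:

```python
import numpy as np

def edge_probabilities(Z):
    """Probability of an edge (i, j) as a sigmoid of the dot product z_i . z_j,
    so the probability increases with the dot product of the node embeddings."""
    logits = Z @ Z.T
    return 1.0 / (1.0 + np.exp(-logits))
```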
1 code implementation • NeurIPS 2021 • Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré
Recent advances in efficient Transformers have exploited either the sparsity or low-rank properties of attention matrices to reduce the computational and memory bottlenecks of modeling long sequences.
no code implementations • NeurIPS 2021 • Zhao Song, Shuo Yang, Ruizhe Zhang
The classical training method requires paying $\Omega(mnd)$ cost for both forward computation and backward computation, where $m$ is the width of the neural network, and we are given $n$ training points in $d$-dimensional space.
no code implementations • 29 Sep 2021 • Xiaoxiao Li, Zhao Song, Jiaming Yang
Unlike the convergence analysis in centralized training, which relies on the gradient direction, it is significantly harder to analyze the convergence in FAL for two reasons: 1) the complexity of min-max optimization, and 2) the model not updating in the gradient direction due to the multiple local updates on the client side before aggregation.
no code implementations • 29 Sep 2021 • Zhao Song, Zheng Yu, Lichen Zhang
Though most federated learning frameworks only require clients and the server to send gradient information over the network, they still face the challenges of communication efficiency and data privacy.
no code implementations • 29 Sep 2021 • Zhao Song, Baocheng Sun, Danyang Zhuo
In this paper, we present the first deep active learning algorithm which has a provable sample complexity.
no code implementations • 29 Sep 2021 • Baihe Huang, Zhao Song, Runzhou Tao, Ruizhe Zhang, Danyang Zhuo
Inspired by the InstaHide challenge [Huang, Song, Li and Arora'20], [Chen, Song and Zhuo'20] recently provided a mathematical formulation of the InstaHide attack problem under a Gaussian image distribution.
no code implementations • 21 Aug 2021 • Zhao Song, David P. Woodruff, Zheng Yu, Lichen Zhang
Recent techniques in oblivious sketching reduce the dependence in the running time on the degree $q$ of the polynomial kernel from exponential to polynomial, which is useful for the Gaussian kernel, for which $q$ can be chosen to be polylogarithmic.
no code implementations • 18 May 2021 • Anshumali Shrivastava, Zhao Song, Zhaozhuo Xu
We present the first provable Least-Squares Value Iteration (LSVI) algorithms that have runtime complexity sublinear in the number of actions.
no code implementations • 11 May 2021 • Baihe Huang, Xiaoxiao Li, Zhao Song, Xin Yang
Nevertheless, training analysis of neural networks in FL is non-trivial for two reasons: first, the objective loss function we are optimizing is non-smooth and non-convex, and second, we are not even updating in the gradient direction.
no code implementations • 22 Feb 2021 • Lijie Chen, Gillat Kol, Dmitry Paramonov, Raghuvansh Saxena, Zhao Song, Huacheng Yu
In addition, we show a similar $\tilde{\Theta}(n \cdot \sqrt{L})$ bound on the space complexity of any algorithm (with any number of passes) for the related problem of sampling an $L$-step random walk from every vertex in the graph.
Data Structures and Algorithms • Computational Complexity
no code implementations • 2 Feb 2021 • Sitan Chen, Zhao Song, Runzhou Tao, Ruizhe Zhang
As this problem is hard in the worst-case, we study a natural average-case variant that arises in the context of these reconstruction attacks: $\mathbf{M} = \mathbf{W}\mathbf{W}^{\top}$ for $\mathbf{W}$ a random Boolean matrix with $k$-sparse rows, and the goal is to recover $\mathbf{W}$ up to column permutation.
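A small sketch generating this average-case instance; the uniformly random support per row is an assumption for illustration, and the paper's distribution may differ in details:

```python
import numpy as np

def random_instance(m, n, k, seed=0):
    """W is an m x n Boolean matrix with exactly k ones per row; the observation
    is M = W W^T, and the goal is to recover W up to column permutation."""
    rng = np.random.default_rng(seed)
    W = np.zeros((m, n), dtype=int)
    for i in range(m):
        W[i, rng.choice(n, size=k, replace=False)] = 1
    return W, W @ W.T
```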
no code implementations • 20 Jan 2021 • Baihe Huang, Shunhua Jiang, Zhao Song, Runzhou Tao
This paper introduces a new robust interior point method analysis for semidefinite programming (SDP).
Optimization and Control • Data Structures and Algorithms
no code implementations • 14 Jan 2021 • Jan van den Brand, Yin Tat Lee, Yang P. Liu, Thatchaphol Saranurak, Aaron Sidford, Zhao Song, Di Wang
In the special case of the minimum cost flow problem on $n$-vertex $m$-edge graphs with integer polynomially-bounded costs and capacities we obtain a randomized method which solves the problem in $\tilde{O}(m+n^{1.5})$ time.
Data Structures and Algorithms • Optimization and Control
no code implementations • ICLR 2021 • Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo
In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning.
no code implementations • ICLR 2021 • Beidi Chen, Zichang Liu, Binghui Peng, Zhaozhuo Xu, Jonathan Lingjie Li, Tri Dao, Zhao Song, Anshumali Shrivastava, Christopher Re
Recent advances by practitioners in the deep learning community have breathed new life into Locality Sensitive Hashing (LSH), using it to reduce memory and time bottlenecks in neural network (NN) training.
no code implementations • 1 Jan 2021 • Zhao Song, Zheng Yu
In this work, we propose a sketching-based central path method for solving linear programs, whose running time matches the state-of-the-art results [Cohen, Lee, Song STOC 19; Lee, Song, Zhang COLT 19].
no code implementations • 1 Jan 2021 • Shunhua Jiang, Yunze Man, Zhao Song, Danyang Zhuo
Theoretically, we present two techniques to speed up GNTK training while preserving the generalization error: (1) We use a novel matrix decoupling method to reduce matrix dimensions during the kernel solve.
no code implementations • 24 Nov 2020 • Baihe Huang, Zhao Song, Runzhou Tao, Ruizhe Zhang, Danyang Zhuo
Inspired by the InstaHide challenge [Huang, Song, Li and Arora'20], [Chen, Song and Zhuo'20] recently provided a mathematical formulation of the InstaHide attack problem under a Gaussian image distribution.
no code implementations • 23 Nov 2020 • Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo
In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning.
no code implementations • 23 Nov 2020 • Josh Alman, Timothy Chu, Gary Miller, Shyam Narayanan, Mark Sellke, Zhao Song
This completes the theory of Manhattan to Manhattan metric transforms initiated by Assouad in 1980.
no code implementations • 4 Nov 2020 • Josh Alman, Timothy Chu, Aaron Schild, Zhao Song
We investigate whether or not it is possible to solve the following problems in $n^{1+o(1)}$ time for a $\mathsf{K}$-graph $G_P$ when $d < n^{o(1)}$:
$\bullet$ Multiply a given vector by the adjacency matrix or Laplacian matrix of $G_P$
$\bullet$ Find a spectral sparsifier of $G_P$
$\bullet$ Solve a Laplacian system in $G_P$'s Laplacian matrix
For each of these problems, we consider all functions of the form $\mathsf{K}(u, v) = f(\|u-v\|_2^2)$ for a function $f:\mathbb{R} \rightarrow \mathbb{R}$.
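For concreteness, a naive $O(n^2 d)$ construction of the objects in question; the paper asks whether the associated linear-algebra problems can be solved in $n^{1+o(1)}$ time, i.e., without ever forming these dense matrices:

```python
import numpy as np

def kernel_graph_matrices(P, f):
    """For points P (n x d) and f: R -> R, the K-graph has edge weights
    K(u, v) = f(||u - v||_2^2); returns its adjacency and Laplacian matrices."""
    sq_dists = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    A = f(sq_dists)
    np.fill_diagonal(A, 0.0)           # no self-loops
    L = np.diag(A.sum(axis=1)) - A     # graph Laplacian D - A
    return A, L

# Example with the Gaussian kernel f(t) = exp(-t):
# A, L = kernel_graph_matrices(P, lambda t: np.exp(-t))
```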
no code implementations • 22 Oct 2020 • Xiaoxiao Li, Yangsibo Huang, Binghui Peng, Zhao Song, Kai Li
To address the issue that deep neural networks (DNNs) are vulnerable to model inversion attacks, we design an objective function, which adjusts the separability of the hidden data representations, as a way to control the trade-off between data utility and vulnerability to inversion attacks.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Yangsibo Huang, Zhao Song, Danqi Chen, Kai Li, Sanjeev Arora
In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e.g., BERT) for any sentence or sentence-pair task.
2 code implementations • 6 Oct 2020 • Yangsibo Huang, Zhao Song, Kai Li, Sanjeev Arora
This paper introduces InstaHide, a simple encryption of training images, which can be plugged into existing distributed deep learning pipelines.
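A hedged sketch of an InstaHide-style encoding, mixing a private image with images drawn from a public pool and applying a random pixel-wise sign flip; the coefficient distribution and pool handling here are illustrative simplifications:

```python
import numpy as np

def instahide_encode(private_img, public_pool, k=4, seed=None):
    """Mix the private image with k-1 public images using random nonnegative
    coefficients summing to one, then flip the sign of each pixel at random."""
    rng = np.random.default_rng(seed)
    others = public_pool[rng.choice(len(public_pool), size=k - 1, replace=False)]
    imgs = np.concatenate([private_img[None], others], axis=0)   # k images to mix
    lam = rng.dirichlet(np.ones(k))                              # mixing weights
    mixed = np.tensordot(lam, imgs, axes=1)
    signs = rng.choice([-1.0, 1.0], size=mixed.shape)            # pixel-wise mask
    return signs * mixed
```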
no code implementations • NeurIPS 2020 • Jason D. Lee, Ruoqi Shen, Zhao Song, Mengdi Wang, Zheng Yu
Leverage score sampling is a powerful technique that originates from theoretical computer science and can be used to speed up a large number of fundamental problems, e.g., linear regression, linear programming, semi-definite programming, the cutting plane method, graph sparsification, maximum matching, and max-flow.
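A small sketch of leverage score sampling for an $n \times d$ matrix; exact scores are computed via a thin QR for clarity, whereas fast approximate scores are what make the technique scalable in practice:

```python
import numpy as np

def leverage_score_sample(A, m, seed=0):
    """Sample m rows of A with probability proportional to their leverage scores
    and rescale them, yielding a smaller problem whose solution approximates
    the original regression or optimization problem."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(A)                     # thin QR of the n x d matrix
    scores = (Q ** 2).sum(axis=1)              # leverage score of each row
    probs = scores / scores.sum()
    idx = rng.choice(A.shape[0], size=m, p=probs)
    scale = 1.0 / np.sqrt(m * probs[idx])      # rescaling keeps estimates unbiased
    return A[idx] * scale[:, None], idx
```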
no code implementations • 20 Jun 2020 • Jan van den Brand, Binghui Peng, Zhao Song, Omri Weinstein
The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks initiated an ongoing effort to develop faster $\mathit{second}$-$\mathit{order}$ optimization algorithms beyond SGD, without compromising the generalization error.
no code implementations • 10 Jun 2020 • Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu
Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.
no code implementations • 16 Apr 2020 • Zhao Song, David P. Woodruff, Peilin Zhong
entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, then it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.
no code implementations • 8 Apr 2020 • Haotian Jiang, Yin Tat Lee, Zhao Song, Sam Chiu-wai Wong
We propose a new cutting plane algorithm that uses an optimal $O(n \log \kappa)$ number of oracle evaluations and an additional $O(n^2)$ time per evaluation, where $\kappa = nR/\epsilon$.
no code implementations • 4 Mar 2020 • Yangsibo Huang, Yushan Su, Sachin Ravi, Zhao Song, Sanjeev Arora, Kai Li
This paper attempts to answer the question whether neural network pruning can be used as a tool to achieve differential privacy without losing much data utility.
no code implementations • 23 Feb 2020 • Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang
We show that our approach obtains small error and is efficient in both space and time.
no code implementations • ICML 2020 • Weihao Kong, Raghav Somani, Zhao Song, Sham Kakade, Sewoong Oh
In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data.
no code implementations • NeurIPS 2020 • Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora
Our work proves convergence to low robust training loss for \emph{polynomial} width instead of exponential, under natural assumptions and with the ReLU activation.
no code implementations • ICLR 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
In this work, we first propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.
no code implementations • 16 Dec 2019 • Sitan Chen, Jerry Li, Zhao Song
In this paper, we give the first algorithm for learning an MLR that runs in time which is sub-exponential in $k$.
4 code implementations • ICML 2020 • Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song
WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases.
no code implementations • NeurIPS 2019 • Kai Zhong, Zhao Song, Prateek Jain, Inderjit S. Dhillon
The inductive matrix completion (IMC) method is a standard approach for this problem, where the given query as well as the items are embedded in a common low-dimensional space.
1 code implementation • NeurIPS 2019 • Zhao Song, David Woodruff, Peilin Zhong
entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, then it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.
no code implementations • NeurIPS 2019 • Zhao Song, Ruosong Wang, Lin F. Yang, Hongyang Zhang, Peilin Zhong
When the loss function is a general symmetric norm, our algorithm produces a $\sqrt{d} \cdot \mathrm{polylog} n \cdot \mathrm{mmc}(\ell)$-approximate solution in input-sparsity time, where $\mathrm{mmc}(\ell)$ is a quantity related to the symmetric norm under consideration.
no code implementations • NeurIPS 2019 • Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff
For input $\mathcal{A}$ as above, we give $O(\sum_{i=1}^q \text{nnz}(A_i))$ time algorithms, which is much faster than computing $\mathcal{A}$.
1 code implementation • NeurIPS 2019 • Huaian Diao, Zhao Song, David P. Woodruff, Xin Yang
In the total least squares problem, one is given an $m \times n$ matrix $A$, and an $m \times d$ matrix $B$, and one seeks to "correct" both $A$ and $B$, obtaining matrices $\hat{A}$ and $\hat{B}$, so that there exists an $X$ satisfying the equation $\hat{A}X = \hat{B}$.
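For reference, the classical SVD-based total least squares solution (Golub and Van Loan), shown for the standard overdetermined, solvable case with $m \ge n + d$ and an invertible bottom-right block; the paper's contribution is a much faster approximate algorithm, not this exact solve:

```python
import numpy as np

def total_least_squares(A, B):
    """Classical TLS: take the SVD of [A B], keep its best rank-n approximation
    as the corrected [A_hat B_hat], and read off X from the trailing singular
    vectors so that A_hat @ X = B_hat."""
    n, d = A.shape[1], B.shape[1]
    U, s, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=False)
    V = Vt.T
    V12, V22 = V[:n, n:], V[n:, n:]
    X = -V12 @ np.linalg.inv(V22)              # assumes V22 is invertible
    low_rank = (U[:, :n] * s[:n]) @ Vt[:n, :]  # best rank-n approximation of [A B]
    return X, low_rank[:, :n], low_rank[:, n:]
```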
no code implementations • 9 Jun 2019 • Zhao Song, Xin Yang
We improve the over-parametrization size over two beautiful results [Li and Liang'18] and [Du, Zhai, Poczos and Singh'19] in deep learning theory.
2 code implementations • ICML 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.
no code implementations • 11 May 2019 • Yin Tat Lee, Zhao Song, Qiuyi Zhang
Our result generalizes the very recent result of solving linear programs in the current matrix multiplication time [Cohen, Lee, Song'19] to a broader class of problems.
1 code implementation • 1 May 2019 • Zhao Song, Wen Sun
Model-free Reinforcement Learning (RL) algorithms such as Q-learning [Watkins, Dayan 92] have been widely used in practice and can achieve human level performance in applications such as video games [Mnih et al. 15].
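The standard tabular Q-learning update referenced here [Watkins, Dayan 92], as a minimal sketch:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```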
no code implementations • ICLR 2019 • Huan Zhang, Hongge Chen, Zhao Song, Duane Boning, Inderjit S. Dhillon, Cho-Jui Hsieh
In our paper, we shed some light on the practicality and the hardness of adversarial training by showing that the effectiveness (robustness on the test set) of adversarial training has a strong correlation with the distance between a test point and the manifold of training data embedded by the network.
no code implementations • 26 Dec 2018 • Yibo Lin, Zhao Song, Lin F. Yang
In this paper, we provide provable guarantees on some hashing-based parameter reduction methods in neural nets.
no code implementations • 15 Dec 2018 • Yin Tat Lee, Zhao Song, Santosh S. Vempala
We apply this to the sampling problem to obtain a nearly linear implementation of HMC for a broad class of smooth, strongly logconcave densities, with the number of iterations (parallel depth) and gradient evaluations being $\mathit{polylogarithmic}$ in the dimension (rather than polynomial as in previous work).
2 code implementations • 2 Dec 2018 • Zhao Song, Ronald E. Parr, Lawrence Carin
The impact of softmax on the value function itself in reinforcement learning (RL) is often viewed as problematic because it leads to sub-optimal value (or Q) functions and interferes with the contraction properties of the Bellman operator.
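A hedged sketch of the softmax Bellman backup whose effect on the value function is at issue, replacing the max over next actions with a softmax-weighted average; the temperature handling is illustrative:

```python
import numpy as np

def softmax_bellman_backup(Q, P, R, tau=1.0, gamma=0.99):
    """One backup of the softmax Bellman operator.
    Q: |S| x |A| value table, P: |A| x |S| x |S| transition tensor, R: |S| x |A|."""
    w = np.exp(Q / tau)
    w /= w.sum(axis=1, keepdims=True)          # softmax policy over actions
    V = (w * Q).sum(axis=1)                    # softmax-averaged value per state
    return R + gamma * np.einsum('ast,t->sa', P, V)
```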
no code implementations • 9 Nov 2018 • Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song
In terms of network architectures, our theory at least applies to fully-connected neural networks, convolutional neural networks (CNN), and residual neural networks (ResNet).
1 code implementation • NeurIPS 2019 • Zhao Song, David P. Woodruff, Peilin Zhong
Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function, which we show have very different structural properties than $\ell_p$-norms; e.g., one can show that the lack of scale-invariance causes any column subset selection algorithm to provably require a factor $\sqrt{\log n}$ more columns than for $\ell_p$-norms. Nevertheless, we design the first efficient column subset selection algorithms for such error measures.
no code implementations • NeurIPS 2019 • Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song
In this paper, we focus on recurrent neural networks (RNNs) which are multi-layer networks widely used in natural language processing.
no code implementations • 26 May 2018 • Kai Zhong, Zhao Song, Prateek Jain, Inderjit S. Dhillon
A standard approach to modeling this problem is Inductive Matrix Completion where the predicted rating is modeled as an inner product of the user and the item features projected onto a latent space.
6 code implementations • ICML 2018 • Tsui-Wei Weng, huan zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S. Dhillon, Luca Daniel
Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17].
2 code implementations • ICML 2018 • Jiong Zhang, Yibo Lin, Zhao Song, Inderjit S. Dhillon
In this paper we propose a simple recurrent architecture, the Fourier Recurrent Unit (FRU), that stabilizes the gradients that arise in its training while giving us stronger expressive power.
no code implementations • 1 Feb 2018 • Wei Hu, Zhao Song, Lin F. Yang, Peilin Zhong
We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space $\{1, 2, \ldots, \Delta\}^d$ can be dynamically inserted to or deleted from the dataset.
no code implementations • 27 Dec 2017 • Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff
That is, TensorSketch only provides input sparsity time for Kronecker product regression with respect to the $2$-norm.
no code implementations • 25 Dec 2017 • David Liau, Eric Price, Zhao Song, Ger Yang
We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms.
no code implementations • NeurIPS 2017 • Zhao Song, Yusuke Muraoka, Ryohei Fujimaki, Lawrence Carin
We propose a scalable algorithm for model selection in sigmoid belief networks (SBNs), based on the factorized asymptotic Bayesian (FAB) framework.
no code implementations • 8 Nov 2017 • Kai Zhong, Zhao Song, Inderjit S. Dhillon
In this paper, we consider parameter recovery for non-overlapping convolutional neural networks (CNNs) with multiple kernels.
no code implementations • ICML 2017 • Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon
For activation functions that are also smooth, we show $\mathit{local~linear~convergence}$ guarantees of gradient descent under a resampling rule.
no code implementations • 30 May 2017 • Eric Price, Zhao Song, David P. Woodruff
Our main result is that, when $S$ is the subsampled randomized Fourier/Hadamard transform, the error $x' - x^*$ behaves as if it lies in a "random" direction within this bound: for any fixed direction $a\in \mathbb{R}^d$, we have with $1 - d^{-c}$ probability that \[ \langle a, x'-x^*\rangle \lesssim \frac{\|a\|_2\|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}}, \quad (1) \] where $c, \gamma > 0$ are arbitrary constants.
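A small sketch of sketch-and-solve least squares with a subsampled randomized Hadamard transform; the Hadamard matrix is formed densely here for clarity, which forgoes the fast transform's speedup, and `n` must be a power of two:

```python
import numpy as np
from scipy.linalg import hadamard

def srht_sketch_and_solve(A, b, m, seed=0):
    """Solve min ||S A x - S b|| for an m x n SRHT sketch S = sqrt(n/m) * P H D."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    D = rng.choice([-1.0, 1.0], size=n)            # random diagonal sign flips
    H = hadamard(n) / np.sqrt(n)                   # orthonormal Hadamard matrix
    idx = rng.choice(n, size=m, replace=False)     # uniform row subsampling
    S = np.sqrt(n / m) * H[idx] * D[None, :]       # m x n sketch (dense for clarity)
    x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x_sketch
```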
no code implementations • 26 Apr 2017 • Zhao Song, David P. Woodruff, Peilin Zhong
Despite the success on obtaining relative error low rank approximations for matrices, no such results were known for tensors.
1 code implementation • NeurIPS 2016 • Zhao Song, David Woodruff, huan zhang
We show in a number of cases one can achieve the same theoretical guarantees in sublinear time, i.e., even without reading most of the input tensor.
no code implementations • NeurIPS 2016 • Zhao Song, Ronald E. Parr, Xuejun Liao, Lawrence Carin
We then develop a supervised linear feature encoding method that is motivated by insights from linear value function approximation theory, as well as empirical successes from deep RL.
no code implementations • 3 Nov 2016 • Zhao Song, David P. Woodruff, Peilin Zhong
We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d) \mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$.
no code implementations • 5 Sep 2012 • Zhao Song, Aleksandar Dogandzic
Our signal reconstruction scheme is based on an EM iteration that aims at maximizing the posterior distribution of the signal and its state variables given the noise variance.