You need to log in to edit.

You can create a new account if you don't have one.

Or, discuss a change on Slack.

You can create a new account if you don't have one.

Or, discuss a change on Slack.

no code implementations • ICML 2020 • Yangsibo Huang, Zhao Song, Sanjeev Arora, Kai Li

The new ideas in the current paper are: (a) new variants of mixup with negative as well as positive coefficients, and extend the sample-wise mixup to be pixel-wise.

no code implementations • 26 Nov 2023 • Zhihang Li, Zhao Song, Zifan Wang, Junze Yin

Our main results involve analyzing the convergence properties of an approximate Newton method used to minimize the regularized training loss.

no code implementations • 24 Nov 2023 • Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang

Despite this, storing the Key and Value matrices $K, V \in \mathbb{R}^{n \times d}$ still necessitates $O( n d)$ space, leading to significant memory usage.

no code implementations • 24 Nov 2023 • Zhao Song, Junze Yin, Ruizhe Zhang

However, the running times of these algorithms depend on some quantum linear algebra-related parameters, such as $\kappa(A)$, the condition number of $A$.

no code implementations • 22 Nov 2023 • Chenyang Li, Zhao Song, Weixin Wang, Chiwun Yang

The Deep Leakage from Gradient (DLG) attack has emerged as a prevalent and highly effective method for extracting sensitive training data by inspecting exchanged gradients.

no code implementations • 19 Nov 2023 • Lianke Qin, Saayan Mitra, Zhao Song, Yuanyuan Yang, Tianyi Zhou

In this paper, we consider a heavy inner product identification problem, which generalizes the Light Bulb problem~(\cite{prr89}): Given two sets $A \subset \{-1,+1\}^d$ and $B \subset \{-1,+1\}^d$ with $|A|=|B| = n$, if there are exact $k$ pairs whose inner product passes a certain threshold, i. e., $\{(a_1, b_1), \cdots, (a_k, b_k)\} \subset A \times B$ such that $\forall i \in [k], \langle a_i, b_i \rangle \geq \rho \cdot d$, for a threshold $\rho \in (0, 1)$, the goal is to identify those $k$ heavy inner products.

no code implementations • 30 Oct 2023 • Zhao Song, Guangyi Xu, Junze Yin

In this paper, we offer a theoretical analysis of the expressive capabilities of polynomial attention.

1 code implementation • 26 Oct 2023 • Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen

We show that contextual sparsity exists, that it can be accurately predicted, and that we can exploit it to speed up LLM inference in wall-clock time without compromising LLM's quality or in-context learning ability.

no code implementations • 19 Oct 2023 • Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang

In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks.

no code implementations • 18 Oct 2023 • Yichuan Deng, Zhao Song, Tianyi Zhou

Large transformer models have achieved state-of-the-art results in numerous natural language processing tasks.

no code implementations • 17 Oct 2023 • Zhao Song, Chiwun Yang

The delta-bar-delta algorithm is recognized as a learning rate adaptation technique that enhances the convergence speed of the training process in optimization by dynamically scheduling the learning rate based on the difference between the current and previous weight updates.

no code implementations • 6 Oct 2023 • Josh Alman, Zhao Song

Interestingly, the higher the order of the tensors, the lower the bound on the entries needs to be for an efficient algorithm.

no code implementations • 5 Oct 2023 • Timothy Chu, Zhao Song, Chiwun Yang

To address this issue, we introduce a reweighted algorithm called RICL (Reweighted In-context Learning).

no code implementations • 23 Sep 2023 • Zhao Song, Weixin Wang, Junze Yin

The Hessian is shown to be positive semidefinite, and its structure is characterized as the sum of a low-rank matrix and a diagonal matrix.

no code implementations • 14 Sep 2023 • Lianke Qin, Zhao Song, Baocheng Sun

A rising trend in theoretical deep learning is to understand why deep learning works through Neural Tangent Kernel (NTK) [jgh18], a kernel method that is equivalent to using gradient descent to train a multi-layer infinitely-wide neural network.

no code implementations • 14 Sep 2023 • Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin

$A_3$ is a matrix in $\mathbb{R}^{n \times d}$, $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$.

no code implementations • 28 Aug 2023 • Zhao Song, Junze Yin, Lichen Zhang

Large language models have shown impressive performance in many tasks.

no code implementations • 23 Aug 2023 • Timothy Chu, Zhao Song, Chiwun Yang

Large language models (LLMs) and generative AI have played a transformative role in computer research and applications.

no code implementations • 21 Aug 2023 • Yichuan Deng, Michalis Mamakos, Zhao Song

Thus, maximizing the total reward requires learning not only models about the reward and the resource consumption, but also cluster memberships.

no code implementations • 21 Aug 2023 • Yeqi Gao, Zhao Song, Junze Yin

It is likely that only two types of people would be interested in setting up a practical system for it: $\bullet$ Those who prefer to use a decentralized ChatGPT-like software.

no code implementations • 16 Aug 2023 • Yichuan Deng, Zhao Song, Shenghao Xie

Softmax unit and ReLU unit are the key structure in attention computation.

no code implementations • 17 Jul 2023 • Yichuan Deng, Zhihang Li, Sridhar Mahadevan, Zhao Song

We demonstrate the convergence of our algorithm, highlighting its effectiveness in efficiently computing gradients for large-scale LLMs.

no code implementations • 16 Jul 2023 • Yeqi Gao, Zhao Song, Xin Yang, Ruizhe Zhang

It is well-known that quantum machine has certain computational advantages compared to the classical machine.

no code implementations • 15 Jul 2023 • Yuzhou Gu, Zhao Song, Lichen Zhang

Consequently, we obtain a variety of results for SVMs: * For linear SVM, where the quadratic constraint matrix has treewidth $\tau$, we can solve the corresponding program in time $\widetilde O(n\tau^{(\omega+1)/2}\log(1/\epsilon))$; * For linear SVM, where the quadratic constraint matrix admits a low-rank factorization of rank-$k$, we can solve the corresponding program in time $\widetilde O(nk^{(\omega+1)/2}\log(1/\epsilon))$; * For Gaussian kernel SVM, where the data dimension $d = \Theta(\log n)$ and the squared dataset radius is small, we can solve it in time $O(n^{1+o(1)}\log(1/\epsilon))$.

no code implementations • 13 Jul 2023 • Lianke Qin, Zhao Song, Yuanyuan Yang

Deep learning has been widely used in many fields, but the model training process usually consumes massive computational resources and time.

no code implementations • 5 Jul 2023 • Yeqi Gao, Zhao Song, Shenghao Xie

Given matrices $A_1 \in \mathbb{R}^{n \times d}$, and $A_2 \in \mathbb{R}^{n \times d}$ and $B \in \mathbb{R}^{n \times n}$, the purpose is to solve some certain optimization problems: Normalized version $\min_{X} \| D(X)^{-1} \exp(A_1 X A_2^\top) - B \|_F^2$ and Rescaled version $\| \exp(A_1 X A_2^\top) - D(X) \cdot B \|_F^2$.

1 code implementation • 24 Jun 2023 • Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen

Based on these insights, we propose Heavy Hitter Oracle (H$_2$O), a KV cache eviction policy that dynamically retains a balance of recent and H$_2$ tokens.

no code implementations • 7 Jun 2023 • Zhao Song, Mingquan Ye, Junze Yin, Lichen Zhang

For weighted low rank approximation, this improves the runtime of [LLR16] from $n^2 k^2$ to $n^2k$.

no code implementations • 6 Jun 2023 • Xiang Chen, Zhao Song, Baocheng Sun, Junze Yin, Danyang Zhuo

Many machine learning algorithms require large numbers of labeled data to deliver state-of-the-art results.

no code implementations • 4 Jun 2023 • Ritwik Sinha, Zhao Song, Tianyi Zhou

A model trained on these losses balances the trade-off between the creativity and reality of the model.

no code implementations • 1 Jun 2023 • Yichuan Deng, Zhao Song, Junze Yin

Tensor decomposition is a fundamental method used in various areas to deal with high-dimensional data.

no code implementations • 27 May 2023 • Song Bian, Zhao Song, Junze Yin

Many convex optimization problems with important applications in machine learning are formulated as empirical risk minimization (ERM).

no code implementations • 15 May 2023 • Lianke Qin, Zhao Song, Yitan Wang

We consider both the online and offline versions of the problem: in each iteration, the data set changes incrementally or is not changed, and a user can issue a query to maximize the function on a given subset of the data.

no code implementations • 15 May 2023 • Zhao Song, Weixin Wang, Chenbo Yin

But in \textsc{FastPostponedGreedy} algorithm, the status of each node is unknown at first.

no code implementations • 13 May 2023 • Zhao Song, Mingquan Ye

Deep learning has achieved impressive success in a variety of fields because of its good generalization.

no code implementations • 8 May 2023 • Yeqi Gao, Zhao Song, Xin Yang

Inspired by [Vyas, Kakade and Barak 2023], in this work, we provide a provable result for showing how to differentially private approximate the attention matrix.

no code implementations • 1 May 2023 • Yeqi Gao, Zhao Song, Junze Yin

LLMs have shown great promise in improving the accuracy and efficiency of these tasks, and have the potential to revolutionize the field of natural language processing (NLP) in the years to come.

no code implementations • 26 Apr 2023 • Shuai Li, Zhao Song, Yu Xia, Tong Yu, Tianyi Zhou

Large language models (LLMs) are known for their exceptional performance in natural language processing, making them highly effective in many human life-related or even job-related tasks.

no code implementations • 26 Apr 2023 • Zhao Song, Ke Yang, Naiyang Guan, Junjie Zhu, Peng Qiao, Qingyong Hu

Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks.

Ranked #4 on Image Classification on VTAB-1k (using extra training data)

no code implementations • 20 Apr 2023 • Yichuan Deng, Zhihang Li, Zhao Song

One of the key computation in LLMs is the softmax unit.

no code implementations • 13 Apr 2023 • Yichuan Deng, Yeqi Gao, Zhao Song

For the tensor classical rank, tucker rank and train rank, it has been well studied in [Song, Woodruff, Zhong SODA 2019].

no code implementations • 10 Apr 2023 • Yichuan Deng, Sridhar Mahadevan, Zhao Song

It runs in $\widetilde{O}(\mathrm{nnz}(X) + n^{\omega} ) $ time, has $1-\delta$ succeed probability, and chooses $m = O(n \log(n/\delta))$.

no code implementations • 29 Mar 2023 • Yeqi Gao, Sridhar Mahadevan, Zhao Song

Mathematically, we define the neural function $F: \mathbb{R}^{d \times m} \times \mathbb{R}^d \rightarrow \mathbb{R}$ using an exponential activation function.

no code implementations • 28 Mar 2023 • Zhihang Li, Zhao Song, Tianyi Zhou

In this paper, we make use of the input sparsity and purpose an algorithm that use $\log ( \|x_0 - x^*\|_2 / \epsilon)$ iterations and $\widetilde{O}(\mathrm{nnz}(A) + d^{\omega} )$ per iteration time to solve the problem.

no code implementations • 22 Mar 2023 • Lianke Qin, Zhao Song, Ruizhe Zhang

In this paper, we relax that rank-$k$ assumption and solve a much more general matrix sensing problem.

no code implementations • 10 Mar 2023 • Anshumali Shrivastava, Zhao Song, Zhaozhuo Xu

Current theoretical literature focuses on greedy search on exact near neighbor graph while practitioners use approximate near neighbor graph (ANN-Graph) to reduce the preprocessing time.

no code implementations • 8 Mar 2023 • Yichuan Deng, Zhao Song, Zifan Wang, Han Zhang

The kernel method, which is commonly used in learning algorithms such as Support Vector Machines (SVMs), has also been applied in PCA algorithms.

no code implementations • 21 Feb 2023 • Yuzhou Gu, Zhao Song, Junze Yin, Lichen Zhang

Moreover, our algorithm runs in time $\widetilde O(|\Omega| k)$, which is nearly linear in the time to verify the solution while preserving the sample complexity.

no code implementations • 1 Feb 2023 • Zhao Song, Mingquan Ye, Junze Yin, Lichen Zhang

One popular approach for solving such $\ell_2$ regression problem is via sketching: picking a structured random matrix $S\in \mathbb{R}^{m\times n}$ with $m\ll n$ and $SA$ can be quickly computed, solve the ``sketched'' regression problem $\arg\min_{x\in \mathbb{R}^d} \|SAx-Sb\|_2$.

no code implementations • 12 Jan 2023 • Chen Shen, Zhao Song, Lei Shi, Jun Tanimoto, Zhen Wang

Altruistic punishment, where individuals incur personal costs to punish others who have harmed third parties, presents an evolutionary conundrum as it undermines individual fitness.

no code implementations • 21 Dec 2022 • Lianke Qin, Aravind Reddy, Zhao Song, Zhaozhuo Xu, Danyang Zhuo

In this paper, we propose Adam-Hash: an adaptive and dynamic multi-resolution hashing data-structure for fast pairwise summation estimation.

no code implementations • 28 Nov 2022 • Jiehao Liang, Somdeb Sarkhel, Zhao Song, Chenbo Yin, Danyang Zhuo

We propose a new algorithm \textsc{FastKmeans++} that only takes in $\widetilde{O}(nd + nk^2)$ time, in total.

no code implementations • 3 Nov 2022 • Xiaoxiao Li, Zhao Song, Runzhou Tao, Guangyi Zhang

As a leading algorithm in this setting, Federated Average FedAvg, which runs Stochastic Gradient Descent (SGD) in parallel on local devices and averages the sequences only once in a while, have been widely used due to their simplicity and low communication cost.

no code implementations • 15 Oct 2022 • Zhao Song, Yitan Wang, Zheng Yu, Lichen Zhang

In this paper, we propose a novel sketching scheme for the first order method in large-scale distributed learning setting, such that the communication costs between distributed agents are saved while the convergence of the algorithms is still guaranteed.

no code implementations • 8 Oct 2022 • Aravind Reddy, Zhao Song, Lichen Zhang

In this work, we initiate the study of \emph{Dynamic Tensor Product Regression}.

no code implementations • 10 Aug 2022 • Yeqi Gao, Lianke Qin, Zhao Song, Yitan Wang

For a neural network of width $m$, $n$ input training data in $d$ dimension, it takes $\Omega(mnd)$ time cost per training iteration for the forward and backward computation.

no code implementations • 9 Aug 2022 • Hang Hu, Zhao Song, Omri Weinstein, Danyang Zhuo

The success of deep learning comes at a tremendous computational and energy cost, and the scalability of training massively overparametrized neural networks is becoming a real barrier to the progress of AI.

no code implementations • 8 Aug 2022 • Jiehao Liang, Zhao Song, Zhaozhuo Xu, Danyang Zhuo

In this work, we focus on the dynamic maintenance of KDE data structures with robustness to adversarial queries.

no code implementations • 7 Aug 2022 • Xiaoxiao Li, Zhao Song, Jiaming Yang

Unlike the convergence analysis in classical centralized training that relies on the gradient direction, it is significantly harder to analyze the convergence in FAL for three reasons: 1) the complexity of min-max optimization, 2) model not updating in the gradient direction due to the multi-local updates on the client-side before aggregation and 3) inter-client heterogeneity.

no code implementations • 5 Aug 2022 • Hang Hu, Zhao Song, Runzhou Tao, Zhaozhuo Xu, Danyang Zhuo

Online bipartite matching is a fundamental problem in online algorithms.

no code implementations • 26 Jun 2022 • Alexander Munteanu, Simon Omlor, Zhao Song, David P. Woodruff

A common method in training neural networks is to initialize all the weights to be independent Gaussian vectors.

no code implementations • 23 Apr 2022 • Kai Wang, Zhao Song, Georgios Theocharous, Sridhar Mahadevan

Smoothed online combinatorial optimization considers a learner who repeatedly chooses a combinatorial decision to minimize an unknown changing cost function with a penalty on switching decisions in consecutive rounds.

1 code implementation • 15 Apr 2022 • Mayee F. Chen, Daniel Y. Fu, Avanika Narayan, Michael Zhang, Zhao Song, Kayvon Fatahalian, Christopher Ré

We first prove that adding a weighted class-conditional InfoNCE loss to SupCon controls the degree of spread.

no code implementations • 14 Dec 2021 • Zhao Song, Lichen Zhang, Ruizhe Zhang

We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function.

no code implementations • 9 Dec 2021 • Wei Deng, Qian Zhang, Yi-An Ma, Zhao Song, Guang Lin

We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i. i. d data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence.

no code implementations • 4 Dec 2021 • Shunhua Jiang, Yunze Man, Zhao Song, Zheng Yu, Danyang Zhuo

Given a kernel matrix of $n$ graphs, using sketching in solving kernel regression can reduce the running time to $o(n^3)$.

no code implementations • NeurIPS 2021 • Anshumali Shrivastava, Zhao Song, Zhaozhuo Xu

In this work, we focus on improving the per iteration cost of CGM.

1 code implementation • ICLR 2022 • Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré

To address this, our main insight is to optimize over a continuous superset of sparse matrices with a fixed structure known as products of butterfly matrices.

1 code implementation • NeurIPS 2021 • Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, Sanjeev Arora

Gradient inversion attack (or input recovery from gradient) is an emerging threat to the security and privacy preservation of Federated learning, whereby malicious eavesdroppers or participants in the protocol can recover (partially) the clients' private data.

no code implementations • 29 Nov 2021 • Aravind Reddy, Ryan A. Rossi, Zhao Song, Anup Rao, Tung Mai, Nedim Lipka, Gang Wu, Eunyee Koh, Nesreen Ahmed

In this paper, we introduce the online and streaming MAP inference and learning problems for Non-symmetric Determinantal Point Processes (NDPPs) where data points arrive in an arbitrary order and the algorithms are constrained to use a single-pass over the data as well as sub-linear memory.

1 code implementation • NeurIPS 2021 • Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré

Recent advances in efficient Transformers have exploited either the sparsity or low-rank properties of attention matrices to reduce the computational and memory bottlenecks of modeling long sequences.

no code implementations • NeurIPS 2021 • Zhao Song, Shuo Yang, Ruizhe Zhang

The classical training method requires paying $\Omega(mnd)$ cost for both forward computation and backward computation, where $m$ is the width of the neural network, and we are given $n$ training points in $d$-dimensional space.

no code implementations • 29 Sep 2021 • Baihe Huang, Zhao Song, Runzhou Tao, Ruizhe Zhang, Danyang Zhuo

Inspired by InstaHide challenge [Huang, Song, Li and Arora'20], [Chen, Song and Zhuo'20] recently provides one mathematical formulation of InstaHide attack problem under Gaussian images distribution.

no code implementations • 29 Sep 2021 • Zhao Song, Zheng Yu, Lichen Zhang

Though most federated learning frameworks only require clients and the server to send gradient information over the network, they still face the challenges of communication efficiency and data privacy.

no code implementations • 29 Sep 2021 • Zhao Song, Baocheng Sun, Danyang Zhuo

In this paper, we present the first deep active learning algorithm which has a provable sample complexity.

no code implementations • 29 Sep 2021 • Xiaoxiao Li, Zhao Song, Jiaming Yang

Unlike the convergence analysis in centralized training that relies on the gradient direction, it is significantly harder to analyze the convergence in FAL for two reasons: 1) the complexity of min-max optimization, and 2) model not updating in the gradient direction due to the multi-local updates on the client-side before aggregation.

no code implementations • 21 Aug 2021 • Zhao Song, David P. Woodruff, Zheng Yu, Lichen Zhang

Recent techniques in oblivious sketching reduce the dependence in the running time on the degree $q$ of the polynomial kernel from exponential to polynomial, which is useful for the Gaussian kernel, for which $q$ can be chosen to be polylogarithmic.

1 code implementation • NeurIPS 2021 • Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré

Recent advances in efficient Transformers have exploited either the sparsity or low-rank properties of attention matrices to reduce the computational and memory bottlenecks of modeling long sequences.

no code implementations • 18 May 2021 • Anshumali Shrivastava, Zhao Song, Zhaozhuo Xu

We present the first provable Least-Squares Value Iteration (LSVI) algorithms that have runtime complexity sublinear in the number of actions.

no code implementations • 11 May 2021 • Baihe Huang, Xiaoxiao Li, Zhao Song, Xin Yang

Nevertheless, training analysis of neural networks in FL is non-trivial for two reasons: first, the objective loss function we are optimizing is non-smooth and non-convex, and second, we are even not updating in the gradient direction.

no code implementations • 22 Feb 2021 • Lijie Chen, Gillat Kol, Dmitry Paramonov, Raghuvansh Saxena, Zhao Song, Huacheng Yu

In addition, we show a similar $\tilde{\Theta}(n \cdot \sqrt{L})$ bound on the space complexity of any algorithm (with any number of passes) for the related problem of sampling an $L$-step random walk from every vertex in the graph.

Data Structures and Algorithms Computational Complexity

no code implementations • 2 Feb 2021 • Sitan Chen, Zhao Song, Runzhou Tao, Ruizhe Zhang

As this problem is hard in the worst-case, we study a natural average-case variant that arises in the context of these reconstruction attacks: $\mathbf{M} = \mathbf{W}\mathbf{W}^{\top}$ for $\mathbf{W}$ a random Boolean matrix with $k$-sparse rows, and the goal is to recover $\mathbf{W}$ up to column permutation.

no code implementations • 20 Jan 2021 • Baihe Huang, Shunhua Jiang, Zhao Song, Runzhou Tao

This paper introduces a new robust interior point method analysis for semidefinite programming (SDP).

Optimization and Control Data Structures and Algorithms

no code implementations • 14 Jan 2021 • Jan van den Brand, Yin Tat Lee, Yang P. Liu, Thatchaphol Saranurak, Aaron Sidford, Zhao Song, Di Wang

In the special case of the minimum cost flow problem on $n$-vertex $m$-edge graphs with integer polynomially-bounded costs and capacities we obtain a randomized method which solves the problem in $\tilde{O}(m+n^{1. 5})$ time.

Data Structures and Algorithms Optimization and Control

no code implementations • ICLR 2021 • Beidi Chen, Zichang Liu, Binghui Peng, Zhaozhuo Xu, Jonathan Lingjie Li, Tri Dao, Zhao Song, Anshumali Shrivastava, Christopher Re

Recent advances by practitioners in the deep learning community have breathed new life into Locality Sensitive Hashing (LSH), using it to reduce memory and time bottlenecks in neural network (NN) training.

no code implementations • 1 Jan 2021 • Shunhua Jiang, Yunze Man, Zhao Song, Danyang Zhuo

Theoretically, we present two techniques to speed up GNTK training while preserving the generalization error: (1) We use a novel matrix decoupling method to reduce matrix dimensions during the kernel solving.

no code implementations • ICLR 2021 • Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo

In this work, we examine the security of InstaHide, a scheme recently proposed by \cite{hsla20} for preserving the security of private datasets in the context of distributed learning.

no code implementations • 1 Jan 2021 • Zhao Song, Zheng Yu

In this work, we propose a sketching-based central path method for solving linear programmings, whose running time matches the state of art results [Cohen, Lee, Song STOC 19; Lee, Song, Zhang COLT 19].

no code implementations • 24 Nov 2020 • Baihe Huang, Zhao Song, Runzhou Tao, Ruizhe Zhang, Danyang Zhuo

Inspired by InstaHide challenge [Huang, Song, Li and Arora'20], [Chen, Song and Zhuo'20] recently provides one mathematical formulation of InstaHide attack problem under Gaussian images distribution.

no code implementations • 23 Nov 2020 • Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo

In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning.

no code implementations • 23 Nov 2020 • Josh Alman, Timothy Chu, Gary Miller, Shyam Narayanan, Mark Sellke, Zhao Song

This completes the theory of Manhattan to Manhattan metric transforms initiated by Assouad in 1980.

no code implementations • 4 Nov 2020 • Josh Alman, Timothy Chu, Aaron Schild, Zhao Song

We investigate whether or not it is possible to solve the following problems in $n^{1+o(1)}$ time for a $\mathsf{K}$-graph $G_P$ when $d < n^{o(1)}$: $\bullet$ Multiply a given vector by the adjacency matrix or Laplacian matrix of $G_P$ $\bullet$ Find a spectral sparsifier of $G_P$ $\bullet$ Solve a Laplacian system in $G_P$'s Laplacian matrix For each of these problems, we consider all functions of the form $\mathsf{K}(u, v) = f(\|u-v\|_2^2)$ for a function $f:\mathbb{R} \rightarrow \mathbb{R}$.

no code implementations • 22 Oct 2020 • Xiaoxiao Li, Yangsibo Huang, Binghui Peng, Zhao Song, Kai Li

To address the issue that deep neural networks (DNNs) are vulnerable to model inversion attacks, we design an objective function, which adjusts the separability of the hidden data representations, as a way to control the trade-off between data utility and vulnerability to inversion attacks.

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Yangsibo Huang, Zhao Song, Danqi Chen, Kai Li, Sanjeev Arora

In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e. g., BERT) for any sentence or sentence-pair task.

3 code implementations • 6 Oct 2020 • Yangsibo Huang, Zhao Song, Kai Li, Sanjeev Arora

This paper introduces InstaHide, a simple encryption of training images, which can be plugged into existing distributed deep learning pipelines.

no code implementations • NeurIPS 2020 • Jason D. Lee, Ruoqi Shen, Zhao Song, Mengdi Wang, Zheng Yu

Leverage score sampling is a powerful technique that originates from theoretical computer science, which can be used to speed up a large number of fundamental questions, e. g. linear regression, linear programming, semi-definite programming, cutting plane method, graph sparsification, maximum matching and max-flow.

no code implementations • 20 Jun 2020 • Jan van den Brand, Binghui Peng, Zhao Song, Omri Weinstein

The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks, initiated an ongoing effort for developing faster $\mathit{second}$-$\mathit{order}$ optimization algorithms beyond SGD, without compromising the generalization error.

no code implementations • 10 Jun 2020 • Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu

Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.

no code implementations • 16 Apr 2020 • Zhao Song, David P. Woodruff, Peilin Zhong

entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, then it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.

no code implementations • 8 Apr 2020 • Haotian Jiang, Yin Tat Lee, Zhao Song, Sam Chiu-wai Wong

We propose a new cutting plane algorithm that uses an optimal $O(n \log (\kappa))$ evaluations of the oracle and an additional $O(n^2)$ time per evaluation, where $\kappa = nR/\epsilon$.

no code implementations • 4 Mar 2020 • Yangsibo Huang, Yushan Su, Sachin Ravi, Zhao Song, Sanjeev Arora, Kai Li

This paper attempts to answer the question whether neural network pruning can be used as a tool to achieve differential privacy without losing much data utility.

no code implementations • 23 Feb 2020 • Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang

We show that our approach obtains small error and is efficient in both space and time.

no code implementations • ICML 2020 • Weihao Kong, Raghav Somani, Zhao Song, Sham Kakade, Sewoong Oh

In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data.

no code implementations • NeurIPS 2020 • Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora

Our work proves convergence to low robust training loss for \emph{polynomial} width instead of exponential, under natural assumptions and with the ReLU activation.

no code implementations • ICLR 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

In this work, we first propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.

no code implementations • 16 Dec 2019 • Sitan Chen, Jerry Li, Zhao Song

In this paper, we give the first algorithm for learning an MLR that runs in time which is sub-exponential in $k$.

4 code implementations • ICML 2020 • Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song

WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases.

Ranked #8 on Speech Synthesis on LibriTTS

no code implementations • NeurIPS 2019 • Kai Zhong, Zhao Song, Prateek Jain, Inderjit S. Dhillon

Inductive matrix completion (IMC) method is a standard approach for this problem where the given query as well as the items are embedded in a common low-dimensional space.

1 code implementation • NeurIPS 2019 • Zhao Song, David Woodruff, Peilin Zhong

entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, then it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.

no code implementations • NeurIPS 2019 • Zhao Song, Ruosong Wang, Lin F. Yang, Hongyang Zhang, Peilin Zhong

When the loss function is a general symmetric norm, our algorithm produces a $\sqrt{d} \cdot \mathrm{polylog} n \cdot \mathrm{mmc}(\ell)$-approximate solution in input-sparsity time, where $\mathrm{mmc}(\ell)$ is a quantity related to the symmetric norm under consideration.

no code implementations • NeurIPS 2019 • Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff

For input $\mathcal{A}$ as above, we give $O(\sum_{i=1}^q \text{nnz}(A_i))$ time algorithms, which is much faster than computing $\mathcal{A}$.

1 code implementation • NeurIPS 2019 • Huaian Diao, Zhao Song, David P. Woodruff, Xin Yang

In the total least squares problem, one is given an $m \times n$ matrix $A$, and an $m \times d$ matrix $B$, and one seeks to "correct" both $A$ and $B$, obtaining matrices $\hat{A}$ and $\hat{B}$, so that there exists an $X$ satisfying the equation $\hat{A}X = \hat{B}$.

no code implementations • 9 Jun 2019 • Zhao Song, Xin Yang

We improve the over-parametrization size over two beautiful results [Li and Liang' 2018] and [Du, Zhai, Poczos and Singh' 2019] in deep learning theory.

2 code implementations • ICML 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.

no code implementations • 11 May 2019 • Yin Tat Lee, Zhao Song, Qiuyi Zhang

Our result generalizes the very recent result of solving linear programs in the current matrix multiplication time [Cohen, Lee, Song'19] to a more broad class of problems.

1 code implementation • 1 May 2019 • Zhao Song, Wen Sun

Model-free Reinforcement Learning (RL) algorithms such as Q-learning [Watkins, Dayan 92] have been widely used in practice and can achieve human level performance in applications such as video games [Mnih et al. 15].

no code implementations • ICLR 2019 • Huan Zhang, Hongge Chen, Zhao Song, Duane Boning, Inderjit S. Dhillon, Cho-Jui Hsieh

In our paper, we shed some lights on the practicality and the hardness of adversarial training by showing that the effectiveness (robustness on test set) of adversarial training has a strong correlation with the distance between a test point and the manifold of training data embedded by the network.

no code implementations • 26 Dec 2018 • Yibo Lin, Zhao Song, Lin F. Yang

In this paper, we provide provable guarantees on some hashing-based parameter reduction methods in neural nets.

no code implementations • 15 Dec 2018 • Yin Tat Lee, Zhao Song, Santosh S. Vempala

We apply this to the sampling problem to obtain a nearly linear implementation of HMC for a broad class of smooth, strongly logconcave densities, with the number of iterations (parallel depth) and gradient evaluations being $\mathit{polylogarithmic}$ in the dimension (rather than polynomial as in previous work).

2 code implementations • 2 Dec 2018 • Zhao Song, Ronald E. Parr, Lawrence Carin

The impact of softmax on the value function itself in reinforcement learning (RL) is often viewed as problematic because it leads to sub-optimal value (or Q) functions and interferes with the contraction properties of the Bellman operator.

no code implementations • 9 Nov 2018 • Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song

In terms of network architectures, our theory at least applies to fully-connected neural networks, convolutional neural networks (CNN), and residual neural networks (ResNet).

1 code implementation • NeurIPS 2019 • Zhao Song, David P. Woodruff, Peilin Zhong

Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function, which we show have very different structural properties than $\ell_p$-norms, e. g., one can show the lack of scale-invariance causes any column subset selection algorithm to provably require a $\sqrt{\log n}$ factor larger number of columns than $\ell_p$-norms; nevertheless we design the first efficient column subset selection algorithms for such error measures.

no code implementations • NeurIPS 2019 • Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song

In this paper, we focus on recurrent neural networks (RNNs) which are multi-layer networks widely used in natural language processing.

no code implementations • 26 May 2018 • Kai Zhong, Zhao Song, Prateek Jain, Inderjit S. Dhillon

A standard approach to modeling this problem is Inductive Matrix Completion where the predicted rating is modeled as an inner product of the user and the item features projected onto a latent space.

6 code implementations • ICML 2018 • Tsui-Wei Weng, huan zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S. Dhillon, Luca Daniel

Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17].

2 code implementations • ICML 2018 • Jiong Zhang, Yibo Lin, Zhao Song, Inderjit S. Dhillon

In this paper we propose a simple recurrent architecture, the Fourier Recurrent Unit (FRU), that stabilizes the gradients that arise in its training while giving us stronger expressive power.

no code implementations • 1 Feb 2018 • Wei Hu, Zhao Song, Lin F. Yang, Peilin Zhong

We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space $\{1, 2, \ldots, \Delta\}^d$ can be dynamically inserted to or deleted from the dataset.

no code implementations • 27 Dec 2017 • Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff

That is, TensorSketch only provides input sparsity time for Kronecker product regression with respect to the $2$-norm.

no code implementations • 25 Dec 2017 • David Liau, Eric Price, Zhao Song, Ger Yang

We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms.

no code implementations • NeurIPS 2017 • Zhao Song, Yusuke Muraoka, Ryohei Fujimaki, Lawrence Carin

We propose a scalable algorithm for model selection in sigmoid belief networks (SBNs), based on the factorized asymptotic Bayesian (FAB) framework.

no code implementations • 8 Nov 2017 • Kai Zhong, Zhao Song, Inderjit S. Dhillon

In this paper, we consider parameter recovery for non-overlapping convolutional neural networks (CNNs) with multiple kernels.

no code implementations • ICML 2017 • Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon

For activation functions that are also smooth, we show $\mathit{local~linear~convergence}$ guarantees of gradient descent under a resampling rule.

no code implementations • 30 May 2017 • Eric Price, Zhao Song, David P. Woodruff

Our main result is that, when $S$ is the subsampled randomized Fourier/Hadamard transform, the error $x' - x^*$ behaves as if it lies in a "random" direction within this bound: for any fixed direction $a\in \mathbb{R}^d$, we have with $1 - d^{-c}$ probability that \[ \langle a, x'-x^*\rangle \lesssim \frac{\|a\|_2\|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}}, \quad (1) \] where $c, \gamma > 0$ are arbitrary constants.

no code implementations • 26 Apr 2017 • Zhao Song, David P. Woodruff, Peilin Zhong

Despite the success on obtaining relative error low rank approximations for matrices, no such results were known for tensors.

1 code implementation • NeurIPS 2016 • Zhao Song, David Woodruff, huan zhang

We show in a number of cases one can achieve the same theoretical guarantees in sublinear time, i. e., even without reading most of the input tensor.

no code implementations • NeurIPS 2016 • Zhao Song, Ronald E. Parr, Xuejun Liao, Lawrence Carin

We then develop a supervised linear feature encoding method that is motivated by insights from linear value function approximation theory, as well as empirical successes from deep RL.

no code implementations • 3 Nov 2016 • Zhao Song, David P. Woodruff, Peilin Zhong

We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d) \mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$.

no code implementations • 5 Sep 2012 • Zhao Song, Aleksandar Dogandzic

Our signal reconstruction scheme is based on an EM iteration that aims at maximizing the posterior distribution of the signal and its state variables given the noise variance.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.