no code implementations • ICML 2020 • Yangsibo Huang, Zhao Song, Sanjeev Arora, Kai Li
The new ideas in the current paper are new variants of mixup with negative as well as positive coefficients, and an extension of sample-wise mixup to pixel-wise mixup.
no code implementations • 12 Mar 2025 • Chengyue Gong, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yu Tian
Flow matching has emerged as a powerful framework for generative modeling, offering computational advantages over diffusion models by leveraging deterministic Ordinary Differential Equations (ODEs) instead of stochastic dynamics.
no code implementations • 10 Mar 2025 • Yuefan Cao, Xuyang Guo, Jiayan Huo, Yingyu Liang, Zhenmei Shi, Zhao Song, Jiahao Zhang, Zhen Zhuang
Generative modeling is widely regarded as one of the most essential problems in today's AI community, with text-to-image generation having gained unprecedented real-world impacts.
no code implementations • 24 Feb 2025 • Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song
The weighted low-rank approximation problem is a fundamental numerical linear algebra problem and has many applications in machine learning.
no code implementations • 23 Feb 2025 • Chengyue Gong, Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
In this study, we address this gap by analyzing the circuit complexity of the FlowAR architecture.
no code implementations • 2 Feb 2025 • Yuefan Cao, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Jiahao Zhang
As AI research surges in both impact and volume, conferences have imposed submission limits to maintain paper quality and alleviate organizational pressure.
no code implementations • 1 Feb 2025 • Yang Cao, Zhao Song, Chiwun Yang
This paper considers an efficient video modeling process called Video Latent Flow Matching (VLFM).
no code implementations • 18 Jan 2025 • Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Zhen Zhuang
In this work, we extend the Loop Transformer architecture's neural algorithmic reasoning capability to simulate hypergraph algorithms, addressing the gap between neural networks and combinatorial optimization over hypergraphs.
no code implementations • 17 Jan 2025 • Yuefan Cao, Chengyue Gong, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
This limitation often arises from the inability of the text encoder to produce accurate embeddings, which hinders the video generation model.
no code implementations • 11 Jan 2025 • Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Wei Wang, Jiahao Zhang
Graph Neural Networks (GNNs) have become the standard approach for learning and reasoning over relational data, leveraging the message-passing mechanism that iteratively propagates node embeddings through graph structures.
no code implementations • 10 Jan 2025 • Dabing Cheng, Haosen Zhan, Xingchen Zhao, Guisheng Liu, Zemin Li, Jinghui Xie, Zhao Song, Weiguo Feng, Bingyue Peng
The exponential growth of short-video content has ignited a surge in the necessity for efficient, automated solutions to video editing, with challenges arising from the need to understand videos and tailor the editing according to user requirements.
no code implementations • 8 Jan 2025 • Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
Understanding the expressive ability of a specific model is essential for grasping its capacity limitations.
no code implementations • 8 Jan 2025 • Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
Recently, Visual Autoregressive ($\mathsf{VAR}$) Models introduced a groundbreaking advancement in the field of image generation, offering a scalable approach through a coarse-to-fine "next-scale prediction" paradigm.
no code implementations • 23 Dec 2024 • Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Mingda Wan
Tensor Attention extends traditional attention mechanisms by capturing high-order correlations across multiple modalities, addressing the limitations of classical matrix-based attention.
no code implementations • 23 Dec 2024 • Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
The Rotary Position Embedding (RoPE) mechanism has become a powerful enhancement to the Transformer architecture, enabling models to capture token relationships when encoding positional information.
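As a concrete illustration of the mechanism, here is a minimal NumPy sketch of one standard RoPE formulation (an illustration, not code from this paper; the pairing of feature dimensions varies across implementations):

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply Rotary Position Embedding to x of shape (seq_len, dim).

    Pairs of features are rotated by an angle that grows with the token
    position, so inner products between rotated queries and keys depend
    on relative position.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0
    half = dim // 2
    # One frequency per feature pair, as in the standard RoPE formulation.
    freqs = base ** (-np.arange(half) / half)          # (half,)
    angles = np.outer(np.arange(seq_len), freqs)       # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied to each (x1_i, x2_i) pair.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 16)
print(rope(q).shape)  # (8, 16)
```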
1 code implementation • 22 Dec 2024 • Yang Cao, Xiaoyu Li, Zhao Song
The results demonstrate Grams' superior performance, including faster convergence and better generalization, compared to widely-used optimizers such as Adam, Lion, and their cautious variants.
no code implementations • 17 Dec 2024 • Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu
Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing.
no code implementations • 17 Dec 2024 • Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Yanyu Li, Yifan Gong, Kai Zhang, Hao Tan, Jason Kuen, Henghui Ding, Zhihao Shu, Wei Niu, Pu Zhao, Yanzhi Wang, Jiuxiang Gu
In this paper, we show that performing the full computation of the model at each diffusion step is unnecessary, as some computations can be skipped by lazily reusing the results of previous steps.
no code implementations • 9 Dec 2024 • Yifang Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
In this paper, we analyze the computational limitations of Mamba and State-space Models (SSMs) by using the circuit complexity framework.
no code implementations • 8 Dec 2024 • Zhao Song, Ali Vakilian, David P. Woodruff, Samson Zhou
Low-rank approximation and column subset selection are two fundamental and related problems that are applied across a wealth of machine learning applications.
no code implementations • 8 Dec 2024 • Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang
The application of transformer-based models to time series forecasting (TSF) tasks has long been a popular subject of study.
no code implementations • 7 Dec 2024 • Xiaoyu Li, Yuanpeng Li, Yingyu Liang, Zhenmei Shi, Zhao Song
Modern Hopfield networks (MHNs) have emerged as powerful tools in deep learning, capable of replacing components such as pooling layers, LSTMs, and attention mechanisms.
no code implementations • 25 Nov 2024 • Weimin Wu, Maojiang Su, Jerry Yao-Chieh Hu, Zhao Song, Han Liu
We investigate the transformer's capability for in-context learning (ICL) to simulate the training process of deep models.
no code implementations • 25 Nov 2024 • Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani, Chenyang Li, Zhao Song, Han Liu
Our key contributions show that prompt tuning on \textit{single-head} transformers with only a \textit{single} self-attention layer (i) is universal, and (ii) supports efficient (even almost-linear time) algorithms under the Strong Exponential Time Hypothesis (SETH).
no code implementations • 12 Nov 2024 • Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song
In this work, we establish a circuit complexity bound for Transformers with $\mathsf{RoPE}$ attention.
no code implementations • 8 Nov 2024 • Jerry Yao-Chieh Hu, Erzhi Liu, Han Liu, Zhao Song, Lichen Zhang
Given a database of bit strings $A_1,\ldots, A_m\in \{0, 1\}^n$, a fundamental data structure task is to estimate the distances between a given query $B\in \{0, 1\}^n$ and all the strings in the database.
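For intuition, a naive coordinate-sampling estimator for this task might look as follows (an illustrative baseline under our own naming, not the paper's data structure):

```python
import numpy as np

def estimate_hamming(database, query, num_samples=64, rng=None):
    """Estimate Hamming distances from `query` to every row of `database`
    by sampling a random subset of coordinates.

    The estimate for each string is the mismatch rate on the sampled
    coordinates, scaled back up to the full length n.
    """
    rng = np.random.default_rng(rng)
    m, n = database.shape
    idx = rng.choice(n, size=num_samples, replace=False)
    mismatches = (database[:, idx] != query[idx]).sum(axis=1)
    return mismatches * (n / num_samples)

db = np.random.randint(0, 2, size=(5, 1000))
q = np.random.randint(0, 2, size=1000)
print(estimate_hamming(db, q))
```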
no code implementations • 3 Nov 2024 • Majid Daliri, Zhao Song, Chiwun Yang
Research by Wang et al. (2023) and Ma et al. (2024) indicates that the performance of these 1-bit LLMs progressively improves as the number of parameters increases, hinting at the potential existence of a Scaling Law for 1-bit Neural Networks.
no code implementations • 15 Oct 2024 • Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
Our results demonstrate that as long as the input data has a constant condition number, e.g., $n = O(d)$, the linear looped Transformers can achieve a small error by multi-step gradient descent during in-context learning.
no code implementations • 15 Oct 2024 • Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
Recent empirical studies have identified fixed point iteration phenomena in deep neural networks, where the hidden state tends to stabilize after several layers, showing minimal change in subsequent layers.
no code implementations • 15 Oct 2024 • Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou
Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants.
no code implementations • 14 Oct 2024 • Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
Our approach achieves a running time of $O(mn^{4/5})$, significantly faster than the naive $O(mn)$ approach for attention generation, where $n$ is the context length, $m$ is the query length, and $d$ is the hidden dimension.
no code implementations • 12 Oct 2024 • Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
Previous work has demonstrated that attention mechanisms are Turing complete.
no code implementations • 12 Oct 2024 • Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
For small cache sizes, we provide an algorithm that improves over existing methods and achieves the tight bounds.
no code implementations • 8 Oct 2024 • Yuzhou Gu, Nikki Lijing Kuang, Yi-An Ma, Zhao Song, Lichen Zhang
We design a walk that mixes in $\widetilde O((nd+dL^2R^2)\log(w/\delta))$ steps with a per iteration cost of $\widetilde O(n^\omega+n^2d^{3\omega-5})$.
no code implementations • 29 Sep 2024 • Zhen Wang, Ruiqi Song, Chen Shen, Shiya Yin, Zhao Song, Balaraju Battu, Lei Shi, Danyang Jia, Talal Rahwan, Shuyue Hu
We design three types of LLMs: (i) Cooperative, aiming to assist its human associate; (ii) Selfish, focusing solely on maximizing its self-interest; and (iii) Fair, balancing its own and collective interest, while slightly prioritizing self-interest.
no code implementations • 3 Sep 2024 • Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu
In this paper, we improve the best previous result [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024] in three aspects; first, we reduce the query time by a factor of $\alpha^{-1} \log n$.
no code implementations • 23 Aug 2024 • Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
The computational complexity of the self-attention mechanism in popular transformer architectures poses significant challenges for training and inference, and becomes the bottleneck for long inputs.
no code implementations • 22 Aug 2024 • Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
In this work, we improved the analysis of the running time of SparseGPT [Frantar, Alistarh ICML 2023] from $O(d^{3})$ to $O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1, 1, a)-a})$ for any $a \in [0, 1]$, where $\omega$ is the exponent of matrix multiplication.
no code implementations • 21 Aug 2024 • Chenyang Li, Zhao Song, Zhaoxing Xu, Junze Yin
Leverage scores have become essential in statistics and machine learning, aiding regression analysis, randomized matrix computations, and various other tasks.
no code implementations • 12 Aug 2024 • Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Junwei Yu
Determining the John ellipsoid - the largest volume ellipsoid contained within a convex polytope - is a fundamental problem with applications in machine learning, optimization, and data analytics.
no code implementations • 20 Jul 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
In addition, our data structure can guarantee that the process of answering user query satisfies $(\epsilon, \delta)$-DP with $\widetilde{O}(n^{-1} \epsilon^{-1} \alpha^{-1/2} R^{2s} R_w r^2)$ additive error and $n^{-1} (\alpha + \epsilon_s)$ relative error between our output and the true answer.
no code implementations • 18 Jul 2024 • Jiuxiang Gu, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
Training data privacy is a fundamental problem in modern Artificial Intelligence (AI) applications, such as face recognition, recommendation systems, language generation, and many others, as it may contain sensitive user information related to legal issues.
no code implementations • 1 Jul 2024 • Jerry Yao-Chieh Hu, Weimin Wu, Zhao Song, Han Liu
For backward computation, we leverage the low-rank structure within the gradient computation of DiTs training for possible algorithmic speedup.
1 code implementation • 20 Jun 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang
Prompting and context-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks.
no code implementations • 5 Jun 2024 • Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu
We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models using fine-grained complexity theory.
no code implementations • 26 May 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
Tensor Attention, a multi-view attention that is able to capture high-order correlations among multiple modalities, can overcome the representational limitations of classical matrix attention.
no code implementations • 26 May 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
We prove that if the target distribution is a $k$-mixture of Gaussians, the density of the entire diffusion process will also be a $k$-mixture of Gaussians.
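The closure property can be checked directly for the standard forward noising process $x_t = a_t x_0 + \sigma_t \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$ independent of $x_0$ (our notation, which may differ from the paper's). If $p_0 = \sum_{i=1}^k w_i \, \mathcal{N}(\mu_i, \Sigma_i)$, then conditioning on the mixture component and using that affine maps and Gaussian convolutions preserve Gaussianity gives

$$p_t = \sum_{i=1}^k w_i \, \mathcal{N}\!\left(a_t \mu_i,\; a_t^2 \Sigma_i + \sigma_t^2 I\right),$$

which is again a $k$-mixture of Gaussians at every time $t$.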
no code implementations • 9 May 2024 • Yeqi Gao, Yuzhou Gu, Zhao Song
We obtain similar results for the binary hypothesis testing problem for leverage score models.
no code implementations • 8 May 2024 • Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Zhuoyan Xu, Junze Yin
We then design a fast algorithm to approximate the attention matrix via a sum of such $k$ convolution matrices.
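The computational primitive behind such an approximation is applying a single lower-triangular convolution (Toeplitz) matrix to $V$ in $O(n \log n)$ time per column via the FFT rather than forming it explicitly. A minimal NumPy sketch of that primitive (not the decomposition algorithm itself) is:

```python
import numpy as np

def apply_conv_matrix(c, V):
    """Multiply the lower-triangular convolution matrix built from first
    column c by V without forming the n x n matrix, using FFT convolution."""
    n = len(c)
    L = 2 * n                                   # zero-pad to avoid circular wrap-around
    C = np.fft.rfft(c, L)
    return np.fft.irfft(C[:, None] * np.fft.rfft(V, L, axis=0), L, axis=0)[:n]

n, d = 8, 4
c = np.random.randn(n)
V = np.random.randn(n, d)
# check against the explicit Toeplitz matrix
T = np.array([[c[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)])
print(np.allclose(apply_conv_matrix(c, V), T @ V))  # True
```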
no code implementations • 6 May 2024 • Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song
The softmax activation function plays a crucial role in the success of large language models (LLMs), particularly in the self-attention mechanism of the widely adopted Transformer architecture.
no code implementations • 21 Apr 2024 • Zhihang Li, Zhao Song, Weixin Wang, Junze Yin, Zheng Yu
Leverage score computation is a fundamental problem in machine learning and theoretical computer science.
no code implementations • 3 Apr 2024 • Yichuan Deng, Zhao Song, Chiwun Yang
The computational intensity of Large Language Models (LLMs) is a critical bottleneck, primarily due to the $O(n^2)$ complexity of the attention mechanism in transformer architectures.
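For reference, the quadratic bottleneck comes from materializing the $n \times n$ score matrix, as in this minimal NumPy implementation of vanilla softmax attention (a baseline illustration, not the paper's method):

```python
import numpy as np

def attention(Q, K, V):
    """Vanilla softmax attention: the n x n score matrix is the O(n^2) bottleneck."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                       # (n, n): quadratic in n
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

n, d = 512, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(attention(Q, K, V).shape)  # (512, 64)
```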
no code implementations • 12 Feb 2024 • Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Tianyi Zhou
We direct our focus to the complex algebraic learning task of modular addition involving $k$ inputs.
no code implementations • 10 Feb 2024 • Yeqi Gao, Zhao Song, Ruizhe Zhang
Given its widespread application in machine learning and optimization, the Kronecker product emerges as a pivotal linear algebra operator.
no code implementations • 7 Feb 2024 • Josh Alman, Zhao Song
Large language models (LLMs) have made fundamental contributions over the last few years.
no code implementations • 7 Feb 2024 • Jerry Yao-Chieh Hu, Thomas Lin, Zhao Song, Han Liu
Specifically, we establish an upper bound criterion for the norm of input query patterns and memory patterns.
no code implementations • 2 Feb 2024 • Yichuan Deng, Zhao Song, Chiwun Yang
Based on SGD, previous works have proposed many algorithms that have improved convergence speed and generalization in stochastic optimization, such as SGDm, AdaGrad, Adam, etc.
no code implementations • 26 Nov 2023 • Zhihang Li, Zhao Song, Zifan Wang, Junze Yin
Our main results involve analyzing the convergence properties of an approximate Newton method used to minimize the regularized training loss.
no code implementations • 24 Nov 2023 • Zhao Song, Junze Yin, Ruizhe Zhang
However, the running times of these algorithms depend on some quantum linear algebra-related parameters, such as $\kappa(A)$, the condition number of $A$.
no code implementations • 24 Nov 2023 • Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang
Considering a single-layer self-attention with Query, Key, and Value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the polynomial method approximates the attention output $T \in \mathbb{R}^{n \times d}$.
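A hedged sketch of the general idea, truncating the exponential to a low-degree polynomial so the $n \times n$ matrix is never formed (our simplification, not the paper's exact algorithm or error guarantees):

```python
import numpy as np

def poly_features(X):
    """Degree-2 Taylor feature map so that phi(q) . phi(k) ~= exp(q . k)."""
    n, d = X.shape
    quad = np.einsum('ni,nj->nij', X, X).reshape(n, d * d) / np.sqrt(2.0)
    return np.concatenate([np.ones((n, 1)), X, quad], axis=1)

def poly_attention(Q, K, V):
    """Approximate softmax attention via low-degree polynomial features.

    The n x n attention matrix is never materialized; the cost is O(n d^2).
    """
    d = Q.shape[-1]
    PQ, PK = poly_features(Q / d ** 0.25), poly_features(K / d ** 0.25)
    numer = PQ @ (PK.T @ V)                     # (n, d) without an n x n matrix
    denom = PQ @ PK.sum(axis=0)                 # row normalizers
    return numer / denom[:, None]

n, d = 256, 8
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(poly_attention(Q, K, V).shape)  # (256, 8)
```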
no code implementations • 22 Nov 2023 • Chenyang Li, Zhao Song, Weixin Wang, Chiwun Yang
The Deep Leakage from Gradient (DLG) attack has emerged as a prevalent and highly effective method for extracting sensitive training data by inspecting exchanged gradients.
no code implementations • 19 Nov 2023 • Lianke Qin, Saayan Mitra, Zhao Song, Yuanyuan Yang, Tianyi Zhou
In this paper, we consider a heavy inner product identification problem, which generalizes the Light Bulb problem~(\cite{prr89}): Given two sets $A \subset \{-1,+1\}^d$ and $B \subset \{-1,+1\}^d$ with $|A|=|B| = n$, if there are exactly $k$ pairs whose inner product passes a certain threshold, i.e., $\{(a_1, b_1), \cdots, (a_k, b_k)\} \subset A \times B$ such that $\forall i \in [k], \langle a_i, b_i \rangle \geq \rho \cdot d$, for a threshold $\rho \in (0, 1)$, the goal is to identify those $k$ heavy inner products.
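As a baseline, the brute-force scan that the problem asks to beat can be written in a few lines (an illustration of the problem statement only, not the paper's algorithm):

```python
import numpy as np

def heavy_inner_products(A, B, rho):
    """Report all pairs (i, j) with <a_i, b_j> >= rho * d by an O(n^2 d) scan."""
    d = A.shape[1]
    G = A @ B.T                               # all n^2 inner products
    return list(zip(*np.where(G >= rho * d)))

n, d = 64, 128
A = np.random.choice([-1, 1], size=(n, d))
B = np.random.choice([-1, 1], size=(n, d))
B[0] = A[0]                                   # plant one heavy pair
print(heavy_inner_products(A, B, rho=0.8))    # [(0, 0)] with high probability
```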
no code implementations • 30 Oct 2023 • Zhao Song, Guangyi Xu, Junze Yin
In this paper, we offer a theoretical analysis of the expressive capabilities of polynomial attention.
1 code implementation • 26 Oct 2023 • Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Ré, Beidi Chen
We show that contextual sparsity exists, that it can be accurately predicted, and that we can exploit it to speed up LLM inference in wall-clock time without compromising LLM's quality or in-context learning ability.
no code implementations • 19 Oct 2023 • Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang
In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks.
no code implementations • 18 Oct 2023 • Yichuan Deng, Zhao Song, Tianyi Zhou
Large transformer models have achieved state-of-the-art results in numerous natural language processing tasks.
no code implementations • 17 Oct 2023 • Zhao Song, Chiwun Yang
The delta-bar-delta algorithm is recognized as a learning rate adaptation technique that enhances the convergence speed of the training process in optimization by dynamically scheduling the learning rate based on the difference between the current and previous weight updates.
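A minimal NumPy sketch of the classical delta-bar-delta rule (a Jacobs-style formulation; the hyperparameter values below are our placeholders, not the paper's):

```python
import numpy as np

def delta_bar_delta_step(w, grad, lr, bar_delta, kappa=0.01, phi=0.1, theta=0.7):
    """One delta-bar-delta update.

    Each coordinate keeps its own learning rate: it grows additively when the
    current gradient agrees in sign with a running average of past gradients
    (bar_delta), and shrinks multiplicatively when they disagree.
    """
    agree = grad * bar_delta
    lr = np.where(agree > 0, lr + kappa, np.where(agree < 0, lr * (1 - phi), lr))
    w = w - lr * grad
    bar_delta = (1 - theta) * grad + theta * bar_delta
    return w, lr, bar_delta

# toy quadratic: minimize 0.5 * ||w||^2, whose gradient is w itself
w = np.array([5.0, -3.0]); lr = np.full(2, 0.1); bar = np.zeros(2)
for _ in range(50):
    w, lr, bar = delta_bar_delta_step(w, w, lr, bar)
print(w)  # close to zero
```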
no code implementations • 6 Oct 2023 • Josh Alman, Zhao Song
Interestingly, the higher the order of the tensors, the lower the bound on the entries needs to be for an efficient algorithm.
no code implementations • 5 Oct 2023 • Timothy Chu, Zhao Song, Chiwun Yang
To address this issue, we introduce a reweighted algorithm called RICL (Reweighted In-context Learning).
no code implementations • 23 Sep 2023 • Zhao Song, Weixin Wang, Junze Yin
The Hessian is shown to be positive semidefinite, and its structure is characterized as the sum of a low-rank matrix and a diagonal matrix.
no code implementations • 14 Sep 2023 • Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin
Here $A_3$ is a matrix in $\mathbb{R}^{n \times d}$, and $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$.
no code implementations • 14 Sep 2023 • Lianke Qin, Zhao Song, Baocheng Sun
A rising trend in theoretical deep learning is to understand why deep learning works through Neural Tangent Kernel (NTK) [jgh18], a kernel method that is equivalent to using gradient descent to train a multi-layer infinitely-wide neural network.
no code implementations • 2 Sep 2023 • Lianke Qin, Aravind Reddy, Zhao Song
Mahalanobis metrics are widely used in machine learning in conjunction with methods like $k$-nearest neighbors, $k$-means clustering, and $k$-medians clustering.
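For concreteness, the metric these data structures accelerate is easy to state in code (brute-force $k$-NN under a Mahalanobis metric; no sketching or preprocessing as studied in the paper):

```python
import numpy as np

def mahalanobis_knn(X, query, M, k=3):
    """k-nearest neighbors under d(x, y)^2 = (x - y)^T M (x - y)."""
    diff = X - query                                  # (n, d)
    dist2 = np.einsum('nd,de,ne->n', diff, M, diff)   # squared distances
    return np.argsort(dist2)[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
A = rng.normal(size=(5, 5))
M = A @ A.T + np.eye(5)          # positive definite metric matrix
print(mahalanobis_knn(X, X[0], M, k=3))  # index 0 is its own nearest neighbor
```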
no code implementations • 28 Aug 2023 • Zhao Song, Junze Yin, Lichen Zhang
Given an input matrix $A\in \mathbb{R}^{n\times d}$ with $n\gg d$ and a response vector $b$, we first consider the matrix exponential of the matrix $A^\top A$ as a proxy, and we in turn design algorithms for two types of regression problems: $\min_{x\in \mathbb{R}^d}\|(A^\top A)^jx-b\|_2$ and $\min_{x\in \mathbb{R}^d}\|A(A^\top A)^jx-b\|_2$ for any positive integer $j$.
1 code implementation • 23 Aug 2023 • Timothy Chu, Zhao Song, Chiwun Yang
Large language models (LLMs) and generative AI have played a transformative role in computer research and applications.
no code implementations • 21 Aug 2023 • Yeqi Gao, Zhao Song, Junze Yin
It is likely that only two types of people would be interested in setting up a practical system for it: those who prefer to use decentralized ChatGPT-like software.
no code implementations • 21 Aug 2023 • Yichuan Deng, Michalis Mamakos, Zhao Song
Thus, maximizing the total reward requires learning not only models about the reward and the resource consumption, but also cluster memberships.
no code implementations • 16 Aug 2023 • Yichuan Deng, Zhao Song, Shenghao Xie
The softmax unit and the ReLU unit are the key structures in attention computation.
no code implementations • 17 Jul 2023 • Yichuan Deng, Zhihang Li, Sridhar Mahadevan, Zhao Song
We demonstrate the convergence of our algorithm, highlighting its effectiveness in efficiently computing gradients for large-scale LLMs.
no code implementations • 16 Jul 2023 • Yeqi Gao, Zhao Song, Xin Yang, Ruizhe Zhang
It is well known that quantum machines have certain computational advantages over classical machines.
no code implementations • 15 Jul 2023 • Yuzhou Gu, Zhao Song, Lichen Zhang
Consequently, we obtain results for SVMs: for linear SVM, when the input data is $d$-dimensional, our algorithm runs in time $\widetilde O(nd^{(\omega+1)/2}\log(1/\epsilon))$, where $\omega\approx 2.37$ is the fast matrix multiplication exponent; for Gaussian kernel SVM, when the data dimension $d = O(\log n)$ and the squared dataset radius is sub-logarithmic in $n$, our algorithm runs in time $O(n^{1+o(1)}\log(1/\epsilon))$.
no code implementations • 13 Jul 2023 • Lianke Qin, Zhao Song, Yuanyuan Yang
Deep learning has been widely used in many fields, but the model training process usually consumes massive computational resources and time.
no code implementations • 5 Jul 2023 • Yeqi Gao, Zhao Song, Shenghao Xie
Given matrices $A_1 \in \mathbb{R}^{n \times d}$, $A_2 \in \mathbb{R}^{n \times d}$, and $B \in \mathbb{R}^{n \times n}$, the goal is to solve certain optimization problems: the normalized version $\min_{X} \| D(X)^{-1} \exp(A_1 X A_2^\top) - B \|_F^2$ and the rescaled version $\min_{X} \| \exp(A_1 X A_2^\top) - D(X) \cdot B \|_F^2$.
2 code implementations • 24 Jun 2023 • Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen
Based on these insights, we propose Heavy Hitter Oracle (H$_2$O), a KV cache eviction policy that dynamically retains a balance of recent and H$_2$ tokens.
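A simplified offline sketch of this retention rule (single head, precomputed scores; the actual H$_2$O policy makes these decisions online during decoding):

```python
import numpy as np

def h2o_evict(attn_scores, keep_recent, keep_heavy):
    """Pick which KV-cache positions to keep: the most recent tokens plus the
    'heavy hitter' tokens with the largest accumulated attention mass."""
    n = attn_scores.shape[1]
    recent = set(range(n - keep_recent, n))
    mass = attn_scores.sum(axis=0)                    # accumulated attention per position
    heavy = [i for i in np.argsort(-mass) if i not in recent][:keep_heavy]
    return sorted(recent | set(heavy))

scores = np.random.rand(16, 16)                       # rows: queries, cols: key positions
print(h2o_evict(scores, keep_recent=4, keep_heavy=4))
```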
no code implementations • 7 Jun 2023 • Zhao Song, Mingquan Ye, Junze Yin, Lichen Zhang
For weighted low rank approximation, this improves the runtime of [LLR16] from $\|W\|_0k^2$ to $\|W\|_0 k$ where $\|W\|_0$ denotes the number of nonzero entries of the weight matrix.
no code implementations • 6 Jun 2023 • Xiang Chen, Zhao Song, Baocheng Sun, Junze Yin, Danyang Zhuo
Many machine learning algorithms require large amounts of labeled data to deliver state-of-the-art results.
no code implementations • 4 Jun 2023 • Ritwik Sinha, Zhao Song, Tianyi Zhou
A model trained on these losses balances the trade-off between the creativity and reality of the model.
no code implementations • 1 Jun 2023 • Yichuan Deng, Zhao Song, Junze Yin
Tensor decomposition is a fundamental method used in various areas to deal with high-dimensional data.
no code implementations • 27 May 2023 • Song Bian, Zhao Song, Junze Yin
Many convex optimization problems with important applications in machine learning are formulated as empirical risk minimization (ERM).
no code implementations • 15 May 2023 • Lianke Qin, Zhao Song, Yitan Wang
We consider both the online and offline versions of the problem: in each iteration, the data set changes incrementally or is not changed, and a user can issue a query to maximize the function on a given subset of the data.
no code implementations • 15 May 2023 • Zhao Song, Weixin Wang, Chenbo Yin, Junze Yin
But in the \textsc{FastPostponedGreedy} algorithm, the status of each node is unknown at first.
no code implementations • 13 May 2023 • Zhao Song, Mingquan Ye
Deep learning has achieved impressive success in a variety of fields because of its good generalization.
no code implementations • 8 May 2023 • Yeqi Gao, Zhao Song, Xin Yang, Yufa Zhou
Large language models (LLMs), especially those based on the Transformer architecture, have had a profound impact on various aspects of daily life, such as natural language processing, content generation, research methodologies, and more.
no code implementations • 1 May 2023 • Yeqi Gao, Zhao Song, Junze Yin
LLMs have shown great promise in improving the accuracy and efficiency of these tasks, and have the potential to revolutionize the field of natural language processing (NLP) in the years to come.
no code implementations • 26 Apr 2023 • Shuai Li, Zhao Song, Yu Xia, Tong Yu, Tianyi Zhou
Large language models (LLMs) are known for their exceptional performance in natural language processing, making them highly effective in many human life-related or even job-related tasks.
no code implementations • 26 Apr 2023 • Zhao Song, Ke Yang, Naiyang Guan, Junjie Zhu, Peng Qiao, Qingyong Hu
Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks.
Ranked #4 on Image Classification on VTAB-1k (using extra training data)
no code implementations • 20 Apr 2023 • Yichuan Deng, Zhihang Li, Zhao Song
One of the key computation in LLMs is the softmax unit.
no code implementations • 13 Apr 2023 • Yichuan Deng, Yeqi Gao, Zhao Song
The tensor classical rank, Tucker rank, and tensor-train rank have been well studied in [Song, Woodruff, Zhong SODA 2019].
no code implementations • 10 Apr 2023 • Yichuan Deng, Sridhar Mahadevan, Zhao Song
It runs in $\widetilde{O}(\mathrm{nnz}(X) + n^{\omega})$ time, succeeds with probability $1-\delta$, and chooses $m = O(n \log(n/\delta))$.
no code implementations • 29 Mar 2023 • Yeqi Gao, Sridhar Mahadevan, Zhao Song
Mathematically, we define the neural function $F: \mathbb{R}^{d \times m} \times \mathbb{R}^d \rightarrow \mathbb{R}$ using an exponential activation function.
no code implementations • 28 Mar 2023 • Zhihang Li, Zhao Song, Tianyi Zhou
In this paper, we make use of the input sparsity and propose an algorithm that uses $\log ( \|x_0 - x^*\|_2 / \epsilon)$ iterations and $\widetilde{O}(\mathrm{nnz}(A) + d^{\omega})$ time per iteration to solve the problem.
no code implementations • 22 Mar 2023 • Lianke Qin, Zhao Song, Ruizhe Zhang
In this paper, we relax that rank-$k$ assumption and solve a much more general matrix sensing problem.
no code implementations • 10 Mar 2023 • Anshumali Shrivastava, Zhao Song, Zhaozhuo Xu
The current theoretical literature focuses on greedy search over the exact near neighbor graph, while practitioners use the approximate near neighbor graph (ANN-Graph) to reduce preprocessing time.
no code implementations • 8 Mar 2023 • Yichuan Deng, Zhao Song, Zifan Wang, Han Zhang
The kernel method, which is commonly used in learning algorithms such as Support Vector Machines (SVMs), has also been applied in PCA algorithms.
no code implementations • 21 Feb 2023 • Yuzhou Gu, Zhao Song, Junze Yin, Lichen Zhang
Moreover, our algorithm runs in time $\widetilde O(|\Omega| k)$, which is nearly linear in the time to verify the solution while preserving the sample complexity.
no code implementations • 1 Feb 2023 • Zhao Song, Mingquan Ye, Junze Yin, Lichen Zhang
One popular approach for solving such an $\ell_2$ regression problem is via sketching: pick a structured random matrix $S\in \mathbb{R}^{m\times n}$ with $m\ll n$ for which $SA$ can be quickly computed, then solve the ``sketched'' regression problem $\arg\min_{x\in \mathbb{R}^d} \|SAx-Sb\|_2$.
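A minimal NumPy sketch of this sketch-and-solve recipe (using a dense Gaussian $S$ for simplicity, whereas the point above concerns structured sketches that can be applied much faster):

```python
import numpy as np

def sketched_least_squares(A, b, m, rng=None):
    """Solve min_x ||S A x - S b||_2 for a random sketch S with m << n rows."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    S = rng.normal(size=(m, n)) / np.sqrt(m)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

n, d = 10000, 20
A = np.random.randn(n, d)
x_true = np.random.randn(d)
b = A @ x_true + 0.01 * np.random.randn(n)
x_hat = sketched_least_squares(A, b, m=400)
print(np.linalg.norm(x_hat - x_true))  # small residual error
```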
no code implementations • 12 Jan 2023 • Chen Shen, Zhao Song, Lei Shi, Jun Tanimoto, Zhen Wang
Altruistic punishment, where individuals incur personal costs to punish others who have harmed third parties, presents an evolutionary conundrum as it undermines individual fitness.
no code implementations • 21 Dec 2022 • Lianke Qin, Aravind Reddy, Zhao Song, Zhaozhuo Xu, Danyang Zhuo
In this paper, we propose Adam-Hash: an adaptive and dynamic multi-resolution hashing data-structure for fast pairwise summation estimation.
no code implementations • 28 Nov 2022 • Jiehao Liang, Somdeb Sarkhel, Zhao Song, Chenbo Yin, Junze Yin, Danyang Zhuo
We propose a new algorithm \textsc{FastKmeans++} that only takes in $\widetilde{O}(nd + nk^2)$ time, in total.
no code implementations • 3 Nov 2022 • Xiaoxiao Li, Zhao Song, Runzhou Tao, Guangyi Zhang
As a leading algorithm in this setting, Federated Averaging (FedAvg), which runs Stochastic Gradient Descent (SGD) in parallel on local devices and averages the iterate sequences only once in a while, has been widely used due to its simplicity and low communication cost.
no code implementations • 15 Oct 2022 • Zhao Song, Yitan Wang, Zheng Yu, Lichen Zhang
In this paper, we propose a novel sketching scheme for the first order method in large-scale distributed learning setting, such that the communication costs between distributed agents are saved while the convergence of the algorithms is still guaranteed.
no code implementations • 8 Oct 2022 • Aravind Reddy, Zhao Song, Lichen Zhang
In this work, we initiate the study of \emph{Dynamic Tensor Product Regression}.
no code implementations • 10 Aug 2022 • Yeqi Gao, Lianke Qin, Zhao Song, Yitan Wang
For a neural network of width $m$ with $n$ training inputs in $d$ dimensions, the forward and backward computation costs $\Omega(mnd)$ time per training iteration.
no code implementations • 9 Aug 2022 • Yichuan Deng, Hang Hu, Zhao Song, Omri Weinstein, Danyang Zhuo
The success of deep learning comes at a tremendous computational and energy cost, and the scalability of training massively overparametrized neural networks is becoming a real barrier to the progress of artificial intelligence (AI).
no code implementations • 8 Aug 2022 • Jiehao Liang, Zhao Song, Zhaozhuo Xu, Junze Yin, Danyang Zhuo
In this work, we focus on the dynamic maintenance of KDE data structures with robustness to adversarial queries.
no code implementations • 7 Aug 2022 • Xiaoxiao Li, Zhao Song, Jiaming Yang
Unlike the convergence analysis in classical centralized training that relies on the gradient direction, it is significantly harder to analyze the convergence in FAL for three reasons: 1) the complexity of min-max optimization, 2) model not updating in the gradient direction due to the multi-local updates on the client-side before aggregation and 3) inter-client heterogeneity.
no code implementations • 5 Aug 2022 • Hang Hu, Zhao Song, Runzhou Tao, Zhaozhuo Xu, Junze Yin, Danyang Zhuo
Online bipartite matching is a fundamental problem in online algorithms.
no code implementations • 26 Jun 2022 • Alexander Munteanu, Simon Omlor, Zhao Song, David P. Woodruff
A common method in training neural networks is to initialize all the weights to be independent Gaussian vectors.
no code implementations • 23 Apr 2022 • Kai Wang, Zhao Song, Georgios Theocharous, Sridhar Mahadevan
Smoothed online combinatorial optimization considers a learner who repeatedly chooses a combinatorial decision to minimize an unknown changing cost function with a penalty on switching decisions in consecutive rounds.
1 code implementation • 15 Apr 2022 • Mayee F. Chen, Daniel Y. Fu, Avanika Narayan, Michael Zhang, Zhao Song, Kayvon Fatahalian, Christopher Ré
We first prove that adding a weighted class-conditional InfoNCE loss to SupCon controls the degree of spread.
no code implementations • 14 Dec 2021 • Zhao Song, Lichen Zhang, Ruizhe Zhang
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function.
no code implementations • 9 Dec 2021 • Wei Deng, Qian Zhang, Yi-An Ma, Zhao Song, Guang Lin
We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d. data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence.
no code implementations • 4 Dec 2021 • Shunhua Jiang, Yunze Man, Zhao Song, Zheng Yu, Danyang Zhuo
Given a kernel matrix of $n$ graphs, using sketching in solving kernel regression can reduce the running time to $o(n^3)$.
no code implementations • NeurIPS 2021 • Anshumali Shrivastava, Zhao Song, Zhaozhuo Xu
In this work, we focus on improving the per iteration cost of CGM.
1 code implementation • ICLR 2022 • Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré
To address this, our main insight is to optimize over a continuous superset of sparse matrices with a fixed structure known as products of butterfly matrices.
1 code implementation • NeurIPS 2021 • Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, Sanjeev Arora
Gradient inversion attack (or input recovery from gradient) is an emerging threat to the security and privacy preservation of Federated learning, whereby malicious eavesdroppers or participants in the protocol can recover (partially) the clients' private data.
no code implementations • 29 Nov 2021 • Aravind Reddy, Ryan A. Rossi, Zhao Song, Anup Rao, Tung Mai, Nedim Lipka, Gang Wu, Eunyee Koh, Nesreen Ahmed
In this paper, we introduce the online and streaming MAP inference and learning problems for Non-symmetric Determinantal Point Processes (NDPPs) where data points arrive in an arbitrary order and the algorithms are constrained to use a single-pass over the data as well as sub-linear memory.
1 code implementation • NeurIPS 2021 • Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré
Recent advances in efficient Transformers have exploited either the sparsity or low-rank properties of attention matrices to reduce the computational and memory bottlenecks of modeling long sequences.
no code implementations • NeurIPS 2021 • Zhao Song, Shuo Yang, Ruizhe Zhang
The classical training method requires paying $\Omega(mnd)$ cost for both forward computation and backward computation, where $m$ is the width of the neural network, and we are given $n$ training points in $d$-dimensional space.
no code implementations • 29 Sep 2021 • Xiaoxiao Li, Zhao Song, Jiaming Yang
Unlike the convergence analysis in centralized training that relies on the gradient direction, it is significantly harder to analyze the convergence in FAL for two reasons: 1) the complexity of min-max optimization, and 2) model not updating in the gradient direction due to the multi-local updates on the client-side before aggregation.
no code implementations • 29 Sep 2021 • Zhao Song, Baocheng Sun, Danyang Zhuo
In this paper, we present the first deep active learning algorithm which has a provable sample complexity.
no code implementations • 29 Sep 2021 • Baihe Huang, Zhao Song, Runzhou Tao, Ruizhe Zhang, Danyang Zhuo
Inspired by InstaHide challenge [Huang, Song, Li and Arora'20], [Chen, Song and Zhuo'20] recently provides one mathematical formulation of InstaHide attack problem under Gaussian images distribution.
no code implementations • 29 Sep 2021 • Zhao Song, Zheng Yu, Lichen Zhang
Though most federated learning frameworks only require clients and the server to send gradient information over the network, they still face the challenges of communication efficiency and data privacy.
no code implementations • 21 Aug 2021 • Zhao Song, David P. Woodruff, Zheng Yu, Lichen Zhang
Recent techniques in oblivious sketching reduce the dependence in the running time on the degree $q$ of the polynomial kernel from exponential to polynomial, which is useful for the Gaussian kernel, for which $q$ can be chosen to be polylogarithmic.
1 code implementation • NeurIPS 2021 • Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré
Recent advances in efficient Transformers have exploited either the sparsity or low-rank properties of attention matrices to reduce the computational and memory bottlenecks of modeling long sequences.
no code implementations • 18 May 2021 • Anshumali Shrivastava, Zhao Song, Zhaozhuo Xu
We present the first provable Least-Squares Value Iteration (LSVI) algorithms that have runtime complexity sublinear in the number of actions.
no code implementations • 11 May 2021 • Baihe Huang, Xiaoxiao Li, Zhao Song, Xin Yang
Nevertheless, training analysis of neural networks in FL is non-trivial for two reasons: first, the objective loss function we are optimizing is non-smooth and non-convex, and second, we are even not updating in the gradient direction.
no code implementations • 22 Feb 2021 • Lijie Chen, Gillat Kol, Dmitry Paramonov, Raghuvansh Saxena, Zhao Song, Huacheng Yu
In addition, we show a similar $\tilde{\Theta}(n \cdot \sqrt{L})$ bound on the space complexity of any algorithm (with any number of passes) for the related problem of sampling an $L$-step random walk from every vertex in the graph.
Data Structures and Algorithms • Computational Complexity
no code implementations • 2 Feb 2021 • Sitan Chen, Zhao Song, Runzhou Tao, Ruizhe Zhang
As this problem is hard in the worst-case, we study a natural average-case variant that arises in the context of these reconstruction attacks: $\mathbf{M} = \mathbf{W}\mathbf{W}^{\top}$ for $\mathbf{W}$ a random Boolean matrix with $k$-sparse rows, and the goal is to recover $\mathbf{W}$ up to column permutation.
no code implementations • 20 Jan 2021 • Baihe Huang, Shunhua Jiang, Zhao Song, Runzhou Tao
This paper introduces a new robust interior point method analysis for semidefinite programming (SDP).
Optimization and Control • Data Structures and Algorithms
no code implementations • 14 Jan 2021 • Jan van den Brand, Yin Tat Lee, Yang P. Liu, Thatchaphol Saranurak, Aaron Sidford, Zhao Song, Di Wang
In the special case of the minimum cost flow problem on $n$-vertex $m$-edge graphs with integer polynomially-bounded costs and capacities we obtain a randomized method which solves the problem in $\tilde{O}(m+n^{1.5})$ time.
Data Structures and Algorithms • Optimization and Control
no code implementations • ICLR 2021 • Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo
In this work, we examine the security of InstaHide, a scheme recently proposed by \cite{hsla20} for preserving the security of private datasets in the context of distributed learning.
no code implementations • 1 Jan 2021 • Shunhua Jiang, Yunze Man, Zhao Song, Danyang Zhuo
Theoretically, we present two techniques to speed up GNTK training while preserving the generalization error: (1) We use a novel matrix decoupling method to reduce matrix dimensions during the kernel solving.
no code implementations • 1 Jan 2021 • Zhao Song, Zheng Yu
In this work, we propose a sketching-based central path method for solving linear programmings, whose running time matches the state of art results [Cohen, Lee, Song STOC 19; Lee, Song, Zhang COLT 19].
no code implementations • ICLR 2021 • Beidi Chen, Zichang Liu, Binghui Peng, Zhaozhuo Xu, Jonathan Lingjie Li, Tri Dao, Zhao Song, Anshumali Shrivastava, Christopher Re
Recent advances by practitioners in the deep learning community have breathed new life into Locality Sensitive Hashing (LSH), using it to reduce memory and time bottlenecks in neural network (NN) training.
no code implementations • 24 Nov 2020 • Baihe Huang, Zhao Song, Runzhou Tao, Junze Yin, Ruizhe Zhang, Danyang Zhuo
On the current InstaHide challenge setup, where each InstaHide image is a mixture of two private images, we present a new algorithm to recover all the private images with a provable guarantee and optimal sample complexity.
no code implementations • 23 Nov 2020 • Josh Alman, Timothy Chu, Gary Miller, Shyam Narayanan, Mark Sellke, Zhao Song
This completes the theory of Manhattan to Manhattan metric transforms initiated by Assouad in 1980.
no code implementations • 23 Nov 2020 • Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo
In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning.
no code implementations • 4 Nov 2020 • Josh Alman, Timothy Chu, Aaron Schild, Zhao Song
We investigate whether or not it is possible to solve the following problems in $n^{1+o(1)}$ time for a $\mathsf{K}$-graph $G_P$ when $d < n^{o(1)}$: (1) multiply a given vector by the adjacency matrix or Laplacian matrix of $G_P$, (2) find a spectral sparsifier of $G_P$, and (3) solve a Laplacian system in $G_P$'s Laplacian matrix. For each of these problems, we consider all functions of the form $\mathsf{K}(u, v) = f(\|u-v\|_2^2)$ for a function $f:\mathbb{R} \rightarrow \mathbb{R}$.
no code implementations • 22 Oct 2020 • Xiaoxiao Li, Yangsibo Huang, Binghui Peng, Zhao Song, Kai Li
To address the issue that deep neural networks (DNNs) are vulnerable to model inversion attacks, we design an objective function, which adjusts the separability of the hidden data representations, as a way to control the trade-off between data utility and vulnerability to inversion attacks.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Yangsibo Huang, Zhao Song, Danqi Chen, Kai Li, Sanjeev Arora
In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e.g., BERT) for any sentence or sentence-pair task.
3 code implementations • 6 Oct 2020 • Yangsibo Huang, Zhao Song, Kai Li, Sanjeev Arora
This paper introduces InstaHide, a simple encryption of training images, which can be plugged into existing distributed deep learning pipelines.
no code implementations • NeurIPS 2020 • Jason D. Lee, Ruoqi Shen, Zhao Song, Mengdi Wang, Zheng Yu
Leverage score sampling is a powerful technique that originates from theoretical computer science, which can be used to speed up a large number of fundamental questions, e.g., linear regression, linear programming, semi-definite programming, the cutting plane method, graph sparsification, maximum matching, and max-flow.
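For readers unfamiliar with the primitive, here is a textbook-style NumPy sketch of leverage score sampling for least squares (the standard formulation, not the paper's specific application):

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores of A: squared row norms of an orthonormal basis
    for A's column space."""
    Q, _ = np.linalg.qr(A)            # thin QR; Q spans col(A)
    return (Q ** 2).sum(axis=1)

def leverage_sample(A, b, m, rng=None):
    """Sample and reweight m rows of (A, b) proportionally to leverage scores."""
    rng = np.random.default_rng(rng)
    p = leverage_scores(A)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=m, p=p)
    w = 1.0 / np.sqrt(m * p[idx])     # importance weights keep the estimate unbiased
    return A[idx] * w[:, None], b[idx] * w

A = np.random.randn(5000, 10)
b = A @ np.ones(10)
As, bs = leverage_sample(A, b, m=200)
print(np.linalg.lstsq(As, bs, rcond=None)[0][:3])  # entries close to 1
```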
no code implementations • 20 Jun 2020 • Jan van den Brand, Binghui Peng, Zhao Song, Omri Weinstein
The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks initiated an ongoing effort for developing faster $\mathit{second}$-$\mathit{order}$ optimization algorithms beyond SGD, without compromising the generalization error.
no code implementations • 10 Jun 2020 • Simon S. Du, Wei Hu, Zhiyuan Li, Ruoqi Shen, Zhao Song, Jiajun Wu
Though errors in past actions may affect the future, we are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.
no code implementations • 16 Apr 2020 • Zhao Song, David P. Woodruff, Peilin Zhong
entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, then it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.
no code implementations • 8 Apr 2020 • Haotian Jiang, Yin Tat Lee, Zhao Song, Sam Chiu-wai Wong
We propose a new cutting plane algorithm that uses an optimal $O(n \log (\kappa))$ evaluations of the oracle and an additional $O(n^2)$ time per evaluation, where $\kappa = nR/\epsilon$.
no code implementations • 4 Mar 2020 • Yangsibo Huang, Yushan Su, Sachin Ravi, Zhao Song, Sanjeev Arora, Kai Li
This paper attempts to answer the question whether neural network pruning can be used as a tool to achieve differential privacy without losing much data utility.
no code implementations • 23 Feb 2020 • Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang
We show that our approach obtains small error and is efficient in both space and time.
no code implementations • ICML 2020 • Weihao Kong, Raghav Somani, Zhao Song, Sham Kakade, Sewoong Oh
In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data.
no code implementations • NeurIPS 2020 • Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora
Our work proves convergence to low robust training loss for \emph{polynomial} width instead of exponential, under natural assumptions and with the ReLU activation.
no code implementations • ICLR 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
In this work, we first propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.
no code implementations • 16 Dec 2019 • Sitan Chen, Jerry Li, Zhao Song
In this paper, we give the first algorithm for learning an MLR that runs in time which is sub-exponential in $k$.
4 code implementations • ICML 2020 • Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song
WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases.
Ranked #13 on Speech Synthesis on LibriTTS
1 code implementation • NeurIPS 2019 • Zhao Song, David Woodruff, Peilin Zhong
entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, then it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.
no code implementations • NeurIPS 2019 • Kai Zhong, Zhao Song, Prateek Jain, Inderjit S. Dhillon
Inductive matrix completion (IMC) method is a standard approach for this problem where the given query as well as the items are embedded in a common low-dimensional space.
no code implementations • NeurIPS 2019 • Zhao Song, Ruosong Wang, Lin F. Yang, Hongyang Zhang, Peilin Zhong
When the loss function is a general symmetric norm, our algorithm produces a $\sqrt{d} \cdot \mathrm{polylog} n \cdot \mathrm{mmc}(\ell)$-approximate solution in input-sparsity time, where $\mathrm{mmc}(\ell)$ is a quantity related to the symmetric norm under consideration.
no code implementations • NeurIPS 2019 • Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff
For input $\mathcal{A}$ as above, we give $O(\sum_{i=1}^q \text{nnz}(A_i))$ time algorithms, which is much faster than computing $\mathcal{A}$.
1 code implementation • NeurIPS 2019 • Huaian Diao, Zhao Song, David P. Woodruff, Xin Yang
In the total least squares problem, one is given an $m \times n$ matrix $A$, and an $m \times d$ matrix $B$, and one seeks to "correct" both $A$ and $B$, obtaining matrices $\hat{A}$ and $\hat{B}$, so that there exists an $X$ satisfying the equation $\hat{A}X = \hat{B}$.
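The classical SVD-based solution of this problem can be sketched as follows (textbook construction, shown to illustrate the problem rather than the paper's algorithm; it assumes the generic well-conditioned case):

```python
import numpy as np

def total_least_squares(A, B):
    """Classical total least squares via the SVD of the stacked matrix [A B]."""
    n, d = A.shape[1], B.shape[1]
    _, _, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=False)
    V = Vt.T
    V12 = V[:n, n:]        # top-right block (n x d)
    V22 = V[n:, n:]        # bottom-right block (d x d)
    return -V12 @ np.linalg.inv(V22)

m, n, d = 200, 5, 2
A = np.random.randn(m, n)
X_true = np.random.randn(n, d)
B = A @ X_true + 0.01 * np.random.randn(m, d)
print(np.linalg.norm(total_least_squares(A, B) - X_true))  # small error
```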
no code implementations • 9 Jun 2019 • Zhao Song, Xin Yang
We improve the over-parametrization size over two beautiful results [Li and Liang 2018] and [Du, Zhai, Poczos and Singh 2019] in deep learning theory.
2 code implementations • ICML 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.
no code implementations • 11 May 2019 • Yin Tat Lee, Zhao Song, Qiuyi Zhang
Our result generalizes the very recent result of solving linear programs in the current matrix multiplication time [Cohen, Lee, Song'19] to a more broad class of problems.
1 code implementation • 1 May 2019 • Zhao Song, Wen Sun
Model-free Reinforcement Learning (RL) algorithms such as Q-learning [Watkins, Dayan 92] have been widely used in practice and can achieve human level performance in applications such as video games [Mnih et al. 15].
no code implementations • ICLR 2019 • Huan Zhang, Hongge Chen, Zhao Song, Duane Boning, Inderjit S. Dhillon, Cho-Jui Hsieh
In our paper, we shed some light on the practicality and the hardness of adversarial training by showing that the effectiveness (robustness on test set) of adversarial training has a strong correlation with the distance between a test point and the manifold of training data embedded by the network.
no code implementations • 26 Dec 2018 • Yibo Lin, Zhao Song, Lin F. Yang
In this paper, we provide provable guarantees on some hashing-based parameter reduction methods in neural nets.
no code implementations • 15 Dec 2018 • Yin Tat Lee, Zhao Song, Santosh S. Vempala
We apply this to the sampling problem to obtain a nearly linear implementation of HMC for a broad class of smooth, strongly logconcave densities, with the number of iterations (parallel depth) and gradient evaluations being $\mathit{polylogarithmic}$ in the dimension (rather than polynomial as in previous work).
2 code implementations • 2 Dec 2018 • Zhao Song, Ronald E. Parr, Lawrence Carin
The impact of softmax on the value function itself in reinforcement learning (RL) is often viewed as problematic because it leads to sub-optimal value (or Q) functions and interferes with the contraction properties of the Bellman operator.
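To make the operator concrete, here is a tabular sketch of one softmax Bellman backup (the toy MDP interface below is our own choosing, not the paper's):

```python
import numpy as np

def softmax_bellman_backup(Q, P, R, gamma=0.9, tau=1.0):
    """One softmax Bellman backup: the next-state value is a softmax-weighted
    average over actions instead of the max used by the standard operator.
    P[s, a, s'] are transition probabilities, R[s, a] rewards."""
    weights = np.exp(tau * (Q - Q.max(axis=1, keepdims=True)))
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over actions
    V = (weights * Q).sum(axis=1)                        # soft value per state
    return R + gamma * P @ V                             # shape (S, A)

S, A = 4, 3
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))               # (S, A, S)
R = rng.random((S, A))
Q = np.zeros((S, A))
for _ in range(100):
    Q = softmax_bellman_backup(Q, P, R)
print(Q)
```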
no code implementations • 9 Nov 2018 • Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song
In terms of network architectures, our theory at least applies to fully-connected neural networks, convolutional neural networks (CNN), and residual neural networks (ResNet).
1 code implementation • NeurIPS 2019 • Zhao Song, David P. Woodruff, Peilin Zhong
Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function, which we show have very different structural properties than $\ell_p$-norms, e. g., one can show the lack of scale-invariance causes any column subset selection algorithm to provably require a $\sqrt{\log n}$ factor larger number of columns than $\ell_p$-norms; nevertheless we design the first efficient column subset selection algorithms for such error measures.
no code implementations • NeurIPS 2019 • Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song
In this paper, we focus on recurrent neural networks (RNNs) which are multi-layer networks widely used in natural language processing.
no code implementations • 26 May 2018 • Kai Zhong, Zhao Song, Prateek Jain, Inderjit S. Dhillon
A standard approach to modeling this problem is Inductive Matrix Completion where the predicted rating is modeled as an inner product of the user and the item features projected onto a latent space.
6 code implementations • ICML 2018 • Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S. Dhillon, Luca Daniel
Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17].
2 code implementations • ICML 2018 • Jiong Zhang, Yibo Lin, Zhao Song, Inderjit S. Dhillon
In this paper we propose a simple recurrent architecture, the Fourier Recurrent Unit (FRU), that stabilizes the gradients that arise in its training while giving us stronger expressive power.
no code implementations • 1 Feb 2018 • Wei Hu, Zhao Song, Lin F. Yang, Peilin Zhong
We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space $\{1, 2, \ldots, \Delta\}^d$ can be dynamically inserted to or deleted from the dataset.
no code implementations • 27 Dec 2017 • Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff
That is, TensorSketch only provides input sparsity time for Kronecker product regression with respect to the $2$-norm.
no code implementations • 25 Dec 2017 • David Liau, Eric Price, Zhao Song, Ger Yang
We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms.
no code implementations • NeurIPS 2017 • Zhao Song, Yusuke Muraoka, Ryohei Fujimaki, Lawrence Carin
We propose a scalable algorithm for model selection in sigmoid belief networks (SBNs), based on the factorized asymptotic Bayesian (FAB) framework.
no code implementations • 8 Nov 2017 • Kai Zhong, Zhao Song, Inderjit S. Dhillon
In this paper, we consider parameter recovery for non-overlapping convolutional neural networks (CNNs) with multiple kernels.
no code implementations • ICML 2017 • Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon
For activation functions that are also smooth, we show $\mathit{local~linear~convergence}$ guarantees of gradient descent under a resampling rule.
no code implementations • 30 May 2017 • Eric Price, Zhao Song, David P. Woodruff
Our main result is that, when $S$ is the subsampled randomized Fourier/Hadamard transform, the error $x' - x^*$ behaves as if it lies in a "random" direction within this bound: for any fixed direction $a\in \mathbb{R}^d$, we have with $1 - d^{-c}$ probability that \[ \langle a, x'-x^*\rangle \lesssim \frac{\|a\|_2\|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}}, \quad (1) \] where $c, \gamma > 0$ are arbitrary constants.
no code implementations • 26 Apr 2017 • Zhao Song, David P. Woodruff, Peilin Zhong
Despite the success on obtaining relative error low rank approximations for matrices, no such results were known for tensors.
1 code implementation • NeurIPS 2016 • Zhao Song, David Woodruff, Huan Zhang
We show in a number of cases one can achieve the same theoretical guarantees in sublinear time, i.e., even without reading most of the input tensor.
no code implementations • NeurIPS 2016 • Zhao Song, Ronald E. Parr, Xuejun Liao, Lawrence Carin
We then develop a supervised linear feature encoding method that is motivated by insights from linear value function approximation theory, as well as empirical successes from deep RL.
no code implementations • 3 Nov 2016 • Zhao Song, David P. Woodruff, Peilin Zhong
We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d) \mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$.
no code implementations • 5 Sep 2012 • Zhao Song, Aleksandar Dogandzic
Our signal reconstruction scheme is based on an EM iteration that aims at maximizing the posterior distribution of the signal and its state variables given the noise variance.