no code implementations • 10 Feb 2024 • Yeqi Gao, Zhao Song, Ruizhe Zhang
Given its widespread application in machine learning and optimization, the Kronecker product emerges as a pivotal linear algebra operator.
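As a minimal sketch (not from the paper), the snippet below shows the Kronecker product in NumPy together with the vec identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^\top)$, which is what makes the operator useful in optimization: it lets one apply $A \otimes B$ without ever materializing it. All shapes are illustrative.

```python
import numpy as np

# For A in R^{m x n} and B in R^{p x q}, A ⊗ B is the (mp) x (nq) block matrix
# whose (i, j) block is A[i, j] * B.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.eye(2)

K = np.kron(A, B)                            # shape (4, 4)

# Key identity: (A ⊗ B) vec(X) = vec(B X A^T), with vec stacking columns.
X = np.random.randn(2, 2)
lhs = K @ X.flatten(order='F')               # vec(X), column-major
rhs = (B @ X @ A.T).flatten(order='F')
assert np.allclose(lhs, rhs)
```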
no code implementations • 14 Sep 2023 • Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin
$A_3$ is a matrix in $\mathbb{R}^{n \times d}$, and $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$.
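As a hedged illustration only: if, as in related tensor-attention formulations, $\mathsf{A} \in \mathbb{R}^{n^2 \times d^2}$ is stored as $n$ stacked row-blocks of shape $n \times d^2$, then the $j_0$-th block is a plain slice. The layout and the helper `block` below are assumptions for demonstration, not the paper's definitions.

```python
import numpy as np

n, d = 4, 3
A_big = np.random.randn(n * n, d * d)   # assumed layout: n stacked blocks, each n x d^2

def block(A_big, j0, n):
    """Return the j0-th n-row block (0-indexed) -- hypothetical helper."""
    return A_big[j0 * n : (j0 + 1) * n, :]

A_j0 = block(A_big, 1, n)               # shape (n, d^2)
assert A_j0.shape == (n, d * d)
```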
no code implementations • 21 Aug 2023 • Yeqi Gao, Zhao Song, Junze Yin
It is likely that only two types of people would be interested in setting up a practical system for it: $\bullet$ those who prefer to use decentralized ChatGPT-like software.
no code implementations • 16 Jul 2023 • Yeqi Gao, Zhao Song, Xin Yang, Ruizhe Zhang
It is well known that quantum machines have certain computational advantages over classical machines.
no code implementations • 5 Jul 2023 • Yeqi Gao, Zhao Song, Shenghao Xie
Given matrices $A_1 \in \mathbb{R}^{n \times d}$, $A_2 \in \mathbb{R}^{n \times d}$, and $B \in \mathbb{R}^{n \times n}$, the goal is to solve certain optimization problems: the normalized version $\min_{X} \| D(X)^{-1} \exp(A_1 X A_2^\top) - B \|_F^2$ and the rescaled version $\min_{X} \| \exp(A_1 X A_2^\top) - D(X) \cdot B \|_F^2$.
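A small sketch of evaluating both objectives, under the assumption (standard in this line of attention-regression work, though not stated in the snippet) that $D(X) := \mathrm{diag}(\exp(A_1 X A_2^\top) \mathbf{1}_n)$ collects the row sums; $\exp(\cdot)$ is entrywise.

```python
import numpy as np

def objectives(A1, A2, B, X):
    E = np.exp(A1 @ X @ A2.T)                  # n x n, entrywise exp
    d = E.sum(axis=1)                          # assumed diagonal of D(X): row sums
    normalized = np.linalg.norm(E / d[:, None] - B, 'fro') ** 2
    rescaled = np.linalg.norm(E - d[:, None] * B, 'fro') ** 2
    return normalized, rescaled

n, k = 5, 3
A1, A2 = np.random.randn(n, k), np.random.randn(n, k)
B, X = np.random.randn(n, n), np.random.randn(k, k)
print(objectives(A1, A2, B, X))
```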
no code implementations • 8 May 2023 • Yeqi Gao, Zhao Song, Xin Yang
Inspired by [Vyas, Kakade and Barak 2023], in this work we provide a provable result showing how to compute a differentially private approximation of the attention matrix.
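For intuition only, here is a textbook Gaussian-mechanism sketch of releasing an attention matrix privately; the noise calibration assumes a pre-bounded $\ell_2$ sensitivity `sens`, and this is not necessarily the algorithm analyzed in the paper.

```python
import numpy as np

def dp_attention(Q, K, eps, delta, sens):
    # Standard softmax attention matrix.
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)          # row-stochastic attention matrix
    # Gaussian mechanism: sigma calibrated to (eps, delta) given L2 sensitivity.
    sigma = sens * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return A + np.random.normal(0.0, sigma, size=A.shape)
```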
no code implementations • 1 May 2023 • Yeqi Gao, Zhao Song, Junze Yin
LLMs have shown great promise in improving the accuracy and efficiency of these tasks, and have the potential to revolutionize the field of natural language processing (NLP) in the years to come.
no code implementations • 13 Apr 2023 • Yichuan Deng, Yeqi Gao, Zhao Song
Tensor classical rank, Tucker rank, and tensor-train rank have been well studied in [Song, Woodruff, Zhong SODA 2019].
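A tiny illustration of the first of these notions: a tensor has CP (classical) rank at most $r$ when it is a sum of $r$ outer products; Tucker and tensor-train ranks replace the sum with core tensors and are not sketched here.

```python
import numpy as np

n, r = 4, 2
U, V, W = (np.random.randn(n, r) for _ in range(3))
# Sum of r rank-1 outer products: T[i,j,k] = sum_r U[i,r] * V[j,r] * W[k,r].
T = np.einsum('ir,jr,kr->ijk', U, V, W)   # CP rank <= r tensor in R^{n x n x n}
```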
no code implementations • 29 Mar 2023 • Yeqi Gao, Sridhar Mahadevan, Zhao Song
Mathematically, we define the neural function $F: \mathbb{R}^{d \times m} \times \mathbb{R}^d \rightarrow \mathbb{R}$ using an exponential activation function.
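A minimal sketch of one such function, assuming the common two-layer parameterization $F(W, x) = \sum_{r=1}^{m} a_r \exp(\langle w_r, x \rangle)$; the output weights $a_r$ and this exact form are our assumption, since the snippet only fixes the signature and the activation.

```python
import numpy as np

def F(W, x, a):
    # W: d x m hidden weights, x: input in R^d, a: m output weights (assumed).
    return float(a @ np.exp(W.T @ x))

d, m = 3, 8
W, x, a = np.random.randn(d, m), np.random.randn(d), np.random.randn(m)
print(F(W, x, a))
```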
no code implementations • 10 Aug 2022 • Yeqi Gao, Lianke Qin, Zhao Song, Yitan Wang
For a neural network of width $m$ and $n$ training inputs in $d$ dimensions, the forward and backward computation takes $\Omega(mnd)$ time per training iteration.
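A one-hidden-layer sketch (illustrative, not the paper's code) of where the $\Omega(mnd)$ cost comes from: the forward pass over all $n$ inputs is a dense $(n \times d)(d \times m)$ matrix product, and the backward pass mirrors it.

```python
import numpy as np

n, d, m = 1000, 64, 256
X = np.random.randn(n, d)        # n training points in d dimensions
W = np.random.randn(d, m)        # width-m hidden layer
H = np.exp(X @ W)                # the matmul alone costs Theta(m*n*d) multiplies
```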
1 code implementation • 7 Apr 2021 • Ao Zhou, Jianlei Yang, Yeqi Gao, Tong Qiao, Yingjie Qi, Xiaoyi Wang, Yunli Chen, Pengcheng Dai, Weisheng Zhao, Chunming Hu
Graph neural networks (GNNs) have achieved state-of-the-art performance on various industrial tasks.