Search Results for author: Yichuan Deng

Found 14 papers, 0 papers with code

Attention is Naturally Sparse with Gaussian Distributed Input

no code implementations • 3 Apr 2024 • Yichuan Deng, Zhao Song, Chiwun Yang

The computational intensity of Large Language Models (LLMs) is a critical bottleneck, primarily due to the $O(n^2)$ complexity of the attention mechanism in transformer architectures.

Computational Efficiency

Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

no code implementations • 2 Feb 2024 • Yichuan Deng, Zhao Song, Chiwun Yang

Building on SGD, previous works have proposed many algorithms that improve convergence speed and generalization in stochastic optimization, such as SGDm, AdaGrad, and Adam.

Stochastic Optimization

Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

no code implementations • 19 Oct 2023 • Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang

In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks.

Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention

no code implementations • 18 Oct 2023 • Yichuan Deng, Zhao Song, Tianyi Zhou

Large transformer models have achieved state-of-the-art results in numerous natural language processing tasks.

Clustered Linear Contextual Bandits with Knapsacks

no code implementations • 21 Aug 2023 • Yichuan Deng, Michalis Mamakos, Zhao Song

Thus, maximizing the total reward requires learning not only models of the reward and the resource consumption, but also the cluster memberships.

Econometrics, Multi-Armed Bandits

Convergence of Two-Layer Regression with Nonlinear Units

no code implementations • 16 Aug 2023 • Yichuan Deng, Zhao Song, Shenghao Xie

The softmax unit and the ReLU unit are key structures in attention computation.

Regression

Zero-th Order Algorithm for Softmax Attention Optimization

no code implementations • 17 Jul 2023 • Yichuan Deng, Zhihang Li, Sridhar Mahadevan, Zhao Song

We demonstrate the convergence of our algorithm, highlighting its effectiveness in efficiently computing gradients for large-scale LLMs.

Faster Robust Tensor Power Method for Arbitrary Order

no code implementations • 1 Jun 2023 • Yichuan Deng, Zhao Song, Junze Yin

Tensor decomposition is a fundamental method used in various areas to deal with high-dimensional data.

Tensor Decomposition

Solving Tensor Low Cycle Rank Approximation

no code implementations • 13 Apr 2023 • Yichuan Deng, Yeqi Gao, Zhao Song

The tensor classical rank, Tucker rank, and train rank have been well studied in [Song, Woodruff, Zhong SODA 2019].

Speech Recognition

Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension

no code implementations • 10 Apr 2023 • Yichuan Deng, Sridhar Mahadevan, Zhao Song

It runs in $\widetilde{O}(\mathrm{nnz}(X) + n^{\omega})$ time, succeeds with probability $1-\delta$, and chooses $m = O(n \log(n/\delta))$.

Sentence

Streaming Kernel PCA Algorithm With Small Space

no code implementations • 8 Mar 2023 • Yichuan Deng, Zhao Song, Zifan Wang, Han Zhang

The kernel method, which is commonly used in learning algorithms such as Support Vector Machines (SVMs), has also been applied in PCA algorithms.

Training Overparametrized Neural Networks in Sublinear Time

no code implementations • 9 Aug 2022 • Yichuan Deng, Hang Hu, Zhao Song, Omri Weinstein, Danyang Zhuo

The success of deep learning comes at a tremendous computational and energy cost, and the scalability of training massively overparametrized neural networks is becoming a real barrier to the progress of artificial intelligence (AI).

SODA: Site Object Detection dAtaset for Deep Learning in Construction

no code implementations • 19 Feb 2022 • Rui Duan, Hui Deng, Mao Tian, Yichuan Deng, Jiarui Lin

In this manner, this research contributes a large-scale image dataset for the development of deep learning-based object detection methods in the construction industry and sets up a performance benchmark for further evaluation of corresponding algorithms in this area.

Object, Object Detection
