Search Results for author: Chiwun Yang

Found 11 papers, 2 papers with code

Unlocking the Theory Behind Scaling 1-Bit Neural Networks

no code implementations • 3 Nov 2024 • Majid Daliri, Zhao Song, Chiwun Yang

Research by Wang et al. (2023) and Ma et al. (2024) indicates that the performance of these 1-bit LLMs progressively improves as the number of parameters increases, hinting at the potential existence of a Scaling Law for 1-bit Neural Networks.
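
As a quick illustration of what a 1-bit layer means in practice, the sketch below quantizes a real-valued weight matrix to {-1, +1} with a single scale factor, in the spirit of BitNet-style 1-bit LLMs; the function names and the per-matrix scale are illustrative assumptions, not the construction analyzed in the paper.

```python
# Hedged sketch of 1-bit weight quantization (illustrative, not the paper's construction).
import numpy as np

def quantize_weights_1bit(W):
    """Map a real-valued weight matrix to {-1, +1} plus one scalar scale."""
    scale = np.mean(np.abs(W))            # preserve the average weight magnitude
    W_1bit = np.where(W >= 0, 1.0, -1.0)  # every entry becomes +1 or -1
    return W_1bit, scale

def linear_1bit(x, W_1bit, scale):
    """Forward pass of a 1-bit linear layer: x @ (scale * W_1bit)."""
    return scale * (x @ W_1bit)

W = np.random.randn(64, 32)
Wq, s = quantize_weights_1bit(W)
y = linear_1bit(np.random.randn(4, 64), Wq, s)
print(y.shape)  # (4, 32)
```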

Towards Infinite-Long Prefix in Transformer

1 code implementation • 20 Jun 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang

Prompting and context-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks.
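
For readers unfamiliar with the setup, the hedged sketch below shows one common instance of prefix learning: prefix-tuning style attention, in which trainable prefix key/value vectors are prepended to a frozen layer's keys and values. The tensor names and shapes are assumptions chosen for illustration, not the paper's notation.

```python
# Hedged sketch of prefix-tuning style attention (assumed form, not the paper's method):
# learnable prefix key/value vectors are prepended to the frozen model's K and V.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_with_prefix(Q, K, V, P_k, P_v):
    """Q, K, V: (n, d) activations of a frozen layer; P_k, P_v: (m, d) trainable prefixes."""
    K_ext = np.concatenate([P_k, K], axis=0)   # (m + n, d)
    V_ext = np.concatenate([P_v, V], axis=0)   # (m + n, d)
    scores = Q @ K_ext.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V_ext             # (n, d)

n, d, m = 8, 16, 4
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = attention_with_prefix(Q, K, V, np.random.randn(m, d), np.random.randn(m, d))
print(out.shape)  # (8, 16)
```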

Math • parameter-efficient fine-tuning

Attention is Naturally Sparse with Gaussian Distributed Input

no code implementations • 3 Apr 2024 • Yichuan Deng, Zhao Song, Chiwun Yang

The computational intensity of Large Language Models (LLMs) is a critical bottleneck, primarily due to the $O(n^2)$ complexity of the attention mechanism in transformer architectures.
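
The quadratic term comes from materializing the full $n \times n$ score matrix. The short sketch below is a standard softmax attention implementation (assumed textbook form, not code from the paper) that makes the $O(n^2)$ bottleneck explicit.

```python
# Standard softmax attention, showing where the O(n^2) cost arises (textbook sketch).
import numpy as np

def attention(Q, K, V):
    """Q, K, V: (n, d). The score matrix Q @ K.T has shape (n, n), hence O(n^2 d) time
    and O(n^2) memory before the (n, d) output is produced."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])        # (n, n) -- the quadratic bottleneck
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ V                               # (n, d)

n, d = 1024, 64
out = attention(np.random.randn(n, d), np.random.randn(n, d), np.random.randn(n, d))
print(out.shape)  # (1024, 64)
```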

Computational Efficiency

Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

no code implementations • 2 Feb 2024 • Yichuan Deng, Zhao Song, Chiwun Yang

Building on SGD, prior work has proposed many algorithms that improve convergence speed and generalization in stochastic optimization, such as SGDm, AdaGrad, and Adam.
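
To make the named update rules concrete, here is a hedged sketch comparing plain SGD with the momentum variant (SGDm) on a toy quadratic; it follows the standard textbook updates and is not the unified framework proposed in the paper.

```python
# Hedged sketch: vanilla SGD vs. SGD with momentum (SGDm), textbook updates only.
import numpy as np

def sgd_step(w, grad, lr=0.1):
    return w - lr * grad

def sgdm_step(w, v, grad, lr=0.1, beta=0.9):
    v = beta * v + grad          # momentum buffer accumulates past gradients
    return w - lr * v, v

# Toy objective f(w) = 0.5 * ||w||^2 with noisy gradients (gradient of f is w).
rng = np.random.default_rng(0)
w_sgd = np.ones(5)
w_m = np.ones(5)
v = np.zeros(5)
for _ in range(100):
    w_sgd = sgd_step(w_sgd, w_sgd + 0.01 * rng.standard_normal(5))
    w_m, v = sgdm_step(w_m, v, w_m + 0.01 * rng.standard_normal(5))
print(np.linalg.norm(w_sgd), np.linalg.norm(w_m))  # both shrink toward zero
```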

Stochastic Optimization

One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

no code implementations • 24 Nov 2023 • Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang

Considering a single-layer self-attention with Query, Key, and Value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the polynomial method approximates the attention output $T \in \mathbb{R}^{n \times d}$.
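
For reference, the exact object being approximated is usually defined as (notation assumed from the standard formulation in this line of work, not quoted from the paper, and up to the usual $1/\sqrt{d}$ scaling of the scores)

$$A = \exp(QK^\top) \in \mathbb{R}^{n \times n}, \qquad D = \mathrm{diag}(A \mathbf{1}_n), \qquad T = D^{-1} A V \in \mathbb{R}^{n \times d},$$

where $\exp$ is applied entrywise; forming $A$ exactly takes $\Theta(n^2 d)$ time and $\Theta(n^2)$ space, which is the cost a one-pass, sublinear-space streaming approximation is designed to avoid.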

Attribute

A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer

no code implementations • 22 Nov 2023 • Chenyang Li, Zhao Song, Weixin Wang, Chiwun Yang

The Deep Leakage from Gradient (DLG) attack has emerged as a prevalent and highly effective method for extracting sensitive training data by inspecting exchanged gradients.
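
At a high level, DLG reconstructs data by optimizing a dummy input so that the gradient it induces matches the gradient observed on the wire. The hedged sketch below shows this gradient-matching loop on a tiny linear model; the model, shapes, and the assumption that the label is known are purely illustrative and not the paper's transformer setting.

```python
# Hedged sketch of DLG-style gradient matching on a tiny linear model
# (illustrative only; not the paper's transformer setting).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))        # shared model weights, known to the attacker
x_true = rng.standard_normal(5)        # private training input
y_true = rng.standard_normal(3)        # its target (assumed known for simplicity)

def grad_wrt_W(x):
    # Gradient of 0.5 * ||W x - y_true||^2 with respect to W.
    return np.outer(W @ x - y_true, x)

g_observed = grad_wrt_W(x_true)        # the gradient a participant would share

def matching_loss(x_dummy):
    # DLG objective: make the dummy input's gradient match the observed gradient.
    return np.sum((grad_wrt_W(x_dummy) - g_observed) ** 2)

result = minimize(matching_loss, x0=rng.standard_normal(5))
print("reconstruction error:", np.linalg.norm(result.x - x_true))  # typically near zero
```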

Privacy Preserving

Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

no code implementations • 19 Oct 2023 • Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang

In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks.

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

no code implementations • 17 Oct 2023 • Zhao Song, Chiwun Yang

The delta-bar-delta algorithm is a learning rate adaptation technique that speeds up convergence by dynamically scheduling the learning rate based on the difference between the current and previous weight updates.
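
For context, the classical delta-bar-delta rule (a hedged textbook sketch in the spirit of Jacobs' rule, not the paper's automatic schedule) keeps one learning rate per parameter, increasing it additively when the current gradient agrees in sign with an exponential average of past gradients and shrinking it multiplicatively when it does not.

```python
# Hedged sketch of the classical delta-bar-delta rule (textbook form, not the
# paper's automatic schedule): per-parameter learning rates adapted by sign agreement.
import numpy as np

def delta_bar_delta_step(w, lr, bar_delta, grad, kappa=1e-3, phi=0.5, theta=0.7):
    """w, lr, bar_delta, grad: arrays of the same shape.
    kappa: additive increase, phi: multiplicative decrease, theta: averaging factor."""
    agree = grad * bar_delta > 0                      # sign agreement with past gradients
    lr = np.where(agree, lr + kappa, lr * (1 - phi))  # grow additively / shrink multiplicatively
    bar_delta = (1 - theta) * grad + theta * bar_delta
    return w - lr * grad, lr, bar_delta

# Toy run on f(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.ones(4)
lr = np.full(4, 0.05)
bar_delta = np.zeros(4)
for _ in range(200):
    w, lr, bar_delta = delta_bar_delta_step(w, lr, bar_delta, w)
print(np.linalg.norm(w))  # shrinks toward zero
```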

Scheduling

Fine-tune Language Models to Approximate Unbiased In-context Learning

no code implementations • 5 Oct 2023 • Timothy Chu, Zhao Song, Chiwun Yang

To address this issue, we introduce a reweighted algorithm called RICL (Reweighted In-context Learning).

In-Context Learning

How to Protect Copyright Data in Optimization of Large Language Models?

1 code implementation • 23 Aug 2023 • Timothy Chu, Zhao Song, Chiwun Yang

Large language models (LLMs) and generative AI have played a transformative role in computer research and applications.

Language Modelling • Large Language Model • +1
