no code implementations • 8 Dec 2024 • Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang
The application of transformer-based models to time series forecasting (TSF) tasks has long been a popular subject of study.
no code implementations • 3 Nov 2024 • Majid Daliri, Zhao Song, Chiwun Yang
Research by Wang et al. (2023) and Ma et al. (2024) indicates that the performance of these 1-bit LLMs progressively improves as the number of parameters increases, hinting at the potential existence of a Scaling Law for 1-bit Neural Networks.
1 code implementation • 20 Jun 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang
Prompting and context-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks.
no code implementations • 3 Apr 2024 • Yichuan Deng, Zhao Song, Chiwun Yang
The computational intensity of Large Language Models (LLMs) is a critical bottleneck, primarily due to the $O(n^2)$ complexity of the attention mechanism in transformer architectures.
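For reference, a minimal sketch of standard softmax attention, whose explicit n x n score matrix is the source of the quadratic cost; the function name, shapes, and toy data below are illustrative, not taken from the paper.

import numpy as np

def softmax_attention(Q, K, V):
    """Standard single-head attention; the (n, n) score matrix is the O(n^2) bottleneck."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                 # (n, n) pairwise scores: quadratic in sequence length
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability before exponentiation
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V                            # (n, d) attention output

# toy usage
n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)           # (8, 4)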
no code implementations • 2 Feb 2024 • Yichuan Deng, Zhao Song, Chiwun Yang
Building on SGD, previous works have proposed many algorithms that improve convergence speed and generalization in stochastic optimization, such as SGD with momentum (SGDm), AdaGrad, and Adam.
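For reference, minimal sketches of two of these update rules; the function names and default constants below are illustrative choices, not tied to the paper.

import numpy as np

def sgd_momentum_step(w, grad, v, lr=0.01, beta=0.9):
    """SGD with momentum (SGDm): a velocity term accumulates past gradients."""
    v = beta * v + grad
    return w - lr * v, v

def adam_step(w, grad, m, s, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: per-parameter step sizes from bias-corrected first/second gradient moments."""
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad ** 2
    m_hat, s_hat = m / (1 - b1 ** t), s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s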
no code implementations • 24 Nov 2023 • Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang
Considering a single-layer self-attention with Query, Key, and Value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the polynomial method approximates the attention output $T \in \mathbb{R}^{n \times d}$.
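The exact output is T = D^{-1} exp(QK^T / sqrt(d)) V with D the diagonal of row sums. The paper's polynomial method is not spelled out in this snippet; the sketch below uses an illustrative degree-1 Taylor surrogate exp(x) ~ 1 + x as a stand-in, only to show how a low-degree polynomial kernel lets T be assembled without materializing the n x n matrix.

import numpy as np

def exact_attention(Q, K, V):
    """T = D^{-1} exp(Q K^T / sqrt(d)) V, formed explicitly (O(n^2) time and memory)."""
    d = Q.shape[1]
    A = np.exp(Q @ K.T / np.sqrt(d))
    return (A @ V) / A.sum(axis=1, keepdims=True)

def poly_attention_deg1(Q, K, V):
    """Illustrative degree-1 surrogate exp(x) ~ 1 + x (not the paper's construction).
    The polynomial structure lets numerator and row sums be built from n x d and d x d
    products only, avoiding the explicit n x n attention matrix."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    # A ~ 1 + scale * Q K^T  =>  A @ V = sum of V rows + scale * Q (K^T V)
    numer = V.sum(axis=0, keepdims=True) + scale * (Q @ (K.T @ V))   # O(n d^2)
    denom = n + scale * (Q @ K.sum(axis=0))                           # approximate row sums of A
    return numer / denom[:, None]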
no code implementations • 22 Nov 2023 • Chenyang Li, Zhao Song, Weixin Wang, Chiwun Yang
The Deep Leakage from Gradient (DLG) attack has emerged as a prevalent and highly effective method for extracting sensitive training data by inspecting exchanged gradients.
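A minimal sketch of the DLG idea in PyTorch: the attacker optimizes dummy data (and soft labels) so that the gradient it induces on the shared model matches the observed gradient. The toy linear model, dimensions, and optimizer settings below are illustrative assumptions.

import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)
criterion = torch.nn.CrossEntropyLoss()

# "Victim" batch and the gradient the server would observe.
x_true = torch.randn(1, 4)
y_true = torch.tensor([2])
true_grads = torch.autograd.grad(criterion(model(x_true), y_true), model.parameters())

# DLG: match the observed gradient by optimizing dummy inputs and label logits.
x_dummy = torch.randn(1, 4, requires_grad=True)
y_dummy = torch.randn(1, 3, requires_grad=True)
opt = torch.optim.LBFGS([x_dummy, y_dummy])

for _ in range(50):
    def closure():
        opt.zero_grad()
        loss = criterion(model(x_dummy), y_dummy.softmax(dim=-1))
        dummy_grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        grad_diff = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
        grad_diff.backward()
        return grad_diff
    opt.step(closure)

print(x_dummy.detach())  # approaches x_true as the gradient match improves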
no code implementations • 19 Oct 2023 • Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang
In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks.
no code implementations • 17 Oct 2023 • Zhao Song, Chiwun Yang
The delta-bar-delta algorithm is a well-known learning rate adaptation technique that accelerates convergence during training. It dynamically schedules the learning rate based on the difference between the current and previous weight updates.
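A minimal sketch of one delta-bar-delta step: each parameter's rate is increased when the current gradient agrees in sign with a trace of previous updates and decreased when it disagrees. The constants kappa, phi, and theta below are illustrative, not taken from the paper.

import numpy as np

def delta_bar_delta_step(w, grad, lr, bar_delta, kappa=0.01, phi=0.1, theta=0.7):
    """One delta-bar-delta update with per-parameter learning rates.
    bar_delta is an exponential trace of past gradients."""
    agree = grad * bar_delta                           # sign agreement of current vs. past updates
    lr = np.where(agree > 0, lr + kappa,               # consistent direction: increase additively
         np.where(agree < 0, lr * (1 - phi), lr))      # direction flipped: decrease multiplicatively
    w = w - lr * grad                                  # gradient step with the adapted rates
    bar_delta = (1 - theta) * grad + theta * bar_delta
    return w, lr, bar_delta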
no code implementations • 5 Oct 2023 • Timothy Chu, Zhao Song, Chiwun Yang
To address this issue, we introduce a reweighted algorithm called RICL (Reweighted In-context Learning).
1 code implementation • 23 Aug 2023 • Timothy Chu, Zhao Song, Chiwun Yang
Large language models (LLMs) and generative AI have played a transformative role in computer research and applications.