no code implementations • 5 Apr 2024 • Shuo Xie, Zhiyuan Li
Adam with decoupled weight decay, also known as AdamW, is widely acclaimed for its superior performance in language modeling tasks, surpassing Adam with $\ell_2$ regularization in terms of generalization and optimization.
1 code implementation • 10 Aug 2023 • Siqiao Xue, Fan Zhou, Yi Xu, Ming Jin, Qingsong Wen, Hongyan Hao, Qingyang Dai, Caigao Jiang, Hongyu Zhao, Shuo Xie, Jianshan He, James Zhang, Hongyuan Mei
We present WeaverBird, an intelligent dialogue system designed specifically for the finance domain.
no code implementations • 18 Oct 2022 • Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei
We propose to select layers based on the variability of their hidden states given a task-specific corpus.