1 code implementation • 5 Feb 2024 • Yehui Tang, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Liu, Shangling Jui, Kai Han, Yunhe Wang
Several design formulas are empirically proved especially effective for tiny language models, including tokenizer compression, architecture tweaking, parameter inheritance and multiple-round training.
1 code implementation • 6 Jun 2022 • Yunsheng Ni, Depu Meng, Changqian Yu, Chengbin Quan, Dongchun Ren, Youjian Zhao
Specifically, we first capture the different representations with different augmentations, then regularize the cosine distance of the representations to enhance the consistency.