1 code implementation • 28 Feb 2025 • Yihong Dong, Ge Li, Xue Jiang, Yongding Tao, Kechi Zhang, Hao Zhu, Huanyu Liu, Jiazheng Ding, Jia Li, Jinliang Deng, Hong Mei
Our pretrained FANformer-1B exhibits marked improvements on downstream tasks compared to open-source LLMs with comparable parameter counts or training-token budgets.
no code implementations • 19 Dec 2024 • Lecheng Wang, Xianjie Shi, Ge Li, Jia Li, Xuanming Zhang, Yihong Dong, Wenpin Jiao, Hong Mei
Auto-regressive language models (LMs) have been widely used to generate data in data-scarce domains to train new LMs, compensating for the scarcity of real-world data.
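The pipeline this abstract describes (sample synthetic text from a trained LM, then use it as training data for a new LM) can be illustrated with a minimal sketch. This is not the paper's setup: the model name `gpt2`, the prompt, and the sampling parameters are placeholder assumptions.

```python
# Minimal sketch of LM-based synthetic data generation for training a new LM.
# The model ("gpt2"), prompt, and sampling settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token  # GPT-2 has no pad token by default
gen = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = ["In a data-scarce domain,"]
synthetic_texts = []
for p in prompts:
    inputs = tok(p, return_tensors="pt")
    out = gen.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
    synthetic_texts.append(tok.decode(out[0], skip_special_tokens=True))

# synthetic_texts would then be tokenized into a training corpus for a
# new LM, compensating for the scarcity of real-world data.
```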
no code implementations • 15 Apr 2024 • Yang Lin, Xinyu Ma, Xu Chu, Yujie Jin, Zhibang Yang, Yasha Wang, Hong Mei
We then analyze the theoretical mechanism of LoRA Dropout from the perspective of sparsity regularization, providing a generalization error bound under this framework.
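As a rough illustration of the idea (dropout applied to LoRA's low-rank update, acting as a sparsity regularizer), here is a minimal PyTorch sketch. The exact masking scheme in the paper may differ; the layer shapes, rank, and dropout rate below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRADropoutLinear(nn.Module):
    """Minimal sketch of a LoRA layer with dropout on the low-rank update.

    The dropout placement follows the general idea of sparsifying the
    low-rank adaptation; the paper's exact masking scheme may differ.
    """
    def __init__(self, in_features, out_features, rank=8, p=0.1):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.dropout = nn.Dropout(p)  # randomly zeros parts of the update

    def forward(self, x):
        # Dropout on the intermediate low-rank activation sparsifies the
        # effective update B @ A during training.
        update = self.dropout(x @ self.lora_A.T) @ self.lora_B.T
        return self.base(x) + update
```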
no code implementations • 21 Mar 2024 • Yang Yao, Xin Wang, Zeyang Zhang, Yijian Qin, Ziwei Zhang, Xu Chu, Yuekui Yang, Wenwu Zhu, Hong Mei
In this paper, we propose LLM4GraphGen to explore the ability of LLMs for graph generation with systematic task designs and extensive experiments.
1 code implementation • 6 Sep 2023 • Yuqi Zhu, Ge Li, YunFei Zhao, Jia Li, Zhi Jin, Hong Mei
With an analysis of loss distributions of code tokens, we find that code tokens can be divided into two categories: challenging tokens that are difficult to predict and confident tokens that can be easily inferred.
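The split described here can be derived mechanically from per-token cross-entropy losses. A minimal sketch follows; the loss threshold is an illustrative assumption, not the paper's actual criterion.

```python
import torch
import torch.nn.functional as F

def split_tokens_by_loss(logits, labels, threshold=2.0):
    """Partition code tokens into 'challenging' and 'confident' sets by
    per-token cross-entropy loss. The threshold is illustrative; the
    paper's exact criterion may differ.

    logits: (seq_len, vocab_size), labels: (seq_len,)
    """
    per_token_loss = F.cross_entropy(logits, labels, reduction="none")
    challenging = per_token_loss > threshold   # hard-to-predict tokens
    confident = ~challenging                   # easily inferred tokens
    return challenging, confident, per_token_loss
```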
no code implementations • 31 Dec 2021 • Xianglin Yang, Yun Lin, Ruofan Liu, Zhenfeng He, Chao Wang, Jin Song Dong, Hong Mei
Moreover, our case study shows that our visual solution faithfully reflects the characteristics of various training scenarios, demonstrating the potential of DVI as a debugging tool for analyzing deep learning training processes.
no code implementations • 5 Feb 2021 • Wenjie Chu, Wei Zhang, Haiyan Zhao, Zhi Jin, Hong Mei
Self-assembly plays an essential role in many natural processes, involving the formation and evolution of living and non-living structures, and has potential applications in many emerging domains.
Multiagent Systems • Distributed, Parallel, and Cluster Computing • Robotics
no code implementations • 7 Dec 2020 • Yan Li, Bo An, Junming Ma, Donggang Cao, Yasha Wang, Hong Mei
Hyper-parameter tuning (HPT) is crucial for many machine learning (ML) algorithms.
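For context, the simplest baseline HPT loop is a random search over a configuration space. Below is a minimal sketch (illustrative only, not the system proposed in this paper); the search space and trial budget are placeholder assumptions.

```python
import random

def random_search(train_eval, space, n_trials=20, seed=0):
    """Minimal random-search HPT loop (illustrative baseline only).

    train_eval: callable mapping a hyper-parameter dict to a validation score.
    space: dict mapping each hyper-parameter name to a list of candidates.
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        score = train_eval(cfg)  # train the ML model, return validation metric
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```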