1 code implementation • 26 Oct 2024 • Xiongtao Xiao, Xiaofeng Chen, Feiyan Jiang, Songming Zhang, Wenming Cao, Cheng Tan, Zhangyang Gao, Zhongshan Li
Such an assumption typically results in graph structures that prioritize local spatial information while overlooking global patterns, limiting the ability to fully capture the broader structural features of biological tissues.
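To make the criticized assumption concrete, here is a minimal sketch (our illustration, not the paper's code) of a purely spatial k-NN graph: every edge is determined by physical proximity alone, so tissue-wide patterns never enter the adjacency.

```python
# A sketch (assumed, not from the paper) of the locality assumption:
# edges connect only spatially nearby spots, so the resulting graph
# encodes local structure and nothing global.
import numpy as np
from scipy.spatial import cKDTree

def build_spatial_knn_graph(coords: np.ndarray, k: int = 6) -> np.ndarray:
    """coords: (n_spots, 2) spatial coordinates; returns a binary adjacency."""
    tree = cKDTree(coords)
    # Query k+1 neighbors because each spot is its own nearest neighbor.
    _, idx = tree.query(coords, k=k + 1)
    n = coords.shape[0]
    adj = np.zeros((n, n), dtype=np.float32)
    for i, neighbors in enumerate(idx):
        for j in neighbors[1:]:  # skip self
            adj[i, j] = 1.0
    # Symmetrize so the graph is undirected.
    return np.maximum(adj, adj.T)
```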
1 code implementation • 25 Jun 2024 • Songming Zhang, Xue Zhang, Zengkui Sun, Yufeng Chen, Jinan Xu
Furthermore, this discrepancy hinders the KD process between models with different vocabularies, a common situation among current LLMs.
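The vocabulary mismatch is easy to demonstrate. The toy tokenizers below are hypothetical stand-ins, not any model's real tokenizers; they show why position-wise distillation losses are ill-defined when teacher and student segment text differently.

```python
# Illustration (assumed, not from the paper) of why token-level KD breaks
# across vocabularies: the same sentence yields sequences of different
# lengths over different vocabularies, so per-position teacher/student
# distributions cannot be compared directly.
sentence = "knowledge distillation"

def tokenize_word_level(text):      # stand-in for the teacher's tokenizer
    return text.split()

def tokenize_char_level(text):      # stand-in for the student's tokenizer
    return [c for c in text if c != " "]

teacher_tokens = tokenize_word_level(sentence)  # 2 tokens
student_tokens = tokenize_char_level(sentence)  # 21 tokens
assert len(teacher_tokens) != len(student_tokens)
# A position-wise KL(student || teacher) is undefined here: the sequences
# have different lengths and the vocabularies index different symbol sets.
```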
1 code implementation • 24 Jun 2024 • Xue Zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Yufeng Chen, Jinan Xu, Jie Zhou
To address this issue, we first investigate how LLMs process multilingual factual knowledge and discover that the same factual knowledge in different languages generally activates a shared set of neurons, which we call language-agnostic factual neurons (LAFNs).
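As a rough illustration of the LAFN idea (our sketch, not the authors' procedure), one could rank neurons by activation magnitude for the same fact expressed in two languages and intersect the top-k sets; all names below are hypothetical.

```python
# Hedged sketch of locating language-agnostic factual neurons: feed the
# same fact in two languages, rank neurons by activation magnitude, and
# intersect the top-k sets. Random tensors stand in for real activations.
import torch

def top_k_neurons(activations: torch.Tensor, k: int = 100) -> set:
    """activations: (hidden_dim,) mean activation for one prompt."""
    return set(torch.topk(activations.abs(), k).indices.tolist())

def shared_factual_neurons(act_lang_a, act_lang_b, k: int = 100) -> set:
    """Neurons strongly activated by the same fact in both languages."""
    return top_k_neurons(act_lang_a, k) & top_k_neurons(act_lang_b, k)

h = 4096  # assumed FFN hidden size
overlap = shared_factual_neurons(torch.randn(h), torch.randn(h))
```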
no code implementations • 25 Dec 2023 • Songming Zhang, Yuxiao Luo, Qizhou Wang, Haoang Chi, Xiaofeng Chen, Bo Han, Jinyan Li
Deep neural networks often struggle to generalize to out-of-distribution (OOD) data, and a notable theoretical gap remains between the contributing factors and their respective impacts.
1 code implementation • 25 Dec 2023 • Songming Zhang, Ziyu Lyu, Xiaofeng Chen
Knowledge distillation transfers knowledge from large models into small models, and has recently achieved remarkable results.
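For context, the standard temperature-scaled distillation objective of Hinton et al. looks as follows; this is the generic recipe, and the paper's specific variant will differ.

```python
# Minimal sketch of the classic KD loss: KL divergence between softened
# teacher and student distributions.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature: float = 2.0):
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t ** 2)
```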
1 code implementation • 20 Oct 2023 • Xue Zhang, Songming Zhang, Yunlong Liang, Yufeng Chen, Jian Liu, Wenjuan Han, Jinan Xu
Furthermore, for situations requiring multiple paraphrases for each source sentence, we design a Diverse Templates Search (DTS) algorithm, which can enhance the diversity between paraphrases without sacrificing quality.
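The excerpt does not spell out DTS, so the sketch below is explicitly not the authors' algorithm; it is a generic maximal-marginal-relevance-style selection, included only to make concrete the diversity-versus-quality trade-off such a search must balance.

```python
# Greedy diverse selection (MMR-style, not the paper's DTS): pick candidates
# that score well on quality but differ from those already chosen.
def select_diverse(candidates, quality, similarity, n: int, lam: float = 0.5):
    """candidates: list of paraphrases; quality(c) -> float;
    similarity(a, b) -> float in [0, 1]."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < n:
        def mmr(c):
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * quality(c) - (1 - lam) * redundancy
        best = max(pool, key=mmr)
        selected.append(best)
        pool.remove(best)
    return selected
```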
1 code implementation • 14 May 2023 • Songming Zhang, Yunlong Liang, Shuaibo Wang, Wenjuan Han, Jian Liu, Jinan Xu, Yufeng Chen
In this work, we first unravel this mystery from an empirical perspective and show that the knowledge comes from the teacher's top-1 predictions, which also helps us build a potential connection between word- and sequence-level KD.
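A hedged reading of that observation: if the useful signal in word-level KD is concentrated in the teacher's top-1 predictions, then cross-entropy against the teacher's argmax tokens should recover most of the benefit. The formulation below is our assumption, not the paper's code.

```python
# Sketch of a top-1 distillation loss: the student is trained on the
# teacher's argmax tokens rather than its full output distribution.
import torch
import torch.nn.functional as F

def top1_kd_loss(student_logits, teacher_logits):
    """student_logits, teacher_logits: (batch, seq_len, vocab)."""
    top1 = teacher_logits.argmax(dim=-1)              # (batch, seq_len)
    return F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        top1.view(-1),
    )
```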
no code implementations • 12 Oct 2022 • Hongxiao Zhang, Siyu Lai, Songming Zhang, Hui Huang, Yufeng Chen, Jinan Xu, Jian Liu
This paper introduces the system used in our submission to the WMT'22 Translation Suggestion shared task.
1 code implementation • ACL 2022 • Songming Zhang, Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jian Liu, Jie Zhou
Token-level adaptive training approaches can alleviate the token imbalance problem and thus improve neural machine translation by re-weighting the losses of different target tokens based on specific statistical metrics (e.g., token frequency or mutual information).
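The re-weighting mechanism itself is simple; below is a minimal sketch assuming inverse log-frequency as the metric, which is just one of the statistics the entry mentions.

```python
# Token-level adaptive training sketch: scale each target token's loss by
# a per-token statistic so that, e.g., rare tokens contribute more.
import torch
import torch.nn.functional as F

def adaptive_token_loss(logits, targets, token_freq):
    """logits: (batch, seq, vocab); targets: (batch, seq);
    token_freq: (vocab,) float tensor of corpus frequencies."""
    ce = F.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1), reduction="none"
    )
    # Assumed metric: inverse log-frequency (clamped to avoid division by zero).
    weights = 1.0 / torch.log1p(token_freq[targets.view(-1)]).clamp_min(1e-6)
    return (weights * ce).mean()
```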