1 code implementation • 22 Aug 2024 • Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, Qiaozhi He, Tong Xiao, Chunliang Zhang, Tongran Liu, Quan Du, Di Yang, Jingbo Zhu
However, these techniques are hampered by the scarcity of visual preference data, which is required to train a visual reward model (VRM).
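A common way to train such a reward model is a pairwise (Bradley-Terry style) loss over preference pairs. The sketch below is a minimal, generic illustration under that assumption; the module names, feature dimensions, and toy data are hypothetical and do not reflect the paper's implementation.

```python
# Minimal sketch: training a reward head on preference pairs with a
# Bradley-Terry style loss. All names and dimensions are illustrative
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class RewardHead(nn.Module):
    """Maps a pooled (visual-)text representation to a scalar reward."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.score(pooled).squeeze(-1)  # shape: (batch,)

reward_model = RewardHead()
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

# Toy batch: pooled features of the preferred ("chosen") and
# dispreferred ("rejected") responses for the same prompt/image.
chosen_feats = torch.randn(8, 768)
rejected_feats = torch.randn(8, 768)

r_chosen = reward_model(chosen_feats)
r_rejected = reward_model(rejected_feats)

# Bradley-Terry pairwise loss: push r_chosen above r_rejected.
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
```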
no code implementations • 8 Aug 2023 • Chenglong Wang, Hang Zhou, Kaiyan Chang, Tongran Liu, Chunliang Zhang, Quan Du, Tong Xiao, Jingbo Zhu
Large language models achieve state-of-the-art performance on sequence generation evaluation, but typically have a large number of parameters.
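One generic way to shrink such an evaluator is to distill its scalar quality scores into a small regressor. The sketch below assumes that setup; the `teacher_score` placeholder, feature dimensions, and MSE objective are assumptions for illustration, not necessarily the method proposed in the paper.

```python
# Generic sketch: distilling scalar evaluation scores from a large teacher
# evaluator into a small student regressor (an assumed setup, not
# necessarily the paper's method). `teacher_score` stands in for an
# expensive LLM-based evaluator.
import torch
import torch.nn as nn

def teacher_score(features: torch.Tensor) -> torch.Tensor:
    # Placeholder for an LLM evaluator returning quality scores.
    return features.mean(dim=-1)

student = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

batch = torch.randn(16, 768)          # sentence-level features of hypotheses
with torch.no_grad():
    targets = teacher_score(batch)    # teacher's quality scores

preds = student(batch).squeeze(-1)
loss = nn.functional.mse_loss(preds, targets)  # regress onto teacher scores
loss.backward()
optimizer.step()
```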
1 code implementation • ACL 2022 • Bei Li, Quan Du, Tao Zhou, Yi Jing, Shuhan Zhou, Xin Zeng, Tong Xiao, Jingbo Zhu, Xuebo Liu, Min Zhang
Inspired by this, we design a new architecture, the ODE Transformer, which is analogous to the Runge-Kutta method that is well motivated in ODE theory.
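As a rough illustration of the analogy, a standard residual update y + F(y) matches an explicit Euler step, while a second-order Runge-Kutta (Heun) step reuses two evaluations of the same layer function F. The sketch below assumes F is a simple pre-norm feed-forward branch standing in for a Transformer sub-layer; it is not the authors' exact implementation.

```python
# Minimal sketch of a second-order (Runge-Kutta style) residual block.
# A plain residual connection y + F(y) corresponds to an Euler step;
# the Heun (RK2) update below combines two evaluations of F.
import torch
import torch.nn as nn

class RK2Block(nn.Module):
    def __init__(self, d_model: int = 512):
        super().__init__()
        # F: the residual branch (here a small pre-norm feed-forward
        # network stands in for a Transformer sub-layer).
        self.F = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        f1 = self.F(y)               # slope at y (an Euler step would stop here)
        f2 = self.F(y + f1)          # slope at the Euler-predicted point
        return y + 0.5 * (f1 + f2)   # Heun's method: 2nd-order Runge-Kutta update

x = torch.randn(2, 10, 512)          # (batch, seq_len, d_model)
print(RK2Block()(x).shape)           # torch.Size([2, 10, 512])
```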
no code implementations • 6 Apr 2021 • Bei Li, Quan Du, Tao Zhou, Shuhan Zhou, Xin Zeng, Tong Xiao, Jingbo Zhu
We show that a residual block of layers in Transformer can be described as a higher-order solution to ODEs.
1 code implementation • 27 Dec 2020 • Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu
We propose a novel group-permutation-based knowledge distillation approach to compress the deep Transformer model into a shallow model.
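The sketch below illustrates the layer-group idea under some assumptions: the deep teacher's layers are split into groups, layer order is shuffled within each group during teacher training, and a shallow student keeps one layer per group before being refined with distillation. Group size, layer type, and the selection rule are illustrative, not the paper's exact recipe.

```python
# Hedged sketch of within-group layer permutation for building a shallow
# student from a deep Transformer. Group size, layer type, and the rule
# for picking student layers are illustrative assumptions.
import random
import torch
import torch.nn as nn

d_model, n_layers, group_size = 512, 12, 4
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
    for _ in range(n_layers)
)

def forward_group_permuted(x: torch.Tensor) -> torch.Tensor:
    """Teacher-side forward pass with within-group layer permutation."""
    for start in range(0, n_layers, group_size):
        group = list(layers[start:start + group_size])
        random.shuffle(group)            # permute layer order inside the group
        for layer in group:
            x = layer(x)
    return x

# After teacher training, a shallow student keeps one layer per group
# (here the first), then is refined with knowledge distillation.
student_layers = nn.ModuleList(layers[i] for i in range(0, n_layers, group_size))

x = torch.randn(2, 10, d_model)
print(forward_group_permuted(x).shape)   # torch.Size([2, 10, 512])
```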
no code implementations • COLING 2020 • Yanyang Li, Yingfeng Luo, Ye Lin, Quan Du, Huizhen Wang, ShuJian Huang, Tong Xiao, Jingbo Zhu
Our experiments show that this simple method does not hamper the performance of similar language pairs and achieves an accuracy of 13.64~55.53% between English and four distant languages, i.e., Chinese, Japanese, Vietnamese and Thai.
1 code implementation • EMNLP 2020 • Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, Jingbo Zhu
We find that stacking layers is helpful in improving the representation ability of NMT models and that adjacent layers perform similarly.
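A minimal sketch of the shallow-to-deep idea this finding motivates, under the assumption of a simple growth rule (copy the topmost layers onto the encoder and continue training); the growth schedule and copy rule here are illustrative.

```python
# Hedged sketch of shallow-to-deep training: a shallow encoder is trained
# first, then grown by copying its topmost layers on top of itself, and
# training continues. The schedule and copy rule are illustrative.
import copy
import torch.nn as nn

def grow_encoder(layers: nn.ModuleList, n_new: int) -> nn.ModuleList:
    """Stack n_new copies of the current top layers onto the encoder."""
    top_copies = [copy.deepcopy(layers[-1 - i]) for i in reversed(range(n_new))]
    return nn.ModuleList(list(layers) + top_copies)

encoder = nn.ModuleList(
    nn.TransformerEncoderLayer(512, nhead=8, batch_first=True) for _ in range(6)
)
# ... train the 6-layer model, then grow to 12 layers and keep training ...
encoder = grow_encoder(encoder, n_new=6)
print(len(encoder))  # 12
```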
no code implementations • ACL 2021 • Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu
Inspired by this, we investigate methods of model acceleration and compression in another line of research.