no code implementations • ICML 2020 • Zonghan Yang, Yang Liu, Chenglong Bao, Zuoqiang Shi
Although ordinary differential equations (ODEs) provide insights for designing network architectures, their relationship with non-residual convolutional neural networks (CNNs) is still unclear.
no code implementations • 20 Feb 2024 • An Liu, Zonghan Yang, Zhenhe Zhang, Qingyuan Hu, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu
While large language models (LLMs) have demonstrated considerable capabilities across various natural language tasks, they often fall short of the performance achieved by domain-specific state-of-the-art models.
1 code implementation • 19 Feb 2024 • Xuanyu Lei, Zonghan Yang, Xinrui Chen, Peng Li, Yang Liu
State-of-the-art Large Multi-Modal Models (LMMs) have demonstrated exceptional capabilities in vision-language tasks.
no code implementations • 17 Feb 2024 • Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che
Model quantization uses low bit-width values to represent the weight matrices of models, which is a promising approach to reducing both the storage and computational overheads of deploying LLMs.
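To make the idea concrete, here is a minimal NumPy sketch of symmetric low-bit weight quantization. It illustrates the general mechanism only; the paper's actual quantization scheme is not reproduced here, and `quantize_weights` is an illustrative helper, not its API.

```python
# A minimal sketch of symmetric low-bit weight quantization (illustrative,
# not the scheme proposed in the paper above).
import numpy as np

def quantize_weights(w: np.ndarray, bits: int = 2):
    """Map float weights to signed integers of the given bit-width."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 1 for 2-bit, 7 for 4-bit
    scale = np.abs(w).max() / qmax      # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale                     # store low-bit q plus one float scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_weights(w, bits=2)
print(np.abs(w - dequantize(q, s)).max())   # reconstruction error
```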
no code implementations • 12 Feb 2024 • Zonghan Yang, An Liu, Zijun Liu, Kaiming Liu, Fangzhou Xiong, Yile Wang, Zeyuan Yang, Qingyuan Hu, Xinrui Chen, Zhenhe Zhang, Fuwen Luo, Zhicheng Guo, Peng Li, Yang Liu
We also conduct proof-of-concept studies by introducing realistic features to WebShop, including user profiles to demonstrate intentions, personalized reranking for complex environmental dynamics, and runtime cost statistics to reflect self-constraints.
no code implementations • 29 Nov 2023 • Xiaoyue Mi, Fan Tang, Zonghan Yang, Danding Wang, Juan Cao, Peng Li, Yang Liu
Despite the remarkable advances that have been made in continual learning, the adversarial vulnerability of such methods has not been fully discussed.
1 code implementation • 15 Jun 2023 • Qinhong Zhou, Zonghan Yang, Peng Li, Yang Liu
By combining the theoretical and empirical estimations of the decision distributions, the estimation of logits can be reduced to a simple root-finding problem.
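As a generic illustration of how such a reduction can be solved, the hedged sketch below recovers an unknown logit from an empirically estimated probability via scalar root-finding. The function `g` and the values it uses are stand-ins, not the paper's estimator.

```python
# Illustrative only: reducing a logit estimation to scalar root-finding.
from scipy.optimize import brentq
import numpy as np

other_logits = np.array([0.3, -1.2, 0.5])   # hypothetical known logits
p_target = 0.6                               # empirically estimated probability

def g(z: float) -> float:
    # Softmax probability of the unknown logit z minus its target value;
    # g is monotonically increasing in z, so it has a unique root.
    logits = np.append(other_logits, z)
    return np.exp(z) / np.exp(logits).sum() - p_target

z_star = brentq(g, -50.0, 50.0)   # bracket the root and solve
print(z_star)
```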
1 code implementation • 4 Jun 2023 • Yusheng Su, Chi-Min Chan, Jiali Cheng, Yujia Qin, Yankai Lin, Shengding Hu, Zonghan Yang, Ning Ding, Xingzhi Sun, Guotong Xie, Zhiyuan Liu, Maosong Sun
Our investigations reveal that model scaling (1) mitigates the effects of the positions of tunable parameters on performance, and (2) enables tuning methods to achieve performance comparable to full-parameter fine-tuning by optimizing fewer tunable parameters.
1 code implementation • 2 Jun 2023 • Zonghan Yang, Tianyu Pang, Yang Liu
Deep equilibrium models (DEQs) abandon the traditional layer-stacking paradigm and instead find the fixed point of a single layer.
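The following toy sketch shows the general DEQ mechanism: a single layer `f` is iterated until it reaches a fixed point z* = f(z*, x). Plain Picard iteration is used here for clarity; practical DEQs use faster root solvers and implicit differentiation.

```python
# A minimal deep-equilibrium forward pass (generic mechanism, not the
# specific model studied in the paper above).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)) * 0.1    # small weights keep f a contraction
U = rng.normal(size=(8, 8))

def f(z, x):
    return np.tanh(W @ z + U @ x)    # one "layer", applied repeatedly

def deq_forward(x, tol=1e-6, max_iter=200):
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z                         # the equilibrium representation

x = rng.normal(size=8)
print(deq_forward(x)[:3])
```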
1 code implementation • 2 Jun 2023 • Zonghan Yang, Peng Li, Tianyu Pang, Yang Liu
To this end, we interpret DEQs through the lens of neural dynamics and find that adversarial training (AT) under-regulates intermediate states.
no code implementations • 28 Jan 2023 • Zeyuan Yang, Zonghan Yang, Peng Li, Yang Liu
The basic idea is to adopt a restricted orthogonality constraint that allows parameters to be optimized in directions oblique to the whole frozen space, facilitating forward knowledge transfer while consolidating previous knowledge.
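A hedged sketch of the gradient-projection mechanism behind such constraints follows: new-task gradients are projected away from a "frozen" subspace spanned by directions important to old tasks. The basis `B` and the relaxation factor `alpha` are illustrative stand-ins, not the paper's exact formulation.

```python
# Gradient projection with a relaxed (oblique) orthogonality constraint.
import numpy as np

def project_gradient(g: np.ndarray, B: np.ndarray, alpha: float = 0.0):
    """alpha = 0 -> strictly orthogonal to span(B); alpha > 0 lets the
    update lean obliquely into the frozen space for forward transfer."""
    P = B @ np.linalg.pinv(B)        # projector onto the frozen subspace
    return g - (1.0 - alpha) * (P @ g)

g = np.random.randn(10)              # raw new-task gradient
B = np.random.randn(10, 3)           # basis of old-task directions
g_oblique = project_gradient(g, B, alpha=0.1)
print(np.abs(B.T @ project_gradient(g, B, alpha=0.0)).max())  # ~0: orthogonal
```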
no code implementations • 10 Oct 2022 • Zonghan Yang, Xiaoyuan Yi, Peng Li, Yang Liu, Xing Xie
Warning: this paper contains model outputs exhibiting offensiveness and biases.
1 code implementation • ICLR 2022 • Zonghan Yang, Yang Liu
Recently, prefix-tuning has gained increasing attention as a parameter-efficient fine-tuning method for large-scale pretrained language models.
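For readers unfamiliar with the mechanism, here is a toy sketch of the general prefix-tuning idea: the pretrained model is frozen and only a short sequence of prefix vectors is trained. Real prefix-tuning injects prefixes into the keys and values of every attention layer; this version simply prepends them to the embedded input for brevity.

```python
# A simplified prefix-tuning wrapper: freeze the backbone, train the prefix.
import torch
import torch.nn as nn

class PrefixTunedEncoder(nn.Module):
    def __init__(self, encoder: nn.Module, d_model: int, prefix_len: int = 10):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False                  # freeze pretrained weights
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, x):                            # x: (batch, seq, d_model)
        prefix = self.prefix.unsqueeze(0).expand(x.size(0), -1, -1)
        return self.encoder(torch.cat([prefix, x], dim=1))

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(64, 4, batch_first=True), num_layers=2)
model = PrefixTunedEncoder(encoder, d_model=64)
print(model(torch.randn(2, 16, 64)).shape)           # (2, 26, 64): prefix + tokens
```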
1 code implementation • 14 Mar 2022 • Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao, Xiaozhi Wang, Zhiyuan Liu, Hai-Tao Zheng, Jianfei Chen, Yang Liu, Jie Tang, Juanzi Li, Maosong Sun
This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, dubbed delta tuning in this paper.
no code implementations • Findings (ACL) 2021 • Rui Jiao, Zonghan Yang, Maosong Sun, Yang Liu
In this work, we propose alternated training with synthetic and authentic data for neural machine translation (NMT).
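A schematic of what such an alternation schedule can look like in code, using toy tensors as stand-ins for synthetic (e.g. back-translated) and authentic parallel data. This illustrates the alternation only, not the paper's full training recipe.

```python
# Alternate between two corpora across epochs instead of pooling them.
import torch
import torch.nn as nn

synthetic = [(torch.randn(8, 16), torch.randn(8, 16)) for _ in range(5)]
authentic = [(torch.randn(8, 16), torch.randn(8, 16)) for _ in range(5)]

model = nn.Linear(16, 16)            # placeholder for an NMT model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(4):
    data = synthetic if epoch % 2 == 0 else authentic   # alternate corpora
    for src, tgt in data:
        loss = nn.functional.mse_loss(model(src), tgt)  # toy loss
        opt.zero_grad()
        loss.backward()
        opt.step()
```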
no code implementations • 1 Jan 2021 • Zonghan Yang, Yang Liu, Chenglong Bao, Zuoqiang Shi
Deep neural networks are observed to be fragile against adversarial attacks, a weakness that has dramatically limited their practical applicability.
no code implementations • 31 Dec 2020 • Zhixing Tan, Shuo Wang, Zonghan Yang, Gang Chen, Xuancheng Huang, Maosong Sun, Yang Liu
Machine translation (MT) is an important sub-field of natural language processing that aims to translate natural languages using computers.
1 code implementation • 10 Jun 2020 • Zonghan Yang, Yang Liu, Chenglong Bao, Zuoqiang Shi
Although ordinary differential equations (ODEs) provide insights for designing network architectures, their relationship with non-residual convolutional neural networks (CNNs) is still unclear.
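The textbook connection underlying this line of work is that a residual block x_{t+1} = x_t + f(x_t) is one forward Euler step of the ODE dx/dt = f(x). The sketch below illustrates that standard correspondence only, not the paper's interpolation between residual and non-residual CNNs.

```python
# Residual updates as explicit Euler steps of an ODE (standard view).
import numpy as np

def f(x):                      # stand-in for a convolutional block
    return np.tanh(x) - 0.5 * x

def resnet_like(x, steps=10, h=1.0):
    for _ in range(steps):
        x = x + h * f(x)       # residual update == Euler step with h = 1
    return x

def euler_ode(x, t_end=10.0, h=0.1):
    for _ in range(int(t_end / h)):
        x = x + h * f(x)       # finer discretization of the same dynamics
    return x

x0 = np.array([1.0, -2.0])
print(resnet_like(x0), euler_ode(x0))
```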
no code implementations • ACL 2019 • Zonghan Yang, Yong Cheng, Yang Liu, Maosong Sun
While neural machine translation (NMT) has achieved remarkable success, NMT systems are prone to word omission errors.
1 code implementation • 12 Sep 2018 • Xiaoyuan Yi, Maosong Sun, Ruoyu Li, Zonghan Yang
Different from previous methods, our model explicitly maintains topics and a bounded, informative history in a neural memory.
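A hedged sketch of reading from such an explicit memory: topic vectors and a bounded window of history vectors are stored as memory slots that the decoder reads via dot-product attention. This shows the mechanism generically, not the paper's exact working-memory architecture.

```python
# Attention-based readout from an explicit memory of topic and history slots.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(query, memory):
    """memory: (slots, d) matrix of topic/history vectors; query: (d,)."""
    weights = softmax(memory @ query)    # dot-product attention scores
    return weights @ memory              # weighted sum = memory readout

d, max_history = 8, 4
topics = np.random.randn(2, d)           # topic slots
history = np.random.randn(max_history, d)  # only the most recent lines kept
memory = np.vstack([topics, history])
print(memory_read(np.random.randn(d), memory).shape)   # (8,)
```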