1 code implementation • 27 Oct 2022 • Bowen Shen, Zheng Lin, Yuanxin Liu, Zhengxiao Liu, Lei Wang, Weiping Wang
Motivated by such considerations, we propose a collaborative optimization for PLMs that integrates static model compression and dynamic inference acceleration.
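The excerpt does not spell out how the dynamic component works; purely as an assumption-laden illustration of dynamic inference acceleration in general, the sketch below shows confidence-based early exiting over a stack of transformer layers (the names `layers`, `exit_heads`, and `threshold` are hypothetical, and this is not necessarily the paper's mechanism):

```python
import torch

def early_exit_forward(x, layers, exit_heads, threshold=0.9):
    """Illustrative early exiting: return the first sufficiently confident
    prediction. `layers` and `exit_heads` are hypothetical module lists;
    batch size 1 is assumed for the exit decision."""
    for layer, head in zip(layers, exit_heads):
        x = layer(x)
        probs = torch.softmax(head(x[:, 0]), dim=-1)  # [CLS]-style pooling
        if probs.max().item() >= threshold:           # confident enough: stop early
            return probs
    return probs  # otherwise fall through to the last layer's prediction
```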
no code implementations • 26 Oct 2022 • Qingyi Si, Yuanxin Liu, Zheng Lin, Peng Fu, Weiping Wang
To facilitate the application of VLP to VQA tasks, it is imperative to jointly study VLP compression and OOD robustness, which, however, has not yet been explored.
1 code implementation • 11 Oct 2022 • Yuanxin Liu, Fandong Meng, Zheng Lin, Jiangnan Li, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
In response to the efficiency problem, recent studies show that dense PLMs can be replaced with sparse subnetworks without hurting the performance.
1 code implementation • 10 Oct 2022 • Qingyi Si, Fandong Meng, Mingyu Zheng, Zheng Lin, Yuanxin Liu, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
To overcome this limitation, we propose a new dataset that considers varying types of shortcuts by constructing different distribution shifts in multiple OOD test sets.
1 code implementation • 10 Oct 2022 • Qingyi Si, Yuanxin Liu, Fandong Meng, Zheng Lin, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
However, these models reveal a trade-off: the improvements on OOD data come at the cost of severely degraded performance on the in-distribution (ID) data (which is dominated by biased samples).
1 code implementation • NAACL 2022 • Yuanxin Liu, Fandong Meng, Zheng Lin, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
Firstly, we discover that the success of magnitude pruning can be attributed to the preserved pre-training performance, which correlates with the downstream transferability.
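Magnitude pruning, the technique referred to here, zeroes out the weights with the smallest absolute values. A minimal per-tensor sketch (the sparsity level and in-place masking are illustrative choices, not details taken from the paper):

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the `sparsity` fraction of entries with the smallest magnitude."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).float()   # 1 = keep, 0 = prune
    weight.data.mul_(mask)                      # apply the mask in place
    return mask
```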
1 code implementation • ACL 2021 • Yuanxin Liu, Fandong Meng, Zheng Lin, Weiping Wang, Jie Zhou
In this paper, however, we observe that although distilling the teacher's hidden state knowledge (HSK) is helpful, the performance gain (marginal utility) diminishes quickly as more HSK is distilled.
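Hidden state knowledge is commonly distilled with a regression loss between the student's and the teacher's hidden states; a minimal sketch under that assumption (the layer pairing and the projection `proj` are hypothetical):

```python
import torch
import torch.nn.functional as F

def hsk_distill_loss(student_hiddens, teacher_hiddens, proj):
    """MSE between (projected) student hidden states and the teacher's,
    averaged over a chosen set of layer pairs. `proj` maps the student's
    hidden size to the teacher's and is a hypothetical linear layer."""
    loss = 0.0
    for h_s, h_t in zip(student_hiddens, teacher_hiddens):
        loss = loss + F.mse_loss(proj(h_s), h_t)
    return loss / len(student_hiddens)
```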
1 code implementation • 21 Mar 2021 • Yuanxin Liu, Zheng Lin, Fengcheng Yuan
Based on the empirical findings, our best compressed model, dubbed Refined BERT cOmpreSsion with InTegrAted techniques (ROSITA), is $7.5\times$ smaller than BERT while maintaining $98.5\%$ of the performance on five tasks of the GLUE benchmark, outperforming previous BERT compression methods with a similar parameter budget.
1 code implementation • 3 Dec 2020 • Qingyi Si, Yuanxin Liu, Peng Fu, Zheng Lin, Jiangnan Li, Weiping Wang
A critical problem behind these limitations is that the representations of unseen intents cannot be learned in the training stage.
no code implementations • 28 Feb 2020 • Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Kai Lei, Xu Sun
Recently, attention-based encoder-decoder models have been used extensively in image captioning.
no code implementations • 13 Nov 2019 • Yuanxin Liu, Zheng Lin
They are classified into architecture-based methods and strategy-based methods, based on how they handle the above obstacle.
no code implementations • IJCNLP 2019 • Yanfu Xu, Zheng Lin, Yuanxin Liu, Rui Liu, Weiping Wang, Dan Meng
Open-domain question answering (OpenQA) aims to answer questions based on a number of unlabeled paragraphs.
no code implementations • CoNLL 2019 • Fenglin Liu, Meng Gao, Yuanxin Liu, Kai Lei
Residual connections have been widely applied to build deep neural networks with enhanced feature propagation and improved accuracy.
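For reference, a residual connection adds a block's input to its output so that features and gradients can also flow through an identity path; a minimal sketch:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): the identity path lets features and gradients
    bypass the transformation F."""
    def __init__(self, transform: nn.Module):
        super().__init__()
        self.transform = transform

    def forward(self, x):
        return x + self.transform(x)
```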
1 code implementation • NeurIPS 2019 • Fenglin Liu, Yuanxin Liu, Xuancheng Ren, Xiaodong He, Xu Sun
In vision-and-language grounding problems, fine-grained representations of the image are considered to be of paramount importance.
1 code implementation • EMNLP 2018 • Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Houfeng Wang, Xu Sun
The encoder-decoder framework has shown recent success in image captioning.