1 code implementation • 7 Dec 2023 • Jaehyung Kim, Yuning Mao, Rui Hou, Hanchao Yu, Davis Liang, Pascale Fung, Qifan Wang, Fuli Feng, Lifu Huang, Madian Khabsa
Under a unified evaluation of fine-tuned LMs that incorporates four representative perspectives of model robustness, we demonstrate the effectiveness of RoAST compared with state-of-the-art fine-tuning methods on six different types of LMs, indicating its usefulness in practice.
no code implementations • 6 Nov 2023 • Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Hongbo Zhang, Sung Ju Hwang, Alexander Min
The enhanced performance of the larger model, in turn, further boosts the performance of the smaller model.
2 code implementations • 31 Aug 2023 • Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa
We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs).
1 code implementation • 26 May 2023 • Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Sung Ju Hwang, Alexander Min
Distillation from Weak Teacher (DWT) is a method of transferring knowledge from a smaller, weaker teacher model to a larger student model to improve its performance.
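For intuition, a minimal PyTorch sketch of a standard distillation objective is shown below, in which a (here weaker) teacher's softened predictions guide the student alongside the usual hard-label loss. The function name, temperature, and mixing weight are illustrative assumptions and do not reproduce DWT's specific recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft teacher-matching term with the usual hard-label loss.

    In the DWT setting the teacher is smaller/weaker than the student; the
    paper adds further ingredients beyond this plain distillation objective.
    """
    # Softened teacher and student distributions.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL divergence, scaled by T^2 as is conventional for distillation.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2

    # Standard supervised loss on the hard labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kd + (1.0 - alpha) * ce
```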
3 code implementations • 25 Jan 2023 • Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa
Large multilingual language models typically rely on a single vocabulary shared across 100+ languages.
no code implementations • 12 Oct 2021 • Peng Xu, Davis Liang, Zhiheng Huang, Bing Xiang
We propose a simple strategy to obtain an extractive answer span from the generative model by leveraging the decoder cross-attention patterns.
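The sketch below illustrates one way such a strategy could look: aggregate the decoder's cross-attention onto source tokens and grow a contiguous span around the most-attended position. The aggregation (mean over layers, heads, and decoder steps) and the thresholding rule are assumptions for illustration, not necessarily the paper's exact method.

```python
import torch

def extract_span_from_cross_attention(cross_attentions, threshold=0.5):
    """Pick an extractive answer span from the encoder positions that receive
    the most decoder cross-attention.

    `cross_attentions`: tensor of shape (layers, heads, tgt_len, src_len),
    e.g. stacked from a Hugging Face seq2seq model's `cross_attentions` output.
    """
    # Average attention mass each source token receives.
    scores = cross_attentions.mean(dim=(0, 1, 2))  # (src_len,)

    # Grow a contiguous span around the most-attended token, keeping
    # neighbors whose attention is at least `threshold` of the peak.
    peak = int(scores.argmax())
    cutoff = threshold * scores[peak]
    start = end = peak
    while start > 0 and scores[start - 1] >= cutoff:
        start -= 1
    while end < scores.size(0) - 1 and scores[end + 1] >= cutoff:
        end += 1
    return start, end  # token indices into the source sequence
```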
no code implementations • 27 Sep 2021 • Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang
Transformer models, which leverage architectural improvements like self-attention, perform remarkably well on Natural Language Processing (NLP) tasks.
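For reference, single-head scaled dot-product self-attention, the core Transformer operation, can be written in a few lines (a generic sketch, not a model from the paper):

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence.

    x: (seq_len, d_model); w_q / w_k / w_v: (d_model, d_head) projections.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)  # each position attends to all others
    return weights @ v                       # (seq_len, d_head)
```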
no code implementations • 26 Nov 2020 • Nicholas Roberts, Davis Liang, Graham Neubig, Zachary C. Lipton
This makes human-level BLEU a misleading benchmark in that modern MT systems cannot approach human-level BLEU while simultaneously maintaining human-level translation diversity.
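One crude way to quantify the diversity side of this trade-off is a distinct-n statistic over a set of system outputs; the sketch below is illustrative only and is not the measurement used in the paper.

```python
def distinct_n(translations, n=2):
    """Fraction of n-grams that are unique across a set of outputs, a common
    (if crude) proxy for generation diversity."""
    ngrams = []
    for sent in translations:
        toks = sent.split()
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)
```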
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang
In this paper, we first review absolute position embeddings and existing methods for relative position embeddings.
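As background for those two families, the sketch below pairs absolute sinusoidal position embeddings with a simple learned relative-position bias added to attention logits. It assumes an even d_model and does not reproduce the embeddings proposed in the paper.

```python
import math
import torch

def sinusoidal_positions(seq_len, d_model):
    """Absolute sinusoidal position embeddings (assumes d_model is even)."""
    pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

class RelativePositionBias(torch.nn.Module):
    """Learned bias added to attention logits based on the offset j - i,
    one simple flavor of relative position embeddings."""
    def __init__(self, max_distance, num_heads):
        super().__init__()
        self.max_distance = max_distance
        self.bias = torch.nn.Embedding(2 * max_distance + 1, num_heads)

    def forward(self, seq_len):
        pos = torch.arange(seq_len)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_distance,
                                                  self.max_distance)
        return self.bias(rel + self.max_distance).permute(2, 0, 1)  # (heads, q, k)
```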
1 code implementation • 22 Sep 2020 • Davis Liang, Peng Xu, Siamak Shakeri, Cicero Nogueira dos Santos, Ramesh Nallapati, Zhiheng Huang, Bing Xiang
In some cases, our model trained on synthetic data can even outperform the same model trained on real data.
no code implementations • 16 Mar 2020 • Zhiheng Huang, Peng Xu, Davis Liang, Ajay Mishra, Bing Xiang
Prior to the transformer era, bidirectional Long Short-Term Memory (BLSTM) networks were the dominant modeling architecture for neural machine translation and question answering; a minimal BLSTM encoder sketch follows below.
Ranked #1 on Text Classification on GLUE MRPC
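For reference, a generic BLSTM encoder of the kind that dominated that era takes only a few lines of PyTorch (illustrative only, not the hybrid architecture studied in the paper):

```python
import torch

class BLSTMEncoder(torch.nn.Module):
    """Minimal bidirectional LSTM sentence encoder."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, emb_dim)
        self.blstm = torch.nn.LSTM(emb_dim, hidden_dim,
                                   batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        # (batch, seq_len) -> (batch, seq_len, 2 * hidden_dim)
        out, _ = self.blstm(self.embed(token_ids))
        return out
```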
6 code implementations • ACL 2020 • Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff
Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are computed by masking tokens one by one.
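A minimal sketch of that scoring procedure with Hugging Face Transformers follows; the model choice and the decision to skip the special tokens are illustrative, and this is not the authors' released code.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

def pseudo_log_likelihood(sentence, model_name="bert-base-cased"):
    """Score a sentence with an MLM by masking tokens one by one and summing
    the log-probability of each held-out token (a pseudo-log-likelihood)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        for i in range(1, ids.size(0) - 1):  # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs = torch.log_softmax(logits, dim=-1)
            total += log_probs[ids[i]].item()
    return total
```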
no code implementations • 17 Jul 2018 • Davis Liang, Zhiheng Huang, Zachary C. Lipton
Despite rapid advances in speech recognition, current models remain brittle to superficial perturbations to their inputs.
no code implementations • IJCNLP 2017 • Davis Liang, Yan Shu
Our results show that training on a primary task in parallel with a secondary automated task improves both the convergence speed and accuracy for the primary task.
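Concretely, such parallel training can be as simple as summing the two losses at each optimization step. In the sketch below, `primary_loss` and `secondary_loss` are hypothetical model heads and the weighting is an illustrative choice, not the paper's configuration.

```python
import torch

def multitask_step(model, batch, optimizer, secondary_weight=0.3):
    """One training step combining a primary supervised loss with a secondary
    automated (self-supervised) loss.

    `model.primary_loss` and `model.secondary_loss` are hypothetical heads
    assumed for this sketch; they are not APIs from the paper.
    """
    optimizer.zero_grad()
    loss = (model.primary_loss(batch)
            + secondary_weight * model.secondary_loss(batch))
    loss.backward()
    optimizer.step()
    return loss.item()
```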