no code implementations • WS 2020 • Mitchell A. Gordon, Kevin Duh
We explore best practices for training small, memory-efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting.
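The listing carries no code for this entry, so the following is a minimal sketch of sequence-level knowledge distillation for domain adaptation, assuming a Hugging Face Marian teacher; the checkpoint name, the in-domain sample sentence, and the reuse of the teacher architecture as the "small" student are all placeholders, not the paper's setup.

```python
# Minimal sequence-level KD sketch (assumed models/data, not the paper's).
import torch
from transformers import MarianMTModel, MarianTokenizer

teacher_name = "Helsinki-NLP/opus-mt-en-de"  # assumed teacher checkpoint
tokenizer = MarianTokenizer.from_pretrained(teacher_name)
teacher = MarianMTModel.from_pretrained(teacher_name).eval()

# Step 1: the teacher beam-decodes the in-domain source side, producing
# pseudo-targets that become the student's distilled training set.
sources = ["The patient was given 5 mg of the drug."]  # placeholder in-domain data
batch = tokenizer(sources, return_tensors="pt", padding=True)
with torch.no_grad():
    hyp_ids = teacher.generate(**batch, num_beams=5, max_new_tokens=64)
pseudo_targets = tokenizer.batch_decode(hyp_ids, skip_special_tokens=True)

# Step 2: train the student on (source, teacher output) pairs with plain
# cross-entropy, as if the teacher outputs were gold references.
student = MarianMTModel.from_pretrained(teacher_name)  # stand-in; a smaller config in practice
labels = tokenizer(text_target=pseudo_targets, return_tensors="pt", padding=True).input_ids
loss = student(**batch, labels=labels).loss  # pad-token masking omitted for brevity
loss.backward()
```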
1 code implementation • ACL 2020 • Mitchell A. Gordon, Kevin Duh, Nicholas Andrews
Low levels of pruning (30-40%) do not affect pre-training loss or transfer to downstream tasks at all.
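The paper's code is not reproduced here, but the effect is easy to probe with PyTorch's built-in pruning utilities; the checkpoint name and the 30% level below are illustrative choices within the paper's 30-40% range, not its exact procedure.

```python
# Magnitude-pruning sketch using torch.nn.utils.prune (illustrative only).
import torch
import torch.nn.utils.prune as prune
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")  # assumed checkpoint

# Target every linear weight matrix in the encoder.
params = [(m, "weight") for m in model.encoder.modules()
          if isinstance(m, torch.nn.Linear)]

# Zero out the 30% of weights with the smallest magnitude, model-wide
# (the paper's "low" pruning regime is 30-40%).
prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=0.30)

# Bake the masks into the weights so the zeros persist through
# saving and downstream fine-tuning.
for module, name in params:
    prune.remove(module, name)
```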
no code implementations • 6 Dec 2019 • Mitchell A. Gordon, Kevin Duh
We then propose an alternative hypothesis through the lens of data augmentation and regularization.
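As an illustration of the data-augmentation reading, one can sample several teacher translations per source instead of taking a single beam output, so each source contributes multiple training pairs; the model name and sampling settings below are assumptions, not the paper's experiments.

```python
# Seq-KD as data augmentation: multiple sampled teacher outputs per source.
import torch
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"  # assumed teacher checkpoint
tokenizer = MarianTokenizer.from_pretrained(name)
teacher = MarianMTModel.from_pretrained(name).eval()

sources = ["The committee approved the proposal."]  # placeholder data
batch = tokenizer(sources, return_tensors="pt", padding=True)
with torch.no_grad():
    ids = teacher.generate(**batch, do_sample=True, top_k=50,
                           num_return_sequences=4, max_new_tokens=64)
augmented = tokenizer.batch_decode(ids, skip_special_tokens=True)
# One source now pairs with four teacher translations, i.e. a 4x
# "augmented" corpus on which the student is trained with cross-entropy.
```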