no code implementations • WS 2020 • Mitchell A. Gordon, Kevin Duh
We explore best practices for training small, memory-efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting.
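The listing carries no code for this entry, so the following is a minimal sketch of sequence-level knowledge distillation for domain adaptation, assuming a Hugging Face Marian teacher; the checkpoint name, the in-domain sample sentence, and the reuse of the teacher architecture as the "small" student are all placeholders, not the paper's setup.

```python
# Minimal sequence-level KD sketch (assumed models/data, not the paper's).
import torch
from transformers import MarianMTModel, MarianTokenizer

teacher_name = "Helsinki-NLP/opus-mt-en-de"  # assumed teacher checkpoint
tokenizer = MarianTokenizer.from_pretrained(teacher_name)
teacher = MarianMTModel.from_pretrained(teacher_name).eval()

# Step 1: the teacher beam-decodes the in-domain source side, producing
# pseudo-targets that become the student's distilled training set.
sources = ["The patient was given 5 mg of the drug."]  # placeholder in-domain data
batch = tokenizer(sources, return_tensors="pt", padding=True)
with torch.no_grad():
    hyp_ids = teacher.generate(**batch, num_beams=5, max_new_tokens=64)
pseudo_targets = tokenizer.batch_decode(hyp_ids, skip_special_tokens=True)

# Step 2: train the student on (source, teacher output) pairs with plain
# cross-entropy, as if the teacher outputs were gold references.
student = MarianMTModel.from_pretrained(teacher_name)  # stand-in; a smaller config in practice
labels = tokenizer(text_target=pseudo_targets, return_tensors="pt", padding=True).input_ids
loss = student(**batch, labels=labels).loss  # pad-token masking omitted for brevity
loss.backward()
```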
1 code implementation • ACL 2020 • Mitchell A. Gordon, Kevin Duh, Nicholas Andrews
Low levels of pruning (30-40%) do not affect pre-training loss or transfer to downstream tasks at all.
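The paper's code is not reproduced here, but the effect is easy to probe with PyTorch's built-in pruning utilities; the checkpoint name and the 30% level below are illustrative choices within the paper's 30-40% range, not its exact procedure.

```python
# Magnitude-pruning sketch using torch.nn.utils.prune (illustrative only).
import torch
import torch.nn.utils.prune as prune
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")  # assumed checkpoint

# Target every linear weight matrix in the encoder.
params = [(m, "weight") for m in model.encoder.modules()
          if isinstance(m, torch.nn.Linear)]

# Zero out the 30% of weights with the smallest magnitude, model-wide
# (the paper's "low" pruning regime is 30-40%).
prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=0.30)

# Bake the masks into the weights so the zeros persist through
# saving and downstream fine-tuning.
for module, name in params:
    prune.remove(module, name)
```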
no code implementations • 6 Dec 2019 • Mitchell A. Gordon, Kevin Duh
We then propose an alternative hypothesis through the lens of data augmentation and regularization.
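As an illustration of the data-augmentation reading, one can sample several teacher translations per source instead of taking a single beam output, so each source contributes multiple training pairs; the model name and sampling settings below are assumptions, not the paper's experiments.

```python
# Seq-KD as data augmentation: multiple sampled teacher outputs per source.
import torch
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"  # assumed teacher checkpoint
tokenizer = MarianTokenizer.from_pretrained(name)
teacher = MarianMTModel.from_pretrained(name).eval()

sources = ["The committee approved the proposal."]  # placeholder data
batch = tokenizer(sources, return_tensors="pt", padding=True)
with torch.no_grad():
    ids = teacher.generate(**batch, do_sample=True, top_k=50,
                           num_return_sequences=4, max_new_tokens=64)
augmented = tokenizer.batch_decode(ids, skip_special_tokens=True)
# One source now pairs with four teacher translations, i.e. a 4x
# "augmented" corpus on which the student is trained with cross-entropy.
```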