Language Models

DynaBERT

Introduced by Hou et al. in DynaBERT: Dynamic BERT with Adaptive Width and Depth

DynaBERT is a BERT variant that can flexibly adjust its size and latency by selecting an adaptive width and depth. The training process first trains a width-adaptive BERT and then allows both adaptive width and depth, distilling knowledge from the full-sized model to small sub-networks. Network rewiring is also used so that the more important attention heads and neurons are shared by more sub-networks.
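To make the width-adaptive mechanism concrete, below is a minimal PyTorch sketch of a self-attention layer whose active width is set by a multiplier. It assumes the heads have already been reordered by importance (the paper's network rewiring), so a sub-network can simply keep the leftmost slice; the class and argument names are illustrative, not from the authors' code.

```python
import torch
import torch.nn as nn


class WidthAdaptiveSelfAttention(nn.Module):
    """Sketch of attention whose number of active heads scales with width_mult."""

    def __init__(self, hidden_size: int = 768, num_heads: int = 12):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size)
        self.out = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, width_mult: float = 1.0) -> torch.Tensor:
        # Keep the first ceil(width_mult * num_heads) heads; rewiring has
        # placed the most important heads first, so they are shared by
        # every sub-network down to the smallest width.
        n_heads = max(1, round(width_mult * self.num_heads))
        width = n_heads * self.head_dim
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t[..., :width] for t in (q, k, v))
        B, T, _ = q.shape
        q = q.view(B, T, n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, n_heads, self.head_dim).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(B, T, width)
        # Slice the output projection's input dimension to match the active width.
        return ctx @ self.out.weight[:, :width].t() + self.out.bias
```

For example, `WidthAdaptiveSelfAttention()(torch.randn(2, 16, 768), width_mult=0.5)` runs the same layer with half the attention heads while still producing a full-sized hidden state, so layer outputs stay comparable across sub-networks.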

A two-stage procedure is used to train DynaBERT. First, knowledge distillation transfers knowledge from a fixed teacher model to student sub-networks with adaptive width, yielding DynaBERT_W. Then, knowledge distillation transfers knowledge from the trained DynaBERT_W to student sub-networks with both adaptive width and depth, yielding the final DynaBERT.
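The sketch below illustrates one training step of this two-stage distillation. It assumes a teacher and a student that accept width/depth multipliers and return logits plus a hidden representation; `train_step` and the dict keys are hypothetical placeholders, not the authors' API, and the multiplier sets are examples rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

WIDTH_MULTS = (1.0, 0.75, 0.5, 0.25)  # example width settings
DEPTH_MULTS = (1.0, 0.75, 0.5)        # example depth settings (stage 2 only)


def distill_loss(student_out: dict, teacher_out: dict) -> torch.Tensor:
    # Match the teacher's soft predictions and hidden representations.
    # Assumes student layer outputs keep the full hidden size, since width
    # slicing only shrinks heads/neurons inside each layer.
    logit_loss = F.kl_div(
        F.log_softmax(student_out["logits"], dim=-1),
        F.softmax(teacher_out["logits"], dim=-1),
        reduction="batchmean",
    )
    hidden_loss = F.mse_loss(student_out["hidden"], teacher_out["hidden"])
    return logit_loss + hidden_loss


def train_step(batch, teacher, student, stage: str) -> torch.Tensor:
    # Stage "width": teacher is the fixed full-sized model, only width varies.
    # Stage "depth": teacher is the trained DynaBERT_W, width and depth vary.
    with torch.no_grad():
        t_out = teacher(batch)
    depth_mults = (1.0,) if stage == "width" else DEPTH_MULTS
    loss = torch.zeros(())
    for d in depth_mults:
        for w in WIDTH_MULTS:
            s_out = student(batch, width_mult=w, depth_mult=d)
            loss = loss + distill_loss(s_out, t_out)
    loss.backward()
    return loss
```

Accumulating the loss over all sub-network configurations in one step is one plausible reading of "distilling to small sub-networks"; sampling a subset of configurations per step is an equally valid design choice.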

Source: DynaBERT: Dynamic BERT with Adaptive Width and Depth

Tasks


| Task | Papers | Share |
| --- | --- | --- |
| Language Modelling | 2 | 66.67% |
| Model Compression | 1 | 33.33% |

