Bort

Introduced by de Wynter et al. in Optimal Subarchitecture Extraction For BERT

Bort is a parametric architectural variant of the BERT architecture. It extracts an optimal subset of architectural parameters for the BERT architecture through a neural architecture search approach, specifically a fully polynomial-time approximation scheme (FPTAS). This optimal subset, “Bort”, is demonstrably smaller, having an effective size (that is, not counting the embedding layer) of $5.5 \%$ of the original BERT-large architecture, and $16\%$ of the net size. Bort can also be pretrained in $288$ GPU hours, which corresponds to $1.2\%$ of the time required to pretrain the highest-performing BERT parametric architectural variant, RoBERTa-large, and about $33\%$ of the world record, in GPU hours, for training BERT-large on the same hardware.
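
As a rough illustration of these size figures, the sketch below instantiates a randomly initialized BERT-style encoder with the architectural parameters the paper reports for Bort (depth $D=4$, attention heads $A=8$, hidden size $H=1024$, intermediate size $I=768$) and compares its parameter counts with BERT-large ($D=24$, $A=16$, $H=1024$, $I=4096$) via the Hugging Face transformers API. This is a hypothetical sketch, not the authors' extraction code: vocabulary and embedding details differ from the released Bort checkpoint, so the resulting ratios only approximate the $5.5\%$ / $16\%$ figures above.

```python
# Minimal sketch (not the authors' code): build BERT-style encoders with
# Bort's reported architectural parameters and with BERT-large's, then
# compare parameter counts. Assumes Bort's <D=4, A=8, H=1024, I=768> and
# the default BERT vocabulary, so ratios are approximate.
from transformers import BertConfig, BertModel

bort_config = BertConfig(
    num_hidden_layers=4,     # D: encoder depth
    num_attention_heads=8,   # A: attention heads
    hidden_size=1024,        # H: hidden dimension
    intermediate_size=768,   # I: feed-forward dimension
)

bert_large_config = BertConfig(
    num_hidden_layers=24,
    num_attention_heads=16,
    hidden_size=1024,
    intermediate_size=4096,
)

def sizes(config):
    model = BertModel(config)
    net = sum(p.numel() for p in model.parameters())
    # "Effective size" excludes the embedding layer, so count only
    # the transformer encoder's parameters.
    effective = sum(p.numel() for p in model.encoder.parameters())
    return net, effective

bort_net, bort_eff = sizes(bort_config)
large_net, large_eff = sizes(bert_large_config)
print(f"net size ratio:       {bort_net / large_net:.1%}")
print(f"effective size ratio: {bort_eff / large_eff:.1%}")
```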

Source: Optimal Subarchitecture Extraction For BERT

Tasks


Task                             Papers   Share
Denoising                        2        66.67%
Natural Language Understanding   1        33.33%

Components


Component   Type
BERT        Language Models

Categories

Language Models