BinaryBERT is a BERT variant that pushes quantization to the extreme of weight binarization. It is initialized via a proposed ternary weight splitting operator, which equivalently splits a half-sized ternary network into the full-sized binary one. Concretely, we first train a half-sized ternary BERT model, then apply ternary weight splitting to obtain the latent full-precision and quantized weights that initialize the full-sized BinaryBERT, and finally fine-tune BinaryBERT for further refinement.
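To make the "equivalent splitting" idea concrete, the sketch below illustrates the quantized-side identity of ternary weight splitting: each ternary weight matrix is split into two binary matrices that sum back to it exactly, so the doubled (full-sized) binary model can initially reproduce the half-sized ternary model's outputs. The helper `ternary_weight_split` is a hypothetical name, not from the released code, and the paper's full operator also constructs the matching latent full-precision weights, which this simplified sketch omits.

```python
import torch

def ternary_weight_split(w_ternary: torch.Tensor, alpha: float):
    """Split a ternary weight matrix with values in {-alpha, 0, +alpha}
    into two binary matrices b1, b2 with values in {-alpha/2, +alpha/2}
    such that b1 + b2 == w_ternary (the quantized-side identity).

    Hypothetical sketch: the paper's operator additionally splits the
    latent full-precision weights so that binarizing them recovers
    b1 and b2; that part is omitted here.
    """
    beta = alpha / 2.0                  # shared scale of the two binary halves
    sign = torch.sign(w_ternary)        # entries in {-1, 0, +1}
    pos = torch.full_like(w_ternary, beta)
    # Nonzero ternary entries: both halves carry the same sign (sum = +/-alpha).
    # Zero entries: the halves take opposite signs and cancel (sum = 0).
    b1 = torch.where(sign == 0, pos, sign * beta)
    b2 = torch.where(sign == 0, -pos, sign * beta)
    assert torch.allclose(b1 + b2, w_ternary)
    return b1, b2

# Toy usage: ternarize a random latent weight, then split it.
w = torch.randn(4, 4)                   # latent full-precision weights
alpha = w.abs().mean().item()           # illustrative ternary scale
w_t = torch.where(w.abs() > 0.5 * alpha,
                  torch.sign(w) * alpha,
                  torch.zeros_like(w))
b1, b2 = ternary_weight_split(w_t, alpha)
```

Because the duplicated hidden units of the full-sized model carry `b1` and `b2` respectively, their contributions sum to the original ternary weight, which is why the binary model starts from the same function as the trained ternary one before fine-tuning.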
Source: BinaryBERT: Pushing the Limit of BERT Quantization