Ternary Weight Splitting (TWS) is the ternarization-based initialization used in BinaryBERT: it exploits the flatness of the ternary loss landscape as an optimization proxy for the binary model. We first train a half-sized ternary BERT to convergence, and then split both the latent full-precision weights $\mathbf{w}^{t}$ and the quantized weights $\hat{\mathbf{w}}^{t}$ into their binary counterparts $\mathbf{w}_{1}^{b}, \mathbf{w}_{2}^{b}$ and $\hat{\mathbf{w}}_{1}^{b}, \hat{\mathbf{w}}_{2}^{b}$ via the TWS operator. To inherit the performance of the ternary model after splitting, the TWS operator must satisfy the splitting equivalency (i.e., produce the same output given the same input):
$$ \mathbf{w}^{t}=\mathbf{w}_{1}^{b}+\mathbf{w}_{2}^{b}, \quad \hat{\mathbf{w}}^{t}=\hat{\mathbf{w}}_{1}^{b}+\hat{\mathbf{w}}_{2}^{b} $$
While the solution to the above equations is not unique, the splitting is made well-defined by constraining the latent full-precision weights $\mathbf{w}_{1}^{b}, \mathbf{w}_{2}^{b}$ to sum to the ternary weight, $\mathbf{w}^{t}=\mathbf{w}_{1}^{b}+\mathbf{w}_{2}^{b}$, in addition to the quantized equivalence. See the paper for the closed-form construction.
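The two equivalence conditions can be illustrated with a small numerical sketch. The code below assumes TWN-style ternarization (threshold at a fraction of the mean absolute weight) and scaled-sign binarization with the scale set to the mean absolute weight; the split it constructs is one valid solution to both equations under these assumptions, not necessarily the exact operator from the paper. The constants `a` and `c` are solved so that the two binary scales both equal half the ternary scale.

```python
import numpy as np

def ternarize(w, delta_factor=0.7):
    """TWN-style ternarization: w_hat_i = alpha * sign(w_i) * 1[|w_i| > delta]."""
    delta = delta_factor * np.abs(w).mean()
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean()          # scale fit on the surviving entries
    return alpha * np.sign(w) * mask, alpha, mask

def binarize(w):
    """Scaled-sign binarization: w_hat = beta * sign(w), beta = mean|w|."""
    beta = np.abs(w).mean()
    return beta * np.where(w >= 0, 1.0, -1.0)

def tws_split(w, alpha, mask):
    """Split a latent ternary weight w into (w1, w2) such that
       w1 + w2 == w  and  binarize(w1) + binarize(w2) == ternarize(w).

    On nonzero ternary entries both halves keep the sign of w, so their
    quantized values add up; on zeroed entries the halves get opposite
    signs, so their quantized values cancel. That cancellation requires
    equal binary scales beta1 == beta2 == alpha / 2, which fixes a and c.
    """
    n = w.size
    m = n - mask.sum()                 # number of zeroed (pruned) entries
    S = np.abs(w[mask]).sum()          # |w| mass on nonzero entries
    T = np.abs(w[~mask]).sum()         # |w| mass on zeroed entries
    a = (S - T) / (2 * S)              # fraction of w assigned to w1 on nonzeros
    c = (n * alpha / 2 - a * S) / m    # magnitude placed on zeroed entries
    s = np.where(w >= 0, 1.0, -1.0)
    w1 = np.where(mask, a * w, c * s)  # zeroed entries: +c with the sign of w
    w2 = w - w1                        # zeroed entries: flipped sign (c > |w|)
    return w1, w2

# Demo on Gaussian weights (hypothetical example data).
rng = np.random.default_rng(0)
w = rng.normal(size=1024)
w_hat, alpha, mask = ternarize(w)
w1, w2 = tws_split(w, alpha, mask)
```

For Gaussian-like weights the solved `c` comfortably exceeds the ternary threshold, so the sign pattern the derivation relies on holds; both equivalences can then be checked numerically with `np.allclose(w1 + w2, w)` and `np.allclose(binarize(w1) + binarize(w2), w_hat)`.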
Source: BinaryBERT: Pushing the Limit of BERT Quantization