Ternarization

Ternary Weight Splitting

Introduced by Bai et al. in BinaryBERT: Pushing the Limit of BERT Quantization

Ternary Weight Splitting is a ternarization approach used in BinaryBERT that exploits the flatness of the ternary loss landscape as an optimization proxy for the binary model. A half-sized ternary BERT is first trained to convergence; then both the latent full-precision weight $\mathbf{w}^{t}$ and the quantized weight $\hat{\mathbf{w}}^{t}$ are split into their binary counterparts $\mathbf{w}_{1}^{b}, \mathbf{w}_{2}^{b}$ and $\hat{\mathbf{w}}_{1}^{b}, \hat{\mathbf{w}}_{2}^{b}$ via the TWS operator. So that the split model inherits the performance of the ternary model, the TWS operator must satisfy the splitting equivalency (i.e., produce the same output given the same input):

$$ \mathbf{w}^{t}=\mathbf{w}_{1}^{b}+\mathbf{w}_{2}^{b}, \quad \hat{\mathbf{w}}^{t}=\hat{\mathbf{w}}_{1}^{b}+\hat{\mathbf{w}}_{2}^{b} $$

While the solution to the above equations is not unique, the TWS operator picks a particular split in which the latent full-precision weights after splitting, $\mathbf{w}_{1}^{b}$ and $\mathbf{w}_{2}^{b}$, are constrained to satisfy $\mathbf{w}^{t}=\mathbf{w}_{1}^{b}+\mathbf{w}_{2}^{b}$. See the paper for more details.
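To make the equivalency concrete, here is a minimal NumPy sketch of one valid split (not the paper's exact operator): nonzero ternary entries are shared equally between the two halves, while zero entries are split into a $+\beta$ / $-\beta$ pair whose quantized values cancel. The `ternarize` helper is an assumed TWN-style quantizer added for illustration.

```python
import numpy as np

def ternarize(w, t=0.4):
    """Assumed TWN-style ternarizer to {-alpha, 0, +alpha}."""
    delta = t * np.abs(w).mean()            # pruning threshold
    mask = np.abs(w) > delta                # entries kept nonzero
    alpha = np.abs(w[mask]).mean()          # shared ternary scale
    return np.where(mask, alpha * np.sign(w), 0.0)

def tws_split(w, w_hat):
    """Illustrative split satisfying w = w1 + w2 and
    w_hat = w1_hat + w2_hat (one of many valid choices)."""
    nonzero = w_hat != 0
    beta = np.abs(w_hat[nonzero]).mean() / 2   # binary scale = alpha / 2
    # Nonzero ternary entries: each half carries w/2, quantized to ±beta.
    # Zero entries: halves get ±beta so their quantized sum cancels;
    # the latent values are offset by ±beta/2 so signs match.
    w1 = np.where(nonzero, w / 2, (w + beta) / 2)
    w2 = np.where(nonzero, w / 2, (w - beta) / 2)
    w1_hat = np.where(nonzero, w_hat / 2, beta)
    w2_hat = np.where(nonzero, w_hat / 2, -beta)
    return w1, w2, w1_hat, w2_hat

rng = np.random.default_rng(0)
w = rng.normal(size=64)
w_hat = ternarize(w)
w1, w2, w1_hat, w2_hat = tws_split(w, w_hat)
assert np.allclose(w1 + w2, w)              # latent equivalence
assert np.allclose(w1_hat + w2_hat, w_hat)  # quantized equivalence
assert np.allclose(np.abs(w1_hat), np.abs(w1_hat).mean())  # truly binary
```

After the split, each half is a genuine binary weight (every quantized entry has magnitude $\beta$), and the split network reproduces the ternary network's outputs exactly before further binary fine-tuning.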

Source: BinaryBERT: Pushing the Limit of BERT Quantization
