Ternary Weight Splitting (TWS) is the ternarization-based initialization used in BinaryBERT: it exploits the flatness of the ternary loss landscape as an optimization proxy for the binary model. We first train a half-sized ternary BERT to convergence, and then split both the latent full-precision weights $\mathbf{w}^{t}$ and the quantized weights $\hat{\mathbf{w}}^{t}$ into their binary counterparts $\mathbf{w}_{1}^{b}, \mathbf{w}_{2}^{b}$ and $\hat{\mathbf{w}}_{1}^{b}, \hat{\mathbf{w}}_{2}^{b}$ via the TWS operator. To inherit the performance of the ternary model after splitting, the TWS operator must satisfy the splitting equivalency (i.e., produce the same output given the same input):
$$ \mathbf{w}^{t}=\mathbf{w}_{1}^{b}+\mathbf{w}_{2}^{b}, \quad \hat{\mathbf{w}}^{t}=\hat{\mathbf{w}}_{1}^{b}+\hat{\mathbf{w}}_{2}^{b} $$
While the solution to the above equations is not unique, the splitting is made well-defined by constraining the latent full-precision weights $\mathbf{w}_{1}^{b}, \mathbf{w}_{2}^{b}$ to sum to the ternary weight, $\mathbf{w}^{t}=\mathbf{w}_{1}^{b}+\mathbf{w}_{2}^{b}$, in addition to the quantized equivalence. See the paper for the closed-form construction.
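The two equivalence conditions can be illustrated with a small numerical sketch. The code below assumes TWN-style ternarization (threshold at a fraction of the mean absolute weight) and scaled-sign binarization with the scale set to the mean absolute weight; the split it constructs is one valid solution to both equations under these assumptions, not necessarily the exact operator from the paper. The constants `a` and `c` are solved so that the two binary scales both equal half the ternary scale.

```python
import numpy as np

def ternarize(w, delta_factor=0.7):
    """TWN-style ternarization: w_hat_i = alpha * sign(w_i) * 1[|w_i| > delta]."""
    delta = delta_factor * np.abs(w).mean()
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean()          # scale fit on the surviving entries
    return alpha * np.sign(w) * mask, alpha, mask

def binarize(w):
    """Scaled-sign binarization: w_hat = beta * sign(w), beta = mean|w|."""
    beta = np.abs(w).mean()
    return beta * np.where(w >= 0, 1.0, -1.0)

def tws_split(w, alpha, mask):
    """Split a latent ternary weight w into (w1, w2) such that
       w1 + w2 == w  and  binarize(w1) + binarize(w2) == ternarize(w).

    On nonzero ternary entries both halves keep the sign of w, so their
    quantized values add up; on zeroed entries the halves get opposite
    signs, so their quantized values cancel. That cancellation requires
    equal binary scales beta1 == beta2 == alpha / 2, which fixes a and c.
    """
    n = w.size
    m = n - mask.sum()                 # number of zeroed (pruned) entries
    S = np.abs(w[mask]).sum()          # |w| mass on nonzero entries
    T = np.abs(w[~mask]).sum()         # |w| mass on zeroed entries
    a = (S - T) / (2 * S)              # fraction of w assigned to w1 on nonzeros
    c = (n * alpha / 2 - a * S) / m    # magnitude placed on zeroed entries
    s = np.where(w >= 0, 1.0, -1.0)
    w1 = np.where(mask, a * w, c * s)  # zeroed entries: +c with the sign of w
    w2 = w - w1                        # zeroed entries: flipped sign (c > |w|)
    return w1, w2

# Demo on Gaussian weights (hypothetical example data).
rng = np.random.default_rng(0)
w = rng.normal(size=1024)
w_hat, alpha, mask = ternarize(w)
w1, w2 = tws_split(w, alpha, mask)
```

For Gaussian-like weights the solved `c` comfortably exceeds the ternary threshold, so the sign pattern the derivation relies on holds; both equivalences can then be checked numerically with `np.allclose(w1 + w2, w)` and `np.allclose(binarize(w1) + binarize(w2), w_hat)`.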
Source: BinaryBERT: Pushing the Limit of BERT Quantization