I-BERT is a quantized version of BERT that performs the entire inference with integer-only arithmetic. Building on lightweight integer-only approximation methods for nonlinear operations, e.g., GELU, Softmax, and Layer Normalization, it runs end-to-end integer-only BERT inference without any floating-point calculation.
In particular, GELU and Softmax are approximated with lightweight second-order polynomials that can be evaluated with integer-only arithmetic. For LayerNorm, integer-only computation is achieved by leveraging a known algorithm for the integer calculation of square roots.
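The sketch below illustrates both ideas with plain Python integers: a second-order polynomial evaluated entirely in integer arithmetic (using the erf-based GELU constants reported in the I-BERT paper), and Newton's iteration for the integer square root used in LayerNorm. The function names and the `(q, scale)` convention are illustrative assumptions, not the official I-BERT API.

```python
import math

# Minimal sketch of I-BERT's integer-only building blocks.
# Helper names and the (q, scale) convention are illustrative
# assumptions, not the official implementation's API.

def i_poly(q, scale, a, b, c):
    """Evaluate a*(x + b)^2 + c where x = q * scale, using only
    integer arithmetic at run time. The floating-point constants
    b and c are folded into integers offline."""
    q_b = math.floor(b / scale)                # precomputed offline
    q_c = math.floor(c / (a * scale ** 2))     # precomputed offline
    q_out = (q + q_b) ** 2 + q_c               # integer-only at run time
    scale_out = a * scale ** 2                 # output scale, tracked offline
    return q_out, scale_out

def i_erf(q, scale):
    """Second-order approximation of erf(x):
    sign(x) * [a*(min(|x|, -b) + b)^2 + 1], with a = -0.2888,
    b = -1.769 (constants as fitted in the I-BERT paper)."""
    a, b, c = -0.2888, -1.769, 1.0
    q_sign = 1 if q >= 0 else -1
    q_clip = min(abs(q), math.floor(-b / scale))  # integer clipping
    q_out, scale_out = i_poly(q_clip, scale, a, b, c)
    return q_sign * q_out, scale_out

def i_gelu(q, scale):
    """GELU(x) = x/2 * (1 + erf(x / sqrt(2))) with integer-only ops."""
    q_erf, scale_erf = i_erf(q, scale / math.sqrt(2.0))
    q_one = math.floor(1.0 / scale_erf)        # integer encoding of 1.0
    q_out = q * (q_erf + q_one)
    scale_out = scale * scale_erf / 2.0
    return q_out, scale_out

def i_sqrt(n):
    """floor(sqrt(n)) via Newton's iteration, as used for the
    variance term in integer-only LayerNorm."""
    if n == 0:
        return 0
    x = 1 << ((n.bit_length() + 1) // 2)       # initial guess >= sqrt(n)
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y

if __name__ == "__main__":
    scale = 0.01                    # example quantization scale
    q = round(1.0 / scale)          # x = 1.0 as an integer
    q_out, s_out = i_gelu(q, scale)
    print(q_out * s_out)            # ~0.835 vs. exact GELU(1.0) = 0.8413
    print(i_sqrt(2_000_000))        # 1414
```

All run-time operations here are integer additions, multiplications, comparisons, and shifts; the floating-point scales are only used offline to fold constants and to interpret the final outputs.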
Source: I-BERT: Integer-only BERT Quantization
| Task | Papers | Share |
|---|---|---|
| Natural Language Inference | 1 | 33.33% |
| Natural Language Understanding | 1 | 33.33% |
| Quantization | 1 | 33.33% |