DeeBERT is a method for accelerating BERT inference. It inserts extra classification layers (referred to as off-ramps) between the transformer layers of BERT. All transformer layers and off-ramps are jointly fine-tuned on a given downstream dataset. At inference time, after a sample passes through a transformer layer, it is fed to the following off-ramp. If the off-ramp is confident in its prediction, the result is returned immediately; otherwise, the sample is sent on to the next transformer layer.
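The inference loop above can be sketched with toy stand-ins for the transformer layers and off-ramps. The paper measures confidence as the entropy of the off-ramp's output distribution; the weights, dimensions, and threshold below are illustrative assumptions, not the fine-tuned model.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, NUM_CLASSES, NUM_LAYERS = 16, 3, 6

# Toy stand-ins for BERT's transformer layers and the per-layer
# off-ramp classifiers (DeeBERT fine-tunes both jointly).
layer_weights = [rng.standard_normal((HIDDEN, HIDDEN)) * 0.1
                 for _ in range(NUM_LAYERS)]
ramp_weights = [rng.standard_normal((HIDDEN, NUM_CLASSES))
                for _ in range(NUM_LAYERS)]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

def deebert_infer(x, threshold=0.5):
    """Run layers in order; exit at the first off-ramp whose
    prediction entropy falls below `threshold`."""
    for i in range(NUM_LAYERS):
        x = np.tanh(x @ layer_weights[i])      # stand-in transformer layer
        probs = softmax(x @ ramp_weights[i])   # off-ramp classifier
        if entropy(probs) < threshold:         # confident enough: exit early
            return int(probs.argmax()), i + 1  # prediction, layers used
    return int(probs.argmax()), NUM_LAYERS     # final off-ramp always answers

pred, layers_used = deebert_infer(rng.standard_normal(HIDDEN))
```

Lowering the threshold trades speed for accuracy: a stricter (smaller) entropy threshold forces samples through more layers, while a looser one lets more samples exit early.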
Source: DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
| Task | Papers | Share |
|---|---|---|
| Natural Language Inference | 1 | 33.33% |
| Paraphrase Identification | 1 | 33.33% |
| Natural Language Understanding | 1 | 33.33% |