Early Exiting BERT for Efficient Document Ranking

Pre-trained language models such as BERT have shown their effectiveness in various tasks. Despite their power, they are known to be computationally intensive, which hinders real-world applications. In this paper, we introduce early exiting BERT for document ranking. With a slight modification, BERT becomes a model with multiple output paths, and each inference sample can exit early from these paths. In this way, computation can be effectively allocated among samples, and overall system latency is significantly reduced while the original quality is maintained. Our experiments on two document ranking datasets demonstrate up to 2.5x inference speedup with minimal quality degradation. The source code of our implementation can be found at https://github.com/castorini/earlyexiting-monobert.

PDF Abstract

Datasets


Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here