ConvBERT is a modification of the BERT architecture that replaces some self-attention heads with span-based dynamic convolution in order to model local dependencies directly. Specifically, a new mixed attention module, which combines self-attention heads with convolution heads, replaces the self-attention modules in BERT and leverages convolution to better capture local dependencies. The span-based dynamic convolution generates each convolution kernel dynamically from a span of multiple input tokens rather than from a single token. Lastly, ConvBERT incorporates further design changes, including bottleneck attention and a grouped linear operator for the feed-forward module (which reduces the number of parameters).
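To make the span-based dynamic convolution more concrete, the following is a minimal NumPy sketch of a single convolution head, not the authors' implementation. It assumes an odd kernel size, random weights, and a plain depthwise convolution to build the span-aware key; the function name `span_dynamic_conv` and all weight names (`w_q`, `w_v`, `w_span`, `w_f`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def span_dynamic_conv(x, w_q, w_v, w_span, w_f, kernel_size):
    """Illustrative single-head span-based dynamic convolution.

    x:      (n, d) token representations
    w_q:    (d, d) query projection (assumed name)
    w_v:    (d, d) value projection (assumed name)
    w_span: (kernel_size, d) depthwise weights that summarise a span of tokens
    w_f:    (d, kernel_size) projection that produces the kernel logits
    """
    n, _ = x.shape
    pad = kernel_size // 2  # assumes an odd kernel size

    q = x @ w_q
    v = x @ w_v

    # Span-aware key: each position summarises its local window of inputs.
    x_pad = np.pad(x, ((pad, pad), (0, 0)))
    x_windows = np.stack([x_pad[i:i + kernel_size] for i in range(n)])  # (n, k, d)
    k_span = (x_windows * w_span).sum(axis=1)                            # (n, d)

    # One kernel per position, generated from the query and the span-aware key.
    kernels = softmax((q * k_span) @ w_f, axis=-1)                       # (n, k)

    # Apply each position's kernel to its local window of values.
    v_pad = np.pad(v, ((pad, pad), (0, 0)))
    v_windows = np.stack([v_pad[i:i + kernel_size] for i in range(n)])  # (n, k, d)
    out = np.einsum("nk,nkd->nd", kernels, v_windows)                    # (n, d)
    return out, kernels


# Toy usage with random inputs and weights (6 tokens, hidden size 4, span of 3).
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
w_q = rng.normal(size=(4, 4))
w_v = rng.normal(size=(4, 4))
w_span = rng.normal(size=(3, 4))
w_f = rng.normal(size=(4, 3))
out, kernels = span_dynamic_conv(x, w_q, w_v, w_span, w_f, kernel_size=3)
print(out.shape, kernels.shape)
```

In this sketch each position gets its own normalised kernel, and the output at a position depends only on tokens inside its window, which is the local behaviour that the mixed attention module pairs with global self-attention.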
Source: ConvBERT: Improving BERT with Span-based Dynamic Convolution
Task | Papers | Share |
---|---|---|
Bias Detection | 1 | 20.00% |
Automatic Speech Recognition (ASR) | 1 | 20.00% |
Punctuation Restoration | 1 | 20.00% |
Speech Recognition | 1 | 20.00% |
Natural Language Understanding | 1 | 20.00% |