The Routing Transformer is a Transformer that endows self-attention with a sparse routing module based on online k-means. Each attention module considers a clustering of the space: the current timestep only attends to context belonging to the same cluster. In other words, the current timestep's query is routed to a limited number of context elements through its cluster assignment.
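The routing idea above can be sketched in a few lines. The snippet below is a minimal, illustrative NumPy version, not the paper's implementation: it assumes tied query/key representations `x` and a set of centroids maintained elsewhere by online k-means, and it omits details such as balanced cluster sizes, causal masking, and the exponential-moving-average centroid updates used in practice.

```python
import numpy as np

def routing_attention(x, centroids):
    """Content-based sparse attention via k-means routing (sketch).

    x: (n, d) token representations, used as queries, keys, and values
       (the paper ties query and key projections so they share one clustering).
    centroids: (k, d) cluster centroids, assumed to be updated by
       online k-means during training (not shown here).
    """
    n, d = x.shape
    # Route every position to its nearest centroid.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (n, k)
    assign = dists.argmin(-1)                                       # (n,)

    out = np.zeros_like(x)
    for c in range(len(centroids)):
        idx = np.where(assign == c)[0]
        if idx.size == 0:
            continue
        # Each query attends only to positions in its own cluster.
        q = k = v = x[idx]
        scores = q @ k.T / np.sqrt(d)
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)
        out[idx] = weights @ v
    return out
```

Because each of the n positions attends to roughly n/k others, the cost per attention layer drops from O(n²) toward O(n²/k), which is the source of the efficiency gain.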
Source: Efficient Content-Based Sparse Attention with Routing Transformers
| Task | Papers | Share |
|---|---|---|
| Zero-Shot Learning | 1 | 12.50% |
| Long Form Question Answering | 1 | 12.50% |
| Open-Domain Dialog | 1 | 12.50% |
| Open-Domain Question Answering | 1 | 12.50% |
| Question Answering | 1 | 12.50% |
| Text Generation | 1 | 12.50% |
| Image Generation | 1 | 12.50% |
| Language Modelling | 1 | 12.50% |