Search Results for author: Kevin Christian Wibisono

Found 1 paper, 1 paper with code

Bidirectional Attention as a Mixture of Continuous Word Experts

1 code implementation · 8 Jul 2023 · Kevin Christian Wibisono, Yixin Wang

The key observation is that fitting a single-layer single-head bidirectional attention, upon reparameterization, is equivalent to fitting a continuous bag of words (CBOW) model with mixture-of-experts (MoE) weights.
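To make the comparison concrete, here is a minimal sketch that contrasts a CBOW prediction (a uniform average of context embeddings) with a single-head attention prediction (a softmax-gated average of per-position value vectors, read here as "experts"). It is an illustration under simplifying assumptions, not the authors' reparameterization: it omits positional encodings, the construction of the masked-token query, and training. All function names (`cbow_logits`, `attention_logits`, etc.) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def weighted_context(embs: List[Vector], weights: List[float]) -> Vector:
    # Weighted sum of context vectors, one weight per context position.
    dim = len(embs[0])
    return [sum(w * e[d] for w, e in zip(weights, embs)) for d in range(dim)]


def cbow_logits(context_embs: List[Vector], output_embs: List[Vector]) -> List[float]:
    # CBOW: every context word gets the same weight 1/n.
    n = len(context_embs)
    h = weighted_context(context_embs, [1.0 / n] * n)
    return [dot(u, h) for u in output_embs]


def attention_logits(
    query: Vector, keys: List[Vector], values: List[Vector], output_embs: List[Vector]
) -> List[float]:
    # Single-head attention: softmax(q . k_j) acts as a gate that mixes the
    # per-position value vectors ("experts") instead of averaging them uniformly.
    weights = softmax([dot(query, k) for k in keys])
    h = weighted_context(values, weights)
    return [dot(u, h) for u in output_embs]


if __name__ == "__main__":
    ctx = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy context embeddings
    out = [[1.0, 0.0], [0.0, 1.0]]              # toy output (vocabulary) embeddings
    print("CBOW:", cbow_logits(ctx, out))
    # A zero query gives equal scores, so the gate is uniform and matches CBOW.
    print("attention, uniform gate:", attention_logits([0.0, 0.0], ctx, ctx, out))
    # A non-zero query reweights the context positions.
    print("attention, peaked gate:", attention_logits([5.0, 0.0], ctx, ctx, out))
```

In this toy setting, plain CBOW falls out as the special case where the gating weights are uniform; any non-uniform softmax gate yields a data-dependent mixture over context positions.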

Language Modelling, Sentence, +1
