2 code implementations • 16 Dec 2024 • Jingze Shi, Bingheng Wu
To make the foundation model more efficient and effective, our idea is to combine sequence transformation and state transformation.
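As an illustration only (not the paper's implementation), the sketch below pairs a token-mixing step, standing in for sequence transformation, with a per-position channel-mixing step, standing in for state transformation, inside one residual block; the class and parameter names are hypothetical.

```python
# Minimal sketch, assuming "sequence transformation" = token mixing over the
# time axis and "state transformation" = per-position channel mixing.
# Not the authors' architecture; names and sizes are illustrative.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):  # hypothetical name
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.seq_mix = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.state_mix = nn.Sequential(      # channel mixing ("state transformation")
            nn.Linear(d_model, 4 * d_model),
            nn.SiLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sequence transformation: tokens exchange information along the sequence.
        h = self.norm1(x)
        causal = torch.triu(
            torch.ones(x.size(1), x.size(1), dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        attn, _ = self.seq_mix(h, h, h, attn_mask=causal)
        x = x + attn
        # State transformation: each position's hidden state is remapped independently.
        return x + self.state_mix(self.norm2(x))
```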
no code implementations • 24 Jul 2024 • Jingze Shi, Bingheng Wu, Lu He, Luchang Jiang
We prove the applicability of inner-product-form position encoding in the state space duality algorithm and study the effectiveness of different position embeddings in the hybrid quadratic causal self-attention and state space duality algorithms.
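For illustration (not the paper's exact formulation), the sketch below shows a rotary-style inner-product-form position encoding: position enters only through inner products of rotated vectors, so scores depend on relative offsets. Applying the same rotation to the SSD projections that play the roles of queries and keys is an assumption made here for the example.

```python
# Minimal sketch of a rotary-style, inner-product-form position encoding.
import torch

def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (batch, seq_len, dim), dim must be even.
    _, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)      # (half,)
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs[None]   # (t, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each channel pair by a position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Position information now lives only in the inner products of rotated
# vectors, so each score depends on the relative offset between positions.
q = rotary_embed(torch.randn(2, 16, 64))
k = rotary_embed(torch.randn(2, 16, 64))
scores = q @ k.transpose(-1, -2)
```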
1 code implementation • 24 Jun 2024 • Jingze Shi, Ting Xie, Bingheng Wu, Chunjun Zheng, Kai Wang
Recent research has shown that combining the Mamba architecture, with its selective state space mechanism, and the Transformer architecture, with its quadratic self-attention mechanism, outperforms using either Mamba or the Transformer alone on language modeling tasks.
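As a hedged sketch of the general idea (not the paper's architecture), the code below interleaves a toy selective state space layer with quadratic causal self-attention layers. The toy SSM uses a slow sequential scan purely for readability, whereas real Mamba uses a hardware-aware parallel scan; all names, depths, and sizes here are illustrative.

```python
# Minimal sketch: alternate a toy selective SSM block with a quadratic causal
# self-attention block. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySelectiveSSM(nn.Module):
    """Input-dependent (selective) state space recurrence, sequential for clarity."""
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A = nn.Parameter(-torch.rand(d_model, d_state))  # negative decay rates
        self.B = nn.Linear(d_model, d_state)
        self.C = nn.Linear(d_model, d_state)
        self.dt = nn.Linear(d_model, d_model)                 # input-dependent step size

    def forward(self, x: torch.Tensor) -> torch.Tensor:       # x: (B, T, D)
        bsz, t, d = x.shape
        h = x.new_zeros(bsz, d, self.A.size(1))
        ys = []
        for i in range(t):                                    # slow scan, illustrative only
            xi = x[:, i]
            dt = F.softplus(self.dt(xi)).unsqueeze(-1)        # (B, D, 1)
            h = torch.exp(dt * self.A) * h + dt * self.B(xi).unsqueeze(1) * xi.unsqueeze(-1)
            ys.append((h * self.C(xi).unsqueeze(1)).sum(-1))  # (B, D)
        return torch.stack(ys, dim=1)

class CausalAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool,
                                     device=x.device), diagonal=1)
        out, _ = self.attn(x, x, x, attn_mask=mask)           # quadratic causal attention
        return out

class Residual(nn.Module):
    def __init__(self, d_model: int, inner: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.inner = inner

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.inner(self.norm(x))

def hybrid_stack(d_model: int = 256, depth: int = 6) -> nn.Sequential:
    # Even layers use the selective SSM, odd layers use quadratic attention;
    # the interleaving ratio is a design choice, not prescribed by the paper.
    return nn.Sequential(*[
        Residual(d_model,
                 ToySelectiveSSM(d_model) if i % 2 == 0 else CausalAttention(d_model))
        for i in range(depth)
    ])
```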