Search Results for author: Bingheng Wu

Found 3 papers, 2 papers with code

Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture

2 code implementations · 16 Dec 2024 · Jingze Shi, Bingheng Wu

To make the foundation model more efficient and effective, our idea is to combine sequence transformation and state transformation.

Mixture-of-Experts · Position · +1
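The snippet above only names the idea of pairing a sequence transformation with a state transformation. As a rough illustration, not the paper's actual architecture, the PyTorch sketch below stacks a causal self-attention layer (standing in for the sequence transformation) with a toy gated linear recurrence (standing in for the state transformation); all class names, dimensions, and the recurrence itself are assumptions made for illustration.

```python
# Minimal sketch of "sequence transformation + state transformation" in one block.
# The state transformation here is a simple gated linear recurrence standing in
# for a selective state-space layer; it is NOT the paper's algorithm.
import torch
import torch.nn as nn


class StateTransformation(nn.Module):
    """Toy state-space-style layer: h_t = a_t * h_{t-1} + b_t * x_t."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate_a = nn.Linear(dim, dim)  # how much past state to keep
        self.gate_b = nn.Linear(dim, dim)  # how much new input to write

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        a = torch.sigmoid(self.gate_a(x))
        b = torch.sigmoid(self.gate_b(x))
        h = torch.zeros_like(x[:, 0])
        outputs = []
        for t in range(x.size(1)):          # sequential scan over time steps
            h = a[:, t] * h + b[:, t] * x[:, t]
            outputs.append(h)
        return torch.stack(outputs, dim=1)


class HybridBlock(nn.Module):
    """Causal self-attention (sequence transformation) followed by the
    recurrent layer (state transformation), each with a residual connection."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ssm = StateTransformation(dim)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + attn_out                     # sequence transformation
        x = x + self.ssm(self.norm2(x))      # state transformation
        return x


if __name__ == "__main__":
    block = HybridBlock(dim=64)
    tokens = torch.randn(2, 16, 64)          # (batch, seq_len, dim)
    print(block(tokens).shape)               # torch.Size([2, 16, 64])
```

The point being illustrated is only the division of labor: the attention path mixes tokens across positions, while the recurrent path carries a compressed per-step state forward.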

Wonderful Matrices: More Efficient and Effective Architecture for Language Modeling Tasks

no code implementations · 24 Jul 2024 · Jingze Shi, Bingheng Wu, Lu He, Luchang Jiang

We prove that position encoding in inner-product form is applicable in the state space dual algorithm, and we study the effectiveness of different position embeddings in the hybrid quadratic causal self-attention and state space dual algorithms.

Language Modeling · Language Modelling · +3
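As a hedged illustration of what an inner-product-form position encoding can mean, the rotary-style construction below makes the attention score depend on positions only through an inner product that is a function of the relative offset. The snippet does not state that the paper uses exactly this form, so treat it as a generic example.

```latex
% One standard instance of an inner-product-form position encoding
% (rotary-style rotations); whether the paper uses this exact form is not
% stated in the snippet above. Each position m applies an orthogonal
% rotation R_m satisfying R_m^\top R_n = R_{n-m}, so the score depends on
% positions only through the offset n - m:
\[
  q_m = R_m W_q x_m, \qquad k_n = R_n W_k x_n,
\]
\[
  \langle q_m, k_n \rangle
    = (R_m W_q x_m)^\top (R_n W_k x_n)
    = (W_q x_m)^\top R_{n-m} (W_k x_n).
\]
```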

OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser

1 code implementation · 24 Jun 2024 · Jingze Shi, Ting Xie, Bingheng Wu, Chunjun Zheng, Kai Wang

Recent research has shown that combining the Mamba architecture, with its selective state space, and the Transformer architecture, with its quadratic self-attention mechanism, outperforms using either architecture alone in language modeling tasks.

Language Modeling · Language Modelling · +3
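The OTCE title also names a mixture-of-experts component, but the snippet above does not describe its "cross domain" routing. The sketch below therefore shows only a generic top-1 routed MoE feed-forward layer; all names and sizes are chosen for illustration rather than taken from the paper.

```python
# Generic top-1 mixture-of-experts feed-forward layer, included only to
# illustrate the MoE component named in the OTCE title; this is NOT the
# paper's cross-domain routing, and all names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 128):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)   # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) -> flatten tokens for per-token routing
        tokens = x.reshape(-1, x.size(-1))
        gates = F.softmax(self.router(tokens), dim=-1)   # (tokens, experts)
        weight, choice = gates.max(dim=-1)               # top-1 routing
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = choice == i
            if mask.any():
                out[mask] = weight[mask, None] * expert(tokens[mask])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = MoEFeedForward(dim=64)
    print(layer(torch.randn(2, 16, 64)).shape)   # torch.Size([2, 16, 64])
```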
