no code implementations • 26 May 2024 • Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers
The sparsely gated mixture-of-experts (MoE) architecture sends different inputs to different subnetworks, i.e., experts, through trainable routers.
1 code implementation • 7 Jun 2023 • Mohammed Nowaz Rabbani Chowdhury, Shuai Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen
In deep learning, mixture-of-experts (MoE) activates one or a few experts (subnetworks) on a per-sample or per-token basis, resulting in a significant reduction in computation.
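Both entries above concern the same mechanism: a trainable router scores every expert for each token and only the top-scoring experts are evaluated. Below is a minimal sketch of such a sparsely gated top-k MoE layer in PyTorch, not the authors' implementation; the class name `SparseMoE`, the feed-forward expert design, and parameters such as `num_experts` and `top_k` are illustrative assumptions.

```python
# Sketch of a sparsely gated mixture-of-experts layer with a trainable
# top-k router (illustrative; not the implementation from the papers above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Trainable router: maps each token to one score per expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is a small feed-forward subnetwork.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model); routing is decided per token.
        scores = self.router(x)                               # (num_tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)   # keep only k experts per token
        gates = F.softmax(top_vals, dim=-1)                   # normalize over the selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_pos, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_pos.numel() == 0:
                continue  # expert e received no tokens, so its parameters are never used
            out[token_pos] += gates[token_pos, slot].unsqueeze(-1) * expert(x[token_pos])
        return out


# Example: route 16 tokens of width 32 through 8 experts, activating 2 per token.
tokens = torch.randn(16, 32)
layer = SparseMoE(d_model=32, d_hidden=64)
print(layer(tokens).shape)  # torch.Size([16, 32])
```

Because only `top_k` of the `num_experts` subnetworks run for each token, the per-token compute stays roughly constant as more experts are added, which is the computation reduction the second abstract refers to.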