26 May 2024 • Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers
The sparsely gated mixture of experts (MoE) architecture sends different inputs to different subnetworks, i.e., experts, through trainable routers.
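To make the routing mechanism concrete, the following is a minimal sketch of a sparsely gated MoE layer in PyTorch. The expert width, number of experts, and top-k value are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    """Minimal sparsely gated MoE layer: a trainable router sends each
    token to a few expert subnetworks (illustrative sketch)."""

    def __init__(self, d_model: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Trainable router: scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is a small feed-forward subnetwork.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                              # (num_tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)  # keep only top-k experts per token
        gates = F.softmax(top_vals, dim=-1)                   # renormalize gates over the selected experts
        out = torch.zeros_like(x)
        # Each token is processed only by its top-k experts (sparse activation).
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += gates[mask, slot:slot + 1] * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SparseMoE(d_model=16)
    tokens = torch.randn(4, 16)
    print(layer(tokens).shape)  # torch.Size([4, 16])
```

The key point of the design is that, although many experts exist, each input activates only its top-k experts, so compute per token stays roughly constant while model capacity grows with the number of experts.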