Mixture-of-Experts

464 papers with code • 0 benchmarks • 0 datasets

Mixture-of-Experts (MoE) models combine multiple expert subnetworks through a gating (routing) network that selects or weights experts per input. This enables conditional computation: model capacity can grow with the number of experts while the compute spent per example stays roughly constant.

Libraries

Use these libraries to find Mixture-of-Experts models and implementations
See all 9 libraries.

Most implemented papers

Distilling the Knowledge in a Neural Network

labmlai/annotated_deep_learning_paper_implementations 9 Mar 2015

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions.
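At its core, the paper's distillation recipe trains a compact student to match the temperature-softened class probabilities of a larger teacher (or ensemble) alongside the usual hard-label loss. A minimal sketch in PyTorch; the hyperparameter names (temperature T, mixing weight alpha) are illustrative and not taken from the linked repository:

```python
# Sketch of the knowledge-distillation objective: soft targets from the
# teacher plus the usual cross-entropy on the hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions.
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```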

Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts

shenweichen/DeepCTR 19 Jul 2018

In this work, we propose a novel multi-task learning approach, Multi-gate Mixture-of-Experts (MMoE), which explicitly learns to model task relationships from data.
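MMoE shares a single pool of experts across tasks but gives each task its own softmax gate over those experts, so tasks can reuse or ignore experts depending on how related they are. A minimal PyTorch sketch of that structure; layer sizes and names are illustrative, not taken from the DeepCTR code:

```python
# Minimal Multi-gate Mixture-of-Experts block: shared experts, one gate
# and one output tower per task.
import torch
import torch.nn as nn

class MMoE(nn.Module):
    def __init__(self, in_dim, expert_dim, num_experts=4, num_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU())
             for _ in range(num_experts)]
        )
        # One gating network per task, producing mixture weights over experts.
        self.gates = nn.ModuleList(
            [nn.Linear(in_dim, num_experts) for _ in range(num_tasks)]
        )
        self.towers = nn.ModuleList(
            [nn.Linear(expert_dim, 1) for _ in range(num_tasks)]
        )

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)           # (B, E, 1)
            mixed = (w * expert_out).sum(dim=1)                        # (B, D)
            outputs.append(tower(mixed))
        return outputs  # one prediction per task
```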

Gated Multimodal Units for Information Fusion

johnarevalo/gmu-mmimdb 7 Feb 2017

The Gated Multimodal Unit (GMU) model is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities.
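For two modalities, a GMU projects each input through a tanh nonlinearity and blends the results with a sigmoid gate computed from both inputs. A minimal sketch, with dimensions and names chosen for illustration rather than copied from the repository:

```python
# Bimodal Gated Multimodal Unit: a learned gate z interpolates between
# the two modality representations, per hidden feature.
import torch
import torch.nn as nn

class GMU(nn.Module):
    def __init__(self, dim_a, dim_b, hidden):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, hidden)
        self.proj_b = nn.Linear(dim_b, hidden)
        self.gate = nn.Linear(dim_a + dim_b, hidden)

    def forward(self, a, b):
        h_a = torch.tanh(self.proj_a(a))
        h_b = torch.tanh(self.proj_b(b))
        z = torch.sigmoid(self.gate(torch.cat([a, b], dim=-1)))
        return z * h_a + (1.0 - z) * h_b
```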

No Language Left Behind: Scaling Human-Centered Machine Translation

facebookresearch/fairseq Meta AI 2022

Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today.

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

tensorflow/mesh 11 Jan 2021

We design models based on T5-Base and T5-Large to obtain up to 7x increases in pre-training speed with the same computational resources.
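Switch Transformers simplify MoE routing to top-1: each token is dispatched to a single expert, and the expert output is scaled by the router probability so the router still receives gradient. A minimal sketch of that routing step; expert capacity limits and the load-balancing loss are omitted, and names are illustrative rather than taken from tensorflow/mesh:

```python
# Top-1 ("switch") routing over a set of feed-forward experts.
import torch
import torch.nn as nn

class SwitchLayer(nn.Module):
    def __init__(self, d_model, d_ff, num_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x):                      # x: (tokens, d_model)
        probs = torch.softmax(self.router(x), dim=-1)
        top_p, top_idx = probs.max(dim=-1)     # exactly one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                # Scale by the router probability so the router gets gradient.
                out[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return out
```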

Qwen2 Technical Report

qwenlm/qwen2 15 Jul 2024

This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.

Qwen2.5 Technical Report

qwenlm/qwen2.5 19 Dec 2024

In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio.

Mixtral of Experts

hit-scir/chinese-mixtral-8x7b 8 Jan 2024

In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

davidmrau/mixture-of-experts 23 Jan 2017

In this work, we address these challenges and finally realize the promise of conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters.
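The layer at the heart of this paper is a top-k softmax gate that activates only a few experts per example, so total parameter count grows with the number of experts while per-example compute stays roughly flat. A minimal dense-loop sketch of top-k gating; the paper's noise term and load-balancing losses are omitted, and names are illustrative:

```python
# Sparsely-gated MoE layer: route each example to its top-k experts and
# mix their outputs with renormalized gate weights.
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, d_in, d_hidden, num_experts=16, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_in, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_in))
             for _ in range(num_experts)]
        )

    def forward(self, x):                                   # x: (batch, d_in)
        logits = self.gate(x)
        top_val, top_idx = logits.topk(self.k, dim=-1)
        weights = torch.softmax(top_val, dim=-1)            # renormalize over top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = top_idx[:, slot]
            w = weights[:, slot].unsqueeze(-1)
            for e in range(len(self.experts)):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * self.experts[e](x[mask])
        return out
```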

Robust Federated Learning by Mixture of Experts

etesami/MOE-FL 23 Apr 2021

We present a novel weighted average model based on the mixture of experts (MoE) concept to provide robustness in federated learning (FL) against poisoned, corrupted, or outdated local models.
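The aggregation idea is a weighted average of client models rather than a uniform one, with weights intended to down-weight poisoned or stale updates. The sketch below shows only the generic weighted-averaging step on the server; how MOE-FL actually derives the per-client weights is not reproduced here, and all names are assumptions:

```python
# Illustrative server-side step: combine client state dicts with
# per-client weights (e.g., trust or gating scores computed elsewhere).
import torch

def weighted_average(client_state_dicts, weights):
    """Average client parameters with per-client weights normalized to sum to 1."""
    total = sum(weights)
    weights = [w / total for w in weights]
    avg = {}
    for name in client_state_dicts[0]:
        avg[name] = sum(w * sd[name].float()
                        for w, sd in zip(weights, client_state_dicts))
    return avg
```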