TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Multi-Task Learning	OMNIGLOT	Mixture-of-Experts	Average Accuracy	92.19	# 2

Diversity and Depth in Per-Example Routing Models

ICLR 2019 · Prajit Ramachandran, Quoc V. Le

Routing models, a form of conditional computation where examples are routed through a subset of components in a larger network, have shown promising results in recent works. Surprisingly, routing models to date have lacked important properties, such as architectural diversity and large numbers of routing decisions. Both architectural diversity and routing depth can increase the representational power of a routing network. In this work, we address both of these deficiencies. We discuss the significance of architectural diversity in routing models, and explain the tradeoffs between capacity and optimization when increasing routing depth. In our experiments, we find that adding architectural diversity to routing models significantly improves performance, cutting the error rates of a strong baseline by 35% on an Omniglot setup. However, when scaling up routing depth, we find that modern routing techniques struggle with optimization. We conclude by discussing both the positive and negative results, and suggest directions for future research.
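As a rough illustration of the terms in the abstract, the sketch below routes each example through one component per layer, chosen from a small pool of structurally different functions ("architectural diversity"), and repeats that choice over several layers ("routing depth"). It is a toy under stated assumptions, not the authors' method: the component functions, the linear argmax router, and all names (`COMPONENTS`, `make_router`, `forward`) are hypothetical, and the router weights are random rather than trained.

```python
import math
import random

# Hypothetical pool of structurally different components. Each maps a
# vector to a vector of the same length, but with a different operation.
def identity(x):
    return list(x)

def relu(x):
    return [max(0.0, v) for v in x]

def scaled_tanh(x):
    return [math.tanh(2.0 * v) for v in x]

def neighbor_average(x):
    # Average each element with its right neighbor (wrapping around).
    n = len(x)
    return [(x[i] + x[(i + 1) % n]) / 2.0 for i in range(n)]

COMPONENTS = [identity, relu, scaled_tanh, neighbor_average]

def make_router(dim, num_components, rng):
    # Assumption: a linear router with one random weight vector per
    # component. A real routing model would learn these weights.
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_components)]

def route(x, router):
    # Hard per-example decision: index of the highest-scoring component.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router]
    return max(range(len(scores)), key=scores.__getitem__)

def forward(x, routers):
    # One routing decision per layer; len(routers) is the routing depth.
    h, path = list(x), []
    for router in routers:
        k = route(h, router)
        h = COMPONENTS[k](h)
        path.append(k)
    return h, path

rng = random.Random(0)
dim, depth = 4, 3
routers = [make_router(dim, len(COMPONENTS), rng) for _ in range(depth)]
examples = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(5)]
results = [forward(x, routers) for x in examples]

for _, path in results:
    # Each example may take a different path through the components.
    print("components used per layer:", path)
```

With `depth` layers and `len(COMPONENTS)` choices per layer, the number of possible paths grows exponentially in depth, which hints at why deeper routing adds capacity while also making the routing decisions harder to optimize.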
