Sparse Mixture-of-Experts are Domain Generalizable Learners

8 Jun 2022  ·  Bo Li, Yifei Shen, Jingkang Yang, Yezhen Wang, Jiawei Ren, Tong Che, Jun Zhang, Ziwei Liu

Human visual perception can easily generalize to out-of-distribution visual data, which is far beyond the capability of modern machine learning models. Domain generalization (DG) aims to close this gap, with existing DG methods mainly focusing on the loss function design. In this paper, we propose to explore an orthogonal direction, i.e., the design of the backbone architecture. It is motivated by an empirical finding that transformer-based models trained with empirical risk minimization (ERM) outperform CNN-based models employing state-of-the-art (SOTA) DG algorithms on multiple DG datasets. We develop a formal framework to characterize a network's robustness to distribution shifts by studying its architecture's alignment with the correlations in the dataset. This analysis guides us to propose a novel DG model built upon vision transformers, namely Generalizable Mixture-of-Experts (GMoE). Extensive experiments on DomainBed demonstrate that GMoE trained with ERM outperforms SOTA DG baselines by a large margin. Moreover, GMoE is complementary to existing DG methods, and its performance is substantially improved when trained with DG algorithms.
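To make the architectural idea concrete, below is a minimal NumPy sketch of a sparse top-k mixture-of-experts feed-forward layer, the building block that MoE vision transformers such as GMoE substitute for the standard transformer FFN. This is an illustrative toy, not the paper's implementation: the class name, dimensions, and top-k routing scheme shown here are generic assumptions about how sparse MoE layers are commonly built.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SparseMoELayer:
    """Illustrative sparse top-k mixture-of-experts FFN (not the paper's code)."""

    def __init__(self, d_model=8, d_hidden=16, n_experts=4, top_k=2):
        self.top_k = top_k
        # linear gate that scores each expert per token
        self.gate = rng.standard_normal((d_model, n_experts)) * 0.1
        # each expert is a two-layer MLP, mirroring a transformer FFN block
        self.w1 = rng.standard_normal((n_experts, d_model, d_hidden)) * 0.1
        self.w2 = rng.standard_normal((n_experts, d_hidden, d_model)) * 0.1

    def __call__(self, x):
        # x: (n_tokens, d_model); compute routing probabilities per token
        scores = softmax(x @ self.gate, axis=-1)        # (n_tokens, n_experts)
        topk = np.argsort(-scores, axis=-1)[:, : self.top_k]
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            sel = topk[t]
            # only the top-k experts run for each token -- the "sparse" part
            w = scores[t, sel] / scores[t, sel].sum()   # renormalize gate weights
            for weight, e in zip(w, sel):
                h = np.maximum(x[t] @ self.w1[e], 0.0)  # ReLU hidden layer
                out[t] += weight * (h @ self.w2[e])
        return out

tokens = rng.standard_normal((5, 8))
moe = SparseMoELayer()
y = moe(tokens)
print(y.shape)  # (5, 8)
```

Because each token activates only `top_k` of the `n_experts` MLPs, compute stays roughly constant as experts are added; the routing lets different experts specialize on different visual attributes, which is the property the paper's analysis connects to robustness under distribution shift.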


Results from the Paper


Ranked #17 on Domain Generalization on DomainNet (using extra training data)

Task                   Dataset         Model          Metric            Value   Global Rank
Domain Generalization  DomainNet       Hybrid-SF-MoE  Average Accuracy  52.0    #17
Domain Generalization  DomainNet       GMoE-S/16      Average Accuracy  48.7    #22
Domain Generalization  Office-Home     GMoE-S/16      Average Accuracy  74.2    #19
Domain Generalization  PACS            GMoE-S/16      Average Accuracy  88.1    #31
Domain Generalization  TerraIncognita  GMoE-S/16      Average Accuracy  48.5    #28
Domain Generalization  VLCS            GMoE-S/16      Average Accuracy  80.2    #19
