TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Cross-Domain Document Classification	Amazon cn (Dianping train)	XLMft UDA	Error rate	7.74	# 1
Cross-Domain Document Classification	Amazon en (Yelp train)	XLMft UDA	Error rate	7.57	# 1
Cross-Domain Document Classification	Dianping (Amazon cn train)	XLMft UDA	Error rate	4.64	# 1
Cross-Lingual Sentiment Classification	Dianping (Yelp train)	XLMft UDA	Error rate	4.64	# 1
Cross-Lingual Document Classification	MLDoc Zero-Shot English-to-Chinese	XLMft UDA	Accuracy	93.32	# 1
Cross-Lingual Sentiment Classification	MLDoc Zero-Shot English-to-Chinese	XLMft UDA	Error rate	7.74	# 1
Cross-Lingual Sentiment Classification	MLDoc Zero-Shot English-to-French	XLMft UDA	Error rate	5.95	# 1
Cross-Lingual Document Classification	MLDoc Zero-Shot English-to-French	XLMft UDA	Accuracy	96.05	# 1
Cross-Lingual Document Classification	MLDoc Zero-Shot English-to-German	XLMft UDA	Accuracy	96.95	# 1
Cross-Lingual Sentiment Classification	MLDoc Zero-Shot English-to-German	XLMft UDA	Error rate	6.12	# 1
Cross-Lingual Document Classification	MLDoc Zero-Shot English-to-Russian	XLMft UDA	Accuracy	89.7	# 1
Cross-Lingual Document Classification	MLDoc Zero-Shot English-to-Spanish	XLMft UDA	Accuracy	96.8	# 1
Cross-Domain Document Classification	Yelp (Amazon en train)	XLMft UDA	Error rate	3.34	# 1
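
The table mixes two metrics, error rate and accuracy, so rows are not directly comparable at a glance. The following is a minimal sketch of putting them on one scale. It assumes that all values are percentages and that each task is single-label classification, so accuracy = 100 - error rate. The helper name `to_accuracy` is hypothetical, and the three rows are copied from the table above.

```python
def to_accuracy(metric_name, value):
    """Return the value as an accuracy percentage.

    Assumption: values are percentages and the task is single-label
    classification, so accuracy = 100 - error rate.
    """
    name = metric_name.strip().lower()
    if name == "accuracy":
        return value
    if name == "error rate":
        return round(100.0 - value, 2)
    raise ValueError(f"unknown metric: {metric_name!r}")


# (dataset, metric name, metric value) rows copied from the results table.
rows = [
    ("Amazon en (Yelp train)", "Error rate", 7.57),
    ("MLDoc Zero-Shot English-to-Russian", "Accuracy", 89.7),
    ("Yelp (Amazon en train)", "Error rate", 3.34),
]

for dataset, metric, value in rows:
    print(f"{dataset}: accuracy {to_accuracy(metric, value):.2f}")
```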

Bridging the domain gap in cross-lingual document classification

16 Sep 2019 · Guokun Lai, Barlas Oguz, Yiming Yang, Veselin Stoyanov

The scarcity of labeled training data often prohibits the internationalization of NLP models to multiple languages. Recent developments in cross-lingual understanding (XLU) have made progress in this area by trying to bridge the language barrier with language-universal representations. However, even if the language problem were resolved, models trained in one language would not transfer perfectly to another language because of the natural domain drift across languages and cultures. We consider the setting of semi-supervised cross-lingual understanding, where labeled data is available in a source language (English), but only unlabeled data is available in the target language. We combine state-of-the-art cross-lingual methods with recently proposed methods for weakly supervised learning, such as unsupervised pre-training and unsupervised data augmentation, to simultaneously close both the language gap and the domain gap in XLU. We show that addressing the domain gap is crucial. We improve over strong baselines and achieve a new state of the art for cross-lingual document classification.
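
The abstract names unsupervised data augmentation (UDA) without spelling out the objective. As a rough illustration only, and not the authors' implementation, the sketch below shows the general shape of a UDA-style loss: supervised cross-entropy on labeled source-language examples, plus a consistency term (KL divergence) between the model's predictions on an unlabeled target-language example and on an augmented copy of it. The function names, the weight `lam`, and the toy logits are all assumptions made for illustration. How the augmented copy is produced is left open here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the gold label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def uda_loss(labeled, unlabeled_pairs, lam=1.0):
    """Supervised loss plus lam times the consistency loss.

    labeled: list of (logits, gold_label) for labeled source-language examples.
    unlabeled_pairs: list of (logits_original, logits_augmented) for unlabeled
        target-language examples. In a real training loop the prediction on
        the original example is treated as a fixed target (no gradient).
    """
    supervised = sum(cross_entropy(lg, y) for lg, y in labeled) / len(labeled)
    consistency = sum(
        kl_divergence(softmax(orig), softmax(aug)) for orig, aug in unlabeled_pairs
    ) / len(unlabeled_pairs)
    return supervised + lam * consistency


# Toy example: one labeled example and one unlabeled pair (made-up logits).
labeled = [([2.0, 0.0], 0)]
unlabeled_pairs = [([1.0, -1.0], [0.2, 0.1])]
print(uda_loss(labeled, unlabeled_pairs, lam=1.0))
```

When the model predicts the same distribution for an example and its augmented copy, the consistency term is zero and the loss reduces to the supervised cross-entropy.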

Code

laiguokun/xlu-data (official)

Tasks

Classification

Cross-Domain Document Classification

Cross-Lingual Document Classification

Cross-Lingual Sentiment Classification

Data Augmentation

Document Classification

General Classification

Unsupervised Pre-training

Weakly-supervised Learning

Datasets

XNLI

Yelp

MLDoc

Results from the Paper

Ranked #1 on Cross-Domain Document Classification on Amazon cn (Dianping train)
