Learning Word Vectors for 157 Languages

Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance. A key ingredient to the successful application of these representations is to train them on very large corpora, and use these pre-trained models in downstream tasks. In this paper, we describe how we trained such high-quality word representations for 157 languages. We used two sources of data to train these models: the free online encyclopedia Wikipedia and data from the Common Crawl project. We also introduce three new word analogy datasets to evaluate these word vectors, for French, Hindi and Polish. Finally, we evaluate our pre-trained word vectors on 10 languages for which evaluation datasets exist, showing very strong performance compared to previous models.
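The pre-trained vectors described in the abstract are distributed in plain-text (.vec) and binary (.bin) formats on fasttext.cc. Below is a minimal sketch of how one might load a set of the published vectors and answer a word analogy query of the kind used in the evaluation datasets, using gensim; the file name cc.fr.300.vec, the vocabulary limit, and the analogy triple are illustrative assumptions rather than details taken from the paper.

```python
# Sketch: load the published French vectors and run a 3CosAdd word analogy query.
from gensim.models import KeyedVectors

# Load the text-format vectors (large file; limit keeps only the most frequent words).
vectors = KeyedVectors.load_word2vec_format("cc.fr.300.vec", binary=False, limit=500_000)

# 3CosAdd analogy: "paris" is to "france" as ? is to "italie",
# i.e. the word maximizing cos(w, paris - france + italie).
for word, score in vectors.most_similar(positive=["paris", "italie"],
                                        negative=["france"], topn=5):
    print(f"{word}\t{score:.3f}")
```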

Published at LREC 2018.

Results from the Paper


Task: Only Connect Walls Dataset Task 1 (Grouping); Dataset: OCW

| Model | Metric Name | Metric Value | Global Rank |
|-------|-------------|--------------|-------------|
| FastText (Crawl) | Wasserstein Distance (WD) | 84.2 ± .5 | # 12 |
| FastText (Crawl) | # Correct Groups | 80 ± 4 | # 13 |
| FastText (Crawl) | Fowlkes Mallows Score (FMS) | 32.1 ± .3 | # 13 |
| FastText (Crawl) | Adjusted Rand Index (ARI) | 15.2 ± .3 | # 13 |
| FastText (Crawl) | Adjusted Mutual Information (AMI) | 18.4 ± .4 | # 13 |
| FastText (Crawl) | # Solved Walls | 0 ± 0 | # 10 |
| FastText (News) | Wasserstein Distance (WD) | 85.5 ± .5 | # 15 |
| FastText (News) | # Correct Groups | 62 ± 3 | # 16 |
| FastText (News) | Fowlkes Mallows Score (FMS) | 30.4 ± .2 | # 15 |
| FastText (News) | Adjusted Rand Index (ARI) | 13.0 ± .2 | # 15 |
| FastText (News) | Adjusted Mutual Information (AMI) | 15.8 ± .3 | # 15 |
| FastText (News) | # Solved Walls | 0 ± 0 | # 10 |
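The grouping metrics in the table (ARI, AMI, FMS) are standard clustering-agreement scores, so a fastText baseline for this task can be sketched as: embed the 16 clues of a wall, cluster them into 4 groups, and compare the predicted grouping to the gold one. The snippet below is only a sketch under stated assumptions: it uses the English Common Crawl vectors (cc.en.300.bin), an invented example wall, and plain k-means, which, unlike the actual task, does not force groups of exactly four clues; the official OCW evaluation protocol may differ.

```python
# Minimal sketch (not the official OCW evaluation): embed clues with fastText,
# cluster into 4 groups, and score against the gold grouping.
import numpy as np
import fasttext
from sklearn.cluster import KMeans
from sklearn.metrics import (adjusted_rand_score,
                             adjusted_mutual_info_score,
                             fowlkes_mallows_score)

model = fasttext.load_model("cc.en.300.bin")    # pre-trained Common Crawl vectors

clues = ["bass", "pike", "perch", "carp",       # fish
         "rock", "jazz", "folk", "soul",        # music genres
         "hammer", "anvil", "stirrup", "drum",  # parts of the ear
         "red", "dead", "black", "coral"]       # ___ Sea
gold = np.repeat(np.arange(4), 4)               # gold group index for each clue

embeddings = np.stack([model.get_word_vector(w) for w in clues])
pred = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embeddings)

print("ARI:", adjusted_rand_score(gold, pred))
print("AMI:", adjusted_mutual_info_score(gold, pred))
print("FMS:", fowlkes_mallows_score(gold, pred))
```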
