Word Embeddings

1106 papers with code • 0 benchmarks • 52 datasets

Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) in which words or phrases from a vocabulary are mapped to vectors of real numbers.

Techniques for learning word embeddings include Word2Vec, GloVe, and neural network-based approaches that train on an NLP task such as language modeling or document classification.
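
As a rough illustration of what "mapped to vectors of real numbers" means in practice, the sketch below looks up words in a tiny hand-written table and compares them by cosine similarity. The three-dimensional vectors and the names `EMBEDDINGS`, `cosine_similarity`, and `most_similar` are made up for this example; they do not come from Word2Vec, GloVe, or any trained model.

```python
import math

# Hypothetical toy embeddings, written by hand for illustration only.
# Trained models typically use hundreds of dimensions.
EMBEDDINGS = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings=EMBEDDINGS):
    """Return the other word whose vector is closest to `word` by cosine."""
    query = embeddings[word]
    candidates = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return max(candidates, key=lambda pair: pair[1])[0]


print(most_similar("king"))
```

With these toy values, "king" lands nearer to "queen" than to "apple", which is the kind of geometric relationship that learned embeddings are meant to capture.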

(Image credit: Dynamic Word Embedding for Evolving Semantic Discovery)

Benchmarks

No evaluation results yet.

Most implemented papers

Enriching Word Vectors with Subword Information

facebookresearch/fastText • TACL 2017

A vector representation is associated with each character $n$-gram, and words are represented as the sum of these representations.
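
The idea in this summary can be sketched in a few lines: split a word into character $n$-grams, give each $n$-gram a vector, and add them up. This is a minimal illustration, not the fastText implementation. The helper names (`char_ngrams`, `ngram_vector`, `word_vector`), the boundary markers `<` and `>`, the 3-to-6 length range, and the 4-dimensional seeded random vectors standing in for learned parameters are all assumptions made for the example.

```python
import random


def char_ngrams(word, min_n=3, max_n=6):
    """Return the sorted set of character n-grams of `word`.

    The word is wrapped in '<' and '>' so prefixes and suffixes are
    distinguishable, and the wrapped word itself is included as one unit.
    """
    wrapped = f"<{word}>"
    grams = {
        wrapped[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(wrapped) - n + 1)
    }
    grams.add(wrapped)
    return sorted(grams)


def ngram_vector(gram, dim=4):
    """Stand-in for a learned n-gram vector (assumption: a trained model
    would look this up in a parameter table instead)."""
    rng = random.Random(gram)  # seeded by the n-gram, so it is repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def word_vector(word, dim=4):
    """Represent a word as the sum of its n-gram vectors."""
    total = [0.0] * dim
    for gram in char_ngrams(word):
        for k, value in enumerate(ngram_vector(gram, dim)):
            total[k] += value
    return total


print(char_ngrams("where", 3, 3))
print(word_vector("where"))
```

Because the vector is built from subword pieces, a word that never appeared in training still receives a representation, as long as its $n$-grams have vectors.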

FastText.zip: Compressing text classification models

facebookresearch/fastText • 12 Dec 2016

We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.

Universal Sentence Encoder

facebookresearch/InferSent • 29 Mar 2018

For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance.

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

facebookresearch/InferSent • EMNLP 2017

Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features.

Word Translation Without Parallel Data

facebookresearch/MUSE • ICLR 2018

We finally describe experiments on the English-Esperanto low-resource language pair, on which there only exists a limited amount of parallel data, to show the potential impact of our method in fully unsupervised machine translation.

Named Entity Recognition with Bidirectional LSTM-CNNs

flairNLP/flair • TACL 2016

Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high performance.

Evaluation of sentence embeddings in downstream and linguistic probing tasks

allenai/bilm-tf • 16 Jun 2018

Despite the fast developmental pace of new sentence embedding methods, it is still challenging to find comprehensive evaluations of these different techniques.

Topic Modeling in Embedding Spaces

adjidieng/ETM • TACL 2020

To this end, we develop the Embedded Topic Model (ETM), a generative model of documents that marries traditional topic models with word embeddings.

Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

zihangdai/mos • ICLR 2018

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck.

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

uzairakbar/info-retrieval • NeurIPS 2016

Geometrically, gender bias is first shown to be captured by a direction in the word embedding.
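
To make "captured by a direction" concrete, the sketch below estimates a direction from a single pair of word vectors, measures how far another word projects onto it, and removes that component. This is a simplified illustration under stated assumptions: the vectors `HE`, `SHE`, and `PROGRAMMER` are invented, the function names are hypothetical, and using one difference vector is a simplification chosen for this example rather than the paper's procedure.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def bias_direction(a, b):
    """Unit vector pointing from b to a (assumption: one pair is enough
    for this toy example)."""
    return normalize([x - y for x, y in zip(a, b)])


def neutralize(vector, direction):
    """Remove the component of `vector` that lies along `direction`."""
    scale = dot(vector, direction)
    return [x - scale * d for x, d in zip(vector, direction)]


# Invented toy vectors, for illustration only.
HE = [1.0, 0.2, 0.0]
SHE = [-1.0, 0.2, 0.0]
PROGRAMMER = [0.4, 0.5, 0.3]

direction = bias_direction(HE, SHE)
print(dot(PROGRAMMER, direction))              # projection before
print(dot(neutralize(PROGRAMMER, direction), direction))  # projection after
```

A nonzero projection before neutralizing indicates that the word leans toward one end of the direction; after neutralizing, the word is orthogonal to it.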
