Word Embeddings

970 papers with code • 0 benchmarks • 49 datasets

Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.

Techniques for learning word embeddings include Word2Vec, GloVe, and neural network-based approaches that train on an NLP task such as language modeling or document classification.
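
As a minimal sketch of what "mapped to vectors of real numbers" means in practice, the snippet below stores a few hand-written, hypothetical 3-dimensional vectors in a lookup table and compares them with cosine similarity. The words, the values, and the `cosine_similarity` helper are illustrative assumptions, not output from Word2Vec or GloVe.

```python
import math

# Hypothetical toy embeddings; trained models typically use hundreds of dimensions.
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.0, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Related words are expected to lie closer together than unrelated ones.
sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(sim_royal > sim_fruit)
```

With these made-up values the two royalty words score higher than the king/apple pair, which is the kind of geometric regularity trained embeddings are meant to capture.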

(Image credit: Dynamic Word Embedding for Evolving Semantic Discovery)


Most implemented papers

Enriching Word Vectors with Subword Information

facebookresearch/fastText TACL 2017

A vector representation is associated with each character $n$-gram, and each word is represented as the sum of these representations.
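
The subword idea can be sketched in a few lines. This is an illustration under stated assumptions, not the fastText implementation: the helper names `char_ngrams` and `word_vector` are hypothetical, the n-gram vectors live in a plain dictionary rather than a hashed table, and the default range of 3 to 6 characters with `<` and `>` boundary markers follows the setup described in the paper.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of `word` wrapped in '<' and '>' boundary markers.

    The whole wrapped word is appended as one extra unit if not already present.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    if wrapped not in grams:
        grams.append(wrapped)
    return grams


def word_vector(word, ngram_vectors, dim, n_min=3, n_max=6):
    """Sum the vectors of the word's subword units; unknown units are skipped."""
    total = [0.0] * dim
    for gram in char_ngrams(word, n_min, n_max):
        vec = ngram_vectors.get(gram)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


# Hypothetical 2-dimensional vectors for two of the n-grams of "where".
toy_vectors = {"<wh": [1.0, 2.0], "re>": [3.0, 4.0]}
print(char_ngrams("where", 3, 3))
print(word_vector("where", toy_vectors, dim=2))
```

Because the word vector is assembled from n-grams, a word never seen in training can still receive a representation as long as some of its n-grams are known.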

FastText.zip: Compressing text classification models

facebookresearch/fastText 12 Dec 2016

We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

facebookresearch/InferSent EMNLP 2017

Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features.

Universal Sentence Encoder

facebookresearch/InferSent 29 Mar 2018

For both variants of the sentence encoding model, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance.

Word Translation Without Parallel Data

facebookresearch/MUSE ICLR 2018

Finally, we describe experiments on the English-Esperanto low-resource language pair, for which only a limited amount of parallel data exists, to show the potential impact of our method in fully unsupervised machine translation.

Evaluation of sentence embeddings in downstream and linguistic probing tasks

allenai/bilm-tf 16 Jun 2018

Despite the fast developmental pace of new sentence embedding methods, it is still challenging to find comprehensive evaluations of these different techniques.

Named Entity Recognition with Bidirectional LSTM-CNNs

zalandoresearch/flair TACL 2016

Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high performance.

Topic Modeling in Embedding Spaces

adjidieng/ETM TACL 2020

To this end, we develop the Embedded Topic Model (ETM), a generative model of documents that marries traditional topic models with word embeddings.
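
One way to picture how a topic model can be combined with word embeddings is to give each topic its own vector in the embedding space and score every word by its inner product with that vector, normalized with a softmax. The sketch below assumes that formulation; the function names and the toy 2-dimensional vectors are hypothetical, and nothing here is taken from the `adjidieng/ETM` code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_word_distribution(word_embeddings, topic_embedding):
    """Probability of each word under a topic: softmax of embedding inner products."""
    words = list(word_embeddings)
    scores = [
        sum(w * t for w, t in zip(word_embeddings[word], topic_embedding))
        for word in words
    ]
    return dict(zip(words, softmax(scores)))


# Hypothetical embeddings: the first axis loosely stands for "sports".
toy_words = {"goal": [1.0, 0.0], "match": [0.8, 0.2], "vote": [0.0, 1.0]}
sports_topic = [2.0, 0.0]
distribution = topic_word_distribution(toy_words, sports_topic)
print(max(distribution, key=distribution.get))
```

Words whose embeddings point in the same direction as the topic vector receive the most probability mass, so topics inherit the similarity structure of the embedding space.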

Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

zihangdai/mos ICLR 2018

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck.
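
The bottleneck can be stated compactly. The notation below is assumed for illustration and does not appear in this listing: $N$ contexts $c_i$, $M$ vocabulary words $x_j$, embedding size $d$, context vectors stacked in $H \in \mathbb{R}^{N \times d}$, word embeddings stacked in $W \in \mathbb{R}^{M \times d}$, and $A$ the matrix of true log-probabilities.

```latex
A_{ij} = \log P^{*}(x_j \mid c_i), \qquad
\log P_\theta(x_j \mid c_i)
  = (H W^\top)_{ij} - \log \sum_{k=1}^{M} \exp\!\big( (H W^\top)_{ik} \big)

\operatorname{rank}(H W^\top) \le d
\quad\Longrightarrow\quad
\operatorname{rank}\big( \log P_\theta \big) \le d + 1
```

The log-partition term shifts each row by a constant, which is a rank-one correction, so the model's log-probability matrix has rank at most $d + 1$. If the true matrix $A$ has higher rank than that, no choice of $H$ and $W$ can reproduce it exactly.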

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

uzairakbar/info-retrieval NeurIPS 2016

Geometrically, gender bias is first shown to be captured by a direction in the word embedding.
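
The geometric picture can be sketched as follows. This is a simplified illustration, not the paper's procedure: it assumes the bias direction is estimated from a single hypothetical word pair rather than from several pairs, and the vectors and helper names are made up for the example. A word is neutralized by subtracting its projection onto that direction.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def bias_direction(vec_a, vec_b):
    """Unit vector along the difference of two embeddings (single-pair estimate)."""
    return normalize([a - b for a, b in zip(vec_a, vec_b)])


def neutralize(vec, direction):
    """Remove the component of `vec` that lies along the unit vector `direction`."""
    projection = dot(vec, direction)
    return [a - projection * d for a, d in zip(vec, direction)]


# Hypothetical 3-dimensional embeddings.
he = [1.0, 0.0, 0.2]
she = [-1.0, 0.0, 0.2]
programmer = [0.5, 0.8, 0.1]

direction = bias_direction(he, she)
neutral_programmer = neutralize(programmer, direction)
print(dot(programmer, direction), dot(neutral_programmer, direction))
```

After neutralization the word has no component along the estimated direction, while its other coordinates are unchanged.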