Word Embeddings

258 papers with code · Methodology

Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.
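As a minimal illustration of the idea, the sketch below compares toy word vectors with cosine similarity, the standard way to measure relatedness in an embedding space. The vectors are made-up 4-dimensional examples, not taken from any trained model.

```python
import numpy as np

# Toy 4-dimensional embeddings (illustrative values, not from a trained model).
embeddings = {
    "king":  np.array([0.8, 0.6, 0.1, 0.0]),
    "queen": np.array([0.7, 0.7, 0.1, 0.1]),
    "apple": np.array([0.0, 0.1, 0.9, 0.8]),
}

def cosine(u, v):
    # Cosine similarity: the usual way to compare two embedding vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["king"], embeddings["queen"]))  # high: related words
print(cosine(embeddings["king"], embeddings["apple"]))  # low: unrelated words
```

In a real system the vectors would come from a trained model (word2vec, GloVe, fastText), but the lookup-and-compare pattern is the same.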

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

Adversarial Training Methods for Semi-Supervised Text Classification

25 May 2016 · tensorflow/models

Adversarial training provides a means of regularizing supervised learning algorithms while virtual adversarial training is able to extend supervised learning algorithms to the semi-supervised setting. However, both methods require making small perturbations to numerous entries of the input vector, which is inappropriate for sparse high-dimensional inputs such as one-hot word representations.
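The paper's workaround is to perturb the dense word embeddings instead of the one-hot inputs. A minimal sketch of that idea, assuming an already-computed loss gradient with respect to the embedding (the gradient values here are hypothetical placeholders):

```python
import numpy as np

def adversarial_perturbation(grad, epsilon=0.02):
    # L2-normalized perturbation in the direction of the loss gradient,
    # applied to the dense embedding rather than the sparse one-hot input.
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad)
    return epsilon * grad / norm

word_embedding = np.array([0.3, -0.1, 0.5])
loss_grad = np.array([0.4, 0.2, -0.1])  # hypothetical gradient of the loss
perturbed = word_embedding + adversarial_perturbation(loss_grad)
```

Training then adds a loss term on the perturbed embedding, encouraging predictions to be stable under small worst-case shifts in embedding space.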

SENTIMENT ANALYSIS TEXT CLASSIFICATION WORD EMBEDDINGS

FastText.zip: Compressing text classification models

12 Dec 2016 · facebookresearch/fastText

We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory. After considering different solutions inspired by the hashing literature, we propose a method built upon product quantization to store word embeddings.
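Product quantization compresses each embedding by splitting it into sub-vectors and storing, per sub-vector, only the index of the nearest centroid in a small codebook. The sketch below shows the encode/decode mechanics with random codebooks; a real implementation (such as fastText's) learns the codebooks with k-means over the training vectors.

```python
import numpy as np

def pq_encode(vec, codebooks):
    # Split the vector into equal sub-vectors; store, per sub-vector, the
    # index of the nearest centroid in that sub-space's codebook.
    subs = np.split(vec, len(codebooks))
    return [int(np.argmin(np.linalg.norm(cb - s, axis=1)))
            for cb, s in zip(codebooks, subs)]

def pq_decode(codes, codebooks):
    # Reconstruct an approximation by concatenating the chosen centroids.
    return np.concatenate([cb[c] for cb, c in zip(codebooks, codes)])

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(4, 2)) for _ in range(2)]  # 2 sub-spaces, 4 centroids each
vec = rng.normal(size=4)
codes = pq_encode(vec, codebooks)    # 2 small integer codes instead of 4 floats
approx = pq_decode(codes, codebooks)
```

With 256 centroids per sub-space, each sub-vector costs one byte instead of several floats, which is where the large memory savings come from.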

TEXT CLASSIFICATION WORD EMBEDDINGS

Semi-supervised sequence tagging with bidirectional language models

ACL 2017 · zalandoresearch/flair

Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context sensitive representations is trained on relatively little labeled data.

CHUNKING NAMED ENTITY RECOGNITION WORD EMBEDDINGS

Named Entity Recognition with Bidirectional LSTM-CNNs

TACL 2016 · zalandoresearch/flair

Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high performance. In this paper, we present a novel neural network architecture that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature engineering.

ENTITY LINKING NAMED ENTITY RECOGNITION WORD EMBEDDINGS

StarSpace: Embed All The Things!

12 Sep 2017 · facebookresearch/ParlAI

We present StarSpace, a general-purpose neural embedding model that can solve a wide variety of problems: labeling tasks such as text classification, ranking tasks such as information retrieval/web search, collaborative filtering-based or content-based recommendation, embedding of multi-relational graphs, and learning word, sentence or document level embeddings. In each case the model works by embedding entities composed of discrete features and comparing them against each other, learning similarities that depend on the task.
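The embed-and-compare structure can be sketched in a few lines: every entity (document, label, user, item) is a bag of discrete feature ids, its embedding is the sum of its features' vectors, and entities are scored against each other by dot product. This is a simplified skeleton with hypothetical feature ids; the actual StarSpace model learns the feature vectors with a ranking loss over positive and negative pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab = 8, 100
feature_vecs = rng.normal(scale=0.1, size=(vocab, dim))  # one vector per discrete feature

def embed(feature_ids):
    # Entity embedding = sum of its discrete features' vectors.
    return feature_vecs[feature_ids].sum(axis=0)

def similarity(a_ids, b_ids):
    # Entities of any type are compared in the same shared space.
    return float(embed(a_ids) @ embed(b_ids))

doc = [3, 17, 42]   # hypothetical feature ids (e.g. word ids in a document)
label = [42]        # a label is just another entity in the same space
score = similarity(doc, label)
```

Because everything lives in one shared space, the same machinery covers classification (document vs. label), retrieval (query vs. document), and recommendation (user vs. item).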

COLLABORATIVE FILTERING TEXT CLASSIFICATION WORD EMBEDDINGS

Analogical Reasoning on Chinese Morphological and Semantic Relations

ACL 2018 · Embedding/Chinese-Word-Vectors

Analogical reasoning is effective in capturing linguistic regularities. This paper proposes an analogical reasoning task on Chinese.

WORD EMBEDDINGS

Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data

4 Apr 2018 · beamandrew/medical-data

Word embeddings are a popular approach to unsupervised learning of word relationships that are widely used in natural language processing. In this article, we present a new set of embeddings for medical concepts learned using an extremely large collection of multimodal medical data.

WORD EMBEDDINGS

Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition

27 Sep 2017 · deepmipt/DeepPavlov

Named Entity Recognition (NER) is one of the most common tasks in natural language processing. The purpose of NER is to find tokens in text documents and classify them into predefined categories, called tags, such as person names, quantity expressions, percentage expressions, names of locations and organizations, and expressions of time, currency and others.

NAMED ENTITY RECOGNITION WORD EMBEDDINGS

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

6 May 2016 · cemoody/lda2vec

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe lda2vec, a model that learns dense word vectors jointly with Dirichlet-distributed latent document-level mixtures of topic vectors.

TOPIC MODELS WORD EMBEDDINGS

Unsupervised Alignment of Embeddings with Wasserstein Procrustes

29 May 2018 · facebookresearch/MUSE

We consider the task of aligning two sets of points in high dimension, which has many applications in natural language processing and computer vision. We evaluate our method on the problem of unsupervised word translation, by aligning word embeddings trained on monolingual data.
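The classical building block for this kind of alignment is orthogonal Procrustes: given paired point sets X and Y, the orthogonal map W minimizing ||XW − Y||_F has a closed form via an SVD. The sketch below demonstrates that closed form on synthetic data; the paper's contribution is the harder unsupervised setting, where the point correspondences themselves must also be inferred (via Wasserstein distances), which this sketch does not cover.

```python
import numpy as np

def procrustes_align(X, Y):
    # Orthogonal W minimizing ||X @ W - Y||_F, closed form via SVD of X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                  # "source" embeddings
R, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # hidden orthogonal map
Y = X @ R                                     # "target" embeddings
W = procrustes_align(X, Y)
print(np.allclose(X @ W, Y))  # True: the hidden map is recovered
```

Restricting W to be orthogonal preserves distances and angles in the embedding space, which is why this parameterization is standard for aligning embeddings across languages.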

WORD EMBEDDINGS