Document Embedding

20 papers with code • 0 benchmarks • 2 datasets

Most implemented papers

Document Embedding with Paragraph Vectors

inejc/paragraph-vectors 29 Jul 2015

Paragraph Vectors has recently been proposed as an unsupervised method for learning distributed representations of pieces of text.

An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation

jhlau/doc2vec WS 2016

Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings.

Sentiment Classification Using Document Embeddings Trained with Cosine Similarity

tanthongtan/dv-cosine ACL 2019

In document-level sentiment classification, each document must be mapped to a fixed length vector.
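As a rough illustration of what "mapped to a fixed length vector" means, the sketch below hashes tokens into a fixed number of buckets and compares two documents by cosine similarity. This is not the paper's method (which learns the embeddings); the function names, the hashing trick, and the dimension of 16 are assumptions chosen only to keep the example self-contained.

```python
import math
import zlib

def hashed_bow(text, dim=16):
    """Map a document of any length to a fixed-length count vector (hypothetical helper)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Documents of different lengths end up with vectors of the same size.
short_doc = hashed_bow("great film")
long_doc = hashed_bow("a great film with a great cast and a weak ending")
similarity = cosine(short_doc, long_doc)
```

Because both vectors have the same length regardless of document size, they can be fed to any standard classifier or compared directly.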

Every Document Owns Its Structure: Inductive Text Classification via Graph Neural Networks


We first build individual graphs for each document and then use GNN to learn the fine-grained word representations based on their local structures, which can also effectively produce embeddings for unseen words in the new document.
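The per-document graph construction might look roughly like the sketch below, which assumes that edges link words co-occurring within a small sliding window. The window size, the undirected edges, and the function name are assumptions for illustration; the excerpt above does not specify them, and the GNN step itself is omitted.

```python
from collections import defaultdict

def build_document_graph(tokens, window=3):
    """Return an undirected adjacency map over the unique words of one document.

    Hypothetical helper: two words are linked if they appear within
    `window` consecutive positions of each other.
    """
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        graph[word]  # touch the key so isolated words still appear as nodes
        for neighbor in tokens[i + 1 : i + window]:
            if neighbor != word:
                graph[word].add(neighbor)
                graph[neighbor].add(word)
    return dict(graph)

# Each document gets its own graph, built only from its own tokens.
graph = build_document_graph("the cat sat on the mat".split())
```

Since the graph depends only on the tokens of the document at hand, a new document containing previously unseen words still yields a valid graph, which is what makes the approach inductive.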

BERTopic: Neural topic modeling with a class-based TF-IDF procedure

MaartenGr/BERTopic 11 Mar 2022

BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach of topic modeling.
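A minimal sketch of a class-based TF-IDF weighting follows, assuming the form W(t, c) = tf(t, c) · log(1 + A / tf(t)), where all documents in a class (cluster) are treated as one document, tf(t) is the term's frequency across all classes, and A is the average number of words per class. The formula is stated here as an assumption rather than taken from the excerpt above, and the function below is a hypothetical standalone helper, not the BERTopic library API.

```python
import math
from collections import Counter

def class_tfidf(docs_by_class):
    """Map {class_label: [token lists]} to {class_label: {term: weight}}."""
    # Treat all documents in a class as one concatenated document.
    tf = {
        label: Counter(tok for doc in docs for tok in doc)
        for label, docs in docs_by_class.items()
    }
    # Frequency of each term across all classes.
    total_tf = Counter()
    for counts in tf.values():
        total_tf.update(counts)
    # A: average number of words per class.
    avg_words = sum(total_tf.values()) / len(tf)
    return {
        label: {t: n * math.log(1 + avg_words / total_tf[t]) for t, n in counts.items()}
        for label, counts in tf.items()
    }

# Toy clusters; in practice these would come from clustering document embeddings.
clusters = {
    "sports": [["goal", "match"], ["match", "team"]],
    "finance": [["stock", "market"], ["market", "team"]],
}
weights = class_tfidf(clusters)
top_sports_term = max(weights["sports"], key=weights["sports"].get)
```

Terms that are frequent within one class but rare overall receive the highest weights, so sorting each class's terms by weight gives a candidate topic description.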

Neural Document Embeddings for Intensive Care Patient Mortality Prediction

cmasch/cnn-text-classification 1 Dec 2016

We present an automatic mortality prediction scheme based on the unstructured textual content of clinical notes.

hyperdoc2vec: Distributed Representations of Hypertext Documents

HelloRusk/hyperdoc2vec ACL 2018

Hypertext documents, such as web pages and academic papers, are of great importance in delivering information in our daily life.

Word Mover's Embedding: From Word2Vec to Document Embedding

IBM/WordMoversEmbeddings EMNLP 2018

While the celebrated Word2Vec technique yields semantically rich representations for individual words, there has been relatively less success in extending it to generate unsupervised sentence or document embeddings.

Learning Outside the Box: Discourse-level Features Improve Metaphor Identification

jayelm/broader-metaphor NAACL 2019

Most current approaches to metaphor identification use restricted linguistic contexts, e.g., by considering only a verb's arguments or the sentence containing a phrase.

Crosslingual Document Embedding as Reduced-Rank Ridge Regression

epfl-dlab/Cr5 8 Apr 2019

Although not trained for embedding sentences and words, the method also achieves competitive performance on crosslingual sentence and word retrieval tasks.