BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach of topic modeling.

Sentiment Classification Using Document Embeddings Trained with Cosine Similarity

tanthongtan/dv-cosine • ACL 2019

In document-level sentiment classification, each document must be mapped to a fixed-length vector.

Neural Document Embeddings for Intensive Care Patient Mortality Prediction

cmasch/cnn-text-classification • 1 Dec 2016

We present an automatic mortality prediction scheme based on the unstructured textual content of clinical notes.

hyperdoc2vec: Distributed Representations of Hypertext Documents

HelloRusk/hyperdoc2vec • ACL 2018

Hypertext documents, such as web pages and academic papers, are of great importance in delivering information in our daily life.

Word Mover's Embedding: From Word2Vec to Document Embedding

IBM/WordMoversEmbeddings • EMNLP 2018

While the celebrated Word2Vec technique yields semantically rich representations for individual words, there has been relatively less success in extending it to generate unsupervised sentence or document embeddings.

Learning Outside the Box: Discourse-level Features Improve Metaphor Identification

jayelm/broader-metaphor • NAACL 2019

Most current approaches to metaphor identification use restricted linguistic contexts, e.g., by considering only a verb's arguments or the sentence containing a phrase.

Crosslingual Document Embedding as Reduced-Rank Ridge Regression

epfl-dlab/Cr5 • 8 Apr 2019

Although not trained for embedding sentences and words, the method also achieves competitive performance on crosslingual sentence and word retrieval tasks.

Unsupervised Learning of Discourse-Aware Text Representation for Essay Scoring

FarjanaSultanaMim/DiscoShuffle • ACL 2019

Existing document embedding approaches mainly focus on capturing sequences of words in documents.

Document Embedding
