20 papers with code • 0 benchmarks • 2 datasets
These leaderboards are used to track progress in Document Embedding
Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings.
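The core idea of doc2vec's PV-DM variant is to learn one trainable vector per document, combined with context word vectors to predict the next word. The sketch below is a deliberately tiny, illustrative NumPy implementation of that idea, not the reference doc2vec algorithm (which uses windowed sampling, hierarchical softmax or negative sampling, and the gensim library in practice); the corpus, dimensions, and learning rate are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: each document is a list of word indices into a small vocabulary.
docs = [[0, 1, 2, 3], [2, 3, 4, 0]]
vocab_size, dim, lr = 5, 8, 0.05

D = rng.normal(0, 0.1, (len(docs), dim))   # one vector per document (the embedding being learned)
W = rng.normal(0, 0.1, (vocab_size, dim))  # input word vectors
U = rng.normal(0, 0.1, (vocab_size, dim))  # output (softmax) word vectors

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(200):
    for d, words in enumerate(docs):
        for t in range(1, len(words)):
            context = words[:t]                          # all preceding words as context (window simplified away)
            h = (D[d] + W[context].mean(axis=0)) / 2.0   # combine doc vector with averaged context
            p = softmax(U @ h)
            grad = p.copy()
            grad[words[t]] -= 1.0                        # cross-entropy gradient w.r.t. logits
            gh = U.T @ grad                              # gradient w.r.t. h (before updating U)
            U -= lr * np.outer(grad, h)
            D[d] -= lr * gh / 2.0
            W[context] -= lr * gh / (2.0 * len(context))

doc_embeddings = D  # learned document-level embeddings
```

After training, rows of `doc_embeddings` serve as fixed-length document representations; inference for a new document would freeze `W` and `U` and fit only a fresh document vector.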
We first build an individual graph for each document and then use a GNN to learn fine-grained word representations based on their local structures, which also lets us effectively produce embeddings for unseen words in new documents.
BERTopic generates coherent topics and remains competitive across a variety of benchmarks against both classical models and those following the more recent clustering approach to topic modeling.
We present an automatic mortality prediction scheme based on the unstructured textual content of clinical notes.
Hypertext documents, such as web pages and academic papers, are of great importance in delivering information in our daily life.
While the celebrated Word2Vec technique yields semantically rich representations for individual words, there has been relatively little success in extending it to generate unsupervised sentence or document embeddings.
Most current approaches to metaphor identification use restricted linguistic contexts, e.g., by considering only a verb's arguments or the sentence containing a phrase.
Finally, although not trained for embedding sentences and words, it also achieves competitive performance on cross-lingual sentence and word retrieval tasks.