Document Embedding

23 papers with code • 0 benchmarks • 2 datasets

ARAGOG: Advanced RAG Output Grading

predlico/aragog 1 Apr 2024

Sentence Window Retrieval emerged as the most effective for retrieval precision, despite its variable performance on answer similarity.

72
01 Apr 2024

HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification

rooooyy/hill 26 Mar 2024

Existing self-supervised methods in natural language processing (NLP), especially for hierarchical text classification (HTC), mainly focus on self-supervised contrastive learning and rely heavily on human-designed augmentation rules to generate contrastive samples, which can corrupt or distort the original information.

1
26 Mar 2024

Approach to Predicting News -- A Precise Multi-LSTM Network With BERT

LanaChen0/Predict_News 26 Apr 2022

According to the V-Dem Annual Democracy Report 2019, Taiwan is one of the two countries most targeted by false information disseminated by foreign governments.

2
26 Apr 2022

BERTopic: Neural topic modeling with a class-based TF-IDF procedure

MaartenGr/BERTopic 11 Mar 2022

BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach of topic modeling.

5,604
11 Mar 2022

Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings

malteos/scincl 14 Feb 2022

Learning scientific document representations can be substantially improved through contrastive learning objectives, where the challenge lies in creating positive and negative training samples that encode the desired similarity semantics.

59
14 Feb 2022
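
One common way to express a contrastive objective over positive and negative samples is a triplet margin loss. The sketch below is illustrative only and is not necessarily the objective used in the paper: the function names, the L2 distance, and the default margin are all assumptions, and the vectors are plain Python lists rather than encoder outputs.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hypothetical helper: zero once the positive is closer to the
    anchor than the negative by at least `margin`, positive otherwise."""
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# The negative is far from the anchor, so the triplet is already satisfied.
easy = triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# The negative is as close as the positive, so the loss equals the margin.
hard = triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

The choice of which documents count as positives and negatives is the part the snippet above identifies as the challenge; this sketch assumes that choice has already been made.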

CODER: An efficient framework for improving retrieval through COntextual Document Embedding Reranking

gzerveas/CODER 16 Dec 2021

Contrastive learning has been the dominant approach to training dense retrieval models.

5
16 Dec 2021

MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

linhanz/mderank Findings (ACL) 2022

In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), to address this problem by leveraging a mask strategy and ranking candidates by the similarity between embeddings of the source document and the masked document.

60
13 Oct 2021
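
The mask-and-compare idea described above can be sketched in a few lines. This is a minimal illustration under loud assumptions: `embed` is a toy bag-of-words stand-in for a neural document encoder, `rank_by_masking` and the `[MASK]` placeholder are hypothetical names, and the convention that a larger drop in similarity signals a more important phrase is the reading taken here.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a document encoder: bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_masking(document, candidates, mask_token="[MASK]"):
    """Rank candidates so that the phrase whose masking changes the
    document embedding the most (lowest similarity) comes first."""
    doc_vec = embed(document)
    scored = []
    for phrase in candidates:
        # Replace every occurrence of the candidate with the mask token.
        masked = re.sub(re.escape(phrase), mask_token, document,
                        flags=re.IGNORECASE)
        scored.append((cosine(doc_vec, embed(masked)), phrase))
    return [phrase for _, phrase in sorted(scored)]


doc = ("Document embedding models map text to vectors. "
       "Document embedding quality matters for retrieval.")
ranking = rank_by_masking(doc, ["document embedding", "retrieval"])
```

Comparing the whole document against a masked copy of itself keeps both inputs at a similar length, whereas comparing a document against a short phrase does not.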

Unsupervised Keyphrase Extraction by Jointly Modeling Local and Global Context

xnliang98/uke_ccrank EMNLP 2021

In terms of the local view, we first build a graph structure based on the document where phrases are regarded as vertices and the edges are similarities between vertices.

40
15 Sep 2021
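
The graph described in the snippet, with phrases as vertices and similarities as edge weights, can be sketched as follows. The character-trigram `embed` function is a toy stand-in for learned phrase embeddings, and the summed edge weight in `weighted_degree` is a simplified centrality chosen for illustration; neither is claimed to match the paper's formulation.

```python
import math
from collections import Counter
from itertools import combinations


def embed(phrase):
    """Toy stand-in for a phrase encoder: character trigram counts."""
    padded = f" {phrase.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_phrase_graph(phrases):
    """Fully connected undirected graph: vertices are phrases, and each
    edge weight is the similarity between the two phrase vectors."""
    vectors = {p: embed(p) for p in phrases}
    graph = {p: {} for p in phrases}
    for p, q in combinations(phrases, 2):
        weight = cosine(vectors[p], vectors[q])
        graph[p][q] = weight
        graph[q][p] = weight
    return graph


def weighted_degree(graph):
    """Simplified centrality: the sum of a vertex's edge weights."""
    return {p: sum(neighbours.values()) for p, neighbours in graph.items()}


phrases = ["document embedding", "document embeddings", "keyphrase extraction"]
graph = build_phrase_graph(phrases)
centrality = weighted_degree(graph)
```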

Multifaceted Domain-Specific Document Embeddings

philipphager/faceted-domain-encoder NAACL 2021

Current document embeddings require large training corpora but fail to learn high-quality representations when confronted with a small number of domain-specific documents and rare terms.

3
01 Jun 2021

Unsupervised Document Embedding via Contrastive Augmentation

knowlab/bi-weekly-paper-presentation 26 Mar 2021

We present a contrastive learning approach with data augmentation techniques to learn document representations in an unsupervised manner.

8
26 Mar 2021