Document Embedding
23 papers with code • 0 benchmarks • 2 datasets
Latest papers
ARAGOG: Advanced RAG Output Grading
Sentence Window Retrieval emerged as the most effective technique for retrieval precision, despite its variable performance on answer similarity.
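The idea behind Sentence Window Retrieval is to match the query against individual sentences, then hand the generator the best sentence plus its neighbors for context. A minimal sketch, using term overlap as a stand-in for the embedding similarity a real RAG pipeline would use (the function name and scoring are illustrative, not from the paper):

```python
def sentence_window_retrieve(sentences, query_terms, window=1):
    """Score each sentence against the query, then return the best match
    together with `window` neighboring sentences on each side.

    Term overlap stands in for embedding similarity in this sketch."""
    terms = {t.lower() for t in query_terms}

    def score(i):
        # Number of query terms appearing in sentence i.
        return len(set(sentences[i].lower().split()) & terms)

    best = max(range(len(sentences)), key=score)
    lo, hi = max(0, best - window), min(len(sentences), best + window + 1)
    return " ".join(sentences[lo:hi])
```

Retrieving at sentence granularity keeps matching precise, while the window restores the surrounding context the generator needs.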
HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification
Existing self-supervised methods in natural language processing (NLP), especially for hierarchical text classification (HTC), focus mainly on self-supervised contrastive learning and rely heavily on human-designed augmentation rules to generate contrastive samples, which can corrupt or distort the original information.
Approach to Predicting News -- A Precise Multi-LSTM Network With BERT
According to the 2019 V-Dem annual democracy report, Taiwan is one of the two countries most heavily targeted by false information disseminated by foreign governments.
BERTopic: Neural topic modeling with a class-based TF-IDF procedure
BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach of topic modeling.
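BERTopic's class-based TF-IDF treats all documents in a topic as one pseudo-document and weights each term by its frequency in that class against its frequency overall. A minimal sketch of that computation (the dict-based input format is an assumption of this example, not BERTopic's API):

```python
import math
from collections import Counter

def class_tfidf(class_docs):
    """Class-based TF-IDF: W[t, c] = tf[t, c] * log(1 + A / f[t]),
    where tf[t, c] is term t's frequency within class c (all of the
    class's documents concatenated), f[t] is t's frequency across all
    classes, and A is the average word count per class.

    class_docs: {class_label: [tokenized_document, ...]}
    Returns {class_label: {term: weight}}."""
    # Merge each class's documents into one pseudo-document.
    class_tf = {c: Counter(tok for doc in docs for tok in doc)
                for c, docs in class_docs.items()}
    total_freq = Counter()
    for tf in class_tf.values():
        total_freq.update(tf)
    avg_words = sum(sum(tf.values()) for tf in class_tf.values()) / len(class_tf)
    return {c: {t: tf_tc * math.log(1 + avg_words / total_freq[t])
                for t, tf_tc in tf.items()}
            for c, tf in class_tf.items()}
```

Terms concentrated in one class score high for it, while terms common to every class are down-weighted, which is what makes the extracted topic words coherent.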
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings
Learning scientific document representations can be substantially improved through contrastive learning objectives, where the challenge lies in creating positive and negative training samples that encode the desired similarity semantics.
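One way to encode the desired similarity semantics is to mine training samples from the citation graph: a paper's positive is a paper it cites, a negative is one it does not. A small sketch under that assumption (the graph format and function name are illustrative, not the paper's implementation):

```python
import random

def citation_triplets(citations, rng=None):
    """Build (anchor, positive, negative) triplets from a citation graph.

    citations: {paper_id: set of cited paper_ids} (format assumed for
    this sketch). The positive is drawn from the anchor's citation
    neighborhood, the negative from papers it does not cite."""
    rng = rng or random.Random(0)
    papers = list(citations)
    triplets = []
    for anchor, cited in citations.items():
        if not cited:
            continue  # no citation neighborhood to sample a positive from
        positive = rng.choice(sorted(cited))
        non_cited = [p for p in papers if p != anchor and p not in cited]
        if not non_cited:
            continue
        triplets.append((anchor, positive, rng.choice(non_cited)))
    return triplets
```

The resulting triplets can feed a standard triplet or contrastive loss, letting citation links supply the similarity signal without manual labels.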
CODER: An efficient framework for improving retrieval through COntextual Document Embedding Reranking
Contrastive learning has been the dominant approach to training dense retrieval models.
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction
In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), to address this problem by leveraging a mask strategy and ranking candidates by the similarity between embeddings of the source document and the masked document.
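The MDERank scoring idea can be sketched compactly: mask each candidate phrase out of the document, embed both versions, and rank candidates by how far the masked document's embedding drifts from the original (the larger the drop in similarity, the more important the phrase). A toy bag-of-words encoder stands in for the BERT document encoder here, so this is a sketch of the ranking logic only:

```python
import math
from collections import Counter

def _embed(text):
    # Toy bag-of-words embedding standing in for a BERT document encoder.
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mderank(document, candidates, embed=_embed):
    """Rank candidate keyphrases by masking each one out and measuring
    how much the document embedding changes. Candidates whose removal
    hurts the embedding most (lowest similarity) rank first."""
    doc_vec = embed(document)
    scores = {}
    for phrase in candidates:
        masked = document.replace(phrase, "[MASK]")  # naive substring mask
        scores[phrase] = _cosine(doc_vec, embed(masked))
    return sorted(candidates, key=lambda p: scores[p])
```

With a real document encoder, the same loop applies: only `_embed` changes.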
Unsupervised Keyphrase Extraction by Jointly Modeling Local and Global Context
In terms of the local view, we first build a graph structure based on the document where phrases are regarded as vertices and the edges are similarities between vertices.
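The local-view construction described above can be sketched directly: phrases become vertices and each edge carries a phrase-phrase similarity. The paper computes similarities from learned representations; token-overlap (Jaccard) similarity stands in here so the example stays self-contained:

```python
def build_phrase_graph(phrases, threshold=0.0):
    """Build the local-view graph: vertices are candidate phrases, edge
    weights are pairwise similarities. Jaccard overlap over tokens is a
    stand-in for the embedding similarity a real system would use."""
    def sim(a, b):
        sa, sb = set(a.split()), set(b.split())
        return len(sa & sb) / len(sa | sb)

    graph = {p: {} for p in phrases}
    for i, p in enumerate(phrases):
        for q in phrases[i + 1:]:
            w = sim(p, q)
            if w > threshold:  # keep only sufficiently similar pairs
                graph[p][q] = w
                graph[q][p] = w
    return graph
```

Graph algorithms such as centrality or random walks can then score vertices on this structure to surface the most representative phrases.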
Multifaceted Domain-Specific Document Embeddings
Current document embeddings require large training corpora but fail to learn high-quality representations when confronted with a small number of domain-specific documents and rare terms.
Unsupervised Document Embedding via Contrastive Augmentation
We present a contrastive learning approach that uses data augmentation techniques to learn document representations in an unsupervised manner.
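The general recipe behind contrastive document embedding is: corrupt a document to get a second view (word dropout is one common augmentation), then train with an InfoNCE-style loss that pulls the two views together and pushes other documents away. A minimal sketch of both pieces, assuming plain float lists as embeddings (the function names are illustrative, not from the paper):

```python
import math
import random

def dropout_augment(tokens, p=0.2, seed=None):
    """Word-dropout augmentation: randomly delete tokens to produce a
    corrupted view of the same document."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() > p]
    return kept or tokens  # never return an empty view

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor embedding: the positive is an
    augmented view of the same document, the negatives are embeddings
    of other documents. Embeddings are plain lists of floats."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    logits = [cos(anchor, positive) / temperature] + [
        cos(anchor, n) / temperature for n in negatives]
    # Numerically stable -log softmax of the positive logit.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]
```

The loss is near zero when the two views embed almost identically and grows as a negative becomes more similar to the anchor than its own augmented view.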