Word Embeddings
1096 papers with code • 0 benchmarks • 52 datasets
Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.
Techniques for learning word embeddings can include Word2Vec, GloVe, and other neural network-based approaches that train on an NLP task such as language modeling or document classification.
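A minimal sketch of learning Word2Vec-style embeddings with the gensim library; the toy corpus and hyperparameters are illustrative assumptions, not tied to any paper listed below.

```python
# Minimal sketch: training Word2Vec embeddings with gensim (assumes gensim >= 4.x;
# the toy corpus and hyperparameters are illustrative only).
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "and", "cats", "are", "pets"],
]

# Each vocabulary word is mapped to a dense real-valued vector (here, 50-dimensional).
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=100)

vector = model.wv["king"]             # 50-dimensional numpy array for "king"
print(model.wv.most_similar("king"))  # nearest neighbours by cosine similarity
```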
(Image credit: Dynamic Word Embedding for Evolving Semantic Discovery)
Benchmarks
These leaderboards are used to track progress in Word Embeddings
Datasets
Subtasks
Latest papers
SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks
Language models (LMs) are indispensable tools for natural language processing tasks, but their vulnerability to adversarial attacks remains a concern.
Prescribing Large Language Models for Perioperative Care: What's The Right Dose for Pre-trained Models?
Adapting models further improved performance: (1) self-supervised finetuning by 3.2% for AUROC and 1.5% for AUPRC; (2) semi-supervised finetuning by 1.8% for AUROC and 2% for AUPRC, compared to self-supervised finetuning; (3) foundational modelling by 3.6% for AUROC and 2.6% for AUPRC, compared to self-supervised finetuning.
A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change
Our evaluation is performed across different languages on eight available benchmarks for LSC and shows that (i) APD outperforms other approaches for GCD; (ii) XL-LEXEME outperforms other contextualized models for WiC, WSI, and GCD, while being comparable to GPT-4; (iii) there is a clear need for improving the modeling of word meanings, as well as a focus on how, when, and why these meanings change, rather than solely on the extent of semantic change.
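For context, APD (average pairwise distance) is a standard graded measure of semantic change: the mean cosine distance between a word's contextualized usage embeddings drawn from two time periods. The sketch below illustrates the computation under that reading; the random vectors are placeholders, not the paper's data.

```python
# Hedged sketch of APD (average pairwise distance) for graded semantic change.
# Rows are contextualized embeddings of one word's usages in two corpora/periods.
import numpy as np
from scipy.spatial.distance import cdist

usages_t1 = np.random.randn(50, 768)  # usages of the word in the earlier corpus
usages_t2 = np.random.randn(60, 768)  # usages of the word in the later corpus

# Higher APD = the word's usages in the two periods are, on average, further apart.
apd = cdist(usages_t1, usages_t2, metric="cosine").mean()
```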
Semi-Supervised Learning for Bilingual Lexicon Induction
It was recently shown that it is possible to infer such a lexicon, without using any parallel data, by aligning word embeddings trained on monolingual data.
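A hedged sketch of the orthogonal (Procrustes) alignment commonly used in this line of work: two monolingual embedding spaces are mapped onto each other with a linear, orthogonality-constrained transform, and translations are read off as nearest neighbours. The dimensions and seed dictionary below are toy assumptions, not the paper's exact setup.

```python
# Orthogonal Procrustes alignment of two monolingual embedding spaces (toy data).
import numpy as np

rng = np.random.default_rng(0)
d = 300                               # embedding dimensionality
X = rng.standard_normal((1000, d))    # source-language embeddings (rows = words)
Y = rng.standard_normal((1000, d))    # target-language embeddings

# Suppose the first 100 rows of X and Y form a (possibly induced) seed dictionary.
src, tgt = X[:100], Y[:100]

# Solve min_W ||src @ W - tgt||_F subject to W orthogonal (Procrustes solution).
U, _, Vt = np.linalg.svd(src.T @ tgt)
W = U @ Vt

# Map all source embeddings into the target space and retrieve nearest neighbours.
mapped = X @ W
sims = (mapped @ Y.T) / (
    np.linalg.norm(mapped, axis=1, keepdims=True) * np.linalg.norm(Y, axis=1)
)
translations = sims.argmax(axis=1)    # nearest target word index for each source word
```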
Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings
Transformers generalize to novel compositions of structures and entities after being trained on a complex dataset, but easily overfit on datasets of insufficient complexity.
Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene Classification
Besides, pioneering ZSL models use convolutional neural networks pre-trained on ImageNet, which focus on the main objects appearing in each image, neglecting the background context that also matters in RS scene classification.
Graph-based Clustering for Detecting Semantic Change Across Time and Languages
To address this issue, we propose a graph-based clustering approach to capture nuanced changes in both high- and low-frequency word senses across time and languages, including the acquisition and loss of these senses over time.
Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs
We show that a linear transformation of the text representation from any transformer model using the task-specific concept operator results in a projection onto the latent concept space, referred to as context attribution in this paper.
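As a rough illustration of the idea in that excerpt, the sketch below applies a linear "concept operator" to a transformer text representation to obtain coordinates in a latent concept space. The operator, dimensions, and variable names are hypothetical stand-ins; the paper's actual construction of the operator may differ.

```python
# Hedged sketch: projecting a transformer text representation onto a task-specific
# latent concept space via a linear operator ("context attribution" in the excerpt).
import numpy as np

hidden_dim, num_concepts = 768, 16

# Stand-in for a sentence embedding from any transformer encoder (e.g., mean-pooled
# hidden states); in practice this would come from the model itself.
text_representation = np.random.randn(hidden_dim)

# Task-specific concept operator: one row per latent concept (illustrative values).
concept_operator = np.random.randn(num_concepts, hidden_dim)

# Linear transformation = projection onto the latent concept space.
context_attribution = concept_operator @ text_representation  # shape: (num_concepts,)
```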
Pre-training and Diagnosing Knowledge Base Completion Models
The method works for both canonicalized knowledge bases and uncanonicalized or open knowledge bases, i.e., knowledge bases where more than one copy of a real-world entity or relation may exist.
Contrastive Learning in Distilled Models
Natural Language Processing models like BERT can provide state-of-the-art word embeddings for downstream NLP tasks.
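A minimal sketch of extracting contextual word embeddings from a distilled BERT model with Hugging Face transformers; the checkpoint name and pooling choice are example assumptions, not the paper's exact configuration.

```python
# Minimal sketch: contextual word embeddings from a distilled BERT model
# (assumes torch and transformers are installed; checkpoint name is illustrative).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("Word embeddings map words to vectors.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per (sub)word token: shape (1, num_tokens, hidden_size).
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)
```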