Word Embeddings
1096 papers with code • 0 benchmarks • 52 datasets
Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.
Techniques for learning word embeddings include Word2Vec, GloVe, and other neural network-based approaches that train on an NLP task such as language modeling or document classification.
(Image credit: Dynamic Word Embedding for Evolving Semantic Discovery)
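As a minimal illustration of the idea, the sketch below trains skip-gram Word2Vec embeddings on a toy corpus with gensim (the corpus and hyperparameters are placeholders, assuming gensim >= 4.0):

```python
# Minimal sketch: learn word embeddings with gensim's Word2Vec.
# The toy corpus is illustrative only; real models train on large text.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["cats", "and", "dogs", "are", "animals"],
]

# 50-dimensional skip-gram (sg=1) embeddings over the tokenized corpus.
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

print(model.wv["cat"].shape)                 # (50,), the learned vector for "cat"
print(model.wv.most_similar("cat", topn=3))  # nearest neighbors in the space
```

Words that occur in similar contexts end up with nearby vectors, which is the property the papers below build on.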
Latest papers
Def2Vec: Extensible Word Embeddings from Dictionary Definitions
Def2Vec introduces a novel paradigm for word embeddings, leveraging dictionary definitions to learn semantic representations.
Robust Concept Erasure via Kernelized Rate-Distortion Maximization
Distributed representations provide a vector space that captures meaningful relationships between data instances.
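For orientation, the sketch below shows the simplest linear concept-erasure baseline: projecting representations onto the orthogonal complement of a single concept direction. This is a generic illustration, not the kernelized rate-distortion method the paper proposes.

```python
# Sketch: linear concept erasure by projection. X and the concept
# direction c are random placeholders; in practice c would be learned
# (e.g., from a linear probe for the concept).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))     # data representations (rows = instances)
c = rng.normal(size=50)            # a single concept direction
c /= np.linalg.norm(c)

P = np.eye(50) - np.outer(c, c)    # projector onto the complement of c
X_erased = X @ P                   # representations with c removed
print(np.abs(X_erased @ c).max())  # ~0: no linear signal along c remains
```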
Quantifying the redundancy between prosody and text
Using a large spoken corpus of English audiobooks, we extract prosodic features aligned to individual words and test how well they can be predicted from LLM embeddings, compared to non-contextual word embeddings.
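One way to picture such a probe, with random arrays standing in for the real corpus, is a regularized linear regression from per-word embeddings to an aligned prosodic feature, scored by held-out R^2 (the data below are random, so the score will be near zero):

```python
# Sketch: probe how well a prosodic feature (e.g., word-level pitch) can
# be predicted from word embeddings. X and y are random placeholders, not
# the audiobook corpus used in the paper.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))   # one embedding per word token
y = rng.normal(size=1000)          # aligned prosodic feature per token

probe = Ridge(alpha=1.0)
scores = cross_val_score(probe, X, y, cv=5, scoring="r2")
print("held-out R^2:", scores.mean())  # higher = embeddings carry more prosody
```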
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining
Instead of pretraining multilingual language models from scratch, a more efficient method is to adapt existing pretrained language models (PLMs) to new languages via vocabulary extension and continued pretraining.
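A common baseline for this kind of vocabulary extension, sketched below with hypothetical tokens and Hugging Face transformers, is to initialize each unseen token's embedding from the mean of its pieces under the original tokenizer; OFA's factorized initialization is more sophisticated, so treat this purely as orientation:

```python
# Sketch: mean-of-pieces initialization for newly added vocabulary items.
# The model name and new tokens are placeholder assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

new_tokens = ["newwordA", "newwordB"]  # hypothetical target-language tokens

# Record how the *original* tokenizer splits each new token into known pieces.
piece_ids = {t: tokenizer(t, add_special_tokens=False)["input_ids"]
             for t in new_tokens}

tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

emb = model.get_input_embeddings().weight
with torch.no_grad():
    for t in new_tokens:
        new_id = tokenizer.convert_tokens_to_ids(t)
        emb[new_id] = emb[piece_ids[t]].mean(dim=0)  # mean of known pieces
```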
Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method
This project focuses on visual analogical reasoning, applying the generalized mechanism originally used to solve verbal analogies to the visual realm.
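The verbal-analogy mechanism being generalized is the classic vector-arithmetic trick, reproducible with pretrained vectors via gensim's downloader (the choice of GloVe model below is an arbitrary example):

```python
# Sketch: solve "man is to king as woman is to ?" by vector arithmetic.
# Downloads pretrained GloVe vectors on first run.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")
result = wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', ...)]
```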
How Abstract Is Linguistic Generalization in Large Language Models? Experiments with Argument Structure
We find that LLMs perform well at generalizing the distribution of a novel noun argument between related contexts seen during pre-training (e.g., the active object and passive subject of the verb spray), succeeding by exploiting the semantically organized structure of the word embedding space.
An Embedded Diachronic Sense Change Model with a Case Study from Ancient Greek
These models represent the senses of a given target word such as "kosmos" (meaning decoration, order or world) as distributions over context words, and sense prevalence as a distribution over senses.
ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting
We also demonstrate the effectiveness of ProMap in re-ranking results from other BLI methods, such as those based on aligned static word embeddings.
GARI: Graph Attention for Relative Isomorphism of Arabic Word Embeddings
Bilingual Lexicon Induction (BLI) is a core challenge in NLP; it relies on the relative isomorphism of individual embedding spaces.
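For context, a bare-bones BLI baseline assumes the two monolingual embedding spaces are already aligned and induces translations by cosine nearest neighbor; the matrices below are random placeholders:

```python
# Sketch: nearest-neighbor bilingual lexicon induction over two embedding
# spaces assumed to be pre-aligned. Real pipelines learn the alignment map.
import numpy as np

rng = np.random.default_rng(0)
src = rng.normal(size=(500, 300))   # source-language word vectors
tgt = rng.normal(size=(800, 300))   # target-language word vectors

# L2-normalize so dot products equal cosine similarities.
src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)

sims = src @ tgt.T                   # (500, 800) cosine similarity matrix
translations = sims.argmax(axis=1)   # best target index for each source word
```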
ChatGPT-guided Semantics for Zero-shot Learning
We then enrich the word vectors by combining the embeddings of class names with those of descriptions generated by ChatGPT.
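One simple way such a combination can work, sketched with toy vectors (the averaging rule and vocabulary here are assumptions, not the paper's exact recipe):

```python
# Sketch: enrich a class-name embedding with an embedding of a generated
# description by averaging word vectors. Vectors and text are toy data.
import numpy as np

def avg_embedding(tokens, lookup, dim=300):
    """Mean of the available word vectors for a token sequence."""
    vecs = [lookup[t] for t in tokens if t in lookup]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy lookup table standing in for pretrained word embeddings.
words = ["zebra", "striped", "horse", "like", "animal"]
lookup = {w: np.random.default_rng(i).normal(size=300)
          for i, w in enumerate(words)}

name_vec = avg_embedding(["zebra"], lookup)
desc_vec = avg_embedding("striped horse like animal".split(), lookup)
class_vec = (name_vec + desc_vec) / 2   # combined zero-shot class semantics
```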