Word Embeddings
1108 papers with code • 0 benchmarks • 52 datasets
Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.
Techniques for learning word embeddings include Word2Vec, GloVe, and neural network approaches that train on an NLP task such as language modeling or document classification.
(Image credit: Dynamic Word Embedding for Evolving Semantic Discovery)
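As a minimal sketch of the idea, the snippet below trains skip-gram embeddings on a toy corpus with the gensim library; the corpus and hyperparameters are illustrative placeholders, not a recommended setup:

```python
# Minimal Word2Vec sketch with gensim (toy corpus; hyperparameters illustrative).
from gensim.models import Word2Vec

# Each document is a list of tokens; real pipelines use a proper tokenizer.
corpus = [
    ["word", "embeddings", "map", "words", "to", "vectors"],
    ["similar", "words", "get", "similar", "vectors"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,   # dimensionality of the embedding space
    window=2,         # context window size
    min_count=1,      # keep every token in this toy corpus
    sg=1,             # 1 = skip-gram, 0 = CBOW
)

vec = model.wv["words"]                        # a 50-dimensional real-valued vector
print(model.wv.most_similar("words", topn=3))  # nearest neighbours in the space
```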
Latest papers
ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting
We also demonstrate the effectiveness of ProMap in re-ranking results from other BLI methods, such as those based on aligned static word embeddings.
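For context, retrieval over aligned static embeddings, one family of BLI baselines that ProMap can re-rank, can be sketched as follows; the embedding matrices and vocabularies are invented placeholders:

```python
# Sketch of nearest-neighbour bilingual lexicon induction over aligned
# static embeddings (matrices and vocabularies here are toy placeholders).
import numpy as np

src_vocab = ["cat", "dog"]
tgt_vocab = ["chat", "chien", "maison"]
src_emb = np.random.rand(len(src_vocab), 300)   # assume already mapped into
tgt_emb = np.random.rand(len(tgt_vocab), 300)   # a shared (aligned) space

def translate(word: str, topn: int = 2) -> list[str]:
    """Return the topn target words closest to `word` by cosine similarity."""
    v = src_emb[src_vocab.index(word)]
    sims = tgt_emb @ v / (np.linalg.norm(tgt_emb, axis=1) * np.linalg.norm(v))
    return [tgt_vocab[i] for i in np.argsort(-sims)[:topn]]

print(translate("cat"))  # candidate translations, to be re-ranked downstream
```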
MLFMF: Data Sets for Machine Learning for Mathematical Formalization
The collection includes the largest Lean 4 library, Mathlib, and some of the largest Agda libraries: the standard library, the univalent mathematics library Agda-unimath, and the TypeTopology library.
GARI: Graph Attention for Relative Isomorphism of Arabic Word Embeddings
Bilingual Lexical Induction (BLI) is a core challenge in NLP; it relies on the relative isomorphism of individual embedding spaces.
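A generic way to quantify relative isomorphism, not necessarily GARI's own measure, is to correlate the pairwise-similarity structure of two spaces over a seed dictionary; a rough numpy sketch:

```python
# Rough sketch: quantify how isomorphic two embedding spaces are by
# correlating their pairwise cosine-similarity structures over seed pairs.
# (A generic measure for illustration, not necessarily the one GARI uses.)
import numpy as np

def cosine_matrix(X: np.ndarray) -> np.ndarray:
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def isomorphism_score(X: np.ndarray, Y: np.ndarray) -> float:
    """Pearson correlation between the two spaces' similarity structures."""
    a = cosine_matrix(X)[np.triu_indices(len(X), k=1)]
    b = cosine_matrix(Y)[np.triu_indices(len(Y), k=1)]
    return float(np.corrcoef(a, b)[0, 1])

# Row i of X and row i of Y embed the two sides of a translation pair
# (toy random data here).
X, Y = np.random.rand(20, 100), np.random.rand(20, 100)
print(isomorphism_score(X, Y))  # near 1.0 = highly isomorphic spaces
```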
ChatGPT-guided Semantics for Zero-shot Learning
Then, we enrich word vectors by combining the word embeddings from class names and descriptions generated by ChatGPT.
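A hedged sketch of that enrichment step; the combination rule here (averaging the class-name vector with the mean description vector) is an illustrative assumption, not necessarily the paper's exact formula:

```python
# Sketch: enrich a class embedding by combining the class-name vector with
# the average vector of an LLM-generated description. (Averaging is an
# illustrative choice, not necessarily the paper's exact rule.)
import numpy as np

emb = {w: np.random.rand(300) for w in
       ["zebra", "striped", "horse", "like", "animal"]}  # toy lookup table

def class_vector(name: str, description: str) -> np.ndarray:
    desc_vecs = [emb[t] for t in description.split() if t in emb]
    return (emb[name] + np.mean(desc_vecs, axis=0)) / 2.0

v = class_vector("zebra", "striped horse like animal")
print(v.shape)  # (300,) -- one enriched semantic vector per class
```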
Swap and Predict: Predicting the Semantic Changes in Words across Corpora by Context Swapping
Intuitively, if the meaning of $w$ does not change between $\mathcal{C}_1$ and $\mathcal{C}_2$, we would expect the distributions of contextualised word embeddings of $w$ to remain the same before and after this random swapping process.
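The swapping intuition can be sketched as follows, with random vectors standing in for contextualised embeddings of $w$; the swap fraction and the drift measure are illustrative choices:

```python
# Sketch of the context-swapping intuition: if w is semantically stable,
# embeddings of w computed after randomly swapping its contexts between
# C1 and C2 should look like the originals. (Toy vectors stand in for
# contextualised embeddings from a model such as BERT.)
import numpy as np

rng = np.random.default_rng(0)
E1 = rng.normal(0.0, 1.0, (50, 768))   # embeddings of w in corpus C1
E2 = rng.normal(0.0, 1.0, (50, 768))   # embeddings of w in corpus C2

def swap(a: np.ndarray, b: np.ndarray, frac: float = 0.5):
    """Randomly exchange a fraction of rows (contexts) between a and b."""
    k = int(len(a) * frac)
    idx = rng.choice(len(a), size=k, replace=False)
    a2, b2 = a.copy(), b.copy()
    a2[idx], b2[idx] = b[idx], a[idx]
    return a2, b2

S1, S2 = swap(E1, E2)
# Compare mean vectors before and after swapping; a large shift suggests
# the meaning of w differs between the two corpora.
print(np.linalg.norm(E1.mean(0) - S1.mean(0)))
```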
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling
We describe an end-to-end speech synthesis system that uses generative adversarial training.
Lightweight Adaptation of Neural Language Models via Subspace Embedding
Traditional neural word embeddings usually depend on a rich and diverse vocabulary.
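For a sense of what a subspace embedding can look like, here is a generic low-rank factorisation of the embedding table; this is an illustrative construction, not the paper's exact method:

```python
# Generic sketch of a subspace (low-rank) embedding table: each word vector
# is a combination of a small set of shared basis vectors, cutting the
# parameter count versus a full |V| x d table. (Illustrative construction,
# not the paper's exact method.)
import numpy as np

vocab_size, dim, rank = 50_000, 768, 32
coeffs = np.random.rand(vocab_size, rank)   # |V| x r  (per-word coefficients)
basis = np.random.rand(rank, dim)           # r  x d  (shared subspace basis)

def embed(word_id: int) -> np.ndarray:
    return coeffs[word_id] @ basis          # materialise one d-dim vector

print(embed(42).shape)                      # (768,)
# Parameters: |V|*r + r*d = 1.6M + 24K, versus |V|*d = 38.4M for a full table.
```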
3D-EX: A Unified Dataset of Definitions and Dictionary Examples
Definitions are a fundamental building block in lexicography, linguistics and computational semantics.
Circumventing Concept Erasure Methods For Text-to-Image Generative Models
Text-to-image generative models can produce photo-realistic images for an extremely broad range of concepts, and their usage has proliferated widely among the general public.
Beyond One-Hot-Encoding: Injecting Semantics to Drive Image Classifiers
Finally, we discuss how this approach can be further exploited in terms of explainability and adversarial robustness.
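The idea in the title, supervising a classifier with semantic label embeddings instead of one-hot targets, can be sketched as follows; the loss and the projected model output are assumptions for illustration:

```python
# Sketch: instead of one-hot targets, supervise a classifier with semantic
# label embeddings, so errors between related classes cost less.
# (Generic illustration; the paper's exact setup may differ.)
import numpy as np

classes = ["cat", "dog", "car"]
label_emb = {c: np.random.rand(300) for c in classes}   # e.g. GloVe vectors

def semantic_loss(pred: np.ndarray, true_class: str) -> float:
    """Cosine distance between the predicted vector and the label embedding."""
    t = label_emb[true_class]
    return 1.0 - float(pred @ t / (np.linalg.norm(pred) * np.linalg.norm(t)))

pred = np.random.rand(300)          # the image model's projected output
print(semantic_loss(pred, "cat"))   # small when prediction aligns with "cat"
```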