One Classifier for All Ambiguous Words: Overcoming Data Sparsity by Utilizing Sense Correlations Across Words

Most supervised word sense disambiguation (WSD) systems build word-specific classifiers by leveraging labeled data. However, when using word-specific classifiers, the sparseness of annotations leads to inferior sense disambiguation performance on less frequently seen words...

Methods used in the Paper


METHOD                           TYPE
Sigmoid Activation               Activation Functions
Weight Decay                     Regularization
Softmax                          Output Functions
Tanh Activation                  Activation Functions
Adam                             Stochastic Optimization
LSTM                             Recurrent Neural Networks
Multi-Head Attention             Attention Modules
Dropout                          Regularization
GELU                             Activation Functions
Attention Dropout                Regularization
Linear Warmup With Linear Decay  Learning Rate Schedules
Dense Connections                Feedforward Networks
Layer Normalization              Normalization
Scaled Dot-Product Attention     Attention Mechanisms
GloVe                            Word Embeddings
WordPiece                        Subword Segmentation
Residual Connection              Skip Connections
BERT                             Language Models
BiLSTM                           Bidirectional Recurrent Neural Networks
ELMo                             Word Embeddings