
Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure

We suggest a new method for creating and using gold-standard datasets for word similarity evaluation. Our goal is to improve the reliability of the evaluation. We do this by redesigning the annotation task to achieve higher inter-rater agreement, and by defining a performance measure that takes the reliability of each annotation decision in the dataset into account.
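To make the idea of a reliability-weighted performance measure concrete, here is a minimal sketch, under assumptions not taken from the abstract: each evaluation item is a binary preference judgment (e.g. which of two word pairs is more similar), and an item's reliability is the fraction of annotators who chose the majority answer. The function name and data layout are hypothetical.

```python
from collections import Counter

def reliability_weighted_score(items):
    """Score a model's choices, weighting each item by annotator agreement.

    items: list of (model_choice, annotator_votes), where annotator_votes
    is a list of the individual annotators' choices (e.g. 'a' or 'b').
    """
    total_weight = 0.0
    score = 0.0
    for model_choice, votes in items:
        # Majority answer and its share of the votes on this item.
        majority, majority_count = Counter(votes).most_common(1)[0]
        weight = majority_count / len(votes)  # agreement acts as reliability
        total_weight += weight
        if model_choice == majority:
            score += weight
    return score / total_weight

items = [
    ('a', ['a', 'a', 'a', 'a']),  # unanimous item: full weight, model correct
    ('a', ['a', 'b', 'b', 'b']),  # contested item: lower weight, model wrong
]
print(reliability_weighted_score(items))
```

Under this scheme, disagreeing with annotators on a contested item costs the model less than disagreeing on a unanimous one, which is one way a measure can reflect per-decision reliability.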
