Towards Neural Similarity Evaluator

We review three limitations of BLEU and ROUGE, the most popular metrics used to score hypothesis summaries against reference summaries; propose criteria for how a good metric should behave; describe concrete ways to assess a metric's performance in detail; and show the potential of Transformer-based language models for scoring hypothesis summaries against reference summaries.
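
To illustrate the contrast the abstract draws, here is a minimal sketch that scores a hypothesis summary against a reference in two ways: with n-gram overlap (BLEU via NLTK) and with a Transformer-based sentence-embedding similarity. The sentence-transformers model and this particular pairing are assumptions made for illustration only; this is not the evaluator proposed in the paper.

```python
# Minimal sketch: n-gram overlap (BLEU) vs. a Transformer-based similarity score.
# NOTE: illustrative stand-in only, NOT the evaluator proposed in the paper;
# the sentence-transformers model below is an assumed choice for this example.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from sentence_transformers import SentenceTransformer, util

reference = "the cat sat on the mat"
hypothesis = "a cat was sitting on the mat"

# BLEU: token-level n-gram overlap between hypothesis and reference.
bleu = sentence_bleu(
    [reference.split()], hypothesis.split(),
    smoothing_function=SmoothingFunction().method1,
)

# Transformer-based similarity: cosine similarity of sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
emb_ref, emb_hyp = model.encode([reference, hypothesis], convert_to_tensor=True)
semantic_sim = util.cos_sim(emb_ref, emb_hyp).item()

print(f"BLEU: {bleu:.3f}  |  embedding similarity: {semantic_sim:.3f}")
```

On paraphrases like the pair above, the two scores tend to diverge: the n-gram metric penalizes surface-level variation, while the embedding similarity remains high, which is the gap that Transformer-based evaluators aim to close.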
