Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM) 2022 • Shima Asaadi, Zahra Kolagar, Alina Liebel, Alessandra Zarcone
We argue that benchmarks created specifically from conversational data are needed to evaluate conversational LMs on the semantic textual similarity (STS) task.