6 Apr 2024 • Yann Dubois, Balázs Galambosi, Percy Liang, Tatsunori B. Hashimoto
Even simple, known confounders, such as a preference for longer outputs, remain in existing automated evaluation metrics.
Chatbot counterfactual