1 code implementation • 14 Jun 2024 • Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Afina Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, Carla Perez-Almendros, Abinew Ali Ayele, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García, Hwaran Lee, Shamsuddeen Hassan Muhammad, Kiwoong Park, Anar Sabuhi Rzayev, Nina White, Seid Muhie Yimam, Mohammad Taher Pilehvar, Nedjma Ousidhoum, Jose Camacho-Collados, Alice Oh
To address this issue, we introduce BLEnD, a hand-crafted benchmark designed to evaluate LLMs' everyday knowledge across diverse cultures and languages.
1 code implementation • 11 Mar 2024 • Eunsu Kim, Juyoung Suk, Philhoon Oh, Haneul Yoo, James Thorne, Alice Oh
Despite the rapid development of large language models (LLMs) for the Korean language, there remains a clear lack of benchmark datasets that test the requisite Korean cultural and linguistic knowledge.
1 code implementation • 28 Feb 2024 • Sheikh Shafayat, Eunsu Kim, Juhyun Oh, Alice Oh
Large Language Models (LLMs) are prone to factuality hallucination, generating text that contradicts established knowledge.
no code implementations • 9 Feb 2024 • Juhyun Oh, Eunsu Kim, Inha Cha, Alice Oh
This paper explores the assumption that Large Language Models (LLMs) skilled in generation tasks are equally adept as evaluators.