no code implementations • 9 Oct 2024 • Juhyun Oh, Eunsu Kim, Jiseon Kim, Wenda Xu, Inha Cha, William Yang Wang, Alice Oh
Our factor level analysis reveals a substantial discrepancy between human and LLM preferences in generation tasks, whereas LLMs show strong alignment with human preferences in evaluation tasks.
1 code implementation • 28 Feb 2024 • Sheikh Shafayat, Eunsu Kim, Juhyun Oh, Alice Oh
Evaluating the factuality of long-form large language model (LLM)-generated text is an important challenge.
no code implementations • 9 Feb 2024 • Juhyun Oh, Eunsu Kim, Inha Cha, Alice Oh
This paper explores the assumption that Large Language Models (LLMs) skilled in generation tasks are equally adept as evaluators.
1 code implementation • 23 May 2022 • Younghoon Jeong, Juhyun Oh, Jaimeen Ahn, Jongwon Lee, Jihyung Moon, Sungjoon Park, Alice Oh
Recent directions for offensive language detection are hierarchical modeling, identifying the type and the target of offensive language, and interpretability with offensive span annotation and prediction.
4 code implementations • 20 May 2021 • Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, Jangwon Park, Chisung Song, JunSeong Kim, Yongsook Song, Taehwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, InKwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seonghyun Kim, Lucy Park, Alice Oh, Jung-Woo Ha, Kyunghyun Cho
We introduce Korean Language Understanding Evaluation (KLUE) benchmark.