no code implementations • NAACL (GeBNLP) 2022 • Jaimeen Ahn, Hwaran Lee, JinHwa Kim, Alice Oh
Knowledge distillation is widely used to transfer the language understanding of a large model to a smaller model. However, after knowledge distillation, the smaller model exhibits more gender bias than the source large model. This paper studies what causes gender bias to increase during the knowledge distillation process. Moreover, we suggest applying a variant of mixup during knowledge distillation, used here to increase generalizability during the distillation process rather than for data augmentation. By doing so, we can significantly reduce gender bias amplification after knowledge distillation. We also conduct an experiment on the GLUE benchmark to demonstrate that applying mixup does not have a significant adverse effect on the model’s performance.
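The interpolation at the heart of mixup can be sketched as follows. This is a minimal, generic sketch of standard mixup (linear interpolation of two inputs with a Beta-sampled coefficient), not the paper's distillation-specific variant; the function name and the `alpha` hyperparameter are illustrative assumptions.

```python
import numpy as np

def mixup(x1, x2, alpha=0.4, rng=None):
    """Standard mixup: interpolate two inputs with a coefficient
    lam ~ Beta(alpha, alpha). Returns the mixed input and lam, which
    would also be used to mix the corresponding training targets."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam
```

In a distillation setting, the same coefficient would typically weight the teacher signals for the two examples as well.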
no code implementations • EMNLP 2020 • Sungjoon Park, Kiwoong Park, Jaimeen Ahn, Alice Oh
We analyze social media for detecting the suicidal risk of military personnel, which is especially crucial for countries with compulsory military service such as the Republic of Korea.
1 code implementation • COLING 2022 • Yeon Seonwoo, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Alice Oh
We conduct three experiments: 1) domain-specific document retrieval, 2) a comparison of our virtual knowledge graph construction method with previous approaches, and 3) an ablation study on each component of our virtual knowledge graph.
no code implementations • 8 Jul 2024 • Chani Jung, Dongkwan Kim, Jiho Jin, Jiseon Kim, Yeon Seonwoo, Yejin Choi, Alice Oh, Hyunwoo Kim
Our evaluation of eight state-of-the-art LLMs reveals that the models generally perform well in perception inference while exhibiting limited capability in perception-to-belief inference (e.g., lack of inhibitory control).
1 code implementation • 14 Jun 2024 • Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Afina Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, Carla Perez-Almendros, Abinew Ali Ayele, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García, Hwaran Lee, Shamsuddeen Hassan Muhammad, Kiwoong Park, Anar Sabuhi Rzayev, Nina White, Seid Muhie Yimam, Mohammad Taher Pilehvar, Nedjma Ousidhoum, Jose Camacho-Collados, Alice Oh
To address this issue, we introduce BLEnD, a hand-crafted benchmark designed to evaluate LLMs' everyday knowledge across diverse cultures and languages.
1 code implementation • 16 Mar 2024 • Sheikh Shafayat, H M Quamran Hasan, Minhajur Rahman Chowdhury Mahim, Rifki Afina Putri, James Thorne, Alice Oh
In this study, we introduce BEnQA, a dataset comprising parallel Bengali and English exam questions for middle and high school levels in Bangladesh.
no code implementations • 13 Mar 2024 • Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Tak Yeon Lee, So-Yeon Ahn, Alice Oh
RECIPE4U includes comprehensive records of these interactions, including conversation logs, students' intent, students' self-rated satisfaction, and students' essay edit histories.
1 code implementation • 11 Mar 2024 • Eunsu Kim, Juyoung Suk, Philhoon Oh, Haneul Yoo, James Thorne, Alice Oh
Despite the rapid development of large language models (LLMs) for the Korean language, there remains an obvious lack of benchmark datasets that test the requisite Korean cultural and linguistic knowledge.
1 code implementation • 28 Feb 2024 • Sheikh Shafayat, Eunsu Kim, Juhyun Oh, Alice Oh
Large Language Models (LLMs) are prone to factuality hallucination, generating text that contradicts established knowledge.
1 code implementation • 27 Feb 2024 • Rifki Afina Putri, Faiz Ghifari Haznitrama, Dea Adhista, Alice Oh
Large Language Models (LLMs) are increasingly being used to generate synthetic data for training and evaluating models.
no code implementations • 9 Feb 2024 • Juhyun Oh, Eunsu Kim, Inha Cha, Alice Oh
This paper explores the assumption that Large Language Models (LLMs) skilled in generation tasks are equally adept as evaluators.
1 code implementation • 19 Oct 2023 • Jungbin Son, Alice Oh
The model is trained to extract the answer span from the sentence that is both correct in time and context.
no code implementations • 8 Oct 2023 • Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Hyunseung Lim, Yoonsu Kim, Tak Yeon Lee, Hwajung Hong, Juho Kim, So-Yeon Ahn, Alice Oh
In the context of English as a Foreign Language (EFL) writing education, LLM-as-a-tutor can assist students by providing real-time feedback on their essays.
no code implementations • 23 Sep 2023 • Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Tak Yeon Lee, So-Yeon Ahn, Alice Oh
We analyze students' usage patterns and perceptions regarding generative AI with respect to their intent and satisfaction.
no code implementations • 19 Sep 2023 • Changyoon Lee, Junho Myung, Jieun Han, Jiho Jin, Alice Oh
To compare the learners' interaction and perception of the AI and human TAs, we conducted a between-subject study with 20 novice programming learners.
1 code implementation • 31 Aug 2023 • Nayeon Lee, Chani Jung, Junho Myung, Jiho Jin, Jose Camacho-Collados, Juho Kim, Alice Oh
To address this, we introduce CREHate, a CRoss-cultural English Hate speech dataset.
no code implementations • 31 Jul 2023 • Jiho Jin, Jiseon Kim, Nayeon Lee, Haneul Yoo, Alice Oh, Hwaran Lee
In this paper, we present KoBBQ, a Korean bias benchmark dataset, and we propose a general framework that addresses considerations for cultural adaptation of a dataset.
1 code implementation • 28 May 2023 • Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Meeyoung Cha, Yejin Choi, Byoung Pil Kim, Gunhee Kim, Eun-Ju Lee, Yong Lim, Alice Oh, Sangchul Park, Jung-Woo Ha
The potential social harms that large language models pose, such as generating offensive content and reinforcing biases, are steeply rising.
1 code implementation • NAACL 2022 • Changyoon Lee, Yeon Seonwoo, Alice Oh
We introduce CS1QA, a dataset for code-based question answering in the programming education domain.
1 code implementation • 25 Oct 2022 • Rifki Afina Putri, Alice Oh
Machine Reading Comprehension (MRC) has become one of the essential tasks in Natural Language Understanding (NLU) as it is often included in several NLU benchmarks (Liang et al., 2020; Wilie et al., 2020).
1 code implementation • 25 Oct 2022 • Soyoung Yoon, Sungjoon Park, Gyuwan Kim, Junhee Cho, Kihyo Park, Gyutae Kim, Minjoon Seo, Alice Oh
We show that the model trained with our datasets significantly outperforms the currently used statistical Korean GEC system (Hanspell) on a wider range of error types, demonstrating the diversity and usefulness of the datasets.
no code implementations • 13 Oct 2022 • Haneul Yoo, Rifki Afina Putri, Changyoon Lee, Youngin Lee, So-Yeon Ahn, Dongyeop Kang, Alice Oh
Researchers have traditionally recruited native speakers to provide annotations for widely used benchmark datasets.
1 code implementation • Findings (NAACL) 2022 • Haneul Yoo, Jiho Jin, Juhee Son, JinYeong Bak, Kyunghyun Cho, Alice Oh
Historical records in Korea before the 20th century were primarily written in Hanja, an extinct language based on Chinese characters and not understood by modern Korean or Chinese speakers.
1 code implementation • 9 Sep 2022 • Yeon Seonwoo, Guoyin Wang, Changmin Seo, Sajal Choudhary, Jiwei Li, Xiang Li, Puyang Xu, Sunghyun Park, Alice Oh
In this work, we show that the semantic meaning of a sentence is also determined by nearest-neighbor sentences that are similar to the input sentence.
1 code implementation • 1 Sep 2022 • Dongkwan Kim, Jiho Jin, Jaimeen Ahn, Alice Oh
Subgraphs are rich substructures in graphs, and their nodes and edges can be partially observed in real-world tasks.
1 code implementation • 23 May 2022 • Younghoon Jeong, Juhyun Oh, Jaimeen Ahn, Jongwon Lee, Jihyung Moon, Sungjoon Park, Alice Oh
Recent directions for offensive language detection are hierarchical modeling, identifying the type and the target of offensive language, and interpretability with offensive span annotation and prediction.
no code implementations • 20 May 2022 • Juhee Son, Jiho Jin, Haneul Yoo, JinYeong Bak, Kyunghyun Cho, Alice Oh
Built on top of multilingual neural machine translation, H2KE learns to translate a historical document written in Hanja using both a full dataset of outdated Korean translations and a small dataset of more recent translations into contemporary Korean and English.
1 code implementation • Findings (ACL) 2022 • Yeon Seonwoo, Juhee Son, Jiho Jin, Sang-Woo Lee, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh
These models have shown a significant increase in inference speed, but at the cost of lower QA performance compared to the retriever-reader models.
2 code implementations • ICLR 2021 • Dongkwan Kim, Alice Oh
However, what graph attention learns is not understood well, particularly when graphs are noisy.
1 code implementation • 9 Apr 2022 • Dongkwan Kim, Alice Oh
Subgraph representation learning has emerged as an important problem, but it is by default approached with specialized graph neural networks on a large global graph.
no code implementations • NeurIPS 2021 • Jooyeon Kim, Alice Oh
Just as we humans have succeeded in creating a shared language that allows us to interact within a large group, can the emergent communication within an artificial group converge to a shared, agreed language?
no code implementations • 29 Sep 2021 • Dongkwan Kim, Jiho Jin, Jaimeen Ahn, Alice Oh
Subgraphs are important substructures of graphs, but learning their representations has not been studied well.
1 code implementation • Findings (EMNLP) 2021 • Yohan Jo, Haneul Yoo, JinYeong Bak, Alice Oh, Chris Reed, Eduard Hovy
Finding counterevidence to statements is key to many tasks, including counterargument generation.
1 code implementation • EMNLP 2021 • Jiseon Kim, Elden Griggs, In Song Kim, Alice Oh
Despite the significance of bill-to-bill linkages for understanding the legislative process, existing approaches fail to address semantic similarities across bills, let alone reordering or paraphrasing, which are prevalent in legal document writing.
1 code implementation • EMNLP 2021 • Jaimeen Ahn, Alice Oh
Which of the two methods works better depends on the amount of NLP resources available for that language.
1 code implementation • EMNLP 2021 • Seonghyeon Ye, Jiseon Kim, Alice Oh
We introduce EfficientCL, a memory-efficient continual pretraining method that applies contrastive learning with novel data augmentation and curriculum learning.
1 code implementation • Findings (ACL) 2021 • Yeon Seonwoo, Sang-Woo Lee, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh
In multi-hop QA, answering complex questions entails iterative document retrieval for finding the missing entity of the question.
3 code implementations • 20 May 2021 • Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, Jangwon Park, Chisung Song, JunSeong Kim, Yongsook Song, Taehwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, InKwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seonghyun Kim, Lucy Park, Alice Oh, Jung-Woo Ha, Kyunghyun Cho
We introduce the Korean Language Understanding Evaluation (KLUE) benchmark.
1 code implementation • EMNLP 2020 • Yeon Seonwoo, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh
With experiments on reading comprehension, we show that BLANC outperforms the state-of-the-art QA models, and the performance gap increases as the number of answer text occurrences increases.
1 code implementation • ACL 2020 • JinYeong Bak, Alice Oh
We provide our code and the learned parameters so that they can be used for automatic evaluation of dialogue response generation models.
1 code implementation • 8 May 2020 • Cheul Young Park, Narae Cha, Soowon Kang, Auk Kim, Ahsan Habib Khandoker, Leontios Hadjileontiadis, Alice Oh, Yong Jeong, Uichin Lee
Therefore, studying emotions in the context of social interactions requires a novel dataset, and K-EmoCon is such a multimodal dataset with comprehensive annotations of continuous emotions during naturalistic conversations.
1 code implementation • EMNLP 2021 • Sungjoon Park, Jiseon Kim, Seonghyeon Ye, Jaeyeol Jeon, Hee Young Park, Alice Oh
We present a model to predict fine-grained emotions along the continuous dimensions of valence, arousal, and dominance (VAD) with a corpus with categorical emotion annotations.
1 code implementation • IJCNLP 2019 • JinYeong Bak, Alice Oh
To overcome this limitation, we propose a new model with a stochastic variable designed to capture the speaker information and deliver it to the conversational context.
no code implementations • WS 2019 • Yeon Seonwoo, Sungjoon Park, Dongkwan Kim, Alice Oh
Additive compositionality of word embedding models has been studied from empirical and theoretical perspectives.
no code implementations • 25 Sep 2019 • Jooyeon Kim, Alice Oh
We consider a setting where biases are involved when agents internalise an environment.
no code implementations • NAACL 2019 • Sungjoon Park, Donghyun Kim, Alice Oh
A dataset of those interactions can be used to learn to automatically classify the client utterances into categories that help counselors in diagnosing client status and predicting counseling outcome.
1 code implementation • 16 Nov 2018 • Jooyeon Kim, Dongkwan Kim, Alice Oh
An overwhelming number of true and false news stories are posted and shared in social networks, and users diffuse the stories based on multiple factors.
no code implementations • EMNLP 2018 • Yeon Seonwoo, Alice Oh, Sungjoon Park
In news and discussions, many articles and posts are provided without their related previous articles or posts.
no code implementations • EMNLP 2018 • JinYeong Bak, Alice Oh
Styles of leaders when they make decisions in groups vary, and the different styles affect the performance of the group.
1 code implementation • ACL 2018 • Sungjoon Park, Jeongmin Byun, Sion Baek, Yongseok Cho, Alice Oh
The results show that our simple method outperforms word2vec and character-level Skip-Grams on semantic and syntactic similarity and analogy tasks and contributes positively toward downstream NLP tasks such as sentiment analysis.
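The subword structure such Korean embedding methods exploit can be illustrated with standard Unicode code-point arithmetic for Hangul syllables. The sketch below shows only the well-known decomposition of a precomposed syllable into its jamo (initial, medial, final); it is not the paper's exact subword scheme, and the helper name is illustrative.

```python
def decompose_hangul(syllable):
    """Decompose a precomposed Hangul syllable into (initial, medial, final)
    jamo using Unicode code-point arithmetic: syllable blocks start at
    U+AC00 and are laid out as 19 initials x 21 medials x 28 finals."""
    CHOSEONG = [chr(0x1100 + i) for i in range(19)]       # initial consonants
    JUNGSEONG = [chr(0x1161 + i) for i in range(21)]      # medial vowels
    JONGSEONG = [""] + [chr(0x11A8 + i) for i in range(27)]  # optional finals
    code = ord(syllable) - 0xAC00
    if not 0 <= code < 11172:          # not a precomposed Hangul syllable
        return (syllable,)
    cho, rem = divmod(code, 588)       # 588 = 21 medials * 28 finals
    jung, jong = divmod(rem, 28)
    return (CHOSEONG[cho], JUNGSEONG[jung], JONGSEONG[jong])
```

For example, the syllable "한" decomposes into the jamo for ㅎ, ㅏ, and ㄴ.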
1 code implementation • 27 Nov 2017 • Jooyeon Kim, Behzad Tabibian, Alice Oh, Bernhard Schoelkopf, Manuel Gomez-Rodriguez
Online social networking sites are experimenting with the following crowd-powered procedure to reduce the spread of fake news and misinformation: whenever a user is exposed to a story through her feed, she can flag the story as misinformation and, if the story receives enough flags, it is sent to a trusted third party for fact checking.
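The crowd-powered procedure described above lends itself to a small simulation. The sketch below assumes a deliberately simplified model, in which every exposed user independently flags the story with a fixed probability; the function name and parameters are illustrative, not from the paper.

```python
import random

def exposures_until_fact_check(flag_prob, threshold, max_exposures=10_000, seed=0):
    """Simulate the crowd-flagging protocol: each exposed user flags the
    story independently with probability `flag_prob`; once accumulated
    flags reach `threshold`, the story is sent for fact checking.
    Returns the number of exposures at dispatch, or None if the
    threshold is never reached within `max_exposures`."""
    rng = random.Random(seed)
    flags = 0
    for n in range(1, max_exposures + 1):
        if rng.random() < flag_prob:
            flags += 1
            if flags >= threshold:
                return n
    return None
```

The tension the paper studies falls out of this model directly: a low threshold triggers fact checking quickly but wastes reviewer effort on benign stories, while a high threshold lets misinformation spread further before dispatch.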
1 code implementation • EMNLP 2017 • Sungjoon Park, JinYeong Bak, Alice Oh
We apply several rotation algorithms to the vector representation of words to improve the interpretability.
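One widely used orthogonal rotation for improving interpretability is varimax. The sketch below is a minimal NumPy implementation of the standard varimax iteration, offered purely as an illustration; the paper compares several rotation algorithms, and this is not necessarily the variant it settles on.

```python
import numpy as np

def varimax(W, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate an embedding matrix W (n words x d dims) to
    maximize the varimax criterion, concentrating each dimension's
    variance on few words. Returns the rotated matrix and rotation R."""
    n, d = W.shape
    R = np.eye(d)
    var = 0.0
    for _ in range(max_iter):
        L = W @ R
        # SVD-based update of the rotation (standard varimax iteration)
        u, s, vt = np.linalg.svd(
            W.T @ (L**3 - (gamma / n) * L @ np.diag(np.sum(L**2, axis=0))))
        R = u @ vt
        new_var = s.sum()
        if new_var - var < tol:
            break
        var = new_var
    return W @ R, R
```

Because the rotation is orthogonal, pairwise distances and word-vector norms are unchanged; only the coordinate axes move, which is what makes individual dimensions easier to interpret.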
no code implementations • TACL 2017 • Jooyeon Kim, Dongwoo Kim, Alice Oh
Second, it models each author's influence on citations of a paper based on the topics of the cited papers, as well as the citing papers.
no code implementations • 28 Aug 2015 • Suin Kim, Sungjoon Park, Scott A. Hale, Sooyoung Kim, Jeongmin Byun, Alice Oh
We study multilingualism by collecting and analyzing a large dataset of the content written by multilingual editors of the English, German, and Spanish editions of Wikipedia.
no code implementations • 22 Mar 2014 • Dongwoo Kim, Alice Oh
We present the hierarchical Dirichlet scaling process (HDSP), a Bayesian nonparametric mixed membership model.