Search Results for author: Alice Oh

Found 59 papers, 34 papers with code

Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT

no code implementations NAACL (GeBNLP) 2022 Jaimeen Ahn, Hwaran Lee, JinHwa Kim, Alice Oh

Knowledge distillation is widely used to transfer the language understanding of a large model to a smaller model. However, after knowledge distillation, it was found that the smaller model is more biased by gender compared to the source large model. This paper studies what causes gender bias to increase after the knowledge distillation process. Moreover, we suggest applying a variant of the mixup on knowledge distillation, which is used to increase generalizability during the distillation process, not for augmentation. By doing so, we can significantly reduce the gender bias amplification after knowledge distillation. We also conduct an experiment on the GLUE benchmark to demonstrate that even if the mixup is applied, it does not have a significant adverse effect on the model’s performance.

Knowledge Distillation

Suicidal Risk Detection for Military Personnel

no code implementations EMNLP 2020 Sungjoon Park, Kiwoong Park, Jaimeen Ahn, Alice Oh

We analyze social media for detecting the suicidal risk of military personnel, which is especially crucial for countries with compulsory military service such as the Republic of Korea.

Ethics

Virtual Knowledge Graph Construction for Zero-Shot Domain-Specific Document Retrieval

1 code implementation COLING 2022 Yeon Seonwoo, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Alice Oh

We conduct three experiments 1) domain-specific document retrieval, 2) comparison of our virtual knowledge graph construction method with previous approaches, and 3) ablation study on each component of our virtual knowledge graph.

Domain Adaptation graph construction +2

Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models

no code implementations 8 Jul 2024 Chani Jung, Dongkwan Kim, Jiho Jin, Jiseon Kim, Yeon Seonwoo, Yejin Choi, Alice Oh, Hyunwoo Kim

Our evaluation of eight state-of-the-art LLMs reveals that the models generally perform well in perception inference while exhibiting limited capability in perception-to-belief inference (e.g., lack of inhibitory control).

BEnQA: A Question Answering and Reasoning Benchmark for Bengali and English

1 code implementation 16 Mar 2024 Sheikh Shafayat, H M Quamran Hasan, Minhajur Rahman Chowdhury Mahim, Rifki Afina Putri, James Thorne, Alice Oh

In this study, we introduce BEnQA, a dataset comprising parallel Bengali and English exam questions for middle and high school levels in Bangladesh.

Question Answering

RECIPE4U: Student-ChatGPT Interaction Dataset in EFL Writing Education

no code implementations 13 Mar 2024 Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Tak Yeon Lee, So-Yeon Ahn, Alice Oh

RECIPE4U includes comprehensive records of these interactions, including conversation logs, students' intent, students' self-rated satisfaction, and students' essay edit histories.

Intent Detection Task-Oriented Dialogue Systems

CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean

1 code implementation 11 Mar 2024 Eunsu Kim, Juyoung Suk, Philhoon Oh, Haneul Yoo, James Thorne, Alice Oh

Despite the rapid development of large language models (LLMs) for the Korean language, there remains an obvious lack of benchmark datasets that test the requisite Korean cultural and linguistic knowledge.

Hate Speech Detection

Multi-FAct: Assessing Multilingual LLMs' Multi-Regional Knowledge using FActScore

1 code implementation 28 Feb 2024 Sheikh Shafayat, Eunsu Kim, Juhyun Oh, Alice Oh

Large Language Models (LLMs) are prone to factuality hallucination, generating text that contradicts established knowledge.

Hallucination

The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate

no code implementations 9 Feb 2024 Juhyun Oh, Eunsu Kim, Inha Cha, Alice Oh

This paper explores the assumption that Large Language Models (LLMs) skilled in generation tasks are equally adept as evaluators.

Question Answering TriviaQA

Time-Aware Representation Learning for Time-Sensitive Question Answering

1 code implementation 19 Oct 2023 Jungbin Son, Alice Oh

The model is trained to extract the answer span from the sentence that is both correct in time and context.

Question Answering Representation Learning +1

LLM-as-a-tutor in EFL Writing Education: Focusing on Evaluation of Student-LLM Interaction

no code implementations 8 Oct 2023 Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Hyunseung Lim, Yoonsu Kim, Tak Yeon Lee, Hwajung Hong, Juho Kim, So-Yeon Ahn, Alice Oh

In the context of English as a Foreign Language (EFL) writing education, LLM-as-a-tutor can assist students by providing real-time feedback on their essays.

Automated Essay Scoring

ChEDDAR: Student-ChatGPT Dialogue in EFL Writing Education

no code implementations 23 Sep 2023 Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Tak Yeon Lee, So-Yeon Ahn, Alice Oh

We analyze students' usage patterns and perceptions regarding generative AI with respect to their intent and satisfaction.

Intent Detection Task-Oriented Dialogue Systems

Learning from Teaching Assistants to Program with Subgoals: Exploring the Potential for AI Teaching Assistants

no code implementations 19 Sep 2023 Changyoon Lee, Junho Myung, Jieun Han, Jiho Jin, Alice Oh

To compare the learners' interaction and perception of the AI and human TAs, we conducted a between-subject study with 20 novice programming learners.

KoBBQ: Korean Bias Benchmark for Question Answering

no code implementations 31 Jul 2023 Jiho Jin, Jiseon Kim, Nayeon Lee, Haneul Yoo, Alice Oh, Hwaran Lee

In this paper, we present KoBBQ, a Korean bias benchmark dataset, and we propose a general framework that addresses considerations for cultural adaptation of a dataset.

Question Answering

IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension

1 code implementation 25 Oct 2022 Rifki Afina Putri, Alice Oh

Machine Reading Comprehension (MRC) has become one of the essential tasks in Natural Language Understanding (NLU) as it is often included in several NLU benchmarks (Liang et al., 2020; Wilie et al., 2020).

Machine Reading Comprehension Natural Language Understanding +1

Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation

1 code implementation 25 Oct 2022 Soyoung Yoon, Sungjoon Park, Gyuwan Kim, Junhee Cho, Kihyo Park, Gyutae Kim, Minjoon Seo, Alice Oh

We show that the model trained with our datasets significantly outperforms the currently used statistical Korean GEC system (Hanspell) on a wider range of error types, demonstrating the diversity and usefulness of the datasets.

Attribute Diversity +1

HUE: Pretrained Model and Dataset for Understanding Hanja Documents of Ancient Korea

1 code implementation Findings (NAACL) 2022 Haneul Yoo, Jiho Jin, Juhee Son, JinYeong Bak, Kyunghyun Cho, Alice Oh

Historical records in Korea before the 20th century were primarily written in Hanja, an extinct language based on Chinese characters and not understood by modern Korean or Chinese speakers.

Named Entity Recognition +3

Ranking-Enhanced Unsupervised Sentence Representation Learning

1 code implementation 9 Sep 2022 Yeon Seonwoo, Guoyin Wang, Changmin Seo, Sajal Choudhary, Jiwei Li, Xiang Li, Puyang Xu, Sunghyun Park, Alice Oh

In this work, we show that the semantic meaning of a sentence is also determined by nearest-neighbor sentences that are similar to the input sentence.

Contrastive Learning Data Augmentation +5

Models and Benchmarks for Representation Learning of Partially Observed Subgraphs

1 code implementation 1 Sep 2022 Dongkwan Kim, Jiho Jin, Jaimeen Ahn, Alice Oh

Subgraphs are rich substructures in graphs, and their nodes and edges can be partially observed in real-world tasks.

Representation Learning

KOLD: Korean Offensive Language Dataset

1 code implementation 23 May 2022 Younghoon Jeong, Juhyun Oh, Jaimeen Ahn, Jongwon Lee, Jihyung Moon, Sungjoon Park, Alice Oh

Recent directions for offensive language detection are hierarchical modeling, identifying the type and the target of offensive language, and interpretability with offensive span annotation and prediction.

Classification

Translating Hanja Historical Documents to Contemporary Korean and English

no code implementations 20 May 2022 Juhee Son, Jiho Jin, Haneul Yoo, JinYeong Bak, Kyunghyun Cho, Alice Oh

Built on top of multilingual neural machine translation, H2KE learns to translate a historical document written in Hanja, from both a full dataset of outdated Korean translation and a small dataset of more recently translated contemporary Korean and English.

Machine Translation Translation

Two-Step Question Retrieval for Open-Domain QA

1 code implementation Findings (ACL) 2022 Yeon Seonwoo, Juhee Son, Jiho Jin, Sang-Woo Lee, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh

These models have shown a significant increase in inference speed, but at the cost of lower QA performance compared to the retriever-reader models.

Computational Efficiency Retrieval +1

Translating Subgraphs to Nodes Makes Simple GNNs Strong and Efficient for Subgraph Representation Learning

1 code implementation 9 Apr 2022 Dongkwan Kim, Alice Oh

Subgraph representation learning has emerged as an important problem, but it is by default approached with specialized graph neural networks on a large global graph.

Representation Learning Translation

Emergent Communication under Varying Sizes and Connectivities

no code implementations NeurIPS 2021 Jooyeon Kim, Alice Oh

Just as we humans have succeeded in creating a shared language that allows us to interact within a large group, can the emergent communication within an artificial group converge to a shared, agreed language?

Learning Representations of Partial Subgraphs by Subgraph InfoMax

no code implementations 29 Sep 2021 Dongkwan Kim, Jiho Jin, Jaimeen Ahn, Alice Oh

Subgraphs are important substructures of graphs, but learning their representations has not been studied well.

Learning Bill Similarity with Annotated and Augmented Corpora of Bills

1 code implementation EMNLP 2021 Jiseon Kim, Elden Griggs, In Song Kim, Alice Oh

Despite the significance of bill-to-bill linkages for understanding the legislative process, existing approaches fail to address semantic similarities across bills, let alone reordering or paraphrasing which are prevalent in legal document writing.

Mitigating Language-Dependent Ethnic Bias in BERT

1 code implementation EMNLP 2021 Jaimeen Ahn, Alice Oh

Which of the two methods works better depends on the amount of NLP resources available for that language.

Word Alignment

Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning

1 code implementation EMNLP 2021 Seonghyeon Ye, Jiseon Kim, Alice Oh

We introduce EfficientCL, a memory-efficient continual pretraining method that applies contrastive learning with novel data augmentation and curriculum learning.

Continual Pretraining Contrastive Learning +2

Weakly Supervised Pre-Training for Multi-Hop Retriever

1 code implementation Findings (ACL) 2021 Yeon Seonwoo, Sang-Woo Lee, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh

In multi-hop QA, answering complex questions entails iterative document retrieval for finding the missing entity of the question.

Retrieval

Context-Aware Answer Extraction in Question Answering

1 code implementation EMNLP 2020 Yeon Seonwoo, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh

With experiments on reading comprehension, we show that BLANC outperforms the state-of-the-art QA models, and the performance gap increases as the number of answer text occurrences increases.

Multi-Task Learning Question Answering +1

Speaker Sensitive Response Evaluation Model

1 code implementation ACL 2020 JinYeong Bak, Alice Oh

We provide our code and the learned parameters so that they can be used for automatic evaluation of dialogue response generation models.

Response Generation

K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations

1 code implementation 8 May 2020 Cheul Young Park, Narae Cha, Soowon Kang, Auk Kim, Ahsan Habib Khandoker, Leontios Hadjileontiadis, Alice Oh, Yong Jeong, Uichin Lee

Therefore, studying emotions in the context of social interactions requires a novel dataset, and K-EmoCon is such a multimodal dataset with comprehensive annotations of continuous emotions during naturalistic conversations.

EEG Emotion Recognition

Dimensional Emotion Detection from Categorical Emotion

1 code implementation EMNLP 2021 Sungjoon Park, Jiseon Kim, Seonghyeon Ye, Jaeyeol Jeon, Hee Young Park, Alice Oh

We present a model to predict fine-grained emotions along the continuous dimensions of valence, arousal, and dominance (VAD) with a corpus with categorical emotion annotations.

Emotion Classification Sentence

Variational Hierarchical User-based Conversation Model

1 code implementation IJCNLP 2019 JinYeong Bak, Alice Oh

To overcome this limitation, we propose a new model with a stochastic variable designed to capture the speaker information and deliver it to the conversational context.

Response Generation

Additive Compositionality of Word Vectors

no code implementations WS 2019 Yeon Seonwoo, Sungjoon Park, Dongkwan Kim, Alice Oh

Additive compositionality of word embedding models has been studied from empirical and theoretical perspectives.

Sentence Sentence Similarity +1

Emergence of Collective Policies Inside Simulations with Biased Representations

no code implementations 25 Sep 2019 Jooyeon Kim, Alice Oh

We consider a setting where biases are involved when agents internalise an environment.

Conversation Model Fine-Tuning for Classifying Client Utterances in Counseling Dialogues

no code implementations NAACL 2019 Sungjoon Park, Donghyun Kim, Alice Oh

A dataset of those interactions can be used to learn to automatically classify the client utterances into categories that help counselors in diagnosing client status and predicting counseling outcome.

Language Modelling

Homogeneity-Based Transmissive Process to Model True and False News in Social Networks

1 code implementation 16 Nov 2018 Jooyeon Kim, Dongkwan Kim, Alice Oh

An overwhelming number of true and false news stories are posted and shared in social networks, and users diffuse the stories based on multiple factors.

Conversational Decision-Making Model for Predicting the King's Decision in the Annals of the Joseon Dynasty

no code implementations EMNLP 2018 JinYeong Bak, Alice Oh

Styles of leaders when they make decisions in groups vary, and the different styles affect the performance of the group.

Decision Making

Subword-level Word Vector Representations for Korean

1 code implementation ACL 2018 Sungjoon Park, Jeongmin Byun, Sion Baek, Yongseok Cho, Alice Oh

The results show that our simple method outperforms word2vec and character-level Skip-Grams on semantic and syntactic similarity and analogy tasks and contributes positively toward downstream NLP tasks such as sentiment analysis.

Document Classification Language Modelling +3

Leveraging the Crowd to Detect and Reduce the Spread of Fake News and Misinformation

1 code implementation 27 Nov 2017 Jooyeon Kim, Behzad Tabibian, Alice Oh, Bernhard Schoelkopf, Manuel Gomez-Rodriguez

Online social networking sites are experimenting with the following crowd-powered procedure to reduce the spread of fake news and misinformation: whenever a user is exposed to a story through her feed, she can flag the story as misinformation and, if the story receives enough flags, it is sent to a trusted third party for fact checking.

Fact Checking Misinformation +1

Rotated Word Vector Representations and their Interpretability

1 code implementation EMNLP 2017 Sungjoon Park, JinYeong Bak, Alice Oh

We apply several rotation algorithms to the vector representation of words to improve the interpretability.

Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora

no code implementations TACL 2017 Jooyeon Kim, Dongwoo Kim, Alice Oh

Second, it models each author's influence on citations of a paper based on the topics of the cited papers, as well as the citing papers.

Understanding Editing Behaviors in Multilingual Wikipedia

no code implementations 28 Aug 2015 Suin Kim, Sungjoon Park, Scott A. Hale, Sooyoung Kim, Jeongmin Byun, Alice Oh

We study multilingualism by collecting and analyzing a large dataset of the content written by multilingual editors of the English, German, and Spanish editions of Wikipedia.

Hierarchical Dirichlet Scaling Process

no code implementations 22 Mar 2014 Dongwoo Kim, Alice Oh

We present the hierarchical Dirichlet scaling process (HDSP), a Bayesian nonparametric mixed membership model.
