Search Results for author: Alice Oh

Found 44 papers, 25 papers with code

Virtual Knowledge Graph Construction for Zero-Shot Domain-Specific Document Retrieval

1 code implementation COLING 2022 Yeon Seonwoo, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Alice Oh

We conduct three experiments: 1) domain-specific document retrieval, 2) a comparison of our virtual knowledge graph construction method with previous approaches, and 3) an ablation study on each component of our virtual knowledge graph.

Domain Adaptation graph construction +2

Suicidal Risk Detection for Military Personnel

no code implementations EMNLP 2020 Sungjoon Park, Kiwoong Park, Jaimeen Ahn, Alice Oh

We analyze social media for detecting the suicidal risk of military personnel, which is especially crucial for countries with compulsory military service such as the Republic of Korea.

Ethics

Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT

no code implementations NAACL (GeBNLP) 2022 Jaimeen Ahn, Hwaran Lee, JinHwa Kim, Alice Oh

Knowledge distillation is widely used to transfer the language understanding of a large model to a smaller model. However, after knowledge distillation, it was found that the smaller model is more biased by gender compared to the source large model. This paper studies what causes gender bias to increase after the knowledge distillation process. Moreover, we suggest applying a variant of the mixup on knowledge distillation, which is used to increase generalizability during the distillation process, not for augmentation. By doing so, we can significantly reduce the gender bias amplification after knowledge distillation. We also conduct an experiment on the GLUE benchmark to demonstrate that even if the mixup is applied, it does not have a significant adverse effect on the model’s performance.

Knowledge Distillation

Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation

1 code implementation 25 Oct 2022 Soyoung Yoon, Sungjoon Park, Gyuwan Kim, Junhee Cho, Kihyo Park, Gyutae Kim, Minjoon Seo, Alice Oh

We show that the model trained with our datasets significantly outperforms the currently used statistical Korean GEC system (Hanspell) on a wider range of error types, demonstrating the diversity and usefulness of the datasets.

Grammatical Error Correction

IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension

1 code implementation 25 Oct 2022 Rifki Afina Putri, Alice Oh

Machine Reading Comprehension (MRC) has become one of the essential tasks in Natural Language Understanding (NLU) as it is often included in several NLU benchmarks (Liang et al., 2020; Wilie et al., 2020).

Machine Reading Comprehension Natural Language Understanding +1

Rethinking Annotation: Can Language Learners Contribute?

no code implementations 13 Oct 2022 Haneul Yoo, Rifki Afina Putri, Changyoon Lee, Youngin Lee, So-Yeon Ahn, Dongyeop Kang, Alice Oh

The implication of our findings is that broadening the annotation task to include language learners can open up the opportunity to build benchmark datasets for languages for which it is difficult to recruit native speakers.

Machine Reading Comprehension named-entity-recognition +4

HUE: Pretrained Model and Dataset for Understanding Hanja Documents of Ancient Korea

1 code implementation Findings (NAACL) 2022 Haneul Yoo, Jiho Jin, Juhee Son, JinYeong Bak, Kyunghyun Cho, Alice Oh

Historical records in Korea before the 20th century were primarily written in Hanja, an extinct language based on Chinese characters and not understood by modern Korean or Chinese speakers.

named-entity-recognition Named Entity Recognition +3

Ranking-Enhanced Unsupervised Sentence Representation Learning

1 code implementation 9 Sep 2022 Yeon Seonwoo, Guoyin Wang, Changmin Seo, Sajal Choudhary, Jiwei Li, Xiang Li, Puyang Xu, Sunghyun Park, Alice Oh

In this work, we show that the semantic meaning of a sentence is also determined by nearest-neighbor sentences that are similar to the input sentence.

Contrastive Learning Data Augmentation +4

Models and Benchmarks for Representation Learning of Partially Observed Subgraphs

1 code implementation 1 Sep 2022 Dongkwan Kim, Jiho Jin, Jaimeen Ahn, Alice Oh

Subgraphs are rich substructures in graphs, and their nodes and edges can be partially observed in real-world tasks.

Representation Learning

KOLD: Korean Offensive Language Dataset

1 code implementation 23 May 2022 Younghoon Jeong, Juhyun Oh, Jaimeen Ahn, Jongwon Lee, Jihyung Moon, Sungjoon Park, Alice Oh

Recent directions for offensive language detection are hierarchical modeling, identifying the type and the target of offensive language, and interpretability with offensive span annotation and prediction.

Classification

Translating Hanja Historical Documents to Contemporary Korean and English

no code implementations 20 May 2022 Juhee Son, Jiho Jin, Haneul Yoo, JinYeong Bak, Kyunghyun Cho, Alice Oh

Built on top of multilingual neural machine translation, H2KE learns to translate a historical document written in Hanja, from both a full dataset of outdated Korean translation and a small dataset of more recently translated contemporary Korean and English.

Machine Translation Translation

Two-Step Question Retrieval for Open-Domain QA

1 code implementation Findings (ACL) 2022 Yeon Seonwoo, Juhee Son, Jiho Jin, Sang-Woo Lee, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh

These models have shown a significant increase in inference speed, but at the cost of lower QA performance compared to the retriever-reader models.

Retrieval Vocal Bursts Valence Prediction

Emergent Communication under Varying Sizes and Connectivities

no code implementations NeurIPS 2021 Jooyeon Kim, Alice Oh

Just as we humans have succeeded in creating a shared language that allows us to interact within a large group, can the emergent communication within an artificial group converge to a shared, agreed language?

Learning Representations of Partial Subgraphs by Subgraph InfoMax

no code implementations 29 Sep 2021 Dongkwan Kim, Jiho Jin, Jaimeen Ahn, Alice Oh

Subgraphs are important substructures of graphs, but learning their representations has not been studied well.

Learning Bill Similarity with Annotated and Augmented Corpora of Bills

1 code implementation EMNLP 2021 Jiseon Kim, Elden Griggs, In Song Kim, Alice Oh

Despite the significance of bill-to-bill linkages for understanding the legislative process, existing approaches fail to address semantic similarities across bills, let alone reordering or paraphrasing which are prevalent in legal document writing.

Mitigating Language-Dependent Ethnic Bias in BERT

1 code implementation EMNLP 2021 Jaimeen Ahn, Alice Oh

Which of the two methods works better depends on the amount of NLP resources available for that language.

Word Alignment

Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning

1 code implementation EMNLP 2021 Seonghyeon Ye, Jiseon Kim, Alice Oh

We introduce EfficientCL, a memory-efficient continual pretraining method that applies contrastive learning with novel data augmentation and curriculum learning.

Continual Pretraining Contrastive Learning +1

Weakly Supervised Pre-Training for Multi-Hop Retriever

1 code implementation Findings (ACL) 2021 Yeon Seonwoo, Sang-Woo Lee, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh

In multi-hop QA, answering complex questions entails iterative document retrieval for finding the missing entity of the question.

Retrieval

Context-Aware Answer Extraction in Question Answering

1 code implementation EMNLP 2020 Yeon Seonwoo, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh

With experiments on reading comprehension, we show that BLANC outperforms the state-of-the-art QA models, and the performance gap increases as the number of answer text occurrences increases.

Multi-Task Learning Question Answering +1

Speaker Sensitive Response Evaluation Model

1 code implementation ACL 2020 JinYeong Bak, Alice Oh

We provide our code and the learned parameters so that they can be used for automatic evaluation of dialogue response generation models.

Response Generation

K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations

1 code implementation 8 May 2020 Cheul Young Park, Narae Cha, Soowon Kang, Auk Kim, Ahsan Habib Khandoker, Leontios Hadjileontiadis, Alice Oh, Yong Jeong, Uichin Lee

Therefore, studying emotions in the context of social interactions requires a novel dataset, and K-EmoCon is such a multimodal dataset with comprehensive annotations of continuous emotions during naturalistic conversations.

Electroencephalogram (EEG) Emotion Recognition

Dimensional Emotion Detection from Categorical Emotion

1 code implementation EMNLP 2021 Sungjoon Park, Jiseon Kim, Seonghyeon Ye, Jaeyeol Jeon, Hee Young Park, Alice Oh

We present a model to predict fine-grained emotions along the continuous dimensions of valence, arousal, and dominance (VAD) with a corpus with categorical emotion annotations.

Emotion Classification

Variational Hierarchical User-based Conversation Model

1 code implementation IJCNLP 2019 JinYeong Bak, Alice Oh

To overcome this limitation, we propose a new model with a stochastic variable designed to capture the speaker information and deliver it to the conversational context.

Response Generation

Additive Compositionality of Word Vectors

no code implementations WS 2019 Yeon Seonwoo, Sungjoon Park, Dongkwan Kim, Alice Oh

Additive compositionality of word embedding models has been studied from empirical and theoretical perspectives.

Sentence Similarity Word Similarity

Emergence of Collective Policies Inside Simulations with Biased Representations

no code implementations 25 Sep 2019 Jooyeon Kim, Alice Oh

We consider a setting where biases are involved when agents internalise an environment.

Conversation Model Fine-Tuning for Classifying Client Utterances in Counseling Dialogues

no code implementations NAACL 2019 Sungjoon Park, Donghyun Kim, Alice Oh

A dataset of those interactions can be used to learn to automatically classify the client utterances into categories that help counselors in diagnosing client status and predicting counseling outcome.

Language Modelling

Homogeneity-Based Transmissive Process to Model True and False News in Social Networks

1 code implementation 16 Nov 2018 Jooyeon Kim, Dongkwan Kim, Alice Oh

An overwhelming number of true and false news stories are posted and shared in social networks, and users diffuse the stories based on multiple factors.

Conversational Decision-Making Model for Predicting the King's Decision in the Annals of the Joseon Dynasty

no code implementations EMNLP 2018 JinYeong Bak, Alice Oh

Styles of leaders when they make decisions in groups vary, and the different styles affect the performance of the group.

Decision Making

Subword-level Word Vector Representations for Korean

1 code implementation ACL 2018 Sungjoon Park, Jeongmin Byun, Sion Baek, Yongseok Cho, Alice Oh

The results show that our simple method outperforms word2vec and character-level Skip-Grams on semantic and syntactic similarity and analogy tasks and contributes positively toward downstream NLP tasks such as sentiment analysis.

Document Classification Language Modelling +3

Leveraging the Crowd to Detect and Reduce the Spread of Fake News and Misinformation

1 code implementation 27 Nov 2017 Jooyeon Kim, Behzad Tabibian, Alice Oh, Bernhard Schoelkopf, Manuel Gomez-Rodriguez

Online social networking sites are experimenting with the following crowd-powered procedure to reduce the spread of fake news and misinformation: whenever a user is exposed to a story through her feed, she can flag the story as misinformation and, if the story receives enough flags, it is sent to a trusted third party for fact checking.

Fact Checking Misinformation +1

Rotated Word Vector Representations and their Interpretability

1 code implementation EMNLP 2017 Sungjoon Park, JinYeong Bak, Alice Oh

We apply several rotation algorithms to the vector representation of words to improve the interpretability.

Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora

no code implementations TACL 2017 Jooyeon Kim, Dongwoo Kim, Alice Oh

Second, it models each author's influence on citations of a paper based on the topics of the cited papers, as well as the citing papers.

Understanding Editing Behaviors in Multilingual Wikipedia

no code implementations 28 Aug 2015 Suin Kim, Sungjoon Park, Scott A. Hale, Sooyoung Kim, Jeongmin Byun, Alice Oh

We study multilingualism by collecting and analyzing a large dataset of the content written by multilingual editors of the English, German, and Spanish editions of Wikipedia.

Hierarchical Dirichlet Scaling Process

no code implementations 22 Mar 2014 Dongwoo Kim, Alice Oh

We present the hierarchical Dirichlet scaling process (HDSP), a Bayesian nonparametric mixed membership model.
