no code implementations • 18 Apr 2024 • Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim
Conversational agents built upon even the most recent large language models (LLMs) face challenges in processing ambiguous inputs, primarily due to the following two hurdles: (1) LLMs are not directly trained to handle inputs that are too ambiguous to be properly managed; (2) the degree of ambiguity in an input can vary according to the intrinsic knowledge of the LLMs, which is difficult to investigate.
1 code implementation • 27 Mar 2024 • Yejin Yoon, Jungyeon Lee, Kangsan Kim, Chanhee Park, Taeuk Kim
Task-oriented dialogue (TOD) systems are commonly designed with the presumption that each utterance represents a single intent.
no code implementations • 14 Mar 2024 • Young Hyun Yoo, Jii Cha, Changhyeon Kim, Taeuk Kim
While the introduction of contrastive learning frameworks in sentence representation learning has significantly contributed to advancements in the field, it remains unclear whether state-of-the-art sentence embeddings can capture the fine-grained semantics of sentences, particularly when conditioned on specific perspectives.
no code implementations • 21 Feb 2024 • Seong Hoon Lim, Taejun Yun, Jinhyeon Kim, Jihun Choi, Taeuk Kim
The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition.
no code implementations • 26 Oct 2023 • Taejun Yun, Jinhyeon Kim, Deokyeong Kang, Seong Hoon Lim, Jihoon Kim, Taeuk Kim
Cross-lingual transfer (XLT) is an emergent ability of multilingual language models that preserves their performance on a task to a significant extent when evaluated in languages that were not included in the fine-tuning process.
1 code implementation • 23 Oct 2023 • Hyuhng Joon Kim, Hyunsoo Cho, Sang-Woo Lee, Junyeob Kim, Choonghyun Park, Sang-goo Lee, Kang Min Yoo, Taeuk Kim
When deploying machine learning systems in the wild, it is highly desirable for them to effectively transfer prior knowledge to unfamiliar domains while also raising alarms on anomalous inputs.
no code implementations • 21 Dec 2022 • Hyunsoo Cho, Hyuhng Joon Kim, Junyeob Kim, Sang-Woo Lee, Sang-goo Lee, Kang Min Yoo, Taeuk Kim
Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning.
1 code implementation • 20 Oct 2022 • Hyunsoo Cho, Choonghyun Park, Jaewook Kang, Kang Min Yoo, Taeuk Kim, Sang-goo Lee
Out-of-distribution (OOD) detection aims to discern outliers from the intended data distribution, which is crucial to maintaining high reliability and a good user experience.
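As a rough illustration of the task itself (not the detection method proposed in this paper), the sketch below scores inputs with a generic maximum-softmax-probability baseline and flags those whose predicted confidence falls under a threshold; the classifier and the 0.5 threshold are placeholder assumptions.

```python
# Minimal OOD-scoring sketch: a generic maximum-softmax-probability baseline,
# not the detection method proposed in the paper above.
import torch
import torch.nn.functional as F

def msp_score(logits: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability per example; low values hint at OOD inputs."""
    return F.softmax(logits, dim=-1).max(dim=-1).values

def flag_ood(classifier: torch.nn.Module, inputs: torch.Tensor,
             threshold: float = 0.5) -> torch.Tensor:
    """Boolean mask marking inputs whose predicted confidence is below the threshold."""
    with torch.no_grad():
        logits = classifier(inputs)
    return msp_score(logits) < threshold
```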
no code implementations • COLING 2022 • Taeuk Kim
Constituency Parse Extraction from Pre-trained Language Models (CPE-PLM) is a recent paradigm that attempts to induce constituency parse trees relying only on the internal knowledge of pre-trained language models.
no code implementations • 16 Jun 2022 • Hyuhng Joon Kim, Hyunsoo Cho, Junyeob Kim, Taeuk Kim, Kang Min Yoo, Sang-goo Lee
Large-scale pre-trained language models (PLMs) are well-known for being capable of solving a task simply by conditioning on a few input-label pairs, dubbed demonstrations, provided in a prompt, without being explicitly tuned for the desired downstream task.
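For readers unfamiliar with the setup, here is a minimal sketch of how such demonstrations are assembled into a prompt; the sentiment template and the commented-out generate() call are illustrative assumptions, not this paper's configuration.

```python
# Minimal in-context learning sketch: input-label demonstrations are concatenated
# into a prompt and the frozen LLM is queried with no parameter updates.
def build_icl_prompt(demonstrations, query):
    """demonstrations: list of (text, label) pairs; query: the unlabeled input."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in demonstrations]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

demos = [("A moving, beautifully shot film.", "positive"),
         ("Flat characters and a dull plot.", "negative")]
prompt = build_icl_prompt(demos, "Uneven pacing, but ultimately rewarding.")
# completion = llm.generate(prompt)  # hypothetical call to a frozen language model
```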
no code implementations • SemEval (NAACL) 2022 • Youngju Joung, Taeuk Kim
We propose a unified framework that enables us to consider various aspects of contextualization at different levels to better identify the idiomaticity of multi-word expressions.
no code implementations • 25 May 2022 • Kang Min Yoo, Junyeob Kim, Hyuhng Joon Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-Woo Lee, Sang-goo Lee, Taeuk Kim
Despite the recent explosion of interest in in-context learning, the underlying mechanism and the precise impact of the quality of demonstrations remain elusive.
1 code implementation • ACL 2021 • Taeuk Kim, Kang Min Yoo, Sang-goo Lee
In this work, we propose a contrastive learning method that utilizes self-guidance for improving the quality of BERT sentence representations.
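To make the contrastive ingredient concrete, here is a generic NT-Xent loss over two embedding views of the same batch of sentences; producing those views via self-guidance is the paper's contribution and is not reproduced here.

```python
# Generic NT-Xent contrastive loss over two embedding views of the same sentences;
# how the views are produced (the paper's self-guidance scheme) is not shown.
import torch
import torch.nn.functional as F

def nt_xent(view_a: torch.Tensor, view_b: torch.Tensor, temperature: float = 0.05):
    """view_a, view_b: (batch, dim) embeddings; row i of each view is a positive pair."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature                    # all-pairs cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # i-th row should match i-th column
    return F.cross_entropy(logits, targets)
```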
no code implementations • AACL 2020 • Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller
Here, we propose a novel fully unsupervised parsing approach that extracts constituency trees from PLM attention heads.
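One common recipe in this line of work, sketched below under illustrative assumptions: derive a "syntactic distance" between adjacent words from their attention distributions (Jensen-Shannon divergence here, which may differ from the paper's choice) and recursively split the sentence at the largest distance.

```python
# Sketch of attention-based tree induction: compute distances between adjacent
# words from one head's attention rows, then split recursively at the maximum.
# The JS-divergence distance is an illustrative choice, not necessarily the paper's.
import numpy as np

def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + 1e-12) / (b + 1e-12))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def induce_tree(words, dists):
    """words: tokens; dists: adjacent-word distances, len(dists) == len(words) - 1."""
    if len(words) <= 1:
        return words[0] if words else None
    split = int(np.argmax(dists)) + 1                   # break where neighbours diverge most
    return (induce_tree(words[:split], dists[:split - 1]),
            induce_tree(words[split:], dists[split:]))

# attn: (n, n) attention matrix from a single head, one row per word
# dists = [js_divergence(attn[i], attn[i + 1]) for i in range(len(words) - 1)]
# tree = induce_tree(words, dists)
```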
no code implementations • SEMEVAL 2020 • Jaeyoul Shin, Taeuk Kim, Sang-goo Lee
We propose a novel method that enables us to determine words that deserve to be emphasized from written text in visual media, relying only on the information from the self-attention distributions of pre-trained language models (PLMs).
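As a rough sketch of working directly with PLM self-attention (the averaging and the top-k rule below are assumptions, not this paper's exact scoring), one can rank tokens by the attention mass they receive:

```python
# Sketch: rank tokens by the attention they receive, averaged over layers and heads.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

text = "act now and save the planet"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs).attentions   # tuple of (1, heads, seq, seq), one per layer

avg = torch.stack(attentions).mean(dim=(0, 2))          # average over layers and heads
received = avg.sum(dim=1).squeeze(0)                    # total attention mass each token receives
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
emphasized = [tokens[int(i)] for i in received.topk(3).indices]  # top-3 emphasis candidates
```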
1 code implementation • Findings (EMNLP) 2021 • Taeuk Kim, Bowen Li, Sang-goo Lee
As it has been revealed that pre-trained language models (PLMs) are to some extent capable of recognizing syntactic concepts in natural language, much effort has been made to develop a method for extracting complete (binary) parses from PLMs without training separate parsers.
1 code implementation • ICLR 2020 • Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee
With the recent success and popularity of pre-trained language models (LMs) in natural language processing, there has been a rise in efforts to understand their inner workings.
no code implementations • WS 2019 • Sanghwan Bae, Taeuk Kim, Jihoon Kim, Sang-goo Lee
As an attempt to combine extractive and abstractive summarization, Sentence Rewriting models adopt the strategy of extracting salient sentences from a document first and then paraphrasing the selected ones to generate a summary.
Ranked #5 on Extractive Text Summarization on CNN / Daily Mail
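A minimal sketch of that extract-then-rewrite strategy, with placeholder scorer and rewriter components that are not the models used in the paper:

```python
# Extract-then-rewrite sketch: keep the k most salient sentences (in original order),
# then paraphrase each with a rewriting model. `scorer` and `rewriter` are placeholders.
def summarize(sentences, scorer, rewriter, k=3):
    ranked = sorted(range(len(sentences)), key=lambda i: scorer(sentences[i]), reverse=True)
    selected = sorted(ranked[:k])                       # restore document order
    return " ".join(rewriter(sentences[i]) for i in selected)
```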
3 code implementations • IJCNLP 2019 • Kang Min Yoo, Taeuk Kim, Sang-goo Lee
We propose a simple yet effective approach for improving Korean word representations using additional linguistic annotation (i.e., Hanja).
no code implementations • ACL 2019 • Jihun Choi, Taeuk Kim, Sang-goo Lee
We present a latent variable model for predicting the relationship between a pair of text sequences.
2 code implementations • 7 Sep 2018 • Taeuk Kim, Jihun Choi, Daniel Edmiston, Sanghwan Bae, Sang-goo Lee
Most existing recursive neural network (RvNN) architectures utilize only the structure of parse trees, ignoring syntactic tags which are provided as by-products of parsing.
no code implementations • 7 Sep 2018 • Jihun Choi, Taeuk Kim, Sang-goo Lee
We propose a method of stacking multiple long short-term memory (LSTM) layers for modeling sentences.
Ranked #10 on Sentiment Analysis on SST-5 Fine-grained classification
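A minimal stacked-LSTM sentence encoder for reference; this is a plain multi-layer LSTM, not the specific cell-connection scheme the paper proposes.

```python
# Plain stacked-LSTM sentence encoder (generic baseline, not the paper's variant).
import torch
import torch.nn as nn

class StackedLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=300, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)

    def forward(self, token_ids):                       # token_ids: (batch, seq_len)
        outputs, (h_n, _) = self.lstm(self.embed(token_ids))
        return h_n[-1]                                  # final hidden state of the top layer
```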
no code implementations • SEMEVAL 2018 • Jihun Choi, Taeuk Kim, Sang-goo Lee
When we build a neural network model predicting the relationship between two sentences, the most general and intuitive approach is to use a Siamese architecture, where the sentence vectors obtained from a shared encoder are given as input to a classifier.
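A minimal sketch of that Siamese setup; the [u; v; |u - v|; u * v] feature combination is a common convention and may differ from the features used in the paper.

```python
# Siamese sketch: one shared encoder embeds both sentences, and a small MLP
# classifies a combination of the two vectors.
import torch
import torch.nn as nn

class SiameseClassifier(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder                          # shared between both inputs
        self.mlp = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_classes))

    def forward(self, sent_a, sent_b):
        u, v = self.encoder(sent_a), self.encoder(sent_b)
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)
        return self.mlp(features)
```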
1 code implementation • SEMEVAL 2018 • Taeuk Kim, Jihun Choi, Sang-goo Lee
We present a novel neural architecture for the Argument Reasoning Comprehension task of SemEval 2018.
no code implementations • WS 2017 • Sanghyuk Choi, Taeuk Kim, Jinseok Seol, Sang-goo Lee
Word embedding has become a fundamental component of many NLP tasks such as named entity recognition and machine translation.