1 code implementation • 10 Jun 2025 • Minhae Oh, Jeonghye Kim, Nakyung Lee, Donggeon Seo, Taeuk Kim, Jungwoo Lee
Our analysis shows that, unlike other baselines, RAISE retrieves documents that are not only similar in terms of domain knowledge but also more logically relevant.
no code implementations • 22 May 2025 • Jisu Kim, Youngwoo Shin, Uiji Hwang, Jihun Choi, Richeng Xuan, Taeuk Kim
Idioms have long posed a challenge due to their unique linguistic properties, which set them apart from other common expressions.
no code implementations • 22 May 2025 • Hwiyeong Lee, Uiji Hwang, Hyelim Lim, Taeuk Kim
Large language models often retain unintended content, prompting growing interest in knowledge unlearning.
1 code implementation • 21 Apr 2025 • Juyeon Kim, Geon Lee, Taeuk Kim, Kijung Shin
However, most existing MEL methods overlook the rich structural information available in the form of knowledge-graph (KG) triples.
no code implementations • 19 Feb 2025 • Youna Kim, Hyuhng Joon Kim, Minjoon Choi, Sungmin Cho, Hyunsoo Cho, Sang-goo Lee, Taeuk Kim
Language models often benefit from external knowledge beyond parametric knowledge.
no code implementations • 17 Dec 2024 • Hyuhng Joon Kim, Youna Kim, Sang-goo Lee, Taeuk Kim
Large Language Models (LLMs) demonstrate exceptional performance across diverse tasks by leveraging pre-trained (i.e., parametric) and external (i.e., contextual) knowledge.
no code implementations • 17 Dec 2024 • Seunghee Kim, Changhyeon Kim, Taeuk Kim
Real-world decision-making often requires integrating and reasoning over information from multiple modalities.
no code implementations • 2 Aug 2024 • Youna Kim, Hyuhng Joon Kim, Cheonbok Park, Choonghyun Park, Hyunsoo Cho, Junyeob Kim, Kang Min Yoo, Sang-goo Lee, Taeuk Kim
When using large language models (LLMs) in knowledge-intensive tasks, such as open-domain question answering, external context can bridge the gap between external knowledge and the LLMs' parametric knowledge.
1 code implementation • 17 Jul 2024 • Youmin Ko, Hyemin Yang, Taeuk Kim, Hyunjoon Kim
To this end, we propose a Subgraph-Aware Training framework for KGC (SATKGC) with two ideas: (i) subgraph-aware mini-batching to encourage hard negative sampling and to mitigate an imbalance in the frequency of entity occurrences during training, and (ii) a new contrastive learning scheme that focuses more on harder in-batch negative triples and harder positive triples with respect to the structural properties of the knowledge graph.
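The contrastive component can be pictured with a small PyTorch sketch. This is not the authors' SATKGC implementation: the temperature, the `alpha` scaling knob, and the use of plain in-batch negatives are illustrative assumptions, intended only to show how harder (more similar) negatives can be made to dominate the loss.

```python
import torch
import torch.nn.functional as F

def hard_negative_infonce(query, key, temperature=0.05, alpha=0.5):
    """In-batch InfoNCE where harder (more similar) negatives weigh more.

    query, key: (batch, dim) embeddings; key[i] is the positive for query[i],
    and every other key in the batch serves as a negative.
    alpha is an illustrative knob: negatives' logits are scaled up, so the
    softmax concentrates on the hardest in-batch negatives.
    """
    q = F.normalize(query, dim=-1)
    k = F.normalize(key, dim=-1)
    sim = q @ k.t() / temperature                          # (batch, batch) cosine logits
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    logits = torch.where(eye, sim, (1.0 + alpha) * sim)    # up-weight negative logits
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(logits, labels)

# Dummy usage: embeddings for 8 positive pairs of dimension 128.
loss = hard_negative_infonce(torch.randn(8, 128), torch.randn(8, 128))
```

Because the scaled logits grow fastest for the most similar negatives, the softmax concentrates the gradient on the hardest in-batch triples.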
1 code implementation • 16 Jul 2024 • Deokyeong Kang, Ki Jung Seo, Taeuk Kim
Modular programming, which aims to construct the final program by integrating smaller, independent building blocks, has been regarded as a desirable practice in software development.
no code implementations • 24 Jun 2024 • Choonghyun Park, Hyuhng Joon Kim, Junyeob Kim, Youna Kim, Taeuk Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-goo Lee, Kang Min Yoo
Based on these findings, we further train the classifier on the dataset augmented with the FAILOpt prompt.
1 code implementation • 18 Apr 2024 • Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim
To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity).
1 code implementation • 27 Mar 2024 • Yejin Yoon, Jungyeon Lee, Kangsan Kim, Chanhee Park, Taeuk Kim
Task-oriented dialogue (TOD) systems are commonly designed with the presumption that each utterance represents a single intent.
1 code implementation • 14 Mar 2024 • Young Hyun Yoo, Jii Cha, Changhyeon Kim, Taeuk Kim
While the introduction of contrastive learning frameworks in sentence representation learning has significantly contributed to advancements in the field, it remains unclear whether state-of-the-art sentence embeddings can capture the fine-grained semantics of sentences, particularly when conditioned on specific perspectives.
no code implementations • 21 Feb 2024 • Seong Hoon Lim, Taejun Yun, Jinhyeon Kim, Jihun Choi, Taeuk Kim
The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition.
no code implementations • 26 Oct 2023 • Taejun Yun, Jinhyeon Kim, Deokyeong Kang, Seong Hoon Lim, Jihoon Kim, Taeuk Kim
Cross-lingual transfer (XLT) is an emergent ability of multilingual language models that preserves their performance on a task to a significant extent when evaluated in languages that were not included in the fine-tuning process.
1 code implementation • 23 Oct 2023 • Hyuhng Joon Kim, Hyunsoo Cho, Sang-Woo Lee, Junyeob Kim, Choonghyun Park, Sang-goo Lee, Kang Min Yoo, Taeuk Kim
When deploying machine learning systems in the wild, it is highly desirable for them to effectively transfer prior knowledge to unfamiliar domains while also raising alarms on anomalous inputs.
no code implementations • 21 Dec 2022 • Hyunsoo Cho, Hyuhng Joon Kim, Junyeob Kim, Sang-Woo Lee, Sang-goo Lee, Kang Min Yoo, Taeuk Kim
Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning.
1 code implementation • 20 Oct 2022 • Hyunsoo Cho, Choonghyun Park, Jaewook Kang, Kang Min Yoo, Taeuk Kim, Sang-goo Lee
Out-of-distribution (OOD) detection aims to discern outliers from the intended data distribution, which is crucial to maintaining high reliability and a good user experience.
no code implementations • COLING 2022 • Taeuk Kim
Constituency Parse Extraction from Pre-trained Language Models (CPE-PLM) is a recent paradigm that attempts to induce constituency parse trees relying only on the internal knowledge of pre-trained language models.
no code implementations • 16 Jun 2022 • Hyuhng Joon Kim, Hyunsoo Cho, Junyeob Kim, Taeuk Kim, Kang Min Yoo, Sang-goo Lee
Large-scale pre-trained language models (PLMs) are well known for being able to solve a task simply by conditioning on a prompt containing a few input-label pairs, dubbed demonstrations, without being explicitly tuned for the desired downstream task.
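To make the setup concrete, here is a minimal sketch of how such a demonstration-conditioned prompt might be assembled. The sentiment-classification template, the label names, and the helper `build_icl_prompt` are illustrative assumptions, not the paper's exact format.

```python
def build_icl_prompt(demonstrations, query,
                     instruction="Classify the sentiment of each review."):
    """Assemble a few-shot in-context learning prompt.

    demonstrations: list of (input_text, label) pairs shown to the model verbatim.
    query: the test input the model should label by continuing the prompt.
    The template is illustrative; real setups vary in wording, ordering, and labels.
    """
    lines = [instruction, ""]
    for text, label in demonstrations:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")   # the LM completes the label
    return "\n".join(lines)


demos = [("A moving, beautifully shot film.", "positive"),
         ("Two hours of my life I will never get back.", "negative")]
print(build_icl_prompt(demos, "The plot was thin but the acting carried it."))
```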
no code implementations • SemEval (NAACL) 2022 • Youngju Joung, Taeuk Kim
We propose a unified framework that enables us to consider various aspects of contextualization at different levels to better identify the idiomaticity of multi-word expressions.
no code implementations • 25 May 2022 • Kang Min Yoo, Junyeob Kim, Hyuhng Joon Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-Woo Lee, Sang-goo Lee, Taeuk Kim
Despite the recent explosion of interest in in-context learning, the underlying mechanism and the precise impact of the quality of demonstrations remain elusive.
1 code implementation • ACL 2021 • Taeuk Kim, Kang Min Yoo, Sang-goo Lee
In this work, we propose a contrastive learning method that utilizes self-guidance for improving the quality of BERT sentence representations.
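A rough sketch of the self-guided idea follows, under several assumptions: the pooling over the frozen copy's hidden layers, the temperature, and the plain in-batch InfoNCE objective are illustrative choices rather than the paper's exact training recipe. A frozen copy of BERT supplies a second view of each sentence, the copy being fine-tuned is pulled toward that view, and other sentences in the batch act as negatives.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert_tuned = AutoModel.from_pretrained("bert-base-uncased")   # copy being fine-tuned
bert_fixed = AutoModel.from_pretrained("bert-base-uncased")   # frozen copy as guide
for p in bert_fixed.parameters():
    p.requires_grad = False

def self_guided_loss(sentences, temperature=0.05):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    # View 1: [CLS] embedding from the tuned copy.
    tuned_cls = bert_tuned(**batch).last_hidden_state[:, 0]
    # View 2: [CLS] position averaged over the frozen copy's layers (illustrative pooling).
    with torch.no_grad():
        hidden = bert_fixed(**batch, output_hidden_states=True).hidden_states
        fixed_view = torch.stack(hidden[1:], dim=0).mean(dim=0)[:, 0]
    q = F.normalize(tuned_cls, dim=-1)
    k = F.normalize(fixed_view, dim=-1)
    sim = q @ k.t() / temperature        # other sentences in the batch act as negatives
    return F.cross_entropy(sim, torch.arange(sim.size(0)))

loss = self_guided_loss(["A cat sat on the mat.", "Stock prices fell sharply."])
```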
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller
Here, we propose a novel fully unsupervised parsing approach that extracts constituency trees from PLM attention heads.
no code implementations • SEMEVAL 2020 • Jaeyoul Shin, Taeuk Kim, Sang-goo Lee
We propose a novel method that determines which words in written text for visual media deserve to be emphasized, relying only on the information in the self-attention distributions of pre-trained language models (PLMs).
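The general recipe can be sketched as follows; the scoring rule (attention received by each token, averaged over layers and heads) is an illustrative assumption rather than the paper's exact ranking method, and subword merging is ignored for brevity.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

def emphasis_candidates(text, top_k=3):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        attentions = model(**inputs).attentions      # tuple of (1, heads, seq, seq) per layer
    att = torch.stack(attentions).mean(dim=(0, 2))   # average over layers and heads -> (1, seq, seq)
    received = att.sum(dim=1).squeeze(0)             # total attention each token receives
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    ranked = sorted(zip(tokens, received.tolist()), key=lambda x: -x[1])
    return [tok for tok, _ in ranked if tok not in ("[CLS]", "[SEP]")][:top_k]

print(emphasis_candidates("Huge clearance sale this weekend only"))
```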
1 code implementation • Findings (EMNLP) 2021 • Taeuk Kim, Bowen Li, Sang-goo Lee
As it has been unveiled that pre-trained language models (PLMs) are to some extent capable of recognizing syntactic concepts in natural language, much effort has been made to develop a method for extracting complete (binary) parses from PLMs without training separate parsers.
1 code implementation • ICLR 2020 • Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee
With the recent success and popularity of pre-trained language models (LMs) in natural language processing, there has been a rise in efforts to understand their inner workings.
no code implementations • WS 2019 • Sanghwan Bae, Taeuk Kim, Jihoon Kim, Sang-goo Lee
As an attempt to combine extractive and abstractive summarization, Sentence Rewriting models adopt the strategy of first extracting salient sentences from a document and then paraphrasing the selected ones to generate a summary (a schematic sketch of this pipeline follows the leaderboard note below).
Ranked #6 on Extractive Text Summarization on CNN / Daily Mail
Tasks: Abstractive Text Summarization, Extractive Text Summarization, +4
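Below is a schematic sketch of that extract-then-rewrite strategy. `salience` and `paraphrase` are hypothetical placeholders standing in for a trained extractor and an abstractive rewriter; neither is a component from the paper.

```python
def salience(sentence, sentences):
    # Hypothetical scorer: favour longer sentences that appear earlier in the document.
    return len(sentence.split()) / (1 + sentences.index(sentence))

def paraphrase(sentence):
    # Hypothetical abstractive rewriter; a real system would use a trained generator.
    return sentence

def extract_then_rewrite(sentences, k=3):
    # Step 1: extract the k most salient sentences; step 2: rewrite each one.
    selected = sorted(sentences, key=lambda s: salience(s, sentences), reverse=True)[:k]
    ordered = [s for s in sentences if s in selected]   # keep the original order
    return " ".join(paraphrase(s) for s in ordered)

doc = ["The company reported record profits this quarter.",
       "Analysts were surprised by the margin growth.",
       "The CEO thanked employees in a short statement.",
       "Shares rose five percent after the announcement."]
print(extract_then_rewrite(doc, k=2))
```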
3 code implementations • IJCNLP 2019 • Kang Min Yoo, Taeuk Kim, Sang-goo Lee
We propose a simple yet effective approach for improving Korean word representations using additional linguistic annotation (i.e., Hanja).
no code implementations • ACL 2019 • Jihun Choi, Taeuk Kim, Sang-goo Lee
We present a latent variable model for predicting the relationship between a pair of text sequences.
no code implementations • 7 Sep 2018 • Jihun Choi, Taeuk Kim, Sang-goo Lee
We propose a method of stacking multiple long short-term memory (LSTM) layers for modeling sentences (a generic stacked-LSTM sketch follows the leaderboard note below).
Ranked #11 on Sentiment Analysis on SST-5 Fine-grained classification
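For reference, a generic stacked-LSTM sentence encoder looks roughly like the sketch below. This is a plain PyTorch baseline with arbitrary layer sizes, not the specific stacking mechanism proposed in the paper.

```python
import torch
import torch.nn as nn

class StackedLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=300,
                 num_layers=3, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, dropout=0.2)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)              # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(x)             # h_n: (num_layers, batch, hidden_dim)
        sentence_vec = h_n[-1]                 # final hidden state of the top layer
        return self.classifier(sentence_vec)   # e.g. 5-way logits for SST-5

encoder = StackedLSTMEncoder(vocab_size=30000)
logits = encoder(torch.randint(0, 30000, (4, 12)))   # dummy batch of 4 sentences
```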
2 code implementations • 7 Sep 2018 • Taeuk Kim, Jihun Choi, Daniel Edmiston, Sanghwan Bae, Sang-goo Lee
Most existing recursive neural network (RvNN) architectures utilize only the structure of parse trees, ignoring syntactic tags which are provided as by-products of parsing.
1 code implementation • SEMEVAL 2018 • Taeuk Kim, Jihun Choi, Sang-goo Lee
We present a novel neural architecture for the Argument Reasoning Comprehension task of SemEval 2018.
no code implementations • SEMEVAL 2018 • Jihun Choi, Taeuk Kim, Sang-goo Lee
When we build a neural network model predicting the relationship between two sentences, the most general and intuitive approach is to use a Siamese architecture, where the sentence vectors obtained from a shared encoder are given as input to a classifier.
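A minimal sketch of that Siamese setup: one shared encoder embeds both sentences, and a classifier consumes a combination of the two sentence vectors. The LSTM encoder and the [u; v; |u-v|; u*v] feature combination are common illustrative choices, not necessarily the paper's.

```python
import torch
import torch.nn as nn

class SiamesePairClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=300, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # shared encoder
        self.classifier = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def encode(self, token_ids):
        _, (h_n, _) = self.encoder(self.embed(token_ids))
        return h_n[-1]                                    # (batch, hidden_dim)

    def forward(self, sent_a, sent_b):
        u, v = self.encode(sent_a), self.encode(sent_b)   # same weights for both sentences
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)
        return self.classifier(features)

model = SiamesePairClassifier(vocab_size=30000)
logits = model(torch.randint(0, 30000, (2, 10)), torch.randint(0, 30000, (2, 12)))
```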
no code implementations • WS 2017 • Sanghyuk Choi, Taeuk Kim, Jinseok Seol, Sang-goo Lee
Word embedding has become a fundamental component of many NLP tasks such as named entity recognition and machine translation.