Search Results for author: Taeuk Kim

Found 37 papers, 16 papers with code

RAISE: Enhancing Scientific Reasoning in LLMs via Step-by-Step Retrieval

1 code implementation 10 Jun 2025 Minhae Oh, Jeonghye Kim, Nakyung Lee, Donggeon Seo, Taeuk Kim, Jungwoo Lee

Our analysis shows that, unlike other baselines, RAISE retrieves documents that are not only similar in terms of domain knowledge but also more logically relevant.

Problem Decomposition Retrieval

Memorization or Reasoning? Exploring the Idiom Understanding of LLMs

no code implementations 22 May 2025 Jisu Kim, Youngwoo Shin, Uiji Hwang, Jihun Choi, Richeng Xuan, Taeuk Kim

Idioms have long posed a challenge due to their unique linguistic properties, which set them apart from other common expressions.

Machine Translation Memorization +1

KGMEL: Knowledge Graph-Enhanced Multimodal Entity Linking

1 code implementation 21 Apr 2025 Juyeon Kim, Geon Lee, Taeuk Kim, Kijung Shin

However, most existing MEL methods overlook the rich structural information available in the form of knowledge-graph (KG) triples.

Entity Linking Knowledge Graphs +1

When to Speak, When to Abstain: Contrastive Decoding with Abstention

no code implementations 17 Dec 2024 Hyuhng Joon Kim, Youna Kim, Sang-goo Lee, Taeuk Kim

Large Language Models (LLMs) demonstrate exceptional performance across diverse tasks by leveraging pre-trained (i.e., parametric) and external (i.e., contextual) knowledge.

Hallucination Question Answering
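
For readers unfamiliar with the generic contrastive-decoding idea behind this line of work, here is a minimal, hypothetical sketch that contrasts next-token logits computed with and without a retrieved context. It illustrates only the general technique, not the paper's abstention mechanism; the model name, example texts, and `alpha` value are placeholders.

```python
# Generic contrastive decoding sketch (illustrative only, not the paper's method):
# amplify what the external context contributes by contrasting logits
# computed with vs. without that context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

question = "Q: Who wrote The Old Man and the Sea? A:"
context = "Ernest Hemingway wrote The Old Man and the Sea in 1951."

with torch.no_grad():
    with_ctx = model(**tok(context + " " + question, return_tensors="pt")).logits[0, -1]
    no_ctx = model(**tok(question, return_tensors="pt")).logits[0, -1]

alpha = 1.0  # assumed contrast strength (hyperparameter)
contrastive_logits = with_ctx + alpha * (with_ctx - no_ctx)
print(tok.decode(contrastive_logits.argmax().item()))
```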

FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning

no code implementations 17 Dec 2024 Seunghee Kim, Changhyeon Kim, Taeuk Kim

Real-world decision-making often requires integrating and reasoning over information from multiple modalities.

Information Retrieval

Adaptive Contrastive Decoding in Retrieval-Augmented Generation for Handling Noisy Contexts

no code implementations 2 Aug 2024 Youna Kim, Hyuhng Joon Kim, Cheonbok Park, Choonghyun Park, Hyunsoo Cho, Junyeob Kim, Kang Min Yoo, Sang-goo Lee, Taeuk Kim

When using large language models (LLMs) in knowledge-intensive tasks, such as open-domain question answering, external context can bridge the gap between external knowledge and the LLMs' parametric knowledge.

Open-Domain Question Answering Retrieval +1

Subgraph-Aware Training of Language Models for Knowledge Graph Completion Using Structure-Aware Contrastive Learning

1 code implementation 17 Jul 2024 Youmin Ko, Hyemin Yang, Taeuk Kim, Hyunjoon Kim

To this end, we propose a Subgraph-Aware Training framework for KGC (SATKGC) with two ideas: (i) subgraph-aware mini-batching to encourage hard negative sampling and to mitigate the imbalance in the frequency of entity occurrences during training, and (ii) a new contrastive learning scheme that focuses more on harder in-batch negative triples and harder positive triples in terms of the structural properties of the knowledge graph.

Contrastive Learning Inductive Bias
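
As a rough illustration of a hardness-weighted in-batch contrastive objective of the kind this abstract describes, here is a minimal PyTorch sketch. It is not the authors' SATKGC code; the `hardness` weights are assumed to come from some structural score, which the sketch leaves abstract.

```python
import torch
import torch.nn.functional as F

def weighted_info_nce(anchor, positives, hardness, temperature=0.05):
    """Generic in-batch contrastive loss that up-weights harder negatives.

    anchor, positives: (batch, dim) embeddings; row i of `positives` is the
                       positive for row i of `anchor`, all others are negatives.
    hardness: (batch, batch) positive weights, assumed to encode how "hard"
              each in-batch negative pair is (e.g., from graph structure).
    """
    sim = F.cosine_similarity(anchor.unsqueeze(1), positives.unsqueeze(0), dim=-1)
    sim = sim / temperature
    weights = hardness.clone()
    weights.fill_diagonal_(1.0)          # keep the positive term unweighted
    logits = sim + weights.log()          # exp(logits) = weight * exp(sim)
    labels = torch.arange(anchor.size(0))
    return F.cross_entropy(logits, labels)

# toy usage with random tensors
a, p = torch.randn(8, 128), torch.randn(8, 128)
h = torch.rand(8, 8) + 0.5
print(weighted_info_nce(a, p, h))
```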

Revisiting the Impact of Pursuing Modularity for Code Generation

1 code implementation 16 Jul 2024 Deokyeong Kang, Ki Jung Seo, Taeuk Kim

Modular programming, which aims to construct the final program by integrating smaller, independent building blocks, has been regarded as a desirable practice in software development.

Code Generation

Aligning Language Models to Explicitly Handle Ambiguity

1 code implementation 18 Apr 2024 Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim

To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity).

Language Modeling Language Modelling +1

BlendX: Complex Multi-Intent Detection with Blended Patterns

1 code implementation 27 Mar 2024 Yejin Yoon, Jungyeon Lee, Kangsan Kim, Chanhee Park, Taeuk Kim

Task-oriented dialogue (TOD) systems are commonly designed with the presumption that each utterance represents a single intent.

Diversity Intent Detection

Hyper-CL: Conditioning Sentence Representations with Hypernetworks

1 code implementation 14 Mar 2024 Young Hyun Yoo, Jii Cha, Changhyeon Kim, Taeuk Kim

While the introduction of contrastive learning frameworks in sentence representation learning has significantly contributed to advancements in the field, it still remains unclear whether state-of-the-art sentence embeddings can capture the fine-grained semantics of sentences, particularly when conditioned on specific perspectives.

Computational Efficiency Contrastive Learning +4
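
To make the "conditioning sentence representations with hypernetworks" idea concrete, here is a toy, hypothetical sketch in which a small hypernetwork maps a condition embedding to a projection matrix that is applied to a sentence embedding. It is illustrative only and not the Hyper-CL architecture; all dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class ConditionHypernet(nn.Module):
    """Toy hypernetwork: generates a (dim x dim) projection from a condition vector."""
    def __init__(self, dim: int, cond_dim: int):
        super().__init__()
        self.dim = dim
        self.generator = nn.Linear(cond_dim, dim * dim)

    def forward(self, sentence_emb: torch.Tensor, cond_emb: torch.Tensor) -> torch.Tensor:
        # one (dim x dim) projection per condition in the batch
        W = self.generator(cond_emb).view(-1, self.dim, self.dim)
        # condition-specific view of the sentence embedding
        return torch.bmm(W, sentence_emb.unsqueeze(-1)).squeeze(-1)

hyper = ConditionHypernet(dim=768, cond_dim=128)
sent = torch.randn(4, 768)   # e.g., outputs of a frozen sentence encoder
cond = torch.randn(4, 128)   # e.g., an embedding of the conditioning perspective
print(hyper(sent, cond).shape)  # torch.Size([4, 768])
```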

Analysis of Multi-Source Language Training in Cross-Lingual Transfer

no code implementations 21 Feb 2024 Seong Hoon Lim, Taejun Yun, Jinhyeon Kim, Jihun Choi, Taeuk Kim

The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition.

Cross-Lingual Transfer

X-SNS: Cross-Lingual Transfer Prediction through Sub-Network Similarity

no code implementations 26 Oct 2023 Taejun Yun, Jinhyeon Kim, Deokyeong Kang, Seong Hoon Lim, Jihoon Kim, Taeuk Kim

Cross-lingual transfer (XLT) is an emergent ability of multilingual language models that preserves their performance on a task to a significant extent when evaluated in languages that were not included in the fine-tuning process.

Cross-Lingual Transfer

Universal Domain Adaptation for Robust Handling of Distributional Shifts in NLP

1 code implementation 23 Oct 2023 Hyuhng Joon Kim, Hyunsoo Cho, Sang-Woo Lee, Junyeob Kim, Choonghyun Park, Sang-goo Lee, Kang Min Yoo, Taeuk Kim

When deploying machine learning systems in the wild, it is highly desirable for them to effectively leverage prior knowledge in unfamiliar domains while also raising alarms on anomalous inputs.

Universal Domain Adaptation

Enhancing Out-of-Distribution Detection in Natural Language Understanding via Implicit Layer Ensemble

1 code implementation 20 Oct 2022 Hyunsoo Cho, Choonghyun Park, Jaewook Kang, Kang Min Yoo, Taeuk Kim, Sang-goo Lee

Out-of-distribution (OOD) detection aims to discern outliers from the intended data distribution, which is crucial to maintaining high reliability and a good user experience.

Contrastive Learning intent-classification +6

Revisiting the Practical Effectiveness of Constituency Parse Extraction from Pre-trained Language Models

no code implementations COLING 2022 Taeuk Kim

Constituency Parse Extraction from Pre-trained Language Models (CPE-PLM) is a recent paradigm that attempts to induce constituency parse trees relying only on the internal knowledge of pre-trained language models.

In-Context Learning

Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator

no code implementations 16 Jun 2022 Hyuhng Joon Kim, Hyunsoo Cho, Junyeob Kim, Taeuk Kim, Kang Min Yoo, Sang-goo Lee

Large-scale pre-trained language models (PLMs) are well known for being able to solve a task simply by conditioning on a few input-label pairs, dubbed demonstrations, in a prompt, without being explicitly tuned for the desired downstream task.

In-Context Learning text-classification +2

HYU at SemEval-2022 Task 2: Effective Idiomaticity Detection with Consideration at Different Levels of Contextualization

no code implementations SemEval (NAACL) 2022 Youngju Joung, Taeuk Kim

We propose a unified framework that enables us to consider various aspects of contextualization at different levels to better identify the idiomaticity of multi-word expressions.

Sentence Task 2

Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations

no code implementations 25 May 2022 Kang Min Yoo, Junyeob Kim, Hyuhng Joon Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-Woo Lee, Sang-goo Lee, Taeuk Kim

Despite the recent explosion of interest in in-context learning, the underlying mechanism and the precise impact of the quality of demonstrations remain elusive.

In-Context Learning Language Modeling +1

Self-Guided Contrastive Learning for BERT Sentence Representations

1 code implementation ACL 2021 Taeuk Kim, Kang Min Yoo, Sang-goo Lee

In this work, we propose a contrastive learning method that utilizes self-guidance for improving the quality of BERT sentence representations.

Contrastive Learning Data Augmentation +2

IDS at SemEval-2020 Task 10: Does Pre-trained Language Model Know What to Emphasize?

no code implementations SEMEVAL 2020 Jaeyoul Shin, Taeuk Kim, Sang-goo Lee

We propose a novel method that enables us to determine words that deserve to be emphasized from written text in visual media, relying only on the information from the self-attention distributions of pre-trained language models (PLMs).

Language Modeling Language Modelling
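
As a rough illustration of scoring tokens by the self-attention they receive from a PLM, here is a short sketch using Hugging Face transformers. It is one simple heuristic in this spirit, not necessarily the authors' exact scoring; the model choice and example sentence are placeholders.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# assumed model choice for illustration
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

sentence = "Act now and save big on your next purchase"
inputs = tok(sentence, return_tensors="pt")

with torch.no_grad():
    # tuple with one (1, heads, seq, seq) tensor per layer
    attentions = model(**inputs).attentions

# average over layers and heads, then sum the attention each token receives
attn = torch.stack(attentions).mean(dim=(0, 2))[0]   # (seq, seq)
received = attn.sum(dim=0)                           # column sums: attention received
tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
for t, s in sorted(zip(tokens, received.tolist()), key=lambda x: -x[1])[:5]:
    print(f"{t}\t{s:.3f}")
```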

Multilingual Chart-based Constituency Parse Extraction from Pre-trained Language Models

1 code implementation Findings (EMNLP) 2021 Taeuk Kim, Bowen Li, Sang-goo Lee

As it has been unveiled that pre-trained language models (PLMs) are to some extent capable of recognizing syntactic concepts in natural language, much effort has been made to develop a method for extracting complete (binary) parses from PLMs without training separate parsers.

Constituency Parsing Cross-Lingual Transfer

Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction

1 code implementation ICLR 2020 Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee

With the recent success and popularity of pre-trained language models (LMs) in natural language processing, there has been a rise in efforts to understand their inner workings.

Summary Level Training of Sentence Rewriting for Abstractive Summarization

no code implementations WS 2019 Sanghwan Bae, Taeuk Kim, Jihoon Kim, Sang-goo Lee

As an attempt to combine extractive and abstractive summarization, Sentence Rewriting models adopt the strategy of extracting salient sentences from a document first and then paraphrasing the selected ones to generate a summary.

Abstractive Text Summarization Extractive Text Summarization +4

Don't Just Scratch the Surface: Enhancing Word Representations for Korean with Hanja

3 code implementations IJCNLP 2019 Kang Min Yoo, Taeuk Kim, Sang-goo Lee

We propose a simple yet effective approach for improving Korean word representations using additional linguistic annotation (i.e., Hanja).

Cross-Lingual Transfer Headline Generation +1

Dynamic Compositionality in Recursive Neural Networks with Structure-aware Tag Representations

2 code implementations 7 Sep 2018 Taeuk Kim, Jihun Choi, Daniel Edmiston, Sanghwan Bae, Sang-goo Lee

Most existing recursive neural network (RvNN) architectures utilize only the structure of parse trees, ignoring syntactic tags which are provided as by-products of parsing.

Natural Language Inference Sentence +2

Element-wise Bilinear Interaction for Sentence Matching

no code implementations SEMEVAL 2018 Jihun Choi, Taeuk Kim, Sang-goo Lee

When we build a neural network model to predict the relationship between two sentences, the most general and intuitive approach is to use a Siamese architecture, where the sentence vectors obtained from a shared encoder are given as input to a classifier.

Natural Language Inference Paraphrase Identification +1
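
A minimal Siamese sentence-matching skeleton in this spirit might look like the sketch below. It uses common element-wise interaction features rather than the paper's bilinear interaction, and every name and dimension here is an illustrative assumption.

```python
import torch
import torch.nn as nn

class SiameseMatcher(nn.Module):
    """Shared encoder + element-wise interaction features + classifier (toy sketch)."""
    def __init__(self, vocab_size: int, dim: int = 128, num_classes: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(4 * dim, dim), nn.ReLU(), nn.Linear(dim, num_classes)
        )

    def encode(self, ids: torch.Tensor) -> torch.Tensor:
        _, h = self.encoder(self.embed(ids))   # final hidden state as sentence vector
        return h.squeeze(0)

    def forward(self, ids_a: torch.Tensor, ids_b: torch.Tensor) -> torch.Tensor:
        u, v = self.encode(ids_a), self.encode(ids_b)
        # simple element-wise interactions; the paper studies richer bilinear ones
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)
        return self.classifier(features)

model = SiameseMatcher(vocab_size=1000)
a, b = torch.randint(0, 1000, (2, 12)), torch.randint(0, 1000, (2, 12))
print(model(a, b).shape)  # torch.Size([2, 3])
```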

A Syllable-based Technique for Word Embeddings of Korean Words

no code implementations WS 2017 Sanghyuk Choi, Taeuk Kim, Jinseok Seol, Sang-goo Lee

Word embedding has become a fundamental component to many NLP tasks such as named entity recognition and machine translation.

Machine Translation named-entity-recognition +4
