Search Results for author: Heuiseok Lim

Found 47 papers, 11 papers with code

Focus on FoCus: Is FoCus focused on Context, Knowledge and Persona?

no code implementations • CCGPK (COLING) 2022 • SeungYoon Lee, Jungseob Lee, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Jaehyung Seo, Jeongbae Park, Heuiseok Lim

As a result of the experiment, we present that the FoCus model could not correctly blend the knowledge according to the input dialogue and that the dataset design is unsuitable for the multi-turn conversation.

Dialogue Generation Question Answering

Paper
Add Code

Dealing with the Paradox of Quality Estimation

no code implementations • MTSummit 2021 • Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim

In quality estimation (QE), the quality of translation can be predicted by referencing the source sentence and the machine translation (MT) output without access to the reference sentence.

Machine Translation Sentence +1

Paper
Add Code

Two Heads are Better than One? Verification of Ensemble Effect in Neural Machine Translation

no code implementations • EMNLP (insights) 2021 • Chanjun Park, Sungjin Park, Seolhwa Lee, Taesun Whang, Heuiseok Lim

In the field of natural language processing, ensembles are broadly known to be effective in improving performance.

Machine Translation NMT +1

Paper
Add Code

A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation

no code implementations • Findings (NAACL) 2022 • Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim

However, Korean pretrained language models still struggle to generate a short sentence with a given condition based on compositionality and commonsense reasoning (i. e., generative commonsense reasoning).

Language Modelling Natural Language Understanding +2

Paper
Add Code

Empirical Analysis of Noising Scheme based Synthetic Data Generation for Automatic Post-editing

no code implementations • LREC 2022 • Hyeonseok Moon, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Jungseob Lee, Sugyeong Eo, Heuiseok Lim

This study has several limitations, considering the data acquisition, because there is no official dataset for most language pairs.

Automatic Post-Editing Synthetic Data Generation +1

Paper
Add Code

BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text

no code implementations • ACL (WAT) 2021 • Chanjun Park, Jaehyung Seo, Seolhwa Lee, Chanhee Lee, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim

Automatic speech recognition (ASR) is arguably the most critical component of such systems, as errors in speech recognition propagate to the downstream components and drastically degrade the user experience.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Don’t Judge a Language Model by Its Last Layer: Contrastive Learning with Layer-Wise Attention Pooling

1 code implementation • COLING 2022 • Dongsuk Oh, Yejin Kim, Hodong Lee, H. Howie Huang, Heuiseok Lim

Recent pre-trained language models (PLMs) achieved great success on many natural language processing tasks through learning linguistic features and contextualized sentence representation.

Contrastive Learning Language Modelling +3

Paper
Code

FreeTalky: Don’t Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue

no code implementations • LREC 2022 • Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park, Heuiseok Lim

We propose a deep learning-based foreign language learning platform, named FreeTalky, for people who experience anxiety dealing with foreign languages, by employing a humanoid robot NAO and various deep learning models.

Paper
Add Code

Priming Ancient Korean Neural Machine Translation

no code implementations • LREC 2022 • Chanjun Park, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim

In recent years, there has been an increasing need for the restoration and translation of historical languages.

Machine Translation NMT +1

Paper
Add Code

Capturing Speaker Incorrectness: Speaker-Focused Post-Correction for Abstractive Dialogue Summarization

no code implementations • EMNLP (newsum) 2021 • Dongyub Lee, Jungwoo Lim, Taesun Whang, Chanhee Lee, Seungwoo Cho, Mingun Park, Heuiseok Lim

In this paper, we focus on improving the quality of the summary generated by neural abstractive dialogue summarization systems.

Abstractive Dialogue Summarization

Paper
Add Code

Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse

no code implementations • 26 Jan 2024 • SeungYoon Lee, Dahyun Jung, Chanjun Park, Seolhwa Lee, Heuiseok Lim

We introduce the concept of "Alternative Speech" as a new way to directly combat hate speech and complement the limitations of counter-narrative.

Specificity

Paper
Add Code

Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline

no code implementations • 26 Jan 2024 • Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

To effectively address this, it is imperative to consider both the speech-level, crucial for recognition accuracy, and the text-level, critical for user-friendliness.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

KoBigBird-large: Transformation of Transformer for Korean Language Understanding

no code implementations • 19 Sep 2023 • Kisu Yang, Yoonna Jang, Taewoo Lee, Jinwoo Seong, Hyungjin Lee, Hwanseok Jang, Heuiseok Lim

This work presents KoBigBird-large, a large size of Korean BigBird that achieves state-of-the-art performance and allows long sequence processing for Korean language understanding.

Document Classification Question Answering

Paper
Add Code

Towards Reliable and Fluent Large Language Models: Incorporating Feedback Learning Loops in QA Systems

no code implementations • 8 Sep 2023 • Dongyub Lee, Taesun Whang, Chanhee Lee, Heuiseok Lim

First, we build a dataset to train a critic model capable of evaluating the citation, correctness, and fluency of responses generated by LLMs in QA systems.

Response Generation

Paper
Add Code

Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation

no code implementations • 26 Jun 2023 • Seugnjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim

In this paper, we introduce a data-driven approach for Formality-Sensitive Machine Translation (FSMT) that caters to the unique linguistic properties of four target languages.

Machine Translation Prompt Engineering +2

Paper
Add Code

Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction

no code implementations • 26 Jun 2023 • Chanjun Park, Seonmin Koo, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

Data-centric AI approach aims to enhance the model performance without modifying the model and has been shown to impact model performance positively.

Grammatical Error Correction

Paper
Add Code

Knowledge Graph-Augmented Korean Generative Commonsense Reasoning

no code implementations • 26 Jun 2023 • Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim

Generative commonsense reasoning refers to the task of generating acceptable and logical assumptions about everyday situations based on commonsense understanding.

Text Generation

Paper
Add Code

Towards Diverse and Effective Question-Answer Pair Generation from Children Storybooks

1 code implementation • 11 Jun 2023 • Sugyeong Eo, Hyeonseok Moon, Jinsung Kim, Yuna Hur, Jeongwook Kim, Songeun Lee, Changwoo Chun, Sungsoo Park, Heuiseok Lim

In this paper, we propose a QAG framework that enhances QA type diversity by producing different interrogative sentences and implicit/explicit answers.

Paper
Code

Self-Improving-Leaderboard(SIL): A Call for Real-World Centric Natural Language Processing Leaderboards

no code implementations • 20 Mar 2023 • Chanjun Park, Hyeonseok Moon, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim

Leaderboard systems allow researchers to objectively evaluate Natural Language Processing (NLP) models and are typically used to identify models that exhibit superior performance on a given task in a predetermined setting.

Paper
Add Code

You Truly Understand What I Need: Intellectual and Friendly Dialogue Agents grounding Knowledge and Persona

1 code implementation • 6 Jan 2023 • Jungwoo Lim, Myunghoon Kang, Yuna Hur, SeungWon Jung, Jinsung Kim, Yoonna Jang, Dongyub Lee, Hyesung Ji, Donghoon Shin, Seungryong Kim, Heuiseok Lim

The agent selects the proper knowledge and persona to use for generating the answers with our candidate scoring implemented with a poly-encoder.

Hallucination Language Modelling +1

Paper
Code

Analysis of Utterance Embeddings and Clustering Methods Related to Intent Induction for Task-Oriented Dialogue

1 code implementation • 5 Dec 2022 • Jeiyoon Park, Yoonna Jang, Chanhee Lee, Heuiseok Lim

The focus of this work is to investigate unsupervised approaches to overcome quintessential challenges in designing task-oriented dialog schema: assigning intent labels to each dialog turn (intent clustering) and generating a set of intents based on the intent clustering methods (intent induction).

Clustering

Paper
Code

QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation

no code implementations • COLING 2022 • Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Gyeongmin Kim, Jungseob Lee, Heuiseok Lim

With the recent advance in neural machine translation demonstrating its importance, research on quality estimation (QE) has been steadily progressing.

Machine Translation Translation

Paper
Add Code

Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models

no code implementations • 14 Sep 2022 • Suhyune Son, Chanjun Park, Jungseob Lee, Midan Shim, Chanhee Lee, Yoonna Jang, Jaehyung Seo, Heuiseok Lim

This can be attributed to the fact that the amount of available training data in each language follows the power-law distribution, and most of the languages belong to the long tail of the distribution.

Cross-Lingual Transfer Transfer Learning

Paper
Add Code

Don't Judge a Language Model by Its Last Layer: Contrastive Learning with Layer-Wise Attention Pooling

1 code implementation • 13 Sep 2022 • Dongsuk Oh, Yejin Kim, Hodong Lee, H. Howie Huang, Heuiseok Lim

Recent pre-trained language models (PLMs) achieved great success on many natural language processing tasks through learning linguistic features and contextualized sentence representation.

Contrastive Learning Language Modelling +3

Paper
Code

KoCHET: a Korean Cultural Heritage corpus for Entity-related Tasks

1 code implementation • COLING 2022 • Gyeongmin Kim, Jinsung Kim, Junyoung Son, Heuiseok Lim

As digitized traditional cultural heritage documents have rapidly increased, resulting in an increased need for preservation and management, practical recognition of entities and typification of their classes has become essential.

Entity Typing Management +4

Paper
Code

GRASP: Guiding model with RelAtional Semantics using Prompt for Dialogue Relation Extraction

1 code implementation • COLING 2022 • Junyoung Son, Jinsung Kim, Jungwoo Lim, Heuiseok Lim

To effectively exploit inherent knowledge of PLMs without extra layers and consider scattered semantic cues on the relation between the arguments, we propose a Guiding model with RelAtional Semantics using Prompt (GRASP).

Ranked #2 on Dialog Relation Extraction on DialogRE

Dialog Relation Extraction Emotion Recognition in Conversation +1

Paper
Code

Multimodal Frame-Scoring Transformer for Video Summarization

no code implementations • 5 Jul 2022 • Jeiyoon Park, Kiho Kwoun, Chanhee Lee, Heuiseok Lim

Second, existing datasets for generic video summarization are relatively insufficient to train a caption generator used for extracting text information from a video and to train the multimodal feature extractors.

Video Summarization

Paper
Add Code

There is no rose without a thorn: Finding weaknesses on BlenderBot 2.0 in terms of Model, Data and User-Centric Approach

no code implementations • 10 Jan 2022 • Jungseob Lee, Midan Shim, Suhyune Son, Chanjun Park, Yujin Kim, Heuiseok Lim

BlenderBot 2. 0 is a dialogue model that represents open-domain chatbots by reflecting real-time information and remembering user information for an extended period using an internet search module and multi-session.

Paper
Add Code

Call for Customized Conversation: Customized Conversation Grounding Persona and Knowledge

2 code implementations • 16 Dec 2021 • Yoonna Jang, Jungwoo Lim, Yuna Hur, Dongsuk Oh, Suhyune Son, Yeonsoo Lee, Donghoon Shin, Seungryong Kim, Heuiseok Lim

Humans usually have conversations by making use of prior knowledge about a topic and background information of the people whom they are talking to.

Paper
Code

FreeTalky: Don't Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue

no code implementations • 8 Dec 2021 • Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park, Heuiseok Lim

Paper
Add Code

A Self-Supervised Automatic Post-Editing Data Generation Tool

no code implementations • 24 Nov 2021 • Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, Seungjun Lee, Heuiseok Lim

Data building for automatic post-editing (APE) requires extensive and expert-level human effort, as it contains an elaborate process that involves identifying errors in sentences and providing suitable revisions.

Automatic Post-Editing

Paper
Add Code

A New Tool for Efficiently Generating Quality Estimation Datasets

no code implementations • 1 Nov 2021 • Sugyeong Eo, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim

Building of data for quality estimation (QE) training is expensive and requires significant human labor.

Data Augmentation

Paper
Add Code

Automatic Knowledge Augmentation for Generative Commonsense Reasoning

no code implementations • 30 Oct 2021 • Jaehyung Seo, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

Generative commonsense reasoning is the capability of a language model to generate a sentence with a given concept-set that is based on commonsense knowledge.

Language Modelling Sentence

Paper
Add Code

How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus

no code implementations • 30 Oct 2021 • Chanjun Park, Seolhwa Lee, Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim

This paper proposes a tool for efficiently constructing high-quality parallel corpora with minimizing human labor and making this tool publicly available.

Machine Translation NMT +1

Paper
Add Code

Empirical Analysis of Korean Public AI Hub Parallel Corpora and in-depth Analysis using LIWC

no code implementations • 28 Oct 2021 • Chanjun Park, Midan Shim, Sugyeong Eo, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim

To the best of our knowledge, this study is the first to use LIWC to analyze parallel corpora in the field of NMT.

Machine Translation NMT +1

Paper
Add Code

Who speaks like a style of Vitamin: Towards Syntax-Aware DialogueSummarization using Multi-task Learning

no code implementations • 29 Sep 2021 • Seolhwa Lee, Kisu Yang, Chanjun Park, João Sedoc, Heuiseok Lim

To the best of our knowledge, our approach is the first method to apply multi-task learning to the dialogue summarization task.

Abstractive Dialogue Summarization Multi-Task Learning +3

Paper
Add Code

PicTalky: Augmentative and Alternative Communication Software for Language Developmental Disabilities

no code implementations • 27 Sep 2021 • Chanjun Park, Yoonna Jang, Seolhwa Lee, Jaehyung Seo, Kisu Yang, Heuiseok Lim

In this study, we propose PicTalky, which is an AI-based AAC system that helps children with language developmental disabilities to improve their communication skills and language comprehension abilities.

Paper
Add Code

Should we find another model?: Improving Neural Machine Translation Performance with ONE-Piece Tokenization Method without Model Modification

no code implementations • NAACL 2021 • Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

We derive an optimal subword tokenization result for Korean-English machine translation by conducting a case study that combines the subword tokenization method, morphological segmentation, and vocabulary method.

Machine Translation Translation

Paper
Add Code

I Know What You Asked: Graph Path Learning using AMR for Commonsense Reasoning

no code implementations • COLING 2020 • Jungwoo Lim, Dongsuk Oh, Yoonna Jang, Kisu Yang, Heuiseok Lim

CommonsenseQA is a task in which a correct answer is predicted through commonsense reasoning with pre-defined knowledge.

Paper
Add Code

An Evaluation Protocol for Generative Conversational Systems

no code implementations • 24 Oct 2020 • Seolhwa Lee, Heuiseok Lim, João Sedoc

These findings demonstrate the feasibility of our protocol to evaluate conversational agents and evaluation sets.

Experimental Design

Paper
Add Code

Variational Reward Estimator Bottleneck: Learning Robust Reward Estimator for Multi-Domain Task-Oriented Dialog

no code implementations • 31 May 2020 • Jeiyoon Park, Chanhee Lee, Kuekyeng Kim, Heuiseok Lim

Despite its notable success in adversarial learning approaches to multi-domain task-oriented dialog system, training the dialog policy via adversarial inverse reinforcement learning often fails to balance the performance of the policy generator and reward estimator.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Multi-View Attention Network for Visual Dialog

1 code implementation • 29 Apr 2020 • Sungjin Park, Taesun Whang, Yeochan Yoon, Heuiseok Lim

To resolve the visual dialog task, a high-level understanding of various multimodal inputs (e. g., question, dialog history, and image) is required.

Ranked #11 on Visual Dialog on VisDial v0.9 val

Visual Dialog

Paper
Code

GREG: A Global Level Relation Extraction with Knowledge Graph Embedding

no code implementations • Appl. Sci. 2020 2020 • Kuekyeng Kim, Yuna Hur, Gyeongmin Kim, Heuiseok Lim

Currently, most relation extraction modules are more focused on the extraction of local mention-level relations—usually from short volumes of text.

Ranked #58 on Relation Extraction on DocRED

Knowledge Graph Embedding Knowledge Graph Embeddings +3

Paper
Add Code

An Effective Domain Adaptive Post-Training Method for BERT in Response Selection

1 code implementation • 13 Aug 2019 • Taesun Whang, Dongyub Lee, Chanhee Lee, Kisu Yang, Dongsuk Oh, Heuiseok Lim

We focus on multi-turn response selection in a retrieval-based dialog system.

Ranked #3 on Conversational Response Selection on RRS Ranking Test

Conversational Response Selection Language Modelling +1

Paper
Code

EmotionX-KU: BERT-Max based Contextual Emotion Classifier

2 code implementations • 27 Jun 2019 • Kisu Yang, Dongyub Lee, Taesun Whang, Seolhwa Lee, Heuiseok Lim

We propose a contextual emotion classifier based on a transferable language model and dynamic max pooling, which predicts the emotion of each utterance in a dialogue.

Emotion Recognition Language Modelling

Paper
Code

Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging

no code implementations • COLING 2018 • Andrew Matteson, Chanhee Lee, Young-Bum Kim, Heuiseok Lim

Due to the fact that Korean is a highly agglutinative, character-rich language, previous work on Korean morphological analysis typically employs the use of sub-character features known as graphemes or otherwise utilizes comprehensive prior linguistic knowledge (i. e., a dictionary of known morphological transformation forms, or actions).

Morphological Analysis Part-Of-Speech Tagging +1

Paper
Add Code

Character-Level Feature Extraction with Densely Connected Networks

no code implementations • COLING 2018 • Chanhee Lee, Young-Bum Kim, Dongyub Lee, Heuiseok Lim

Generating character-level features is an important step for achieving good results in various natural language processing tasks.

named-entity-recognition Named Entity Recognition +4

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.