Search Results for author: Kang Min Yoo

Found 33 papers, 18 papers with code

Attribute Injection for Pretrained Language Models: A New Benchmark and an Efficient Method

1 code implementation • COLING 2022 • Reinald Kim Amplayo, Kang Min Yoo, Sang-Woo Lee

Metadata attributes (e.g., user and product IDs from reviews) can be incorporated as additional inputs to neural-based NLP models, by expanding the architecture of the models to improve performance.

Attribute

Aligning Language Models to Explicitly Handle Ambiguity

no code implementations • 18 Apr 2024 • Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim

However, conversational agents built upon even the most recent large language models (LLMs) face challenges in processing ambiguous inputs, primarily due to the following two hurdles: (1) LLMs are not directly trained to handle inputs that are too ambiguous to be properly managed; (2) the degree of ambiguity in an input can vary according to the intrinsic knowledge of the LLMs, which is difficult to investigate.

Question Answering

HyperCLOVA X Technical Report

no code implementations • 2 Apr 2024 • Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, Youngkyun Jin, Hyein Jun, Jaeseung Jung, Chanwoong Kim, jinhong Kim, Jinuk Kim, Dokyeong Lee, Dongwook Park, Jeong Min Sohn, Sujung Han, Jiae Heo, Sungju Hong, Mina Jeon, Hyunhoon Jung, Jungeun Jung, Wangkyo Jung, Chungjoon Kim, Hyeri Kim, Jonghyun Kim, Min Young Kim, Soeun Lee, Joonhee Park, Jieun Shin, Sojin Yang, Jungsoon Yoon, Hwaran Lee, Sanghwan Bae, Jeehwan Cha, Karl Gylleus, Donghoon Ham, Mihak Hong, Youngki Hong, Yunki Hong, Dahyun Jang, Hyojun Jeon, Yujin Jeon, Yeji Jeong, Myunggeun Ji, Yeguk Jin, Chansong Jo, Shinyoung Joo, Seunghwan Jung, Adrian Jungmyung Kim, Byoung Hoon Kim, Hyomin Kim, Jungwhan Kim, Minkyoung Kim, Minseung Kim, Sungdong Kim, Yonghee Kim, Youngjun Kim, Youngkwan Kim, Donghyeon Ko, Dughyun Lee, Ha Young Lee, Jaehong Lee, Jieun Lee, Jonghyun Lee, Jongjin Lee, Min Young Lee, Yehbin Lee, Taehong Min, Yuri Min, Kiyoon Moon, Hyangnam Oh, Jaesun Park, Kyuyon Park, Younghun Park, Hanbae Seo, Seunghyun Seo, Mihyun Sim, Gyubin Son, Matt Yeo, Kyung Hoon Yeom, Wonjoon Yoo, Myungin You, Doheon Ahn, Homin Ahn, Joohee Ahn, Seongmin Ahn, Chanwoo An, Hyeryun An, Junho An, Sang-Min An, Boram Byun, Eunbin Byun, Jongho Cha, Minji Chang, Seunggyu Chang, Haesong Cho, Youngdo Cho, Dalnim Choi, Daseul Choi, Hyoseok Choi, Minseong Choi, Sangho Choi, Seongjae Choi, Wooyong Choi, Sewhan Chun, Dong Young Go, Chiheon Ham, Danbi Han, Jaemin Han, Moonyoung Hong, Sung Bum Hong, Dong-Hyun Hwang, Seongchan Hwang, Jinbae Im, Hyuk Jin Jang, Jaehyung Jang, Jaeni Jang, Sihyeon Jang, Sungwon Jang, Joonha Jeon, Daun Jeong, JoonHyun Jeong, Kyeongseok Jeong, Mini Jeong, Sol Jin, Hanbyeol Jo, Hanju Jo, Minjung Jo, 
Chaeyoon Jung, Hyungsik Jung, Jaeuk Jung, Ju Hwan Jung, Kwangsun Jung, Seungjae Jung, Soonwon Ka, Donghan Kang, Soyoung Kang, Taeho Kil, Areum Kim, Beomyoung Kim, Byeongwook Kim, Daehee Kim, Dong-Gyun Kim, Donggook Kim, Donghyun Kim, Euna Kim, Eunchul Kim, Geewook Kim, Gyu Ri Kim, Hanbyul Kim, Heesu Kim, Isaac Kim, Jeonghoon Kim, JiHye Kim, Joonghoon Kim, Minjae Kim, Minsub Kim, Pil Hwan Kim, Sammy Kim, Seokhun Kim, Seonghyeon Kim, Soojin Kim, Soong Kim, Soyoon Kim, Sunyoung Kim, TaeHo Kim, Wonho Kim, Yoonsik Kim, You Jin Kim, Yuri Kim, Beomseok Kwon, Ohsung Kwon, Yoo-Hwan Kwon, Anna Lee, Byungwook Lee, Changho Lee, Daun Lee, Dongjae Lee, Ha-Ram Lee, Hodong Lee, Hwiyeong Lee, Hyunmi Lee, Injae Lee, Jaeung Lee, Jeongsang Lee, Jisoo Lee, JongSoo Lee, Joongjae Lee, Juhan Lee, Jung Hyun Lee, Junghoon Lee, Junwoo Lee, Se Yun Lee, Sujin Lee, Sungjae Lee, Sungwoo Lee, Wonjae Lee, Zoo Hyun Lee, Jong Kun Lim, Kun Lim, Taemin Lim, Nuri Na, Jeongyeon Nam, Kyeong-Min Nam, Yeonseog Noh, Biro Oh, Jung-Sik Oh, Solgil Oh, Yeontaek Oh, Boyoun Park, Cheonbok Park, Dongju Park, Hyeonjin Park, Hyun Tae Park, Hyunjung Park, JiHye Park, Jooseok Park, JungHwan Park, Jungsoo Park, Miru Park, Sang Hee Park, Seunghyun Park, Soyoung Park, Taerim Park, Wonkyeong Park, Hyunjoon Ryu, Jeonghun Ryu, Nahyeon Ryu, Soonshin Seo, Suk Min Seo, Yoonjeong Shim, Kyuyong Shin, Wonkwang Shin, Hyun Sim, Woongseob Sim, Hyejin Soh, Bokyong Son, Hyunjun Son, Seulah Son, Chi-Yun Song, Chiyoung Song, Ka Yeon Song, Minchul Song, Seungmin Song, Jisung Wang, Yonggoo Yeo, Myeong Yeon Yi, Moon Bin Yim, Taehwan Yoo, Youngjoon Yoo, Sungmin Yoon, Young Jin Yoon, Hangyeol Yu, Ui Seon Yu, Xingdong Zuo, Jeongin Bae, Joungeun Bae, Hyunsoo Cho, Seonghyun Cho, Yongjin Cho, Taekyoon Choi, Yera Choi, Jiwan Chung, Zhenghui Han, Byeongho Heo, Euisuk Hong, Taebaek Hwang, Seonyeol Im, Sumin Jegal, Sumin Jeon, Yelim Jeong, Yonghyun Jeong, Can Jiang, Juyong Jiang, Jiho Jin, Ara Jo, Younghyun Jo, Hoyoun Jung, Juyoung Jung, Seunghyeong 
Kang, Dae Hee Kim, Ginam Kim, Hangyeol Kim, Heeseung Kim, Hyojin Kim, Hyojun Kim, Hyun-Ah Kim, Jeehye Kim, Jin-Hwa Kim, Jiseon Kim, Jonghak Kim, Jung Yoon Kim, Rak Yeong Kim, Seongjin Kim, Seoyoon Kim, Sewon Kim, Sooyoung Kim, Sukyoung Kim, Taeyong Kim, Naeun Ko, Bonseung Koo, Heeyoung Kwak, Haena Kwon, Youngjin Kwon, Boram Lee, Bruce W. Lee, Dagyeong Lee, Erin Lee, Euijin Lee, Ha Gyeong Lee, Hyojin Lee, Hyunjeong Lee, Jeeyoon Lee, Jeonghyun Lee, Jongheok Lee, Joonhyung Lee, Junhyuk Lee, Mingu Lee, Nayeon Lee, Sangkyu Lee, Se Young Lee, Seulgi Lee, Seung Jin Lee, Suhyeon Lee, Yeonjae Lee, Yesol Lee, Youngbeom Lee, Yujin Lee, Shaodong Li, Tianyu Liu, Seong-Eun Moon, Taehong Moon, Max-Lasse Nihlenramstroem, Wonseok Oh, Yuri Oh, Hongbeen Park, Hyekyung Park, Jaeho Park, Nohil Park, Sangjin Park, Jiwon Ryu, Miru Ryu, Simo Ryu, Ahreum Seo, Hee Seo, Kangdeok Seo, Jamin Shin, Seungyoun Shin, Heetae Sin, Jiangping Wang, Lei Wang, Ning Xiang, Longxiang Xiao, Jing Xu, Seonyeong Yi, Haanju Yoo, Haneul Yoo, Hwanhee Yoo, Liang Yu, Youngjae Yu, Weijie Yuan, Bo Zeng, Qian Zhou, Kyunghyun Cho, Jung-Woo Ha, Joonsuk Park, Jihyun Hwang, Hyoung Jo Kwon, Soonyong Kwon, Jungyeon Lee, Seungho Lee, Seonghyeon Lim, Hyunkyung Noh, Seungho Choi, Sang-Woo Lee, Jung Hwa Lim, Nako Sung

We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding.

Instruction Following Machine Translation +1

KMMLU: Measuring Massive Multitask Language Understanding in Korean

no code implementations • 18 Feb 2024 • Guijin Son, Hanwool Lee, Sungdong Kim, Seungone Kim, Niklas Muennighoff, Taekyoon Choi, Cheonbok Park, Kang Min Yoo, Stella Biderman

We propose KMMLU, a new Korean benchmark with 35,030 expert-level multiple-choice questions across 45 subjects ranging from humanities to STEM.

Language Modelling Multiple-choice

Aligning Large Language Models by On-Policy Self-Judgment

1 code implementation • 17 Feb 2024 • Sangkyu Lee, Sungdong Kim, Ashkan Yousefpour, Minjoon Seo, Kang Min Yoo, Youngjae Yu

Existing approaches for aligning large language models with human preferences face a trade-off that requires a separate reward model (RM) for on-policy learning.

Instruction Following

Unified Speech-Text Pretraining for Spoken Dialog Modeling

no code implementations • 8 Feb 2024 • Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, Kang Min Yoo

While recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech, an LLM-based strategy for modeling spoken dialogs remains elusive and calls for further investigation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model

no code implementations • 14 Nov 2023 • Nohil Park, Joonsuk Park, Kang Min Yoo, Sungroh Yoon

An exciting advancement in the field of multilingual models is the emergence of autoregressive models with zero- and few-shot capabilities, a phenomenon widely reported in large-scale language models.

NER POS

Universal Domain Adaptation for Robust Handling of Distributional Shifts in NLP

1 code implementation • 23 Oct 2023 • Hyuhng Joon Kim, Hyunsoo Cho, Sang-Woo Lee, Junyeob Kim, Choonghyun Park, Sang-goo Lee, Kang Min Yoo, Taeuk Kim

When deploying machine learning systems to the wild, it is highly desirable for them to effectively leverage prior knowledge to the unfamiliar domain while also firing alarms to anomalous inputs.

Universal Domain Adaptation

Instruction Tuning with Human Curriculum

1 code implementation • 14 Oct 2023 • Bruce W. Lee, Hyunsoo Cho, Kang Min Yoo

In this work, we (1) introduce Curriculum Instruction Tuning, (2) explore the potential advantages of employing diverse curriculum strategies, and (3) delineate a synthetic instruction-response generation framework that complements our theoretical approach.

Response Generation

Aligning Large Language Models through Synthetic Feedback

1 code implementation • 23 May 2023 • Sungdong Kim, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak, Kang Min Yoo, Minjoon Seo

In human evaluation, our model is preferred to Alpaca and Dolly-v2, 55.0% and 58.5% of the time, respectively.

Language Modelling

Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning

no code implementations • 27 Jan 2023 • Hyunsoo Cho, Choonghyun Park, Junyeop Kim, Hyuhng Joon Kim, Kang Min Yoo, Sang-goo Lee

As the size of the pre-trained language model (PLM) continues to increase, numerous parameter-efficient transfer learning methods have been proposed recently to compensate for the tremendous cost of fine-tuning.

Language Modelling Transfer Learning

Prompt-Augmented Linear Probing: Scaling beyond the Limit of Few-shot In-Context Learners

no code implementations • 21 Dec 2022 • Hyunsoo Cho, Hyuhng Joon Kim, Junyeob Kim, Sang-Woo Lee, Sang-goo Lee, Kang Min Yoo, Taeuk Kim

Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning.

In-Context Learning Language Modelling

Critic-Guided Decoding for Controlled Text Generation

no code implementations • 21 Dec 2022 • Minbeom Kim, Hwanhee Lee, Kang Min Yoo, Joonsuk Park, Hwaran Lee, Kyomin Jung

In this work, we propose a novel critic decoding method for controlled language generation (CriticControl) that combines the strengths of reinforcement learning and weighted decoding.

Language Modelling reinforcement-learning +2

Enhancing Out-of-Distribution Detection in Natural Language Understanding via Implicit Layer Ensemble

1 code implementation • 20 Oct 2022 • Hyunsoo Cho, Choonghyun Park, Jaewook Kang, Kang Min Yoo, Taeuk Kim, Sang-goo Lee

Out-of-distribution (OOD) detection aims to discern outliers from the intended data distribution, which is crucial to maintaining high reliability and a good user experience.

Contrastive Learning intent-classification +5

AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models

no code implementations • 8 Oct 2022 • Se Jung Kwon, Jeonghoon Kim, Jeongin Bae, Kang Min Yoo, Jin-Hwa Kim, Baeseong Park, Byeongwook Kim, Jung-Woo Ha, Nako Sung, Dongsoo Lee

To combine parameter-efficient adaptation and model compression, we propose AlphaTuning consisting of post-training quantization of the pre-trained language model and fine-tuning only some parts of quantized parameters for a target task.

Language Modelling Model Compression +1

Continuous Decomposition of Granularity for Neural Paraphrase Generation

1 code implementation • COLING 2022 • Xiaodong Gu, Zhaowei Zhang, Sang-Woo Lee, Kang Min Yoo, Jung-Woo Ha

While Transformers have had significant success in paragraph generation, they treat sentences as linear sequences of tokens and often neglect their hierarchical information.

Paraphrase Generation Sentence

Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator

no code implementations • 16 Jun 2022 • Hyuhng Joon Kim, Hyunsoo Cho, Junyeob Kim, Taeuk Kim, Kang Min Yoo, Sang-goo Lee

Large-scale pre-trained language models (PLMs) are well-known for being capable of solving a task simply by conditioning a few input-label pairs dubbed demonstrations on a prompt without being explicitly tuned for the desired downstream task.

In-Context Learning text-classification +2

Mutual Information Divergence: A Unified Metric for Multimodal Generative Models

1 code implementation • 25 May 2022 • Jin-Hwa Kim, Yunji Kim, Jiyoung Lee, Kang Min Yoo, Sang-Woo Lee

Based on a recent trend that multimodal generative evaluations exploit a vision-and-language pre-trained model, we propose the negative Gaussian cross-mutual information using the CLIP features as a unified metric, coined by Mutual Information Divergence (MID).

Ranked #1 on Human Judgment Classification on Pascal-50S

Hallucination Pair-wise Detection (1-ref) Hallucination Pair-wise Detection (4-ref) +5

Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations

no code implementations • 25 May 2022 • Kang Min Yoo, Junyeob Kim, Hyuhng Joon Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-Woo Lee, Sang-goo Lee, Taeuk Kim

Despite recent explosion of interests in in-context learning, the underlying mechanism and the precise impact of the quality of demonstrations remain elusive.

In-Context Learning Language Modelling

Generating Information-Seeking Conversations from Unlabeled Documents

1 code implementation • 25 May 2022 • Gangwoo Kim, Sungdong Kim, Kang Min Yoo, Jaewoo Kang

In this paper, we introduce a novel framework, SIMSEEK, (Simulating information-Seeking conversation from unlabeled documents), and compare its two variants.

Conversational Search

Masked Summarization to Generate Factually Inconsistent Summaries for Improved Factual Consistency Checking

1 code implementation • Findings (NAACL) 2022 • Hwanhee Lee, Kang Min Yoo, Joonsuk Park, Hwaran Lee, Kyomin Jung

To this end, the latest approach is to train a factual consistency classifier on factually consistent and inconsistent summaries.

Abstractive Text Summarization

Response Generation with Context-Aware Prompt Learning

no code implementations • 4 Nov 2021 • Xiaodong Gu, Kang Min Yoo, Sang-Woo Lee

Pre-trained language models (PLM) have marked a huge leap in neural dialogue modeling.

Dialogue Generation Response Generation

Efficient Attribute Injection for Pretrained Language Models

no code implementations • 16 Sep 2021 • Reinald Kim Amplayo, Kang Min Yoo, Sang-Woo Lee

Metadata attributes (e.g., user and product IDs from reviews) can be incorporated as additional inputs to neural-based NLP models, by modifying the architecture of the models, in order to improve their performance.

Attribute

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

2 code implementations • EMNLP 2021 • Boseop Kim, HyoungSeok Kim, Sang-Woo Lee, Gichang Lee, Donghyun Kwak, Dong Hyeon Jeon, Sunghyun Park, Sungju Kim, Seonhoon Kim, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo, Minsuk Chang, Soobin Suh, Sookyo In, Jinseong Park, Kyungduk Kim, Hiun Kim, Jisu Jeong, Yong Goo Yeo, Donghoon Ham, Dongju Park, Min Young Lee, Jaewook Kang, Inho Kang, Jung-Woo Ha, WooMyoung Park, Nako Sung

GPT-3 shows remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billion scale data.

Few-Shot Learning In-Context Learning +1

Self-Guided Contrastive Learning for BERT Sentence Representations

1 code implementation • ACL 2021 • Taeuk Kim, Kang Min Yoo, Sang-goo Lee

In this work, we propose a contrastive learning method that utilizes self-guidance for improving the quality of BERT sentence representations.

Contrastive Learning Data Augmentation +2

GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation

1 code implementation • Findings (EMNLP) 2021 • Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, Woomyeong Park

Large-scale language models such as GPT-3 are excellent few-shot learners, allowing them to be controlled via natural text prompts.

General Classification Text Augmentation

Reward Optimization for Neural Machine Translation with Learned Metrics

1 code implementation • 15 Apr 2021 • Raphael Shu, Kang Min Yoo, Jung-Woo Ha

Results show that the reward optimization with BLEURT is able to increase the metric scores by a large margin, in contrast to limited gain when training with smoothed BLEU.

Machine Translation NMT +1

DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances

1 code implementation • 3 Dec 2020 • Xiaodong Gu, Kang Min Yoo, Jung-Woo Ha

Recent advances in pre-trained language models have significantly improved neural response generation.

Conversational Response Generation Response Generation

Variational Hierarchical Dialog Autoencoder for Dialog State Tracking Data Augmentation

1 code implementation • EMNLP 2020 • Kang Min Yoo, Hanbit Lee, Franck Dernoncourt, Trung Bui, Walter Chang, Sang-goo Lee

Recent works have shown that generative data augmentation, where synthetic samples generated from deep generative models complement the training dataset, benefit NLP tasks.

Data Augmentation dialog state tracking +4

Don't Just Scratch the Surface: Enhancing Word Representations for Korean with Hanja

3 code implementations • IJCNLP 2019 • Kang Min Yoo, Taeuk Kim, Sang-goo Lee

We propose a simple yet effective approach for improving Korean word representations using additional linguistic annotation (i.e. Hanja).

Cross-Lingual Transfer Headline Generation +1

Data Augmentation for Spoken Language Understanding via Joint Variational Generation

no code implementations • 7 Sep 2018 • Kang Min Yoo, Youhyun Shin, Sang-goo Lee

Data scarcity is one of the main obstacles of domain adaptation in spoken language understanding (SLU) due to the high cost of creating manually tagged SLU datasets.

Data Augmentation Domain Adaptation +1

Improving Visually Grounded Sentence Representations with Self-Attention

no code implementations • 2 Dec 2017 • Kang Min Yoo, Youhyun Shin, Sang-goo Lee

Sentence representation models trained only on language could potentially suffer from the grounding problem.

Sentence Visual Grounding

Learning to Compose Task-Specific Tree Structures

1 code implementation • 10 Jul 2017 • Jihun Choi, Kang Min Yoo, Sang-goo Lee

For years, recursive neural networks (RvNNs) have been shown to be suitable for representing text into fixed-length vectors and achieved good performance on several natural language processing tasks.

Ranked #62 on Natural Language Inference on SNLI

Natural Language Inference Sentiment Analysis
