Search Results for author: Yoon Kim

Found 68 papers, 44 papers with code

Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback

no code implementations 17 Mar 2024 Dong Won Lee, Hae Won Park, Yoon Kim, Cynthia Breazeal, Louis-Philippe Morency

We describe an approach for aligning an LLM-based dialogue agent based on global (i.e., dialogue-level) rewards, while also taking into account naturally-occurring multimodal signals.

Learning to Decode Collaboratively with Multiple Language Models

1 code implementation 6 Mar 2024 Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, David Sontag

We propose a method to teach multiple large language models (LLMs) to collaborate by interleaving their generations at the token level.

Instruction Following
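
As a rough illustration of token-level interleaving, the sketch below lets a base model emit each token unless its confidence falls below a threshold, at which point an assistant model takes over. The deferral rule, the threshold, and the shared tokenizer are assumptions for illustration; the paper learns the deferral decision rather than using a heuristic.

```python
import torch

def collaborative_decode(base_model, assistant_model, tokenizer, prompt,
                         defer_threshold=0.6, max_new_tokens=50):
    """Token-level interleaving between two LMs (illustrative sketch).

    At each step the base model proposes a next-token distribution; if its
    confidence (max probability) falls below `defer_threshold`, the assistant
    model generates the token instead. Assumes both models share a tokenizer.
    """
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            base_logits = base_model(ids).logits[:, -1, :]
            base_probs = torch.softmax(base_logits, dim=-1)
            if base_probs.max() >= defer_threshold:
                next_id = base_probs.argmax(dim=-1, keepdim=True)
            else:  # defer: let the assistant model produce this token
                asst_logits = assistant_model(ids).logits[:, -1, :]
                next_id = asst_logits.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```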

What Do Language Models Hear? Probing for Auditory Representations in Language Models

no code implementations 26 Feb 2024 Jerry Ngo, Yoon Kim

This probe is trained via a contrastive loss that pushes the language representations and sound representations of an object to be close to one another.

Object
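
A minimal sketch of this kind of contrastive probe, assuming frozen text and audio representations and a single trainable linear probe; the temperature and the in-batch pairing scheme are illustrative.

```python
import torch
import torch.nn.functional as F

def contrastive_probe_loss(text_reps, audio_reps, probe, temperature=0.07):
    """InfoNCE-style loss for a linear probe mapping frozen language
    representations into a frozen audio representation space (sketch).

    text_reps:  (batch, d_text)  LM representations of object names
    audio_reps: (batch, d_audio) audio-model representations of the objects
    probe:      nn.Linear(d_text, d_audio), the only trained component
    """
    z_text = F.normalize(probe(text_reps), dim=-1)
    z_audio = F.normalize(audio_reps, dim=-1)
    logits = z_text @ z_audio.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(z_text.size(0), device=logits.device)
    # matched text/audio pairs (the diagonal) are positives; the rest are negatives
    return F.cross_entropy(logits, targets)
```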

Data Engineering for Scaling Language Models to 128K Context

2 code implementations 15 Feb 2024 Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng

We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.

Continual Pretraining

Improving Black-box Robustness with In-Context Rewriting

1 code implementation 13 Feb 2024 Kyle O'Brien, Nathan Ng, Isha Puri, Jorge Mendez, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi, Thomas Hartvigsen

Most techniques for improving OOD robustness are not applicable to settings where the model is effectively a black box, such as when the weights are frozen, retraining is costly, or the model is leveraged via an API.

News Classification
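
A hedged sketch of test-time rewriting for a black-box classifier: an LLM rewrites the input toward a more in-distribution style and the frozen model votes over the rewrites. The `rewrite_llm` and `frozen_classifier` wrappers, the prompt wording, and the majority-vote aggregation are placeholders, not the paper's exact recipe.

```python
from collections import Counter

def classify_with_rewrites(frozen_classifier, rewrite_llm, text, n_rewrites=4):
    """Black-box robustness via in-context rewriting (illustrative sketch).

    frozen_classifier(text) -> label and rewrite_llm(prompt) -> str are assumed
    wrappers around an API-served classifier and an LLM.
    """
    prompt_template = (
        "Rewrite the following text so that it reads like typical news copy, "
        "keeping its meaning unchanged:\n\n{text}\n\nRewrite:"
    )
    candidates = [text] + [
        rewrite_llm(prompt_template.format(text=text)) for _ in range(n_rewrites)
    ]
    # classify the original plus each rewrite, then take a majority vote
    votes = Counter(frozen_classifier(c) for c in candidates)
    return votes.most_common(1)[0][0]
```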

Diversity Measurement and Subset Selection for Instruction Tuning Datasets

no code implementations 4 Feb 2024 Peiqi Wang, Yikang Shen, Zhen Guo, Matthew Stallone, Yoon Kim, Polina Golland, Rameswar Panda

Our experiments demonstrate that the proposed diversity measure in the normalized weight gradient space is correlated with downstream instruction-following performance.

Instruction Following, Point Processes
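
As a loose illustration of selecting a diverse subset in a normalized gradient-feature space, the sketch below uses greedy farthest-point selection under cosine similarity. Both the featurization and the selection rule are stand-ins; the paper's measure is grounded in determinantal point processes.

```python
import numpy as np

def greedy_diverse_subset(grad_feats, k):
    """Pick k examples whose normalized gradient features are mutually diverse.

    grad_feats: (n, d) array of per-example weight-gradient features.
    Returns a list of k selected indices (farthest-point sampling sketch).
    """
    feats = grad_feats / np.linalg.norm(grad_feats, axis=1, keepdims=True)
    selected = [0]                         # start from an arbitrary example
    max_sim = feats @ feats[0]             # similarity of each point to the selection
    for _ in range(k - 1):
        masked = max_sim.copy()
        masked[selected] = np.inf          # never re-select
        nxt = int(np.argmin(masked))       # farthest from the current selection
        selected.append(nxt)
        max_sim = np.maximum(max_sim, feats @ feats[nxt])
    return selected
```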

In-Context Language Learning: Architectures and Algorithms

1 code implementation 23 Jan 2024 Ekin Akyürek, Bailin Wang, Yoon Kim, Jacob Andreas

Finally, we show that hard-wiring these heads into neural models improves performance not just on ICLL but also on natural language modeling -- improving the perplexity of 340M-parameter models by up to 1.14 points (6.7%) on the SlimPajama dataset.

In-Context Learning, Language Modelling

Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models

no code implementations 19 Jan 2024 Mayank Agarwal, Yikang Shen, Bailin Wang, Yoon Kim, Jie Chen

In this work, we explore data-efficient adaptation of pre-trained code models by further pre-training and fine-tuning them with program structures.

CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities

no code implementations 13 Jan 2024 Yujun Mao, Yoon Kim, Yilun Zhou

And while self-generated verbalizations of intermediate reasoning steps (i.e., chain-of-thought prompting) have been shown to be helpful, whether LLMs can make use of helpful side information such as problem-specific hints has not been investigated before.

Math, Mathematical Reasoning

Gated Linear Attention Transformers with Hardware-Efficient Training

2 code implementations 11 Dec 2023 Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, Yoon Kim

When used as a replacement for the standard attention layer in Transformers, the resulting gated linear attention (GLA) Transformer is found to perform competitively against the LLaMA-architecture Transformer (Touvron et al., 2023) as well as recent linear-time-inference baselines such as RetNet (Sun et al., 2023a) and Mamba (Gu & Dao, 2023) on moderate-scale language modeling experiments.

Language Modelling
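
A minimal recurrent form of gated linear attention, written as an explicit per-step loop for clarity; the paper's contribution includes hardware-efficient chunkwise-parallel training, which this sketch does not attempt, and the gating parameterization here is simplified.

```python
import torch

def gated_linear_attention(q, k, v, alpha):
    """Recurrent gated linear attention (illustrative sketch).

    q, k:   (batch, seq, d_k)  queries and keys
    v:      (batch, seq, d_v)  values
    alpha:  (batch, seq, d_k)  per-step gates in (0, 1), e.g. sigmoid outputs

    State update: S_t = diag(alpha_t) S_{t-1} + k_t v_t^T; output o_t = q_t^T S_t.
    """
    batch, seq, d_k = q.shape
    d_v = v.shape[-1]
    S = q.new_zeros(batch, d_k, d_v)
    outputs = []
    for t in range(seq):
        S = alpha[:, t].unsqueeze(-1) * S + k[:, t].unsqueeze(-1) * v[:, t].unsqueeze(1)
        outputs.append(torch.einsum("bk,bkv->bv", q[:, t], S))
    return torch.stack(outputs, dim=1)   # (batch, seq, d_v)
```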

LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning

1 code implementation 20 Nov 2023 Han Guo, Philip Greengard, Eric P. Xing, Yoon Kim

Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision low-rank component and a memory-efficient quantized component.

Language Modelling, Model Compression +1
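
A sketch of the alternating low-rank-plus-quantized decomposition, with naive uniform quantization standing in for the paper's NF-style and data-aware quantizers.

```python
import torch

def lq_decompose(W, rank=64, n_iters=10, n_bits=4):
    """Alternately fit a quantized component and a low-rank component so that
    W ~ Q + L1 @ L2 (illustrative sketch in the spirit of LQ-LoRA).
    """
    def quantize(X, bits):
        # naive symmetric uniform quantization as a stand-in for NF-style quantizers
        scale = X.abs().max() / (2 ** (bits - 1) - 1)
        return torch.round(X / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale

    L1 = W.new_zeros(W.shape[0], rank)
    L2 = W.new_zeros(rank, W.shape[1])
    for _ in range(n_iters):
        Q = quantize(W - L1 @ L2, n_bits)                      # quantize the residual
        U, S, Vh = torch.linalg.svd(W - Q, full_matrices=False)
        L1 = U[:, :rank] * S[:rank]                            # low-rank fit of the residual
        L2 = Vh[:rank]
    return Q, L1, L2
```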

Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations

1 code implementation 13 Nov 2023 Zilu Tang, Mayank Agarwal, Alex Shypula, Bailin Wang, Derry Wijaya, Jie Chen, Yoon Kim

This work explores the use of self-generated natural language explanations as an intermediate step for code-to-code translation with language models.

Code Translation, Translation
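
A minimal two-stage prompting sketch of explain-then-translate; `llm` is an assumed completion wrapper and the prompt wording is a placeholder for the paper's templates.

```python
def explain_then_translate(llm, source_code, src_lang="Python", tgt_lang="Java"):
    """Code-to-code translation with a self-generated explanation as an
    intermediate step (illustrative sketch). llm(prompt) -> str is assumed.
    """
    explanation = llm(
        f"Explain, step by step, what the following {src_lang} program does:\n\n"
        f"{source_code}\n\nExplanation:"
    )
    translation = llm(
        f"{src_lang} program:\n{source_code}\n\n"
        f"Explanation of the program:\n{explanation}\n\n"
        f"Using the explanation, translate the program into {tgt_lang}:\n"
    )
    return translation
```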

Simple Hardware-Efficient PCFGs with Independent Left and Right Productions

1 code implementation 23 Oct 2023 Wei Liu, Songlin Yang, Yoon Kim, Kewei Tu

Scaling dense PCFGs to thousands of nonterminals via a low-rank parameterization of the rule probability tensor has been shown to be beneficial for unsupervised parsing.

Constituency Grammar Induction, Language Modelling

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement

1 code implementation 12 Oct 2023 Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren

The ability to derive underlying principles from a handful of observations and then generalize to novel situations -- known as inductive reasoning -- is central to human intelligence.

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

2 code implementations 7 Sep 2023 Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James Glass, Pengcheng He

Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining.
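
A rough sketch of the layer-contrasting idea named in the title: the next-token score contrasts the final ("mature") layer's distribution with that of an earlier ("premature") layer. The real method selects the premature layer dynamically and applies the model's final norm before early exit; this sketch fixes the layer and glosses over the norm, and assumes a HuggingFace-style causal LM with an `lm_head` attribute.

```python
import math
import torch
import torch.nn.functional as F

def contrastive_layer_logits(model, input_ids, premature_layer=16, alpha=0.1):
    """Next-token scores from contrasting a final layer with an early layer
    (illustrative sketch of decoding-by-contrasting-layers)."""
    out = model(input_ids, output_hidden_states=True)
    final_logits = out.logits[:, -1, :]
    early_hidden = out.hidden_states[premature_layer][:, -1, :]
    early_logits = model.lm_head(early_hidden)   # simplification: skips the final norm

    log_p_final = F.log_softmax(final_logits, dim=-1)
    log_p_early = F.log_softmax(early_logits, dim=-1)

    # keep only tokens that are reasonably likely under the mature layer
    plausible = log_p_final >= (
        log_p_final.max(dim=-1, keepdim=True).values + math.log(alpha)
    )
    contrast = log_p_final - log_p_early
    return contrast.masked_fill(~plausible, float("-inf"))
```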

Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

no code implementations 15 Jun 2023 Sarah J. Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith Tyser, Zad Chin, Yann Hicke, Nikhil Singh, Madeleine Udell, Yoon Kim, Tonio Buonassisi, Armando Solar-Lezama, Iddo Drori

We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree.

Electrical Engineering, Few-Shot Learning +3

Entailment as Robust Self-Learner

1 code implementation 26 May 2023 Jiaxin Ge, Hongyin Luo, Yoon Kim, James Glass

Experiments on binary and multi-class classification tasks show that SimPLE leads to more robust self-training results, indicating that the self-trained entailment models are more efficient and trustworthy than large language models on language understanding tasks.

Multi-class Classification, Natural Language Understanding +1

Deriving Language Models from Masked Language Models

1 code implementation 24 May 2023 Lucas Torroba Hennigen, Yoon Kim

Masked language models (MLMs) do not explicitly define a distribution over language, i.e., they are not language models per se.

SAIL: Search-Augmented Instruction Learning

no code implementations 24 May 2023 Hongyin Luo, Yung-Sung Chuang, Yuan Gong, Tianhua Zhang, Yoon Kim, Xixin Wu, Danny Fox, Helen Meng, James Glass

Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information.

Denoising, Fact Checking +3

Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

no code implementations 6 Mar 2023 Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim

Prompt tuning, in which a base pretrained model is adapted to each task via conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks.

Transfer Learning
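
A sketch of a decomposed soft prompt in the spirit of multitask prompt tuning, where each task's prompt is a shared prompt modulated by a rank-one task-specific matrix; the distillation stage that learns the shared prompt is omitted, and the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class DecomposedPrompt(nn.Module):
    """Task prompt = shared prompt (Hadamard) rank-one task matrix (sketch)."""

    def __init__(self, prompt_len=100, hidden_dim=768, n_tasks=8):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(prompt_len, hidden_dim) * 0.02)
        self.task_u = nn.Parameter(torch.randn(n_tasks, prompt_len) * 0.02)
        self.task_v = nn.Parameter(torch.randn(n_tasks, hidden_dim) * 0.02)

    def forward(self, task_id):
        # rank-one, task-specific modulation of the shared prompt
        rank_one = torch.outer(self.task_u[task_id], self.task_v[task_id])
        return self.shared * rank_one   # (prompt_len, hidden_dim), prepended to inputs
```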

Learning to Grow Pretrained Models for Efficient Transformer Training

no code implementations 2 Mar 2023 Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David Daniel Cox, Zhangyang Wang, Yoon Kim

Scaling transformers has led to significant breakthroughs in many domains, leading to a paradigm in which larger versions of existing models are trained and released on a periodic basis.

Federated Learning as Variational Inference: A Scalable Expectation Propagation Approach

1 code implementation 8 Feb 2023 Han Guo, Philip Greengard, Hongyi Wang, Andrew Gelman, Yoon Kim, Eric P. Xing

A recent alternative formulation instead treats federated learning as a distributed inference problem, where the goal is to infer a global posterior from partitioned client data (Al-Shedivat et al., 2021).

Distributed Optimization, Federated Learning +1

Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors

1 code implementation NeurIPS 2023 Thomas Hartvigsen, Swami Sankaranarayanan, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi

We propose GRACE, a lifelong model editing method, which implements spot-fixes on streaming errors of a deployed model, ensuring minimal impact on unrelated inputs.

Model Editing, World Knowledge
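
A minimal key-value adaptor sketch of the spot-fix idea: edits cache (key, value, radius) triples at a chosen layer, and only inputs that land inside an edit's ball are rerouted, so unrelated inputs pass through untouched. Radius splitting and updating over the edit stream, which the method also does, are omitted.

```python
import torch

class KeyValueAdaptor:
    """Discrete key-value adaptor for lifelong spot-fixes (illustrative sketch)."""

    def __init__(self, init_radius=1.0):
        self.keys, self.values, self.radii = [], [], []
        self.init_radius = init_radius

    def add_edit(self, hidden, corrected_activation):
        """Cache a fix: key = hidden state at the edited layer,
        value = activation that yields the corrected prediction."""
        self.keys.append(hidden.detach())
        self.values.append(corrected_activation.detach())
        self.radii.append(self.init_radius)

    def __call__(self, hidden):
        """Replace the activation only when the input falls inside some edit's ball."""
        if not self.keys:
            return hidden
        dists = torch.stack([torch.dist(hidden, k) for k in self.keys])
        nearest = int(dists.argmin())
        if dists[nearest] <= self.radii[nearest]:
            return self.values[nearest]
        return hidden
```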

Probing for Incremental Parse States in Autoregressive Language Models

1 code implementation 17 Nov 2022 Tiwalayo Eisape, Vineet Gangireddy, Roger P. Levy, Yoon Kim

This suggests implicit incremental syntactic inferences underlie next-word predictions in autoregressive neural language models.

Sentence

Hierarchical Phrase-based Sequence-to-Sequence Learning

1 code implementation 15 Nov 2022 Bailin Wang, Ivan Titov, Jacob Andreas, Yoon Kim

We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.

Inductive Bias, Machine Translation +2

VALHALLA: Visual Hallucination for Machine Translation

1 code implementation CVPR 2022 Yi Li, Rameswar Panda, Yoon Kim, Chun-Fu Chen, Rogerio Feris, David Cox, Nuno Vasconcelos

In particular, given a source sentence, an autoregressive hallucination transformer is used to predict a discrete visual representation from the input text, and the combined text and hallucinated representations are utilized to obtain the target translation.

Hallucination, Multimodal Machine Translation +2

Large Language Models are Few-Shot Clinical Information Extractors

no code implementations 25 May 2022 Monica Agrawal, Stefan Hegselmann, Hunter Lang, Yoon Kim, David Sontag

A long-running goal of the clinical NLP community is the extraction of important variables trapped in clinical notes.

Benchmarking, coreference-resolution +4

Inducing and Using Alignments for Transition-based AMR Parsing

1 code implementation NAACL 2022 Andrew Drozdov, Jiawei Zhou, Radu Florian, Andrew McCallum, Tahira Naseem, Yoon Kim, Ramon Fernandez Astudillo

These alignments are learned separately from parser training and require a complex pipeline of rule-based components, pre-processing, and post-processing to satisfy domain-specific constraints.

AMR Parsing

Controlling the Focus of Pretrained Language Generation Models

1 code implementation Findings (ACL) 2022 Jiabao Ji, Yoon Kim, James Glass, Tianxing He

This work aims to develop a control mechanism by which a user can select spans of context as "highlights" for the model to focus on, and generate relevant output.

Abstractive Text Summarization, Response Generation +1

Co-training Improves Prompt-based Learning for Large Language Models

1 code implementation 2 Feb 2022 Hunter Lang, Monica Agrawal, Yoon Kim, David Sontag

We demonstrate that co-training (Blum & Mitchell, 1998) can improve the performance of prompt-based learning by using unlabeled data.

Zero-Shot Learning

Sequence-to-Sequence Learning with Latent Neural Grammars

1 code implementation NeurIPS 2021 Yoon Kim

While flexible and performant, these models often require large datasets for training and can fail spectacularly on benchmarks designed to test for compositional generalization.

Feature Engineering, Machine Translation +2

Syntactic Perturbations Reveal Representational Correlates of Hierarchical Phrase Structure in Pretrained Language Models

no code implementations ACL (RepL4NLP) 2021 Matteo Alleman, Jonathan Mamou, Miguel A Del Rio, Hanlin Tang, Yoon Kim, SueYeon Chung

While vector-based language representations from pretrained language models have set a new standard for many NLP tasks, there is not yet a complete accounting of their inner workings.

Sentence

Representational correlates of hierarchical phrase structure in deep language models

no code implementations 1 Jan 2021 Matteo Alleman, Jonathan Mamou, Miguel A Del Rio, Hanlin Tang, Yoon Kim, SueYeon Chung

Importing from computational and cognitive neuroscience the notion of representational invariance, we perform a series of probes designed to test the sensitivity of Transformer representations to several kinds of structure in sentences.

Sentence

Parameter-Efficient Transfer Learning with Diff Pruning

2 code implementations ACL 2021 Demi Guo, Alexander M. Rush, Yoon Kim

This approach views finetuning as learning a task-specific diff vector that is applied on top of the pretrained parameter vector, which remains fixed and is shared across different tasks.

Transfer Learning
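
A sketch of the diff-vector view of finetuning for a single linear layer: the pretrained weights stay frozen and shared across tasks, while a task-specific diff is trained and encouraged to be sparse. An L1 penalty stands in here for the paper's relaxed-L0 (hard-concrete) regularizer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffPrunedLinear(nn.Module):
    """Finetuning as a sparse task-specific diff on frozen weights (sketch).
    Assumes the pretrained layer has a bias."""

    def __init__(self, pretrained_linear):
        super().__init__()
        self.weight = pretrained_linear.weight.detach()              # frozen, shared
        self.bias = pretrained_linear.bias.detach()
        self.diff_w = nn.Parameter(torch.zeros_like(self.weight))    # task-specific
        self.diff_b = nn.Parameter(torch.zeros_like(self.bias))

    def forward(self, x):
        return F.linear(x, self.weight + self.diff_w, self.bias + self.diff_b)

    def sparsity_penalty(self):
        # encourages most diff entries to stay at (or be pruned to) exactly zero
        return self.diff_w.abs().sum() + self.diff_b.abs().sum()
```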

Sequence-Level Mixed Sample Data Augmentation

1 code implementation EMNLP 2020 Demi Guo, Yoon Kim, Alexander M. Rush

Despite their empirical success, neural networks still have difficulty capturing compositional aspects of natural language.

Data Augmentation, Semantic Parsing +1

Emergence of Separable Manifolds in Deep Language Representations

1 code implementation ICML 2020 Jonathan Mamou, Hang Le, Miguel Del Rio, Cory Stephenson, Hanlin Tang, Yoon Kim, SueYeon Chung

In addition, we find that the emergence of linear separability in these manifolds is driven by a combined reduction of manifolds' radius, dimensionality and inter-manifold correlations.

Empirical Study of the Benefits of Overparameterization in Learning Latent Variable Models

1 code implementation ICML 2020 Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, David Sontag

One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) to improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization).

Variational Inference

Compound Probabilistic Context-Free Grammars for Grammar Induction

2 code implementations ACL 2019 Yoon Kim, Chris Dyer, Alexander M. Rush

We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar.

Constituency Grammar Induction, Sentence +1

Amortized Bethe Free Energy Minimization for Learning MRFs

1 code implementation NeurIPS 2019 Sam Wiseman, Yoon Kim

We propose to learn deep undirected graphical models (i.e., MRFs) with a non-ELBO objective for which we can calculate exact gradients.

A Tutorial on Deep Latent Variable Models of Natural Language

no code implementations 17 Dec 2018 Yoon Kim, Sam Wiseman, Alexander M. Rush

There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning.

Variational Inference

Avoiding Latent Variable Collapse With Generative Skip Models

no code implementations 12 Jul 2018 Adji B. Dieng, Yoon Kim, Alexander M. Rush, David M. Blei

VAEs can capture complex distributions, but they can also suffer from an issue known as "latent variable collapse," especially if the likelihood model is powerful.

Latent Alignment and Variational Attention

1 code implementation NeurIPS 2018 Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush

This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference.

Hard Attention, Machine Translation +4

OpenNMT: Open-source Toolkit for Neural Machine Translation

no code implementations 12 Sep 2017 Guillaume Klein, Yoon Kim, Yuntian Deng, Josep Crego, Jean Senellart, Alexander M. Rush

We introduce an open-source toolkit for neural machine translation (NMT) to support research into model architectures, feature representations, and source modalities, while maintaining competitive performance, modularity and reasonable training requirements.

Machine Translation, NMT +1

Adapting Sequence Models for Sentence Correction

1 code implementation EMNLP 2017 Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber

In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via convolutions, and that modeling the output data as a series of diffs improves effectiveness over standard approaches.

Machine Translation, Sentence +1

Adversarially Regularized Autoencoders

6 code implementations 13 Jun 2017 Jake Zhao, Yoon Kim, Kelly Zhang, Alexander M. Rush, Yann LeCun

This adversarially regularized autoencoder (ARAE) allows us to generate natural textual outputs as well as perform manipulations in the latent space to induce change in the output space.

Representation Learning, Style Transfer

Structured Attention Networks

no code implementations 3 Feb 2017 Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush

Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network.

Machine Translation, Natural Language Inference +2

Sequence-Level Knowledge Distillation

6 code implementations EMNLP 2016 Yoon Kim, Alexander M. Rush

We demonstrate that standard knowledge distillation applied to word-level prediction can be effective for NMT, and also introduce two novel sequence-level versions of knowledge distillation that further improve performance, and somewhat surprisingly, seem to eliminate the need for beam search (even when applied on the original teacher model).

Knowledge Distillation, Machine Translation +2
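
A sketch of building a sequence-level distillation set: the teacher's beam-search outputs replace the reference targets, and the student is then trained with ordinary cross-entropy on these pairs. Assumes a HuggingFace-style seq2seq teacher; beam size and generation length are illustrative.

```python
def sequence_level_distillation_data(teacher, tokenizer, source_texts, beam_size=5):
    """Generate (source, teacher-output) pairs for sequence-level KD (sketch)."""
    distilled = []
    for src in source_texts:
        inputs = tokenizer(src, return_tensors="pt")
        # the teacher's beam-search output becomes the student's training target
        out = teacher.generate(**inputs, num_beams=beam_size, max_new_tokens=128)
        distilled.append((src, tokenizer.decode(out[0], skip_special_tokens=True)))
    return distilled
```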

Sentence-Level Grammatical Error Identification as Sequence-to-Sequence Correction

no code implementations WS 2016 Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber

We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016.

Sentence

Character-Aware Neural Language Models

14 code implementations 26 Aug 2015 Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush

We describe a simple neural language model that relies only on character-level inputs.

Language Modelling
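
A simplified character-level word encoder in this spirit: character embeddings, convolutions of several widths, and max-over-time pooling produce a word vector that would feed a standard LSTM language model. Filter sizes are illustrative and the paper's highway layers are omitted.

```python
import torch
import torch.nn as nn

class CharCNNEmbedder(nn.Module):
    """Character-CNN word encoder (illustrative sketch)."""

    def __init__(self, n_chars, char_dim=15, filter_widths=(2, 3, 4, 5), n_filters=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(char_dim, n_filters, kernel_size=w) for w in filter_widths]
        )

    def forward(self, char_ids):
        # char_ids: (batch, word_len) character indices for one word position
        x = self.char_emb(char_ids).transpose(1, 2)          # (batch, char_dim, word_len)
        # max-over-time pooling per filter width, then concatenate
        feats = [torch.relu(conv(x)).amax(dim=-1) for conv in self.convs]
        return torch.cat(feats, dim=-1)                      # (batch, n_filters * n_widths)
```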

Convolutional Neural Networks for Sentence Classification

118 code implementations EMNLP 2014 Yoon Kim

We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks.

Emotion Recognition in Conversation, General Classification +3
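
A compact sketch of the architecture: convolutions of several widths over (pretrained) word vectors, max-over-time pooling, dropout, and a linear classifier. Hyperparameters are illustrative, and the paper's static/non-static/multichannel variants, which differ in whether the word vectors are finetuned, are not distinguished here.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """CNN for sentence classification (illustrative sketch)."""

    def __init__(self, vocab_size, n_classes, emb_dim=300,
                 filter_widths=(3, 4, 5), n_filters=100, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # initialize from pretrained vectors
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, kernel_size=w) for w in filter_widths]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(n_filters * len(filter_widths), n_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)        # (batch, emb_dim, seq_len)
        pooled = [torch.relu(conv(x)).amax(dim=-1) for conv in self.convs]
        return self.fc(self.dropout(torch.cat(pooled, dim=-1)))
```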

Temporal Analysis of Language through Neural Language Models

1 code implementation WS 2014 Yoon Kim, Yi-I Chiu, Kentaro Hanaki, Darshan Hegde, Slav Petrov

We provide a method for automatically detecting change in language across time through a chronologically trained neural language model.

Language Modelling
