118 code implementations • EMNLP 2014 • Yoon Kim
We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks.
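To make the architecture concrete, here is a minimal PyTorch sketch of a CNN sentence classifier in the spirit of this paper; the vocabulary size, filter widths, and filter counts are illustrative placeholders rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Convolutions over word embeddings, max-pooled over time."""
    def __init__(self, vocab_size=10000, embed_dim=300, num_classes=2,
                 filter_widths=(3, 4, 5), num_filters=100):
        super().__init__()
        # in practice, initialized from pre-trained word vectors (e.g., word2vec)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, w) for w in filter_widths
        )
        self.out = nn.Linear(num_filters * len(filter_widths), num_classes)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)    # (batch, embed_dim, seq_len)
        feats = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(feats, dim=1))  # (batch, num_classes)

logits = TextCNN()(torch.randint(0, 10000, (8, 20)))  # toy batch of 8 sentences
```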
9 code implementations • WS 2018 • Guillaume Klein, Yoon Kim, Yuntian Deng, Vincent Nguyen, Jean Senellart, Alexander M. Rush
OpenNMT is an open-source toolkit for neural machine translation (NMT).
4 code implementations • ACL 2017 • Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, Alexander M. Rush
We describe an open-source toolkit for neural machine translation (NMT).
6 code implementations • EMNLP 2016 • Yoon Kim, Alexander M. Rush
We demonstrate that standard knowledge distillation applied to word-level prediction can be effective for NMT, and also introduce two novel sequence-level versions of knowledge distillation that further improve performance and, somewhat surprisingly, seem to eliminate the need for beam search (even when applied to the original teacher model).
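As a point of reference, the word-level variant of knowledge distillation can be written in a few lines; this sketch shows only the word-level loss (the sequence-level versions in the paper instead train the student on the teacher's beam-search outputs), and the temperature is an illustrative knob.

```python
import torch
import torch.nn.functional as F

def word_level_kd_loss(student_logits, teacher_logits, T=1.0):
    """Cross-entropy of the student against the teacher's soft
    per-position distributions (word-level knowledge distillation)."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return -(p_teacher * log_p_student).sum(dim=-1).mean()

# toy shapes: (batch, target_len, vocab)
student, teacher = torch.randn(2, 7, 50), torch.randn(2, 7, 50)
loss = word_level_kd_loss(student, teacher)
```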
14 code implementations • 26 Aug 2015 • Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush
We describe a simple neural language model that relies only on character-level inputs.
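The core of the model is a convolution over character embeddings that produces a word representation; a minimal sketch follows (the full model also applies a highway network before feeding these vectors to an LSTM language model, and the dimensions here are placeholders).

```python
import torch
import torch.nn as nn

class CharWordEncoder(nn.Module):
    """Builds a word vector from its characters via a 1D convolution
    and max-over-time pooling."""
    def __init__(self, num_chars=100, char_dim=15, num_filters=100, width=3):
        super().__init__()
        self.char_embed = nn.Embedding(num_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, num_filters, width)

    def forward(self, chars):                       # chars: (batch, word_len)
        x = self.char_embed(chars).transpose(1, 2)  # (batch, char_dim, word_len)
        return torch.relu(self.conv(x)).max(dim=2).values  # (batch, num_filters)

word_vecs = CharWordEncoder()(torch.randint(0, 100, (4, 10)))
```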
2 code implementations • 11 Dec 2023 • Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, Yoon Kim
When used as a replacement for the standard attention layer in Transformers, the resulting gated linear attention (GLA) Transformer is found to perform competitively against the LLaMA-architecture Transformer (Touvron et al., 2023) as well as recent linear-time-inference baselines such as RetNet (Sun et al., 2023a) and Mamba (Gu & Dao, 2023) on moderate-scale language modeling experiments.
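The recurrence behind gated linear attention can be sketched directly; the exact parameterization of the gate in the paper differs (it is produced by learned projections), so treat this as an illustration of the state update rather than the paper's implementation.

```python
import torch

def gated_linear_attention(q, k, v, g):
    """Recurrent view of gated linear attention: the 2D state S decays
    under a data-dependent gate instead of being accumulated without
    decay as in vanilla linear attention. All inputs are (seq_len, d);
    gate values lie in (0, 1)."""
    seq_len, d = q.shape
    S = torch.zeros(d, d)
    outputs = []
    for t in range(seq_len):
        S = g[t].unsqueeze(1) * S + torch.outer(k[t], v[t])  # gated decay + rank-1 update
        outputs.append(S.T @ q[t])                           # read the state with the query
    return torch.stack(outputs)

T, d = 6, 8
out = gated_linear_attention(torch.randn(T, d), torch.randn(T, d),
                             torch.randn(T, d), torch.sigmoid(torch.randn(T, d)))
```

In practice the same computation is chunked and parallelized for hardware efficiency; the loop above is only the mathematical recurrence.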
6 code implementations • 13 Jun 2017 • Jake Zhao, Yoon Kim, Kelly Zhang, Alexander M. Rush, Yann Lecun
This adversarially regularized autoencoder (ARAE) allows us to generate natural textual outputs as well as perform manipulations in the latent space to induce change in the output space.
2 code implementations • 7 Sep 2023 • Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James Glass, Pengcheng He
Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining.
1 code implementation • NeurIPS 2018 • Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush
This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference.
2 code implementations • 15 Feb 2024 • Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng
We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.
1 code implementation • NAACL 2022 • Yung-Sung Chuang, Rumen Dangovski, Hongyin Luo, Yang Zhang, Shiyu Chang, Marin Soljačić, Shang-Wen Li, Wen-tau Yih, Yoon Kim, James Glass
We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings.
1 code implementation • NAACL 2022 • Andrew Drozdov, Jiawei Zhou, Radu Florian, Andrew McCallum, Tahira Naseem, Yoon Kim, Ramon Fernandez Astudillo
These alignments are learned separately from parser training and require a complex pipeline of rule-based components, pre-processing, and post-processing to satisfy domain-specific constraints.
1 code implementation • NAACL 2019 • Yoon Kim, Alexander M. Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis
On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese.
1 code implementation • ICML 2018 • Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M. Rush
Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network.
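This entry (semi-amortized variational autoencoders) combines the two regimes: the inference network provides a warm start, which is then refined with a few steps of stochastic variational inference per instance. A minimal sketch, assuming a generic `elbo_fn` for a single instance:

```python
import torch

def refine_with_svi(elbo_fn, mu, logvar, steps=5, lr=1e-2):
    """Start from the inference network's (amortized) output, then take
    a few instance-specific gradient steps on the ELBO."""
    mu = mu.clone().requires_grad_(True)
    logvar = logvar.clone().requires_grad_(True)
    opt = torch.optim.SGD([mu, logvar], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-elbo_fn(mu, logvar)).backward()   # minimize the negative ELBO
        opt.step()
    return mu.detach(), logvar.detach()

# toy stand-in ELBO for illustration only
elbo = lambda mu, logvar: -(mu.pow(2).sum() + logvar.exp().sum() - logvar.sum())
mu, logvar = refine_with_svi(elbo, torch.randn(4), torch.zeros(4))
```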
2 code implementations • ACL 2019 • Yoon Kim, Chris Dyer, Alexander M. Rush
We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar.
1 code implementation • 20 Nov 2023 • Han Guo, Philip Greengard, Eric P. Xing, Yoon Kim
Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision low-rank component and a memory-efficient quantized component.
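The decomposition can be sketched as alternating projections: fix the low-rank part and re-quantize the residual, then fix the quantized part and refit the low-rank residual by truncated SVD. The round-to-nearest quantizer below is a simple stand-in for the paper's more careful quantization scheme.

```python
import torch

def lowrank_plus_quant(W, rank=8, bits=4, iters=10):
    """Alternate between Q = quantize(W - L) and L = SVD_r(W - Q)
    so that W ≈ Q + L."""
    def quantize(X):                                  # naive uniform quantizer
        qmax = 2 ** (bits - 1) - 1
        scale = X.abs().max() / qmax
        return (X / scale).round().clamp(-qmax - 1, qmax) * scale

    L = torch.zeros_like(W)
    for _ in range(iters):
        Q = quantize(W - L)                           # memory-efficient component
        U, S, Vh = torch.linalg.svd(W - Q, full_matrices=False)
        L = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]  # high-precision low-rank part
    return Q, L

Q, L = lowrank_plus_quant(torch.randn(64, 64))
```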
1 code implementation • 6 Mar 2024 • Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, David Sontag
We propose a method to teach multiple large language models (LLMs) to collaborate by interleaving their generations at the token level.
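A toy version of the decoding loop makes the idea concrete; the paper learns the per-token choice as a latent variable, whereas the gate and the two model stubs below are random stand-ins for illustration.

```python
import torch

vocab = 50
# stand-ins: any callables returning (batch, seq_len, vocab) logits work here
base_lm = lambda ids: torch.randn(1, ids.size(1), vocab)
assistant_lm = lambda ids: torch.randn(1, ids.size(1), vocab)
gate = lambda ids: bool(torch.rand(1) < 0.5)  # the paper learns this decision

def interleaved_decode(ids, max_new=5):
    """At each step, a gate picks which model emits the next token,
    interleaving the two models' generations at the token level."""
    for _ in range(max_new):
        lm = assistant_lm if gate(ids) else base_lm
        next_id = lm(ids)[0, -1].argmax()
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return ids

out = interleaved_decode(torch.zeros(1, 1, dtype=torch.long))
```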
2 code implementations • ACL 2021 • Demi Guo, Alexander M. Rush, Yoon Kim
This approach views finetuning as learning a task-specific diff vector that is applied on top of the pretrained parameter vector, which remains fixed and is shared across different tasks.
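The diff-vector view is easy to state in code: the pretrained weight is frozen and shared, and only a task-specific delta is trained (the paper additionally encourages the delta to be sparse, which this sketch omits).

```python
import torch
import torch.nn as nn

class DiffLinear(nn.Module):
    """Task weight = frozen pretrained weight + learned diff vector."""
    def __init__(self, pretrained_weight):
        super().__init__()
        self.register_buffer("w0", pretrained_weight)   # frozen, shared across tasks
        self.diff = nn.Parameter(torch.zeros_like(pretrained_weight))

    def forward(self, x):
        return x @ (self.w0 + self.diff).T

layer = DiffLinear(torch.randn(16, 32))
y = layer(torch.randn(4, 32))      # only layer.diff receives gradients
```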
1 code implementation • NeurIPS 2021 • Yoon Kim
While flexible and performant, these models often require large datasets for training and can fail spectacularly on benchmarks designed to test for compositional generalization.
1 code implementation • NeurIPS 2023 • Thomas Hartvigsen, Swami Sankaranarayanan, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi
We propose GRACE, a lifelong model editing method, which implements spot-fixes on streaming errors of a deployed model, ensuring minimal impact on unrelated inputs.
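A toy version of the codebook mechanism: cached edits are (key, value) pairs over hidden activations, and an edit fires only when an incoming activation falls within a deferral radius of a stored key, leaving all other inputs untouched. The radius handling here is simplified relative to the paper.

```python
import torch

class CodebookAdaptor:
    """Spot-fix adaptor in the spirit of GRACE (simplified)."""
    def __init__(self, radius=1.0):
        self.keys, self.values, self.radius = [], [], radius

    def add_edit(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def __call__(self, h):
        for k, v in zip(self.keys, self.values):
            if torch.dist(h, k) < self.radius:   # edit fires only near the stored key
                return v
        return h                                 # unrelated inputs pass through

adaptor = CodebookAdaptor()
adaptor.add_edit(torch.ones(4), torch.zeros(4))
out = adaptor(torch.ones(4) + 0.01)              # returns the edited value
```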
1 code implementation • 23 Oct 2023 • Wei Liu, Songlin Yang, Yoon Kim, Kewei Tu
Scaling dense PCFGs to thousands of nonterminals via a low-rank parameterization of the rule probability tensor has been shown to be beneficial for unsupervised parsing.
1 code implementation • NeurIPS 2023 • Bailin Wang, Zi Wang, Xuezhi Wang, Yuan Cao, Rif A. Saurous, Yoon Kim
Large language models (LLMs) can learn to perform a wide range of natural language tasks from just a handful of in-context examples.
1 code implementation • 23 Jan 2024 • Ekin Akyürek, Bailin Wang, Yoon Kim, Jacob Andreas
Finally, we show that hard-wiring these heads into neural models improves performance not just on ICLL but also on natural language modeling -- improving the perplexity of 340M-parameter models by up to 1.14 points (6.7%) on the SlimPajama dataset.
1 code implementation • 5 Jul 2023 • Zhaofeng Wu, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, Yoon Kim
The impressive performance of recent language models across a wide range of tasks suggests that they possess a degree of abstract reasoning skills.
1 code implementation • EMNLP 2017 • Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber
In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via convolutions, and that modeling the output data as a series of diffs improves effectiveness over standard approaches.
1 code implementation • CVPR 2022 • Yi Li, Rameswar Panda, Yoon Kim, Chun-Fu Chen, Rogerio Feris, David Cox, Nuno Vasconcelos
In particular, given a source sentence, an autoregressive hallucination transformer is used to predict a discrete visual representation from the input text, and the combined text and hallucinated representations are utilized to obtain the target translation.
1 code implementation • 26 May 2023 • Jiaxin Ge, Hongyin Luo, Yoon Kim, James Glass
Experiments on binary and multi-class classification tasks show that SimPLE leads to more robust self-training results, indicating that the self-trained entailment models are more efficient and trustworthy than large language models on language understanding tasks.
1 code implementation • 19 Sep 2023 • Tianhua Zhang, Jiaxin Ge, Hongyin Luo, Yung-Sung Chuang, Mingye Gao, Yuan Gong, Xixin Wu, Yoon Kim, Helen Meng, James Glass
How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning?
1 code implementation • 13 Nov 2023 • Zilu Tang, Mayank Agarwal, Alex Shypula, Bailin Wang, Derry Wijaya, Jie Chen, Yoon Kim
This work explores the use of self-generated natural language explanations as an intermediate step for code-to-code translation with language models.
1 code implementation • EMNLP 2020 • Demi Guo, Yoon Kim, Alexander M. Rush
Despite their empirical success, neural networks still have difficulty capturing compositional aspects of natural language.
1 code implementation • 12 Oct 2023 • Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren
The ability to derive underlying principles from a handful of observations and then generalize to novel situations -- known as inductive reasoning -- is central to human intelligence.
1 code implementation • 2 Feb 2022 • Hunter Lang, Monica Agrawal, Yoon Kim, David Sontag
We demonstrate that co-training (Blum & Mitchell, 1998) can improve the performance of prompt-based learning by using unlabeled data.
1 code implementation • 15 Nov 2022 • Bailin Wang, Ivan Titov, Jacob Andreas, Yoon Kim
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
1 code implementation • Findings (ACL) 2022 • Jiabao Ji, Yoon Kim, James Glass, Tianxing He
This work aims to develop a control mechanism by which a user can select spans of context as "highlights" for the model to focus on, and generate relevant output.
1 code implementation • 18 Dec 2022 • Songlin Yang, Roger P. Levy, Yoon Kim
We study grammar induction with mildly context-sensitive grammars for unsupervised discontinuous parsing.
1 code implementation • 8 Feb 2023 • Han Guo, Philip Greengard, Hongyi Wang, Andrew Gelman, Yoon Kim, Eric P. Xing
A recent alternative formulation instead treats federated learning as a distributed inference problem, where the goal is to infer a global posterior from partitioned client data (Al-Shedivat et al., 2021).
1 code implementation • NeurIPS 2019 • Sam Wiseman, Yoon Kim
We propose to learn deep undirected graphical models (i.e., MRFs) with a non-ELBO objective for which we can calculate exact gradients.
1 code implementation • ICML 2020 • Jonathan Mamou, Hang Le, Miguel Del Rio, Cory Stephenson, Hanlin Tang, Yoon Kim, SueYeon Chung
In addition, we find that the emergence of linear separability in these manifolds is driven by a combined reduction of manifolds' radius, dimensionality and inter-manifold correlations.
1 code implementation • WS 2014 • Yoon Kim, Yi-I Chiu, Kentaro Hanaki, Darshan Hegde, Slav Petrov
We provide a method for automatically detecting change in language across time through a chronologically trained neural language model.
1 code implementation • ICML 2020 • Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, David Sontag
One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) in improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization).
1 code implementation • 4 Apr 2024 • Aniruddha Nrusimha, Mayank Mishra, Naigang Wang, Dan Alistarh, Rameswar Panda, Yoon Kim
We show that regularizing both the inputs and outputs is crucial for preventing a model from "migrating" the difficulty of input quantization to the weights, which would make post-training quantization (PTQ) of the weights more difficult.
1 code implementation • 17 Nov 2022 • Tiwalayo Eisape, Vineet Gangireddy, Roger P. Levy, Yoon Kim
This suggests implicit incremental syntactic inferences underlie next-word predictions in autoregressive neural language models.
1 code implementation • 24 May 2023 • Lucas Torroba Hennigen, Yoon Kim
Masked language models (MLMs) do not explicitly define a distribution over language, i.e., they are not language models per se.
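The observation can be made concrete with the pseudo-log-likelihood, a common heuristic for scoring sentences with an MLM (shown as background here, not as the distribution-deriving construction the paper studies): mask each position in turn and sum the conditional log-probabilities. A sketch, assuming `mlm` is any callable returning per-token logits of shape (1, seq_len, vocab):

```python
import torch

def pseudo_log_likelihood(mlm, input_ids, mask_id):
    """Sum_i log p(x_i | x_{-i}) under the MLM's unary conditionals.
    Note this is a scoring heuristic, not a normalized distribution
    over sentences."""
    total = 0.0
    for i in range(input_ids.size(1)):
        masked = input_ids.clone()
        masked[0, i] = mask_id
        logits = mlm(masked)
        logp = torch.log_softmax(logits[0, i], dim=-1)
        total += logp[input_ids[0, i]].item()
    return total

# toy stand-in model for illustration
mlm = lambda ids: torch.randn(1, ids.size(1), 30)
score = pseudo_log_likelihood(mlm, torch.randint(0, 30, (1, 6)), mask_id=0)
```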
1 code implementation • 13 Feb 2024 • Kyle O'Brien, Nathan Ng, Isha Puri, Jorge Mendez, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi, Thomas Hartvigsen
Most techniques for improving OOD robustness are not applicable to settings where the model is effectively a black box, such as when the weights are frozen, retraining is costly, or the model is leveraged via an API.
1 code implementation • 26 Feb 2024 • Hang Jiang, Xiajie Zhang, Robert Mahari, Daniel Kessler, Eric Ma, Tal August, Irene Li, Alex 'Sandy' Pentland, Yoon Kim, Jad Kabbara, Deb Roy
Finally, we find that learning with stories shows a higher retention rate for non-native speakers in the follow-up assessment.
1 code implementation • 17 Apr 2024 • Yue Zhou, Yada Zhu, Diego Antognini, Yoon Kim, Yang Zhang
This paper studies the relationship between the surface form of a mathematical problem and its solvability by large language models.
no code implementations • 3 Jun 2018 • Gabriel Grand, Aron Szanto, Yoon Kim, Alexander Rush
Visual question answering (VQA) models respond to open-ended natural language questions about images.
no code implementations • 12 Sep 2017 • Guillaume Klein, Yoon Kim, Yuntian Deng, Josep Crego, Jean Senellart, Alexander M. Rush
We introduce an open-source toolkit for neural machine translation (NMT) to support research into model architectures, feature representations, and source modalities, while maintaining competitive performance, modularity and reasonable training requirements.
no code implementations • 3 Feb 2017 • Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush
Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network.
no code implementations • WS 2016 • Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber
We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016.
no code implementations • WS 2014 • Yoon Kim, Owen Zhang
We provide a simple but novel supervised weighting scheme for adjusting term frequency in tf-idf for sentiment analysis and text classification.
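For intuition, a supervised weight can be derived from how strongly a term skews toward one class; the log-odds form below is illustrative and not identical to the paper's credibility-adjusted term frequency.

```python
import math

def supervised_term_weight(pos_count, neg_count, smoothing=0.5):
    """Terms concentrated in one class get larger weights; class-neutral
    terms get weights near zero."""
    p = (pos_count + smoothing) / (pos_count + neg_count + 2 * smoothing)
    return abs(math.log(p / (1 - p)))    # magnitude of the log-odds

# a term seen 90 times in positive docs and 10 in negative docs
w = supervised_term_weight(90, 10)       # ≈ 2.15, so the term is upweighted
```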
no code implementations • 12 Jul 2018 • Adji B. Dieng, Yoon Kim, Alexander M. Rush, David M. Blei
VAEs can capture complex distributions, but they can also suffer from an issue known as "latent variable collapse," especially if the likelihood model is powerful.
no code implementations • 17 Dec 2018 • Yoon Kim, Sam Wiseman, Alexander M. Rush
There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning.
no code implementations • 1 Jan 2021 • Matteo Alleman, Jonathan Mamou, Miguel A Del Rio, Hanlin Tang, Yoon Kim, SueYeon Chung
Importing from computational and cognitive neuroscience the notion of representational invariance, we perform a series of probes designed to test the sensitivity of Transformer representations to several kinds of structure in sentences.
no code implementations • ACL (RepL4NLP) 2021 • Matteo Alleman, Jonathan Mamou, Miguel A Del Rio, Hanlin Tang, Yoon Kim, SueYeon Chung
While vector-based language representations from pretrained language models have set a new standard for many NLP tasks, there is not yet a complete accounting of their inner workings.
no code implementations • 13 Jul 2021 • Stanislav Lukyanenko, Won-Dong Jang, Donglai Wei, Robbert Struyven, Yoon Kim, Brian Leahy, Helen Yang, Alexander Rush, Dalit Ben-Yosef, Daniel Needleman, Hanspeter Pfister
In this work, we propose a two-stream model for developmental stage classification.
no code implementations • 25 May 2022 • Monica Agrawal, Stefan Hegselmann, Hunter Lang, Yoon Kim, David Sontag
A long-running goal of the clinical NLP community is the extraction of important variables trapped in clinical notes.
no code implementations • 2 Mar 2023 • Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David Daniel Cox, Zhangyang Wang, Yoon Kim
Scaling transformers has led to significant breakthroughs in many domains, leading to a paradigm in which larger versions of existing models are trained and released on a periodic basis.
no code implementations • 6 Mar 2023 • Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim
Prompt tuning, in which a base pretrained model is adapted to each task via conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks.
no code implementations • 24 May 2023 • Hongyin Luo, Yung-Sung Chuang, Yuan Gong, Tianhua Zhang, Yoon Kim, Xixin Wu, Danny Fox, Helen Meng, James Glass
Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information.
no code implementations • 15 Jun 2023 • Sarah J. Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith Tyser, Zad Chin, Yann Hicke, Nikhil Singh, Madeleine Udell, Yoon Kim, Tonio Buonassisi, Armando Solar-Lezama, Iddo Drori
We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree.
no code implementations • 11 Oct 2023 • Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass
We study phrase structure induction from visually-grounded speech.
no code implementations • 11 Oct 2023 • Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim
We explore the use of language as a perceptual representation for vision-and-language navigation (VLN), with a focus on low-data settings.
no code implementations • 15 Nov 2023 • Lucas Torroba Hennigen, Shannon Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim
LLMs are vulnerable to hallucinations, and thus their outputs generally require laborious human verification for high-stakes applications.
no code implementations • 13 Jan 2024 • Yujun Mao, Yoon Kim, Yilun Zhou
And while self-generated verbalizations of intermediate reasoning steps (i.e., chain-of-thought prompting) have been shown to be helpful, whether LLMs can make use of helpful side information such as problem-specific hints has not been investigated before.
no code implementations • 19 Jan 2024 • Mayank Agarwal, Yikang Shen, Bailin Wang, Yoon Kim, Jie Chen
In this work, we explore data-efficient adaptation of pre-trained code models by further pre-training and fine-tuning them with program structures.
no code implementations • 4 Feb 2024 • Peiqi Wang, Yikang Shen, Zhen Guo, Matthew Stallone, Yoon Kim, Polina Golland, Rameswar Panda
Our experiments demonstrate that the proposed diversity measure in the normalized weight gradient space is correlated with downstream instruction-following performance.
no code implementations • 21 Feb 2024 • William Merrill, Zhaofeng Wu, Norihito Naka, Yoon Kim, Tal Linzen
Do LMs infer the semantics of text from co-occurrence patterns in their training data?
no code implementations • 26 Feb 2024 • Jerry Ngo, Yoon Kim
This probe is trained via a contrastive loss that pushes the language representations and sound representations of an object to be close to one another.
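The training signal can be sketched as a standard InfoNCE-style objective over a batch of paired embeddings; the temperature and embedding sizes below are placeholders.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(lang_emb, sound_emb, temperature=0.07):
    """Matched language/sound pairs (the diagonal) are pulled together;
    every other pair in the batch serves as a negative."""
    lang = F.normalize(lang_emb, dim=-1)
    sound = F.normalize(sound_emb, dim=-1)
    logits = lang @ sound.T / temperature      # (batch, batch) similarities
    targets = torch.arange(lang.size(0))
    return F.cross_entropy(logits, targets)

loss = contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))
```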
no code implementations • 17 Mar 2024 • Dong Won Lee, Hae Won Park, Yoon Kim, Cynthia Breazeal, Louis-Philippe Morency
We describe an approach for aligning an LLM-based dialogue agent based on global (i.e., dialogue-level) rewards, while also taking into account naturally-occurring multimodal signals.
no code implementations • 18 Apr 2024 • Zhaofeng Wu, Ananth Balashankar, Yoon Kim, Jacob Eisenstein, Ahmad Beirami
In this work, we evaluate a simple approach for zero-shot cross-lingual alignment, where a reward model is trained on preference data in one source language and directly applied to other target languages.