Search Results for author: Yuntian Deng

Found 38 papers, 18 papers with code

Sequence-to-Lattice Models for Fast Translation

no code implementations • Findings (EMNLP) 2021 • Yuntian Deng, Alexander Rush

Non-autoregressive machine translation (NAT) approaches enable fast generation by utilizing parallelizable generative processes.

Decoder, Machine Translation +1

WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild

no code implementations • 5 Sep 2024 • Yuntian Deng, Wenting Zhao, Jack Hessel, Xiang Ren, Claire Cardie, Yejin Choi

The increasing availability of real-world conversation data offers exciting opportunities for researchers to study user-chatbot interactions.

Chatbot

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

no code implementations • 24 Jul 2024 • Wenting Zhao, Tanya Goyal, Yu Ying Chiu, Liwei Jiang, Benjamin Newman, Abhilasha Ravichander, Khyathi Chandu, Ronan Le Bras, Claire Cardie, Yuntian Deng, Yejin Choi

While hallucinations of large language models (LLMs) prevail as a major challenge, existing evaluation benchmarks on factuality do not cover the diverse domains of knowledge that the real-world users of LLMs seek information about.

Chatbot, Hallucination +1

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

1 code implementation • 7 Jun 2024 • Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi

For automated evaluation with WildBench, we have developed two metrics, WB-Reward and WB-Score, which are computable using advanced LLMs such as GPT-4-turbo.

Benchmarking, Chatbot

MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

no code implementations • 3 Jun 2024 • Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, Yang You

Our benchmarks' advantages lie in (1) a 0.96 model ranking correlation with Chatbot Arena arising from the highly impartial query distribution and grading mechanism, (2) fast, cheap, and reproducible execution (6% of the time and cost of MMLU), and (3) dynamic evaluation enabled by the rapid and stable data update pipeline.

Chatbot, MMLU

From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step

1 code implementation • 23 May 2024 • Yuntian Deng, Yejin Choi, Stuart Shieber

When leveraging language models for reasoning tasks, generating explicit chain-of-thought (CoT) steps often proves essential for achieving high accuracy in final outputs.

GSM8K
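The title describes the method: leading CoT tokens are deleted from the training targets a few at a time, so the model learns to absorb those steps into its hidden states and, by the final stage, maps questions directly to answers. A minimal Python sketch of that curriculum on the data side, assuming a token-level schedule; the granularity, pacing, and formatting are illustrative, not the paper's exact recipe.

def make_stage_target(question, cot_tokens, answer, stage, tokens_per_stage=2):
    # Each stage drops a few more leading CoT tokens from the supervision.
    kept = cot_tokens[stage * tokens_per_stage:]
    return " ".join([question] + kept + [answer])

cot = ["<think>", "8", "*", "7", "=", "56", "</think>"]
for stage in range(5):
    # Stage 0 keeps the full chain; the last stage is question -> answer only.
    print(stage, make_stage_target("8*7=?", cot, "56", stage))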

WildChat: 1M ChatGPT Interaction Logs in the Wild

no code implementations • 2 May 2024 • Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, Yuntian Deng

In addition to timestamped chat transcripts, we enrich the dataset with demographic data, including state, country, and hashed IP addresses, alongside request headers.

Chatbot, Instruction Following

Implicit Chain of Thought Reasoning via Knowledge Distillation

1 code implementation • 2 Nov 2023 • Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber

In this work, we explore an alternative reasoning approach: instead of explicitly producing the chain of thought reasoning steps, we use the language model's internal hidden states to perform implicit reasoning.

Knowledge Distillation, Math
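A hedged PyTorch sketch of the distillation idea: the student's layerwise hidden states are trained to mimic the states the teacher computes while processing the explicit CoT, while a standard cross-entropy loss supervises the final answer, so the reasoning happens in the hidden states rather than in emitted tokens. The MSE matching, tensor shapes, and 0.5 mixing weight are assumptions for illustration.

import torch
import torch.nn.functional as F

def implicit_cot_loss(student_hidden, teacher_hidden, answer_logits, answer_ids):
    # Match the student's per-layer states to the teacher's CoT states.
    distill = F.mse_loss(student_hidden, teacher_hidden.detach())
    # Ordinary cross-entropy on the final-answer tokens.
    ce = F.cross_entropy(answer_logits.flatten(0, 1), answer_ids.flatten())
    return ce + 0.5 * distill

layers, batch, hidden, vocab, ans_len = 4, 2, 16, 100, 3
loss = implicit_cot_loss(torch.randn(layers, batch, hidden),
                         torch.randn(layers, batch, hidden),
                         torch.randn(batch, ans_len, vocab),
                         torch.randint(0, vocab, (batch, ans_len)))
print(loss.item())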

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

no code implementations • 6 Oct 2023 • Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, Rao Kotamarthi, Venkatram Vishwanath, Arvind Ramanathan, Sam Foreman, Kyle Hippe, Troy Arcomano, Romit Maulik, Maxim Zvyagin, Alexander Brace, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, J. Gregory Pauloski, Logan Ward, Valerie Hayot, Murali Emani, Zhen Xie, Diangen Lin, Maulik Shukla, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Prasanna Balaprakash, Gina Tourassi, John Gounley, Heidi Hanson, Thomas E Potok, Massimiliano Lupo Pasini, Kate Evans, Dan Lu, Dalton Lunga, Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar, Isaac Lyngaas, Xiao Wang, Guojing Cong, Pei Zhang, Ming Fan, Siyan Liu, Adolfy Hoisie, Shinjae Yoo, Yihui Ren, William Tang, Kyle Felker, Alexey Svyatkovskiy, Hang Liu, Ashwin Aji, Angela Dalton, Michael Schulte, Karl Schulz, Yuntian Deng, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Anima Anandkumar, Rick Stevens

In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences.

scientific discovery

Model Criticism for Long-Form Text Generation

1 code implementation • 16 Oct 2022 • Yuntian Deng, Volodymyr Kuleshov, Alexander M. Rush

Language models have demonstrated the ability to generate highly fluent text; however, it remains unclear whether their output retains coherent high-level structure (e.g., story progression).

model, Text Generation

Markup-to-Image Diffusion Models with Scheduled Sampling

1 code implementation • 11 Oct 2022 • Yuntian Deng, Noriyuki Kojima, Alexander M. Rush

These experiments each verify the effectiveness of the diffusion process and the use of scheduled sampling to fix generation issues.

Denoising, Image Generation +2
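One reading of "scheduled sampling" in this setting: during training, the noised input at step t is sometimes rebuilt from the model's own one-step prediction rather than from the ground-truth image, so training inputs better match what the sampler sees at inference. A rough PyTorch sketch under that reading; model (taken here to predict x0 directly), the noise schedule, and the 0.5 mixing probability are placeholders, not the paper's exact procedure.

import torch

def training_input(model, x0, t, alphas_cumprod, p_sample=0.5):
    noise = torch.randn_like(x0)
    a_t = alphas_cumprod[t]
    # Standard forward noising of the ground truth.
    x_t = a_t.sqrt() * x0 + (1 - a_t).sqrt() * noise
    if torch.rand(()) < p_sample:
        # Noise to a later (noisier) step, predict x0, then re-noise that
        # prediction back to step t, imitating the inference-time input.
        a_next = alphas_cumprod[min(t + 1, len(alphas_cumprod) - 1)]
        x_next = a_next.sqrt() * x0 + (1 - a_next).sqrt() * torch.randn_like(x0)
        with torch.no_grad():
            x0_hat = model(x_next)
        x_t = a_t.sqrt() * x0_hat + (1 - a_t).sqrt() * noise
    return x_t

x = training_input(lambda z: z.clamp(-1, 1), torch.randn(1, 3, 8, 8), 5,
                   torch.linspace(0.99, 0.01, 10))
print(x.shape)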

Semi-Parametric Inducing Point Networks and Neural Processes

2 code implementations • 24 May 2022 • Richa Rastogi, Yair Schiff, Alon Hacohen, Zhaozhi Li, Ian Lee, Yuntian Deng, Mert R. Sabuncu, Volodymyr Kuleshov

We introduce semi-parametric inducing point networks (SPIN), a general-purpose architecture that can query the training set at inference time in a compute-efficient manner.

Imputation, Meta-Learning
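A PyTorch sketch of the inducing-point pattern the abstract describes: a small set of learned inducing points cross-attends to the large training set once, and queries then attend only to that compact summary, so per-query cost no longer scales with training-set size. Vanilla multi-head attention and the sizes below are my assumptions, not the SPIN architecture verbatim.

import torch
import torch.nn as nn

class InducingPointLayer(nn.Module):
    def __init__(self, dim=32, num_inducing=16, heads=4):
        super().__init__()
        self.inducing = nn.Parameter(torch.randn(1, num_inducing, dim))
        self.read_train = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.read_inducing = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, queries, train_set):
        # (1) Summarize the whole training set into the inducing points.
        summary, _ = self.read_train(
            self.inducing.expand(train_set.size(0), -1, -1), train_set, train_set)
        # (2) Each query attends to the compact summary only.
        out, _ = self.read_inducing(queries, summary, summary)
        return out

layer = InducingPointLayer()
print(layer(torch.randn(2, 5, 32), torch.randn(2, 1000, 32)).shape)  # (2, 5, 32)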

Low-Rank Constraints for Fast Inference in Structured Models

1 code implementation • NeurIPS 2021 • Justin T. Chiu, Yuntian Deng, Alexander M. Rush

This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.

Language Modeling, Language Modelling +1
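The complexity reduction comes from never materializing the full potential matrix: if the n-by-n transition matrix factors as U @ V.T with rank r much smaller than n, one step of the forward algorithm costs O(n*r) instead of O(n^2). A tiny numpy sketch of that step with random unnormalized potentials; the paper develops this for a broader class of structured models.

import numpy as np

n, r, T = 1000, 16, 5
rng = np.random.default_rng(0)
U, V = rng.random((n, r)), rng.random((n, r))  # transition A = U @ V.T, never formed
emit = rng.random((T, n))                      # per-step emission potentials

alpha = np.ones(n) / n
for t in range(T):
    alpha = (V @ (U.T @ alpha)) * emit[t]      # A.T @ alpha in O(n*r)
    alpha /= alpha.sum()                       # renormalize for stability
print(alpha.shape)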

Rationales for Sequential Predictions

2 code implementations • EMNLP 2021 • Keyon Vafa, Yuntian Deng, David M. Blei, Alexander M. Rush

Compared to existing baselines, greedy rationalization is best at optimizing the combinatorial objective and provides the most faithful rationales.

Combinatorial Optimization, Language Modeling +3
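A sketch of greedy rationalization as the abstract describes it: grow the rationale one context token at a time, always adding the token that most increases the probability of the original prediction, and stop once the subset alone reproduces that prediction. score and argmax stand in for a model trained to behave sensibly on arbitrary context subsets; the toy "model" in the demo just predicts the majority letter.

def greedy_rationale(score, argmax, tokens, target):
    # score(subset, target) -> probability; argmax(subset) -> predicted token.
    chosen, remaining = [], list(range(len(tokens)))
    while remaining:
        best = max(remaining, key=lambda i: score(
            [tokens[j] for j in sorted(chosen + [i])], target))
        chosen.append(best)
        remaining.remove(best)
        if argmax([tokens[j] for j in sorted(chosen)]) == target:
            break  # the subset alone already yields the original prediction
    return sorted(chosen)

toks = list("aabab")
score = lambda sub, t: sub.count(t) / max(len(sub), 1)
argmax = lambda sub: max(set(sub), key=sub.count)
print(greedy_rationale(score, argmax, toks, "a"))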

Weighted Gaussian Process Bandits for Non-stationary Environments

no code implementations • 6 Jul 2021 • Yuntian Deng, Xingyu Zhou, Baekjin Kim, Ambuj Tewari, Abhishek Gupta, Ness Shroff

To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression.

regression
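A hedged numpy sketch of a weighted-GP-UCB step: exponentially decaying weights discount stale observations so the surrogate can track a drifting reward, and the next action maximizes posterior mean plus a scaled standard deviation. The weighted kernel-ridge posterior and the simplified variance below are one common discounted-GP variant, not necessarily the paper's exact WGP-UCB construction.

import numpy as np

def rbf(A, B, ls=0.2):
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

def wgp_ucb_pick(X, y, grid, gamma=0.9, lam=0.1, beta=2.0):
    w = gamma ** np.arange(len(y))[::-1]        # newest observation has weight 1
    K = rbf(X, X)
    # Weighted kernel-ridge posterior mean: alpha = (K + lam * W^-1)^-1 y.
    alpha = np.linalg.solve(K + lam * np.diag(1.0 / w), y)
    ks = rbf(grid, X)
    mean = ks @ alpha
    # Simplified (unweighted) posterior variance, just for the sketch.
    var = np.clip(1.0 - np.einsum('ij,ji->i', ks,
                                  np.linalg.solve(K + lam * np.eye(len(y)), ks.T)),
                  0.0, None)
    return grid[np.argmax(mean + beta * np.sqrt(var))]

X, y = np.array([0.1, 0.4, 0.7]), np.array([0.2, 0.9, 0.3])
print(wgp_ucb_pick(X, y, np.linspace(0, 1, 101)))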

Incentive Design and Profit Sharing in Multi-modal Transportation Network

no code implementations • 9 Jan 2021 • Yuntian Deng, Shiping Shao, Archak Mittal, Richard Twumasi-Boakye, James Fishelson, Abhishek Gupta, Ness B. Shroff

Accordingly, in this paper, we use cooperative game theory coupled with the hyperpath-based stochastic user equilibrium framework to study such a market.

Cascaded Text Generation with Markov Transformers

1 code implementation • NeurIPS 2020 • Yuntian Deng, Alexander M. Rush

The two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.

Machine Translation, Text Generation +1

Residual Energy-Based Models for Text Generation

1 code implementation • ICLR 2020 • Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

In this work, we investigate un-normalized energy-based models (EBMs) which operate not at the token but at the sequence level.

Language Modeling, Language Modelling +3
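The residual model reweights a base language model at the sequence level: p(x) is proportional to p_LM(x) * exp(-E(x)). One simple way to sample from such a model, sketched below, is importance resampling of base-LM candidates with weights exp(-E(x)); base_sample and energy are placeholders, and the paper's actual training and sampling procedures differ in detail.

import math
import random

def residual_ebm_sample(base_sample, energy, num_candidates=16):
    # Draw candidates from the base LM, then resample by exp(-energy).
    candidates = [base_sample() for _ in range(num_candidates)]
    weights = [math.exp(-energy(x)) for x in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

# Toy demo: the base "LM" emits random bit strings; the energy prefers
# strings with balanced counts of 0s and 1s.
print(residual_ebm_sample(
    base_sample=lambda: "".join(random.choice("01") for _ in range(8)),
    energy=lambda x: abs(x.count("0") - x.count("1"))))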

Residual Energy-Based Models for Text

no code implementations • 6 Apr 2020 • Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam

Current large-scale auto-regressive language models display impressive fluency and can generate convincing text.

AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference

no code implementations • 29 Sep 2019 • Thierry Tambe, En-Yu Yang, Zishen Wan, Yuntian Deng, Vijay Janapa Reddi, Alexander Rush, David Brooks, Gu-Yeon Wei

Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes as their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models.

Quantization
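As I read the abstract, the key move is to adapt the float format's exponent range to each tensor: a tensor-wise exponent bias derived from the maximum magnitude lets a tiny float format cover the weights' actual dynamic range. A rough numpy sketch under that reading; the bit allocation and rounding details are illustrative, not the paper's exact format.

import numpy as np

def adaptivfloat_quantize(x, exp_bits=3, man_bits=4):
    max_exp = np.floor(np.log2(np.abs(x).max()))   # align format to the data
    min_exp = max_exp - (2 ** exp_bits - 1)        # tensor-wise exponent bias
    exp = np.clip(np.floor(np.log2(np.abs(x) + 1e-30)), min_exp, max_exp)
    scale = 2.0 ** exp
    mantissa = np.round(x / scale * 2 ** man_bits) / 2 ** man_bits
    return mantissa * scale

w = np.random.randn(5).astype(np.float32) * 0.05
print(w)
print(adaptivfloat_quantize(w))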

Neural Linguistic Steganography

1 code implementation • IJCNLP 2019 • Zachary M. Ziegler, Yuntian Deng, Alexander M. Rush

Whereas traditional cryptography encrypts a secret message into an unintelligible form, steganography conceals that communication is taking place by encoding a secret message into a cover signal.

Language Modeling, Language Modelling +1
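A sketch of the core encoding move: the language model's next-token distribution serves as a codebook shared between sender and receiver, and the secret bits select among likely continuations, so the cover text stays fluent. This toy consumes fixed-size bit blocks rather than the paper's arithmetic coding, and next_token_ranking is a placeholder for a real LM.

def hide_bits(bits, next_token_ranking, steps, bits_per_step=2):
    text = []
    for _ in range(steps):
        block, bits = bits[:bits_per_step], bits[bits_per_step:]
        index = int(block.ljust(bits_per_step, "0"), 2)  # bits pick a ranked token
        text.append(next_token_ranking(text)[index])
    return " ".join(text)

# Toy "LM": always ranks the same four tokens.
ranking = lambda ctx: ["the", "a", "this", "that"]
print(hide_bits("0110", ranking, steps=2))  # "01" -> "a", "10" -> "this"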

Latent Alignment and Variational Attention

1 code implementation • NeurIPS 2018 • Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush

This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference.

Hard Attention, Machine Translation +4
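The "tighter approximation bounds" come from treating the alignment z as a latent variable and maximizing an evidence lower bound with an amortized posterior q. Schematically, with notation of my choosing:

\log p(y \mid x)
  = \log \mathbb{E}_{p(z \mid x)}\left[ p(y \mid x, z) \right]
  \geq \mathbb{E}_{q(z)}\left[ \log p(y \mid x, z) \right]
     - \mathrm{KL}\left( q(z) \,\|\, p(z \mid x) \right).

Soft attention instead pushes the expectation inside the likelihood and decodes from the mean alignment, which is why a variational treatment of the latent alignment can give a tighter handle on the log-likelihood.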

OpenNMT: Open-source Toolkit for Neural Machine Translation

no code implementations • 12 Sep 2017 • Guillaume Klein, Yoon Kim, Yuntian Deng, Josep Crego, Jean Senellart, Alexander M. Rush

We introduce an open-source toolkit for neural machine translation (NMT) to support research into model architectures, feature representations, and source modalities, while maintaining competitive performance, modularity and reasonable training requirements.

Machine Translation, NMT +1

Learning Latent Space Models with Angular Constraints

no code implementations • ICML 2017 • Pengtao Xie, Yuntian Deng, Yi Zhou, Abhimanu Kumar, Yao-Liang Yu, James Zou, Eric P. Xing

The large model capacity of latent space models (LSMs) enables them to achieve great performance on various applications, but it also renders them prone to overfitting.

Diversity

Dropout with Expectation-linear Regularization

no code implementations • 26 Sep 2016 • Xuezhe Ma, Yingkai Gao, Zhiting Hu, Yao-Liang Yu, Yuntian Deng, Eduard Hovy

Algorithmically, we show that our proposed measure of the inference gap can be used to regularize the standard dropout training objective, resulting in an \emph{explicit} control of the gap.

Image Classification
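A PyTorch sketch of the penalty suggested by the abstract: measure the gap between the Monte-Carlo average of dropout-perturbed forward passes and the deterministic pass in which each mask is replaced by its expectation, then add that gap to the training objective. Estimating the dropout expectation with a handful of sampled passes is my simplification.

import torch

def inference_gap_penalty(model, x, num_samples=4):
    model.train()                      # dropout active: sample several passes
    mc = torch.stack([model(x) for _ in range(num_samples)]).mean(0)
    model.eval()                       # dropout replaced by its expectation
    with torch.no_grad():
        det = model(x)
    model.train()
    return ((mc - det) ** 2).mean()

net = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Dropout(0.5),
                          torch.nn.ReLU(), torch.nn.Linear(8, 2))
print(inference_gap_penalty(net, torch.randn(4, 8)).item())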

Image-to-Markup Generation with Coarse-to-Fine Attention

14 code implementations • ICML 2017 • Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush

We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism.

Decoder, Optical Character Recognition (OCR)
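A single-query PyTorch sketch of the coarse-to-fine idea: score pooled image regions first, then run full attention only inside the best-scoring region, so attention cost grows with region size rather than full image area. Hard region selection and a single pooling level are simplifications of the paper's mechanism.

import torch
import torch.nn.functional as F

def coarse_to_fine_attend(query, feat, pool=4):
    B, H, W, D = feat.shape
    # Coarse pass: mean-pool the feature map into (H/pool) x (W/pool) regions.
    coarse = feat.reshape(B, H // pool, pool, W // pool, pool, D).mean((2, 4))
    idx = torch.einsum('bd,bhwd->bhw', query, coarse).flatten(1).argmax(1)
    r, c = (idx // (W // pool)).tolist(), (idx % (W // pool)).tolist()
    out = []
    for b in range(B):
        # Fine pass: ordinary attention restricted to the selected region.
        cell = feat[b, r[b]*pool:(r[b]+1)*pool, c[b]*pool:(c[b]+1)*pool]
        cell = cell.reshape(-1, D)
        weights = F.softmax(cell @ query[b], dim=0)
        out.append(weights @ cell)
    return torch.stack(out)

print(coarse_to_fine_attend(torch.randn(2, 64), torch.randn(2, 16, 16, 64)).shape)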

Neural Machine Translation with Recurrent Attention Modeling

no code implementations • EACL 2017 • Zichao Yang, Zhiting Hu, Yuntian Deng, Chris Dyer, Alex Smola

Knowing which words have been attended to in previous time steps while generating a translation is a rich source of information for predicting what words will be attended to in the future.

Machine Translation, Translation
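One simple way to expose attention history to the scorer, in the spirit of the abstract: keep a running coverage vector of cumulative attention mass per source word and feed it in as an extra feature at every decoding step. The concatenation-based scorer below is an assumption; the paper's recurrent attention model is more elaborate.

import torch
import torch.nn as nn

class CoverageAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim + 1, 1)

    def forward(self, query, keys, coverage):
        # coverage: (B, S), cumulative attention already paid to each word.
        q = query.unsqueeze(1).expand(-1, keys.size(1), -1)
        feats = torch.cat([q, keys, coverage.unsqueeze(-1)], dim=-1)
        attn = torch.softmax(self.score(feats).squeeze(-1), dim=1)
        return attn, coverage + attn   # updated history for the next step

att = CoverageAttention(16)
attn, cov = att(torch.randn(2, 16), torch.randn(2, 7, 16), torch.zeros(2, 7))
print(attn.shape, cov.shape)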

Latent Variable Modeling with Diversity-Inducing Mutual Angular Regularization

no code implementations • 23 Dec 2015 • Pengtao Xie, Yuntian Deng, Eric Xing

On two popular latent variable models, restricted Boltzmann machines and distance metric learning, we demonstrate that MAR can effectively capture long-tail patterns, reduce model complexity without sacrificing expressivity, and improve interpretability.

Diversity, Metric Learning
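The "mutual angular" intuition can be sketched as a penalty on the pairwise cosines of the component vectors, pushing components toward large mutual angles so they cover complementary patterns. The actual MAR regularizer in the paper has a different functional form; this numpy snippet only conveys the diversify-the-components idea.

import numpy as np

def mutual_angle_penalty(components):
    # Normalize rows, then penalize squared off-diagonal cosine similarities.
    C = components / np.linalg.norm(components, axis=1, keepdims=True)
    cos = C @ C.T
    off_diag = cos - np.diag(np.diag(cos))
    return (off_diag ** 2).sum() / (len(C) * (len(C) - 1))

W = np.random.randn(10, 32)
print(mutual_angle_penalty(W))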

On the Generalization Error Bounds of Neural Networks under Diversity-Inducing Mutual Angular Regularization

no code implementations • 23 Nov 2015 • Pengtao Xie, Yuntian Deng, Eric Xing

Recently, diversity-inducing regularization methods for latent variable models (LVMs), which encourage the components in LVMs to be diverse, have been studied to address several issues in latent variable modeling: (1) how to capture long-tail patterns underlying data; (2) how to reduce model complexity without sacrificing expressivity; (3) how to improve the interpretability of learned patterns.

Diversity

Creating Scalable and Interactive Web Applications Using High Performance Latent Variable Models

no code implementations • 21 Oct 2015 • Aaron Q. Li, Yuntian Deng, Kublai Jing, Joseph W Robinson

In this project we outline a modularized, scalable system for comparing Amazon products in an interactive and informative way using efficient latent variable models and dynamic visualization.
