Search Results for author: Dani Yogatama

Found 51 papers, 17 papers with code

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

35 code implementations • 8 Dec 2015 • Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages.

Ranked #1 on Accented Speech Recognition on VoxForge American-Canadian

Accented Speech Recognition Noisy Speech Recognition

76,579

Paper
Code

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

3 code implementations • 22 Sep 2021 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) we show that aside from only the model size, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, (3) widely adopted T5-base and T5-large sizes are Pareto-inefficient.

32,753

Paper
Code

Mind the Gap: Assessing Temporal Generalization in Neural Language Models

1 code implementation • NeurIPS 2021 • Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d'Autume, Tomas Kocisky, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom

Hence, given the compilation of ever-larger language modelling datasets, combined with the growing list of language-model-based NLP applications that require up-to-date factual knowledge about the world, we argue that now is the right time to rethink the static way in which we currently train and evaluate our language models, and develop adaptive language models that can remain up-to-date with respect to our ever-changing and non-stationary world.

Language Modelling

12,780

Paper
Code

A Contrastive Framework for Neural Text Generation

2 code implementations • 13 Feb 2022 • Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier

Text generation is of great importance to many natural language processing applications.

Text Generation

444

Paper
Code

Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems

1 code implementation • 11 May 2017 • Wang Ling, Dani Yogatama, Chris Dyer, Phil Blunsom

Solving algebraic word problems requires executing a series of arithmetic operations---a program---to obtain a final answer.

Program induction

266

Paper
Code

Language Models Can See: Plugging Visual Controls in Text Generation

1 code implementation • 5 May 2022 • Yixuan Su, Tian Lan, Yahui Liu, Fangyu Liu, Dani Yogatama, Yan Wang, Lingpeng Kong, Nigel Collier

MAGIC is a flexible framework and is theoretically compatible with any text generation tasks that incorporate image grounding.

Image Captioning Image-text matching +3

250

Paper
Code

Generative and Discriminative Text Classification with Recurrent Neural Networks

2 code implementations • 6 Mar 2017 • Dani Yogatama, Chris Dyer, Wang Ling, Phil Blunsom

We empirically characterize the performance of discriminative and generative LSTM models for text classification.

Continual Learning General Classification +2

209

Paper
Code

On the Cross-lingual Transferability of Monolingual Representations

6 code implementations • ACL 2020 • Mikel Artetxe, Sebastian Ruder, Dani Yogatama

This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions.

Cross-Lingual Question Answering Language Modelling +1

176

Paper
Code

Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation

1 code implementation • IJCNLP 2019 • Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli

Neural networks are part of many contemporary NLP systems, yet their empirical successes come at the price of vulnerability to adversarial attacks.

Data Augmentation text-classification +1

148

Paper
Code

End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

2 code implementations • NeurIPS 2021 • Devendra Singh Sachan, Siva Reddy, William Hamilton, Chris Dyer, Dani Yogatama

We model retrieval decisions as latent variables over sets of relevant documents.

Ranked #1 on Open-Domain Question Answering on Natural Questions (short)

Answer Generation Open-Domain Question Answering +1

106

Paper
Code

High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning

1 code implementation • 2 Mar 2022 • Paul Pu Liang, Yiwei Lyu, Xiang Fan, Jeffrey Tsaw, Yudong Liu, Shentong Mo, Dani Yogatama, Louis-Philippe Morency, Ruslan Salakhutdinov

Many real-world problems are inherently multimodal, from spoken language, gestures, and paralinguistics humans use to communicate, to force, proprioception, and visual sensors on robots.

Representation Learning Time Series Analysis +2

Paper
Code

Questions Are All You Need to Train a Dense Passage Retriever

1 code implementation • 21 Jun 2022 • Devendra Singh Sachan, Mike Lewis, Dani Yogatama, Luke Zettlemoyer, Joelle Pineau, Manzil Zaheer

We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.

Denoising Language Modelling +1

Paper
Code

Episodic Memory in Lifelong Language Learning

2 code implementations • NeurIPS 2019 • Cyprien de Masson d'Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama

We introduce a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier.

Continual Learning General Classification +3

Paper
Code

Sparse Overcomplete Word Vector Representations

3 code implementations • IJCNLP 2015 • Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, Noah Smith

Current distributed representations of words show little resemblance to theories of lexical semantics.

Paper
Code

Interpretable Diffusion via Information Decomposition

1 code implementation • 12 Oct 2023 • Xianghao Kong, Ollie Liu, Han Li, Dani Yogatama, Greg Ver Steeg

For diffusion models, we show that a natural non-negative decomposition of mutual information emerges, allowing us to quantify informative relationships between words and pixels in an image.

Image Generation Vision-Language Segmentation

Paper
Code

Finetuning Pretrained Transformers into RNNs

1 code implementation • EMNLP 2021 • Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith

Specifically, we propose a swap-then-finetune procedure: in an off-the-shelf pretrained transformer, we replace the softmax attention with its linear-complexity recurrent alternative and then finetune.

Ranked #2 on Machine Translation on WMT2017 Chinese-English

Language Modelling Machine Translation +1

Paper
Code

Jointly Learning Sentence Embeddings and Syntax with Unsupervised Tree-LSTMs

no code implementations • ICLR 2018 • Jean Maillard, Stephen Clark, Dani Yogatama

It can therefore be seen as a tree-based RNN that is unsupervised with respect to the parse trees.

Natural Language Inference Reverse Dictionary +2

Paper
Add Code

Learning to Compose Words into Sentences with Reinforcement Learning

no code implementations • 28 Nov 2016 • Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang Ling

We use reinforcement learning to learn tree-structured neural networks for computing representations of natural language sentences.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

A Sparse and Adaptive Prior for Time-Dependent Model Parameters

no code implementations • 9 Oct 2013 • Dani Yogatama, Bryan R. Routledge, Noah A. Smith

We consider the scenario where the parameters of a probabilistic model are expected to vary over time.

Variational Inference

Paper
Add Code

Bayesian Optimization of Text Representations

no code implementations • EMNLP 2015 • Dani Yogatama, Noah A. Smith

When applying machine learning to problems in NLP, there are many choices to make about how to represent input texts.

Bayesian Optimization General Classification +2

Paper
Add Code

Learning Word Representations with Hierarchical Sparse Coding

no code implementations • 8 Jun 2014 • Dani Yogatama, Manaal Faruqui, Chris Dyer, Noah A. Smith

We propose a new method for learning word representations using hierarchical regularization in sparse coding inspired by the linguistic study of word meanings.

Sentence Sentence Completion +2

Paper
Add Code

LSTMs Can Learn Syntax-Sensitive Dependencies Well, But Modeling Structure Makes Them Better

no code implementations • ACL 2018 • Adhiguna Kuncoro, Chris Dyer, John Hale, Dani Yogatama, Stephen Clark, Phil Blunsom

Language exhibits hierarchical structure, but recent work using a subject-verb agreement diagnostic argued that state-of-the-art language models, LSTMs, fail to learn long-range syntax sensitive dependencies.

Language Modelling Machine Translation +1

Paper
Add Code

Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems

no code implementations • ACL 2017 • Wang Ling, Dani Yogatama, Chris Dyer, Phil Blunsom

Solving algebraic word problems requires executing a series of arithmetic operations{---}a program{---}to obtain a final answer.

Decision Making Program induction

Paper
Add Code

Memory Architectures in Recurrent Neural Network Language Models

no code implementations • ICLR 2018 • Dani Yogatama, Yishu Miao, Gabor Melis, Wang Ling, Adhiguna Kuncoro, Chris Dyer, Phil Blunsom

We compare and analyze sequential, random access, and stack memory architectures for recurrent neural network language models.

Paper
Add Code

Variational Smoothing in Recurrent Neural Network Language Models

no code implementations • ICLR 2019 • Lingpeng Kong, Gabor Melis, Wang Ling, Lei Yu, Dani Yogatama

We present a new theoretical perspective of data noising in recurrent neural network language models (Xie et al., 2017).

Language Modelling

Paper
Add Code

Learning and Evaluating General Linguistic Intelligence

no code implementations • 31 Jan 2019 • Dani Yogatama, Cyprien de Masson d'Autume, Jerome Connor, Tomas Kocisky, Mike Chrzanowski, Lingpeng Kong, Angeliki Lazaridou, Wang Ling, Lei Yu, Chris Dyer, Phil Blunsom

We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly.

Natural Language Understanding Question Answering

Paper
Add Code

Dynamic Language Models for Streaming Text

no code implementations • TACL 2014 • Dani Yogatama, Chong Wang, Bryan R. Routledge, Noah A. Smith, Eric P. Xing

We present a probabilistic language model that captures temporal dynamics and conditions on arbitrary non-linguistic context features.

Language Modelling Machine Translation +1

Paper
Add Code

Embedding Methods for Fine Grained Entity Type Classification

no code implementations • IJCNLP 2015 • Dani Yogatama, Daniel Gillick, Nevena Lazic

Classification Coreference Resolution +8

Paper
Add Code

Linguistic Structured Sparsity in Text Categorization

no code implementations • ACL 2014 • Dani Yogatama, Noah A. Smith

Feature Engineering Language Modelling +4

Paper
Add Code

A Probabilistic Model for Canonicalizing Named Entity Mentions

no code implementations • ACL 2012 • Dani Yogatama, Yanchuan Sim, Noah A. Smith

Bayesian Inference Coreference Resolution

Paper
Add Code

Extractive Summarization by Maximizing Semantic Volume

no code implementations • EMNLP 2015 • Dani Yogatama, Fei Liu, Noah A. Smith

Dictionary Learning Extractive Summarization +1

Paper
Add Code

A Mutual Information Maximization Perspective of Language Representation Learning

no code implementations • ICLR 2020 • Lingpeng Kong, Cyprien de Masson d'Autume, Wang Ling, Lei Yu, Zihang Dai, Dani Yogatama

We show state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i. e., a sentence).

Representation Learning Sentence

Paper
Add Code

Reducing Sentiment Bias in Language Models via Counterfactual Evaluation

no code implementations • Findings of the Association for Computational Linguistics 2020 • Po-Sen Huang, huan zhang, Ray Jiang, Robert Stanforth, Johannes Welbl, Jack Rae, Vishal Maini, Dani Yogatama, Pushmeet Kohli

This paper aims to quantify and reduce a particular type of bias exhibited by language models: bias in the sentiment of generated text.

counterfactual Fairness +4

Paper
Add Code

Modelling Latent Skills for Multitask Language Generation

no code implementations • 21 Feb 2020 • Kris Cao, Dani Yogatama

We show that our latent task variable model outperforms other sequence-to-sequence baselines on average across tasks in the multitask setting.

Few-Shot Learning Text Generation

Paper
Add Code

A Call for More Rigor in Unsupervised Cross-lingual Learning

no code implementations • ACL 2020 • Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre

We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them.

Cross-Lingual Word Embeddings Position +3

Paper
Add Code

Syntactic Structure Distillation Pretraining For Bidirectional Encoders

no code implementations • 27 May 2020 • Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence.

Knowledge Distillation Language Modelling +3

Paper
Add Code

Adaptive Semiparametric Language Models

no code implementations • 4 Feb 2021 • Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong

We present a language model that combines a large parametric neural network (i. e., a transformer) with a non-parametric episodic memory component in an integrated architecture.

Language Modelling

Paper
Add Code

Random Feature Attention

no code implementations • ICLR 2021 • Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong

RFA can be used as a drop-in replacement for conventional softmax attention and offers a straightforward way of learning with recency bias through an optional gating mechanism.

Ranked #27 on Machine Translation on IWSLT2014 German-English

Language Modelling Machine Translation +3

Paper
Add Code

ABC: Attention with Bounded-memory Control

no code implementations • ACL 2022 • Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith

One way to improve the efficiency is to bound the memory size.

Language Modelling Machine Translation

Paper
Add Code

Scale Efficiently: Insights from Pretraining and Finetuning Transformers

no code implementations • ICLR 2022 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

Paper
Add Code

Balancing Average and Worst-case Accuracy in Multitask Learning

no code implementations • 12 Oct 2021 • Paul Michel, Sebastian Ruder, Dani Yogatama

When training and evaluating machine learning models on a large number of tasks, it is important to not only look at average task accuracy -- which may be biased by easy or redundant tasks -- but also worst-case accuracy (i. e. the performance on the task with the lowest accuracy).

Image Classification Language Modelling

Paper
Add Code

Relative Pixel Prediction For Autoregressive Image Generation

no code implementations • 25 Sep 2019 • Wang Ling, Chris Dyer, Lei Yu, Lingpeng Kong, Dani Yogatama, Susannah Young

In natural images, transitions between adjacent pixels tend to be smooth and gradual, a fact that has long been exploited in image compression models based on predictive coding.

Colorization Image Colorization +4

Paper
Add Code

Relational Memory Augmented Language Models

no code implementations • 24 Jan 2022 • Qi Liu, Dani Yogatama, Phil Blunsom

We present a memory-augmented approach to condition an autoregressive language model on a knowledge graph.

Language Modelling Text Generation

Paper
Add Code

Emergent Abilities of Large Language Models

no code implementations • 15 Jun 2022 • Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks.

Language Modelling

Paper
Add Code

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?

no code implementations • 21 Jul 2022 • Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler

There have been a lot of interest in the scaling properties of Transformer models.

Inductive Bias

Paper
Add Code

The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining

1 code implementation • 25 Oct 2023 • Ting-Rui Chiang, Dani Yogatama

Via a synthetic dataset, our analysis suggests that distributional property indeed leads to the better sample efficiency of pretrained masked language models, but does not fully explain the generalization capability.

Language Modelling Masked Language Modeling +2

Paper
Code

On Retrieval Augmentation and the Limitations of Language Model Training

no code implementations • 16 Nov 2023 • Ting-Rui Chiang, Xinyan Velocity Yu, Joshua Robinson, Ollie Liu, Isabelle Lee, Dani Yogatama

Augmenting a language model (LM) with $k$-nearest neighbors ($k$NN) retrieval on its training data alone can decrease its perplexity, though the underlying reasons for this remain elusive.

Language Modelling Memorization +1

Paper
Add Code

DeLLMa: A Framework for Decision Making Under Uncertainty with Large Language Models

no code implementations • 4 Feb 2024 • Ollie Liu, Deqing Fu, Dani Yogatama, Willie Neiswanger

Large language models (LLMs) are increasingly used across society, including in domains like business, engineering, and medicine.

Decision Making Decision Making Under Uncertainty +2

Paper
Add Code

Understanding In-Context Learning with a Pelican Soup Framework

no code implementations • 16 Feb 2024 • Ting-Rui Chiang, Dani Yogatama

In this framework, we introduce (1) the notion of a common sense knowledge base, (2) a general formalism for natural language classification tasks, and the notion of (3) meaning association.

Common Sense Reasoning In-Context Learning +1

Paper
Add Code

IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations

no code implementations • 1 Apr 2024 • Deqing Fu, Ghazal Khalighinejad, Ollie Liu, Bhuwan Dhingra, Dani Yogatama, Robin Jia, Willie Neiswanger

Current foundation models exhibit impressive capabilities when prompted either with text only or with both image and text inputs.

Benchmarking Math

Paper
Add Code

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

no code implementations • 18 Apr 2024 • Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei LI, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu, Zhihui Xie

On text benchmarks, Core not only performs competitively to other frontier models on a set of well-established benchmarks (e. g. MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.