no code implementations • 16 Feb 2025 • Ting-Rui Chiang, Dani Yogatama
In long-context modeling, the range of positions can vary widely, and thus RoPE rotates some dimensions through a wide range of angles.
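To make the rotation-angle claim concrete, here is a minimal sketch of the standard RoPE frequency schedule (assuming the usual base of 10000 and one rotation per 2-D subspace; the function name `rope_angles` is illustrative, not from the paper):

```python
import numpy as np

def rope_angles(position, dim=64, base=10000.0):
    # One rotation frequency per 2-D subspace of the embedding.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    # The angle applied to each subspace grows linearly with position,
    # so long contexts sweep the fast subspaces through many full turns.
    return position * inv_freq

print(rope_angles(512)[:3])     # angles (radians) at position 512
print(rope_angles(65536)[:3])   # much larger angles at a long-context position
```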
1 code implementation • 7 Nov 2024 • Zhaofeng Wu, Xinyan Velocity Yu, Dani Yogatama, Jiasen Lu, Yoon Kim
We hypothesize that models acquire this capability through learning a shared representation space across heterogeneous data types (e.g., different languages and modalities), which places semantically similar inputs near one another, even if they are from different modalities/languages.
no code implementations • 28 Oct 2024 • Isabelle Lee, Joshua Lum, Ziyi Liu, Dani Yogatama
While interpretability research has shed light on some internal algorithms utilized by transformer-based LLMs, reasoning in natural language, with its deep contextuality and ambiguity, defies easy categorization.
no code implementations • 17 Oct 2024 • Ting-Rui Chiang, Joshua Robinson, Xinyan Velocity Yu, Dani Yogatama
The ability to locate an object in an image according to natural language instructions is crucial for many real-world applications.
1 code implementation • 3 May 2024 • Piotr Padlewski, Max Bain, Matthew Henderson, Zhongkai Zhu, Nishant Relan, Hai Pham, Donovan Ong, Kaloyan Aleksiev, Aitor Ormazabal, Samuel Phua, Ethan Yeo, Eugenie Lamprecht, Qi Liu, Yuqi Wang, Eric Chen, Deyu Fu, Lei LI, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Mikel Artetxe, Yi Tay
We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models.
no code implementations • 18 Apr 2024 • Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei LI, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu, Zhihui Xie
On text benchmarks, Core not only performs competitively with other frontier models on a set of well-established benchmarks (e.g., MMLU, GSM8K) but also outperforms GPT4-0613 in human evaluation.
no code implementations • 1 Apr 2024 • Deqing Fu, Ruohao Guo, Ghazal Khalighinejad, Ollie Liu, Bhuwan Dhingra, Dani Yogatama, Robin Jia, Willie Neiswanger
Current foundation models exhibit impressive capabilities when prompted either with text only or with both image and text inputs.
no code implementations • 16 Feb 2024 • Ting-Rui Chiang, Dani Yogatama
In this framework, we introduce (1) the notion of a commonsense knowledge base, (2) a general formalism for natural language classification tasks, and (3) the notion of meaning association.
no code implementations • 4 Feb 2024 • Ollie Liu, Deqing Fu, Dani Yogatama, Willie Neiswanger
The potential of large language models (LLMs) as decision support tools is increasingly being explored in fields such as business, engineering, and medicine, which often face challenging tasks of decision-making under uncertainty.
no code implementations • 16 Nov 2023 • Ting-Rui Chiang, Xinyan Velocity Yu, Joshua Robinson, Ollie Liu, Isabelle Lee, Dani Yogatama
Augmenting a language model (LM) with $k$-nearest neighbors ($k$NN) retrieval on its training data alone can decrease its perplexity, though the underlying reasons for this remain elusive.
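For reference, the usual way such a retrieval component is combined with the base LM is the kNN-LM interpolation sketched below; this is a generic sketch, and the mixing weight `lam` and the helper name are assumptions rather than details from this paper:

```python
import numpy as np

def interpolate_knn_lm(p_lm, neighbor_dists, neighbor_next_tokens, vocab_size, lam=0.25):
    # Neighbors retrieved from the training-data datastore vote for the token
    # that followed them, weighted by a softmax over negative distances.
    w = np.exp(-np.asarray(neighbor_dists, dtype=float))
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    for weight, tok in zip(w, neighbor_next_tokens):
        p_knn[tok] += weight
    # Interpolate the retrieval distribution with the parametric LM distribution.
    return lam * p_knn + (1.0 - lam) * np.asarray(p_lm, dtype=float)
```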
1 code implementation • 25 Oct 2023 • Ting-Rui Chiang, Dani Yogatama
Via a synthetic dataset, our analysis suggests that this distributional property indeed leads to the better sample efficiency of pretrained masked language models, but it does not fully explain their generalization capability.
1 code implementation • 12 Oct 2023 • Xianghao Kong, Ollie Liu, Han Li, Dani Yogatama, Greg Ver Steeg
For diffusion models, we show that a natural non-negative decomposition of mutual information emerges, allowing us to quantify informative relationships between words and pixels in an image.
no code implementations • 21 Jul 2022 • Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler
There has been a lot of interest in the scaling properties of Transformer models.
1 code implementation • 21 Jun 2022 • Devendra Singh Sachan, Mike Lewis, Dani Yogatama, Luke Zettlemoyer, Joelle Pineau, Manzil Zaheer
We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
no code implementations • 15 Jun 2022 • Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks.
1 code implementation • 5 May 2022 • Yixuan Su, Tian Lan, Yahui Liu, Fangyu Liu, Dani Yogatama, Yan Wang, Lingpeng Kong, Nigel Collier
MAGIC is a flexible framework and is theoretically compatible with any text generation task that incorporates image grounding.
1 code implementation • 2 Mar 2022 • Paul Pu Liang, Yiwei Lyu, Xiang Fan, Jeffrey Tsaw, Yudong Liu, Shentong Mo, Dani Yogatama, Louis-Philippe Morency, Ruslan Salakhutdinov
Many real-world problems are inherently multimodal, from spoken language, gestures, and paralinguistics humans use to communicate, to force, proprioception, and visual sensors on robots.
2 code implementations • 13 Feb 2022 • Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier
Text generation is of great importance to many natural language processing applications.
no code implementations • 24 Jan 2022 • Qi Liu, Dani Yogatama, Phil Blunsom
We present a memory-augmented approach to condition an autoregressive language model on a knowledge graph.
no code implementations • 12 Oct 2021 • Paul Michel, Sebastian Ruder, Dani Yogatama
When training and evaluating machine learning models on a large number of tasks, it is important to look not only at average task accuracy -- which may be biased by easy or redundant tasks -- but also at worst-case accuracy (i.e., the performance on the task with the lowest accuracy).
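In symbols, the contrast is between the standard average-loss objective and a worst-case (minimax) one; the notation below is a generic formulation rather than the specific objective used in the paper:

```latex
\min_{\theta} \ \frac{1}{T}\sum_{i=1}^{T} \mathcal{L}_i(\theta)
\qquad \text{vs.} \qquad
\min_{\theta} \ \max_{1 \le i \le T} \mathcal{L}_i(\theta)
```

where $\mathcal{L}_i$ denotes the loss (or error) on task $i$ out of $T$ tasks.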
no code implementations • ACL 2022 • Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith
One way to improve the efficiency is to bound the memory size.
no code implementations • ICLR 2022 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
The key findings of this paper are as follows: (1) we show that, aside from model size alone, model shape matters for downstream fine-tuning; (2) scaling protocols operate differently at different compute regions; (3) the widely adopted T5-base and T5-large sizes are Pareto-inefficient.
3 code implementations • 22 Sep 2021 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
The key findings of this paper are as follows: (1) we show that, aside from model size alone, model shape matters for downstream fine-tuning; (2) scaling protocols operate differently at different compute regions; (3) the widely adopted T5-base and T5-large sizes are Pareto-inefficient.
2 code implementations • NeurIPS 2021 • Devendra Singh Sachan, Siva Reddy, William Hamilton, Chris Dyer, Dani Yogatama
We model retrieval decisions as latent variables over sets of relevant documents.
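One way to read "retrieval decisions as latent variables" is the marginal likelihood below, where the answer is generated after marginalizing over candidate evidence sets; this is a schematic form, and the symbols ($q$ for the query, $a$ for the answer, $z$ for a set of retrieved documents from corpus $\mathcal{D}$) are illustrative notation rather than the paper's:

```latex
p(a \mid q) \;=\; \sum_{z \subseteq \mathcal{D}} p(a \mid q, z)\, p(z \mid q)
```

The retriever defines $p(z \mid q)$ and the reader defines $p(a \mid q, z)$; training maximizes the marginal, so no gold retrieval labels are required.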
2 code implementations • EMNLP 2021 • Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
Specifically, we propose a swap-then-finetune procedure: in an off-the-shelf pretrained transformer, we replace the softmax attention with its linear-complexity recurrent alternative and then finetune.
Ranked #2 on Machine Translation on WMT2017 Chinese-English
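As a rough illustration of the linear-complexity recurrent alternative that replaces softmax attention in the swap-then-finetune procedure above, here is a generic causal linear-attention sketch (the elu-style feature map and the loop-based implementation are assumptions for clarity, not the paper's exact parameterization):

```python
import numpy as np

def causal_linear_attention(Q, K, V, eps=1e-6):
    # Q, K: (T, d_k); V: (T, d_v). A positive feature map replaces exp(q . k),
    # so attention can be computed with running sums in O(T) instead of O(T^2).
    phi = lambda X: np.maximum(X, 0.0) + 1.0
    Q, K = phi(Q), phi(K)
    S = np.zeros((K.shape[1], V.shape[1]))   # running sum of k_t v_t^T
    z = np.zeros(K.shape[1])                 # running normalizer, sum of k_t
    out = np.zeros_like(V, dtype=float)
    for t in range(Q.shape[0]):
        S += np.outer(K[t], V[t])
        z += K[t]
        out[t] = (Q[t] @ S) / (Q[t] @ z + eps)
    return out
```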
no code implementations • ICLR 2021 • Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong
RFA can be used as a drop-in replacement for conventional softmax attention and offers a straightforward way of learning with recency bias through an optional gating mechanism.
Ranked #28 on Machine Translation on IWSLT2014 German-English
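The optional gating mechanism mentioned above can be pictured as a learned scalar that decays the recurrent attention state; the convex-combination update below is a plausible sketch (the exact gate parameterization in RFA may differ):

```python
import numpy as np

def gated_rfa_step(S, z, q_feat, k_feat, v_t, g_t, eps=1e-6):
    # S: running sum of feature-mapped keys times values; z: running normalizer.
    # q_feat, k_feat are assumed to already be passed through the random feature map.
    # g_t in (0, 1) downweights older history, which yields the recency bias.
    S = g_t * S + (1.0 - g_t) * np.outer(k_feat, v_t)
    z = g_t * z + (1.0 - g_t) * k_feat
    out_t = (q_feat @ S) / (q_feat @ z + eps)
    return S, z, out_t
```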
no code implementations • 4 Feb 2021 • Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong
We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component in an integrated architecture.
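As a hedged sketch of how such parametric and non-parametric components are typically combined, a learned gate can mix the two predictions (the notation is illustrative; the paper's architecture gates at the hidden-state level and may differ in detail):

```latex
p(w_t \mid x_{<t}) \;=\; g_t \, p_{\text{mem}}(w_t \mid x_{<t}) \;+\; (1 - g_t)\, p_{\text{param}}(w_t \mid x_{<t}),
\qquad g_t = \sigma\!\left(w^{\top} h_t\right)
```

where $p_{\text{mem}}$ comes from the episodic memory, $p_{\text{param}}$ from the transformer, and the gate $g_t$ is predicted from the current hidden state $h_t$.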
1 code implementation • NeurIPS 2021 • Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d'Autume, Tomas Kocisky, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom
Hence, given the compilation of ever-larger language modelling datasets, combined with the growing list of language-model-based NLP applications that require up-to-date factual knowledge about the world, we argue that now is the right time to rethink the static way in which we currently train and evaluate our language models, and develop adaptive language models that can remain up-to-date with respect to our ever-changing and non-stationary world.
no code implementations • 27 May 2020 • Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom
Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence.
no code implementations • ACL 2020 • Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre
We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them.
no code implementations • 21 Feb 2020 • Kris Cao, Dani Yogatama
We show that our latent task variable model outperforms other sequence-to-sequence baselines on average across tasks in the multitask setting.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Po-Sen Huang, Huan Zhang, Ray Jiang, Robert Stanforth, Johannes Welbl, Jack Rae, Vishal Maini, Dani Yogatama, Pushmeet Kohli
This paper aims to quantify and reduce a particular type of bias exhibited by language models: bias in the sentiment of generated text.
7 code implementations • ACL 2020 • Mikel Artetxe, Sebastian Ruder, Dani Yogatama
This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions.
no code implementations • ICLR 2020 • Lingpeng Kong, Cyprien de Masson d'Autume, Wang Ling, Lei Yu, Zihang Dai, Dani Yogatama
We show state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence).
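The lower bound in question is typically the InfoNCE (contrastive) bound; written out with illustrative notation ($a$ and $b$ are the two parts of the sequence, $f$ a learned scoring function, and the positive pair is included among the $N$ samples in the denominator):

```latex
I(A; B) \;\ge\; \mathbb{E}\!\left[\, \log \frac{\exp f(a, b)}{\frac{1}{N}\sum_{j=1}^{N} \exp f(a, b_j)} \,\right]
```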
no code implementations • 25 Sep 2019 • Wang Ling, Chris Dyer, Lei Yu, Lingpeng Kong, Dani Yogatama, Susannah Young
In natural images, transitions between adjacent pixels tend to be smooth and gradual, a fact that has long been exploited in image compression models based on predictive coding.
1 code implementation • IJCNLP 2019 • Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli
Neural networks are part of many contemporary NLP systems, yet their empirical successes come at the price of vulnerability to adversarial attacks.
2 code implementations • NeurIPS 2019 • Cyprien de Masson d'Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama
We introduce a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier.
no code implementations • 31 Jan 2019 • Dani Yogatama, Cyprien de Masson d'Autume, Jerome Connor, Tomas Kocisky, Mike Chrzanowski, Lingpeng Kong, Angeliki Lazaridou, Wang Ling, Lei Yu, Chris Dyer, Phil Blunsom
We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly.
no code implementations • ICLR 2019 • Lingpeng Kong, Gabor Melis, Wang Ling, Lei Yu, Dani Yogatama
We present a new theoretical perspective of data noising in recurrent neural network language models (Xie et al., 2017).
no code implementations • ACL 2018 • Adhiguna Kuncoro, Chris Dyer, John Hale, Dani Yogatama, Stephen Clark, Phil Blunsom
Language exhibits hierarchical structure, but recent work using a subject-verb agreement diagnostic argued that state-of-the-art language models, LSTMs, fail to learn long-range syntax sensitive dependencies.
no code implementations • ICLR 2018 • Dani Yogatama, Yishu Miao, Gabor Melis, Wang Ling, Adhiguna Kuncoro, Chris Dyer, Phil Blunsom
We compare and analyze sequential, random access, and stack memory architectures for recurrent neural network language models.
no code implementations • ACL 2017 • Wang Ling, Dani Yogatama, Chris Dyer, Phil Blunsom
Solving algebraic word problems requires executing a series of arithmetic operations---a program---to obtain a final answer.
no code implementations • ICLR 2018 • Jean Maillard, Stephen Clark, Dani Yogatama
It can therefore be seen as a tree-based RNN that is unsupervised with respect to the parse trees.
1 code implementation • 11 May 2017 • Wang Ling, Dani Yogatama, Chris Dyer, Phil Blunsom
Solving algebraic word problems requires executing a series of arithmetic operations---a program---to obtain a final answer.
2 code implementations • 6 Mar 2017 • Dani Yogatama, Chris Dyer, Wang Ling, Phil Blunsom
We empirically characterize the performance of discriminative and generative LSTM models for text classification.
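The two model families differ in what they parameterize; schematically (generic notation, not the paper's):

```latex
\text{discriminative: } \hat{y} = \arg\max_{y}\; p_{\theta}(y \mid x),
\qquad
\text{generative: } \hat{y} = \arg\max_{y}\; p_{\theta}(x \mid y)\, p(y)
```

where the generative LSTM models the text $x$ token by token conditioned on the label $y$ and classifies via Bayes' rule.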
no code implementations • 28 Nov 2016 • Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang Ling
We use reinforcement learning to learn tree-structured neural networks for computing representations of natural language sentences.
36 code implementations • 8 Dec 2015 • Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu
We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages.
3 code implementations • IJCNLP 2015 • Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, Noah Smith
Current distributed representations of words show little resemblance to theories of lexical semantics.
no code implementations • EMNLP 2015 • Dani Yogatama, Noah A. Smith
When applying machine learning to problems in NLP, there are many choices to make about how to represent input texts.
no code implementations • 8 Jun 2014 • Dani Yogatama, Manaal Faruqui, Chris Dyer, Noah A. Smith
We propose a new method for learning word representations using hierarchical regularization in sparse coding inspired by the linguistic study of word meanings.
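A generic form of sparse coding with a tree-structured (group) regularizer, as a sketch of the kind of objective involved (the specific groups and norms used in the paper may differ):

```latex
\min_{D, A}\; \|X - DA\|_F^2 \;+\; \lambda \sum_{g \in \mathcal{G}} \|A_g\|_2
```

where $X$ holds word co-occurrence vectors, $D$ is the dictionary, $A$ the sparse codes, and each group $g$ collects the coefficients of a node and its descendants in the hierarchy.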
no code implementations • TACL 2014 • Dani Yogatama, Chong Wang, Bryan R. Routledge, Noah A. Smith, Eric P. Xing
We present a probabilistic language model that captures temporal dynamics and conditions on arbitrary non-linguistic context features.
no code implementations • 9 Oct 2013 • Dani Yogatama, Bryan R. Routledge, Noah A. Smith
We consider the scenario where the parameters of a probabilistic model are expected to vary over time.