no code implementations • 15 Mar 2024 • Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Adhiguna Kuncoro, Yani Donchev, Rachita Chhaparia, Ionel Gog, Marc'Aurelio Ranzato, Jiajun Shen, Arthur Szlam
Progress in machine learning (ML) has been fueled by scaling neural network models.
no code implementations • 14 Nov 2023 • Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Rachita Chhaparia, Yani Donchev, Adhiguna Kuncoro, Marc'Aurelio Ranzato, Arthur Szlam, Jiajun Shen
In this work, we propose a distributed optimization algorithm, Distributed Low-Communication (DiLoCo), that enables training of language models on islands of devices that are poorly connected.
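As described in the paper, DiLoCo alternates many local AdamW steps on each island of devices with an infrequent outer step that applies Nesterov-momentum SGD to the averaged parameter deltas. The single-process sketch below illustrates one such round; the model, data loaders, step counts, and learning rates are placeholder assumptions, not the authors' implementation.

```python
# Illustrative single-process sketch of one DiLoCo-style round (not the authors' code).
# Each "worker" stands in for one poorly connected island of devices.
import copy
import torch

def diloco_round(global_model, outer_opt, worker_loaders, inner_steps=500, inner_lr=1e-4):
    start = [p.detach().clone() for p in global_model.parameters()]
    deltas = [torch.zeros_like(p) for p in start]

    for loader in worker_loaders:                      # runs in parallel in practice
        worker = copy.deepcopy(global_model)
        inner_opt = torch.optim.AdamW(worker.parameters(), lr=inner_lr)
        data = iter(loader)
        for _ in range(inner_steps):                   # many local steps, no communication
            x, y = next(data)
            loss = torch.nn.functional.cross_entropy(worker(x), y)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        for d, p0, p in zip(deltas, start, worker.parameters()):
            d += (p0 - p.detach()) / len(worker_loaders)   # averaged "outer gradient"

    outer_opt.zero_grad()
    for p, d in zip(global_model.parameters(), deltas):
        p.grad = d                                     # treat the averaged delta as a gradient
    outer_opt.step()                                   # e.g. SGD(momentum=0.9, nesterov=True)
```

Here `outer_opt` would be constructed once (for instance SGD with Nesterov momentum, which the paper reports as the outer optimizer) so that its momentum state persists across rounds, and only the deltas would ever be communicated between islands.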
no code implementations • 5 Jun 2023 • Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Alham Fikri Aji, Genta Indra Winata, Radityo Eko Prasojo, Phil Blunsom, Adhiguna Kuncoro
This evidence-based position paper critiques current research practices within the language model pre-training literature.
no code implementations • 19 Dec 2022 • Clara Meister, Wojciech Stokowiec, Tiago Pimentel, Lei Yu, Laura Rimell, Adhiguna Kuncoro
After just a few hundred training updates, a standard probabilistic model for language generation has likely not yet learnt many semantic or syntactic rules of natural language, making it difficult to estimate the probability distribution over next tokens.
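A cheap way to address this cold start, in the spirit of giving the model the corpus unigram distribution from step zero, is to initialise the bias of the final softmax layer to log unigram frequencies estimated from the training data. A minimal sketch with placeholder sizes and a toy corpus, illustrating the general idea rather than the paper's exact recipe:

```python
# Initialise the output layer's bias to log unigram probabilities, so the
# untrained model already predicts the corpus unigram distribution over
# next tokens. Illustrative sketch, not the paper's code.
import collections
import math
import torch

def unigram_log_probs(token_ids, vocab_size, smoothing=1.0):
    counts = collections.Counter(token_ids)
    total = len(token_ids) + smoothing * vocab_size
    return torch.tensor([
        math.log((counts.get(i, 0) + smoothing) / total)
        for i in range(vocab_size)
    ])

vocab_size, d_model = 32_000, 512
output_layer = torch.nn.Linear(d_model, vocab_size)

corpus = [3, 7, 7, 11, 3, 3, 42]          # toy token-id corpus; in practice, the training set
with torch.no_grad():
    output_layer.bias.copy_(unigram_log_probs(corpus, vocab_size))
```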
no code implementations • 1 Mar 2022 • Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer
We find that Transformer Grammars (TGs) outperform various strong baselines on sentence-level language modeling perplexity, as well as on multiple syntax-sensitive language modeling evaluation metrics.
3 code implementations • NA 2021 • Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent Sifre, Lena Martens, Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, Domenic Donato, Angeliki Lazaridou, Arthur Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Toby Pohlen, Zhitao Gong, Daniel Toyama, Cyprien de Masson d'Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew Johnson, Blake Hechtman, Laura Weidinger, Iason Gabriel, William Isaac, Ed Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray Kavukcuoglu, Geoffrey Irving
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.
Ranked #1 on Language Modelling on StackExchange
no code implementations • 31 Oct 2021 • Xiang Lorraine Li, Adhiguna Kuncoro, Jordan Hoffmann, Cyprien de Masson d'Autume, Phil Blunsom, Aida Nematzadeh
Language models (LMs) trained on large amounts of data have shown impressive performance on many NLP tasks under zero-shot and few-shot setups.
2 code implementations • EMNLP 2021 • Samuel Cahyawijaya, Genta Indra Winata, Bryan Wilie, Karissa Vincentio, Xiaohong Li, Adhiguna Kuncoro, Sebastian Ruder, Zhi Yuan Lim, Syafri Bahar, Masayu Leylia Khodra, Ayu Purwarianti, Pascale Fung
Natural language generation (NLG) benchmarks provide an important avenue to measure progress and develop better NLG systems.
1 code implementation • NeurIPS 2021 • Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d'Autume, Tomas Kocisky, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom
Given ever-larger language modelling datasets and a growing list of language-model-based NLP applications that require up-to-date factual knowledge about the world, we argue that now is the right time to rethink the static way in which we currently train and evaluate our language models, and to develop adaptive language models that can remain up-to-date with respect to our ever-changing and non-stationary world.
no code implementations • 27 May 2020 • Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom
Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence.
no code implementations • IJCNLP 2019 • John Hale, Adhiguna Kuncoro, Keith Hall, Chris Dyer, Jonathan Brennan
Domain-specific training typically makes NLP systems work better.
no code implementations • ACL 2019 • Adhiguna Kuncoro, Chris Dyer, Laura Rimell, Stephen Clark, Phil Blunsom
Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models.
1 code implementation • NAACL 2019 • Yoon Kim, Alexander M. Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis
On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese.
Ranked #11 on Constituency Grammar Induction on Penn Treebank (Max F1 (WSJ) metric)
no code implementations • ACL 2018 • Adhiguna Kuncoro, Chris Dyer, John Hale, Dani Yogatama, Stephen Clark, Phil Blunsom
Language exhibits hierarchical structure, but recent work using a subject-verb agreement diagnostic argued that state-of-the-art language models, LSTMs, fail to learn long-range syntax-sensitive dependencies.
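The subject-verb agreement diagnostic mentioned here asks whether a model prefers the verb form that agrees with a possibly distant subject, despite an intervening attractor noun. A minimal sketch of that evaluation, where `next_token_logprob` is a hypothetical stand-in for whichever language model is being scored:

```python
# Sketch of a subject-verb agreement diagnostic: the model passes an item if
# it assigns higher probability to the verb form that agrees with the subject.
# `next_token_logprob(prefix, token)` is a hypothetical scorer, not an API
# from the paper.

ITEMS = [
    # (prefix, agreeing verb, non-agreeing verb), with an attractor noun in between
    ("The keys to the cabinet", "are", "is"),
    ("The author of the books", "writes", "write"),
]

def agreement_accuracy(next_token_logprob, items=ITEMS):
    correct = 0
    for prefix, good, bad in items:
        if next_token_logprob(prefix, good) > next_token_logprob(prefix, bad):
            correct += 1
    return correct / len(items)
```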
no code implementations • ACL 2018 • John Hale, Chris Dyer, Adhiguna Kuncoro, Jonathan R. Brennan
Model comparisons attribute the early peak observed in human encephalography (EEG) responses to syntactic composition within the RNNG.
no code implementations • ICLR 2018 • Dani Yogatama, Yishu Miao, Gabor Melis, Wang Ling, Adhiguna Kuncoro, Chris Dyer, Phil Blunsom
We compare and analyze sequential, random access, and stack memory architectures for recurrent neural network language models.
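As a rough illustration of what a stack memory can mean in this setting, the sketch below implements one common continuous-stack update in which push, pop, and no-op are blended with soft weights; this is a generic formulation for illustration, not necessarily the paper's exact construction.

```python
# Generic continuous ("soft") stack update, for illustration only.
import torch

def soft_stack_update(stack, push_val, a_push, a_pop, a_noop):
    """stack: (depth, d); push_val: (d,); a_push/a_pop/a_noop: scalars summing to 1."""
    pushed = torch.cat([push_val.unsqueeze(0), stack[:-1]], dim=0)       # shift down, write on top
    popped = torch.cat([stack[1:], torch.zeros_like(stack[:1])], dim=0)  # shift up, drop the top
    return a_push * pushed + a_pop * popped + a_noop * stack
```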
4 code implementations • 15 Jan 2017 • Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin
In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.
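The paper contrasts this with dynamic declaration, where a fresh computation graph is built for every input, so ordinary control flow (loops, recursion over a parse tree) directly defines the network's shape. The sketch below illustrates that define-by-run style using PyTorch purely for familiarity; it is not the DyNet API itself.

```python
# Define-by-run illustration: the graph is rebuilt per example, so a
# differently shaped tree yields a differently shaped network.
import torch

d = 16
W = torch.nn.Linear(2 * d, d)            # composition function for tree nodes
embed = torch.nn.Embedding(1000, d)

def encode(tree):
    """A tree is either a token id (leaf) or a (left, right) pair."""
    if isinstance(tree, int):
        return embed(torch.tensor(tree))
    left, right = tree
    return torch.tanh(W(torch.cat([encode(left), encode(right)])))

# A different tree shape per example implies a different graph per example.
vec = encode(((3, 5), (7, (2, 9))))
```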
1 code implementation • EACL 2017 • Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Graham Neubig, Noah A. Smith
We investigate what information recurrent neural network grammars (RNNGs) learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enable closer inspection.
Ranked #22 on Constituency Parsing on Penn Treebank
1 code implementation • EMNLP 2016 • Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Noah A. Smith
We introduce two first-order graph-based dependency parsers achieving a new state of the art.
Ranked #17 on Dependency Parsing on Penn Treebank
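First-order (arc-factored) graph-based parsing, as in the models above, scores every head-dependent arc independently and then decodes a high-scoring tree. The sketch below shows that score-then-decode shape with a greedy head choice per word; a real decoder (e.g. maximum spanning tree) would additionally enforce a well-formed tree, and the scores would come from a trained model rather than random numbers.

```python
# First-order (arc-factored) dependency parsing sketch: independent arc scores,
# then decoding. Greedy head selection here stands in for an MST/Eisner decoder.
import numpy as np

def parse_arc_factored(score):
    """score[h, d]: score of token h (0 = ROOT) heading token d (1-indexed words)."""
    n = score.shape[0] - 1                      # number of words
    heads = np.zeros(n + 1, dtype=int)
    for d in range(1, n + 1):
        candidates = score[:, d].copy()
        candidates[d] = -np.inf                 # a word cannot head itself
        heads[d] = int(np.argmax(candidates))
    return heads[1:]                            # predicted head index for each word

# Toy 3-word sentence with random scores.
rng = np.random.default_rng(0)
print(parse_arc_factored(rng.normal(size=(4, 4))))
```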
no code implementations • 22 Apr 2016 • Adhiguna Kuncoro, Yuichiro Sawai, Kevin Duh, Yuji Matsumoto
We propose a transition-based dependency parser using Recurrent Neural Networks with Long Short-Term Memory (LSTM) units.
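A transition-based parser of this kind builds the tree through a sequence of stack and buffer actions chosen by a learned classifier (here, LSTM-based). The sketch below shows the core loop of the arc-standard system, one common choice; the paper's exact transition system and features are described there, and `choose_action` is a hypothetical stand-in for the learned model.

```python
# Minimal arc-standard transition-based parsing loop (illustrative sketch).
def parse(words, choose_action):
    stack, buffer, arcs = [0], list(range(1, len(words) + 1)), []   # 0 is ROOT
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer, arcs)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) >= 2:
            dep = stack.pop(-2)                  # second-from-top depends on the top
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            dep = stack.pop()                    # top depends on the new top
            arcs.append((stack[-1], dep))
        else:
            break                                # invalid action: stop in this sketch
    return arcs                                  # list of (head, dependent) index pairs
```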
6 code implementations • NAACL 2016 • Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, Noah A. Smith
We introduce recurrent neural network grammars, probabilistic models of sentences with explicit phrase structure.
Ranked #27 on Constituency Parsing on Penn Treebank
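An RNNG models a sentence together with its phrase-structure tree as a sequence of actions: NT(X) opens a nonterminal, GEN(w) generates a word, and REDUCE closes the most recently opened constituent. The sketch below linearizes a toy tree into that action sequence; the tree encoding and example are illustrative.

```python
# Linearize a phrase-structure tree into the generative RNNG action sequence
# (NT, GEN, REDUCE). Trees are nested (label, children) tuples; leaves are strings.
def rnng_actions(tree):
    if isinstance(tree, str):                 # terminal word
        return [("GEN", tree)]
    label, children = tree
    actions = [("NT", label)]
    for child in children:
        actions.extend(rnng_actions(child))
    actions.append(("REDUCE",))
    return actions

# (S (NP The hungry cat) (VP meows))
tree = ("S", [("NP", ["The", "hungry", "cat"]), ("VP", ["meows"])])
print(rnng_actions(tree))
```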