Search Results for author: Jasmijn Bastings

Found 20 papers, 9 papers with code

Training Text-to-Text Transformers with Privacy Guarantees

no code implementations NAACL (PrivateNLP) 2022 Natalia Ponomareva, Jasmijn Bastings, Sergei Vassilvitskii

We focus on T5 and show that by using recent advances in JAX and XLA we can train models with DP that do not suffer a large drop in pre-training utility, nor in training speed, and can still be fine-tuned to high accuracies on downstream tasks (e.g.


Dissecting Recall of Factual Associations in Auto-Regressive Language Models

no code implementations 28 Apr 2023 Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson

Given a subject-relation query, we study how the model aggregates information about the subject and relation to predict the correct attribute.

Attribute Extraction

Simple Recurrence Improves Masked Language Models

no code implementations 23 May 2022 Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh

In this work, we explore whether modeling recurrence into the Transformer architecture can both be beneficial and efficient, by building an extremely simple recurrent module into the Transformer.

Autoregressive Diffusion Models

1 code implementation ICLR 2022 Emiel Hoogeboom, Alexey A. Gritsenko, Jasmijn Bastings, Ben Poole, Rianne van den Berg, Tim Salimans

We introduce Autoregressive Diffusion Models (ARDMs), a model class encompassing and generalizing order-agnostic autoregressive models (Uria et al., 2014) and absorbing discrete diffusion (Austin et al., 2021), which we show are special cases of ARDMs under mild assumptions.

Ranked #6 on Image Generation on CIFAR-10 (bits/dimension metric)

Image Generation
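The order-agnostic autoregressive idea that ARDMs generalize can be sketched in a few lines: sample a random generation order, then fill in one position at a time conditioned on what has been generated so far. The sketch below is illustrative only; `predict_fn` is a hypothetical stand-in for a trained model, not the paper's architecture.

```python
import numpy as np

def order_agnostic_sample(predict_fn, seq_len, vocab_size, rng):
    """Generate a sequence in a uniformly random order.

    `predict_fn(x, filled, i)` is a hypothetical model interface: given
    the partially generated sequence `x`, a boolean mask of already
    filled positions, and a target position `i`, it returns a
    probability distribution over the vocabulary for position `i`.
    """
    x = np.zeros(seq_len, dtype=int)       # all positions start "absorbed"
    filled = np.zeros(seq_len, dtype=bool)
    order = rng.permutation(seq_len)       # one random generation order
    for i in order:
        probs = predict_fn(x, filled, i)
        x[i] = rng.choice(vocab_size, p=probs)
        filled[i] = True
    return x
```

Averaging the per-position log-likelihoods over random orders is what makes such a model order-agnostic during training; absorbing discrete diffusion arises as a special case of this scheme.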

The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?

1 code implementation EMNLP (BlackboxNLP) 2020 Jasmijn Bastings, Katja Filippova

There is a recent surge of interest in using attention as explanation of model predictions, with mixed evidence on whether attention can be used as such.
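The saliency methods the paper favours attribute a prediction to input features via gradients. For a linear scorer the attribution is exact and fits in one line; this toy `saliency_linear` helper is a hypothetical illustration of gradient×input attribution, not the paper's experimental setup.

```python
import numpy as np

def saliency_linear(x, w):
    """Gradient-times-input saliency for a linear scorer f(x) = w·x.

    For this model the gradient of the score w.r.t. each input feature
    is simply w, so the attribution for feature j is w[j] * x[j].
    """
    grad = w          # d(w·x)/dx = w for a linear model
    return grad * x   # elementwise gradient×input attribution
```

For nonlinear models the gradient is computed by backpropagation instead, but the attribution recipe is the same.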

We Need to Talk About Random Splits

1 code implementation EACL 2021 Anders Søgaard, Sebastian Ebert, Jasmijn Bastings, Katja Filippova

We argue that random splits, like standard splits, lead to overly optimistic performance estimates.

Domain Adaptation

Joey NMT: A Minimalist NMT Toolkit for Novices

8 code implementations IJCNLP 2019 Julia Kreutzer, Jasmijn Bastings, Stefan Riezler

We present Joey NMT, a minimalist neural machine translation toolkit based on PyTorch that is specifically designed for novices.

General Knowledge, Machine Translation +2

Interpretable Neural Predictions with Differentiable Binary Variables

1 code implementation ACL 2019 Jasmijn Bastings, Wilker Aziz, Ivan Titov

The success of neural networks comes hand in hand with a desire for more interpretability.

Modeling Latent Sentence Structure in Neural Machine Translation

no code implementations 18 Jan 2019 Jasmijn Bastings, Wilker Aziz, Ivan Titov, Khalil Sima'an

Recently it was shown that linguistic structure predicted by a supervised parser can be beneficial for neural machine translation (NMT).

Machine Translation, NMT +1

Jump to better conclusions: SCAN both left and right

1 code implementation WS 2018 Jasmijn Bastings, Marco Baroni, Jason Weston, Kyunghyun Cho, Douwe Kiela

Lake and Baroni (2018) recently introduced the SCAN data set, which consists of simple commands paired with action sequences and is intended to test the strong generalization abilities of recurrent sequence-to-sequence models.
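SCAN pairs compositional commands with action sequences, e.g. "jump twice" maps to "JUMP JUMP". A minimal interpreter for a small fragment of the grammar (primitives, "twice"/"thrice", and "and" conjunction; directions and "after"/"opposite"/"around" are omitted) can be sketched as:

```python
# Primitive SCAN commands and their actions.
PRIM = {"jump": "JUMP", "walk": "WALK", "run": "RUN", "look": "LOOK"}

def interpret(command):
    """Interpret a small fragment of SCAN: primitives, the repetition
    modifiers 'twice'/'thrice', and 'and' conjunction."""
    actions = []
    for part in command.split(" and "):
        words = part.split()
        reps = {"twice": 2, "thrice": 3}.get(words[-1], 1)
        core = words[:-1] if words[-1] in ("twice", "thrice") else words
        actions.extend([PRIM[core[0]]] * reps)
    return " ".join(actions)
```

For example, `interpret("walk and run thrice")` yields `"WALK RUN RUN RUN"`. The generalization test asks whether a sequence-to-sequence model can produce such compositions for command combinations unseen in training.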

Exploiting Semantics in Neural Machine Translation with Graph Convolutional Networks

no code implementations NAACL 2018 Diego Marcheggiani, Jasmijn Bastings, Ivan Titov

Semantic representations have long been argued as potentially useful for enforcing meaning preservation and improving generalization performance of machine translation methods.

Machine Translation, Translation

Graph Convolutional Encoders for Syntax-aware Neural Machine Translation

no code implementations EMNLP 2017 Jasmijn Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, Khalil Sima'an

We present a simple and effective approach to incorporating syntactic structure into neural attention-based encoder-decoder models for machine translation.

Machine Translation, Translation
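The core operation behind a graph-convolutional encoder is a layer that updates each word's representation from its syntactic neighbours. A minimal sketch of one such layer, H' = ReLU((A + I)HW + b), is shown below; the paper's version additionally uses direction-specific weights and edge-label gating, which this simplification omits.

```python
import numpy as np

def gcn_layer(H, A, W, b):
    """One simplified graph-convolutional layer.

    H: (n, d) node features, one row per word.
    A: (n, n) adjacency matrix of the dependency graph.
    W: (d, d_out) weight matrix; b: (d_out,) bias.
    Returns ReLU((A + I) H W + b): each word aggregates its
    neighbours' features (plus its own, via the self-loop).
    """
    A_hat = A + np.eye(A.shape[0])     # add self-loops
    return np.maximum(0, A_hat @ H @ W + b)
```

Stacking several such layers on top of word embeddings lets information flow along multi-hop syntactic paths before the attention-based decoder reads the encoder states.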

All Fragments Count in Parser Evaluation

no code implementations LREC 2014 Jasmijn Bastings, Khalil Sima'an

PARSEVAL, the default paradigm for evaluating constituency parsers, calculates parsing success (Precision/Recall) as a function of the number of matching labeled brackets across the test set.

Human Parsing, Machine Translation
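The PARSEVAL computation described above can be sketched directly: represent each tree as a multiset of labeled brackets (label, start, end) and score matches across the test set. This helper illustrates the idea only; it is not a replacement for the standard evalb tool.

```python
from collections import Counter

def parseval(gold_trees, pred_trees):
    """Micro-averaged labeled-bracket precision/recall.

    Each tree is a list of (label, start, end) spans. A bracket in the
    prediction counts as correct if the same labeled span occurs in
    the gold tree (multiset intersection handles duplicates).
    """
    match = gold_total = pred_total = 0
    for gold, pred in zip(gold_trees, pred_trees):
        g, p = Counter(gold), Counter(pred)
        match += sum((g & p).values())      # matching labeled brackets
        gold_total += sum(g.values())
        pred_total += sum(p.values())
    precision = match / pred_total if pred_total else 0.0
    recall = match / gold_total if gold_total else 0.0
    return precision, recall
```

A prediction that recovers two of three gold brackets with no spurious ones scores precision 1.0 and recall 2/3, which is exactly the Precision/Recall-over-brackets view the paper takes as its starting point.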
