no code implementations • WMT (EMNLP) 2020 • Lucia Specia, Zhenhao Li, Juan Pino, Vishrav Chaudhary, Francisco Guzmán, Graham Neubig, Nadir Durrani, Yonatan Belinkov, Philipp Koehn, Hassan Sajjad, Paul Michel, Xian Li
We report the findings of the second edition of the shared task on improving robustness in Machine Translation (MT).
no code implementations • 23 Aug 2023 • Lucas Weber, Jaap Jumelet, Paul Michel, Elia Bruni, Dieuwke Hupkes
We present a number of case studies with common hand-crafted and automated curriculum learning (CL) approaches to illustrate this phenomenon, and we find that none of them outperforms optimisation with plain Adam and well-chosen hyperparameters.
1 code implementation • 30 Sep 2022 • Mathieu Rita, Corentin Tallec, Paul Michel, Jean-Bastien Grill, Olivier Pietquin, Emmanuel Dupoux, Florian Strub
Lewis signaling games are a class of simple communication games for simulating the emergence of language.
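To make the setup concrete, here is a minimal, self-contained sketch of a Lewis signaling game (the state and message spaces, policy representations, and names are illustrative assumptions, not the paper's actual experimental setup):

```python
import random

# Lewis signaling game: a sender observes a state and emits a message;
# a receiver sees only the message and must reconstruct the state.
# Both players share a reward when the reconstruction is correct.

N_STATES, N_MESSAGES = 4, 4
sender_policy = {s: random.randrange(N_MESSAGES) for s in range(N_STATES)}
receiver_policy = {m: random.randrange(N_STATES) for m in range(N_MESSAGES)}

def play_round(state):
    message = sender_policy[state]          # sender encodes the state
    guess = receiver_policy[message]        # receiver decodes the message
    return 1.0 if guess == state else 0.0   # shared reward on success

success = sum(play_round(s) for s in range(N_STATES)) / N_STATES
print(f"communication success rate: {success:.2f}")
```

Language "emerges" when the players' policies are optimized (e.g. by reinforcement learning) so that this success rate approaches one.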
2 code implementations • 27 May 2022 • Lucio M. Dery, Paul Michel, Mikhail Khodak, Graham Neubig, Ameet Talwalkar
Auxiliary objectives, supplementary learning signals that are introduced to aid learning on data-starved or highly complex end-tasks, are commonplace in machine learning.
1 code implementation • ICLR 2022 • Paul Michel, Tatsunori Hashimoto, Graham Neubig
As machine learning models are deployed ever more broadly, it becomes increasingly important that they not only perform well on their training distribution, but also yield accurate predictions when confronted with distribution shift.
no code implementations • 12 Oct 2021 • Paul Michel, Sebastian Ruder, Dani Yogatama
When training and evaluating machine learning models on a large number of tasks, it is important to not only look at average task accuracy -- which may be biased by easy or redundant tasks -- but also worst-case accuracy (i.e., the performance on the task with the lowest accuracy).
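A toy illustration of the distinction (the task names and numbers below are made up for illustration):

```python
# Average accuracy can hide a failing task: the mean looks fine
# while the worst-case (minimum) accuracy reveals the problem.
task_accuracies = {"task_a": 0.95, "task_b": 0.92, "task_c": 0.41}

average = sum(task_accuracies.values()) / len(task_accuracies)
worst_task, worst = min(task_accuracies.items(), key=lambda kv: kv[1])

print(f"average accuracy:    {average:.2f}")                # ~0.76, looks acceptable
print(f"worst-case accuracy: {worst:.2f} ({worst_task})")   # 0.41, reveals the failure
```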
2 code implementations • ICLR 2022 • Lucio M. Dery, Paul Michel, Ameet Talwalkar, Graham Neubig
In most settings of practical concern, machine learning practitioners know in advance what end-task they wish to boost with auxiliary tasks.
1 code implementation • 3 Sep 2021 • Paul Michel
A common assumption in machine learning is that the data is sampled from a fixed distribution both at training and test time.
1 code implementation • 14 Jun 2021 • Chunting Zhou, Xuezhe Ma, Paul Michel, Graham Neubig
Group distributionally robust optimization (DRO) provides an effective tool to alleviate covariate shift by minimizing the worst-case training loss over a set of pre-defined groups.
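A minimal sketch of the worst-group objective this describes, assuming per-example losses and integer group labels (illustrative code, not the paper's implementation):

```python
import torch

def worst_group_loss(per_example_loss, group_ids, n_groups):
    """Average the loss within each group, then take the maximum over
    groups: the worst-case training loss that group DRO minimizes."""
    group_losses = []
    for g in range(n_groups):
        mask = group_ids == g
        if mask.any():
            group_losses.append(per_example_loss[mask].mean())
    return torch.stack(group_losses).max()

# Usage: minimize the worst group's average loss instead of the global mean.
losses = torch.tensor([0.2, 1.5, 0.3, 2.0])
groups = torch.tensor([0, 0, 1, 1])
print(worst_group_loss(losses, groups, n_groups=2))  # tensor(1.1500)
```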
1 code implementation • ICLR 2021 • Paul Michel, Tatsunori Hashimoto, Graham Neubig
Distributionally robust optimization (DRO) provides a framework for training machine learning models that are able to perform well on a collection of related data distributions (the "uncertainty set").
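In generic textbook notation (a sketch, not necessarily the paper's exact formulation), the DRO objective is the min-max problem

```latex
% Minimize the worst-case expected loss over distributions Q
% in the uncertainty set \mathcal{Q}.
\min_{\theta}\; \max_{Q \in \mathcal{Q}}\; \mathbb{E}_{(x, y) \sim Q}\big[\ell(\theta; x, y)\big]
```

where standard training corresponds to the special case in which the uncertainty set contains only the training distribution.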
no code implementations • ACL 2020 • Keita Kurita, Paul Michel, Graham Neubig
Recently, NLP has seen a surge in the usage of large pre-trained models.
1 code implementation • 14 Apr 2020 • Keita Kurita, Paul Michel, Graham Neubig
We show that by applying a regularization method, which we call RIPPLe, and an initialization procedure, which we call Embedding Surgery, such weight poisoning attacks are possible even with limited knowledge of the dataset and fine-tuning procedure.
1 code implementation • ICML 2020 • Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Jaime Carbonell, Graham Neubig
To acquire a new skill, humans learn better and faster if a tutor, aware of their current knowledge level, tells them how much attention to pay to particular content or practice problems.
no code implementations • 25 Sep 2019 • Paul Michel, Elisabeth Salesky, Graham Neubig
Regularization-based continual learning approaches generally prevent catastrophic forgetting by augmenting the training loss with an auxiliary objective.
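One common instantiation of such an auxiliary objective is an EWC-style quadratic penalty that anchors parameters to their values after previous tasks; the sketch below assumes per-parameter importance weights (e.g. a Fisher information estimate) and is illustrative rather than the paper's method:

```python
import torch

def ewc_penalty(params, old_params, importances, lam=1.0):
    """EWC-style auxiliary objective: penalize movement away from
    previous-task parameters, weighted by per-parameter importance."""
    penalty = sum(
        (imp * (p - p_old) ** 2).sum()
        for p, p_old, imp in zip(params, old_params, importances)
    )
    return lam * penalty

# The training loss is then augmented as:
# total_loss = task_loss + ewc_penalty(model_params, saved_params, fisher)
```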
1 code implementation • WS 2019 • Xian Li, Paul Michel, Antonios Anastasopoulos, Yonatan Belinkov, Nadir Durrani, Orhan Firat, Philipp Koehn, Graham Neubig, Juan Pino, Hassan Sajjad
We share the findings of the first shared task on improving robustness of Machine Translation (MT).
3 code implementations • NeurIPS 2019 • Paul Michel, Omer Levy, Graham Neubig
Attention is a powerful and ubiquitous mechanism for allowing neural models to focus on particular salient pieces of information by taking their weighted average when making predictions.
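For reference, a minimal NumPy sketch of the standard scaled dot-product formulation of attention (a textbook form, not code from the paper):

```python
import numpy as np

def attention(queries, keys, values):
    """Scaled dot-product attention: each output is a weighted average
    of the values, with weights given by query-key similarity."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ values

q = np.random.randn(2, 8)   # 2 queries, dimension 8
k = np.random.randn(5, 8)   # 5 key-value pairs
v = np.random.randn(5, 8)
print(attention(q, k, v).shape)  # (2, 8)
```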
no code implementations • ICLR 2019 • Paul Michel, Graham Neubig, Xian Li, Juan Miguel Pino
Adversarial examples have been shown to be an effective way of assessing the robustness of neural sequence-to-sequence (seq2seq) models, by applying perturbations to a model's input that lead to large degradations in performance.
2 code implementations • NAACL 2019 • Graham Neubig, Zi-Yi Dou, Junjie Hu, Paul Michel, Danish Pruthi, Xinyi Wang, John Wieting
In this paper, we describe compare-mt, a tool for holistic analysis and comparison of the results of systems for language generation tasks such as machine translation.
1 code implementation • NAACL 2019 • Paul Michel, Xian Li, Graham Neubig, Juan Miguel Pino
Adversarial examples (perturbations to the input of a model that elicit large changes in the output) have been shown to be an effective way of assessing the robustness of sequence-to-sequence (seq2seq) models.
2 code implementations • EMNLP 2018 • Paul Michel, Graham Neubig
In this paper, we propose a benchmark dataset for Machine Translation of Noisy Text (MTNT), consisting of noisy comments on Reddit (www.reddit.com) and professionally sourced translations.
1 code implementation • ACL 2018 • Paul Michel, Graham Neubig
Every person speaks or writes their own flavor of their native language, influenced by a number of factors: the content they tend to talk about, their gender, their social status, or their geographical origin.
no code implementations • WS 2017 • Paul Michel, Abhilasha Ravichander, Shruti Rijhwani
We investigate the pertinence of methods from algebraic topology for text data analysis.
4 code implementations • 15 Jan 2017 • Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin
In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.
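A hypothetical illustration of the contrast, using plain Python in place of any real toolkit API:

```python
import numpy as np

# --- Static declaration: the graph h = tanh(Wx) is described symbolically
# once, then an engine executes it for every example that is fed in. ---
static_graph = [("matmul",), ("tanh",)]   # stand-in for a symbolic graph

def run_static(graph, W, x):
    out = x
    for (op,) in graph:
        out = W @ out if op == "matmul" else np.tanh(out)
    return out

# --- Dynamic declaration (the strategy DyNet takes): ordinary host-language
# code builds the graph as it runs, so its structure can differ per example. ---
def run_dynamic(W, x):
    h = W @ x
    return np.tanh(h) if x.sum() > 0 else h  # data-dependent control flow

W, x = np.eye(2), np.array([1.0, -2.0])
print(run_static(static_graph, W, x), run_dynamic(W, x))
```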
no code implementations • ACL 2017 • Paul Michel, Okko Räsänen, Roland Thiollière, Emmanuel Dupoux
Phonemic segmentation of speech is a critical step of speech recognition systems.