1 code implementation • 6 Mar 2024 • Luiza Pozzobon, Patrick Lewis, Sara Hooker, Beyza Ermis
To date, toxicity mitigation in language models has almost entirely been focused on single-language settings.
no code implementations • 27 Feb 2024 • Çağatay Yıldız, Nishaanth Kanna Ravichandran, Prishruit Punia, Matthias Bethge, Beyza Ermis
This paper studies the evolving domain of Continual Learning (CL) in large language models (LLMs), with a focus on developing strategies for efficient and sustainable training.
no code implementations • 29 Nov 2023 • Meriem Boubdir, Edward Kim, Beyza Ermis, Sara Hooker, Marzieh Fadaee
In Natural Language Processing (NLP), the Elo rating system, originally designed for ranking players in dynamic games such as chess, is increasingly being used to evaluate Large Language Models (LLMs) through "A vs B" paired comparisons.
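The Elo mechanism the paper examines can be illustrated with a minimal sketch of a single rating update after one "A vs B" comparison. This is a generic Elo update for illustration, not the paper's implementation; the function name and the K-factor of 32 are assumptions.

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Update two ratings after one A-vs-B comparison.

    score_a: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k: step size controlling how strongly one outcome moves the ratings
       (an illustrative default, not a value from the paper).
    """
    # Expected score of A under the Elo logistic model.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    # The update is zero-sum: B moves by the opposite amount.
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# Two equally rated models; A wins the comparison.
print(elo_update(1000, 1000, 1.0))  # → (1016.0, 984.0)
```

Because each update depends on the current ratings, the final ranking can depend on the order in which comparisons arrive — one of the properties that makes transplanting Elo from chess to static model evaluation worth scrutinizing.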
no code implementations • 22 Oct 2023 • Meriem Boubdir, Edward Kim, Beyza Ermis, Marzieh Fadaee, Sara Hooker
Human evaluation is increasingly critical for assessing large language models, capturing linguistic nuances, and reflecting user preferences more accurately than traditional automated metrics.
1 code implementation • 11 Oct 2023 • Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker
Considerable effort has been dedicated to mitigating toxicity, but existing methods often require drastic modifications to model parameters or the use of computationally intensive auxiliary models.
1 code implementation • 24 Apr 2023 • Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker
We evaluate the implications of these changes on the reproducibility of findings that compare the relative merits of models and methods that aim to curb toxicity.
2 code implementations • 14 Jul 2022 • Ondrej Bohdal, Lukas Balles, Martin Wistuba, Beyza Ermis, Cédric Archambeau, Giovanni Zappella
Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run.
no code implementations • 28 Jun 2022 • Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric Archambeau
This phenomenon is known as catastrophic forgetting, and it is often difficult to prevent due to practical constraints, such as the amount of data that can be stored or the limited computational resources that can be used.
no code implementations • 9 Mar 2022 • Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric Archambeau
Moreover, applications increasingly rely on large pre-trained neural networks, such as pre-trained Transformers, since practitioners often lack the resources or data needed to train such models from scratch.
no code implementations • 27 Apr 2020 • Beyza Ermis, Patrick Ernst, Yannik Stein, Giovanni Zappella
Personalization is a crucial aspect of many online experiences.
no code implementations • ICML 2020 • Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner
Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation.
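The linear bandit model mentioned here can be sketched with a minimal LinUCB-style agent: it maintains a ridge-regression estimate of an unknown reward parameter and picks the arm with the highest optimistic (upper-confidence) score. This is a generic illustration of the setting, not the algorithm studied in the paper; the class name and the exploration constant are assumptions.

```python
import numpy as np

class LinUCB:
    """Minimal LinUCB sketch for a stochastic linear bandit (illustrative)."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha          # exploration strength (assumed default)
        self.A = np.eye(dim)        # ridge-regularized Gram matrix
        self.b = np.zeros(dim)      # accumulated reward-weighted features

    def choose(self, arms):
        """arms: array of shape (n_arms, dim); returns index with highest UCB."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b      # ridge estimate of the reward parameter
        # Optimistic score: estimated reward plus a confidence-width bonus.
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', arms, A_inv, arms))
        return int(np.argmax(arms @ theta + self.alpha * bonus))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x
```

After enough rounds the confidence bonus of a frequently pulled arm shrinks, so the agent settles on the arm whose feature vector best predicts reward — the exploration/exploitation trade-off the snippet refers to.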
no code implementations • 30 Nov 2017 • Beyza Ermis, Ali Taylan Cemgil
In this paper, we modify the recently proposed variational dropout technique, which provided an elegant Bayesian interpretation of dropout, and show that the intrinsic noise in variational dropout can be exploited to obtain a degree of differential privacy.
no code implementations • 30 Nov 2017 • Beyza Ermis, Ali Taylan Cemgil
Large data collections required for the training of neural networks often contain sensitive information such as the medical histories of patients, and the privacy of the training data must be preserved.
no code implementations • 17 Jul 2015 • Cedric Archambeau, Beyza Ermis
We introduce incremental variational inference and apply it to latent Dirichlet allocation (LDA).
no code implementations • 29 Sep 2014 • Beyza Ermis, A. Taylan Cemgil
Probabilistic approaches for tensor factorization aim to extract meaningful structure from incomplete data by postulating low rank constraints.