no code implementations • EACL (WASSA) 2021 • Sotiris Lamprinidis, Federico Bianchi, Daniel Hardt, Dirk Hovy
While emotions are universal aspects of human psychology, they are expressed differently across different languages and cultures.
1 code implementation • EACL (WASSA) 2021 • Federico Bianchi, Debora Nozza, Dirk Hovy
While sentiment analysis is a popular task to understand people’s reactions online, we often need more nuanced information: is the post negative because the user is angry or sad?
no code implementations • EACL (WASSA) 2021 • Tommaso Fornaciari, Federico Bianchi, Debora Nozza, Dirk Hovy
The paper describes the MilaNLP team’s submission (Bocconi University, Milan) in the WASSA 2021 Shared Task on Empathy Detection and Emotion Classification.
no code implementations • ACL 2022 • Tommaso Fornaciari, Alexandra Uma, Massimo Poesio, Dirk Hovy
Natural Language Processing (NLP)’s applied nature makes it necessary to select the most effective and robust models.
1 code implementation • LTEDI (ACL) 2022 • Debora Nozza, Federico Bianchi, Anne Lauscher, Dirk Hovy
Current language technology is ubiquitous and directly influences individuals’ lives worldwide.
1 code implementation • BigScience (ACL) 2022 • Debora Nozza, Federico Bianchi, Dirk Hovy
We hope to open a discussion on the best methodologies to handle social bias testing in language models.
no code implementations • SIGDIAL (ACL) 2022 • A. Stevie Bergman, Gavin Abercrombie, Shannon Spruit, Dirk Hovy, Emily Dinan, Y-Lan Boureau, Verena Rieser
Over the last several years, end-to-end neural conversational agents have vastly improved their ability to carry unrestricted, open-domain conversations with humans.
no code implementations • ACL (BPPF) 2021 • Valerio Basile, Michael Fell, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, Massimo Poesio, Alexandra Uma
Instead, we suggest that we need to better capture the sources of disagreement to improve today’s evaluation practice.
no code implementations • VarDial (COLING) 2020 • Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Christoph Purschke, Yves Scherrer, Marcos Zampieri
This paper presents the results of the VarDial Evaluation Campaign 2020 organized as part of the seventh workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with COLING 2020.
no code implementations • ACL 2022 • Emily Dinan, Gavin Abercrombie, A. Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser
We then empirically assess the extent to which current tools can measure these effects and current systems display them.
1 code implementation • WASSA (ACL) 2022 • Federico Bianchi, Debora Nozza, Dirk Hovy
Detecting emotion in text allows social and computational scientists to study how people behave and react to online events.
1 code implementation • nlppower (ACL) 2022 • Giuseppe Attanasio, Debora Nozza, Eliana Pastor, Dirk Hovy
In this paper, we provide the first benchmark study of interpretability approaches for hate speech detection.
1 code implementation • 24 Aug 2024 • Antonina Sinelnik, Dirk Hovy
Any report frames issues to favor a particular interpretation by highlighting or excluding certain aspects of a story.
no code implementations • 8 Aug 2024 • Fabio Pernisi, Dirk Hovy, Paul Röttger
We contribute towards closing this gap by investigating the effectiveness of many-shot jailbreaking, where models are prompted with unsafe demonstrations to induce unsafe behaviour, in Italian.
no code implementations • 9 Jul 2024 • Flor Miriam Plaza-del-Arco, Amanda Cercas Curry, Susanna Paoli, Alba Curry, Dirk Hovy
We ascribe these to cultural bias in LLMs and the scarcity of NLP literature on religion.
no code implementations • 15 May 2024 • Donya Rooein, Paul Röttger, Anastassia Shaitarova, Dirk Hovy
We, therefore, introduce and evaluate a new set of Prompt-based metrics for text difficulty.
1 code implementation • 10 May 2024 • Ilia Kuznetsov, Osama Mohammed Afzal, Koen Dercksen, Nils Dycke, Alexander Goldberg, Tom Hope, Dirk Hovy, Jonathan K. Kummerfeld, Anne Lauscher, Kevin Leyton-Brown, Sheng Lu, Mausam, Margot Mieskes, Aurélie Névéol, Danish Pruthi, Lizhen Qu, Roy Schwartz, Noah A. Smith, Thamar Solorio, Jingyan Wang, Xiaodan Zhu, Anna Rogers, Nihar B. Shah, Iryna Gurevych
We hope that our work will help set the agenda for research in machine-assisted scientific quality control in the age of AI, within the NLP community and beyond.
no code implementations • 3 May 2024 • Diyi Yang, Dirk Hovy, David Jurgens, Barbara Plank
While NLP is getting better at solving the formal linguistic aspects, limited progress has been made in adding the social awareness required for language applications to work in all situations for all users.
no code implementations • 16 Apr 2024 • Donya Rooein, Dirk Hovy
While language models hold great promise for educational applications, there are substantial challenges in training them to engage in meaningful and effective conversational teaching, especially when considering the diverse needs of various audiences.
1 code implementation • 8 Apr 2024 • Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy
Researchers and practitioners have met these concerns by introducing an abundance of new datasets for evaluating and improving LLM safety.
1 code implementation • 8 Mar 2024 • Lorenzo Lupo, Paul Bose, Mahyar Habibi, Dirk Hovy, Carlo Schwarz
DADIT enables us to train and compare the performance of various state-of-the-art models for the prediction of the gender and age of social media users.
no code implementations • 7 Mar 2024 • Amanda Cercas Curry, Giuseppe Attanasio, Zeerak Talat, Dirk Hovy
We argue for the inclusion of socioeconomic class in future language technologies.
no code implementations • 6 Mar 2024 • Amanda Cercas Curry, Zeerak Talat, Dirk Hovy
Since Labov's (1964) foundational work on the social stratification of language, linguistics has dedicated concerted efforts towards understanding the relationships between socio-demographic factors and language production and perception.
1 code implementation • 5 Mar 2024 • Flor Miriam Plaza-del-Arco, Amanda Cercas Curry, Alba Curry, Gavin Abercrombie, Dirk Hovy
We then analyze the emotions generated by the models in relation to the gender-event pairs.
1 code implementation • 2 Mar 2024 • Flor Miriam Plaza-del-Arco, Alba Curry, Amanda Cercas Curry, Dirk Hovy
We then discuss four lacunae: (1) the absence of demographic and cultural aspects does not account for the variation in how emotions are perceived, but instead assumes they are universally experienced in the same manner; (2) the poor fit of emotion categories from the two main emotion theories to the task; (3) the lack of standardized EA terminology hinders gap identification, comparison, and future goals; and (4) the absence of interdisciplinary research isolates EA from insights in other fields.
1 code implementation • 28 Feb 2024 • Giuseppe Attanasio, Beatrice Savoldi, Dennis Fucci, Dirk Hovy
Our findings have implications for the improvement of multilingual ASR systems, underscoring the importance of accessibility to training data and nuanced evaluation to predict and mitigate gender gaps.
1 code implementation • 26 Feb 2024 • Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy
Motivated by this discrepancy, we challenge the prevailing constrained evaluation paradigm for values and opinions in LLMs and explore more realistic unconstrained evaluations.
1 code implementation • 22 Feb 2024 • Xinpeng Wang, Bolei Ma, Chengzhi Hu, Leon Weber-Genzel, Paul Röttger, Frauke Kreuter, Dirk Hovy, Barbara Plank
The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging.
no code implementations • 23 Jan 2024 • Nikita Soni, Niranjan Balasubramanian, H. Andrew Schwartz, Dirk Hovy
Pre-trained language models consider the context of neighboring words and documents but lack any author context of the human generating the text.
no code implementations • 4 Dec 2023 • Donya Rooein, Amanda Cercas Curry, Dirk Hovy
We find large variations in the readability of the answers by different LLMs.
1 code implementation • 20 Nov 2023 • Lorenzo Lupo, Oscar Magnusson, Dirk Hovy, Elin Naurin, Lena Wängnerud
Recent advances in large language models (LLMs) like GPT-3.5 and GPT-4 promise automation with better results and less programming, opening up new opportunities for text analysis in political science.
1 code implementation • 14 Sep 2023 • Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis
Existing work focuses on a few spoken language understanding (SLU) tasks, and explanations are difficult to interpret for most users.
1 code implementation • 2 Aug 2023 • Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy
In this paper, we introduce a new test suite called XSTest to identify such eXaggerated Safety behaviours in a systematic way.
no code implementations • 24 Jul 2023 • Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy
Recent studies emphasize the importance of considering human label variation in data annotation.
1 code implementation • 20 Jun 2023 • Matthias Orlikowski, Paul Röttger, Philipp Cimiano, Dirk Hovy
To account for sociodemographics in models of individual annotator behaviour, we introduce group-specific layers to multi-annotator models.
no code implementations • 25 May 2023 • Anne Lauscher, Debora Nozza, Archie Crowley, Ehm Miltersen, Dirk Hovy
As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP.
no code implementations • 2 May 2023 • Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees Van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, Chris van der Lee, Yiru Li, Saad Mahamood, Margot Mieskes, Emiel van Miltenburg, Pablo Mosteiro, Malvina Nissim, Natalie Parde, Ondřej Plátek, Verena Rieser, Jie Ruan, Joel Tetreault, Antonio Toral, Xiaojun Wan, Leo Wanner, Lewis Watson, Diyi Yang
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible.
no code implementations • 6 Apr 2023 • Tommaso Fornaciari, Luca Luceri, Emilio Ferrara, Dirk Hovy
By keeping track of the sequence of interactions over time, we improve over previous state-of-the-art models.
no code implementations • 25 Jan 2023 • Gavin Abercrombie, Verena Rieser, Dirk Hovy
We commonly use agreement measures to assess the utility of judgements made by human annotators in Natural Language Processing (NLP) tasks.
1 code implementation • 18 Dec 2022 • Rishav Hada, Amir Ebrahimi Fard, Sarah Shugars, Federico Bianchi, Patricia Rossini, Dirk Hovy, Rebekah Tromble, Nava Tintarev
We find that the diversity scores for both Fragmentation and Representation are lower for immigration than for DST.
no code implementations • 8 Nov 2022 • Marius Hessenthaler, Emma Strubell, Dirk Hovy, Anne Lauscher
Fairness and environmental impact are important research directions for the sustainable development of artificial intelligence.
1 code implementation • 8 Nov 2022 • Anne Lauscher, Federico Bianchi, Samuel Bowman, Dirk Hovy
Our results show that PLMs do encode these sociodemographics, and that this knowledge is sometimes spread across the layers of some of the tested PLMs.
no code implementations • 28 Oct 2022 • Federico Bianchi, Stefanie Anja Hills, Patricia Rossini, Dirk Hovy, Rebekah Tromble, Nava Tintarev
Well-annotated data is a prerequisite for good Natural Language Processing models.
1 code implementation • 26 Oct 2022 • Tommaso Fornaciari, Dirk Hovy, Federico Bianchi
The most common ways to explore latent document dimensions are topic models and clustering methods.
1 code implementation • 20 Oct 2022 • Paul Röttger, Debora Nozza, Federico Bianchi, Dirk Hovy
More data is needed, but annotating hateful content is expensive, time-consuming and potentially harmful to annotators.
1 code implementation • 14 Oct 2022 • Debora Nozza, Dirk Hovy
Work on hate speech has made the consideration of rude and harmful examples in scientific publications inevitable.
no code implementations • 13 Oct 2022 • Giuseppe Attanasio, Debora Nozza, Federico Bianchi, Dirk Hovy
Consequently, we should continuously update our models with new data to expose them to new events and facts.
1 code implementation • 13 Oct 2022 • Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš
Previous work showed that incorporating demographic factors can consistently improve performance for various NLP tasks with traditional NLP models.
1 code implementation • 1 Aug 2022 • Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš
We adapt the language representations for the sociodemographic dimensions of gender and age, using continuous language modeling and dynamic multi-task learning for adaptation, where we couple language modeling with the prediction of a sociodemographic class.
1 code implementation • Findings (ACL) 2022 • Giuseppe Attanasio, Debora Nozza, Dirk Hovy, Elena Baralis
EAR also reveals overfitting terms, i.e., terms most likely to induce bias, to help identify their effect on the model, task, and predictions.
no code implementations • COLING 2022 • Anne Lauscher, Archie Crowley, Dirk Hovy
Based on our observations and ethical considerations, we define a series of desiderata for modeling pronouns in language technology.
1 code implementation • 26 Jan 2022 • Federico Bianchi, Vincenzo Cutrona, Dirk Hovy
Twitter data have become essential to Natural Language Processing (NLP) and social science research, driving various scientific discoveries in recent years.
no code implementations • 19 Jan 2022 • Kilian Theil, Dirk Hovy, Heiner Stuckenschmidt
How much does a CEO's personality impact the performance of their company?
1 code implementation • NAACL 2022 • Paul Röttger, Bertie Vidgen, Dirk Hovy, Janet B. Pierrehumbert
To address this issue, we propose two contrasting paradigms for data annotation.
1 code implementation • nlppower (ACL) 2022 • Federico Bianchi, Debora Nozza, Dirk Hovy
We introduce language invariant properties, i.e., properties that should not change when we transform text, and how they can be used to quantitatively evaluate the robustness of transformation algorithms.
no code implementations • 7 Jul 2021 • Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser
Over the last several years, end-to-end neural conversational agents have vastly improved in their ability to carry a chit-chat conversation with humans.
1 code implementation • NAACL 2021 • Debora Nozza, Federico Bianchi, Dirk Hovy
Our results show that 4.3% of the time, language models complete a sentence with a hurtful word.
Ranked #1 on Hurtful Sentence Completion on HONEST
no code implementations • NAACL 2021 • Dirk Hovy, Diyi Yang
We show that current NLP systems systematically break down when faced with interpreting the social factors of language.
no code implementations • NAACL 2021 • Tommaso Fornaciari, Alexandra Uma, Silviu Paun, Barbara Plank, Dirk Hovy, Massimo Poesio
Supervised learning assumes that a ground truth label exists.
no code implementations • EACL 2021 • Tommaso Fornaciari, Federico Bianchi, Massimo Poesio, Dirk Hovy
In most cases, however, the target texts’ preceding context is not considered.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Farzana Rashid, Tommaso Fornaciari, Dirk Hovy, Eduardo Blanco, Fernando Vega-Redondo
When interacting with each other, we motivate, advise, inform, show love or power towards our peers.
no code implementations • ACL 2020 • Dirk Hovy, Federico Bianchi, Tommaso Fornaciari
The main goal of machine translation has been to convey the correct content.
no code implementations • ACL 2020 • Emily M. Bender, Dirk Hovy, Alexandra Schofield
To raise awareness among future NLP practitioners and prevent inertia in the field, we need to place ethics in the curriculum for all NLP students: not as an elective, but as a core part of their education.
2 code implementations • EACL 2021 • Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, Elisabetta Fersini
They all cover the same content, but the linguistic differences make it impossible to use traditional, bag-of-word-based topic models.
3 code implementations • ACL 2021 • Federico Bianchi, Silvia Terragni, Dirk Hovy
Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data.
no code implementations • 5 Mar 2020 • Debora Nozza, Federico Bianchi, Dirk Hovy
Driven by the potential of BERT models, the NLP community has started to investigate and generate an abundant number of BERT models that are trained on a particular language, and tested on a specific data domain and task.
no code implementations • ACL 2020 • Deven Shah, H. Andrew Schwartz, Dirk Hovy
In this paper, we propose a unifying conceptualization: the predictive bias framework for NLP.
no code implementations • WS 2019 • Tommaso Fornaciari, Dirk Hovy
Geolocation, predicting the location of a post based on text and other information, has a huge potential for several social media applications.
no code implementations • WS 2019 • Hanh Nguyen, Dirk Hovy
User reviews provide a significant source of information for companies to understand their market and audience.
no code implementations • WS 2019 • Tommaso Fornaciari, Dirk Hovy
Prior research has shown that geolocation can be substantially improved by including user network information.
no code implementations • WS 2019 • Tommaso Fornaciari, Dirk Hovy
We create three sets of labels at different levels of granularity, and compare performance of a state-of-the-art geolocation model trained and tested with P2C labels to one with regular k-d tree labels.
no code implementations • ACL 2019 • Aparna Garimella, Carmen Banea, Dirk Hovy, Rada Mihalcea
Several linguistic studies have shown the prevalence of various lexical and grammatical patterns in texts authored by a person of a particular gender, but models for part-of-speech tagging and dependency parsing have still not adapted to account for these differences.
no code implementations • EMNLP 2018 • Sotiris Lamprinidis, Daniel Hardt, Dirk Hovy
However, we also find that performance is very similar to that of a simple Logistic Regression model over character n-grams.
1 code implementation • EMNLP 2018 • Dirk Hovy, Tommaso Fornaciari
We use homophily cues to retrofit text-based author representations with non-linguistic information, and introduce a trade-off parameter.
no code implementations • EMNLP 2018 • Dirk Hovy, Christoph Purschke
Dialects are one of the main drivers of language variation, a major challenge for natural language processing tools.
no code implementations • WS 2018 • Dirk Hovy
Over the years, natural language processing has increasingly focused on tasks that can be solved by statistical models, but ignored the social aspects of language.
no code implementations • TACL 2018 • Silviu Paun, Bob Carpenter, Jon Chamberlain, Dirk Hovy, Udo Kruschwitz, Massimo Poesio
We evaluate these models along four aspects: comparison to gold labels, predictive accuracy for new annotations, annotator characterization, and item difficulty, using four datasets with varying degrees of noise in the form of random (spammy) annotators.
no code implementations • 10 Dec 2017 • Adrian Benton, Margaret Mitchell, Dirk Hovy
We introduce initial groundwork for estimating suicide risk and mental health in a deep learning framework.
no code implementations • WS 2017 • Bahar Salehi, Dirk Hovy, Eduard Hovy, Anders Søgaard
Geolocation is the task of identifying a social media user’s primary location, and in natural language processing, there is a growing literature on the extent to which automated analysis of social media posts can help.
1 code implementation • WS 2017 • Rasmus Berg Palm, Dirk Hovy, Florian Laws, Ole Winther
End-to-end (E2E) models, which take raw text as input and produce the desired output directly, need not depend on token-level labels.
no code implementations • LREC 2016 • Dirk Hovy, Anders Johannsen
Language varies not only between countries, but also along regional and socio-demographic lines.
no code implementations • LREC 2014 • Dirk Hovy, Barbara Plank, Anders Søgaard
We present a systematic study of several Twitter POS data sets, the problems of label and data bias, discuss their effects on model performance, and show how to overcome them to learn models that perform well on various test sets, achieving relative error reduction of up to 21%.
1 code implementation • LREC 2014 • Yulia Tsvetkov, Nathan Schneider, Dirk Hovy, Archna Bhatia, Manaal Faruqui, Chris Dyer
We develop a supersense taxonomy for adjectives, based on that of GermaNet, and apply it to English adjectives in WordNet using human annotation and supervised classification.
no code implementations • LREC 2014 • Hege Fromreide, Dirk Hovy, Anders Søgaard
We present two new NER datasets for Twitter: a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010).