Search Results for author: Dirk Hovy

Found 110 papers, 31 papers with code

We Need to Consider Disagreement in Evaluation

no code implementations ACL (BPPF) 2021 Valerio Basile, Michael Fell, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, Massimo Poesio, Alexandra Uma

Instead, we suggest that we need to better capture the sources of disagreement to improve today’s evaluation practice.

Guiding the Release of Safer E2E Conversational AI through Value Sensitive Design

no code implementations SIGDIAL (ACL) 2022 A. Stevie Bergman, Gavin Abercrombie, Shannon Spruit, Dirk Hovy, Emily Dinan, Y-Lan Boureau, Verena Rieser

Over the last several years, end-to-end neural conversational agents have vastly improved their ability to carry unrestricted, open-domain conversations with humans.

Hard and Soft Evaluation of NLP models with BOOtSTrap SAmpling - BooStSa

no code implementations ACL 2022 Tommaso Fornaciari, Alexandra Uma, Massimo Poesio, Dirk Hovy

Natural Language Processing (NLP)’s applied nature makes it necessary to select the most effective and robust models.

Experimental Design

A Report on the VarDial Evaluation Campaign 2020

no code implementations VarDial (COLING) 2020 Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Christoph Purschke, Yves Scherrer, Marcos Zampieri

This paper presents the results of the VarDial Evaluation Campaign 2020 organized as part of the seventh workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with COLING 2020.

Dialect Identification

XLM-EMO: Multilingual Emotion Prediction in Social Media Text

1 code implementation WASSA (ACL) 2022 Federico Bianchi, Debora Nozza, Dirk Hovy

Detecting emotion in text allows social and computational scientists to study how people behave and react to online events.

MilaNLP @ WASSA: Does BERT Feel Sad When You Cry?

no code implementations EACL (WASSA) 2021 Tommaso Fornaciari, Federico Bianchi, Debora Nozza, Dirk Hovy

The paper describes the MilaNLP team’s submission (Bocconi University, Milan) in the WASSA 2021 Shared Task on Empathy Detection and Emotion Classification.

Emotion Classification, Multi-Task Learning

FEEL-IT: Emotion and Sentiment Classification for the Italian Language

1 code implementation EACL (WASSA) 2021 Federico Bianchi, Debora Nozza, Dirk Hovy

While sentiment analysis is a popular task to understand people’s reactions online, we often need more nuanced information: is the post negative because the user is angry or sad?

Classification, Sentiment Analysis +1

Pipelines for Social Bias Testing of Large Language Models

no code implementations BigScience (ACL) 2022 Debora Nozza, Federico Bianchi, Dirk Hovy

We hope to open a discussion on the best methodologies to handle social bias testing in language models.

Conversations as a Source for Teaching Scientific Concepts at Different Education Levels

no code implementations 16 Apr 2024 Donya Rooein, Dirk Hovy

While language models hold great promise for educational applications, there are substantial challenges in training them to engage in meaningful and effective conversational teaching, especially when considering the diverse needs of various audiences.

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

2 code implementations 8 Apr 2024 Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy

Researchers and practitioners have met these concerns by introducing an abundance of new datasets for evaluating and improving LLM safety.

Language Modelling, Large Language Model

DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods

no code implementations 8 Mar 2024 Lorenzo Lupo, Paul Bose, Mahyar Habibi, Dirk Hovy, Carlo Schwarz

DADIT enables us to train and compare the performance of various state-of-the-art models for the prediction of the gender and age of social media users.

Impoverished Language Technology: The Lack of (Social) Class in NLP

no code implementations 6 Mar 2024 Amanda Cercas Curry, Zeerak Talat, Dirk Hovy

Since Labov's (1964) foundational work on the social stratification of language, linguistics has dedicated concerted efforts towards understanding the relationships between socio-demographic factors and language production and perception.

Emotion Analysis in NLP: Trends, Gaps and Roadmap for Future Directions

1 code implementation 2 Mar 2024 Flor Miriam Plaza-del-Arco, Alba Curry, Amanda Cercas Curry, Dirk Hovy

We then discuss four lacunae: (1) the absence of demographic and cultural aspects does not account for the variation in how emotions are perceived, but instead assumes they are universally experienced in the same manner; (2) the poor fit of emotion categories from the two main emotion theories to the task; (3) the lack of standardized EA terminology hinders gap identification, comparison, and future goals; and (4) the absence of interdisciplinary research isolates EA from insights in other fields.

Emotion Recognition

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

1 code implementation 26 Feb 2024 Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy

Motivated by this discrepancy, we challenge the prevailing constrained evaluation paradigm for values and opinions in LLMs and explore more realistic unconstrained evaluations.

Multiple-choice

Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both?

no code implementations 23 Jan 2024 Nikita Soni, Niranjan Balasubramanian, H. Andrew Schwartz, Dirk Hovy

We compare pre-training models with human context via 1) group attributes, 2) individual users, and 3) a combined approach on 5 user- and document-level tasks.

Age Estimation, Language Modelling

Know Your Audience: Do LLMs Adapt to Different Age and Education Levels?

no code implementations 4 Dec 2023 Donya Rooein, Amanda Cercas Curry, Dirk Hovy

We find large variations in the readability of the answers by different LLMs.

How to Use Large Language Models for Text Coding: The Case of Fatherhood Roles in Public Policy Documents

1 code implementation 20 Nov 2023 Lorenzo Lupo, Oscar Magnusson, Dirk Hovy, Elin Naurin, Lena Wängnerud

Recent advances in large language models (LLMs) like GPT-3 and GPT-4 have opened up new opportunities for text analysis in political science.

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

1 code implementation 2 Aug 2023 Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy

In this paper, we introduce a new test suite called XSTest to identify such eXaggerated Safety behaviours in a systematic way.

Language Modelling

The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics

1 code implementation 20 Jun 2023 Matthias Orlikowski, Paul Röttger, Philipp Cimiano, Dirk Hovy

To account for sociodemographics in models of individual annotator behaviour, we introduce group-specific layers to multi-annotator models.

What about em? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns

no code implementations 25 May 2023 Anne Lauscher, Debora Nozza, Archie Crowley, Ehm Miltersen, Dirk Hovy

As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP.

Machine Translation, Translation

Leveraging Social Interactions to Detect Misinformation on Social Media

no code implementations 6 Apr 2023 Tommaso Fornaciari, Luca Luceri, Emilio Ferrara, Dirk Hovy

By keeping track of the sequence of interactions over time, we improve over previous state-of-the-art models.

Misinformation

Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement

no code implementations 25 Jan 2023 Gavin Abercrombie, Verena Rieser, Dirk Hovy

We commonly use agreement measures to assess the utility of judgements made by human annotators in Natural Language Processing (NLP) tasks.

Bridging Fairness and Environmental Sustainability in Natural Language Processing

no code implementations 8 Nov 2022 Marius Hessenthaler, Emma Strubell, Dirk Hovy, Anne Lauscher

Fairness and environmental impact are important research directions for the sustainable development of artificial intelligence.

Dimensionality Reduction, Fairness +4

SocioProbe: What, When, and Where Language Models Learn about Sociodemographics

1 code implementation 8 Nov 2022 Anne Lauscher, Federico Bianchi, Samuel Bowman, Dirk Hovy

Our results show that PLMs do encode these sociodemographics, and that this knowledge is sometimes spread across the layers of some of the tested PLMs.

ProSiT! Latent Variable Discovery with PROgressive SImilarity Thresholds

1 code implementation 26 Oct 2022 Tommaso Fornaciari, Dirk Hovy, Federico Bianchi

The most common ways to explore latent document dimensions are topic models and clustering methods.

Clustering, Topic Models

Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages

1 code implementation 20 Oct 2022 Paul Röttger, Debora Nozza, Federico Bianchi, Dirk Hovy

More data is needed, but annotating hateful content is expensive, time-consuming and potentially harmful to annotators.

Hate Speech Detection

The State of Profanity Obfuscation in Natural Language Processing

1 code implementation 14 Oct 2022 Debora Nozza, Dirk Hovy

Work on hate speech has made the consideration of rude and harmful examples in scientific publications inevitable.

Is It Worth the (Environmental) Cost? Limited Evidence for Temporal Adaptation via Continuous Training

no code implementations 13 Oct 2022 Giuseppe Attanasio, Debora Nozza, Federico Bianchi, Dirk Hovy

Consequently, we should continuously update our models with new data to expose them to new events and facts.

Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers

1 code implementation 13 Oct 2022 Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš

Previous work showed that incorporating demographic factors can consistently improve performance for various NLP tasks with traditional NLP models.

Language Modelling, Multi-Task Learning +2

On the Limitations of Sociodemographic Adaptation with Transformers

1 code implementation 1 Aug 2022 Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš

We adapt the language representations for the sociodemographic dimensions of gender and age, using continuous language modeling and dynamic multi-task learning for adaptation, where we couple language modeling with the prediction of a sociodemographic class.

Language Modelling, Multi-Task Learning

Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists

1 code implementation Findings (ACL) 2022 Giuseppe Attanasio, Debora Nozza, Dirk Hovy, Elena Baralis

EAR also reveals overfitting terms, i.e., terms most likely to induce bias, to help identify their effect on the model, task, and predictions.

Bias Detection, Fairness +1

Welcome to the Modern World of Pronouns: Identity-Inclusive Natural Language Processing beyond Gender

no code implementations COLING 2022 Anne Lauscher, Archie Crowley, Dirk Hovy

Based on our observations and ethical considerations, we define a series of desiderata for modeling pronouns in language technology.

Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data

1 code implementation 26 Jan 2022 Federico Bianchi, Vincenzo Cutrona, Dirk Hovy

Twitter data have become essential to Natural Language Processing (NLP) and social science research, driving various scientific discoveries in recent years.

Language Invariant Properties in Natural Language Processing

1 code implementation nlppower (ACL) 2022 Federico Bianchi, Debora Nozza, Dirk Hovy

We introduce language invariant properties, i.e., properties that should not change when we transform text, and how they can be used to quantitatively evaluate the robustness of transformation algorithms.

Paraphrase Generation, Translation

Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling

no code implementations 7 Jul 2021 Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser

Over the last several years, end-to-end neural conversational agents have vastly improved in their ability to carry a chit-chat conversation with humans.

The Importance of Modeling Social Factors of Language: Theory and Practice

no code implementations NAACL 2021 Dirk Hovy, Diyi Yang

We show that current NLP systems systematically break down when faced with interpreting the social factors of language.

Integrating Ethics into the NLP Curriculum

no code implementations ACL 2020 Emily M. Bender, Dirk Hovy, Alexandra Schofield

To raise awareness among future NLP practitioners and prevent inertia in the field, we need to place ethics in the curriculum for all NLP students, not as an elective, but as a core part of their education.

Ethics

Cross-lingual Contextualized Topic Models with Zero-shot Learning

2 code implementations EACL 2021 Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, Elisabetta Fersini

They all cover the same content, but the linguistic differences make it impossible to use traditional, bag-of-word-based topic models.

Topic Models, Transfer Learning +2

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

3 code implementations ACL 2021 Federico Bianchi, Silvia Terragni, Dirk Hovy

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data.

Sentence Embeddings, Topic Models +1

What the [MASK]? Making Sense of Language-Specific BERT Models

no code implementations 5 Mar 2020 Debora Nozza, Federico Bianchi, Dirk Hovy

Driven by the potential of BERT models, the NLP community has started to investigate and generate an abundant number of BERT models that are trained on a particular language, and tested on a specific data domain and task.

Language Modelling

Hey Siri. Ok Google. Alexa: A topic modeling of user reviews for smart speakers

no code implementations WS 2019 Hanh Nguyen, Dirk Hovy

User reviews provide a significant source of information for companies to understand their market and audience.

Topic Models

Identifying Linguistic Areas for Geolocation

no code implementations WS 2019 Tommaso Fornaciari, Dirk Hovy

We create three sets of labels at different levels of granularity, and compare performance of a state-of-the-art geolocation model trained and tested with P2C labels to one with regular k-d tree labels.

Clustering

Geolocation with Attention-Based Multitask Learning Models

no code implementations WS 2019 Tommaso Fornaciari, Dirk Hovy

Geolocation, predicting the location of a post based on text and other information, has a huge potential for several social media applications.

Multi-class Classification, regression

Dense Node Representation for Geolocation

no code implementations WS 2019 Tommaso Fornaciari, Dirk Hovy

Prior research has shown that geolocation can be substantially improved by including user network information.

Women's Syntactic Resilience and Men's Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing

no code implementations ACL 2019 Aparna Garimella, Carmen Banea, Dirk Hovy, Rada Mihalcea

Several linguistic studies have shown the prevalence of various lexical and grammatical patterns in texts authored by a person of a particular gender, but models for part-of-speech tagging and dependency parsing have still not adapted to account for these differences.

Dependency Parsing, Part-Of-Speech Tagging

Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information

1 code implementation EMNLP 2018 Dirk Hovy, Tommaso Fornaciari

We use homophily cues to retrofit text-based author representations with non-linguistic information, and introduce a trade-off parameter.

Attribute, General Classification +3

The Social and the Neural Network: How to Make Natural Language Processing about People again

no code implementations WS 2018 Dirk Hovy

Over the years, natural language processing has increasingly focused on tasks that can be solved by statistical models, but ignored the social aspects of language.

Comparing Bayesian Models of Annotation

no code implementations TACL 2018 Silviu Paun, Bob Carpenter, Jon Chamberlain, Dirk Hovy, Udo Kruschwitz, Massimo Poesio

We evaluate these models along four aspects: comparison to gold labels, predictive accuracy for new annotations, annotator characterization, and item difficulty, using four datasets with varying degrees of noise in the form of random (spammy) annotators.

Model Selection

Multi-Task Learning for Mental Health using Social Media Text

no code implementations 10 Dec 2017 Adrian Benton, Margaret Mitchell, Dirk Hovy

We introduce initial groundwork for estimating suicide risk and mental health in a deep learning framework.

Gender Prediction, Multi-Task Learning

Huntsville, hospitals, and hockey teams: Names can reveal your location

no code implementations WS 2017 Bahar Salehi, Dirk Hovy, Eduard Hovy, Anders Søgaard

Geolocation is the task of identifying a social media user’s primary location, and in natural language processing, there is a growing literature on the extent to which automated analysis of social media posts can help.

Knowledge Base Population, Recommendation Systems +1

End-to-End Information Extraction without Token-Level Supervision

1 code implementation WS 2017 Rasmus Berg Palm, Dirk Hovy, Florian Laws, Ole Winther

End-to-end (E2E) models, which take raw text as input and produce the desired output directly, need not depend on token-level labels.

When POS data sets don't add up: Combatting sample bias

no code implementations LREC 2014 Dirk Hovy, Barbara Plank, Anders Søgaard

We present a systematic study of several Twitter POS data sets, the problems of label and data bias, discuss their effects on model performance, and show how to overcome them to learn models that perform well on various test sets, achieving relative error reduction of up to 21%.

POS, TAG

Augmenting English Adjective Senses with Supersenses

1 code implementation LREC 2014 Yulia Tsvetkov, Nathan Schneider, Dirk Hovy, Archna Bhatia, Manaal Faruqui, Chris Dyer

We develop a supersense taxonomy for adjectives, based on that of GermaNet, and apply it to English adjectives in WordNet using human annotation and supervised classification.

Classification, General Classification

Crowdsourcing and annotating NER for Twitter #drift

no code implementations LREC 2014 Hege Fromreide, Dirk Hovy, Anders Søgaard

We present two new NER datasets for Twitter; a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010).

NER
